Neural Distinguishers Expire on Carry Composition

Abstract

A hand-built carry-aware score measures a one-round advantage cliff on reduced SHA-256: it predicts a downstream output byte one adder layer away and loses all reach one round deeper. The natural objection is that a learned model might find local structure that a human-designed feature misses, as Gohr’s neural distinguishers did for round-reduced Speck. We translate that methodology to the mining read point. A residual network receives, as features, everything computed through an interior round (the state words, their carry-free derivations, the round’s modular sums, and the schedule words, in bit, value, and Fourier encodings) and is free to learn any function of them. The network exceeds the hand-built score one adder layer downstream ($0.998$ versus $0.88$ retained advantage) and then collapses to the noise floor one round deeper. The collapse persists across independent stems, a shifted read point, and scaling of $90\times$ in capacity, $64\times$ in data, and $8\times$ in training time. On pure $k$-operand modular sums, with no SHA structure present, the same network learns the top byte for $k \le 3$ and fails for $k \ge 4$. We show that the wall is not an artifact of finite feature precision (a seed-paired float32-versus-float64 comparison is null in both arms) and that it reflects the learner’s reach rather than an intrinsic boundary: the carry chain’s spectral gap is $\tfrac{1}{2}$ for every $k$, so nothing in the carry’s mixing singles out $k = 4$. We also identify a distinct second failure mode, feature isolation, and show that it does not affect the main result.
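The spectral-gap claim can be checked exactly on a small scale. The sketch below assumes the carry chain is the standard Markov chain on the carry produced when $k$ uniformly random binary operands are added column by column; the function names (`carry_matrix`, `spectral_gap`) are illustrative and not taken from the paper. It builds the $k \times k$ transition matrix over the rationals and confirms that $1, \tfrac{1}{2}, \tfrac{1}{4}, \dots, 2^{-(k-1)}$ are all eigenvalues, so the gap between the top two is $\tfrac{1}{2}$ regardless of $k$.

```python
from fractions import Fraction
from math import comb


def carry_matrix(k):
    """Transition matrix of the carry when adding k uniform random bits per column.

    Assumption: state c in {0, ..., k-1} is the incoming carry, and the outgoing
    carry is (c + s) // 2, where s ~ Binomial(k, 1/2) counts the ones in the column.
    """
    P = [[Fraction(0)] * k for _ in range(k)]
    for c in range(k):
        for s in range(k + 1):
            P[c][(c + s) // 2] += Fraction(comb(k, s), 2 ** k)
    return P


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            d = -d  # a row swap flips the sign of the determinant
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for col in range(i, n):
                M[r][col] -= f * M[i][col]
    return d


def is_eigenvalue(P, lam):
    """True if det(P - lam * I) is exactly zero."""
    n = len(P)
    shifted = [[P[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return det(shifted) == 0


def spectral_gap(k):
    """Gap between the two largest eigenvalues of the k-operand carry chain."""
    if k < 2:
        raise ValueError("a single operand produces no carry chain")
    P = carry_matrix(k)
    candidates = [Fraction(1, 2 ** j) for j in range(k)]
    # k distinct roots of a degree-k characteristic polynomial: this is the full spectrum.
    assert all(is_eigenvalue(P, lam) for lam in candidates)
    return candidates[0] - candidates[1]


if __name__ == "__main__":
    for k in range(2, 9):
        print(k, spectral_gap(k))
```

Exact rational arithmetic is used here so that the eigenvalue check is an identity and not a floating-point approximation; under the stated assumption, the gap is the same at $k = 3$ and $k = 4$.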