AI · 8 min read · April 21, 2026
Chain-of-Thought Supervision Eliminates Sample Complexity Growth
New theoretical analysis shows intermediate reasoning steps remove dependence on generation length, while end-to-end learning scales unpredictably with sequence depth.
Chain-of-Thought supervision decouples sample complexity from generation length; end-to-end learning exhibits variable scaling.
- Autoregressive models learn by iterating a next-token generator T times; the final token is the output.
- End-to-End supervision reveals only the final output; Chain-of-Thought supervision reveals all intermediate tokens.
- End-to-End sample complexity can scale anywhere from constant to linear in the generation length T.
- Chain-of-Thought supervision makes sample complexity entirely independent of T.
- Access to intermediate reasoning steps eliminates the penalty of longer generation chains.
- The analysis resolves open questions about how generation length affects learnability.
- New combinatorial tools are introduced to characterize this taxonomy of scaling behaviors.
- The result applies to a PAC-learning framework for next-token prediction systems.
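The setup behind these results can be made concrete with a short sketch. The code below is a hypothetical illustration, not the paper's formalism: a next-token generator `f` is iterated T times, and the two supervision regimes differ only in which tokens of the resulting trace the learner gets to see.

```python
def generate(f, x, T):
    """Iterate next-token generator f for T steps starting from input x."""
    trace = [x]
    for _ in range(T):
        trace.append(f(trace[-1]))
    return trace

def end_to_end_label(trace):
    # End-to-End supervision: only the final token is observed.
    return trace[-1]

def chain_of_thought_labels(trace):
    # Chain-of-Thought supervision: all T generated tokens are observed.
    return trace[1:]

# Toy generator standing in for one reasoning step: increment mod 10.
step = lambda t: (t + 1) % 10

trace = generate(step, x=3, T=5)
print(end_to_end_label(trace))         # 8
print(chain_of_thought_labels(trace))  # [4, 5, 6, 7, 8]
```

The intuition for the headline result: with Chain-of-Thought labels the learner faces T independent one-step prediction problems, so the data requirement does not grow with T; with only the End-to-End label, errors at any of the T steps are conflated into a single observed outcome.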
Frequently asked
- How do End-to-End and Chain-of-Thought supervision differ? End-to-End supervision provides only the final output token after a model generates T intermediate tokens, while Chain-of-Thought supervision reveals all T intermediate tokens produced during generation. The paper shows that access to intermediate tokens eliminates the sample-complexity penalty of longer generation chains, whereas end-to-end learning may require more data as chains grow.