AI · 8 min read · April 21, 2026
Chain-of-Thought Supervision Eliminates Sample Complexity Growth
New theoretical analysis shows intermediate reasoning steps remove dependence on generation length, while end-to-end learning scales unpredictably with sequence depth.
Chain-of-Thought supervision decouples sample complexity from generation length; end-to-end learning exhibits variable scaling.
- Autoregressive models learn by iterating a next-token generator T times; the final token is the output.
- End-to-End supervision reveals only the final output; Chain-of-Thought supervision reveals all intermediate tokens.
- End-to-End sample complexity can scale anywhere from constant to linear in the generation length T.
- Chain-of-Thought supervision makes sample complexity entirely independent of T.
- Access to intermediate reasoning steps eliminates the penalty of longer generation chains.
- The analysis resolves open questions about how generation length affects learnability.
- New combinatorial tools are introduced to characterize this taxonomy of scaling behaviors.
- The result applies to a PAC-learning framework for next-token prediction systems.
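The setup behind these results can be made concrete with a short sketch. The code below is a hypothetical illustration, not the paper's formalism: a next-token generator `f` is iterated T times, and the two supervision regimes differ only in which tokens of the resulting trace the learner gets to see.

```python
def generate(f, x, T):
    """Iterate next-token generator f for T steps starting from input x."""
    trace = [x]
    for _ in range(T):
        trace.append(f(trace[-1]))
    return trace

def end_to_end_label(trace):
    # End-to-End supervision: only the final token is observed.
    return trace[-1]

def chain_of_thought_labels(trace):
    # Chain-of-Thought supervision: all T generated tokens are observed.
    return trace[1:]

# Toy generator standing in for one reasoning step: increment mod 10.
step = lambda t: (t + 1) % 10

trace = generate(step, x=3, T=5)
print(end_to_end_label(trace))         # 8
print(chain_of_thought_labels(trace))  # [4, 5, 6, 7, 8]
```

The intuition for the headline result: with Chain-of-Thought labels the learner faces T independent one-step prediction problems, so the data requirement does not grow with T; with only the End-to-End label, errors at any of the T steps are conflated into a single observed outcome.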
Frequently asked
- How do End-to-End and Chain-of-Thought supervision differ? End-to-End supervision provides only the final output token after a model generates T intermediate tokens, while Chain-of-Thought supervision reveals all T intermediate tokens produced during generation. The paper shows that access to intermediate tokens eliminates the sample-complexity penalty of longer generation chains, whereas end-to-end learning may require more data as chains grow.