AI · 8 min read · April 21, 2026

Chain-of-Thought Supervision Eliminates Sample Complexity Growth

New theoretical analysis shows intermediate reasoning steps remove dependence on generation length, while end-to-end learning scales unpredictably with sequence depth.

Source: arxiv/cs.LG · Steve Hanneke, Idan Mehalel, Shay Moran

Chain-of-Thought supervision decouples sample complexity from generation length; under end-to-end learning, sample complexity can scale anywhere from constant to linear in the number of generation steps T.

  • Autoregressive models learn by iterating a next-token generator T times; the final token is the output.
  • End-to-End supervision reveals only final outputs; Chain-of-Thought reveals all intermediate tokens.
  • End-to-End sample complexity can scale anywhere from constant to linear with generation length T.
  • Chain-of-Thought supervision makes sample complexity independent of T entirely.
  • Intermediate reasoning access eliminates the penalty of longer generation chains.
  • Analysis resolves open questions about how generation length affects learnability.
  • New combinatorial tools introduced to characterize this taxonomy of scaling behaviors.
  • Result applies to PAC-learning framework for next-token prediction systems.
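The gap between the two supervision modes can be made concrete with a toy experiment. The sketch below is an illustration under assumptions of our own (a three-symbol alphabet, T = 4 steps, and a hypothesis class of all next-token maps), not the paper's construction: it counts how many hypotheses remain consistent with a single training trace under each form of supervision. Chain-of-Thought labels reveal every (token, next-token) transition, pinning down the generator, while the end-to-end label constrains only the composed map.

```python
from itertools import product

ALPHABET = [0, 1, 2]  # toy token alphabet (illustrative assumption)
T = 4                 # number of generation steps (illustrative assumption)

def run(step, x, steps):
    """Iterate a next-token map; the final token of the trace is the output."""
    trace = [x]
    for _ in range(steps):
        trace.append(step[trace[-1]])
    return trace

# Ground-truth next-token generator, hidden from the learner.
true_step = {0: 1, 1: 2, 2: 0}
trace = run(true_step, 0, T)  # the full generation: [0, 1, 2, 0, 1]

# Chain-of-Thought supervision: all intermediate tokens are revealed,
# so each adjacent pair in the trace constrains the generator directly.
cot_constraints = set(zip(trace, trace[1:]))

# End-to-End supervision: only (input, final output token) is revealed.
e2e_constraint = (trace[0], trace[-1])

# Hypothesis class: every map ALPHABET -> ALPHABET (27 candidates here).
hypotheses = [dict(zip(ALPHABET, vals))
              for vals in product(ALPHABET, repeat=len(ALPHABET))]

cot_consistent = [h for h in hypotheses
                  if all(h[a] == b for a, b in cot_constraints)]
e2e_consistent = [h for h in hypotheses
                  if run(h, e2e_constraint[0], T)[-1] == e2e_constraint[1]]

print(len(cot_consistent), len(e2e_consistent))  # 1 6
```

In this run a single Chain-of-Thought trace happens to visit all three tokens and leaves exactly one consistent hypothesis, while the end-to-end label leaves six, so more examples are needed to disambiguate. The toy numbers are not the paper's bounds; they only illustrate why revealing intermediate tokens removes the dependence on T.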

Frequently asked

  • What is the difference between End-to-End and Chain-of-Thought supervision? End-to-End supervision provides only the final output token after a model generates T intermediate tokens. Chain-of-Thought supervision reveals all T intermediate tokens produced during generation. This paper shows that access to intermediate tokens eliminates the penalty of longer generation chains on sample complexity, while end-to-end learning may require more data as chains grow.
