Artificial Intelligence · 4 min read · April 20, 2026
Interpretable Traces Don't Guarantee Better LLM Reasoning
Research shows Chain-of-Thought traces improve model performance but confuse users, and correctness of intermediate steps barely predicts final accuracy.
Correct reasoning traces don't reliably improve LLM accuracy, and verbose traces confuse users despite boosting model performance.
- Correct intermediate reasoning steps predicted correct final answers only 28% of the time in controlled QA experiments (see the sketch after this list).
- Incorrect traces failed to consistently degrade model accuracy, suggesting trace semantics matter less than assumed.
- Verbose DeepSeek R1 traces yielded the best model performance but scored lowest on user interpretability (3.39/5).
- Decomposed, human-readable traces were rated more interpretable but did not match the performance gains of verbose traces.
- High cognitive load (4.59/5) accompanied verbose traces despite their training effectiveness.
- Current practice conflates model supervision objectives with end-user-facing explanation design.
- Trace correctness and interpretability operate as separate, sometimes opposing, optimization targets.
- Fine-tuning datasets with verifiably correct versus incorrect traces revealed the disconnect empirically.
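As a rough illustration of how the trace-to-answer coupling behind the 28% figure can be measured, here is a minimal Python sketch that estimates P(final answer correct | intermediate trace correct) over a labeled evaluation set. The `QARecord` schema, its field names, and the toy data are assumptions for illustration only, not the study's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class QARecord:
    """One evaluated QA example (hypothetical schema, not from the paper)."""
    trace_correct: bool   # intermediate reasoning steps verified as correct
    answer_correct: bool  # final answer matches the gold label

def conditional_accuracy(records: list[QARecord]) -> float:
    """Estimate P(final answer correct | trace correct)."""
    with_correct_trace = [r for r in records if r.trace_correct]
    if not with_correct_trace:
        return float("nan")
    return sum(r.answer_correct for r in with_correct_trace) / len(with_correct_trace)

# Toy data: 1 of 3 trace-correct examples has a correct final answer (0.33 here);
# the study reports roughly 0.28 over its full QA evaluation.
sample = [
    QARecord(trace_correct=True, answer_correct=True),
    QARecord(trace_correct=True, answer_correct=False),
    QARecord(trace_correct=True, answer_correct=False),
    QARecord(trace_correct=False, answer_correct=True),
]
print(f"P(answer correct | trace correct) = {conditional_accuracy(sample):.2f}")
```

A value well below 1.0 on such a metric is what the first bullet refers to: a verified-correct trace does not guarantee, or even strongly predict, a correct final answer.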
Frequently asked questions
- Does a correct reasoning trace guarantee a correct final answer? No. In the study, correct intermediate reasoning steps led to correct final answers only 28% of the time. This suggests that LLMs may rely on patterns or memorization rather than genuinely following the reasoning steps. Trace correctness and final accuracy are weakly coupled.