Artificial Intelligence · 4 min read · April 20, 2026
Interpretable Traces Don't Guarantee Better LLM Reasoning
Research shows Chain-of-Thought traces improve model performance but confuse users, and correctness of intermediate steps barely predicts final accuracy.
Correct reasoning traces don't reliably improve LLM accuracy, and verbose traces confuse users despite boosting model performance.
- Correct intermediate reasoning steps predicted correct final answers only 28% of the time in controlled QA experiments (see the sketch after this list).
- Incorrect traces failed to consistently degrade model accuracy, suggesting trace semantics matter less than assumed.
- Verbose DeepSeek R1 traces yielded the best model performance but scored lowest on user interpretability (3.39/5).
- Decomposed, human-readable traces were rated more interpretable but did not match the performance gains of verbose traces.
- High cognitive load (4.59/5) accompanied verbose traces despite their training effectiveness.
- Current practice conflates model supervision objectives with end-user-facing explanation design.
- Trace correctness and interpretability operate as separate, sometimes opposing, optimization targets.
- Fine-tuning datasets with verifiably correct versus incorrect traces revealed the disconnect empirically.
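As a rough illustration of how the trace-to-answer coupling behind the 28% figure can be measured, here is a minimal Python sketch that estimates P(final answer correct | intermediate trace correct) over a labeled evaluation set. The `QARecord` schema, its field names, and the toy data are assumptions for illustration only, not the study's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class QARecord:
    """One evaluated QA example (hypothetical schema, not from the paper)."""
    trace_correct: bool   # intermediate reasoning steps verified as correct
    answer_correct: bool  # final answer matches the gold label

def conditional_accuracy(records: list[QARecord]) -> float:
    """Estimate P(final answer correct | trace correct)."""
    with_correct_trace = [r for r in records if r.trace_correct]
    if not with_correct_trace:
        return float("nan")
    return sum(r.answer_correct for r in with_correct_trace) / len(with_correct_trace)

# Toy data: 1 of 3 trace-correct examples has a correct final answer (0.33 here);
# the study reports roughly 0.28 over its full QA evaluation.
sample = [
    QARecord(trace_correct=True, answer_correct=True),
    QARecord(trace_correct=True, answer_correct=False),
    QARecord(trace_correct=True, answer_correct=False),
    QARecord(trace_correct=False, answer_correct=True),
]
print(f"P(answer correct | trace correct) = {conditional_accuracy(sample):.2f}")
```

A value well below 1.0 on such a metric is what the first bullet refers to: a verified-correct trace does not guarantee, or even strongly predict, a correct final answer.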
Frequently asked questions
- Does a correct reasoning trace guarantee a correct final answer? No. In the study, correct intermediate reasoning steps led to correct final answers only 28% of the time. This suggests that LLMs may rely on patterns or memorization rather than genuinely following the reasoning steps. Trace correctness and final accuracy are weakly coupled.