- AI · arxiv/cs.AI · 5 min
Transformers learn graph connectivity selectively, not universally
New research shows transformers can infer transitive relations on grid-structured graphs but fail on fragmented ones, with scaling helping only certain architectures.
April 23, 2026
- AI · arxiv/cs.LG · 8 min
Chain-of-Thought Supervision Eliminates Sample Complexity Growth
New theoretical analysis shows intermediate reasoning steps remove dependence on generation length, while end-to-end learning scales unpredictably with sequence depth.
April 21, 2026
- AI · arxiv/cs.AI · 4 min
Interpretable Traces Don't Guarantee Better LLM Reasoning
Research shows Chain-of-Thought traces improve model performance but confuse users, and correctness of intermediate steps barely predicts final accuracy.
April 20, 2026
- AI · arxiv/cs.AI · 4 min
MERRIN: Benchmark for Multimodal Search in Noisy Web Data
New benchmark reveals AI agents struggle with real-world web search, achieving only 22% accuracy when retrieving and reasoning across mixed media sources.
April 17, 2026
- AI · arxiv/cs.AI · 8 min
LLMs hit formal reasoning ceiling; Chomsky Hierarchy reveals efficiency gap
New benchmark shows large language models struggle with structured complexity tasks and require prohibitive compute to achieve reliability in formal reasoning.
April 17, 2026