- AI · arxiv/cs.LG · 8 min
Model Architecture Controls Whether Errors Stay Hidden
Transformer design determines whether internal decision signals remain observable after training, independent of output confidence metrics.
April 29, 2026

- AI · arxiv/cs.LG · 4 min
LLMs Use Hidden Confidence Signals to Detect and Fix Their Own Errors
Research shows large language models maintain a second-order evaluative signal that predicts error detection and self-correction beyond what their output probabilities reveal.
April 27, 2026
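
Such a second-order signal is usually measured by training a lightweight probe on the model's hidden states to predict answer correctness, then comparing it against the output probabilities alone. A minimal sketch of that comparison on synthetic stand-in data (the array names, the linear-probe choice, and the random data are illustrative assumptions, not the paper's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for real data: hidden states from one layer, the model's max
# output probability per answer, and whether the answer was correct.
n, d = 2000, 256
hidden = rng.normal(size=(n, d))
correct = (hidden[:, 0] + 0.5 * rng.normal(size=n)) > 0
max_prob = 1.0 / (1.0 + np.exp(-rng.normal(size=n)))

X_tr, X_te, y_tr, y_te, _, p_te = train_test_split(
    hidden, correct, max_prob, test_size=0.5, random_state=0)

# Linear probe on hidden states: does the network "know" when it is wrong?
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probe_auc = roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])

# Baseline: the model's own output confidence.
baseline_auc = roc_auc_score(y_te, p_te)

print(f"probe AUC   = {probe_auc:.3f}")
print(f"softmax AUC = {baseline_auc:.3f}")
```

A probe AUC above the softmax baseline is the kind of evidence the paper points to: an internal confidence signal beyond what the output distribution reveals.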

- AI · arxiv/cs.AI · 4 min
Cross-Entropy Loss Drives Neural Probe Performance, Not Architecture
Pre-registered study shows cross-entropy training inflates logit norms 15x, accounting for most K-way energy probe gains over softmax baselines.
April 24, 2026
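
The claimed mechanism is easy to reproduce in isolation: the energy score E(z) = -log Σ_k exp(z_k) grows with logit norm, while max-softmax saturates, so uniformly inflating logits widens the energy gap between confident and uncertain inputs without adding information. A toy demonstration, assuming the standard energy score; the two synthetic populations are made up, and the 15x factor is taken from the summary above:

```python
import numpy as np

def energy(z):
    # Energy score: E(z) = -logsumexp(z). Lower energy = more "confident".
    return -np.log(np.sum(np.exp(z), axis=-1))

def max_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)

rng = np.random.default_rng(0)
K = 10
# Two toy populations with the same structure but different separability.
confident = rng.normal(2.0, 0.5, size=(1000, K)); confident[:, 0] += 3.0
uncertain = rng.normal(0.0, 0.5, size=(1000, K))

for scale in (1.0, 15.0):  # 1x vs the ~15x norm inflation attributed to CE
    e_gap = energy(scale * uncertain).mean() - energy(scale * confident).mean()
    s_gap = (max_softmax(scale * confident).mean()
             - max_softmax(scale * uncertain).mean())
    print(f"scale {scale:>4}: energy gap = {e_gap:7.2f}, "
          f"max-softmax gap = {s_gap:.3f}")
```

At 15x scale the energy gap balloons while the max-softmax gap collapses toward saturation, which is the norm-inflation effect the study isolates.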

- AI · arxiv/cs.LG · 8 min
Simple Graph Models Match Deep Learning for Molecular Prediction
Classical topological indices enhanced with regularization and ensemble methods outperform neural networks on molecular property benchmarks without requiring GPUs.
April 23, 2026
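
The pipeline shape is straightforward: compute a handful of classical indices per molecular graph, then fit a regularized linear model or tree ensemble on them. A sketch of that shape using networkx and scikit-learn; the random graphs, the toy target, and the particular indices are illustrative assumptions, not the paper's exact feature set:

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def topological_features(G):
    # Classical indices: Wiener (sum of all shortest-path distances) and
    # Randic (sum over edges of 1/sqrt(deg(u)*deg(v))), plus basic counts.
    randic = sum(1.0 / np.sqrt(G.degree[u] * G.degree[v]) for u, v in G.edges)
    return [nx.wiener_index(G), randic,
            G.number_of_nodes(), G.number_of_edges()]

# Stand-in "molecules": random connected graphs with a synthetic target.
# A real benchmark would build graphs from SMILES strings instead.
rng = np.random.default_rng(0)
sizes = rng.integers(5, 30, size=200)
graphs = [nx.barabasi_albert_graph(int(n), 2, seed=int(i))
          for i, n in enumerate(sizes)]
X = np.array([topological_features(G) for G in graphs])
y = np.sqrt(X[:, 0]) + rng.normal(0.0, 1.0, size=len(X))  # toy property

# A regularized linear model and a tree ensemble over the same indices.
for name, model in [("ridge ", Ridge(alpha=1.0)),
                    ("forest", RandomForestRegressor(n_estimators=200,
                                                     random_state=0))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```

Everything above runs in seconds on a CPU, which is the practical point of the comparison.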

- AI · arxiv/cs.LG · 8 min
Concept Bottleneck Models Hit Hard Ceiling in Dermoscopy Data
Rough-set analysis reveals 16% of concept profiles in Derm7pt are internally inconsistent, capping model accuracy at 92% regardless of architecture.
April 22, 2026
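
The ceiling argument is simple: if two images share an identical concept profile but carry different diagnoses, no classifier that sees only the concepts can get both right. A sketch of the rough-set consistency check and the implied accuracy bound, on a made-up concept table rather than the actual Derm7pt annotations:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

# Stand-in for Derm7pt: n samples, 7 discrete concept annotations each,
# plus a diagnosis label. Real profiles come from clinical annotations.
n = 5000
concepts = rng.integers(0, 3, size=(n, 7))  # 7-point-checklist-style codes
labels = rng.integers(0, 2, size=n)

groups = defaultdict(list)
for profile, label in zip(map(tuple, concepts), labels):
    groups[profile].append(label)

# A profile is inconsistent (rough-set boundary region) when identical
# concept vectors map to more than one diagnosis.
inconsistent = sum(len(set(ls)) > 1 for ls in groups.values())
print(f"inconsistent profiles: {inconsistent / len(groups):.1%}")

# Best possible accuracy for ANY model seeing only the concepts: predict
# the majority label within each profile. This is the hard ceiling.
ceiling = sum(max(np.bincount(ls)) for ls in groups.values()) / n
print(f"accuracy ceiling: {ceiling:.1%}")
```

On the paper's numbers, the same computation over Derm7pt yields 16% inconsistent profiles and a 92% ceiling, independent of model architecture.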

- AI · arxiv/cs.AI · 4 min
Interpretable Traces Don't Guarantee Better LLM Reasoning
Research shows Chain-of-Thought traces improve model performance but confuse users, and that the correctness of intermediate steps barely predicts final accuracy.
April 20, 2026
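
One way the "barely predicts" claim gets quantified: score how well per-trace step validity discriminates correct from incorrect final answers, e.g. with AUC. A toy sketch on synthetic numbers; the variable names and the weak coupling are illustrative assumptions, not the study's data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-ins: for each CoT trace, the fraction of intermediate steps a
# grader marked valid, and whether the final answer was correct. Real
# data would come from human or model-based step annotation.
n = 1000
step_validity = rng.uniform(0, 1, size=n)
final_correct = rng.random(n) < (0.55 + 0.1 * step_validity)

# If step correctness strongly predicted outcomes, this AUC would sit
# well above 0.5; a value near chance mirrors the paper's finding.
auc = roc_auc_score(final_correct, step_validity)
print(f"AUC(step validity -> final correctness) = {auc:.3f}")
```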