AI · 8 min read · April 24, 2026
Supervised Learning Has Built-In Geometric Blindness
A mathematical proof shows that empirical risk minimization must preserve sensitivity to label-correlated but test-irrelevant features: a structural constraint of the supervised objective, not a training bug.
- Empirical risk minimization (ERM) necessarily induces Jacobian sensitivity in directions that correlate with training labels but are irrelevant at test time.
- This single constraint unifies four empirical phenomena usually treated separately: non-robust features, texture bias, corruption fragility, and the robustness-accuracy tradeoff.
- The Trajectory Deviation Index (TDI) measures this blind spot directly; standard metrics such as the Jacobian Frobenius norm miss it.
- PGD adversarial training achieves high Jacobian magnitude but poor clean-input geometry (TDI 1.336, vs. 0.904 for PMH).
- The blind spot worsens in larger language models (ratio 0.860 → 0.742 from 66M to 340M parameters).
- Task-specific ERM fine-tuning amplifies the blind spot by 54%; PMH repairs it 11x with a single Gaussian-form training term.
- The defect appears at foundation-model scale across vision, NLP, and multimodal architectures (CLIP, DINO, SAM, ViT-B/16).
- Proposition 5 proves the repair term is the unique perturbation law that uniformly penalizes the encoder Jacobian.
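The article does not reproduce PMH's exact training term, but the "Gaussian-form perturbation law that uniformly penalizes the encoder Jacobian" claim rests on a standard identity: for Gaussian perturbations ε ~ N(0, I), the expected squared response E‖f(x + σε) − f(x)‖²/σ² approaches the squared Frobenius norm of the encoder's Jacobian as σ → 0, weighting every direction equally. A minimal numpy sketch with a toy linear encoder (the encoder `f` and all sizes below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))        # toy linear encoder: f(x) = x @ A.T

def f(x):
    # Works on a single input row or a batch of rows.
    return x @ A.T

x = rng.normal(size=6)
sigma, n = 1e-3, 20_000
eps = rng.normal(size=(n, 6))      # Gaussian-form perturbations

# Monte Carlo estimate of E || f(x + sigma*eps) - f(x) ||^2 / sigma^2.
est = np.mean(np.sum((f(x + sigma * eps) - f(x)) ** 2, axis=1)) / sigma**2

# For this linear f, the Jacobian is A everywhere, so the penalty should
# track ||A||_F^2 -- every Jacobian direction is penalized uniformly.
frob2 = np.sum(A ** 2)
assert abs(est - frob2) / frob2 < 0.05
```

Used as a training regularizer, such a term penalizes sensitivity in all input directions at once, including the label-correlated but test-irrelevant ones that, per the bullets above, standard objectives leave untouched.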
Frequently asked
- Why does the blind spot exist? It is a mathematical necessity of empirical risk minimization: any encoder trained to minimize a supervised loss must retain non-zero sensitivity (a non-zero Jacobian component) in directions that correlate with training labels but are irrelevant at test time. This is not a bug in current methods but a structural property of the supervised objective itself, proven in Theorem 1.
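The flavor of that necessity can be seen in a toy example (an illustration under simplifying assumptions, not the paper's Theorem 1): when two noisy features both correlate with the training label, the risk-minimizing linear predictor provably assigns weight to both, so its Jacobian stays sensitive to the spurious one even if that feature is decorrelated from the label at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
y = rng.normal(size=n)                    # latent label
x_core = y + 0.5 * rng.normal(size=n)     # causal feature, still valid at test
x_spur = y + 0.5 * rng.normal(size=n)     # correlates with y only in training

X = np.stack([x_core, x_spur], axis=1)
w, *_ = np.linalg.lstsq(X, y, rcond=None) # ERM: ordinary least squares

# The Jacobian of the linear predictor is just w. ERM assigns substantial
# weight (~0.44 in expectation here) to the spurious feature because doing so
# lowers training risk -- sensitivity it cannot shed without raising that risk.
assert abs(w[1]) > 0.3
```

Dropping the spurious weight to zero would strictly increase training risk, which is why this sensitivity is a structural consequence of the objective rather than an artifact of any particular optimizer or architecture.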