Artificial Intelligence · 8 min read · 17 April 2026
INT4 Quantization Fails After FP32 Convergence in Predictable Phases
Post-training quantization assumes converged models are ready to compress, but INT4 quantization collapses in a three-phase pattern tied to weight updates, not learning rate decay.
INT4 quantization fails after FP32 convergence in three phases: improvement, plateau, then explosive divergence caused by post-convergence weight updates.
- Three-phase divergence: rapid learning, a metastable plateau lasting ~70k steps, then explosive growth of the INT4 gap (from 11% to 517%).
- Divergence onset correlates with FP32 perplexity convergence, not with the learning-rate decay schedule.
- INT8 quantization remains robust across all phases; the failure is specific to the coarseness of INT4's 16-level grid.
- Weight-outlier accumulation is ruled out via kurtosis measurement; the mechanism lies elsewhere in the weight-distribution shift.
- An Oscillatory Lock-In schedule reduces the INT4 gap by 2.2 percentage points; SGDR accelerates divergence uniformly.
- The study audits all 154 public Pythia-160m checkpoints with a calibration-free per-group INT4 probe.
- Post-convergence weight updates, not decay magnitude alone, are the proximate cause of quantization collapse.
- Schedule amplitude calibration determines whether perturbation helps or hurts quantization robustness.
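The calibration-free per-group INT4 probe mentioned above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: it assumes symmetric round-to-nearest quantization with a per-group scale taken from each group's own max-abs weight (hence no calibration data), and a hypothetical group size of 128.

```python
import numpy as np

def quantize_dequantize(w, bits=4, group_size=128):
    """Symmetric per-group round-to-nearest quantization probe.

    Calibration-free: each group's scale comes from its own max-abs
    weight, so no calibration dataset is required. Assumes w.size is
    divisible by group_size.
    """
    levels = 2 ** (bits - 1) - 1                 # 7 for INT4, 127 for INT8
    flat = w.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / levels
    scale = np.where(scale == 0, 1.0, scale)     # guard all-zero groups
    q = np.clip(np.round(flat / scale), -levels, levels)
    return (q * scale).reshape(w.shape)

# Reconstruction error on synthetic Gaussian weights: INT4's coarse
# grid produces a visibly larger gap than INT8's.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
err4 = np.abs(w - quantize_dequantize(w, bits=4)).mean()
err8 = np.abs(w - quantize_dequantize(w, bits=8)).mean()
```

Running such a probe against each of the 154 checkpoints and comparing quantized vs. FP32 perplexity is what yields the per-phase INT4 gap; the sketch here only measures weight reconstruction error, which is the simplest proxy.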
Frequently asked questions
- Why does INT4 collapse after convergence while INT8 does not? Post-convergence weight updates shift the weight distribution in ways that exceed the resolution of INT4's 16-level quantization grid. The divergence is not caused by learning-rate decay magnitude alone, but by the specific pattern of weight changes after FP32 perplexity stops improving. INT8's finer grid (256 levels) remains robust, suggesting the failure is tied to INT4's coarseness.
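The outlier-accumulation hypothesis was ruled out via kurtosis measurement. A minimal sketch of that diagnostic, under the assumption that excess kurtosis of the flattened weight tensor is the statistic tracked across checkpoints (near 0 for Gaussian-like weights, large and positive for outlier-heavy tails):

```python
import numpy as np

def excess_kurtosis(w):
    """Excess kurtosis of a weight tensor (~0 for Gaussian weights).

    Outlier accumulation would drive this upward across checkpoints;
    a flat trajectory argues against the outlier hypothesis.
    """
    x = np.asarray(w, dtype=np.float64).ravel()
    x = x - x.mean()
    m2 = np.mean(x ** 2)
    m4 = np.mean(x ** 4)
    return m4 / m2 ** 2 - 3.0

# Sanity check on synthetic data: Gaussian vs. heavy-tailed samples.
rng = np.random.default_rng(0)
gauss = rng.normal(size=100_000)
heavy = rng.standard_t(df=5, size=100_000)  # heavy-tailed stand-in for outlier-rich weights
```

If this statistic stays flat while the INT4 gap explodes, the distribution shift must involve something subtler than growing tails, e.g. movement of mass relative to the fixed quantization grid.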