AI · 8 min read · 22 April 2026

Dataset Distillation Fails Without Hard Labels

Soft labels mask poor dataset quality in distillation methods, making random subsets nearly as effective as curated ones.

Source: arxiv/cs.LG · Priyam Dey, Aditya Sahdev, Sunny Bhati, Konda Reddy Mopuri, R. Venkatesh Babu

Soft labels during training hide dataset quality differences, making dataset distillation methods perform no better than random sampling.

  • State-of-the-art distillation methods match random baselines when soft labels are used for downstream training.
  • Hard-label evaluation reveals distillation quality; soft labels obscure performance gaps between curated and random data.
  • In soft-label regimes, subset quality has negligible effect on final model performance regardless of size.
  • Only RDED reliably outperforms random baselines on ImageNet-1K under hard-label conditions.
  • CAD-Prune metric identifies samples of optimal difficulty for a given compute budget.
  • The CA2D method combines compute-aware pruning with distillation to outperform existing approaches on large-scale benchmarks.
  • Coreset literature findings do not transfer to soft-label settings used in modern distillation research.
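The compute-aware selection idea behind CAD-Prune can be sketched in a few lines. The paper's exact metric is not reproduced here; the function below is a hypothetical illustration that assumes per-sample difficulty scores are available (e.g., teacher loss) and that larger compute budgets can exploit harder samples, so the target difficulty rises with the budget fraction.

```python
import numpy as np

def select_by_difficulty(difficulty, budget, total_budget):
    """Hypothetical sketch of compute-aware pruning (not the paper's
    exact CAD-Prune metric).

    difficulty   : per-sample difficulty scores, shape (n,)
    budget       : number of samples to keep
    total_budget : reference budget used to scale target difficulty
    """
    # Target difficulty grows with the fraction of compute available.
    target = np.quantile(difficulty, budget / total_budget)
    # Keep the samples whose difficulty is closest to that target.
    order = np.argsort(np.abs(difficulty - target))
    return np.sort(order[:budget])

# Toy usage: 100 samples with evenly spaced difficulties, keep 10.
difficulty = np.arange(100) / 100.0
subset = select_by_difficulty(difficulty, budget=10, total_budget=100)
```

With a small budget, the selection clusters around the easier end of the difficulty range; as the budget grows, the selected band shifts toward harder samples.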

Frequently asked questions

  • When soft labels from a teacher model are used during training, the knowledge distillation loss regularizes the model and reduces its sensitivity to which samples are selected. This masks poor sample quality. Hard-label evaluation exposes this: distilled subsets often perform no better than random subsets under hard labels, revealing that the method did not actually select better data.
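The two training objectives contrasted above can be sketched directly. The code below is a minimal illustration, not the paper's implementation: hard-label training uses cross-entropy against one-hot ground truth, while soft-label training uses the standard temperature-scaled KL divergence against a teacher's softened outputs, whose smoothing over classes is what dampens sensitivity to which samples were selected.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def hard_label_loss(logits, labels):
    # Cross-entropy against one-hot ground-truth labels.
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def soft_label_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as in standard knowledge distillation.
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl) * T * T)
```

Because the teacher's softened distribution spreads probability mass across classes, the soft-label gradient carries teacher knowledge for every sample, which is what lets even randomly chosen subsets train well and hides differences in subset quality.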

Related