AI · 8 min read · 22 April 2026

Dataset Distillation Fails Without Hard Labels

Soft labels mask poor dataset quality in distillation methods, making random subsets nearly as effective as curated ones.

Source: arxiv/cs.LG · Priyam Dey, Aditya Sahdev, Sunny Bhati, Konda Reddy Mopuri, R. Venkatesh Babu

Soft labels during training hide dataset quality differences, making dataset distillation methods perform no better than random sampling.

  • State-of-the-art distillation methods match random baselines when soft labels are used for downstream training.
  • Hard-label evaluation reveals distillation quality; soft labels obscure performance gaps between curated and random data.
  • In soft-label regimes, subset quality has negligible effect on final model performance regardless of size.
  • Only RDED reliably outperforms random baselines on ImageNet-1K under hard-label conditions.
  • CAD-Prune metric identifies samples of optimal difficulty for a given compute budget.
  • The CA2D method combines compute-aware pruning with distillation to outperform existing approaches on large-scale benchmarks.
  • Coreset literature findings do not transfer to soft-label settings used in modern distillation research.
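The compute-aware selection idea behind CAD-Prune can be sketched in a few lines. The paper's exact metric is not reproduced here; the function below is a hypothetical illustration that assumes per-sample difficulty scores are available (e.g., teacher loss) and that larger compute budgets can exploit harder samples, so the target difficulty rises with the budget fraction.

```python
import numpy as np

def select_by_difficulty(difficulty, budget, total_budget):
    """Hypothetical sketch of compute-aware pruning (not the paper's
    exact CAD-Prune metric).

    difficulty   : per-sample difficulty scores, shape (n,)
    budget       : number of samples to keep
    total_budget : reference budget used to scale target difficulty
    """
    # Target difficulty grows with the fraction of compute available.
    target = np.quantile(difficulty, budget / total_budget)
    # Keep the samples whose difficulty is closest to that target.
    order = np.argsort(np.abs(difficulty - target))
    return np.sort(order[:budget])

# Toy usage: 100 samples with evenly spaced difficulties, keep 10.
difficulty = np.arange(100) / 100.0
subset = select_by_difficulty(difficulty, budget=10, total_budget=100)
```

With a small budget, the selection clusters around the easier end of the difficulty range; as the budget grows, the selected band shifts toward harder samples.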

Frequently asked questions

  • When soft labels from a teacher model are used during training, the knowledge distillation loss regularizes the model and reduces its sensitivity to which samples are selected. This masks poor sample quality. Hard-label evaluation exposes this: distilled subsets often perform no better than random subsets under hard labels, revealing that the method did not actually select better data.
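The two training objectives contrasted above can be sketched directly. The code below is a minimal illustration, not the paper's implementation: hard-label training uses cross-entropy against one-hot ground truth, while soft-label training uses the standard temperature-scaled KL divergence against a teacher's softened outputs, whose smoothing over classes is what dampens sensitivity to which samples were selected.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def hard_label_loss(logits, labels):
    # Cross-entropy against one-hot ground-truth labels.
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def soft_label_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as in standard knowledge distillation.
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl) * T * T)
```

Because the teacher's softened distribution spreads probability mass across classes, the soft-label gradient carries teacher knowledge for every sample, which is what lets even randomly chosen subsets train well and hides differences in subset quality.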

Related