AI · 4 min read · April 21, 2026
Weak Labels Fail Across Time Even When Domain Transfer Works
A study of CRISPR experiments reveals that supervision drift, where the labeling mechanism itself shifts, causes model collapse under temporal transfer despite strong in-domain performance.
Source: arxiv/cs.LG · Mehrdad Shoeibi, Elias Hossain, Ivan Garibay, Niloofar Yousefi · open original ↗
Weak supervision works within a domain but fails over time because the labeling mechanism itself drifts, not just the data distribution.
- Supervision drift occurs when P(y|x,c) changes across contexts, distinct from standard covariate shift.
- CRISPR-Cas13d transcriptomics benchmark shows in-domain weak-label learning achieves R² ≈ 0.36, Spearman ρ ≈ 0.44.
- Cross-cell-line transfer partially succeeds (ρ ≈ 0.40), but temporal transfer collapses (R² = −0.15 to −0.32).
- Feature-label associations remain stable across cell lines but shift sharply over time.
- Feature stability diagnostics can flag non-transferability before deployment without retraining.
- Strong in-domain metrics mask downstream failure risk under temporal distribution shift.
- Externally recomputed labels and shift-score analysis confirm supervision drift, not model capacity limits.
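The feature-stability diagnostic above can be sketched in a few lines: compare each feature's correlation with the labels across two time periods and flag features whose association shifts sharply. This is a minimal illustrative sketch, not the paper's implementation; the function name, the use of Pearson correlation (the paper reports Spearman ρ), and the flagging threshold are all assumptions.

```python
# Hypothetical feature-stability diagnostic (illustrative, not the paper's code):
# flag features whose feature-label correlation shifts sharply between periods.
import numpy as np

def feature_stability(X_old, y_old, X_new, y_new, threshold=0.5):
    """Return per-feature correlations in each period and an instability flag.

    A feature is flagged unstable when its feature-label correlation moves by
    more than `threshold` between periods (e.g. a sign flip over time).
    """
    def per_feature_corr(X, y):
        # Pearson correlation of each column of X with y
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        num = Xc.T @ yc
        den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        return num / den

    r_old = per_feature_corr(X_old, y_old)
    r_new = per_feature_corr(X_new, y_new)
    unstable = np.abs(r_old - r_new) > threshold
    return r_old, r_new, unstable

# Toy data: feature 0 keeps its association, feature 1 flips sign over time.
rng = np.random.default_rng(0)
X_old = rng.normal(size=(500, 2))
y_old = X_old[:, 0] + X_old[:, 1] + 0.1 * rng.normal(size=500)
X_new = rng.normal(size=(500, 2))
y_new = X_new[:, 0] - X_new[:, 1] + 0.1 * rng.normal(size=500)  # labeling rule drifted

r_old, r_new, unstable = feature_stability(X_old, y_old, X_new, y_new)
print(unstable)  # only feature 1 is flagged
```

Because the diagnostic uses only feature-label correlations, it can run on newly labeled data before any retraining, which matches the paper's claim that non-transferability can be flagged ahead of deployment.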
Frequently asked
- How does supervision drift differ from covariate shift? Supervision drift occurs when the relationship between features and labels—the labeling mechanism itself—changes across contexts, such as over time. Covariate shift refers only to changes in the feature distribution P(x). A model can handle covariate shift yet fail under supervision drift, because the target function P(y|x,c) is no longer stable and historical labels become unreliable guides for new data.
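The distinction can be made concrete with a toy simulation (my own illustrative construction, not from the paper): a linear model fit on old data still scores well when only P(x) moves, but its R² turns strongly negative when the labeling rule itself flips, mirroring the temporal collapse reported in the study.

```python
# Toy contrast between covariate shift and supervision drift (assumed setup):
# a model trained on y = 2x survives a shift in P(x) but collapses when the
# labeling rule changes to y = -2x.
import numpy as np

rng = np.random.default_rng(1)

def fit(X, y):
    # ordinary least squares with an intercept column
    A = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def r2(X, y, w):
    pred = np.c_[X, np.ones(len(X))] @ w
    ss_res = ((y - pred) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Training data: labeling rule is y = 2x + noise
X_train = rng.normal(0, 1, size=(1000, 1))
y_train = 2 * X_train[:, 0] + 0.1 * rng.normal(size=1000)
w = fit(X_train, y_train)

# Covariate shift: P(x) moves, but the labeling rule is unchanged
X_cov = rng.normal(3, 2, size=(1000, 1))
y_cov = 2 * X_cov[:, 0] + 0.1 * rng.normal(size=1000)

# Supervision drift: same P(x), but the labeling rule has flipped
X_drift = rng.normal(0, 1, size=(1000, 1))
y_drift = -2 * X_drift[:, 0] + 0.1 * rng.normal(size=1000)

print(round(r2(X_cov, y_cov, w), 2))    # near 1.0: transfer succeeds
print(round(r2(X_drift, y_drift, w), 2))  # strongly negative: collapse
```

A negative R² here means the stale model predicts worse than simply guessing the mean of the new labels, which is exactly the failure mode the paper observes in temporal transfer.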