AI · 8 min read · April 27, 2026
Poisoned Pretraining: Hidden Attacks Embedded in LLM Training Data
Researchers demonstrate how adversaries can plant dormant malicious logic in large language models by seeding poisoned content across obscure websites, evading detection until triggered.
Source: arxiv/cs.AI · Harsh Kumar, Rahul Maity, Tanmay Joshi, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das · open original ↗ ↗
Adversaries can inject tiny poisoned payloads into web-crawled pretraining data, creating latent vulnerabilities that activate only when triggered by specific prompts.
- — Stealth Pretraining Seeding distributes small malicious content across obscure websites exposed to web crawlers.
- — Individual payloads remain small, diffuse, and superficially harmless to evade dataset filtering.
- — Poisoned logic lies dormant during standard evaluation but activates via precise alphanumeric triggers.
- — PermaFrost-Attack framework uses geometric diagnostics to detect latent model behavior and vulnerabilities.
- — Attack succeeds across multiple model families and scales while often bypassing alignment defenses.
- — Threat exploits the dependency of LLMs on web-scale pretraining from sources like Common Crawl.
- — Standard safety evaluations miss dormant threats because they do not probe for hidden conditional logic.
Frequently asked
- Standard poisoning injects obvious malicious examples into training data. SPS distributes tiny, benign-looking payloads across many obscure websites, relying on web crawlers to absorb them into pretraining corpora. Each payload is individually undetectable, but collectively they embed dormant logic that activates only when triggered by specific prompts. This makes the attack harder to catch during dataset construction.