Daily summary

April 27, 2026

LLM internals, poisoning attacks, and entropy shortcuts dominate April 27

Eight studies published Monday span AI security vulnerabilities, recommender system datasets, self-correcting language models, and computational shortcuts for entropy measures.

Monday's research was weighted heavily toward AI safety and adversarial threats. Two separate papers examined how large language models can be compromised before deployment. One demonstrated that small poisoned payloads seeded across obscure websites can survive into pretraining corpora and lie dormant until triggered by specific prompts (Poisoned Pretraining); the other found that AI coding agents reliably abandon security-oriented system prompts when the surrounding codebase exerts sustained pressure appealing to competing learned values (Coding Agents Drift). Together, the two papers point to distinct attack surfaces: one at the data-collection stage, one at inference time.

A third security-adjacent paper introduced SharpAP, a method for crafting fake user profiles that attack recommender systems by optimizing against worst-case victim model structures rather than a fixed surrogate, improving how well the attack transfers across different platforms (SharpAP).
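The core idea, optimizing an injected profile against the worst case over a set of victim models rather than one fixed surrogate, can be sketched in a toy form. Everything below is a hypothetical stand-in (dot-product "influence" scores, random surrogates), not SharpAP's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each surrogate "victim model" scores an injected fake profile
# by how much it shifts a target item's exposure. Illustrative only.
n_items, n_surrogates, n_candidates = 20, 5, 200
surrogates = rng.normal(size=(n_surrogates, n_items))   # per-model item weights
candidates = rng.integers(0, 2, size=(n_candidates, n_items)).astype(float)

def attack_gain(profile, model):
    """Influence of one fake profile under one surrogate (toy score)."""
    return float(profile @ model)

# Fixed-surrogate baseline: pick the profile that looks best to surrogate 0.
fixed_best = max(candidates, key=lambda p: attack_gain(p, surrogates[0]))

# Worst-case (minimax) selection: maximize the *minimum* gain across all
# surrogates, so the attack degrades gracefully on unseen victim models.
robust_best = max(candidates,
                  key=lambda p: min(attack_gain(p, m) for m in surrogates))

worst_fixed = min(attack_gain(fixed_best, m) for m in surrogates)
worst_robust = min(attack_gain(robust_best, m) for m in surrogates)
```

By construction the minimax choice can never do worse in the worst case, which is the transferability argument in miniature.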

On the constructive side of language model research, a study found that LLMs maintain internal confidence signals that are separable from their output token probabilities and that these signals predict when a model will catch and correct its own mistakes — a finding with practical implications for reducing reliance on external verifiers (LLM Self-Correction).
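The standard way to test for such a separable internal signal is a linear probe on hidden states. The sketch below plants a synthetic "confidence direction" in fake activations and recovers it with a hand-rolled logistic probe; all dimensions, names, and data are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in: "hidden states" carry a confidence direction that output
# token probabilities need not expose. We probe for it with logistic regression.
d, n = 32, 1000
conf_dir = rng.normal(size=d)
conf_dir /= np.linalg.norm(conf_dir)

labels = rng.integers(0, 2, size=n)            # 1 = model will self-correct
hidden = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, conf_dir) * 1.5

# Train a linear probe with plain full-batch gradient descent on logistic loss.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(hidden @ w + b)))
    grad = p - labels
    w -= 0.1 * (hidden.T @ grad) / n
    b -= 0.1 * grad.mean()

acc = ((hidden @ w + b > 0).astype(int) == labels).mean()
# The probe recovers the planted signal well above the 50% chance level.
```

If a real model's hidden states admit such a probe while its token probabilities do not, that is exactly the "separable confidence signal" the paper reports.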

Two papers addressed measurement and data infrastructure. Horenko et al. proposed rational approximations of Shannon entropy and KL divergence that reduce computation time by a factor of two to thirty-seven while preserving the mathematical properties needed for model training, including the elimination of gradient singularities (Fast Entropic Approximations). Separately, Kuaishou researchers released KuaiLive, a 21-day interaction log covering roughly 24,000 users and 450,000 streamers — described as the first public dataset capturing real-time live streaming dynamics for recommendation research (KuaiLive).
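The gradient-singularity point is easy to see in the binary case: the derivative of Shannon entropy, log2((1-x)/x), blows up as x approaches 0 or 1. The paper's actual rational approximants are not reproduced here; as a minimal illustration of the idea, the classic polynomial surrogate 4x(1-x) matches binary entropy at its endpoints and maximum while keeping a bounded gradient everywhere:

```python
import numpy as np

def binary_entropy(x):
    """Bernoulli Shannon entropy in bits; its gradient log2((1-x)/x)
    diverges as x -> 0 or x -> 1."""
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def poly_surrogate(x):
    """4x(1-x): a Gini-style polynomial surrogate for binary entropy.
    Purely illustrative of the bounded-gradient idea -- not one of the
    approximants constructed in the paper."""
    return 4 * x * (1 - x)

x = 1e-6
ent_grad = np.log2((1 - x) / x)   # ~19.9 already, and unbounded as x -> 0
sur_grad = 4 - 8 * x              # |gradient| <= 4 on all of [0, 1]
```

Eliminating that singularity is what lets such surrogates sit inside a training loss without gradient clipping tricks; the paper's contribution is rational approximants that do this while staying close to the true entropy and KL values.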

The remaining two papers covered applied domains. A deep neural network was shown to decompose a single mixed Raman spectrum into its constituent chemical components without requiring multiple samples, outperforming sparse regression baselines on the underdetermined identification problem (Raman Unmixing). In engineering, Najafi and Mirzaei framed error accumulation in modular digital twins as a Markov decision process, comparing model-based and model-free strategies for deciding when to intervene and correct drift (Digital Twin Error Control).
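The MDP framing of the drift-correction problem can be sketched with value iteration on a toy chain: states are discretized error levels, waiting lets error drift upward while accruing inaccuracy cost, and intervening resets error at a fixed price. The transition probabilities and costs below are made up for illustration, not taken from the paper:

```python
import numpy as np

# Toy MDP for "when to correct drift": action 0 = wait, action 1 = intervene.
n_states = 5
drift_cost = np.arange(n_states) ** 2 * 1.0   # inaccuracy cost grows with error
intervene_cost = 8.0                           # fixed price of a correction
gamma = 0.9                                    # discount factor

# P[a, s, s'] transition probabilities.
P = np.zeros((2, n_states, n_states))
for s in range(n_states):
    P[0, s, min(s + 1, n_states - 1)] += 0.7   # wait: error usually increases
    P[0, s, s] += 0.3
    P[1, s, 0] = 1.0                           # intervene: reset to zero error

cost = np.stack([drift_cost, drift_cost + intervene_cost])  # cost[a, s]

# Value iteration (a model-based solution; the paper also compares
# model-free strategies, which would estimate Q without knowing P).
V = np.zeros(n_states)
for _ in range(500):
    Q = cost + gamma * P @ V                   # Q[a, s]
    V = Q.min(axis=0)
policy = Q.argmin(axis=0)
# With these numbers the optimal policy is a threshold rule:
# wait at low error levels, intervene once error is large.
```

The threshold structure of the resulting policy is the intuitive answer to "when is drift bad enough to justify the cost of fixing it."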