Much of Tuesday's output touched on the reliability and controllability of language models. A kernel-level approach called ProbeLogits inspects token probability distributions before generation to enforce safety policies without learned parameters, reportedly matching the accuracy of classifier-based safety filters at roughly 2.5 times the speed, making it a systems-oriented alternative to post-hoc filtering. (Kernel-Level LLM Safety via Logit Inspection) On the unlearning side, a separate study found that switching to simpler, lower-order optimizers during the forgetting phase produces deletions that hold up better against subsequent fine-tuning attempts, suggesting that optimizer sophistication can work against robustness in this context. (Simpler Optimizers Make LLM Unlearning More Robust)
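The paper's kernel-level mechanism isn't reproduced here, but the general idea of rule-based logit inspection before sampling can be sketched as follows; the blocked-token list, threshold, and refusal behaviour are assumptions for illustration, not the ProbeLogits implementation.

```python
# Hypothetical sketch of rule-based logit inspection before sampling.
# Not the ProbeLogits implementation; names and thresholds are illustrative.
import torch
import torch.nn.functional as F

def inspect_and_sample(logits: torch.Tensor,
                       blocked_token_ids: list[int],
                       max_blocked_mass: float = 0.05) -> int:
    """Sample the next token only if the probability mass on blocked tokens
    stays under a fixed threshold; otherwise refuse (no learned parameters)."""
    probs = F.softmax(logits, dim=-1)                 # next-token distribution
    blocked_mass = probs[blocked_token_ids].sum().item()
    if blocked_mass > max_blocked_mass:
        raise RuntimeError("safety policy triggered: blocked-token mass too high")
    # mask blocked tokens and renormalise before sampling
    probs[blocked_token_ids] = 0.0
    probs = probs / probs.sum()
    return torch.multinomial(probs, num_samples=1).item()
```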
Two papers addressed data and supervision quality. The ADC framework uses language models paired with web search to construct labeled datasets with minimal human annotation, automating both class design and sample curation. (Automating Dataset Creation with LLMs and Search Engines) A study of CRISPR experimental data offered a cautionary finding: weak supervision can degrade over time not because the data distribution shifts, but because the labeling mechanism itself drifts — a phenomenon the authors call supervision drift, which persists even when standard domain transfer succeeds. (Weak Labels Fail Across Time Even When Domain Transfer Works)
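As a rough illustration of how supervision drift might be monitored (not the paper's CRISPR-specific protocol), one can track agreement between weak labels and a small gold-labelled audit set per time window, separately from any measure of input-distribution shift; the column names and weekly window below are assumptions.

```python
# Illustrative check for supervision drift, not the paper's protocol.
# Idea: track weak-label accuracy against a small gold-labelled audit set
# per time window, separately from any measure of input-distribution shift.
import pandas as pd

def weak_label_accuracy_by_window(df: pd.DataFrame,
                                  window: str = "W") -> pd.Series:
    """df needs columns: 'timestamp' (datetime), 'weak_label', and
    'gold_label' (present only for the audited subset).
    Returns weak/gold agreement per time window; a steady decline while the
    input distribution stays stable points at supervision drift rather than
    ordinary domain shift."""
    audited = df.dropna(subset=["gold_label"]).copy()
    audited["agree"] = audited["weak_label"] == audited["gold_label"]
    return audited.set_index("timestamp")["agree"].resample(window).mean()
```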
On the theoretical side, three results stood out. A formal analysis of chain-of-thought training shows that providing intermediate reasoning steps removes the dependence of sample complexity on generation length, whereas end-to-end learning scales unpredictably as reasoning chains grow longer. (Chain-of-Thought Supervision Eliminates Sample Complexity Growth) A separate framework establishes sample complexity bounds for blind inverse problems, settings where neither the signal nor the forward operator is known, using linear minimum mean square error (LMMSE) estimation. (Theory for Learning Blind Inverse Problems with Finite Samples) Researchers also unified three previously distinct diffusion approaches for discrete sequences under a shared population-genetics foundation drawn from Wright-Fisher theory, which they argue enables more stable generation across domains. (Three Diffusion Methods Unified Under Population Genetics Framework)
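For the blind-inverse-problem result, the estimator class involved is the standard LMMSE estimator. The sketch below is a generic plug-in version fitted from finite paired samples, included only to make the object concrete; it is not the paper's construction and reproduces none of its bounds.

```python
# Plug-in LMMSE estimator from finite paired samples (x_i, y_i); a generic
# illustration of the estimator class the paper analyses, not its method.
import numpy as np

def fit_lmmse(X: np.ndarray, Y: np.ndarray):
    """X: (n, d_x) signals, Y: (n, d_y) observations.
    Returns a function mapping a new observation y to the empirical
    LMMSE estimate  mu_x + C_xy C_yy^{-1} (y - mu_y)."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    n = X.shape[0]
    C_xy = Xc.T @ Yc / n
    C_yy = Yc.T @ Yc / n + 1e-8 * np.eye(Y.shape[1])  # small ridge for stability
    W = C_xy @ np.linalg.inv(C_yy)
    return lambda y: mu_x + W @ (y - mu_y)
```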
Finally, a comparative study of hyperparameter optimization found that classical algorithms such as CMA-ES and TPE outperform pure LLM-based agents under typical compute budgets, though hybrid configurations combining both approaches produced the strongest results overall. (LLMs Complement but Don't Replace Classical Hyperparameter Optimization)
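Both classical baselines named in the study, TPE and CMA-ES, are available off the shelf in Optuna. The sketch below runs each on a toy objective under an assumed 50-trial budget, purely to show what the classical setup looks like; it does not reproduce the paper's benchmark tasks, budgets, or hybrid configurations.

```python
# Minimal Optuna setup for the two classical baselines named in the study.
# The objective and trial budget are placeholders, not the paper's benchmark.
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    wd = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    # stand-in for a real training-and-validation run
    return (lr - 3e-3) ** 2 + (wd - 1e-4) ** 2

for sampler in (optuna.samplers.TPESampler(seed=0),
                optuna.samplers.CmaEsSampler(seed=0)):
    study = optuna.create_study(direction="minimize", sampler=sampler)
    study.optimize(objective, n_trials=50)
    print(type(sampler).__name__, study.best_params)
```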