- arxiv/cs.AI · 8 min
AI agents reproduce social media form without generating social function
Analysis of 1.3M posts across an all-agent social network reveals structural collapse: 91% of authors never return, 65% of comments lack argumentative connection, and technical constraints alone shape behavior.
April 17, 2026
- arxiv/cs.AI · 4 min
MERRIN: Benchmark for Multimodal Search in Noisy Web Data
New benchmark reveals AI agents struggle with real-world web search, achieving only 22% accuracy when retrieving and reasoning across mixed media sources.
April 17, 2026
- arxiv/cs.AI · 4 min
Creo: Staged Image Generation Restores User Control
Multi-stage text-to-image system scaffolds creation from sketch to final output, letting users lock decisions and avoid premature commitment.
April 17, 2026
- arxiv/cs.AI · 8 min
Token Importance in On-Policy Distillation: Entropy and Disagreement
Research identifies two regions of high-value tokens in knowledge distillation: high-entropy positions and low-entropy positions where student and teacher disagree, enabling 50–80% token reduction.
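The selection rule described in the summary can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: function names and thresholds are assumptions, and "disagreement" is approximated as an argmax mismatch between teacher and student distributions.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy of a probability distribution (in nats)."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def select_tokens(teacher_probs, student_probs, entropy_thresh=1.0):
    """Mask of positions worth distilling on, per the two regions:
    (1) high teacher entropy, or
    (2) low entropy where teacher and student pick different tokens."""
    h = entropy(teacher_probs)  # shape: (seq_len,)
    disagree = np.argmax(teacher_probs, -1) != np.argmax(student_probs, -1)
    high_entropy = h > entropy_thresh
    confident_disagreement = (h <= entropy_thresh) & disagree
    return high_entropy | confident_disagreement
```

Positions that are both low-entropy and agreed upon (the bulk of a typical sequence) are skipped, which is where the claimed 50–80% token reduction would come from.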
April 17, 2026
- arxiv/cs.AI · 8 min
Formal framework for multi-agent AI system safety and coordination
Researchers propose unified semantic models and 30 temporal-logic properties to verify behavior, detect coordination failures, and prevent vulnerabilities in agentic AI systems.
April 17, 2026
- arxiv/cs.AI · 4 min
LLM scripting brings petascale climate visualization to laptops
Researchers demonstrate a framework that lets domain scientists animate massive NASA climate datasets on commodity hardware using natural-language prompts instead of specialized graphics expertise.
April 17, 2026
- arxiv/cs.AI · 8 min
Small Models Match Large Ones via Inference Scaffolding
McClendon et al. show that role-based prompt structuring at inference time doubles small-model performance on complex tasks without retraining.
April 17, 2026
- arxiv/cs.AI · 8 min
Automating Feature Preprocessing Beats Manual Tuning for Tabular ML
Study of 15 search algorithms on 45 datasets reveals evolution and random search outperform complex surrogate models for automated feature pipeline construction.
April 17, 2026
- arxiv/cs.AI · 8 min
LLMs show human-like trust bias toward people, with demographic blind spots
Study of 43,200 experiments reveals language models develop trust patterns similar to humans, including susceptibility to age, religion, and gender bias in financial decisions.
April 17, 2026
- arxiv/cs.AI · 4 min
Hybrid segmentation model improves fundus lesion detection accuracy
Combining classical clustering methods with deep learning achieves 89.7% accuracy on high-resolution eye images, outperforming standard U-Net approaches.
April 17, 2026
- arxiv/cs.AI · 5 min
Verifiable model unlearning on edge devices without retraining
ZK-APEX combines sparse masking and zero-knowledge proofs to let providers verify that personalized models forget targeted data while preserving local utility.
April 17, 2026
- arxiv/cs.AI · 6 min
Measuring Where Chatbots Beat Humans on Tests
Researchers apply psychometric methods to identify test items where LLMs systematically outperform human learners, revealing assessment vulnerabilities.
April 17, 2026
- arxiv/cs.AI · 8 min
LLMs hit formal reasoning ceiling; Chomsky Hierarchy reveals efficiency gap
New benchmark shows large language models struggle with structured complexity tasks and require prohibitive compute to achieve reliability in formal reasoning.
April 17, 2026
- arxiv/cs.AI · 8 min
Vision-Language Models Fail on Dense Visual Grids
A new benchmark reveals VLMs collapse sharply on simple grid-reading tasks, exposing a gap between visual encoding and language output that the authors term "Digital Agnosia".
April 17, 2026
- arxiv/cs.AI · 8 min
Modular Neural Networks Learn Three-Valued Logic Without Symbolic Solvers
THEIA demonstrates that dedicated domain engines enable neural networks to master Kleene three-valued logic and generalize compositionally to sequences 100x longer than training.
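The target logic here is standard Kleene strong three-valued logic (K3), which is well defined independently of the paper. A minimal reference implementation of the truth tables the networks must learn (this is not THEIA's code, just the logic itself):

```python
from enum import Enum

class K3(Enum):
    """Kleene strong three-valued logic: False < Unknown < True."""
    F = 0
    U = 1
    T = 2

def k_not(a: K3) -> K3:
    # Negation reflects the truth order; Unknown stays Unknown.
    return K3(2 - a.value)

def k_and(a: K3, b: K3) -> K3:
    # Conjunction is the minimum in the F < U < T order.
    return K3(min(a.value, b.value))

def k_or(a: K3, b: K3) -> K3:
    # Disjunction is the maximum in the F < U < T order.
    return K3(max(a.value, b.value))
```

Compositional generalization in the paper's sense means evaluating nested expressions over these operators for sequences far longer than those seen in training.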
April 17, 2026
- arxiv/cs.LG · 3 min
Framework uses AI outputs as features, not proxies, for labeled data
Generative Augmented Inference treats LLM predictions as informative signals rather than direct substitutes, reducing human labeling needs by 75–90% across operations tasks.
April 17, 2026
- arxiv/cs.LG · 8 min
Foundation Models vs. Task-Specific ML in Electricity Price Forecasting
Time series foundation models outperform traditional deep learning on probabilistic forecasts, but well-tuned conventional models remain competitive at lower computational cost.
April 17, 2026
- arxiv/cs.LG · 8 min
LLM Panels Match Expert Clinicians in Medical Diagnosis Scoring
A study of three frontier AI models scoring real hospital cases shows calibrated LLM juries can reliably replace human expert panels for medical AI evaluation.
April 17, 2026
- arxiv/cs.LG · 5 min
Rejection-Gated Policy Optimization replaces importance weighting with learned gates
A new reinforcement learning method selects trustworthy samples via differentiable gates instead of reweighting all samples, reducing variance and improving RLHF alignment.
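A minimal sketch of the gating idea, assuming the gate is a sigmoid of how far the policy ratio has drifted from 1 (the exact gate parameterization in the paper may differ; all names and constants here are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_policy_loss(log_ratio, advantage, sharpness=5.0, thresh=0.7):
    """Instead of weighting each sample's advantage by the importance
    weight exp(log_ratio), apply a smooth gate in [0, 1] that suppresses
    samples whose policy ratio drifts far from 1. This approximates hard
    rejection while remaining differentiable, avoiding the high variance
    of large importance weights."""
    gate = sigmoid(sharpness * (thresh - np.abs(log_ratio)))
    return -np.mean(gate * advantage)
```

Near-on-policy samples (|log_ratio| small) pass through with gate ≈ 1, while off-policy samples are smoothly rejected rather than exploding the variance via their importance weights.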
April 17, 2026
- arxiv/cs.LG · 8 min
INT4 Quantization Fails After FP32 Convergence in Predictable Phases
Post-training quantization assumes converged models are ready to compress, but INT4 quantization collapses in a three-phase pattern tied to weight updates, not learning rate decay.
April 17, 2026
- arxiv/cs.LG · 8 min
Distilling Transformers into Mamba via Linearized Attention
A two-stage knowledge transfer method preserves Transformer performance in State Space Models by routing through linearized attention as an intermediate step.
April 17, 2026
- arxiv/cs.LG · 8 min
Three-Phase Transformer: Structural Prior for Decoder Efficiency
A residual-stream architecture using cyclic channel partitioning and phase-aligned rotations achieves a 7% perplexity reduction with minimal parameter overhead.
April 17, 2026
- arxiv/cs.LG · 6 min
Speech Models Fail Safety Tests That Text Passes
VoxSafeBench reveals speech language models recognize social norms in text but ignore them when cues arrive through voice, speaker identity, or environment.
April 17, 2026