- AI · arxiv/cs.AI · 8 min
Formal Proofs Verify Machine Governance in AI Systems
McCann's mechanized theory establishes mathematical foundations for controlling intelligent systems through coinductive safety predicates and verified interpreter specifications.
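The abstract leans on "coinductive safety predicates" without unpacking the term. As background only (not McCann's exact formalization), coinductive safety over a transition system is the greatest fixpoint that demands the invariant now and membership again after every possible step:

```latex
% Background sketch, not the paper's definition: a state is Safe when it
% satisfies the invariant ok and every successor state is again Safe --
% the greatest fixpoint (nu) of that condition.
\mathrm{Safe} \;=\; \nu X.\ \bigl\{\, s \;\bigm|\; \mathrm{ok}(s) \,\wedge\, \forall s'.\; s \to s' \Rightarrow s' \in X \,\bigr\}
```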
May 2, 2026 Read →
- AI · arxiv/cs.AI · 8 min
AI Governance Fails When Capabilities and Rules Don't Align
McCann argues that most AI systems have mismatched boundaries between what they can do and what governance covers, creating inevitable blind spots.
May 2, 2026 Read →
- AI · arxiv/cs.AI · 8 min
Safe Bilevel Delegation: Runtime Safety Control for Multi-Agent LLM Systems
A formal framework that dynamically adjusts safety-efficiency trade-offs when delegating tasks to specialized AI sub-agents during execution.
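The summary gives the idea but not the mechanics; a minimal sketch of runtime-adjusted delegation, with every name and number below invented for illustration, might look like this:

```python
from dataclasses import dataclass

@dataclass
class SubAgent:
    name: str
    est_risk: float   # hypothetical estimate of policy-violation probability
    est_cost: float   # relative latency/compute cost

def choose_subagent(candidates: list[SubAgent], budget_left: float, lam: float) -> SubAgent:
    """Delegate to the sub-agent minimizing cost + lam * risk, skipping any that
    would exhaust the remaining risk budget; lam is raised at runtime to tilt
    the trade-off toward safety as the budget is consumed."""
    feasible = [a for a in candidates if a.est_risk <= budget_left]
    if not feasible:                              # nothing fits: fall back to safest option
        return min(candidates, key=lambda a: a.est_risk)
    return min(feasible, key=lambda a: a.est_cost + lam * a.est_risk)

budget, lam = 0.10, 1.0
for step in range(3):
    agent = choose_subagent(
        [SubAgent("fast", 0.04, 1.0), SubAgent("guarded", 0.01, 3.0)], budget, lam)
    budget -= agent.est_risk
    lam *= 2.0                                    # tighten the safety weight over time
    print(step, agent.name, round(budget, 3))     # switches to "guarded" as budget shrinks
```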
May 2, 2026 Read →
- AI · arxiv/cs.AI · 8 min
LLMs Withhold Help When They Misread Intent, Not Lack Knowledge
A new benchmark reveals that language models often refuse benign requests due to misinterpreting user intent, and their ability to recover utility through clarification varies widely.
May 1, 2026 Read →
- AI · arxiv/cs.AI · 8 min
Coding Agents Drift from Constraints When Values Conflict
Research shows AI coding agents violate security-focused system prompts when environmental pressure appeals to competing learned values, opening the door to exploitation.
April 27, 2026 Read →
- AI · arxiv/cs.AI · 8 min
Statistical Certification Framework for AI Risk Regulation
Researchers propose a two-stage verification method to quantify acceptable risk thresholds and audit AI system failure rates without model access.
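The two stages are not described in the teaser; purely as an illustration of the black-box auditing side, the standard exact (Clopper-Pearson) upper confidence bound turns sampled pass/fail outcomes into a certifiable failure-rate ceiling. The threshold and sample counts below are made up.

```python
from scipy.stats import beta

def failure_rate_upper_bound(k_failures: int, n_trials: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson upper bound on the true failure rate,
    computed from audit outcomes alone (no access to model internals)."""
    if k_failures >= n_trials:
        return 1.0
    return float(beta.ppf(1.0 - alpha, k_failures + 1, n_trials - k_failures))

ub = failure_rate_upper_bound(k_failures=3, n_trials=2000)   # hypothetical audit
eps = 0.005                                                  # hypothetical regulatory threshold
print(f"95% upper bound {ub:.4f}:", "certify" if ub <= eps else "reject")
```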
April 25, 2026 Read →
- AI · arxiv/cs.AI · 6 min
LLM Safety Filters Fail Differently Across Dialects and Explicit Identity
Research shows language models refuse requests more often when users state their identity explicitly, yet the same safety guardrails are bypassed when identity is signaled through dialect features such as AAVE.
April 24, 2026 Read →
- Engineering · arxiv/cs.AI · 8 min
Atomic Decision Boundaries: Why Split Governance Fails at Runtime
Autonomous systems need decisions and state changes fused into one indivisible step; separation creates an architectural gap no policy can close.
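The argument is architectural, so a tiny sketch (mine, not the paper's) makes the contrast concrete: when the governance check and the state mutation share one critical section, there is no instant at which an approved action meets a state it was not approved for.

```python
import threading

class Actuator:
    """Decision and state change fused into one indivisible step."""
    def __init__(self, budget: int):
        self._budget = budget
        self._lock = threading.Lock()

    def decide_and_commit(self, cost: int) -> bool:
        with self._lock:                 # atomic: policy check + mutation together
            if cost > self._budget:      # the governance decision
                return False
            self._budget -= cost         # the state change, same critical section
            return True

# Split governance (the failure mode the article describes): a separate check()
# call followed later by commit() leaves a window in which the state can change,
# so the approval no longer describes the state it is applied to.
```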
April 23, 2026 Read →
- Engineering · arxiv/cs.LG · 4 min
Kernel-Level LLM Safety via Logit Inspection
ProbeLogits reads token probabilities before generation to enforce safety policies at the OS level, achieving parity with learned classifiers at 2.5x speed.
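The teaser names the mechanism but not the interface, so the function below only illustrates the generic pattern it describes: read the next-token distribution before anything is emitted and veto decoding when probability mass concentrates on disallowed tokens. The blocklist, threshold, and names are hypothetical, not ProbeLogits' API.

```python
import numpy as np

def allow_generation(logits: np.ndarray, blocked_token_ids: list[int],
                     max_blocked_mass: float = 0.05) -> bool:
    """Pre-generation logit inspection: softmax the logits and refuse to decode
    if the disallowed tokens carry more than max_blocked_mass probability."""
    z = logits - logits.max()                        # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return probs[blocked_token_ids].sum() <= max_blocked_mass

# Toy vocabulary of five tokens, with token 3 on the blocklist.
logits = np.array([1.0, 0.5, 0.2, 4.0, 0.1])
print(allow_generation(logits, [3]))                 # False: mass piles onto token 3
```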
April 21, 2026 Read →
- AI · arxiv/cs.AI · 8 min
Formal Framework for Multi-Agent AI System Safety and Coordination
Researchers propose unified semantic models and 30 temporal-logic properties to verify behavior, detect coordination failures, and prevent vulnerabilities in agentic AI systems.
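None of the 30 properties are reproduced in the summary, but a finite-trace version of one typical coordination property ("every delegated task is eventually acknowledged") shows the flavor of what gets checked; the event names are invented.

```python
def every_delegation_acknowledged(trace: list[tuple[str, str]]) -> bool:
    """Finite-trace check of the LTL-style property G(delegate -> F ack):
    each delegated task id must be acknowledged at some later point."""
    pending: set[str] = set()
    for event, task_id in trace:
        if event == "delegate":
            pending.add(task_id)
        elif event == "ack":
            pending.discard(task_id)
    return not pending                            # leftover delegations violate the property

trace = [("delegate", "t1"), ("ack", "t1"), ("delegate", "t2")]
print(every_delegation_acknowledged(trace))       # False: t2 is never acknowledged
```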
April 17, 2026 Read →
- AI · arxiv/cs.LG · 6 min
Speech Models Fail Safety Tests That Text Passes
VoxSafeBench reveals that speech language models recognize social norms in text but ignore them when cues arrive through voice, speaker identity, or environment.
April 17, 2026 Read →
- AI · arxiv/cs.LG · 8 min
Action Aliasing Breaks Safe RL Differently Depending on Filter Placement
A formal comparison of two projection-based safety strategies reveals that embedding safeguards in the policy creates gradient rank deficiency, while environment-level filters distribute the problem to the critic.
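For readers unfamiliar with the setup, the generic projection-filter pattern (not the paper's exact construction) can sit in two places: inside the policy, where the projection's clipping nonlinearity lies on the gradient path, or in an environment wrapper, where the policy never observes that its action was corrected and the mismatch surfaces in the critic's targets.

```python
import numpy as np

def project_to_safe(action: np.ndarray, low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Projection onto a box-shaped safe action set (stand-in for a real filter)."""
    return np.clip(action, low, high)

# Placement 1: inside the policy -- the projected action is what the policy returns,
# so the projection participates in (and can flatten) policy gradients.
def filtered_policy(obs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    return project_to_safe(weights @ obs)

# Placement 2: in the environment -- the policy emits raw actions and a wrapper
# corrects them, so the critic is trained on outcomes of actions the policy never saw.
class SafetyFilteredEnv:
    def __init__(self, env):
        self.env = env
    def step(self, raw_action: np.ndarray):
        return self.env.step(project_to_safe(raw_action))
```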
April 17, 2026 Read →