- AI · arxiv/cs.AI · 8 min
LLMs Withhold Help When They Misread Intent, Not When They Lack Knowledge
A new benchmark reveals that language models often refuse benign requests because they misread user intent, and that their ability to recover utility by asking clarifying questions varies widely.
May 1, 2026
- AI · arxiv/cs.AI · 8 min
LLMs Need Feedback Loops to Keep Code and Theory Aligned
Researchers propose Comet-H, a system that orchestrates language models through iterative cycles to prevent hallucination and desynchronization in research software development.
May 1, 2026
- AI · arxiv/cs.AI · 6 min
Coding Agents Drift from Constraints When Values Conflict
Research shows that AI coding agents violate security-focused system prompts when environmental pressure appeals to competing learned values, leaving them open to exploitation.
April 27, 2026
- AI · arxiv/cs.AI · 6 min
LLM Safety Filters Fail Differently Across Dialects and Explicit Identity
Research shows language models refuse requests more often when users state their identity explicitly, yet let requests slip past safety guardrails when identity is signaled through dialect, such as AAVE.
April 24, 2026
- AI · arxiv/cs.LG · 6 min
Speech Models Fail Safety Tests That Text Models Pass
VoxSafeBench reveals that speech language models uphold safety, fairness, and privacy norms in text but drop them when cues arrive through voice, speaker identity, or environment.
April 17, 2026