What is the Completeness Verifier and how does it work?

The Completeness Verifier is a mandatory component that checks whether a GUI task is truly complete before the agent stops. It uses an agent-level verifier to cross-examine the agent's completion claim against decision rules, rejecting claims that lack direct visual evidence from the UI. This prevents agents from declaring success prematurely.
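The gating idea can be illustrated with a minimal sketch. The names `CompletionClaim` and `verify_completion`, the `source` tags, and the single rule used here are hypothetical stand-ins chosen for illustration. They are not the paper's actual decision rules or interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CompletionClaim:
    """Hypothetical record of an agent's claim that the task is done."""
    task: str
    # Observations the agent cites, each tagged with where it came from.
    evidence: List[Dict[str, str]] = field(default_factory=list)


def verify_completion(claim: CompletionClaim) -> Tuple[bool, str]:
    """Accept a claim only if some cited observation is visible on screen.

    Assumption: evidence tagged source == "screenshot" counts as direct
    visual evidence; anything else (e.g. the agent's own belief) does not.
    """
    visible = [e for e in claim.evidence if e.get("source") == "screenshot"]
    if not visible:
        return False, "rejected: no direct visual evidence from the UI"
    return True, f"accepted: {len(visible)} on-screen observation(s)"


if __name__ == "__main__":
    belief_only = CompletionClaim(
        "save the report",
        [{"source": "agent_belief", "text": "I clicked Save"}],
    )
    print(verify_completion(belief_only))
```

In this sketch a claim backed only by the agent's own account of its actions is rejected, so the agent would have to keep working or gather on-screen confirmation before stopping.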

How does the Loop Breaker prevent agents from repeating failed actions?

The Loop Breaker uses multi-tier filtering: it switches the agent's interaction mode after repeated failures, forces strategy changes when the screen state repeats persistently, and ties reflection signals to strategy shifts. This breaks cycles where agents repeat the same failing action without adapting.
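A rough sketch of how such tiered checks might fit together follows. The class name, the thresholds, the screen-hash comparison, and the returned signal strings are all assumptions made for illustration, not values or interfaces taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopBreaker:
    """Illustrative loop detector with hypothetical thresholds."""
    max_action_failures: int = 3   # assumed: failures before a mode switch
    max_repeated_screens: int = 4  # assumed: identical screens before a strategy change
    failures: int = 0
    same_screen_count: int = 0
    last_screen: Optional[str] = None

    def observe(self, action_failed: bool, screen_hash: str,
                reflection_flags_stuck: bool = False) -> str:
        """Record one step and return the signal the agent should follow."""
        # Count consecutive failed actions.
        self.failures = self.failures + 1 if action_failed else 0
        # Count consecutive steps that end on the same screen state.
        if screen_hash == self.last_screen:
            self.same_screen_count += 1
        else:
            self.same_screen_count = 1
        self.last_screen = screen_hash

        # A "stuck" reflection signal or a persistent screen forces a new strategy.
        if reflection_flags_stuck or self.same_screen_count >= self.max_repeated_screens:
            self.failures = 0
            self.same_screen_count = 0
            return "change_strategy"
        # Repeated failures alone trigger an interaction-mode switch.
        if self.failures >= self.max_action_failures:
            self.failures = 0
            return "switch_interaction_mode"
        return "continue"
```

The point of the design is that each signal maps to a concrete change in behavior, so a detected loop cannot be answered with the same action again.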

What performance improvements does VLAA-GUI achieve on benchmarks?

VLAA-GUI achieves 77.5% on OSWorld and 61.0% on WindowsAgentArena. Three of the five tested model backbones surpass human performance (72.4%) on OSWorld in a single pass. Ablation studies show the Loop Breaker nearly halves wasted steps for models prone to repetitive loops.


Artificial Intelligence · 3 min read · April 24, 2026

VLAA-GUI: Framework Stops Agents from Looping and Guessing

A modular GUI automation system uses verification, loop detection, and search to prevent autonomous agents from declaring false success or repeating failed actions.

Source: arxiv/cs.AI · Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie


VLAA-GUI adds mandatory verification, loop-breaking, and search modules to prevent GUI agents from premature success claims and repetitive failures.

— Completeness Verifier enforces visual evidence before agents declare task completion.
— Loop Breaker detects repeated failures and forces strategy or interaction mode changes.
— Search Agent queries LLMs for unfamiliar workflows when agents encounter unknown tasks.
— Coding Agent and Grounding Agent handle specialized actions on demand.
— Achieves 77.5% on OSWorld and 61.0% on WindowsAgentArena benchmarks.
— Three of five tested backbones exceed human performance (72.4%) on OSWorld.
— Ablation shows Loop Breaker cuts wasted steps by roughly half for loop-prone models.
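The on-demand modules listed above can be pictured as a small dispatch table. This is only a sketch of the modular idea: the handler functions, the `need` keys, and the returned strings are hypothetical placeholders, and real modules would call models or tools instead of returning text.

```python
from typing import Callable, Dict


def search_agent(request: str) -> str:
    # Placeholder: a real module would query an LLM for the workflow.
    return f"[search] looked up workflow for: {request}"


def coding_agent(request: str) -> str:
    # Placeholder: a real module would write and run code.
    return f"[coding] wrote a script for: {request}"


def grounding_agent(request: str) -> str:
    # Placeholder: a real module would locate a UI element on screen.
    return f"[grounding] located UI element for: {request}"


# Hypothetical mapping from what the main agent needs to the module that handles it.
MODULES: Dict[str, Callable[[str], str]] = {
    "unfamiliar_workflow": search_agent,
    "programmatic_edit": coding_agent,
    "locate_element": grounding_agent,
}


def dispatch(need: str, request: str) -> str:
    """Route a request to the matching module, or fail loudly if none exists."""
    handler = MODULES.get(need)
    if handler is None:
        raise ValueError(f"no module registered for need: {need!r}")
    return handler(request)
```

Keeping the modules behind one routing function means a specialized agent only runs when the main agent asks for it.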


#gui-automation #agents #verification #loop-detection #agentic-systems
