AI · 4 min read · April 30, 2026

Evergreen: Cost-Efficient Verification of LLM-Generated Claims

A system that recasts claim verification as semantic queries, reducing LLM costs by 3.2x while maintaining accuracy on aggregated data.

Source: arxiv/cs.AI · Alexander W. Lee, Benjamin Han, Shayak Sen, Sam Yeom, Ugur Cetintemel, Anupam Datta

Evergreen verifies claims in LLM-generated summaries by compiling them into semantic queries, cutting costs 3.2x via targeted optimizations.

  • LLM semantic aggregation produces natural language summaries that may contain ungrounded claims requiring verification.
  • Evergreen converts each claim into a declarative semantic query executed on the same engine that generated the aggregate.
  • Verification-aware optimizations include early stopping, relevance sorting, and confidence sequences to minimize LLM calls.
  • General semantic query optimizations include operator fusion, similarity filtering, and prompt caching.
  • Provenance tracking identifies minimal tuple sets justifying each verdict using semiring-based first-order logic semantics.
  • Benchmarks show F1 = 1.00 with strong LLMs, at 3.2x lower cost and 4.0x lower latency than the baseline.
  • With weak LLMs, Evergreen still exceeds strong LLM-as-judge baselines, at 48x lower cost and 2.3x lower latency.
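The verification-aware loop from the points above (relevance sorting plus early stopping) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `verify_claim`, `relevance`, and `llm_entails` are hypothetical names, and a cheap word-overlap scorer and substring matcher stand in for the real relevance model and LLM judge.

```python
from typing import Callable

def verify_claim(
    claim: str,
    tuples: list[str],
    relevance: Callable[[str, str], float],
    llm_entails: Callable[[str, str], bool],
    needed_support: int = 1,
) -> tuple[bool, list[str]]:
    """Check a claim against data tuples, visiting the most relevant
    tuples first and stopping as soon as enough supporting evidence
    is found, so the expensive entailment call runs as rarely as possible."""
    ranked = sorted(tuples, key=lambda t: relevance(claim, t), reverse=True)
    support: list[str] = []
    for t in ranked:
        if llm_entails(claim, t):          # the only "LLM" call in the loop
            support.append(t)
            if len(support) >= needed_support:
                return True, support       # early stop: verdict reached
    return False, support

# Toy stand-ins: word-overlap relevance, substring "entailment".
def overlap(claim: str, t: str) -> float:
    return len(set(claim.lower().split()) & set(t.lower().split()))

verdict, evidence = verify_claim(
    "revenue grew in Q3",
    ["Q1 revenue flat", "Q3 revenue grew 12%", "Q2 costs rose"],
    relevance=overlap,
    llm_entails=lambda c, t: "revenue grew" in t.lower(),
)
# verdict is True; only the top-ranked tuple was ever checked.
```

With the most relevant tuple sorted first, the loop exits after a single entailment call instead of scanning all three tuples, which is the cost-saving behavior the bullet points describe.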

Frequently asked

  • How does Evergreen avoid redundant LLM calls? It executes symbolic query operations directly on the underlying data and invokes the LLM only when semantic reasoning is necessary, caching prompts across similar claims. Early stopping halts verification once sufficient evidence is found, and relevance sorting prioritizes high-impact tuples, reducing the data the LLM must process.
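The prompt-caching idea mentioned above can be illustrated with Python's `functools.lru_cache`; the names here (`entails`, `llm_calls`) are hypothetical, and a substring check stands in for the LLM judge. A real system would cache at the rendered-prompt level, but memoizing on the (claim, evidence) pair shows the same effect.

```python
from functools import lru_cache

llm_calls = 0  # counts how often the (simulated) LLM is actually invoked

@lru_cache(maxsize=None)
def entails(claim: str, evidence: str) -> bool:
    """Memoized entailment check: a repeated (claim, evidence) prompt is
    answered from the cache instead of re-querying the LLM."""
    global llm_calls
    llm_calls += 1
    # Stand-in for an LLM judge: simple substring check.
    return claim.lower() in evidence.lower()

# Two claims checked against overlapping evidence; the repeated prompt
# is served from the cache, so only 2 LLM calls are made, not 3.
entails("q3 revenue grew", "Q3 revenue grew 12% year over year")
entails("costs fell", "Q3 revenue grew 12% year over year")
entails("q3 revenue grew", "Q3 revenue grew 12% year over year")  # cache hit
```

The cache only pays off when claims and evidence repeat exactly; Evergreen's "similar claims" caching presumably requires additional normalization before keying, which this sketch omits.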
