Artificial Intelligence · 4 min read · 17 April 2026

Retrieval beats memorization for clinical code selection

A two-stage retrieval-then-classify method outperforms direct LLM generation for assembling clinical value sets from large standardized vocabularies.

Source: arxiv/cs.LG · Sumit Mukherjee, Juan Shu, Nairwita Mazumder, Tate Kernell, Celena Wheeler, Shannon Hastings, Chris Sidey-Gibbons

Retrieve similar existing value sets, then classify candidates to build clinical code lists more accurately than direct LLM generation.

  • Clinical value set authoring identifies all codes defining a medical concept in standardized vocabularies.
  • LLMs fail to reliably recall large, versioned clinical vocabularies learned during pretraining.
  • RASC retrieves K similar existing value sets, then applies a classifier to rank candidate codes.
  • Cross-encoder on SAPBert achieves AUROC 0.852 and F1 0.298, beating zero-shot GPT-4o at F1 0.105.
  • Retrieval-then-classify reduces irrelevant candidates per true positive from 12.3 to 3.2–4.4.
  • Performance gap widens as value set size increases, confirming theoretical advantage of shrinking output space.
  • Benchmark created from 11,803 publicly available VSAC value sets, the first large-scale dataset for this task.
  • Gains replicate across SAPBert cross-encoder, LightGBM, and other classifier architectures.
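The two-stage pipeline the bullets describe can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cosine similarity over toy 2-d embeddings, the candidate-scoring function, and all names and data are assumptions standing in for the real SAPBert retriever and trained classifier.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def retrieve_then_classify(query_emb, value_sets, k, score_fn, threshold=0.5):
    """Stage 1: retrieve the K existing value sets most similar to the
    query concept, shrinking the output space to their pooled codes.
    Stage 2: keep only candidates the classifier scores above threshold."""
    ranked = sorted(value_sets,
                    key=lambda vs: cosine(query_emb, vs["emb"]),
                    reverse=True)
    candidates = {code for vs in ranked[:k] for code in vs["codes"]}
    return {code for code in candidates if score_fn(code) >= threshold}

# Toy illustration: three existing value sets with fake 2-d embeddings.
value_sets = [
    {"emb": [1.0, 0.0], "codes": {"I10", "I11"}},  # hypertension-like
    {"emb": [0.9, 0.1], "codes": {"I10", "I12"}},
    {"emb": [0.0, 1.0], "codes": {"E11"}},         # diabetes-like
]
query = [1.0, 0.05]
# Hypothetical classifier: accept codes starting with "I1".
selected = retrieve_then_classify(
    query, value_sets, k=2,
    score_fn=lambda c: 1.0 if c.startswith("I1") else 0.0)
# → {"I10", "I11", "I12"}: the diabetes set is never retrieved,
# so its code cannot be hallucinated into the result.
```

The design point is that stage 1 bounds what stage 2 can ever emit: a code absent from the K retrieved value sets is simply not a candidate, which is how the method avoids hallucinated codes.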

Frequently asked questions

  • LLMs do not reliably memorize large, versioned clinical vocabularies. RASC retrieves similar existing value sets to form a candidate pool, shrinking the effective output space. A classifier then ranks candidates against ground truth, avoiding hallucinated codes. This two-stage approach reduces irrelevant candidates per true positive from 12.3 to 3.2–4.4.
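The candidate-efficiency figure quoted above (12.3 vs. 3.2–4.4 irrelevant candidates per true positive) can be computed from a candidate pool and a ground-truth code set. The helper name and the toy pool sizes below are illustrative, not from the paper:

```python
def irrelevant_per_true_positive(candidates, truth):
    """Ratio of irrelevant candidates to true positives in a pool."""
    tp = len(candidates & truth)
    irrelevant = len(candidates - truth)
    return irrelevant / tp if tp else float("inf")

# E.g. a pool of 133 candidate codes that covers all 10 true codes:
ratio = irrelevant_per_true_positive(set(range(133)), set(range(10)))
# 123 irrelevant / 10 true positives = 12.3, the direct-generation figure.
```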
