Artificial Intelligence · 4 min read · 17 April 2026
Retrieval-Augmented Set Completion for Clinical Code Authoring
A two-stage approach retrieves similar clinical value sets, then classifies candidates, outperforming direct LLM generation on standardized medical vocabularies.
Source: arxiv/cs.LG · Sumit Mukherjee, Juan Shu, Nairwita Mazumder, Tate Kernell, Celena Wheeler, Shannon Hastings, Chris Sidey-Gibbons · open original ↗
Retrieve similar clinical value sets, then classify candidates with a fine-tuned model, reducing hallucination and improving code selection accuracy.
- Clinical value set authoring identifies all codes representing a medical concept in standardized vocabularies.
- Direct LLM prompting fails because vocabularies are large, versioned, and not reliably memorized.
- RASC retrieves the K most similar existing sets from a corpus, then applies a classifier to each candidate code.
- A cross-encoder fine-tuned on SAPBert achieves AUROC 0.852, outperforming an MLP (0.799) and GPT-4o zero-shot (F1 0.105).
- 48.6% of the codes GPT-4o returns are absent from the official vocabulary, indicating hallucination.
- A retrieval-only baseline produces 12.3 irrelevant codes per true positive; the classifiers reduce this to 3.2–4.4.
- The performance gap widens as value set size increases, confirming the theoretical advantage of shrinking the output space.
- A benchmark dataset of 11,803 VSAC value sets enables reproducible evaluation.
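The two-stage retrieve-then-classify flow can be sketched as below. This is a minimal toy illustration, not the paper's method: the bag-of-words similarity stands in for a neural retriever, and the vote-based `score` stands in for the fine-tuned SAPBert cross-encoder; the corpus, concept names, and codes are all hypothetical.

```python
# Toy sketch of the RASC pipeline: (1) retrieve the K most similar
# existing value sets, (2) pool their codes as candidates and score each.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a neural encoder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rasc_complete(concept: str, corpus: dict, k: int = 2,
                  threshold: float = 0.75) -> set:
    # Stage 1: retrieve the K value sets whose names are most similar.
    ranked = sorted(corpus,
                    key=lambda name: cosine(embed(concept), embed(name)),
                    reverse=True)[:k]
    # Candidate pool = union of codes from the retrieved sets.
    candidates = set().union(*(corpus[name] for name in ranked))
    # Stage 2: 'classify' each candidate; here the score is simply the
    # fraction of retrieved sets that contain the code.
    def score(code: str) -> float:
        return sum(code in corpus[name] for name in ranked) / k
    return {c for c in candidates if score(c) >= threshold}

corpus = {
    "diabetes mellitus type 2": {"E11.9", "E11.65"},
    "diabetes mellitus type 1": {"E10.9", "E11.9"},
    "essential hypertension":   {"I10"},
}
print(sorted(rasc_complete("type 2 diabetes mellitus", corpus)))  # → ['E11.9']
```

Because candidates are drawn only from existing value sets, every output code is guaranteed to exist in the corpus, which is the structural reason retrieval-based completion cannot hallucinate out-of-vocabulary codes.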
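The "irrelevant codes per true positive" figure quoted above is a ratio of false positives to true positives. A minimal sketch, with toy code sets rather than the paper's data:

```python
# Irrelevant codes per true positive: false positives divided by true
# positives, used to compare retrieval-only vs. classifier-filtered output.
def irrelevant_per_tp(predicted: set, truth: set) -> float:
    tp = len(predicted & truth)
    return len(predicted - truth) / tp if tp else float("inf")

truth = {"E11.9", "E11.65"}
retrieval_only = {"E11.9", "E11.65", "E10.9", "I10", "Z79.4"}  # toy example
print(irrelevant_per_tp(retrieval_only, truth))  # → 1.5
```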
Frequently asked questions
- **Why can't LLMs generate clinical codes directly?** Large language models are not reliably trained on the full, versioned clinical vocabularies (e.g., SNOMED CT, ICD-10). They hallucinate codes that do not exist in the official systems, creating compliance and data integrity risks. Retrieval-augmented approaches ground the model in a curated corpus of real codes, eliminating out-of-vocabulary hallucinations.
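Even without retrieval, out-of-vocabulary hallucinations can be detected with a membership check against the official code list. A minimal sketch, assuming access to such a list; `OFFICIAL_CODES` is a tiny hypothetical stand-in for a full vocabulary release:

```python
# Hypothetical post-hoc guard: any generated code not present in the
# official vocabulary release is flagged as a hallucination.
OFFICIAL_CODES = {"E11.9", "E11.65", "E10.9", "I10"}  # toy stand-in

def filter_hallucinations(generated):
    valid = [c for c in generated if c in OFFICIAL_CODES]
    hallucinated = [c for c in generated if c not in OFFICIAL_CODES]
    return valid, hallucinated

valid, bad = filter_hallucinations(["E11.9", "E99.99", "I10"])
print(valid, bad)  # → ['E11.9', 'I10'] ['E99.99']
```

A check like this can flag hallucinated codes, but unlike the retrieval-based pipeline it cannot recover the valid codes the model failed to produce.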