AI · 8 min read · April 25, 2026
FHIR Format Choice Shifts LLM Medication Safety by 19 Points
How you serialize patient data to language models dramatically changes reconciliation accuracy, with smaller models favoring narrative text and large models preferring raw JSON.
Serialization format has a measurable effect on LLM medication reconciliation accuracy, and the best-performing format depends on model size.
- Clinical Narrative format outperforms Raw JSON by up to 19 F1 points on models with ≤8B parameters.
- Raw JSON achieves the best performance (F1 0.9956) on 70B-parameter models, reversing the smaller-model trend.
- In every model-format combination, precision exceeds recall; omission is the dominant failure mode.
- Models plateau at 7–10 concurrent medications, systematically underserving polypharmacy patients.
- BioMistral-7B produced zero usable output despite domain pretraining, indicating that instruction tuning is essential.
- The study tested five open-weight models across four serialization strategies on 200 synthetic patients (4,000 inferences).
- Practical deployment rule: use Clinical Narrative for ≤8B models and Raw JSON for 70B+ models.
- The pipeline is reproducible on a single AWS g6e.xlarge instance with 48 GB of VRAM.
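The deployment rule above can be sketched in a few lines of Python. This is an illustration only, not the study's pipeline: the function names (`choose_format`, `to_narrative`, `serialize`), the narrative sentence template, and the choice of FHIR R4-style fields (`medicationCodeableConcept.text`, `dosageInstruction[0].text`, `status`) are all assumptions. The stated rule does not cover models between 8B and 70B parameters, so the sketch refuses to guess for those sizes.

```python
import json


def choose_format(params_billion: float) -> str:
    """Map model size to the serialization format named in the deployment rule."""
    if params_billion <= 8:
        return "clinical_narrative"
    if params_billion >= 70:
        return "raw_json"
    # Sizes between 8B and 70B are not covered by the stated rule.
    raise ValueError("no recommendation for this model size")


def to_narrative(med_requests: list[dict]) -> str:
    """Render MedicationRequest-like dicts as plain sentences (hypothetical template)."""
    lines = []
    for req in med_requests:
        name = req.get("medicationCodeableConcept", {}).get("text", "unknown medication")
        dosage = req.get("dosageInstruction") or [{}]
        sig = dosage[0].get("text", "no dosage instructions recorded")
        status = req.get("status", "unknown")
        lines.append(f"Medication order ({status}): {name}. Instructions: {sig}.")
    return "\n".join(lines)


def serialize(med_requests: list[dict], params_billion: float) -> str:
    """Pick a format by model size and serialize the medication list accordingly."""
    if choose_format(params_billion) == "raw_json":
        return json.dumps(med_requests, indent=2)
    return to_narrative(med_requests)


# Example input: a single synthetic order, invented for illustration.
example = [
    {
        "resourceType": "MedicationRequest",
        "status": "active",
        "medicationCodeableConcept": {"text": "Metformin 500 mg tablet"},
        "dosageInstruction": [{"text": "Take one tablet twice daily"}],
    }
]

print(serialize(example, params_billion=7))   # narrative text for a small model
print(serialize(example, params_billion=70))  # raw JSON for a large model
```

Keeping the size check in its own function makes it easy to adjust the cut-offs if results for mid-sized models become available.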
Frequently asked
- Yes, serialization format significantly affects reconciliation accuracy. According to this study, Clinical Narrative format outperforms Raw JSON by up to 19 F1 points on smaller models (≤8B parameters), while Raw JSON performs best on 70B-parameter models. The choice of format is not cosmetic: it directly affects whether the model catches all active medications, which is critical for patient safety.
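To see why omissions show up as precision exceeding recall, it helps to score one reconciled list by hand. The sketch below assumes exact, case-insensitive name matching between the model's output and the reference list; the study's actual matching rule is not described here, and the function name `reconciliation_scores` and the medication names are hypothetical.

```python
def reconciliation_scores(predicted: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a predicted medication list.

    Matching is by normalized name only, which is an assumption of this sketch.
    """
    pred = {m.strip().lower() for m in predicted}
    ref = {m.strip().lower() for m in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented example: the model lists four of five active medications and adds nothing spurious.
reference = ["metformin", "lisinopril", "atorvastatin", "aspirin", "omeprazole"]
predicted = ["Metformin", "Lisinopril", "Atorvastatin", "Aspirin"]

precision, recall, f1 = reconciliation_scores(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example every listed medication is correct, so precision is 1.0, but one missed medication pulls recall down to 0.8 and F1 to about 0.889. A model that omits drugs can therefore look precise while still leaving a patient's list incomplete.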