Does the format I use to serialize FHIR data really affect LLM medication reconciliation accuracy?

Yes, significantly. According to this study, Clinical Narrative format outperforms Raw JSON by up to 19 F1 points on smaller models (≤8B parameters), while Raw JSON performs best on 70B-parameter models. The choice of format is not cosmetic; it directly impacts whether the model catches all active medications, which is critical for patient safety.

Which serialization format should I use for my medication reconciliation LLM?

Use Clinical Narrative format if your model has 8 billion parameters or fewer, and Raw JSON if it has 70 billion or more. The study tested five open-weight models and found this pattern holds consistently. The rule as stated says nothing about sizes between those two thresholds, so benchmark both formats on your own data and model to confirm, especially if your patient population has complex medication histories.
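The rule of thumb in this answer can be written down as a small lookup. This is only a sketch: the function name, the format labels, and the decision to return `None` for mid-sized models are assumptions made for illustration, and only the two thresholds come from the article.

```python
from typing import Optional

# Thresholds from the article's rule of thumb; all names here are illustrative.
SMALL_MODEL_MAX_PARAMS_B = 8.0
LARGE_MODEL_MIN_PARAMS_B = 70.0


def pick_serialization_format(params_billions: float) -> Optional[str]:
    """Suggest a format label, or None when the rule does not cover the size."""
    if params_billions <= 0:
        raise ValueError("params_billions must be positive")
    if params_billions <= SMALL_MODEL_MAX_PARAMS_B:
        return "clinical_narrative"
    if params_billions >= LARGE_MODEL_MIN_PARAMS_B:
        return "raw_json"
    # Between 8B and 70B the article states no rule: benchmark both formats.
    return None
```

Returning `None` rather than guessing keeps the uncovered range visible to the caller, who can then run both formats on local data before committing to one.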

Why do LLMs miss medications more often than they invent fake ones?

Across all tested model-format combinations, precision exceeded recall, meaning omission (missing a real medication) was the dominant failure mode. This suggests LLMs are conservative in medication extraction, preferring to skip uncertain entries rather than guess. Clinically, this is safer than hallucination, but it means safety audits should prioritize catching false negatives (missed medications) over false positives.
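To see how omission shows up in the numbers, here is a minimal sketch of set-based precision, recall, and F1 for one patient's medication list. It assumes exact matching on case-normalized names, which may differ from the matching the study used; the function names and sample drugs are hypothetical.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())


def medication_prf(predicted: list[str], reference: list[str]) -> dict[str, float]:
    """Precision, recall, and F1 over sets of medication names."""
    pred = {normalize(m) for m in predicted}
    ref = {normalize(m) for m in reference}
    tp = len(pred & ref)   # medications correctly listed
    fp = len(pred - ref)   # invented medications (hallucination)
    fn = len(ref - pred)   # missed medications (omission)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# One omission, no hallucination: precision stays at 1.0 while recall drops to 0.75.
scores = medication_prf(
    predicted=["metformin", "lisinopril", "atorvastatin"],
    reference=["Metformin", "Lisinopril", "Atorvastatin", "Warfarin"],
)
```

In this example the model invents nothing but skips one drug, which is the precision-above-recall pattern the answer describes.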

AI · 8 min read · April 25, 2026

FHIR Format Choice Shifts LLM Medication Safety by 19 Points

How you serialize patient data for language models dramatically changes reconciliation accuracy, with smaller models favoring narrative text and large models preferring raw JSON.

Source: arxiv/cs.AI · Sanjoy Pator

Data format choice significantly impacts LLM medication reconciliation accuracy, with strategy effectiveness varying by model size.

— Clinical Narrative format outperforms Raw JSON by up to 19 F1 points on models ≤8B parameters.
— Raw JSON achieves best performance (F1 0.9956) on 70B-parameter models, reversing smaller-model trends.
— All model-format combinations show precision exceeding recall; omission is the dominant failure mode.
— Models plateau at 7–10 concurrent medications, systematically underserving polypharmacy patients.
— BioMistral-7B produced zero usable output despite domain pretraining, indicating instruction tuning is essential.
— Tested five open-weight models across four serialization strategies on 200 synthetic patients (4,000 inferences).
— Practical deployment rule: use Clinical Narrative for ≤8B models, Raw JSON for 70B+.
— Pipeline reproducible on single AWS g6e.xlarge instance with 48 GB VRAM.
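For readers unfamiliar with the two formats named in the list above, the sketch below shows roughly how the same medication orders could be rendered either way. It assumes FHIR R4 `MedicationRequest` resources with `medicationCodeableConcept.text` and `dosageInstruction[].text` populated. The narrative template, function names, and sample data are hypothetical and are not the study's actual prompts.

```python
import json


def to_raw_json(med_requests: list[dict]) -> str:
    """Raw JSON strategy: pass the FHIR resources through unchanged."""
    return json.dumps(med_requests, indent=2)


def to_clinical_narrative(patient_label: str, med_requests: list[dict]) -> str:
    """Narrative strategy (hypothetical template): one plain-English line per order."""
    lines = [f"Medication orders for {patient_label}:"]
    for req in med_requests:
        name = req.get("medicationCodeableConcept", {}).get("text", "unknown medication")
        dosage = req.get("dosageInstruction") or [{}]
        sig = dosage[0].get("text", "no dosage instructions recorded")
        status = req.get("status", "unknown")
        lines.append(f"- {name}; {sig}; status: {status}.")
    return "\n".join(lines)


# Made-up sample resources, trimmed to the fields used above.
sample_requests = [
    {
        "resourceType": "MedicationRequest",
        "status": "active",
        "medicationCodeableConcept": {"text": "Metformin 500 mg oral tablet"},
        "dosageInstruction": [{"text": "Take 1 tablet twice daily"}],
    },
    {
        "resourceType": "MedicationRequest",
        "status": "stopped",
        "medicationCodeableConcept": {"text": "Ibuprofen 200 mg oral tablet"},
        "dosageInstruction": [{"text": "Take 1 tablet as needed"}],
    },
]

print(to_clinical_narrative("Patient A", sample_requests))
```

The narrative version keeps the order status in each line so that stopped orders remain distinguishable from active ones, which matters when the task is to list active medications.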

#llm #healthcare #fhir #medication #serialization
