AI · 8 min read · April 17, 2026
Small Models Match Large Ones via Inference Scaffolding
McClendon et al. show that role-based prompt structuring at inference time doubles small-model performance on complex tasks without retraining.
Source: arxiv/cs.AI · S. Aaron McClendon, Jorge Gallego-Feliciano, Stavros Zervoudakis, Antonios Saravanos · open original ↗
Structured inference-time prompting with role assignment doubles small-model task completion without training overhead.
- Qwen3-8B with three-role scaffolding reaches 8.9% task completion, up from a 5.4% baseline.
- Three roles: a summarizer (compresses history), an agent (reasons), and a corrector (fixes code without extra context).
- No retraining required; the same frozen weights are deployed three times with different prompts.
- The 8B model with scaffolding outperforms an unscaffolded 33B DeepSeek-Coder on the AppWorld benchmark.
- The 4-bit quantized version improves from 3.0% to 5.9%, showing the gains persist under compression.
- Strongest gains appear on difficulty-1 tasks: 15.8% to 26.3% (FP16), 5.3% to 14.0% (4-bit).
- The approach is formalized as test-time compute scaling and action-space shaping, drawing on RL theory.
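The role pipeline above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: `call_model` is a stub standing in for any chat-completion call against the same frozen weights, and the role prompts are paraphrased assumptions, not the paper's actual prompts.

```python
# Sketch of three-role inference scaffolding: the same frozen model is
# queried three times per step, each time under a different role prompt.
# No weights change; only the system prompt differs between calls.

ROLE_PROMPTS = {
    "summarizer": "Compress the interaction history into a brief summary.",
    "agent": "Given the summary and the task, reason and propose the next action.",
    "corrector": "Fix any errors in the proposed code without using outside context.",
}

def call_model(role_prompt: str, user_content: str) -> str:
    # Stub: in practice this would invoke the frozen model (e.g. Qwen3-8B)
    # with `role_prompt` as the system message and return its completion.
    return f"[{role_prompt.split()[0]}] {user_content[:40]}"

def scaffolded_step(history: list[str], task: str) -> str:
    # 1) Summarizer compresses the full history into a short context.
    summary = call_model(ROLE_PROMPTS["summarizer"], "\n".join(history))
    # 2) Agent reasons over the compressed context and proposes an action.
    action = call_model(ROLE_PROMPTS["agent"], f"Summary: {summary}\nTask: {task}")
    # 3) Corrector repairs the proposed action/code before execution.
    return call_model(ROLE_PROMPTS["corrector"], action)

result = scaffolded_step(["opened file app.py", "ran tests: 2 failed"], "fix failing tests")
print(result)
```

Because each role sees only the inputs it needs (the corrector, for instance, gets the proposed action but not the full history), the effective context per call stays small, which is one plausible reason the gains survive 4-bit quantization.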
Frequently asked questions
- Does the method require any retraining or fine-tuning? No. McClendon et al. apply the three-role structure to a frozen Qwen3-8B model without any fine-tuning or additional training. The improvement comes entirely from inference-time prompt engineering and role assignment, making it immediately applicable to existing models.