AI · 8 min read · 17 April 2026

Small Models Match Large Ones via Inference Scaffolding

McClendon et al. show that role-based prompt structuring at inference time doubles small-model performance on complex tasks without retraining.

Source: arXiv/cs.AI · S. Aaron McClendon, Jorge Gallego-Feliciano, Stavros Zervoudakis, Antonios Saravanos · open original ↗

Structured inference-time prompting with role assignment doubles small-model task completion without training overhead.

  • Qwen3-8B with three-role scaffolding reaches 8.9% task completion, up from 5.4% baseline.
  • Three roles: summarizer (compress history), agent (reason), corrector (fix code without context).
  • No retraining required; same frozen weights deployed three times with different prompts.
  • 8B model with scaffolding outperforms unscaffolded 33B DeepSeek-Coder on AppWorld benchmark.
  • 4-bit quantized version improves from 3.0% to 5.9%, showing gains persist under compression.
  • Strongest gains on difficulty-1 tasks: 15.8% to 26.3% (FP16), 5.3% to 14.0% (4-bit).
  • Approach formalizes as test-time compute scaling and action-space shaping from RL theory.
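The three-role setup described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: the role prompts and the `call_model` function are assumptions standing in for whatever inference API serves the frozen model.

```python
# Hypothetical sketch of three-role inference scaffolding: the same frozen
# model is queried three times with different role prompts, no retraining.
ROLE_PROMPTS = {
    "summarizer": "Compress the interaction history into a brief summary.",
    "agent": "Given the summary and the task, reason step by step and propose code.",
    "corrector": "Fix errors in the proposed code without any prior context.",
}

def call_model(role_prompt: str, user_input: str) -> str:
    # Stub standing in for a real inference call to the frozen model
    # (e.g. a chat-completion request with role_prompt as system message).
    return f"[{role_prompt.split('.')[0]}] {user_input}"

def scaffolded_step(history: str, task: str) -> str:
    # 1. Summarizer compresses the history.
    summary = call_model(ROLE_PROMPTS["summarizer"], history)
    # 2. Agent reasons over the summary plus the task.
    draft = call_model(ROLE_PROMPTS["agent"], f"Summary: {summary}\nTask: {task}")
    # 3. Corrector sees only the draft, deliberately stripped of context.
    return call_model(ROLE_PROMPTS["corrector"], draft)

if __name__ == "__main__":
    print(scaffolded_step("user asked for a sorting script", "sort a list"))
```

Deploying one checkpoint under three prompts, rather than three specialized models, is what keeps the approach free of training overhead.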

Frequently asked questions

  • Does the approach require retraining or fine-tuning? No. McClendon et al. apply the three-role structure to a frozen Qwen3-8B model without any fine-tuning or additional training. The improvement comes entirely from inference-time prompt engineering and role assignment, making it immediately applicable to existing models.
