AI · 4 min read · 22 April 2026
Automated quantization shrinks spike-driven language models for edge devices
QSLM framework compresses neural network models by up to 86.5% while preserving accuracy, enabling deployment on resource-constrained embedded hardware.
Source: arxiv/cs.AI · Rachmad Vidya Wicaksana Putra, Pasindu Wickramasinghe, Muhammad Shafique · open original ↗
QSLM automates quantization of spike-driven language models, reducing memory footprint by 86.5% while maintaining task accuracy.
- Spike-driven language models reduce energy use but retain large memory footprints unsuitable for embedded devices.
- Manual quantization is labor-intensive and does not scale across different network architectures and constraints.
- QSLM uses a tiered quantization strategy operating at the global, block, and module levels to compress models.
- The framework analyzes each layer's sensitivity to quantization before selecting final compression settings.
- It achieves 86.5% memory reduction and 20% power savings while maintaining 84.4% accuracy on sentiment classification.
- A multi-objective trade-off function balances performance and memory constraints simultaneously.
- Tested on text generation and classification tasks with minimal accuracy loss versus the baseline.
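The analyze-sensitivity-then-select flow described above can be illustrated with a toy sketch. The helper names, error tolerance, and candidate bit widths below are hypothetical stand-ins for illustration, not the QSLM implementation:

```python
import numpy as np

def quantize(w, bits):
    """Uniform affine quantization of a weight array to the given bit width,
    returned in dequantized form so reconstruction error can be measured."""
    qmax = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((w - lo) / scale)
    return q * scale + lo

def pick_bits_per_layer(layers, candidate_bits=(8, 4, 2), max_err=1e-2):
    """Choose the smallest bit width per layer whose reconstruction error
    stays below a tolerance (a crude stand-in for task-accuracy impact)."""
    choice = {}
    for name, w in layers.items():
        for bits in sorted(candidate_bits):  # smallest bits = most aggressive
            err = np.mean((w - quantize(w, bits)) ** 2)
            if err <= max_err:
                choice[name] = bits
                break
        else:
            choice[name] = 32  # too sensitive: keep full precision
    return choice

rng = np.random.default_rng(0)
layers = {"embed": rng.normal(0, 1.0, (64, 32)),      # wide-range layer
          "block0.fc": rng.normal(0, 0.05, (32, 32))}  # narrow-range layer
print(pick_bits_per_layer(layers))
```

Layers with wider weight ranges incur larger quantization error, so a sensitivity-driven search naturally keeps more bits for them while compressing tolerant layers harder.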
Frequently asked questions
- How does QSLM compress spike-driven language models? Spike-driven language models use neuromorphic computing principles to reduce energy consumption. Quantization compresses these models by reducing the precision of weights and activations (e.g., from 32-bit floats to 8-bit integers). QSLM automates this process by testing different quantization levels across network layers to find the best balance between model size and accuracy.
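The float-to-integer example in the answer above can be sketched as a minimal uniform quantization round trip; the array shape and values here are illustrative, not from the paper:

```python
import numpy as np

# Hypothetical weight matrix: quantize float32 to uint8, as in the
# "32-bit floats to 8-bit integers" illustration.
w = np.random.default_rng(1).normal(0, 0.1, (128, 128)).astype(np.float32)

lo, hi = float(w.min()), float(w.max())
scale = (hi - lo) / 255.0                          # 256 uint8 levels
q = np.round((w - lo) / scale).astype(np.uint8)    # 8-bit storage
w_hat = q.astype(np.float32) * scale + lo          # dequantized view

print("memory ratio:", q.nbytes / w.nbytes)        # 0.25 (8 vs 32 bits)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Storing 8-bit codes plus a per-tensor scale and offset cuts weight memory to a quarter, while the reconstruction error stays bounded by half a quantization step.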