- Artificial Intelligence · arxiv/cs.AI · 4 min
Automated quantization shrinks spike-driven language models for edge devices
QSLM framework compresses neural network models by up to 86.5% while preserving accuracy, enabling deployment on resource-constrained embedded hardware.
April 22, 2026
- Artificial Intelligence · arxiv/cs.AI · 6 min
OjaKV: Online Low-Rank Compression for LLM Key-Value Caches
A hybrid storage and adaptive subspace method reduces KV cache memory by compressing intermediate tokens while preserving critical anchor tokens, and remains compatible with FlashAttention.
April 20, 2026
- Artificial Intelligence · arxiv/cs.LG · 8 min
INT4 Quantization Fails After FP32 Convergence in Predictable Phases
Post-training quantization assumes converged models are ready to compress, but INT4 quantization collapses in a three-phase pattern tied to weight updates, not learning rate decay.
April 17, 2026
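The INT4 collapse described in the last entry concerns the precision lost when weights are rounded to just 16 levels. A minimal NumPy sketch of symmetric per-tensor INT4 round-trip quantization, purely illustrative and not the paper's method (function names and the per-tensor scaling scheme are assumptions):

```python
import numpy as np

def int4_quantize(w):
    # Symmetric per-tensor INT4: map floats onto 16 integer levels in [-8, 7].
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def int4_dequantize(q, scale):
    # Recover an FP32 approximation of the original weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float32)

q, scale = int4_quantize(w)
w_hat = int4_dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
assert max_err <= scale / 2 + 1e-8
```

Even this toy round-trip shows why post-training INT4 is fragile: the per-weight error bound is `scale / 2`, so a single outlier weight inflates `scale` and with it the error on every other weight in the tensor.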