- AI · arxiv/cs.AI · 4 min
Automated quantization shrinks spike-driven language models for edge devices
The QSLM framework compresses spike-driven language models by up to 86.5% while preserving accuracy, enabling deployment on resource-constrained embedded hardware.
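Not from the paper, but as a rough illustration of why quantization shrinks models at all: storing weights as int8 instead of float32 alone cuts memory 4x, before any further compression. A minimal affine-quantization sketch (the function names and the scale/zero-point scheme here are generic illustrations, not QSLM's method):

```python
import numpy as np

def quantize_int8(w):
    # Affine (asymmetric) quantization: map the float range [min, max]
    # onto the int8 range [-128, 127] via a scale and zero point.
    scale = max((w.max() - w.min()) / 255.0, 1e-8)
    zero_point = np.round(-w.min() / scale) - 128
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float32 values from the int8 codes.
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s, z = quantize_int8(w)
# int8 storage is exactly 4x smaller than float32
assert q.nbytes * 4 == w.nbytes
# worst-case reconstruction error is on the order of one scale step
err = np.abs(dequantize(q, s, z) - w).max()
```

Real schemes (per-channel scales, 4-bit packing, pruning) push well past 4x, which is how figures like 86.5% become plausible.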
April 22, 2026
- AI · arxiv/cs.AI · 6 min
OjaKV: Online Low-Rank Compression for LLM Key-Value Caches
A hybrid storage and adaptive-subspace method reduces KV cache memory by compressing intermediate tokens while preserving critical anchor tokens, and remains compatible with FlashAttention.
April 20, 2026
- AI · arxiv/cs.LG · 8 min
INT4 Quantization Fails After FP32 Convergence in Predictable Phases
Post-training quantization assumes converged models are ready to compress, but INT4 quantization collapses in a three-phase pattern tied to weight updates, not learning rate decay.
April 17, 2026