AI · 4 min read · May 3, 2026
Selective-Update RNNs Match Transformers While Using Less Memory
A new RNN architecture learns when to update internal state, preserving memory across long sequences and reducing computational waste on redundant input.
Source: arXiv/cs.LG · Bojian Yin, Shurong Wang, Haoyu Tan, Sander Bohte, Federico Corradi, Guoqi Li · open original ↗
Selective-Update RNNs preserve memory during low-information periods, matching Transformer accuracy with lower computational cost.
- Standard RNNs update their state at every step, causing memory decay and wasting computation on static input.
- suRNNs use neuron-level binary switches that activate only for informative events, decoupling state updates from sequence length (see the sketch after this list).
- Memory preserved during silence or noise creates direct gradient paths to distant past events.
- Experiments show suRNNs match or exceed Transformer performance on the Long Range Arena and WikiText benchmarks.
- Each neuron learns its own update timescale, aligning model behavior with the actual information density of the data.
- The approach keeps memory exactly unchanged during low-information intervals, reducing overwriting and signal loss.
- Long-term storage is more efficient than in Transformers while accuracy on long-range dependencies stays competitive.
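A minimal sketch of the core idea in PyTorch: each neuron carries a learned binary switch; when the switch is closed the neuron's state is copied forward exactly, and when it is open the state is overwritten by a new candidate. The class name, the hard threshold with a straight-through gradient, and all sizes below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SelectiveUpdateCell(nn.Module):
    """Toy RNN cell with per-neuron binary update switches (illustrative sketch)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.switch = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        z = torch.cat([x, h], dim=-1)
        cand = torch.tanh(self.candidate(z))   # proposed new state
        p = torch.sigmoid(self.switch(z))      # per-neuron update probability
        # Hard 0/1 switch; a straight-through estimator keeps it trainable
        # (one possible surrogate, not necessarily the paper's choice).
        u = (p > 0.5).float() + p - p.detach()
        # u = 0 -> keep the previous state exactly; u = 1 -> overwrite with the candidate.
        return u * cand + (1.0 - u) * h

# Toy usage: run a 10-step sequence through the cell.
cell = SelectiveUpdateCell(input_size=16, hidden_size=32)
h = torch.zeros(1, 32)
for x in torch.randn(10, 1, 16):
    h = cell(x, h)
```

Because the switch is per neuron, different neurons can settle on different effective timescales: some update nearly every step, others only on rare informative events.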
Frequently asked
How do suRNNs avoid the memory decay that standard RNNs suffer from?
- suRNNs use neuron-level binary switches that activate only when the input contains new information. During low-information periods (silence, noise, or static input), the switches remain closed and the internal memory stays unchanged. This prevents the model from overwriting past information and creates a direct path for gradients to flow backward across time, solving the memory decay problem that standard RNNs face.
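To see why closed switches give that direct gradient path: with u = 0 the state is copied forward unchanged, so the step-to-step Jacobian is the identity and nothing shrinks the gradient. A tiny check, reusing the illustrative gating rule assumed in the sketch above:

```python
import torch

h_prev = torch.ones(4, requires_grad=True)
u = torch.zeros(4)      # closed switches during a low-information step
cand = torch.randn(4)   # candidate state (irrelevant when u = 0)

h_t = u * cand + (1.0 - u) * h_prev   # h_t == h_prev exactly
h_t.sum().backward()
print(h_prev.grad)      # tensor([1., 1., 1., 1.]) -- gradient passes through undiminished
```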