AI · 4 min read · May 3, 2026
Selective-Update RNNs Match Transformers While Using Less Memory
A new RNN architecture learns when to update internal state, preserving memory across long sequences and reducing computational waste on redundant input.
Source: arXiv/cs.LG · Bojian Yin, Shurong Wang, Haoyu Tan, Sander Bohte, Federico Corradi, Guoqi Li · open original ↗
Selective-Update RNNs preserve memory during low-information periods, matching Transformer accuracy with lower computational cost.
- Standard RNNs update their state at every step, causing memory decay and wasting computation on static input.
- suRNNs use neuron-level binary switches that activate only for informative events, decoupling state updates from sequence length (see the sketch after this list).
- Memory preserved during silence or noise creates direct gradient paths to distant past events.
- Experiments show suRNNs match or exceed Transformer performance on the Long Range Arena and WikiText benchmarks.
- Each neuron learns its own update timescale, aligning model behavior with the actual information density of the data.
- The approach keeps memory exactly unchanged during low-information intervals, reducing overwriting and signal loss.
- Long-term storage is more efficient than in Transformers while accuracy on long-range dependencies stays competitive.
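A minimal sketch of the core idea in PyTorch: each neuron carries a learned binary switch; when the switch is closed the neuron's state is copied forward exactly, and when it is open the state is overwritten by a new candidate. The class name, the hard threshold with a straight-through gradient, and all sizes below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SelectiveUpdateCell(nn.Module):
    """Toy RNN cell with per-neuron binary update switches (illustrative sketch)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.switch = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        z = torch.cat([x, h], dim=-1)
        cand = torch.tanh(self.candidate(z))   # proposed new state
        p = torch.sigmoid(self.switch(z))      # per-neuron update probability
        # Hard 0/1 switch; a straight-through estimator keeps it trainable
        # (one possible surrogate, not necessarily the paper's choice).
        u = (p > 0.5).float() + p - p.detach()
        # u = 0 -> keep the previous state exactly; u = 1 -> overwrite with the candidate.
        return u * cand + (1.0 - u) * h

# Toy usage: run a 10-step sequence through the cell.
cell = SelectiveUpdateCell(input_size=16, hidden_size=32)
h = torch.zeros(1, 32)
for x in torch.randn(10, 1, 16):
    h = cell(x, h)
```

Because the switch is per neuron, different neurons can settle on different effective timescales: some update nearly every step, others only on rare informative events.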
Frequently asked
How do suRNNs avoid the memory decay that standard RNNs suffer from?
- suRNNs use neuron-level binary switches that activate only when the input contains new information. During low-information periods (silence, noise, or static input), the switches remain closed and the internal memory stays unchanged. This prevents the model from overwriting past information and creates a direct path for gradients to flow backward across time, solving the memory decay problem that standard RNNs face.
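To see why closed switches give that direct gradient path: with u = 0 the state is copied forward unchanged, so the step-to-step Jacobian is the identity and nothing shrinks the gradient. A tiny check, reusing the illustrative gating rule assumed in the sketch above:

```python
import torch

h_prev = torch.ones(4, requires_grad=True)
u = torch.zeros(4)      # closed switches during a low-information step
cand = torch.randn(4)   # candidate state (irrelevant when u = 0)

h_t = u * cand + (1.0 - u) * h_prev   # h_t == h_prev exactly
h_t.sum().backward()
print(h_prev.grad)      # tensor([1., 1., 1., 1.]) -- gradient passes through undiminished
```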