AI · 8 min read · 24 April 2026

GEM activation functions match ReLU speed with smoother gradients

Krause proposes rational activation functions with tunable smoothness that reduce optimization friction in deep networks while maintaining computational efficiency.

Source: arxiv/cs.AI · Eylon E. Krause · open original ↗

Krause introduces GEM, a family of smooth rational activation functions that approximate ReLU performance with better gradient flow for deep architectures.

  • GEM uses log-logistic CDF gating to achieve C^{2N}-smoothness without sacrificing ReLU-like behavior.
  • Three variants: base GEM, E-GEM (epsilon-parameterized for arbitrary L^p approximation), SE-GEM (piecewise with smooth junctions).
  • N=1 smoothness optimal for standard CNNs; N=2 preferred for transformers, revealing architecture-dependent tradeoffs.
  • On CIFAR-100 with ResNet-56, E-GEM narrows the accuracy deficit relative to GELU from 6.10% to 0.62%.
  • SE-GEM surpasses GELU on CIFAR-10 (92.51% vs 92.44%), first GEM variant to outperform GELU baseline.
  • GPT-2 (124M) achieves lowest perplexity with GEM (72.57 vs 73.76 GELU); BERT-small validation loss improves to 6.656.
  • Epsilon parameter reveals scale-dependent optimum: small epsilon for deep CNNs, large epsilon for shallow transformers.
  • Purely rational arithmetic enables efficient hardware implementation without transcendental operations.
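The paper's exact formula is not reproduced above, but the ingredients it names (a log-logistic CDF gate, C^{2N} smoothness, purely rational arithmetic) suggest a shape like the following. This is a hypothetical reconstruction, not the published definition: gate the identity with the log-logistic CDF F(t) = t^{2N} / (1 + t^{2N}) on the positive half-line and output zero on the negative half-line. Near the origin the positive branch behaves like x^{2N+1}, so its first 2N derivatives vanish and match the zero branch, giving C^{2N} smoothness, and only powers, additions, and divisions are needed.

```python
import numpy as np

def gem_like(x, N=1):
    """Illustrative GEM-style activation (hypothetical reconstruction,
    not the paper's exact formula): x * F(x) with log-logistic CDF gate
    F(t) = t^{2N} / (1 + t^{2N}) for x > 0, and 0 for x <= 0.
    Uses only rational arithmetic -- no exp, erf, or tanh."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)          # negative branch: exactly zero, like ReLU
    pos = x > 0
    xp = x[pos]
    # positive branch: x^{2N+1} / (1 + x^{2N}) ~ x for large x, ~ x^{2N+1} near 0
    out[pos] = xp ** (2 * N + 1) / (1.0 + xp ** (2 * N))
    return out

x = np.array([-2.0, 0.0, 0.5, 2.0, 10.0])
print(gem_like(x, N=1))   # 0 for x <= 0, approaches x as x grows
print(np.maximum(x, 0))   # ReLU for comparison
```

For N=1 this is a cubic-over-quadratic rational on the positive side, which is why it can be evaluated on hardware without transcendental operations.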

Frequently asked questions

  • GEM (Geometric Monomial) is a family of smooth activation functions using log-logistic gating that approximates ReLU behavior while maintaining continuous derivatives up to order 2N. Unlike ReLU's sharp kink at zero, GEM's smoothness reduces gradient discontinuities, improving optimization in deep networks. It uses only rational arithmetic, avoiding expensive transcendental operations.
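The "gradient discontinuity" point can be made concrete numerically. The sketch below compares ReLU's derivative, which jumps from 0 to 1 at the origin, with the derivative of the hypothetical GEM-style branch x^{2N+1} / (1 + x^{2N}) used above (again an assumed reconstruction, not the paper's formula), which passes through zero continuously:

```python
import numpy as np

def relu_grad(x):
    # ReLU's derivative: 0 for x <= 0, 1 for x > 0 -- a jump at the origin.
    return (np.asarray(x, dtype=float) > 0).astype(float)

def gem_like_grad(x, N=1):
    """Derivative of the hypothetical GEM-style branch x^{p+1}/(1 + x^p),
    p = 2N, for x > 0; zero otherwise. By the quotient rule this is
    ((p+1) x^p + x^{2p}) / (1 + x^p)^2 -- still purely rational."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    pos = x > 0
    xp = x[pos]
    p = 2 * N
    g[pos] = ((p + 1) * xp**p + xp**(2 * p)) / (1.0 + xp**p) ** 2
    return g

eps = 1e-6
print(relu_grad(np.array([-eps, eps])))      # [0., 1.] -> discontinuous
print(gem_like_grad(np.array([-eps, eps])))  # both ~0  -> continuous at 0
```

Just on either side of zero, ReLU's gradient flips between 0 and 1, while the smooth variant's gradient stays near zero on both sides, which is the friction reduction the summary describes.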
