Artificial Intelligence · 8 min read · 17 April 2026

Action Aliasing Breaks Safe RL Differently Depending on Filter Placement

A formal comparison of two projection-based safety strategies reveals that embedding safeguards in the policy creates gradient rank deficiency, while environment-level filters distribute the problem to the critic.

Source: arxiv/cs.LG · Hannah Markgraf, Shambhuraj Sawant, Hanna Krasowski, Lukas Schäfer, Sebastien Gros, Matthias Althoff

Projection-based safety filters degrade policy learning differently depending on whether they are placed in the environment or embedded in the policy, a difference driven by action aliasing.

  • Two integration strategies exist: the safeguard as an environment wrapper (SE-RL) or as a differentiable layer in the policy (SP-RL).
  • Action aliasing occurs when multiple unsafe actions map to one safe action, causing information loss in gradient signals.
  • SE-RL distributes aliasing effects implicitly through the critic; SP-RL manifests them as rank-deficient Jacobians during backpropagation (a sketch follows this list).
  • Without mitigation, SP-RL suffers more from aliasing than SE-RL, but penalty-based improvements can equalize or reverse the gap.
  • Choice between approaches depends on task structure and whether gradient flow through the safeguard matters.
  • Empirical validation confirms theoretical predictions across multiple environments.
  • Mitigation strategies borrowed from SE-RL practices improve SP-RL performance substantially.
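
To make the two placements concrete, here is a minimal PyTorch sketch assuming a simple box-constrained clamp projection; the function names and the toy reward are illustrative, not the paper's implementation. It shows the rank-deficient Jacobian that SP-RL backpropagates through when a constraint is active, and how SE-RL instead cuts the projection out of the policy's computation graph.

```python
import torch

# Illustrative box safeguard: project raw actions into [-1, 1]^2.
def project(action):
    return torch.clamp(action, -1.0, 1.0)

raw = torch.tensor([2.0, 0.5])  # first component violates the box

# SP-RL: the projection is a differentiable layer of the policy.
# Its Jacobian here is diagonal, with a zero wherever a constraint
# is active -- i.e. rank-deficient.
jac = torch.autograd.functional.jacobian(project, raw)
print(jac)
# tensor([[0., 0.],
#         [0., 1.]])
print(torch.linalg.matrix_rank(jac))  # 1 < 2: no gradient for dim 0

# SE-RL: the same projection applied inside the environment step,
# outside the policy's computation graph. The policy gradient never
# sees the projection; the critic must absorb its effect implicitly.
def env_step(action):
    executed = project(action.detach())  # gradients cut here
    reward = -executed.pow(2).sum()      # toy quadratic cost
    return executed, reward
```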

Frequently asked questions

  • What is action aliasing, and why does it degrade learning? Action aliasing occurs when a projection-based safety filter maps multiple different unsafe actions to the same safe action. This causes information loss because the policy gradient cannot distinguish between the original unsafe actions, making it harder for the policy to learn which actions to avoid. The severity depends on the constraint-set geometry and the dimensionality of the action space (see the sketch below).
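
A small sketch of the many-to-one mapping behind aliasing, assuming the same illustrative box projection as above:

```python
import numpy as np

# Illustrative: three distinct unsafe actions alias to one safe action.
def project(a):
    return np.clip(a, -1.0, 1.0)

unsafe = np.array([[1.5, 2.0],
                   [3.0, 1.2],
                   [2.0, 5.0]])
print(project(unsafe))
# [[1. 1.]
#  [1. 1.]
#  [1. 1.]]
# All three proposals execute identically, so reward feedback cannot
# distinguish them; the filter has erased that part of the signal.
```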
