Add regression test for scale_by_rms zero-gradient stability by TanmayThakur2209 · Pull Request #1553 · google-deepmind/optax

TanmayThakur2209 · 2026-01-07T17:40:39Z

This PR adds a regression test to ensure scale_by_rms does not produce NaN or infinite updates when gradients are zero. This guards against potential future numerical stability regressions without changing optimizer behavior.

rdyro · 2026-01-07T22:33:03Z

This test only passes if eps is non-zero. It's an interesting question what we should return when eps == 0.

One possibility is to always do a safe division / safe sqrt, but that adds an extra max operation. Perhaps we should have an extra argument that enables a "safe" version of this transform when eps == 0. What do you think?
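The trade-off behind the "safe division" option can be sketched in a few lines of jax.numpy. This is a minimal illustration, not optax's implementation: it assumes the update has the form grad / (sqrt(nu) + eps), and the function name rms_scale and its safe / min_denom arguments are hypothetical, not optax parameters.

```python
import jax.numpy as jnp


def rms_scale(grad, nu, eps, safe=False, min_denom=1e-30):
  """Divides grad by sqrt(nu) + eps, optionally guarding the denominator.

  Hypothetical sketch: `safe` and `min_denom` are illustrative knobs, not
  optax arguments. Safe mode costs one extra elementwise max.
  """
  denom = jnp.sqrt(nu) + eps
  if safe:
    denom = jnp.maximum(denom, min_denom)  # the additional max operation
  return grad / denom


grad = jnp.zeros(3)  # zero gradients
nu = jnp.zeros(3)  # the RMS accumulator stays at 0 when all gradients are 0

nan_update = rms_scale(grad, nu, eps=0.0)  # 0 / 0 -> NaN
safe_update = rms_scale(grad, nu, eps=0.0, safe=True)  # 0 / 1e-30 -> 0
eps_update = rms_scale(grad, nu, eps=1e-8)  # 0 / 1e-8 -> 0
```

With eps > 0 the guard is redundant, which is why it only matters for the eps == 0 configuration discussed here.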

TanmayThakur2209 · 2026-01-08T13:57:05Z

Thank you for highlighting this. The test relies on eps > 0 by default and would fail with eps == 0. I propose updating it to use an explicit non-zero epsilon, tx = transform.scale_by_rms(eps=1e-8), which guarantees no division by zero and matches the current optax semantics. The test then asserts numerical stability in the recommended configuration (eps > 0) and leaves the behavior for eps == 0 unspecified.

I also really like the idea of an optional safe version for users who want eps == 0 but still need numerical stability. It would clamp the RMS denominator to a small positive value so that division by zero never occurs, at the cost of slightly changing the math and adding extra operations. This could be exposed either as an explicit flag, e.g. scale_by_rms(eps=0.0, safe=True), or as a separate compositional transform, for example:

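```python
import optax

# Opt-in safety by composition: zero_nans() replaces NaN updates with 0.
tx = optax.chain(
    optax.scale_by_rms(eps=0.0),
    optax.zero_nans(),
)
```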

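Both options make the user's intent explicit, leave the default behaviour unchanged, and let users opt into safety. They differ in API philosophy and, slightly, in the math: clamping alters any denominator that falls below the threshold, whereas zero_nans only replaces entries that have already become NaN.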

For this PR, I’m happy to scope the change to the test update only and leave the safe variant as a potential future design discussion. I’d be interested in exploring such an option further if you think it would be useful.

TanmayThakur2209 added 4 commits January 7, 2026 22:52

Add regression test for scale_by_rms zero-gradient stability

ebecbb6

Fix trailing whitespace in rms regression test

60e1482

Fix formatting issues in RMS regression test

df9f634

Fix formatting per pre-commit hooks

1ca9a63

TanmayThakur2209 added 2 commits January 8, 2026 18:57

Make scale_by_rms zero-gradient test explicit about eps > 0

096beab

Fix typo in Lion modes test

060ab98

TanmayThakur2209 wants to merge 6 commits into google-deepmind:main from TanmayThakur2209:add-rms-zero-grad-regression-test
