Add regression test for scale_by_rms zero-gradient stability #1553
TanmayThakur2209 wants to merge 6 commits into google-deepmind:main from …
Conversation
This test only passes if the […]. One possibility is to always do a safe division / safe sqrt, but that adds an additional max operation. Perhaps we should instead have an extra argument that enables a "safe" version of this transform when setting […].
Thank you for highlighting this; this test assumes […]. Along with that, I really liked the idea of having an optional safe version for users who want […]. They both do the same math but differ in API philosophy: this way user intent is explicit, the default behaviour doesn't change, and users can opt in safely. For this PR, I'm happy to scope the change to the test update only and leave the safe variant as a potential future design discussion. I'd be interested in exploring such an option further if you think it would be useful.
This PR adds a regression test to ensure scale_by_rms does not produce NaN or infinite updates when gradients are zero. This guards against potential future numerical stability regressions without changing optimizer behavior.