Fix: Square atom_norm in non-Huber energy and virial loss calculations.#5332
anyangml wants to merge 2 commits into `deepmodeling:master`.
Conversation
📝 Walkthrough

The PR updates the normalization factor in energy and virial loss calculations across four framework implementations (dpmodel, pd, pt, tf). For non-Huber MSE paths, the atomic normalization factor changes from `atom_norm` to `atom_norm**2` (and from `atom_norm_ener` to `atom_norm_ener**2`).
Pull request overview
Fixes normalization in energy/virial MSE loss terms so that per-atom scaling is consistent with the Huber-loss code paths and the reported per-atom RMSE metrics.
Changes:
- Square `atom_norm`/`atom_norm_ener` in non-Huber (MSE) energy loss accumulation.
- Square `atom_norm` in non-Huber (MSE) virial loss accumulation.
- Apply the same normalization fix across the TF, PyTorch, Paddle, and `dpmodel` backends for the updated loss implementations.
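The arithmetic behind these bullets can be sketched in plain Python. This is an illustrative toy, not the actual deepmd code: `energy_loss_term` is an invented helper, with `atom_norm = 1/natoms` and `l2_ener_loss` a squared total-energy error, as in the real implementations.

```python
# Illustrative sketch of the normalization fix (not the actual deepmd code).
# atom_norm = 1 / natoms; l2_ener_loss is a squared total-energy error.

def energy_loss_term(l2_ener_loss, natoms, pref_e, squared=True):
    atom_norm = 1.0 / natoms
    if squared:  # after the fix: term equals pref_e * per-atom MSE
        return atom_norm**2 * (pref_e * l2_ener_loss)
    return atom_norm * (pref_e * l2_ener_loss)  # before the fix

# A system of N atoms with a fixed per-atom energy error `de` has a
# total-energy error of N * de, so l2_ener_loss = (N * de) ** 2.
de, pref_e = 0.1, 1.0
for natoms in (2, 8):
    l2 = (natoms * de) ** 2
    fixed = energy_loss_term(l2, natoms, pref_e, squared=True)
    old = energy_loss_term(l2, natoms, pref_e, squared=False)
    # `fixed` stays at de**2 for any natoms; `old` grows linearly with natoms.
    print(natoms, fixed, old)
```

With the single factor, the same per-atom accuracy is penalized more heavily in larger systems, which is exactly the inconsistency the PR removes.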
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| deepmd/tf/loss/ener.py | Squares atom normalization in non-Huber MSE energy/virial loss accumulation (EnerStdLoss path). |
| deepmd/pt/loss/ener.py | Squares `atom_norm` in non-Huber MSE energy and virial loss accumulation. |
| deepmd/pd/loss/ener.py | Squares `atom_norm` in non-Huber MSE energy and virial loss accumulation. |
| deepmd/dpmodel/loss/ener.py | Squares atom normalization in non-Huber MSE energy and virial loss accumulation in the array-API backend. |
```diff
 if not self.use_huber:
-    loss += atom_norm * (pref_e * l2_ener_loss)
+    loss += atom_norm**2 * (pref_e * l2_ener_loss)
 else:
```
There are tests comparing losses across backends, but they don’t assert the intended natoms-scaling explicitly. Since this change fixes a subtle normalization bug (MSE energy/virial terms should scale with 1/natoms^2), consider adding a small regression test that checks the loss contribution scales correctly when natoms changes, so future refactors can’t accidentally revert to 1/natoms.
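One possible shape for such a regression test, sketched against a stand-alone reimplementation of the MSE energy term rather than the real loss classes (the helper `mse_energy_term` and the test body are hypothetical):

```python
# Hypothetical regression test for the natoms scaling of the MSE energy term;
# `mse_energy_term` stands in for the real per-backend loss accumulation.

def mse_energy_term(energy_pred, energy_ref, natoms, pref_e=1.0):
    l2 = (energy_pred - energy_ref) ** 2
    atom_norm = 1.0 / natoms
    return atom_norm**2 * (pref_e * l2)

def test_loss_scales_as_per_atom_mse():
    de = 0.05  # fixed per-atom energy error
    # With a constant per-atom error, the accumulated term must not depend
    # on system size; the pre-fix 1/natoms factor grew linearly with it.
    vals = [mse_energy_term(n * de, 0.0, n) for n in (1, 4, 16)]
    assert all(abs(v - de**2) < 1e-12 for v in vals)

test_loss_scales_as_per_atom_mse()
```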
```diff
@@ -370,7 +370,9 @@ def build(
             )
         if self.has_v:
             if not self.use_huber:
-                loss += global_cvt_2_ener_float(atom_norm * (pref_v * l2_virial_loss))
+                loss += global_cvt_2_ener_float(
+                    atom_norm**2 * (pref_v * l2_virial_loss)
+                )
```
This updates EnerStdLoss' non-Huber MSE scaling to use `atom_norm**2` / `atom_norm_ener**2`, but the same file still has other non-Huber MSE loss accumulations using a linear `atom_norm` factor (e.g., `EnerSpinLoss.build` around `l2_loss += atom_norm_ener * ...` and the virial term with `atom_norm * ...`, and `EnerDipoleLoss.build` around `l2_loss += atom_norm_ener * ...`). That leaves the normalization inconsistent depending on which loss class is used, and also conflicts with the per-atom RMSE reporting (`rmse_e` computed as `sqrt(l2_ener_loss)/natoms`). Please update those remaining energy/virial MSE paths to square the atom normalization as well, or clarify why those classes should behave differently.
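The connection to the reported metric can be made concrete with a toy identity check (the helpers below are invented for illustration, not the deepmd API): with the squared normalization, the accumulated energy term is exactly `pref_e * rmse_e**2`, where `rmse_e = sqrt(l2_ener_loss)/natoms` is the printed per-atom RMSE.

```python
import math

# Toy identity check: with the squared normalization, the accumulated
# energy term equals pref_e * rmse_e**2, so the quantity being minimized
# matches the per-atom metric shown during training.

def reported_rmse_e(l2_ener_loss, natoms):
    return math.sqrt(l2_ener_loss) / natoms  # rmse_e as printed in logs

def energy_term(l2_ener_loss, natoms, pref_e):
    return (1.0 / natoms) ** 2 * (pref_e * l2_ener_loss)

l2, natoms, pref_e = 0.64, 8, 2.0
lhs = energy_term(l2, natoms, pref_e)
rhs = pref_e * reported_rmse_e(l2, natoms) ** 2
assert abs(lhs - rhs) < 1e-12  # minimizing the term minimizes rmse_e**2
```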
Actionable comments posted: 1
🧹 Nitpick comments (1)

deepmd/dpmodel/loss/ener.py (1)

248: Add a regression test for the new normalization rule. Because the displayed RMSE stays unchanged, this `1/N` vs `1/N**2` mistake is easy to reintroduce. A tiny `EnergyLoss.call()` case with `natoms > 1` and fixed energy/virial deltas would lock it down. Also applies to: 315.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/dpmodel/loss/ener.py` at line 248, The energy normalization is currently prone to a 1/N vs 1/N^2 regression; add a unit/regression test targeting EnergyLoss.call() in deepmd/dpmodel/loss/ener.py that constructs a small batch with natoms > 1, fixed per-atom energy deltas and virials, and asserts the computed energy loss scales with 1/N (not 1/N^2) by comparing expected loss values for different natom counts; reference EnergyLoss.call() and the atom_norm_ener accumulation (the expression involving atom_norm_ener**2 * (pref_e * l2_ener_loss)) to ensure the test fails if the normalization is squared incorrectly and include an explicit check for both energy-only and energy+virial modes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a73a40cd-4fda-4353-8c33-d0685030359b
📒 Files selected for processing (4)
- deepmd/dpmodel/loss/ener.py
- deepmd/pd/loss/ener.py
- deepmd/pt/loss/ener.py
- deepmd/tf/loss/ener.py
```diff
 if self.has_e:
     if not self.use_huber:
-        loss += atom_norm_ener * (pref_e * l2_ener_loss)
+        loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)
```
Finish the TensorFlow normalization fix in the remaining loss variants.
EnerStdLoss is corrected here, but EnerSpinLoss.build() still uses the old single-factor normalization at Line 757 and Line 770, and EnerDipoleLoss.build() still does the same at Line 1039. Those paths will keep the old 1/N scaling after this PR.
Minimal follow-up diff:

```diff
-        if self.has_e:
-            l2_loss += atom_norm_ener * (pref_e * l2_ener_loss)
+        if self.has_e:
+            l2_loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)
@@
-        if self.has_v:
-            l2_loss += global_cvt_2_ener_float(atom_norm * (pref_v * l2_virial_loss))
+        if self.has_v:
+            l2_loss += global_cvt_2_ener_float(atom_norm**2 * (pref_v * l2_virial_loss))
@@
-            l2_loss += atom_norm_ener * (pref_e * l2_ener_loss)
+            l2_loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)
```

Also applies to: 373-375.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/tf/loss/ener.py` at line 349, EnerSpinLoss.build and
EnerDipoleLoss.build still use the old single-factor 1/N normalization; update
both to use the same squared per-atom normalization pattern used in EnerStdLoss
(i.e., multiply the loss term by atom_norm_ener**2 * (pref_e * l2_ener_loss)
instead of a single atom_norm_ener factor) so all energy loss variants apply the
corrected normalization consistently; locate the energy-loss accumulation in
EnerSpinLoss.build and EnerDipoleLoss.build and replace the old single-factor
scaling with the squared atom_norm_ener**2 * (pref_e * l2_ener_loss) pattern
used in EnerStdLoss.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master    #5332   +/-   ##
=======================================
  Coverage   82.37%   82.37%
=======================================
  Files         780      780
  Lines       78242    78242
  Branches     3675     3675
=======================================
  Hits        64454    64454
  Misses      12616    12616
  Partials     1172     1172
```

☔ View full report in Codecov by Sentry.
This is to normalize all L2 losses so that, in multitask training, the scale of the gradients becomes independent of system size. Do we want to update `pref` in the examples? @iProzd @njzjz
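That motivation can be checked numerically with a toy finite-difference sketch (illustrative only, not the training loop): after the fix, the gradient of the energy term with respect to a fixed per-atom error no longer grows with system size.

```python
# Toy check of the multitask motivation: gradient of the energy term with
# respect to a fixed per-atom error `de`, via a central finite difference.
# With the squared norm, loss = (1/N)**2 * (N*de)**2 = de**2, so the
# gradient is 2*de for any N; the old 1/N factor gave 2*N*de instead.

def grad_wrt_per_atom_error(de, natoms, squared=True, eps=1e-6):
    def loss(x):
        l2 = (natoms * x) ** 2  # squared total-energy error
        norm = (1.0 / natoms) ** 2 if squared else 1.0 / natoms
        return norm * l2
    return (loss(de + eps) - loss(de - eps)) / (2 * eps)

g_small = grad_wrt_per_atom_error(0.1, natoms=2)
g_large = grad_wrt_per_atom_error(0.1, natoms=64)
# Both gradients are ~0.2 after the fix, independent of system size.
```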
Summary by CodeRabbit

Bug Fixes
- Corrected the per-atom normalization of non-Huber energy and virial loss terms so they scale consistently with system size across the dpmodel, pd, pt, and tf backends.