
Fix: Square atom_norm in non-Huber energy and virial loss calculations.#5332

Draft
anyangml wants to merge 2 commits into deepmodeling:master from anyangml:fix/rmse-loss-normalization

Conversation

@anyangml (Collaborator) commented Mar 23, 2026

This normalizes all L2 losses so that, in multitask training, the scale of the gradients becomes independent of system size. Do we want to update pref in the examples? @iProzd @njzjz
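To make the scale issue concrete, here is a small standalone illustration (hypothetical numbers, not deepmd code): for a fixed per-atom energy error e, the squared total-energy error grows as (N * e)**2, so dividing by N still leaves the loss, and hence its gradients, growing linearly with system size, while dividing by N**2 makes it constant.

```python
# Hypothetical illustration of the normalization issue; not deepmd code.
# For a fixed per-atom error e, the squared total-energy error is (N * e)**2.
e = 0.01
for natoms in (10, 100):
    l2_total = (natoms * e) ** 2
    old = l2_total / natoms       # 1/N scaling: grows linearly with N
    new = l2_total / natoms**2    # 1/N**2 scaling: constant (e**2 = 1e-4)
    print(natoms, old, new)
```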

Summary by CodeRabbit

Bug Fixes

  • Adjusted loss normalization factors for energy and virial predictions across all model training backends to improve training convergence and model accuracy.

Copilot AI review requested due to automatic review settings March 23, 2026 04:27
@coderabbitai bot (Contributor) commented Mar 23, 2026

📝 Walkthrough

The PR updates the normalization factor in energy and virial loss calculations across four framework implementations (dpmodel, pd, pt, tf). For non-Huber MSE paths, the atomic normalization factor changes from atom_norm to atom_norm**2, applied to both energy and virial loss terms consistently across all implementations.

Changes

Cohort / File(s): Energy Loss Normalization Updates
deepmd/dpmodel/loss/ener.py, deepmd/pd/loss/ener.py, deepmd/pt/loss/ener.py, deepmd/tf/loss/ener.py
Summary: Updated the non-Huber MSE loss normalization factor from atom_norm * (pref_* * l2_*_loss) to atom_norm**2 * (pref_* * l2_*_loss) for both the energy and virial loss terms. Huber and MAE loss paths remain unchanged.
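As a sketch of the corrected non-Huber path (variable names mirror the PR, but this is a standalone approximation, not the actual deepmd implementation):

```python
import numpy as np

# Standalone sketch of the corrected non-Huber MSE energy term; the names
# (atom_norm, pref_e, l2_ener_loss) mirror the PR, but this is not the
# actual deepmd implementation.
def mse_energy_loss(energy_pred, energy_label, natoms, pref_e=1.0):
    atom_norm = 1.0 / natoms
    # mean squared error of the total (per-frame) energy
    l2_ener_loss = np.mean(np.square(energy_pred - energy_label))
    # squared factor: the term now scales like (error per atom)**2,
    # independent of system size
    return atom_norm**2 * (pref_e * l2_ener_loss)

# With a fixed per-atom error of 0.01, the loss is the same for any natoms:
for natoms in (8, 64):
    pred = np.array([natoms * 0.01])
    label = np.zeros(1)
    print(mse_energy_loss(pred, label, natoms))  # 1e-4 in both cases
```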

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • njzjz
  • wanghan-iapcm
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: Passed. The title clearly and concisely describes the main change: squaring atom_norm in non-Huber energy and virial loss calculations across multiple files.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.


Copilot AI (Contributor) left a comment

Pull request overview

Fixes normalization in energy/virial MSE loss terms so that per-atom scaling is consistent with the Huber-loss code paths and the reported per-atom RMSE metrics.

Changes:

  • Square atom_norm / atom_norm_ener in non-Huber (MSE) energy loss accumulation.
  • Square atom_norm in non-Huber (MSE) virial loss accumulation.
  • Apply the same normalization fix across TF, PyTorch, Paddle, and dpmodel backends for the updated loss implementations.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • deepmd/tf/loss/ener.py: Squares atom normalization in non-Huber MSE energy/virial loss accumulation (EnerStdLoss path).
  • deepmd/pt/loss/ener.py: Squares atom_norm in non-Huber MSE energy and virial loss accumulation.
  • deepmd/pd/loss/ener.py: Squares atom_norm in non-Huber MSE energy and virial loss accumulation.
  • deepmd/dpmodel/loss/ener.py: Squares atom normalization in non-Huber MSE energy and virial loss accumulation in the array-API backend.


Comment on lines 241 to 243

    if not self.use_huber:
-       loss += atom_norm * (pref_e * l2_ener_loss)
+       loss += atom_norm**2 * (pref_e * l2_ener_loss)
    else:
Copilot AI commented Mar 23, 2026

There are tests comparing losses across backends, but they don’t assert the intended natoms-scaling explicitly. Since this change fixes a subtle normalization bug (MSE energy/virial terms should scale with 1/natoms^2), consider adding a small regression test that checks the loss contribution scales correctly when natoms changes, so future refactors can’t accidentally revert to 1/natoms.

Comment on lines 348 to +375

@@ -370,7 +370,9 @@ def build(
        )
        if self.has_v:
            if not self.use_huber:
-               loss += global_cvt_2_ener_float(atom_norm * (pref_v * l2_virial_loss))
+               loss += global_cvt_2_ener_float(
+                   atom_norm**2 * (pref_v * l2_virial_loss)
+               )
Copilot AI commented Mar 23, 2026

This updates EnerStdLoss' non-Huber MSE scaling to use atom_norm**2 / atom_norm_ener**2, but the same file still has other non-Huber MSE loss accumulations using a linear atom_norm factor (e.g., EnerSpinLoss.build around l2_loss += atom_norm_ener * ... and the virial term with atom_norm * ..., and EnerDipoleLoss.build around l2_loss += atom_norm_ener * ...). That leaves inconsistent normalization depending on which loss class is used, and also conflicts with the per-atom RMSE reporting (rmse_e computed as sqrt(l2_ener_loss)/natoms). Please update those remaining energy/virial MSE paths to square the atom normalization as well, or clarify why those classes should behave differently.
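The RMSE-consistency argument can be checked numerically. The sketch below is a hypothetical standalone check, not deepmd code, using the quoted metric definition rmse_e = sqrt(l2_ener_loss) / natoms: with the squared factor, the energy loss term equals pref_e times the reported per-atom RMSE squared.

```python
import math

# Hypothetical consistency check, not deepmd code: with atom_norm**2,
# the energy loss term equals pref_e * rmse_e**2, where rmse_e is the
# reported per-atom metric sqrt(l2_ener_loss) / natoms.
natoms = 32
l2_ener_loss = 2.56  # squared total-energy error for one frame
pref_e = 1.0

rmse_e = math.sqrt(l2_ener_loss) / natoms
loss_term = (1.0 / natoms) ** 2 * (pref_e * l2_ener_loss)

assert math.isclose(loss_term, pref_e * rmse_e**2)
print(loss_term)  # 0.0025
```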

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
deepmd/dpmodel/loss/ener.py (1)

248-248: Add a regression test for the new normalization rule.

Because the displayed RMSE stays unchanged, this 1/N vs 1/N^2 mistake is easy to reintroduce. A tiny EnergyLoss.call() case with natoms > 1 and fixed energy/virial deltas would lock it down.

Also applies to: 315-315
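A sketch of what such a regression test could look like (the helper mse_energy_term is hypothetical and stands in for the non-Huber energy path of EnergyLoss.call(), which an actual test would invoke instead):

```python
import numpy as np

# Hypothetical regression test; mse_energy_term stands in for the non-Huber
# energy path of EnergyLoss.call(), which a real test would call instead.
def mse_energy_term(natoms, per_atom_err, pref_e=1.0):
    atom_norm = 1.0 / natoms
    l2_ener_loss = (natoms * per_atom_err) ** 2  # squared total-energy error
    return atom_norm**2 * (pref_e * l2_ener_loss)

def test_loss_independent_of_natoms():
    # with the squared factor, a fixed per-atom error yields the same loss
    # for every system size; the old linear 1/N factor would fail this
    ref = mse_energy_term(natoms=4, per_atom_err=0.05)
    for n in (8, 16, 128):
        assert np.isclose(mse_energy_term(natoms=n, per_atom_err=0.05), ref)

test_loss_independent_of_natoms()
```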



📥 Commits

Reviewing files that changed from the base of the PR and between b97ad98 and 247f053.

📒 Files selected for processing (4)
  • deepmd/dpmodel/loss/ener.py
  • deepmd/pd/loss/ener.py
  • deepmd/pt/loss/ener.py
  • deepmd/tf/loss/ener.py

    if self.has_e:
        if not self.use_huber:
-           loss += atom_norm_ener * (pref_e * l2_ener_loss)
+           loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)

⚠️ Potential issue | 🟠 Major

Finish the TensorFlow normalization fix in the remaining loss variants.

EnerStdLoss is corrected here, but EnerSpinLoss.build() still uses the old single-factor normalization at Line 757 and Line 770, and EnerDipoleLoss.build() still does the same at Line 1039. Those paths will keep the old 1/N scaling after this PR.

Minimal follow-up diff
-        if self.has_e:
-            l2_loss += atom_norm_ener * (pref_e * l2_ener_loss)
+        if self.has_e:
+            l2_loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)
@@
-        if self.has_v:
-            l2_loss += global_cvt_2_ener_float(atom_norm * (pref_v * l2_virial_loss))
+        if self.has_v:
+            l2_loss += global_cvt_2_ener_float(atom_norm**2 * (pref_v * l2_virial_loss))
@@
-        l2_loss += atom_norm_ener * (pref_e * l2_ener_loss)
+        l2_loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)

Also applies to: 373-375


codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.37%. Comparing base (b97ad98) to head (247f053).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5332   +/-   ##
=======================================
  Coverage   82.37%   82.37%           
=======================================
  Files         780      780           
  Lines       78242    78242           
  Branches     3675     3675           
=======================================
  Hits        64454    64454           
  Misses      12616    12616           
  Partials     1172     1172           

☔ View full report in Codecov by Sentry.

@deepmodeling deleted a comment Mar 23, 2026
@anyangml requested a review from iProzd March 23, 2026 07:29
@anyangml marked this pull request as draft March 23, 2026 09:54
