
Fix: Square atom_norm in non-Huber energy and virial loss calculations.#5332

Draft
anyangml wants to merge 2 commits into deepmodeling:master from anyangml:fix/rmse-loss-normalization

Conversation

@anyangml (Collaborator) commented Mar 23, 2026

This normalizes all L2 losses so that, in multitask training, the scale of the gradients becomes independent of system size. Do we want to update pref in the examples? @iProzd @njzjz
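To make the scale issue concrete, here is a small standalone illustration (hypothetical numbers, not deepmd code): for a fixed per-atom energy error e, the squared total-energy error grows as (N * e)**2, so dividing by N still leaves the loss, and hence its gradients, growing linearly with system size, while dividing by N**2 makes it constant.

```python
# Hypothetical illustration of the normalization issue; not deepmd code.
# For a fixed per-atom error e, the squared total-energy error is (N * e)**2.
e = 0.01
for natoms in (10, 100):
    l2_total = (natoms * e) ** 2
    old = l2_total / natoms       # 1/N scaling: grows linearly with N
    new = l2_total / natoms**2    # 1/N**2 scaling: constant (e**2 = 1e-4)
    print(natoms, old, new)
```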

Summary by CodeRabbit

Bug Fixes

  • Adjusted loss normalization factors for energy and virial predictions across all model training backends to improve training convergence and model accuracy.

Copilot AI review requested due to automatic review settings March 23, 2026 04:27
@coderabbitai bot (Contributor) commented Mar 23, 2026

📝 Walkthrough

The PR updates the normalization factor in energy and virial loss calculations across four framework implementations (dpmodel, pd, pt, tf). For non-Huber MSE paths, the atomic normalization factor changes from atom_norm to atom_norm**2, applied to both energy and virial loss terms consistently across all implementations.

Changes

Cohort / File(s): Energy Loss Normalization Updates
deepmd/dpmodel/loss/ener.py, deepmd/pd/loss/ener.py, deepmd/pt/loss/ener.py, deepmd/tf/loss/ener.py
Summary: Updated the non-Huber MSE loss normalization factor from atom_norm * (pref_* * l2_*_loss) to atom_norm**2 * (pref_* * l2_*_loss) for both the energy and virial loss terms. Huber and MAE loss paths remain unchanged.
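As a sketch of the corrected non-Huber path (variable names mirror the PR, but this is a standalone approximation, not the actual deepmd implementation):

```python
import numpy as np

# Standalone sketch of the corrected non-Huber MSE energy term; the names
# (atom_norm, pref_e, l2_ener_loss) mirror the PR, but this is not the
# actual deepmd implementation.
def mse_energy_loss(energy_pred, energy_label, natoms, pref_e=1.0):
    atom_norm = 1.0 / natoms
    # mean squared error of the total (per-frame) energy
    l2_ener_loss = np.mean(np.square(energy_pred - energy_label))
    # squared factor: the term now scales like (error per atom)**2,
    # independent of system size
    return atom_norm**2 * (pref_e * l2_ener_loss)

# With a fixed per-atom error of 0.01, the loss is the same for any natoms:
for natoms in (8, 64):
    pred = np.array([natoms * 0.01])
    label = np.zeros(1)
    print(mse_energy_loss(pred, label, natoms))  # 1e-4 in both cases
```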

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • njzjz
  • wanghan-iapcm
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: Passed. The title clearly and concisely describes the main change: squaring atom_norm in non-Huber energy and virial loss calculations across multiple files.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.


Copilot AI (Contributor) left a comment

Pull request overview

Fixes normalization in energy/virial MSE loss terms so that per-atom scaling is consistent with the Huber-loss code paths and the reported per-atom RMSE metrics.

Changes:

  • Square atom_norm / atom_norm_ener in non-Huber (MSE) energy loss accumulation.
  • Square atom_norm in non-Huber (MSE) virial loss accumulation.
  • Apply the same normalization fix across TF, PyTorch, Paddle, and dpmodel backends for the updated loss implementations.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • deepmd/tf/loss/ener.py: Squares atom normalization in non-Huber MSE energy/virial loss accumulation (EnerStdLoss path).
  • deepmd/pt/loss/ener.py: Squares atom_norm in non-Huber MSE energy and virial loss accumulation.
  • deepmd/pd/loss/ener.py: Squares atom_norm in non-Huber MSE energy and virial loss accumulation.
  • deepmd/dpmodel/loss/ener.py: Squares atom normalization in non-Huber MSE energy and virial loss accumulation in the array-API backend.


Comment on lines 241 to 243

    if not self.use_huber:
-       loss += atom_norm * (pref_e * l2_ener_loss)
+       loss += atom_norm**2 * (pref_e * l2_ener_loss)
    else:
Copilot AI commented Mar 23, 2026

There are tests comparing losses across backends, but they don’t assert the intended natoms-scaling explicitly. Since this change fixes a subtle normalization bug (MSE energy/virial terms should scale with 1/natoms^2), consider adding a small regression test that checks the loss contribution scales correctly when natoms changes, so future refactors can’t accidentally revert to 1/natoms.

Comment on lines 348 to +375

@@ -370,7 +370,9 @@ def build(
        )
        if self.has_v:
            if not self.use_huber:
-               loss += global_cvt_2_ener_float(atom_norm * (pref_v * l2_virial_loss))
+               loss += global_cvt_2_ener_float(
+                   atom_norm**2 * (pref_v * l2_virial_loss)
+               )
Copilot AI commented Mar 23, 2026

This updates EnerStdLoss' non-Huber MSE scaling to use atom_norm**2 / atom_norm_ener**2, but the same file still has other non-Huber MSE loss accumulations using a linear atom_norm factor (e.g., EnerSpinLoss.build around l2_loss += atom_norm_ener * ... and the virial term with atom_norm * ..., and EnerDipoleLoss.build around l2_loss += atom_norm_ener * ...). That leaves inconsistent normalization depending on which loss class is used, and also conflicts with the per-atom RMSE reporting (rmse_e computed as sqrt(l2_ener_loss)/natoms). Please update those remaining energy/virial MSE paths to square the atom normalization as well, or clarify why those classes should behave differently.
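The RMSE-consistency argument can be checked numerically. The sketch below is a hypothetical standalone check, not deepmd code, using the quoted metric definition rmse_e = sqrt(l2_ener_loss) / natoms: with the squared factor, the energy loss term equals pref_e times the reported per-atom RMSE squared.

```python
import math

# Hypothetical consistency check, not deepmd code: with atom_norm**2,
# the energy loss term equals pref_e * rmse_e**2, where rmse_e is the
# reported per-atom metric sqrt(l2_ener_loss) / natoms.
natoms = 32
l2_ener_loss = 2.56  # squared total-energy error for one frame
pref_e = 1.0

rmse_e = math.sqrt(l2_ener_loss) / natoms
loss_term = (1.0 / natoms) ** 2 * (pref_e * l2_ener_loss)

assert math.isclose(loss_term, pref_e * rmse_e**2)
print(loss_term)  # 0.0025
```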

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
deepmd/dpmodel/loss/ener.py (1)

248-248: Add a regression test for the new normalization rule.

Because the displayed RMSE stays unchanged, this 1/N vs 1/N^2 mistake is easy to reintroduce. A tiny EnergyLoss.call() case with natoms > 1 and fixed energy/virial deltas would lock it down.

Also applies to: 315-315
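A sketch of what such a regression test could look like (the helper mse_energy_term is hypothetical and stands in for the non-Huber energy path of EnergyLoss.call(), which an actual test would invoke instead):

```python
import numpy as np

# Hypothetical regression test; mse_energy_term stands in for the non-Huber
# energy path of EnergyLoss.call(), which a real test would call instead.
def mse_energy_term(natoms, per_atom_err, pref_e=1.0):
    atom_norm = 1.0 / natoms
    l2_ener_loss = (natoms * per_atom_err) ** 2  # squared total-energy error
    return atom_norm**2 * (pref_e * l2_ener_loss)

def test_loss_independent_of_natoms():
    # with the squared factor, a fixed per-atom error yields the same loss
    # for every system size; the old linear 1/N factor would fail this
    ref = mse_energy_term(natoms=4, per_atom_err=0.05)
    for n in (8, 16, 128):
        assert np.isclose(mse_energy_term(natoms=n, per_atom_err=0.05), ref)

test_loss_independent_of_natoms()
```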



📥 Commits

Reviewing files that changed from the base of the PR and between b97ad98 and 247f053.

📒 Files selected for processing (4)
  • deepmd/dpmodel/loss/ener.py
  • deepmd/pd/loss/ener.py
  • deepmd/pt/loss/ener.py
  • deepmd/tf/loss/ener.py

    if self.has_e:
        if not self.use_huber:
-           loss += atom_norm_ener * (pref_e * l2_ener_loss)
+           loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)

⚠️ Potential issue | 🟠 Major

Finish the TensorFlow normalization fix in the remaining loss variants.

EnerStdLoss is corrected here, but EnerSpinLoss.build() still uses the old single-factor normalization at Line 757 and Line 770, and EnerDipoleLoss.build() still does the same at Line 1039. Those paths will keep the old 1/N scaling after this PR.

Minimal follow-up diff
-        if self.has_e:
-            l2_loss += atom_norm_ener * (pref_e * l2_ener_loss)
+        if self.has_e:
+            l2_loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)
@@
-        if self.has_v:
-            l2_loss += global_cvt_2_ener_float(atom_norm * (pref_v * l2_virial_loss))
+        if self.has_v:
+            l2_loss += global_cvt_2_ener_float(atom_norm**2 * (pref_v * l2_virial_loss))
@@
-        l2_loss += atom_norm_ener * (pref_e * l2_ener_loss)
+        l2_loss += atom_norm_ener**2 * (pref_e * l2_ener_loss)

Also applies to: 373-375


codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.37%. Comparing base (b97ad98) to head (247f053).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5332   +/-   ##
=======================================
  Coverage   82.37%   82.37%           
=======================================
  Files         780      780           
  Lines       78242    78242           
  Branches     3675     3675           
=======================================
  Hits        64454    64454           
  Misses      12616    12616           
  Partials     1172     1172           

☔ View full report in Codecov by Sentry.

@deepmodeling deleted a comment Mar 23, 2026
@anyangml requested a review from iProzd March 23, 2026 07:29
@anyangml marked this pull request as draft March 23, 2026 09:54
