D0708 dpa3 default fparam#5340

Open

liuyuxiang92 wants to merge 27 commits into deepmodeling:master from liuyuxiang92:D0708_dpa3_default_fparam

Conversation

liuyuxiang92 commented Mar 25, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for default frame parameters in fitting modules.
    • Introduced weighted force metrics for force accuracy evaluation.
    • Added new WSD learning rate scheduler for training.
    • Implemented edge-based readout fitting module for energy predictions.
    • Extended DPA3 with optional charge/spin and case embeddings.
    • Added frame-averaging for property loss computation.
  • Improvements

    • Enhanced descriptor interface with norm factor and additional output methods.
    • Improved statistics handling with file persistence for fitting parameters.
  • Bug Fixes

    • Fixed HDF5 file locking behavior.

iProzd and others added 27 commits July 8, 2025 16:15
* Add fparam/aparam stat

* Add fparam/aparam stat in share_fitting

* Add fparam default value if default_fparam is not None

* Add model_prob in share_fitting_params

* Add protection when share_params of fitting net
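Based on the feature list above, a fitting-net configuration using `default_fparam` might look like the following sketch (written as a Python dict; the exact schema keys are assumed from the PR description, not confirmed against the merged arg schema):

```python
# Hypothetical input-script fragment illustrating default_fparam:
# when fparam.npy is absent, the model falls back to the listed
# default values. Key names are assumptions based on the PR title.
fitting_net_config = {
    "type": "ener",
    "numb_fparam": 2,              # two frame parameters per frame
    "default_fparam": [0.0, 1.0],  # used when fparam.npy is missing
    "neuron": [240, 240, 240],
    "resnet_dt": True,
}

# Minimal consistency check: the default must match numb_fparam.
assert (
    len(fitting_net_config["default_fparam"])
    == fitting_net_config["numb_fparam"]
)
```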
## Summary by CodeRabbit

- New Features
- Added per-atom weighting for force evaluation: computes and reports
weighted MAE/RMSE alongside unweighted metrics, includes weighted
metrics in system-average summaries, logs weighted force metrics, and
safely handles zero-weight cases. Also propagates the per-atom weight
field into reporting.

- Tests
- Added end-to-end tests validating weighted vs unweighted force
MAE/RMSE and verifying evaluator outputs when using per-atom weight
masks.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 25, 2026 09:26
@dosubot dosubot bot added the enhancement label Mar 25, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR extends DeePMD-kit’s DPA3/PyTorch training & inference stack with support for default fparam (so fparam.npy can be omitted), adds force weighted error reporting using atom_pref, and introduces new learning-rate/descriptor options (e.g., wsd, gated MLP/layernorm knobs).

Changes:

  • Add default_fparam plumbing across fitting nets / atomic models / arg schema, and make fparam optional when a default is provided.
  • Add atom_pref to dp test data requirements and compute weighted force MAE/RMSE.
  • Add new LR schedule type (wsd) and new DPA3/repflow options (gated MLP, layernorm, extra outputs for fitting).
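The `wsd` schedule named above is not spelled out in this page beyond "piecewise linear decay with configurable start/stop rates". A common reading of WSD is warmup-stable-decay; the sketch below assumes that shape and is only illustrative — the real `LearningRateWSD` in the PR may differ in its breakpoints and interpolation:

```python
def wsd_lr(step, total_steps, start_lr, stop_lr,
           warmup_frac=0.1, decay_frac=0.1):
    """Sketch of a warmup-stable-decay (WSD) schedule: linear warmup
    to start_lr, a constant plateau, then linear decay to stop_lr.
    The shape and parameter names are assumptions, not the PR's API."""
    warmup = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1.0 - decay_frac))
    if step < warmup:
        return start_lr * step / max(warmup, 1)       # warmup ramp
    if step < decay_start:
        return start_lr                               # stable plateau
    frac = (step - decay_start) / max(total_steps - decay_start, 1)
    return start_lr + (stop_lr - start_lr) * frac     # linear decay

lrs = [wsd_lr(s, 100, 1e-3, 1e-5) for s in range(101)]
```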

Reviewed changes

Copilot reviewed 49 out of 49 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
source/tests/pt/test_dp_test.py Adds a PT test validating force-weighted metrics via atom_pref.
deepmd/utils/path.py Changes HDF5 open behavior by passing locking=False.
deepmd/utils/env_mat_stat.py Adds scalar multiplication to StatItem for weighted stat aggregation.
deepmd/utils/argcheck.py Adds new config args: default_fparam, learning rate wsd, DPA3 options, etc.
deepmd/pt/utils/stat.py Adjusts statistics input packing when fparam is “missing by design”.
deepmd/pt/utils/learning_rate.py Exposes LearningRateWSD in PT utils.
deepmd/pt/train/wrapper.py Extends multitask sharing to pass model probabilities and stat protection.
deepmd/pt/train/training.py Adds LR type switch, default-fparam requirements, and multitask prob logic changes.
deepmd/pt/model/task/property.py Threads default_fparam and bumps serialization version.
deepmd/pt/model/task/polarizability.py Threads default_fparam and bumps serialization version.
deepmd/pt/model/task/invar_fitting.py Extends forward signature to accept optional extra inputs.
deepmd/pt/model/task/fitting.py Adds default-fparam handling, stat caching to files, and prob-weighted sharing.
deepmd/pt/model/task/ener.py Adds ener_readout fitting type with edge readout path.
deepmd/pt/model/task/dos.py Threads default_fparam and bumps serialization version.
deepmd/pt/model/task/dipole.py Threads default_fparam and bumps serialization version.
deepmd/pt/model/network/mlp.py Introduces GatedMLP and a normalization helper.
deepmd/pt/model/model/make_model.py Exposes has_default_fparam() (and getter) on scripted model wrapper.
deepmd/pt/model/model/init.py Wires ener_readout, descriptor dim_case_embd, and normalization factors.
deepmd/pt/model/descriptor/repflows.py Adds extra outputs for fitting and norm factor accessors.
deepmd/pt/model/descriptor/repflow_layer.py Adds gated MLP + optional layernorm path for updates.
deepmd/pt/model/descriptor/dpa3.py Adds charge/spin/case embedding options and forwards fparam/case_embd.
deepmd/pt/model/descriptor/dpa1.py Adds get_norm_fact() / get_additional_output_for_fitting() stubs.
deepmd/pt/model/atomic_model/dp_atomic_model.py Uses default fparam when missing; forwards descriptor extra outputs to fitting.
deepmd/pt/model/atomic_model/base_atomic_model.py Adds base has_default_fparam() hook.
deepmd/pt/loss/property.py Adds softmax batch-averaging logic and enforces batch size 3.
deepmd/pt/loss/ener.py Adds force-weight knobs (use_default_pf, f_use_norm, use_l1_all changes).
deepmd/pt/infer/deep_eval.py Adds has_default_fparam() with backward compatibility handling.
deepmd/pd/model/task/invar_fitting.py Bumps deserialize compatibility version.
deepmd/pd/model/task/fitting.py Adds default_fparam to Paddle fitting serialization (not supported).
deepmd/pd/model/task/ener.py Bumps deserialize compatibility version.
deepmd/jax/fitting/fitting.py Allows setting default_fparam_tensor when converting to JAX arrays.
deepmd/infer/deep_property.py Adds frame softmax-averaging logic in inference path.
deepmd/infer/deep_eval.py Adds has_default_fparam() to inference API and forwards in wrapper.
deepmd/entrypoints/test.py Adds atom_pref support and weighted force MAE/RMSE reporting; makes fparam optional with default.
deepmd/dpmodel/utils/learning_rate.py Implements LearningRateWSD.
deepmd/dpmodel/model/make_model.py Adds has_default_fparam() forwarding on dpmodel wrapper.
deepmd/dpmodel/infer/deep_eval.py Adds has_default_fparam() forwarding for dpmodel inference.
deepmd/dpmodel/fitting/property_fitting.py Threads default_fparam, adds output_def, bumps version.
deepmd/dpmodel/fitting/polarizability_fitting.py Threads default_fparam, bumps version.
deepmd/dpmodel/fitting/make_base_fitting.py Adds need_additional_input() default implementation.
deepmd/dpmodel/fitting/invar_fitting.py Threads default_fparam, bumps version.
deepmd/dpmodel/fitting/general_fitting.py Adds default-fparam support + serialization of it; handles fparam None with default.
deepmd/dpmodel/fitting/ener_fitting.py Threads default_fparam, bumps version.
deepmd/dpmodel/fitting/dos_fitting.py Threads default_fparam, bumps version.
deepmd/dpmodel/fitting/dipole_fitting.py Threads default_fparam, bumps version.
deepmd/dpmodel/descriptor/make_base_descriptor.py Adds required descriptor APIs for norm factors / extra outputs.
deepmd/dpmodel/descriptor/dpa3.py Adds new repflow args to dpmodel DPA3 descriptor config.
deepmd/dpmodel/atomic_model/dp_atomic_model.py Adds has_default_fparam() forwarding.
deepmd/dpmodel/atomic_model/base_atomic_model.py Adds base has_default_fparam() hook.


Comment on lines +142 to +145
# --- softmax-weighted averaging over frames (minimal) ---
print(f"Nframes == {nframes}")
if nframes != 3:
raise RuntimeError(f"Expected nframes == 3, got {nframes}")
Copilot AI Mar 25, 2026

This debug print(...) and the hard requirement nframes == 3 introduce unexpected stdout noise and will fail for normal inference where nframes can be any value. If frame-aggregation is desired, it should be optional (e.g., a method parameter) and should not enforce a fixed frame count.
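The reviewer's suggestion — aggregation as an opt-in parameter rather than a hard `nframes == 3` check — could be sketched like this (function and parameter names are illustrative, not the PR's actual API):

```python
import numpy as np

def softmax_average_frames(prop, aggregate=False):
    """Opt-in frame aggregation: the normal inference path returns
    predictions untouched for any nframes; only when aggregate=True
    are frames collapsed to a softmax-weighted average.

    prop: (nframes, D) array of per-frame property predictions.
    """
    if not aggregate:
        return prop                                   # any nframes, untouched
    scores = prop.mean(axis=1)                        # (nframes,)
    w = np.exp(scores - scores.max())
    w /= w.sum()                                      # softmax weights
    avg = (w[:, None] * prop).sum(axis=0, keepdims=True)  # (1, D)
    return np.repeat(avg, prop.shape[0], axis=0)      # broadcast back

x = np.array([[1.0, 2.0], [3.0, 4.0]])
untouched = softmax_average_frames(x)                 # shape (2, 2)
averaged = softmax_average_frames(x, aggregate=True)  # rows identical
```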

Comment on lines +146 to +150
scores = property.mean(axis=1) # (3,)
# If you want to favor *smaller* values (e.g., energies), use: scores = -scores
w = np.exp(scores - scores.max()); w /= w.sum() # (3,)
avg = (w[:, None] * property).sum(axis=0, keepdims=True) # (1, D)
property[:] = np.repeat(avg, nframes, axis=0) # (3, D)
Copilot AI Mar 25, 2026

The aggregation block uses the local name property before it is assigned (it’s only defined at line 153). This will raise UnboundLocalError at runtime. If you intended to aggregate the reduced property output, compute property = results[...] first (and base scores/w/avg on that), or base the aggregation on atomic_property explicitly.

Comment on lines +350 to +354
def find_normalization(name: str, dim: int | None = None) -> nn.Module | None:
"""Return an normalization function using name."""
if name is None:
return None
return {
Copilot AI Mar 25, 2026

find_normalization uses the PEP 604 union syntax (int | None, nn.Module | None), which is invalid on Python 3.9. This repository supports Python >=3.9 (pyproject.toml), so this will cause a SyntaxError at import time. Use Optional[int] / Optional[nn.Module] (or Union[...]) instead.
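The Python 3.9-compatible rewrite the reviewer asks for is mechanical. A toy stand-in showing the typing pattern (the real function returns an `nn.Module`; a string is used here so the sketch stays self-contained):

```python
# `int | None` (PEP 604) is a SyntaxError before Python 3.10 unless
# `from __future__ import annotations` is in effect -- and even then
# only inside annotations, not at runtime. Optional[...] works on 3.9.
from typing import Optional

def find_normalization(name: Optional[str],
                       dim: Optional[int] = None) -> Optional[str]:
    """Toy stand-in: maps a normalization name to a class name.
    The real function in mlp.py returns an nn.Module instance."""
    if name is None:
        return None
    return {"layernorm": "LayerNorm", "batchnorm": "BatchNorm1d"}.get(name)
```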

Comment on lines +365 to +369
def serialize(self) -> dict:
raise NotImplementedError

@classmethod
def deserialize(cls, data: dict) -> "EnergyFittingNetReadout":
Copilot AI Mar 25, 2026

EnergyFittingNetReadout is registered as a fitting type (ener_readout), but serialize() / deserialize() currently raise NotImplementedError. This will break model serialization flows (e.g., DPAtomicModel.serialize() stores fitting_net.serialize()). Either implement serialization for this fitting, or don’t register/expose it until serialization is supported.

fparam
The frame-level parameters. shape: nf x nfparam
case_embd
The case (dataset) embedding for multitask training with shared fitting. shape: nf x dim_case_embd
Copilot AI Mar 25, 2026

The docstring says case_embd has shape nf x dim_case_embd, but the implementation later adds case_embd_out.unsqueeze(0).unsqueeze(0) to node_ebd_ext (nf x nall x tebd_dim). This only broadcasts correctly if case_embd is a single 1D embedding (dim_case_embd,), not per-frame. Please either (1) update the docstring/validation to match the supported shape, or (2) update the broadcasting logic to correctly handle per-frame embeddings.

Suggested change
The case (dataset) embedding for multitask training with shared fitting. shape: nf x dim_case_embd
The case (dataset) embedding for multitask training with shared fitting. shape: dim_case_embd

)
else:
l1_pref_force_loss = (torch.abs(diff_f) * atom_pref_reshape).mean()
more_loss["mae_f"] = self.display_if_exist(
Copilot AI Mar 25, 2026

In the use_l1_all + atom_pref branch, this assigns more_loss["mae_f"] to the preference-weighted force error, overwriting the unweighted mae_f reported earlier in the force-loss block. This makes logs/metrics ambiguous. Consider reporting this under a distinct key (e.g. mae_pf/mae_fw) and keep mae_f for the unweighted force MAE.

Suggested change
more_loss["mae_f"] = self.display_if_exist(
more_loss["mae_pf"] = self.display_if_exist(

Comment on lines 329 to +332
# this method has cache to avoid duplicated
# loading from different DPH5Path
# However the file will be never closed?
return h5py.File(path, mode)
return h5py.File(path, mode, locking=False)
Copilot AI Mar 25, 2026

Passing locking=False to h5py.File may not be supported on all h5py versions that this project allows (dependencies includes an unpinned h5py). This can raise TypeError: __init__() got an unexpected keyword argument 'locking' at runtime. Consider guarding with a try/except fallback to h5py.File(path, mode) or pinning a minimum h5py version that supports the argument.
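The try/except fallback the reviewer suggests can be sketched as below. The opener is passed in as a parameter so the sketch stays library-agnostic; with the real library the call would be `open_h5(h5py.File, path, mode)`. (The `locking` keyword exists in recent h5py releases; the exact minimum version is an assumption to verify against the h5py changelog.)

```python
def open_h5(open_fn, path, mode):
    """Try locking=False first; fall back to a plain open when the
    installed opener predates the `locking` keyword argument."""
    try:
        return open_fn(path, mode, locking=False)
    except TypeError:  # older h5py: unexpected keyword 'locking'
        return open_fn(path, mode)

# With h5py installed, usage would be: open_h5(h5py.File, path, "r")
```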

Comment on lines +95 to 115
# ---- Softmax-weighted averaging over the batch added by YL----
# model_pred[var_name]: (nbz, task_dim)
# 1) get a scalar score per sample (mean over task_dim)
# (If you want to favor smaller values, use `score_per_sample = -model_pred[var_name].mean(dim=1)`.)
score_per_sample = model_pred[var_name].mean(dim=1) # (nbz,)
weights = F.softmax(score_per_sample, dim=0) # (nbz,)
# 2) weighted average vector (1, task_dim)
avg_vec = (weights.unsqueeze(1) * model_pred[var_name]).sum(dim=0, keepdim=True)
# 3) replace all predictions with the averaged vector (broadcast over batch)
model_pred[var_name] = avg_vec.expand_as(model_pred[var_name])
# ----------------------------------------------------

nbz = model_pred[var_name].shape[0]
#=======Raise error when nbz!=3=======
if nbz != 3:
raise RuntimeError(
f"[PropertyLoss] Expected batch size nbz == 3 for softmax-avg, got nbz == {nbz}. "
"Ensure your DataLoader yields triples (batch_size=3, drop_last=True)."
)

nbz = model_pred[var_name].shape[0]
Copilot AI Mar 25, 2026

This new softmax-weighted averaging mutates model_pred[var_name] based on the batch itself, changing the training objective and making the loss depend on other samples in the batch. If this behavior is intended, it should be behind an explicit configuration flag and documented; otherwise it should be removed so the loss evaluates predictions per-sample without cross-sample coupling.

Suggested change
# ---- Softmax-weighted averaging over the batch added by YL----
# model_pred[var_name]: (nbz, task_dim)
# 1) get a scalar score per sample (mean over task_dim)
# (If you want to favor smaller values, use `score_per_sample = -model_pred[var_name].mean(dim=1)`.)
score_per_sample = model_pred[var_name].mean(dim=1) # (nbz,)
weights = F.softmax(score_per_sample, dim=0) # (nbz,)
# 2) weighted average vector (1, task_dim)
avg_vec = (weights.unsqueeze(1) * model_pred[var_name]).sum(dim=0, keepdim=True)
# 3) replace all predictions with the averaged vector (broadcast over batch)
model_pred[var_name] = avg_vec.expand_as(model_pred[var_name])
# ----------------------------------------------------
nbz = model_pred[var_name].shape[0]
#=======Raise error when nbz!=3=======
if nbz != 3:
raise RuntimeError(
f"[PropertyLoss] Expected batch size nbz == 3 for softmax-avg, got nbz == {nbz}. "
"Ensure your DataLoader yields triples (batch_size=3, drop_last=True)."
)
nbz = model_pred[var_name].shape[0]

Comment on lines +108 to 115
#=======Raise error when nbz!=3=======
if nbz != 3:
raise RuntimeError(
f"[PropertyLoss] Expected batch size nbz == 3 for softmax-avg, got nbz == {nbz}. "
"Ensure your DataLoader yields triples (batch_size=3, drop_last=True)."
)

nbz = model_pred[var_name].shape[0]
Copilot AI Mar 25, 2026

Hard-coding nbz == 3 and raising a RuntimeError will break standard training/inference setups where batch size is configurable (and may be 1 for validation). If a special "triplet" batching mode is needed, validate it at DataLoader construction time or make it optional; otherwise remove this constraint.

Suggested change
#=======Raise error when nbz!=3=======
if nbz != 3:
raise RuntimeError(
f"[PropertyLoss] Expected batch size nbz == 3 for softmax-avg, got nbz == {nbz}. "
"Ensure your DataLoader yields triples (batch_size=3, drop_last=True)."
)
nbz = model_pred[var_name].shape[0]
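Validating the triplet requirement at construction time, as the reviewer suggests, might look like this sketch (names are illustrative, not the PR's actual DataLoader wiring; `dataset_len` stands in for whatever the real loader receives):

```python
def make_triplet_loader(dataset_len, batch_size, drop_last):
    """Fail fast at loader construction instead of inside the loss:
    the softmax-avg property loss needs fixed triples per batch."""
    if batch_size != 3 or not drop_last:
        raise ValueError(
            "softmax-avg property loss requires batch_size=3 and "
            f"drop_last=True; got batch_size={batch_size}, "
            f"drop_last={drop_last}"
        )
    # dataset_len would feed the real DataLoader/Sampler here; the
    # returned dict only stands in for a constructed loader.
    return {"batch_size": batch_size, "drop_last": drop_last}

loader = make_triplet_loader(30, 3, True)
```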

Contributor

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

This PR adds default frame parameter (fparam) support across the model stack, enhances DPA3 descriptor with charge/spin and case embeddings, introduces a new piecewise learning rate scheduler, implements gated MLP layers, adds per-atom force weighting metrics, and updates fitting/descriptor APIs with new configuration options and serialization formats.

Changes

Cohort / File(s) Summary
Default Frame Parameter Support (Base Classes)
deepmd/dpmodel/atomic_model/base_atomic_model.py, deepmd/pt/model/atomic_model/base_atomic_model.py
Added has_default_fparam() method returning False to base atomic models, establishing API for downstream query of default fparam availability.
Default Frame Parameter Support (Implementations)
deepmd/dpmodel/atomic_model/dp_atomic_model.py, deepmd/pt/model/atomic_model/dp_atomic_model.py
Added has_default_fparam() and get_default_fparam() methods delegating to fitting nets; PT version also updated forward_atomic to auto-supply fparam when missing and handle case embeddings.
Default Frame Parameter Support (Inference/Evaluation)
deepmd/dpmodel/infer/deep_eval.py, deepmd/infer/deep_eval.py, deepmd/pt/infer/deep_eval.py
Added has_default_fparam() method to evaluation wrappers to query underlying model capability.
Default Frame Parameter Support (Model Wrapper)
deepmd/dpmodel/model/make_model.py, deepmd/pt/model/model/make_model.py
Added has_default_fparam() and get_default_fparam() methods to composite model classes.
Fitting Infrastructure (General)
deepmd/dpmodel/fitting/general_fitting.py, deepmd/pt/model/task/fitting.py, deepmd/pd/model/task/fitting.py
Added default_fparam parameter storage, has_default_fparam() accessor, and logic to use default fparam when input fparam is None; bumped serialization version from 3 to 4; extended share_params with model probability and protection parameters; added file-based stats persistence methods.
Fitting Infrastructure (Base/Invar)
deepmd/dpmodel/fitting/invar_fitting.py, deepmd/pt/model/task/invar_fitting.py, deepmd/pd/model/task/invar_fitting.py
Added default_fparam parameter to constructor; updated deserialization version compatibility from 3.1 to 4.1; added optional sw and edge_index parameters to forward.
Fitting Infrastructure (Base Definition)
deepmd/dpmodel/fitting/make_base_fitting.py
Added need_additional_input() method returning False to base fitting class interface.
Energy Fitting
deepmd/dpmodel/fitting/ener_fitting.py, deepmd/pt/model/task/ener.py, deepmd/pd/model/task/ener.py
Added default_fparam parameter; updated deserialization version from 3.1 to 4.1; PT version added new EnergyFittingNetReadout class with edge-based readout and norm_fact support.
DOS/Dipole/Polarizability/Property Fitting
deepmd/dpmodel/fitting/dos_fitting.py, deepmd/dpmodel/fitting/dipole_fitting.py, deepmd/dpmodel/fitting/polarizability_fitting.py, deepmd/pt/model/task/dos.py, deepmd/pt/model/task/dipole.py, deepmd/pt/model/task/polarizability.py, deepmd/pt/model/task/property.py
Added default_fparam parameter to constructors; updated deserialization version compatibility (DOS/Dipole 3.1→4.1, Polar/Property 4.1→5.1); property fitting now includes output_def() method.
Descriptor Base/Utilities
deepmd/dpmodel/descriptor/make_base_descriptor.py
Added abstract methods get_norm_fact() and get_additional_output_for_fitting() to base descriptor interface.
DPA3 Descriptor Configuration
deepmd/dpmodel/descriptor/dpa3.py
Updated RepFlowArgs constructor to accept update_use_layernorm, use_gated_mlp, and gated_mlp_norm parameters.
DPA1 Descriptor
deepmd/pt/model/descriptor/dpa1.py
Added get_norm_fact() and get_additional_output_for_fitting() methods to expose descriptor interface requirements.
DPA3 Descriptor (PT)
deepmd/pt/model/descriptor/dpa3.py
Added charge/spin and case embedding support via new constructor parameters; added get_norm_fact() and get_additional_output_for_fitting() methods; updated forward to conditionally augment embeddings from fparam and case_embd inputs.
RepFlow Layer
deepmd/pt/model/descriptor/repflow_layer.py
Added gated MLP support, layer normalization support, and new configuration parameters (update_use_layernorm, use_gated_mlp, gated_mlp_norm).
RepFlows
deepmd/pt/model/descriptor/repflows.py
Added dynamic selection factors, get_additional_output_for_fitting() method, and get_norm_fact() method; tracks additional outputs (edge_index) for fitting use.
Gated MLP Module
deepmd/pt/model/network/mlp.py
Added new GatedMLP class implementing gated linear units via two parallel MLP paths; added find_normalization() helper for batch/layer normalization selection.
Model Initialization
deepmd/pt/model/model/__init__.py
Updated model construction to recognize ener_readout fitting type and propagate embedding_width, norm_fact, and dim_case_embd parameters.
Learning Rate Scheduling
deepmd/dpmodel/utils/learning_rate.py, deepmd/pt/utils/learning_rate.py
Added LearningRateWSD class for piecewise linear decay schedule with configurable start/stop rates; re-exported from PT utilities.
Loss Functions
deepmd/pt/loss/ener.py
Extended EnergyStdLoss with force weighting (use_default_pf, f_use_norm) and L1 loss variants (use_l1_all); refactored virial loss computation; broadened force label requirement logic.
Property Loss
deepmd/pt/loss/property.py
Added softmax-normalized frame weighting to average property predictions across batch dimension; enforces batch size of 3.
Deep Property Inference
deepmd/infer/deep_property.py
Added frame averaging via softmax weighting of per-frame mean scores; enforces batch size of 3.
JAX Fitting Utilities
deepmd/jax/fitting/fitting.py
Updated attribute conversion to include default_fparam_tensor in array conversion/wrapping path.
Statistics and Path Utilities
deepmd/utils/env_mat_stat.py, deepmd/utils/path.py
Added StatItem.__mul__() for scalar multiplication; disabled HDF5 locking in file operations.
Training Framework
deepmd/pt/train/training.py
Added LearningRateWSD dispatch in learning rate initialization; modified fparam data requirement logic to respect has_default_fparam(); refactored multi-task model probability and shared parameter linking with stat protection support; enhanced fine-tuning checkpoint recovery.
Model Wrapper
deepmd/pt/train/wrapper.py
Extended share_params signature to accept model_key_prob_map and data_stat_protect for weighted parameter linking across multi-task branches.
Statistics Utilities
deepmd/pt/utils/stat.py
Updated make_stat_input to filter out fparam entries when find_fparam == 0.0.
Test Entrypoint
deepmd/entrypoints/test.py
Added per-atom force weighting evaluation via atom_pref and find_atom_pref; made fparam requirement conditional on has_default_fparam(); updated force error metrics to use diff_f.size denominator; added weighted MAE/RMSE reporting.
Configuration Arguments
deepmd/utils/argcheck.py
Added DPA3 embedding options (add_chg_spin_ebd, add_case_embd); added repflow configuration options; added default_fparam to fitting nets; added LearningRateWSD variant; added energy loss flags (use_default_pf, f_use_norm, use_l1_all); registered new ener_readout fitting type.
Testing
source/tests/pt/test_dp_test.py
Added new test class TestDPTestForceWeight validating per-atom weighted force error computation against unweighted baseline.
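The weighted force metrics described in the walkthrough (per-atom weights from `atom_pref`, zero-weight-safe) can be sketched as follows; the real implementation in deepmd/entrypoints/test.py may differ in shapes and reduction details:

```python
import numpy as np

def weighted_force_errors(f_pred, f_true, atom_pref):
    """Per-atom weighted force MAE/RMSE sketch.

    f_pred, f_true: (natoms, 3) force arrays.
    atom_pref:      (natoms,) per-atom weights; each atom's weight is
                    applied to all three of its force components.
    """
    diff = f_pred - f_true                 # (natoms, 3)
    w = np.repeat(atom_pref, 3)            # weight per component
    d = diff.ravel()
    wsum = w.sum()
    if wsum == 0.0:                        # zero-weight safety
        return 0.0, 0.0
    mae = (np.abs(d) * w).sum() / wsum
    rmse = np.sqrt((d ** 2 * w).sum() / wsum)
    return mae, rmse

mae, rmse = weighted_force_errors(
    np.ones((2, 3)), np.zeros((2, 3)), np.array([1.0, 0.0])
)
```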

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #5295: Modifies DPA3 descriptor and atomic-model APIs for frame parameter handling—directly related code-level changes.
  • PR #5270: Implements default frame-parameter handling across base atomic models and sampling wrappers—parallel feature implementation.
  • PR #5038: Extends frame-parameter/statistics APIs with persistence and default fparam querying—overlapping feature scope.

Suggested labels

Python, new feature, enhancement

Suggested reviewers

  • iProzd
  • njzjz
  • wanghan-iapcm
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.26% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'D0708 dpa3 default fparam' refers to the core feature: adding default frame parameter (fparam) support to DPA3 descriptor and related fitting components.


Contributor

coderabbitai bot left a comment

Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
deepmd/pd/model/task/invar_fitting.py (1)

87-134: ⚠️ Potential issue | 🟠 Major

Missing default_fparam parameter in Paddle backend __init__ signature.

The __init__ method lacks an explicit default_fparam parameter, creating an inconsistency with the PyTorch backend (deepmd/pt/model/task/invar_fitting.py line 110) which includes default_fparam: Optional[list[float]] = None. While **kwargs at line 108 allows the parent class to receive default_fparam if passed, the parameter should be explicit in the signature for consistency, clarity, and proper type checking.

Add default_fparam to the signature and forward it to the parent class:

Proposed fix
 def __init__(
     self,
     var_name: str,
     ntypes: int,
     dim_descrpt: int,
     dim_out: int,
     neuron: list[int] = [128, 128, 128],
     bias_atom_e: Optional[paddle.Tensor] = None,
     resnet_dt: bool = True,
     numb_fparam: int = 0,
     numb_aparam: int = 0,
     dim_case_embd: int = 0,
     activation_function: str = "tanh",
     precision: str = DEFAULT_PRECISION,
     mixed_types: bool = True,
     rcond: Optional[float] = None,
     seed: Optional[Union[int, list[int]]] = None,
     exclude_types: list[int] = [],
     atom_ener: Optional[list[Optional[paddle.Tensor]]] = None,
     type_map: Optional[list[str]] = None,
     use_aparam_as_mask: bool = False,
+    default_fparam: Optional[list[float]] = None,
     **kwargs,
 ):

Then forward to super().__init__() as default_fparam=default_fparam,.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pd/model/task/invar_fitting.py` around lines 87 - 134, Add an explicit
default_fparam parameter to the Paddle backend __init__ signature (e.g.,
default_fparam: Optional[list[float]] = None) in
deepmd/pd/model/task/invar_fitting.py and forward it into the parent constructor
call by adding default_fparam=default_fparam to the super().__init__(...)
invocation; this makes the signature consistent with the PyTorch backend (the
__init__ method and its super().__init__ call are the unique identifiers to
update).
deepmd/entrypoints/test.py (2)

705-712: ⚠️ Potential issue | 🟡 Minor

Missing find_fparam check before using fparam data.

In test_ener, the fparam handling was updated to check test_data["find_fparam"] != 0.0 (line 343) before using fparam. However, test_dos still unconditionally accesses test_data["fparam"] when get_dim_fparam() > 0.

When has_default_fparam() is True and fparam.npy is absent, test_data["fparam"] contains zeros, which will be passed to the model instead of allowing the model to use its default frame parameters.

Proposed fix
-    if dp.get_dim_fparam() > 0:
-        fparam = test_data["fparam"][:numb_test]
-    else:
+    if dp.get_dim_fparam() > 0 and test_data.get("find_fparam", 0) != 0.0:
+        fparam = test_data["fparam"][:numb_test]
+    else:
         fparam = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/entrypoints/test.py` around lines 705 - 712, In test_dos, avoid
unconditionally reading test_data["fparam"] when dp.get_dim_fparam() > 0;
instead mirror test_ener's logic and first check test_data["find_fparam"] != 0.0
(and/or dp.has_default_fparam()) before slicing and passing fparam into the
model. Update the fparam assignment in test_dos so that fparam is set to None
when find_fparam is zero (or when the dp indicates default fparam should be
used), otherwise slice test_data["fparam"][:numb_test]; reference
variables/functions: test_dos, fparam, test_data["find_fparam"],
dp.get_dim_fparam(), dp.has_default_fparam().

877-884: ⚠️ Potential issue | 🟡 Minor

Missing find_fparam check before using fparam data.

Same issue as in test_dos: the find_fparam check added to test_ener is missing here.

Proposed fix
-    if dp.get_dim_fparam() > 0:
-        fparam = test_data["fparam"][:numb_test]
-    else:
+    if dp.get_dim_fparam() > 0 and test_data.get("find_fparam", 0) != 0.0:
+        fparam = test_data["fparam"][:numb_test]
+    else:
         fparam = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/entrypoints/test.py` around lines 877 - 884, The snippet uses
test_data["fparam"] without checking whether fparam exists for this dataset; add
the same guard used in other tests by checking dp.find_fparam() (in addition to
dp.get_dim_fparam() > 0) before assigning fparam so that fparam =
test_data["fparam"][:numb_test] only when dp.find_fparam() is true, otherwise
set fparam = None; keep the existing aparam logic unchanged and update any
downstream uses accordingly (refer to dp.get_dim_fparam, dp.find_fparam, fparam,
aparam, and test_data to locate the code).
deepmd/pt/model/descriptor/repflow_layer.py (2)

1223-1263: ⚠️ Potential issue | 🟠 Major

Missing serialization of new parameters update_use_layernorm, use_gated_mlp, and gated_mlp_norm.

The serialize() method doesn't include the new constructor parameters:

  • update_use_layernorm
  • use_gated_mlp
  • gated_mlp_norm

This will cause deserialization to fail or use incorrect defaults when loading a saved model that was trained with these options enabled.

🔧 Proposed fix to add serialization
             "use_dynamic_sel": self.use_dynamic_sel,
             "sel_reduce_factor": self.sel_reduce_factor,
+            "update_use_layernorm": self.update_use_layernorm,
+            "use_gated_mlp": self.use_gated_mlp,
+            "gated_mlp_norm": self.gated_mlp_norm,
             "node_self_mlp": self.node_self_mlp.serialize(),

Note: The @version should also be bumped to indicate the schema change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/descriptor/repflow_layer.py` around lines 1223 - 1263, The
serialize() method of RepFlowLayer is missing the new fields
update_use_layernorm, use_gated_mlp, and gated_mlp_norm; update serialize() to
include these keys (e.g. "update_use_layernorm": self.update_use_layernorm,
"use_gated_mlp": self.use_gated_mlp, "gated_mlp_norm": self.gated_mlp_norm) so
deserialization preserves their values, and bump the "@version" (e.g. increment
to 3) to reflect the schema change; ensure these attribute names match the
constructor/instance attributes used elsewhere in the class.
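The round-trip property this comment protects can be sketched with a minimal stand-in class (field names follow the review; `LayerSketch` is not `RepFlowLayer`): new flags appear in `serialize()`, and a bumped `@version` lets `deserialize()` distinguish pre-bump checkpoints, which simply fall back to constructor defaults:

```python
class LayerSketch:
    VERSION = 3  # bumped when the serialized schema gains fields

    def __init__(self, update_use_layernorm=False, use_gated_mlp=False,
                 gated_mlp_norm="none"):
        self.update_use_layernorm = update_use_layernorm
        self.use_gated_mlp = use_gated_mlp
        self.gated_mlp_norm = gated_mlp_norm

    def serialize(self):
        return {
            "@version": self.VERSION,
            "update_use_layernorm": self.update_use_layernorm,
            "use_gated_mlp": self.use_gated_mlp,
            "gated_mlp_norm": self.gated_mlp_norm,
        }

    @classmethod
    def deserialize(cls, data):
        data = dict(data)
        if data.pop("@version") < cls.VERSION:
            # pre-bump checkpoints lack the new keys; the constructor
            # defaults apply for whatever is missing
            data.setdefault("use_gated_mlp", False)
        return cls(**data)

clone = LayerSketch.deserialize(LayerSketch(use_gated_mlp=True).serialize())
assert clone.use_gated_mlp is True
```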

1322-1338: ⚠️ Potential issue | 🔴 Critical

Serialization fails and deserialization type dispatch is missing for GatedMLP.

When use_gated_mlp=True, node_edge_linear and edge_angle_linear1 are GatedMLP instances. However:

  1. GatedMLP lacks serialize() and deserialize() methods, so serialization (lines 1261, 1267) will raise AttributeError.
  2. Deserialization (lines 1324, 1331) unconditionally calls MLPLayer.deserialize(), failing to restore the correct module type.

Implement serialize() and deserialize() methods on GatedMLP, then update deserialization to dispatch based on use_gated_mlp.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/descriptor/repflow_layer.py` around lines 1322 - 1338, Add
proper serialization support and type-dispatch for GatedMLP: implement
serialize() and classmethod deserialize() on the GatedMLP class so it returns a
dict with a unique type tag and its params, and update the deserialization sites
that currently call MLPLayer.deserialize() (specifically restoring
node_edge_linear and edge_angle_linear1/edge_angle_linear2 and any a_compress_*
when use_gated_mlp=True) to inspect the saved type tag (or the flag
use_gated_mlp) and call GatedMLP.deserialize(...) when the tag/flag indicates a
gated MLP, otherwise call MLPLayer.deserialize(...); ensure the produced dict
format from GatedMLP.serialize matches what GatedMLP.deserialize expects so
round-trip works without raising AttributeError.
deepmd/pt/model/descriptor/repflows.py (1)

258-269: ⚠️ Potential issue | 🟡 Minor

Move the sel_reduce_factor validation before the division.

dynamic_e_sel / dynamic_a_sel are computed before the <= 0 guard. With sel_reduce_factor=0, this fails with ZeroDivisionError instead of the intended ValueError.

Suggested fix
         self.use_exp_switch = use_exp_switch
         self.use_dynamic_sel = use_dynamic_sel
         self.sel_reduce_factor = sel_reduce_factor
+        if self.sel_reduce_factor <= 0:
+            raise ValueError(
+                f"`sel_reduce_factor` must be > 0, got {self.sel_reduce_factor}"
+            )
         self.dynamic_e_sel = self.nnei / self.sel_reduce_factor
         self.dynamic_a_sel = self.a_sel / self.sel_reduce_factor
         if self.use_dynamic_sel and not self.smooth_edge_update:
             raise NotImplementedError(
                 "smooth_edge_update must be True when use_dynamic_sel is True!"
             )
-        if self.sel_reduce_factor <= 0:
-            raise ValueError(
-                f"`sel_reduce_factor` must be > 0, got {self.sel_reduce_factor}"
-            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/descriptor/repflows.py` around lines 258 - 269, Move the
check that validates sel_reduce_factor (>0) to before any use of it so
dynamic_e_sel and dynamic_a_sel aren't computed when sel_reduce_factor is zero;
specifically, in the initializer validate self.sel_reduce_factor and raise the
ValueError if <=0 before computing self.dynamic_e_sel = self.nnei /
self.sel_reduce_factor and self.dynamic_a_sel = self.a_sel /
self.sel_reduce_factor, while keeping the existing check that if
self.use_dynamic_sel and not self.smooth_edge_update raise NotImplementedError.
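The ordering fix reduces to a general rule: validate a divisor before any division so a bad config raises the intended `ValueError` rather than `ZeroDivisionError`. A standalone sketch (illustrative function, not the repflows initializer):

```python
def make_dynamic_sel(nnei: int, a_sel: int, sel_reduce_factor: float):
    # Validate first, divide second.
    if sel_reduce_factor <= 0:
        raise ValueError(
            f"`sel_reduce_factor` must be > 0, got {sel_reduce_factor}"
        )
    return nnei / sel_reduce_factor, a_sel / sel_reduce_factor

assert make_dynamic_sel(120, 30, 10.0) == (12.0, 3.0)
try:
    make_dynamic_sel(120, 30, 0.0)
except ValueError:
    pass  # the intended error, not ZeroDivisionError
```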
🧹 Nitpick comments (14)
deepmd/pt/model/task/fitting.py (1)

746-751: Verify assertion behavior when default_fparam is not configured.

The assertion at line 750 will fail at runtime if numb_fparam > 0, fparam is None, and default_fparam_tensor is also None. This enforces that users must either provide fparam data or configure default_fparam when numb_fparam > 0.

This is likely intentional, but the error message from a bare assert may not be helpful to users. Consider a more descriptive error:

♻️ Suggested improvement for clearer error message
         if self.numb_fparam > 0 and fparam is None:
             # use default fparam
-            assert self.default_fparam_tensor is not None
+            if self.default_fparam_tensor is None:
+                raise ValueError(
+                    f"fparam is required but not provided, and no default_fparam is configured. "
+                    f"Either provide fparam data or set default_fparam in the fitting configuration."
+                )
             fparam = torch.tile(self.default_fparam_tensor.unsqueeze(0), [nf, 1])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/task/fitting.py` around lines 746 - 751, The current bare
assert in the block handling self.numb_fparam > 0 (when fparam is None) can
produce an unhelpful AssertionError if self.default_fparam_tensor is also None;
replace the assert with an explicit check that raises a clear ValueError
indicating that default_fparam must be configured or fparam provided (reference
symbols: self.numb_fparam, fparam, self.default_fparam_tensor), then proceed to
create fparam by tiling self.default_fparam_tensor via unsqueeze and torch.tile
as before.
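The fallback being discussed can be sketched in numpy rather than torch (names are illustrative, not the fitting-net internals): when no per-frame fparam is supplied, tile the configured default across the `nf` frames, and raise a descriptive error when neither is available:

```python
import numpy as np

def resolve_fparam(fparam, default_fparam, nf: int, numb_fparam: int):
    if numb_fparam <= 0:
        return None
    if fparam is not None:
        return fparam
    if default_fparam is None:
        raise ValueError(
            "fparam is required but not provided, and no default_fparam "
            "is configured."
        )
    # Broadcast the single default row to one row per frame.
    return np.tile(np.asarray(default_fparam)[None, :], (nf, 1))

out = resolve_fparam(None, [0.0, 1.0], 3, 2)
assert out.shape == (3, 2)
```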
deepmd/pt/model/network/mlp.py (1)

350-358: Silent fallback for unrecognized normalization names.

The find_normalization function silently returns None for unrecognized normalization names (via .get(name.lower(), None)). This could mask typos in configuration (e.g., "batchh" instead of "batch").

Consider raising an error or logging a warning for unknown names, and constructing the requested module lazily so a single lookup does not instantiate both nn.BatchNorm1d and nn.LayerNorm on every call:

♻️ Suggested improvement
 def find_normalization(name: str, dim: int | None = None) -> nn.Module | None:
     """Return an normalization function using name."""
     if name is None:
         return None
-    return {
-        "batch": nn.BatchNorm1d(dim),
-        "layer": nn.LayerNorm(dim),
-        "none": None,
-    }.get(name.lower(), None)
+    norm_map = {
+        "batch": lambda: nn.BatchNorm1d(dim),
+        "layer": lambda: nn.LayerNorm(dim),
+        "none": lambda: None,
+    }
+    key = name.lower()
+    if key not in norm_map:
+        raise ValueError(
+            f"Unknown normalization type: {name}. Choose from {list(norm_map)}"
+        )
+    return norm_map[key]()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/network/mlp.py` around lines 350 - 358, The function
find_normalization currently silently returns None for unrecognized names;
change it to validate the lowercase name against the allowed keys ("batch",
"layer", "none") and raise a descriptive ValueError if the name is unknown
(keeping the existing behavior of returning None when name is None or "none");
reference the find_normalization function and the mapping for "batch" ->
nn.BatchNorm1d(dim) and "layer" -> nn.LayerNorm(dim) when implementing this
check and error message.
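A torch-free sketch of the stricter lookup (plain strings stand in for the nn modules, so this is only an illustration of the dispatch shape, not the mlp.py code): unknown names raise instead of silently mapping to None, and the factory is only invoked for the requested key:

```python
def find_normalization_sketch(name, dim=None):
    if name is None:
        return None
    factories = {
        "batch": lambda: f"BatchNorm1d({dim})",  # stand-in for nn.BatchNorm1d(dim)
        "layer": lambda: f"LayerNorm({dim})",    # stand-in for nn.LayerNorm(dim)
        "none": lambda: None,
    }
    key = name.lower()
    if key not in factories:
        raise ValueError(
            f"Unknown normalization type: {name}. Choose from {sorted(factories)}"
        )
    return factories[key]()

assert find_normalization_sketch("Layer", 64) == "LayerNorm(64)"
try:
    find_normalization_sketch("batchh", 64)  # the typo from the comment above
except ValueError:
    pass
```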
deepmd/pt/model/task/property.py (1)

95-95: Inconsistent type annotation for default_fparam.

The type annotation Optional[list] is less specific than Optional[list[float]] used in InvarFitting (line 110 in invar_fitting.py). For consistency and type safety, consider using the more specific type.

♻️ Proposed fix
-        default_fparam: Optional[list] = None,
+        default_fparam: Optional[list[float]] = None,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/task/property.py` at line 95, The parameter default_fparam in
the function/class signature currently uses Optional[list]; change its type
annotation to the more specific Optional[list[float]] to match InvarFitting's
annotation (see invar_fitting.py) so type checkers and readers know it holds
floats; locate the default_fparam declaration in
deepmd/pt/model/task/property.py and update its annotation to
Optional[list[float]] accordingly.
deepmd/pt/model/task/invar_fitting.py (1)

179-195: Unused parameters sw and edge_index in forward() method.

The parameters sw and edge_index are added to the method signature but are not used in the method body. Based on the AI summary, this is for signature alignment with fitting nets that need these tensors from the descriptor's additional outputs. If this is intentional for interface consistency, consider adding a brief comment or docstring entry explaining why these parameters exist but are unused in this particular implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/task/invar_fitting.py` around lines 179 - 195, The forward
method currently accepts sw and edge_index but never uses them; update the
method to either consume or explicitly acknowledge these parameters for
interface compatibility by adding a short inline comment or docstring note in
forward() stating that sw and edge_index are intentionally unused (kept for
signature alignment with other fitting nets), and ensure the implementation
still calls self._forward_common(descriptor, atype, gr, g2, h2, fparam, aparam)
and returns {self.var_name: out.to(env.GLOBAL_PT_FLOAT_PRECISION)} unchanged;
reference symbols: forward(), sw, edge_index, _forward_common, self.var_name,
env.GLOBAL_PT_FLOAT_PRECISION.
deepmd/dpmodel/fitting/dos_fitting.py (1)

49-50: Consider using more precise type hint Optional[list[float]].

The type hint Optional[list] is less precise than Optional[list[float]] used in other fitting classes (e.g., InvarFitting, PolarFitting, DipoleFitting). Consistent typing improves IDE support and type checking.

Proposed fix
-        default_fparam: Optional[list] = None,
+        default_fparam: Optional[list[float]] = None,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/fitting/dos_fitting.py` around lines 49 - 50, Update the type
annotation for the parameter default_fparam from Optional[list] to
Optional[list[float]] in the function signature (the constructor or initializer
in dos_fitting.py that declares default_fparam) so it matches other fitting
classes (InvarFitting, PolarFitting, DipoleFitting); no runtime changes
required—just change the type hint to Optional[list[float]] to improve precision
and IDE/type-checker support.
deepmd/pt/model/task/dipole.py (1)

100-101: Consider using more precise type hint Optional[list[float]].

For consistency with the dpmodel fitting classes and improved type checking, use Optional[list[float]] instead of Optional[list].

Proposed fix
-        default_fparam: Optional[list] = None,
+        default_fparam: Optional[list[float]] = None,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/task/dipole.py` around lines 100 - 101, Change the broad type
hint Optional[list] for the default_fparam parameter to the more precise
Optional[list[float]] in the dipole task signature (the parameter named
default_fparam in the function/constructor in deepmd/pt/model/task/dipole.py);
update any other occurrences of default_fparam type annotations in that file to
match and ensure typing imports still provide Optional (no code logic changes
needed).
deepmd/dpmodel/fitting/ener_fitting.py (1)

49-50: Consider using more precise type hint Optional[list[float]].

For consistency with InvarFitting (which uses Optional[list[float]]), consider updating the type hint.

Proposed fix
-        default_fparam: Optional[list] = None,
+        default_fparam: Optional[list[float]] = None,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/fitting/ener_fitting.py` around lines 49 - 50, Update the type
annotation for the parameter default_fparam to use Optional[list[float]] instead
of Optional[list] for precise typing (match InvarFitting); locate the function
or constructor in ener_fitting.py that declares default_fparam and change its
signature to default_fparam: Optional[list[float]] = None, and update any
related type hints or docstrings in the same module mentioning default_fparam to
reflect list[float].
deepmd/dpmodel/utils/learning_rate.py (1)

72-73: Missing blank line before method definition.

PEP8 recommends two blank lines before top-level function/class definitions and one blank line before method definitions inside a class.

🔧 Fix: Add blank line
         self.decay_end_rate = (self.decay_mode[0] + self.decay_mode[1]) / sum(
             self.decay_mode
         )
+
     def value(self, step) -> np.float64:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/utils/learning_rate.py` around lines 72 - 73, Add a single
blank line before the method definition "def value(self, step) -> np.float64:"
inside its class to comply with PEP8 method-spacing rules; locate the "value"
method in deepmd/dpmodel/utils/learning_rate.py and insert one blank line
immediately above that def so the method is separated from the preceding code
block.
deepmd/pd/model/task/fitting.py (1)

152-165: Missing validation for default_fparam length.

Unlike the dpmodel implementation (deepmd/dpmodel/fitting/general_fitting.py lines 186-190), this code doesn't validate that len(default_fparam) == numb_fparam when both are provided. This could cause issues during cross-backend serialization or inference.

♻️ Proposed fix to add validation
         self.dim_case_embd = dim_case_embd
         self.default_fparam = default_fparam
+        if self.default_fparam is not None and self.numb_fparam > 0:
+            assert len(self.default_fparam) == self.numb_fparam, (
+                "default_fparam length mismatch!"
+            )
         self.activation_function = activation_function
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pd/model/task/fitting.py` around lines 152 - 165, The constructor in
deepmd/pd/model/task/fitting.py sets default_fparam but lacks validation: add a
check in __init__ to ensure that when default_fparam is not None and numb_fparam
is not None/defined, len(default_fparam) == numb_fparam; if not, raise a
ValueError with a clear message referencing the mismatch. Mirror the validation
logic used in deepmd/dpmodel/fitting/general_fitting.py (lines around the
existing check) so cross-backend serialization/inference won't fail due to
mismatched default_fparam length.
deepmd/dpmodel/fitting/general_fitting.py (1)

186-193: Validation allows default_fparam when numb_fparam == 0.

The validation at line 187 only checks length match when numb_fparam > 0. If numb_fparam == 0 and default_fparam is provided (non-empty list), the code creates default_fparam_tensor but it will never be used since _call_common only enters the fparam block when numb_fparam > 0.

This is technically harmless but could silently ignore user misconfiguration. Consider adding a warning or assertion.

💡 Optional: Add validation when numb_fparam is 0
         if self.default_fparam is not None:
-            if self.numb_fparam > 0:
-                assert len(self.default_fparam) == self.numb_fparam, (
-                    "default_fparam length mismatch!"
-                )
+            assert self.numb_fparam > 0, (
+                "default_fparam provided but numb_fparam is 0"
+            )
+            assert len(self.default_fparam) == self.numb_fparam, (
+                "default_fparam length mismatch!"
+            )
             self.default_fparam_tensor = np.array(self.default_fparam, dtype=self.prec)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/fitting/general_fitting.py` around lines 186 - 193, The code
allows a non-None default_fparam when numb_fparam == 0 which will be ignored
later; update validation in the constructor (around default_fparam handling) to
detect this mismatch and either raise an assertion or emit a warning: if
self.numb_fparam == 0 and default_fparam is not None and
len(self.default_fparam) > 0 then raise an AssertionError (or log a warning via
warnings.warn) indicating that default_fparam will be ignored, referencing the
symbols default_fparam, numb_fparam, default_fparam_tensor and _call_common so
reviewers can see the relation.
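The warning variant of the optional guard can be sketched in isolation (illustrative helper, not the GeneralFitting constructor): warn when a default_fparam is supplied alongside numb_fparam == 0, since the value would be silently ignored downstream:

```python
import warnings

def check_default_fparam(numb_fparam, default_fparam):
    # A non-empty default with numb_fparam == 0 is a likely misconfiguration.
    if default_fparam is not None and numb_fparam == 0 and len(default_fparam) > 0:
        warnings.warn(
            "default_fparam provided but numb_fparam is 0; it will be ignored."
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_default_fparam(0, [0.0, 1.0])
assert len(caught) == 1
```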
deepmd/utils/argcheck.py (4)

2522-2531: Missing documentation for decay_mode parameter.

The decay_mode argument lacks documentation, and the default value "85:10:5" is not self-explanatory. Users need to understand what this format represents (presumably percentages of training steps for warmup:stable:decay phases).

📝 Proposed fix to add documentation
 def learning_rate_wsd():
     doc_start_lr = "The learning rate at the start of the training."
     doc_stop_lr = "The desired learning rate at the end of the training. "
+    doc_decay_mode = (
+        "The decay mode specification in the format 'warmup:stable:decay', "
+        "where each value represents the percentage of total training steps "
+        "allocated to that phase. For example, '85:10:5' means 85% warmup, "
+        "10% stable, and 5% decay phases."
+    )

     args = [
         Argument("start_lr", float, optional=True, default=1e-3, doc=doc_start_lr),
         Argument("stop_lr", float, optional=True, default=1e-5, doc=doc_stop_lr),
-        Argument("decay_mode", str, optional=True, default="85:10:5"),
+        Argument("decay_mode", str, optional=True, default="85:10:5", doc=doc_decay_mode),
     ]
     return args
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/utils/argcheck.py` around lines 2522 - 2531, The learning_rate_wsd()
function's Argument for "decay_mode" lacks a doc string; update the
Argument("decay_mode", ...) in learning_rate_wsd to include a clear doc
explaining the format and semantics (e.g., a colon-separated percentage triplet
"warmup:stable:decay" that sums to 100 and controls the fraction of training
steps in each phase, with the default "85:10:5" meaning 85% warmup, 10% stable,
5% decay), and mention that values are interpreted as percentages of total
training steps and should be integers summing to 100.
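The presumed "warmup:stable:decay" format (an assumption carried over from the comment above; the authoritative semantics live in deepmd/dpmodel/utils/learning_rate.py) could be parsed into phase fractions like this:

```python
def parse_decay_mode(decay_mode: str):
    # Split "85:10:5" into three non-negative numbers and normalize
    # them into fractions of the total training steps.
    parts = [float(p) for p in decay_mode.split(":")]
    if len(parts) != 3 or min(parts) < 0 or sum(parts) == 0:
        raise ValueError(f"invalid decay_mode: {decay_mode!r}")
    total = sum(parts)
    return tuple(p / total for p in parts)

warmup, stable, decay = parse_decay_mode("85:10:5")
assert abs(warmup - 0.85) < 1e-12
assert abs(decay - 0.05) < 1e-12
```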

1686-1703: Missing documentation for new repflow arguments.

The new arguments update_use_layernorm, use_gated_mlp, and gated_mlp_norm lack the doc parameter, unlike all other arguments in this function. This inconsistency makes it harder for users to understand how to configure these options.

📝 Proposed fix to add documentation
+    doc_update_use_layernorm = "Whether to use layer normalization in the update step."
+    doc_use_gated_mlp = "Whether to use gated MLP layers."
+    doc_gated_mlp_norm = "The normalization type for gated MLP. Supported options are 'none', 'layer', and 'batch'."
+
     return [
         # repflow args
         ...
         Argument(
             "update_use_layernorm",
             bool,
             optional=True,
             default=False,
+            doc=doc_update_use_layernorm,
         ),
         Argument(
             "use_gated_mlp",
             bool,
             optional=True,
             default=False,
+            doc=doc_use_gated_mlp,
         ),
         Argument(
             "gated_mlp_norm",
             str,
             optional=True,
             default="none",
+            doc=doc_gated_mlp_norm,
         ),
     ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/utils/argcheck.py` around lines 1686 - 1703, Add missing documentation
for the three new repflow arguments by supplying a descriptive doc= string for
each Argument call: update_use_layernorm, use_gated_mlp, and gated_mlp_norm in
the Argument block. Follow the pattern used by other arguments in this function
(brief one-line description, expected behavior/defaults and valid values for
gated_mlp_norm like "none"|"pre"|"post" if applicable) so users can understand
when to enable layernorm, what gated MLP toggles, and accepted values for
gated_mlp_norm.

2701-2754: Missing documentation for new loss arguments.

The new arguments use_default_pf, f_use_norm, and use_l1_all lack the doc parameter, unlike all other arguments in the loss_ener() function. This makes it unclear what these options control.

📝 Proposed fix to add documentation
+    doc_use_default_pf = "Whether to use default per-atom force weighting."
+    doc_f_use_norm = "Whether to use normalized force loss."
+    doc_use_l1_all = "Whether to use L1 loss for all terms instead of L2."
+
     ...
         Argument(
             "use_default_pf",
             bool,
             optional=True,
             default=False,
+            doc=doc_use_default_pf,
         ),
     ...
         Argument(
             "f_use_norm",
             bool,
             optional=True,
             default=False,
+            doc=doc_f_use_norm,
         ),
         Argument(
             "use_l1_all",
             bool,
             optional=True,
             default=False,
+            doc=doc_use_l1_all,
         ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/utils/argcheck.py` around lines 2701 - 2754, The new loss arguments
use_default_pf, f_use_norm, and use_l1_all in the loss_ener() argument list are
missing doc strings; add descriptive doc parameters for each following the
existing pattern (e.g., define doc_use_default_pf, doc_f_use_norm,
doc_use_l1_all near other doc_* variables and pass them into the corresponding
Argument(...) calls) so each Argument("use_default_pf", ...),
Argument("f_use_norm", ...), and Argument("use_l1_all", ...) includes a doc=...
parameter that clearly explains the option's behavior and defaults.

2958-2970: Missing documentation for add_edge_readout and slim_edge_readout arguments.

These new arguments lack the doc parameter, unlike all other arguments in this function. Users won't understand what these options control.

📝 Proposed fix to add documentation
+    doc_add_edge_readout = "Whether to add edge readout in the energy fitting."
+    doc_slim_edge_readout = "Whether to use a slim (lighter) edge readout network."
+
     ...
         Argument(
             "add_edge_readout",
             bool,
             optional=True,
             default=True,
+            doc=doc_add_edge_readout,
         ),
         Argument(
             "slim_edge_readout",
             bool,
             optional=True,
             default=False,
+            doc=doc_slim_edge_readout,
         ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/utils/argcheck.py` around lines 2958 - 2970, The two Argument entries
for "add_edge_readout" and "slim_edge_readout" are missing their doc strings;
update the Argument(...) calls for add_edge_readout and slim_edge_readout to
include a doc= parameter that briefly explains what each flag controls (e.g.,
whether to include an edge-level readout in the model and whether to use a
slim/compact edge readout), and mention their default behavior (True for
add_edge_readout, False for slim_edge_readout) to match other arguments' style
and clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/dpmodel/descriptor/dpa3.py`:
- Around line 180-182: RepFlowArgs now has three new fields
(update_use_layernorm, use_gated_mlp, gated_mlp_norm) but they are not included
in RepFlowArgs.serialize() nor forwarded to DescrptBlockRepflows; update
RepFlowArgs.serialize() to include these three keys so they persist, and then
modify the call site where a RepFlowArgs instance is passed to construct
DescrptBlockRepflows to forward these values (or update
DescrptBlockRepflows.__init__ to accept and store them), ensuring the
constructor/attribute names match (update_use_layernorm, use_gated_mlp,
gated_mlp_norm) so saved models round-trip correctly.

In `@deepmd/dpmodel/utils/learning_rate.py`:
- Around line 80-86: Detect the zero-duration decay case before computing
decay_rate: if self.decay_end_rate == self.decay_start_rate (i.e., decay phase
length is zero) avoid the division and return an instantaneous value—return
self.start_lr when step <= self.decay_start_rate * self.stop_steps else return
self.stop_lr; update the block that computes decay_rate and the return to
short-circuit on this equality to prevent division by zero in decay_rate and
ensure correct boundary behavior for step.

In `@deepmd/infer/deep_property.py`:
- Around line 142-150: The softmax-weighted averaging block in
DeepProperty.eval() wrongly assumes nframes==3, references the local name
property before it's assigned, and uses multiple statements per line; remove or
refactor this block so property is computed exactly once and then weighted:
compute the per-frame values into a new local (e.g., prop_vals) before any
conditional, replace the nframes check with a guard that handles arbitrary frame
counts (or early-return if unsupported), perform the softmax weighting using
separate statements (no nondescript inline semicolons) and assign the final
weighted result back to property once (avoiding overwriting earlier results),
and then run ruff check/format to remove F823/E702 violations; look for
DeepProperty.eval, the variables nframes, scores, w, avg and property in the
shown diff to locate the code to change.

In `@deepmd/pd/model/task/fitting.py`:
- Around line 98-102: The docstring incorrectly says default_fparam is "not
supported" while the class stores and serializes it; fix by making runtime
behavior consistent: update _forward_common to detect missing/None fparam and
substitute self.default_fparam (or a tensor/array converted from it) before
further processing, ensure the instance attribute default_fparam remains used
for serialization, and remove or update the "not supported" phrase in the
docstring of the class so it reflects actual runtime support; reference symbols:
default_fparam and _forward_common.

In `@deepmd/pt/loss/ener.py`:
- Around line 56-61: The new boolean flags use_default_pf and f_use_norm (and
any other loss-semantics flags like use_huber/huber_delta if relevant) must be
added to the object's persisted state so they survive serialize()/deserialize()
round-trips: update the class constructor parameters (the signature that
includes inference, use_huber, use_default_pf, f_use_norm, huber_delta) and the
serialize() method to include these flags in the returned state, and update
deserialize()/from_state() to read and restore them into the instance; ensure
the same keys are used in both serialize and deserialize so restored instances
preserve loss behavior (also mirror this change where similar state handling
occurs around the other block mentioned at lines ~137-153).
- Around line 306-320: The displayed MAE (more_loss["mae_f"]) is computed from
the L1 loss before you switch to the vector-norm objective (l1_force_loss) when
f_use_norm=True, and the atom-pref branch reuses the same key causing the
weighted value to overwrite the unweighted one; update the code so that when
self.f_use_norm is True you compute the vector-norm MAE first and pass that
value to self.display_if_exist (use the vector-norm result for
more_loss["mae_f"] rather than the original L1), and in the atom-preference
branch (has_pf handling around the same pattern at the 359-370 region) write the
weighted metric to a separate key (e.g. "mae_f_w" or similar) instead of reusing
"mae_f" so the unweighted and weighted values do not clobber each other; locate
the display/update calls around l1_force_loss, force_label/force_pred,
display_if_exist, self.f_use_norm and has_pf to apply these changes.

In `@deepmd/pt/loss/property.py`:
- Around line 95-104: The code in PropertyLoss.forward() currently computes
score_per_sample, weights, avg_vec and then overwrites model_pred[var_name] with
avg_vec.expand_as(...), which couples every sample in the batch and forces the
nbz==3 guard to be required; to fix, stop mutating model_pred[var_name] — either
remove the softmax-weighted replacement entirely so per-sample losses use the
original model_pred[var_name], or move the triplet/aggregated logic into a
dedicated loss path or an explicit flag/mode (e.g., a new aggregated_loss
branch) that computes the aggregated prediction (avg_vec) only for that mode and
computes loss against targets without expanding back into model_pred; also
remove or relocate the nbz==3 special-case (lines referencing the guard) so it
is only used by the dedicated aggregation path, not the generic
PropertyLoss.forward().

In `@deepmd/pt/model/atomic_model/dp_atomic_model.py`:
- Around line 239-261: The descriptor forward signatures are missing
fparam/case_embd and will raise TypeError when called from dp_atomic_model.py;
update each descriptor's forward method (DescrptDPA1.forward,
DescrptDPA2.forward, DescrptHybrid.forward, DescrptSeA.forward,
DescrptSeR.forward, DescrptSeT.forward, DescrptSeTTebd.forward) to accept the
new keyword arguments (either add explicit parameters fparam=None,
case_embd=None to the signature or include **kwargs) and ensure the
implementations ignore or pass them through as needed so the call
descriptor(..., fparam=fparam_input_for_des, case_embd=case_embd_input_for_des)
no longer raises.

In `@deepmd/pt/model/descriptor/dpa3.py`:
- Around line 204-218: The charge/spin embedding tables defined in the
add_chg_spin_ebd branch (chg_embedding and spin_embedding using TypeEmbedNet)
are too small for the documented ranges because indices are shifted (charge
+100, spin +0) so max values (charge=100 -> index 200, spin=100 -> index 100)
are out of range; fix by increasing the embedding sizes by one (use
TypeEmbedNet(201, ...) for chg_embedding and TypeEmbedNet(101, ...) for
spin_embedding) or adjust the indexing/offset logic so the max values map inside
bounds; apply the same fix to the other occurrence of these embeddings later in
the file (the repeated chg/spin embedding initialization).
- Around line 123-125: The new optional descriptor flags and modules
(add_chg_spin_ebd, add_case_embd, dim_case_embd and the trainable modules
chg_embedding, spin_embedding, mix_cs_mlp, case_embd_mlp) are not included in
the file's export/import (round-trip) logic; update the serialization and
deserialization code paths to persist and restore these fields: write
add_chg_spin_ebd, add_case_embd, dim_case_embd into the serialized config/state,
and when serializing module weights include the state_dicts for chg_embedding,
spin_embedding, mix_cs_mlp, and case_embd_mlp only when those flags are true;
conversely, in the loader/constructor detect those flags, instantiate the
corresponding modules (with correct dims) and load their saved state_dicts so
models enabling these options keep both config and trained parameters on
export-reload.

In `@deepmd/pt/model/network/mlp.py`:
- Around line 309-322: mlp_core and mlp_gate are created with the same seed,
causing identical weight initialization; change the MLPLayer calls to use
distinct seeds by deriving child seeds from the provided seed (e.g., call
child_seed(seed) twice to produce seed_core and seed_gate) and pass seed_core to
MLPLayer for mlp_core and seed_gate to MLPLayer for mlp_gate so the two paths
get independent RNG states and different initial weights.
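The seed-splitting fix can be sketched with a stand-in helper; `child_seed` here is hypothetical and the real deepmd helper may differ in signature:

```python
def child_seed(seed, idx):
    # Stand-in for deepmd's helper: derive a distinct, deterministic
    # seed per child index; None propagates unchanged.
    if seed is None:
        return None
    return (seed * 1000003 + idx) % (2**32)

seed = 42
seed_core = child_seed(seed, 0)  # passed to MLPLayer for mlp_core
seed_gate = child_seed(seed, 1)  # passed to MLPLayer for mlp_gate
```

With distinct derived seeds, the two layers start from independent RNG states instead of identical weights.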

In `@deepmd/pt/model/task/ener.py`:
- Around line 275-277: Remove the public registration for the non-serializable
fitting type so DPAtomicModel.serialize won't call into an unimplemented
serialize path: delete or comment out the `@Fitting.register("ener_readout")`
decorator on the EnergyFittingNetReadout class (and the analogous registration
for the related energy-readout fitting declared later), or replace the decorator
with a guarded registration that only runs once the class implements
serialize/deserialize; ensure references to DPAtomicModel.serialize and
self.fitting_net.serialize will only see registered fitting classes that support
round-trip serialization.
- Around line 328-350: The edge readout builds one FittingNet per type in the
constructor (edge_embed.networks) when mixed_types is False but forward() always
calls self.edge_embed.networks[0](edge_feature), collapsing all types to the
first network; change forward() to select the network per-edge using the type
index (the per-edge type variable used in forward, e.g., t or edge_type) and
call self.edge_embed.networks[type_idx](edge_feature) so each type uses its
corresponding FittingNet; apply the same fix for the other occurrences noted
(around the lines referenced, e.g., the second invocation at 404-406) and ensure
you still handle the mixed_types branch (which should keep using index 0).
- Around line 282-297: Change the mutable list default arguments in
EnergyFittingNetReadout.__init__: make the parameters neuron and norm_fact
default to None (not mutable lists) and inside __init__ set neuron = [128, 128,
128] if neuron is None else neuron and norm_fact = [120.0] if norm_fact is None
else norm_fact so each instance gets its own list; update the signature (neuron:
Optional[list[int]] = None, norm_fact: Optional[list[float]] = None) and any
type hints/usages referencing these parameters accordingly.
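The None-default pattern described above can be sketched as follows; the class name and trimmed argument list are hypothetical stand-ins for the much larger real constructor:

```python
from typing import Optional

class EnergyFittingNetReadoutSketch:
    # Hypothetical, trimmed-down constructor illustrating the fix; the
    # real EnergyFittingNetReadout takes many more arguments.
    def __init__(
        self,
        neuron: Optional[list[int]] = None,
        norm_fact: Optional[list[float]] = None,
    ) -> None:
        # None defaults ensure each instance owns a fresh list.
        self.neuron = [128, 128, 128] if neuron is None else neuron
        self.norm_fact = [120.0] if norm_fact is None else norm_fact

a = EnergyFittingNetReadoutSketch()
a.neuron.append(64)  # mutating one instance...
b = EnergyFittingNetReadoutSketch()
# ...does not leak into the next, unlike a shared mutable default
```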
- Around line 333-348: The seed arithmetic fails when self.seed is None or a
list; fix by normalizing self.seed before calling child_seed: compute a local
base_seed = self.seed (if list, pick base_seed = self.seed[ii]); if
isinstance(base_seed, int) add 100 to it, otherwise leave None unchanged; then
call child_seed(base_seed, ii) when constructing edge_embed (NetworkCollection /
FittingNet) so child_seed only ever receives an int or None, not a list or an
expression that adds to None.
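The seed-normalization logic can be sketched as a small helper, assuming seeds are an int, a list of ints, or None:

```python
def normalize_seed(seed, ii):
    # Pick the per-network base seed; only shift integer seeds, and
    # let None pass through so downstream RNG helpers can handle it.
    base_seed = seed[ii] if isinstance(seed, list) else seed
    return base_seed + 100 if isinstance(base_seed, int) else None
```

The result is always an int or None, so it is safe to hand to `child_seed`.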

In `@deepmd/pt/train/training.py`:
- Around line 1165-1168: The conditional that checks find_fparam currently
assumes batch_data["find_fparam"] exists when "fparam" is present; update the
guard in the loop over input_keys (the block using item_key, input_keys,
batch_data, input_dict) to avoid KeyError by checking for the key first or using
a safe lookup (e.g., batch_data.get("find_fparam", 0.0)) so the condition
becomes: if item_key != "fparam" or batch_data.get("find_fparam", 0.0) != 0.0,
then assign input_dict[item_key] = batch_data[item_key].
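The safe-lookup guard can be demonstrated in isolation; the dict contents here are illustrative, not the real batch layout:

```python
batch_data = {"fparam": [0.0, 1.0]}  # "find_fparam" key absent
input_dict = {}
for item_key in ["fparam"]:
    # .get with a 0.0 default avoids a KeyError when the data system
    # never provided find_fparam
    if item_key != "fparam" or batch_data.get("find_fparam", 0.0) != 0.0:
        input_dict[item_key] = batch_data[item_key]
# fparam is skipped because find_fparam is missing (treated as 0.0)
```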

In `@deepmd/pt/train/wrapper.py`:
- Around line 139-149: The computation of frac_prob may divide by zero when
model_key_prob_map[model_key_base] is zero; before computing frac_prob, check
the denominator (model_key_prob_map[model_key_base]) and handle the zero case
defensively (e.g., set frac_prob to 0 or a small epsilon, or log/raise a clear
exception) and then call link_class.share_params(base_class, shared_level_link,
model_prob=frac_prob, protection=data_stat_protect, resume=resume); ensure you
reference the same symbols model_key_prob_map, model_key_link, model_key_base,
frac_prob and link_class.share_params when making the change.
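A minimal sketch of the defensive division, with a hypothetical helper name (the real code works directly on `model_key_prob_map`):

```python
def safe_frac_prob(prob_link, prob_base):
    # Return 0.0 when the base model's probability is zero instead of
    # raising ZeroDivisionError; callers may prefer to log or raise.
    if prob_base == 0.0:
        return 0.0
    return prob_link / prob_base
```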

In `@deepmd/utils/argcheck.py`:
- Around line 2877-2907: The function fitting_ener_readout() is missing the
default_fparam argument; add an Argument("default_fparam", type=appropriate,
optional=True, default=None, doc=...) alongside numb_fparam/numb_aparam in
fitting_ener_readout() so its signature matches other fitting_* functions, and
then update the EnergyFittingNetReadout.__init__() to accept default_fparam and
pass it through to the parent InvarFitting (or to super().__init__(...,
default_fparam=default_fparam)) so the readout class forwards the value to the
base class.

---

Outside diff comments:
In `@deepmd/entrypoints/test.py`:
- Around line 705-712: In test_dos, avoid unconditionally reading
test_data["fparam"] when dp.get_dim_fparam() > 0; instead mirror test_ener's
logic and first check test_data["find_fparam"] != 0.0 (and/or
dp.has_default_fparam()) before slicing and passing fparam into the model.
Update the fparam assignment in test_dos so that fparam is set to None when
find_fparam is zero (or when the dp indicates default fparam should be used),
otherwise slice test_data["fparam"][:numb_test]; reference variables/functions:
test_dos, fparam, test_data["find_fparam"], dp.get_dim_fparam(),
dp.has_default_fparam().
- Around line 877-884: The snippet uses test_data["fparam"] without checking
whether fparam exists for this dataset; add the same guard used in other tests
by checking dp.find_fparam() (in addition to dp.get_dim_fparam() > 0) before
assigning fparam so that fparam = test_data["fparam"][:numb_test] only when
dp.find_fparam() is true, otherwise set fparam = None; keep the existing aparam
logic unchanged and update any downstream uses accordingly (refer to
dp.get_dim_fparam, dp.find_fparam, fparam, aparam, and test_data to locate the
code).

In `@deepmd/pd/model/task/invar_fitting.py`:
- Around line 87-134: Add an explicit default_fparam parameter to the Paddle
backend __init__ signature (e.g., default_fparam: Optional[list[float]] = None)
in deepmd/pd/model/task/invar_fitting.py and forward it into the parent
constructor call by adding default_fparam=default_fparam to the
super().__init__(...) invocation; this makes the signature consistent with the
PyTorch backend (the __init__ method and its super().__init__ call are the
unique identifiers to update).

In `@deepmd/pt/model/descriptor/repflow_layer.py`:
- Around line 1223-1263: The serialize() method of RepFlowLayer is missing the
new fields update_use_layernorm, use_gated_mlp, and gated_mlp_norm; update
serialize() to include these keys (e.g. "update_use_layernorm":
self.update_use_layernorm, "use_gated_mlp": self.use_gated_mlp,
"gated_mlp_norm": self.gated_mlp_norm) so deserialization preserves their
values, and bump the "@version" (e.g. increment to 3) to reflect the schema
change; ensure these attribute names match the constructor/instance attributes
used elsewhere in the class.
- Around line 1322-1338: Add proper serialization support and type-dispatch for
GatedMLP: implement serialize() and classmethod deserialize() on the GatedMLP
class so it returns a dict with a unique type tag and its params, and update the
deserialization sites that currently call MLPLayer.deserialize() (specifically
restoring node_edge_linear and edge_angle_linear1/edge_angle_linear2 and any
a_compress_* when use_gated_mlp=True) to inspect the saved type tag (or the flag
use_gated_mlp) and call GatedMLP.deserialize(...) when the tag/flag indicates a
gated MLP, otherwise call MLPLayer.deserialize(...); ensure the produced dict
format from GatedMLP.serialize matches what GatedMLP.deserialize expects so
round-trip works without raising AttributeError.
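The type-tag dispatch can be sketched with stub classes; the real `MLPLayer`/`GatedMLP` carry weights and more config, and the tag key used here is an assumption:

```python
class MLPLayer:
    # Minimal stand-in; the real class serializes weights and params.
    def serialize(self):
        return {"@class": "mlp_layer"}

    @classmethod
    def deserialize(cls, data):
        return cls()

class GatedMLP:
    def serialize(self):
        # A unique type tag lets the loader pick the right class.
        return {"@class": "gated_mlp"}

    @classmethod
    def deserialize(cls, data):
        assert data["@class"] == "gated_mlp"
        return cls()

def restore_layer(data):
    # Dispatch on the saved tag so gated layers round-trip correctly.
    if data.get("@class") == "gated_mlp":
        return GatedMLP.deserialize(data)
    return MLPLayer.deserialize(data)
```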

In `@deepmd/pt/model/descriptor/repflows.py`:
- Around line 258-269: Move the check that validates sel_reduce_factor (>0) to
before any use of it so dynamic_e_sel and dynamic_a_sel aren't computed when
sel_reduce_factor is zero; specifically, in the initializer validate
self.sel_reduce_factor and raise the ValueError if <=0 before computing
self.dynamic_e_sel = self.nnei / self.sel_reduce_factor and self.dynamic_a_sel =
self.a_sel / self.sel_reduce_factor, while keeping the existing check that if
self.use_dynamic_sel and not self.smooth_edge_update raise NotImplementedError.
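The validate-before-divide ordering can be sketched as follows, with a hypothetical free function in place of the initializer:

```python
def compute_dynamic_sel(nnei, a_sel, sel_reduce_factor):
    # Validate before dividing: a non-positive factor should raise a
    # clear ValueError rather than ZeroDivisionError later.
    if sel_reduce_factor <= 0:
        raise ValueError(
            f"sel_reduce_factor must be > 0, got {sel_reduce_factor}"
        )
    return nnei / sel_reduce_factor, a_sel / sel_reduce_factor
```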

---

Nitpick comments:
In `@deepmd/dpmodel/fitting/dos_fitting.py`:
- Around line 49-50: Update the type annotation for the parameter default_fparam
from Optional[list] to Optional[list[float]] in the function signature (the
constructor or initializer in dos_fitting.py that declares default_fparam) so it
matches other fitting classes (InvarFitting, PolarFitting, DipoleFitting); no
runtime changes required—just change the type hint to Optional[list[float]] to
improve precision and IDE/type-checker support.

In `@deepmd/dpmodel/fitting/ener_fitting.py`:
- Around line 49-50: Update the type annotation for the parameter default_fparam
to use Optional[list[float]] instead of Optional[list] for precise typing (match
InvarFitting); locate the function or constructor in ener_fitting.py that
declares default_fparam and change its signature to default_fparam:
Optional[list[float]] = None, and update any related type hints or docstrings in
the same module mentioning default_fparam to reflect list[float].

In `@deepmd/dpmodel/fitting/general_fitting.py`:
- Around line 186-193: The code allows a non-None default_fparam when
numb_fparam == 0 which will be ignored later; update validation in the
constructor (around default_fparam handling) to detect this mismatch and either
raise an assertion or emit a warning: if self.numb_fparam == 0 and
default_fparam is not None and len(self.default_fparam) > 0 then raise an
AssertionError (or log a warning via warnings.warn) indicating that
default_fparam will be ignored, referencing the symbols default_fparam,
numb_fparam, default_fparam_tensor and _call_common so reviewers can see the
relation.
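The warning path can be sketched as a standalone check (the function name is hypothetical; the real check would live in the constructor):

```python
import warnings

def check_default_fparam(numb_fparam, default_fparam):
    # Warn when a default is supplied but numb_fparam == 0, since the
    # value would otherwise be silently ignored downstream.
    if numb_fparam == 0 and default_fparam:
        warnings.warn(
            "default_fparam is ignored when numb_fparam == 0"
        )
```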

In `@deepmd/dpmodel/utils/learning_rate.py`:
- Around line 72-73: Add a single blank line before the method definition "def
value(self, step) -> np.float64:" inside its class to comply with PEP8
method-spacing rules; locate the "value" method in
deepmd/dpmodel/utils/learning_rate.py and insert one blank line immediately
above that def so the method is separated from the preceding code block.

In `@deepmd/pd/model/task/fitting.py`:
- Around line 152-165: The constructor in deepmd/pd/model/task/fitting.py sets
default_fparam but lacks validation: add a check in __init__ to ensure that when
default_fparam is not None and numb_fparam is not None/defined,
len(default_fparam) == numb_fparam; if not, raise a ValueError with a clear
message referencing the mismatch. Mirror the validation logic used in
deepmd/dpmodel/fitting/general_fitting.py (lines around the existing check) so
cross-backend serialization/inference won't fail due to mismatched
default_fparam length.

In `@deepmd/pt/model/network/mlp.py`:
- Around line 350-358: The function find_normalization currently silently
returns None for unrecognized names; change it to validate the lowercase name
against the allowed keys ("batch", "layer", "none") and raise a descriptive
ValueError if the name is unknown (keeping the existing behavior of returning
None when name is None or "none"); reference the find_normalization function and
the mapping for "batch" -> nn.BatchNorm1d(dim) and "layer" -> nn.LayerNorm(dim)
when implementing this check and error message.
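A sketch of the strict lookup, with string placeholders standing in for the `nn.BatchNorm1d`/`nn.LayerNorm` modules so the dispatch logic runs stand-alone:

```python
def find_normalization(name, dim):
    # Placeholders stand in for torch modules; only the validation
    # logic is the point here.
    if name is None:
        return None
    key = name.lower()
    table = {
        "batch": lambda: f"BatchNorm1d({dim})",
        "layer": lambda: f"LayerNorm({dim})",
        "none": lambda: None,
    }
    if key not in table:
        raise ValueError(
            f"unknown normalization {name!r}; expected 'batch', 'layer', or 'none'"
        )
    return table[key]()
```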

In `@deepmd/pt/model/task/dipole.py`:
- Around line 100-101: Change the broad type hint Optional[list] for the
default_fparam parameter to the more precise Optional[list[float]] in the dipole
task signature (the parameter named default_fparam in the function/constructor
in deepmd/pt/model/task/dipole.py); update any other occurrences of
default_fparam type annotations in that file to match and ensure typing imports
still provide Optional (no code logic changes needed).

In `@deepmd/pt/model/task/fitting.py`:
- Around line 746-751: The current bare assert in the block handling
self.numb_fparam > 0 (when fparam is None) can produce an unhelpful
AssertionError if self.default_fparam_tensor is also None; replace the assert
with an explicit check that raises a clear ValueError indicating that
default_fparam must be configured or fparam provided (reference symbols:
self.numb_fparam, fparam, self.default_fparam_tensor), then proceed to create
fparam by tiling self.default_fparam_tensor via unsqueeze and torch.tile as
before.
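The explicit-error-plus-tiling flow can be sketched with plain lists standing in for tensors (the real code uses `unsqueeze` and `torch.tile`); the helper name is hypothetical:

```python
def resolve_fparam(fparam, default_fparam, numb_fparam, nframes):
    # Replace the bare assert with a descriptive error, then tile the
    # configured default across frames.
    if numb_fparam > 0 and fparam is None:
        if default_fparam is None:
            raise ValueError(
                "numb_fparam > 0 but neither fparam nor "
                "default_fparam is configured"
            )
        fparam = [list(default_fparam) for _ in range(nframes)]
    return fparam
```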

In `@deepmd/pt/model/task/invar_fitting.py`:
- Around line 179-195: The forward method currently accepts sw and edge_index
but never uses them; update the method to either consume or explicitly
acknowledge these parameters for interface compatibility by adding a short
inline comment or docstring note in forward() stating that sw and edge_index are
intentionally unused (kept for signature alignment with other fitting nets), and
ensure the implementation still calls self._forward_common(descriptor, atype,
gr, g2, h2, fparam, aparam) and returns {self.var_name:
out.to(env.GLOBAL_PT_FLOAT_PRECISION)} unchanged; reference symbols: forward(),
sw, edge_index, _forward_common, self.var_name, env.GLOBAL_PT_FLOAT_PRECISION.

In `@deepmd/pt/model/task/property.py`:
- Line 95: The parameter default_fparam in the function/class signature
currently uses Optional[list]; change its type annotation to the more specific
Optional[list[float]] to match InvarFitting's annotation (see invar_fitting.py)
so type checkers and readers know it holds floats; locate the default_fparam
declaration in deepmd/pt/model/task/property.py and update its annotation to
Optional[list[float]] accordingly.

In `@deepmd/utils/argcheck.py`:
- Around line 2522-2531: The learning_rate_wsd() function's Argument for
"decay_mode" lacks a doc string; update the Argument("decay_mode", ...) in
learning_rate_wsd to include a clear doc explaining the format and semantics
(e.g., a colon-separated percentage triplet "warmup:stable:decay" that sums to
100 and controls the fraction of training steps in each phase, with the default
"85:10:5" meaning 85% warmup, 10% stable, 5% decay), and mention that values are
interpreted as percentages of total training steps and should be integers
summing to 100.
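Assuming the colon-separated triplet semantics described above, parsing and validating `decay_mode` could look like:

```python
def parse_decay_mode(spec: str) -> list[int]:
    # "warmup:stable:decay" integer percentages of total training
    # steps; they must sum to 100.
    parts = [int(p) for p in spec.split(":")]
    if len(parts) != 3 or sum(parts) != 100:
        raise ValueError(
            f"decay_mode must be three percentages summing to 100, got {spec!r}"
        )
    return parts
```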
- Around line 1686-1703: Add missing documentation for the three new repflow
arguments by supplying a descriptive doc= string for each Argument call:
update_use_layernorm, use_gated_mlp, and gated_mlp_norm in the Argument block.
Follow the pattern used by other arguments in this function (brief one-line
description, expected behavior/defaults and valid values for gated_mlp_norm like
"none"|"pre"|"post" if applicable) so users can understand when to enable
layernorm, what gated MLP toggles, and accepted values for gated_mlp_norm.
- Around line 2701-2754: The new loss arguments use_default_pf, f_use_norm, and
use_l1_all in the loss_ener() argument list are missing doc strings; add
descriptive doc parameters for each following the existing pattern (e.g., define
doc_use_default_pf, doc_f_use_norm, doc_use_l1_all near other doc_* variables
and pass them into the corresponding Argument(...) calls) so each
Argument("use_default_pf", ...), Argument("f_use_norm", ...), and
Argument("use_l1_all", ...) includes a doc=... parameter that clearly explains
the option's behavior and defaults.
- Around line 2958-2970: The two Argument entries for "add_edge_readout" and
"slim_edge_readout" are missing their doc strings; update the Argument(...)
calls for add_edge_readout and slim_edge_readout to include a doc= parameter
that briefly explains what each flag controls (e.g., whether to include an
edge-level readout in the model and whether to use a slim/compact edge readout),
and mention their default behavior (True for add_edge_readout, False for
slim_edge_readout) to match other arguments' style and clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e9a98c2e-8bc9-4fb9-ad37-7f0821389d86

📥 Commits

Reviewing files that changed from the base of the PR and between e97967b and ad98d0d.

📒 Files selected for processing (49)
  • deepmd/dpmodel/atomic_model/base_atomic_model.py
  • deepmd/dpmodel/atomic_model/dp_atomic_model.py
  • deepmd/dpmodel/descriptor/dpa3.py
  • deepmd/dpmodel/descriptor/make_base_descriptor.py
  • deepmd/dpmodel/fitting/dipole_fitting.py
  • deepmd/dpmodel/fitting/dos_fitting.py
  • deepmd/dpmodel/fitting/ener_fitting.py
  • deepmd/dpmodel/fitting/general_fitting.py
  • deepmd/dpmodel/fitting/invar_fitting.py
  • deepmd/dpmodel/fitting/make_base_fitting.py
  • deepmd/dpmodel/fitting/polarizability_fitting.py
  • deepmd/dpmodel/fitting/property_fitting.py
  • deepmd/dpmodel/infer/deep_eval.py
  • deepmd/dpmodel/model/make_model.py
  • deepmd/dpmodel/utils/learning_rate.py
  • deepmd/entrypoints/test.py
  • deepmd/infer/deep_eval.py
  • deepmd/infer/deep_property.py
  • deepmd/jax/fitting/fitting.py
  • deepmd/pd/model/task/ener.py
  • deepmd/pd/model/task/fitting.py
  • deepmd/pd/model/task/invar_fitting.py
  • deepmd/pt/infer/deep_eval.py
  • deepmd/pt/loss/ener.py
  • deepmd/pt/loss/property.py
  • deepmd/pt/model/atomic_model/base_atomic_model.py
  • deepmd/pt/model/atomic_model/dp_atomic_model.py
  • deepmd/pt/model/descriptor/dpa1.py
  • deepmd/pt/model/descriptor/dpa3.py
  • deepmd/pt/model/descriptor/repflow_layer.py
  • deepmd/pt/model/descriptor/repflows.py
  • deepmd/pt/model/model/__init__.py
  • deepmd/pt/model/model/make_model.py
  • deepmd/pt/model/network/mlp.py
  • deepmd/pt/model/task/dipole.py
  • deepmd/pt/model/task/dos.py
  • deepmd/pt/model/task/ener.py
  • deepmd/pt/model/task/fitting.py
  • deepmd/pt/model/task/invar_fitting.py
  • deepmd/pt/model/task/polarizability.py
  • deepmd/pt/model/task/property.py
  • deepmd/pt/train/training.py
  • deepmd/pt/train/wrapper.py
  • deepmd/pt/utils/learning_rate.py
  • deepmd/pt/utils/stat.py
  • deepmd/utils/argcheck.py
  • deepmd/utils/env_mat_stat.py
  • deepmd/utils/path.py
  • source/tests/pt/test_dp_test.py

Comment on lines +180 to +182
update_use_layernorm: bool = False,
use_gated_mlp: bool = False,
gated_mlp_norm: str = "none",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if DescrptBlockRepflows accepts these parameters
ast-grep --pattern $'class DescrptBlockRepflows {
  $$$
  def __init__($$$) {
    $$$
  }
  $$$
}'

# Alternative: search for the constructor signature
rg -n "def __init__" deepmd/dpmodel/descriptor/repflows.py -A 30 | head -50

Repository: deepmodeling/deepmd-kit

Length of output: 1725


🏁 Script executed:

# Get the full RepFlowArgs class definition including __init__, serialize, and DescrptBlockRepflows call
sed -n '170,370p' deepmd/dpmodel/descriptor/dpa3.py

Repository: deepmodeling/deepmd-kit

Length of output: 8519


🏁 Script executed:

# Get the full DescrptBlockRepflows.__init__ signature
rg -n "class DescrptBlockRepflows" deepmd/dpmodel/descriptor/repflows.py -A 200 | head -100

Repository: deepmodeling/deepmd-kit

Length of output: 6378


🏁 Script executed:

# Get the complete __init__ signature of DescrptBlockRepflows
sed -n '172,250p' deepmd/dpmodel/descriptor/repflows.py

Repository: deepmodeling/deepmd-kit

Length of output: 3029


Missing serialization and forwarding of three new RepFlowArgs parameters.

The three new parameters (update_use_layernorm, use_gated_mlp, gated_mlp_norm) are stored as instance attributes but:

  1. Not serialized: RepFlowArgs.serialize() omits these parameters. Models will lose them when saved/loaded.

  2. Not forwarded: These parameters are not passed to DescrptBlockRepflows, and DescrptBlockRepflows doesn't accept them in its constructor. This indicates incomplete implementation—either they should be added to DescrptBlockRepflows or they should not be in RepFlowArgs yet.

🔧 Proposed fix for serialization

Add the missing parameters to serialize():

             "use_dynamic_sel": self.use_dynamic_sel,
             "sel_reduce_factor": self.sel_reduce_factor,
+            "update_use_layernorm": self.update_use_layernorm,
+            "use_gated_mlp": self.use_gated_mlp,
+            "gated_mlp_norm": self.gated_mlp_norm,
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/descriptor/dpa3.py` around lines 180 - 182, RepFlowArgs now
has three new fields (update_use_layernorm, use_gated_mlp, gated_mlp_norm) but
they are not included in RepFlowArgs.serialize() nor forwarded to
DescrptBlockRepflows; update RepFlowArgs.serialize() to include these three keys
so they persist, and then modify the call site where a RepFlowArgs instance is
passed to construct DescrptBlockRepflows to forward these values (or update
DescrptBlockRepflows.__init__ to accept and store them), ensuring the
constructor/attribute names match (update_use_layernorm, use_gated_mlp,
gated_mlp_norm) so saved models round-trip correctly.

Comment on lines +80 to +86
decay_rate = (self.start_lr - self.stop_lr) / (
self.decay_end_rate * self.stop_steps
- self.decay_start_rate * self.stop_steps
)
return self.start_lr - decay_rate * (
step - self.decay_start_rate * self.stop_steps
)

⚠️ Potential issue | 🟡 Minor

Potential division by zero when decay phase duration is zero.

If decay_mode[1] == 0 (e.g., "90:0:10"), then decay_end_rate == decay_start_rate, causing division by zero at line 80-83. The boundary checks at lines 74-77 might not catch all steps in this edge case.

🛡️ Proposed fix to handle zero-duration decay
     def value(self, step) -> np.float64:
         if step < self.decay_start_rate * self.stop_steps:
             return self.start_lr
         elif step >= self.decay_end_rate * self.stop_steps:
             return self.stop_lr
+        elif self.decay_start_rate == self.decay_end_rate:
+            # No decay phase, should not reach here but handle gracefully
+            return self.start_lr
         else:
             # linear decay
             decay_rate = (self.start_lr - self.stop_lr) / (
                 self.decay_end_rate * self.stop_steps
                 - self.decay_start_rate * self.stop_steps
             )
             return self.start_lr - decay_rate * (
                 step - self.decay_start_rate * self.stop_steps
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/utils/learning_rate.py` around lines 80 - 86, Detect the
zero-duration decay case before computing decay_rate: if self.decay_end_rate ==
self.decay_start_rate (i.e., decay phase length is zero) avoid the division and
return an instantaneous value—return self.start_lr when step <=
self.decay_start_rate * self.stop_steps else return self.stop_lr; update the
block that computes decay_rate and the return to short-circuit on this equality
to prevent division by zero in decay_rate and ensure correct boundary behavior
for step.

Comment on lines +142 to +150
# --- softmax-weighted averaging over frames (minimal) ---
print(f"Nframes == {nframes}")
if nframes != 3:
raise RuntimeError(f"Expected nframes == 3, got {nframes}")
scores = property.mean(axis=1) # (3,)
# If you want to favor *smaller* values (e.g., energies), use: scores = -scores
w = np.exp(scores - scores.max()); w /= w.sum() # (3,)
avg = (w[:, None] * property).sum(axis=0, keepdims=True) # (1, D)
property[:] = np.repeat(avg, nframes, axis=0) # (3, D)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

ruff check deepmd/infer/deep_property.py --select F823,E702 || true
nl -ba deepmd/infer/deep_property.py | sed -n '139,155p'

Repository: deepmodeling/deepmd-kit

Length of output: 1517


🏁 Script executed:

cat -n deepmd/infer/deep_property.py | sed -n '139,155p'

Repository: deepmodeling/deepmd-kit

Length of output: 1077


🏁 Script executed:

cat -n deepmd/infer/deep_property.py | sed -n '130,165p'

Repository: deepmodeling/deepmd-kit

Length of output: 1613


This block currently makes DeepProperty.eval() fail for every caller.

For non-3-frame inputs, line 145 always raises; for 3-frame inputs, line 146 attempts to read property before its first assignment at line 153, causing a NameError. Even if that reference were fixed, lines 153-155 would immediately overwrite any weighted result computed here. Remove this entire block or restructure to compute property once before the weighting logic. The code also violates Ruff checks (F823: local variable referenced before assignment at line 146; E702: multiple statements on one line at line 148). Per coding guidelines, always run ruff check . and ruff format . before committing changes.

🛠️ Minimal fix
-        # --- softmax-weighted averaging over frames (minimal) ---
-        print(f"Nframes == {nframes}")
-        if nframes != 3:
-            raise RuntimeError(f"Expected nframes == 3, got {nframes}")
-        scores = property.mean(axis=1)                           # (3,)
-        # If you want to favor *smaller* values (e.g., energies), use: scores = -scores
-        w = np.exp(scores - scores.max()); w /= w.sum()          # (3,)
-        avg = (w[:, None] * property).sum(axis=0, keepdims=True) # (1, D)
-        property[:] = np.repeat(avg, nframes, axis=0)            # (3, D)
-        # --------------------------------------------------------
-
         property = results[f"{self.get_var_name()}_redu"].reshape(
             nframes, self.get_task_dim()
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# --- softmax-weighted averaging over frames (minimal) ---
print(f"Nframes == {nframes}")
if nframes != 3:
raise RuntimeError(f"Expected nframes == 3, got {nframes}")
scores = property.mean(axis=1) # (3,)
# If you want to favor *smaller* values (e.g., energies), use: scores = -scores
w = np.exp(scores - scores.max()); w /= w.sum() # (3,)
avg = (w[:, None] * property).sum(axis=0, keepdims=True) # (1, D)
property[:] = np.repeat(avg, nframes, axis=0) # (3, D)
property = results[f"{self.get_var_name()}_redu"].reshape(
nframes, self.get_task_dim()
)
🧰 Tools
🪛 Ruff (0.15.6)

[error] 146-146: Local variable property referenced before assignment

(F823)


[error] 148-148: Multiple statements on one line (semicolon)

(E702)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/infer/deep_property.py` around lines 142 - 150, The softmax-weighted
averaging block in DeepProperty.eval() wrongly assumes nframes==3, references
the local name property before it's assigned, and uses multiple statements per
line; remove or refactor this block so property is computed exactly once and
then weighted: compute the per-frame values into a new local (e.g., prop_vals)
before any conditional, replace the nframes check with a guard that handles
arbitrary frame counts (or early-return if unsupported), perform the softmax
weighting using separate statements (no nondescript inline semicolons) and
assign the final weighted result back to property once (avoiding overwriting
earlier results), and then run ruff check/format to remove F823/E702 violations;
look for DeepProperty.eval, the variables nframes, scores, w, avg and property
in the shown diff to locate the code to change.

Comment on lines +98 to 102
default_fparam: list[float], optional
The default frame parameter. If set, when `fparam.npy` files are not included in the data system,
this value will be used as the default value for the frame parameter in the fitting net.
This parameter is not supported in PaddlePaddle.
dim_case_embd : int

⚠️ Potential issue | 🟡 Minor

Docstring claims feature is "not supported" but code stores and serializes it.

The docstring states "This parameter is not supported in PaddlePaddle" yet the code stores default_fparam on the instance and includes it in serialization. This is inconsistent:

  1. If truly unsupported, why store/serialize it? This could confuse users.
  2. If intended for serialization compatibility only, the docstring should clarify this.
  3. Unlike deepmd/dpmodel/fitting/general_fitting.py, this implementation lacks runtime usage in _forward_common to fill missing fparam.

Consider either:

  • Implementing full support (using default when fparam is None in _forward_common)
  • Clarifying the docstring to explain this is for serialization interoperability only
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pd/model/task/fitting.py` around lines 98 - 102, The docstring
incorrectly says default_fparam is "not supported" while the class stores and
serializes it; fix by making runtime behavior consistent: update _forward_common
to detect missing/None fparam and substitute self.default_fparam (or a
tensor/array converted from it) before further processing, ensure the instance
attribute default_fparam remains used for serialization, and remove or update
the "not supported" phrase in the docstring of the class so it reflects actual
runtime support; reference symbols: default_fparam and _forward_common.

Comment on lines +56 to +61
inference: bool = False,
use_huber: bool = False,
use_default_pf: bool = False,
f_use_norm: bool = False,
huber_delta: float = 0.01,
**kwargs: Any,

⚠️ Potential issue | 🟠 Major

The new loss flags also need to round-trip through serialization.

use_default_pf and f_use_norm now change loss semantics, but they are not part of the persisted loss state. Any round-trip through serialize() / deserialize() will silently fall back to False and change behavior.

Also applies to: 137-153

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/loss/ener.py` around lines 56 - 61, The new boolean flags
use_default_pf and f_use_norm (and any other loss-semantics flags like
use_huber/huber_delta if relevant) must be added to the object's persisted state
so they survive serialize()/deserialize() round-trips: update the class
constructor parameters (the signature that includes inference, use_huber,
use_default_pf, f_use_norm, huber_delta) and the serialize() method to include
these flags in the returned state, and update deserialize()/from_state() to read
and restore them into the instance; ensure the same keys are used in both
serialize and deserialize so restored instances preserve loss behavior (also
mirror this change where similar state handling occurs around the other block
mentioned at lines ~137-153).

Comment on lines +333 to +348
if self.add_edge_readout:
self.edge_embed = NetworkCollection(
1 if not self.mixed_types else 0,
self.ntypes,
network_type="fitting_network",
networks=[
FittingNet(
self.embedding_width,
1,
self.neuron if not self.slim_edge_readout else self.neuron[:1],
self.activation_function,
self.resnet_dt,
self.precision,
bias_out=True,
seed=child_seed(self.seed + 100, ii),
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "ener.py" | grep -E "deepmd/pt/model/task"

Repository: deepmodeling/deepmd-kit

Length of output: 97


🏁 Script executed:

cd deepmd/pt/model/task && wc -l ener.py

Repository: deepmodeling/deepmd-kit

Length of output: 78


🏁 Script executed:

head -400 deepmd/pt/model/task/ener.py | tail -100

Repository: deepmodeling/deepmd-kit

Length of output: 3436


🏁 Script executed:

rg "def __init__" deepmd/pt/model/task/ener.py -A 50 | head -80

Repository: deepmodeling/deepmd-kit

Length of output: 2522


🏁 Script executed:

rg "self.seed" deepmd/pt/model/task/ener.py -B 2 -A 2

Repository: deepmodeling/deepmd-kit

Length of output: 309


🏁 Script executed:

rg "def child_seed" deepmd/pt/model/task/ener.py -B 2 -A 10

Repository: deepmodeling/deepmd-kit

Length of output: 49


🏁 Script executed:

rg "def child_seed" . --type py -A 10

Repository: deepmodeling/deepmd-kit

Length of output: 1020


🏁 Script executed:

cat deepmd/dpmodel/utils/seed.py

Repository: deepmodeling/deepmd-kit

Length of output: 987


🏁 Script executed:

rg "class EnergyFittingNetReadout" deepmd/pt/model/task/ener.py -A 100 | head -120

Repository: deepmodeling/deepmd-kit

Length of output: 3407


🏁 Script executed:

rg "self\.seed" deepmd/pt/model/task/ener.py -B 3 -A 3

Repository: deepmodeling/deepmd-kit

Length of output: 368


🏁 Script executed:

rg "class InvarFitting" deepmd/pt/model/ --type py -A 5

Repository: deepmodeling/deepmd-kit

Length of output: 423


🏁 Script executed:

fd "invariant.py" deepmd/pt/model/

Repository: deepmodeling/deepmd-kit

Length of output: 49


🏁 Script executed:

rg "self.seed" deepmd/pt/model/invariant.py -B 2 -A 2

Repository: deepmodeling/deepmd-kit

Length of output: 135


🏁 Script executed:

cat deepmd/pt/model/task/invar_fitting.py | head -150

Repository: deepmodeling/deepmd-kit

Length of output: 4910


🏁 Script executed:

rg "self\.seed\s*=" deepmd/pt/model/ --type py -B 2 -A 2

Repository: deepmodeling/deepmd-kit

Length of output: 5294


🏁 Script executed:

rg "class GeneralFitting" deepmd/pt/model/ --type py -A 50 | head -80

Repository: deepmodeling/deepmd-kit

Length of output: 3873


🏁 Script executed:

cd deepmd/pt/model/task && python -c "
# Test what happens with seed=None
seed = None
try:
    result = seed + 100
    print(f'seed + 100 = {result}')
except TypeError as e:
    print(f'seed=None: TypeError - {e}')

# Test what happens with seed=[1, 2]
seed = [1, 2]
try:
    result = seed + 100
    print(f'seed + 100 = {result}')
except TypeError as e:
    print(f'seed=list: TypeError - {e}')

# Test what happens with seed=42
seed = 42
try:
    result = seed + 100
    print(f'seed + 100 = {result}')
except TypeError as e:
    print(f'seed=int: TypeError - {e}')
"

Repository: deepmodeling/deepmd-kit

Length of output: 231


🏁 Script executed:

cd deepmd && python -c "
from dpmodel.utils.seed import child_seed

# Test the suggested fix approach
print('Testing suggested fix: child_seed(child_seed(seed, 100), ii)')
print()

# Case 1: seed=None
seed = None
print(f'Case 1: seed=None')
first_call = child_seed(seed, 100)
print(f'  child_seed(None, 100) = {first_call}')
second_call = child_seed(first_call, 0)
print(f'  child_seed({first_call}, 0) = {second_call}')
print()

# Case 2: seed=42 (int)
seed = 42
print(f'Case 2: seed=42')
first_call = child_seed(seed, 100)
print(f'  child_seed(42, 100) = {first_call}')
second_call = child_seed(first_call, 0)
print(f'  child_seed({first_call}, 0) = {second_call}')
print()

# Case 3: seed=[1, 2] (list)
seed = [1, 2]
print(f'Case 3: seed=[1, 2]')
first_call = child_seed(seed, 100)
print(f'  child_seed([1, 2], 100) = {first_call}')
second_call = child_seed(first_call, 0)
print(f'  child_seed({first_call}, 0) = {second_call}')
"

Repository: deepmodeling/deepmd-kit

Length of output: 305


🏁 Script executed:

cd deepmd && ruff check pt/model/task/ener.py --select=E,W,F

Repository: deepmodeling/deepmd-kit

Length of output: 805


🏁 Script executed:

head -50 deepmd/pt/model/task/ener.py | grep -E "^import|^from"

Repository: deepmodeling/deepmd-kit

Length of output: 526


🏁 Script executed:

grep "child_seed" deepmd/pt/model/task/ener.py

Repository: deepmodeling/deepmd-kit

Length of output: 144


Fix the seed handling to support None and list-based seeds.

The code seed=child_seed(self.seed + 100, ii) fails for the default seed=None and for seed as a list, raising TypeError before the edge readout networks are constructed. The public signature allows seed: Optional[Union[int, list[int]]] = None, but only the integer case works with the current implementation.

Suggested fix
-                        seed=child_seed(self.seed + 100, ii),
+                        seed=child_seed(child_seed(self.seed, 100), ii),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/model/task/ener.py` around lines 333 - 348, The seed arithmetic
fails when self.seed is None or a list; fix by normalizing self.seed before
calling child_seed: compute a local base_seed = self.seed (if list, pick
base_seed = self.seed[ii]); if isinstance(base_seed, int) add 100 to it,
otherwise leave None unchanged; then call child_seed(base_seed, ii) when
constructing edge_embed (NetworkCollection / FittingNet) so child_seed only ever
receives an int or None, not a list or an expression that adds to None.
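The normalization described in the prompt can be illustrated with a small hypothetical helper (illustration only; the real child_seed lives in deepmd/dpmodel/utils/seed.py and already accepts int, list, or None):

```python
# Hypothetical helper (not the real child_seed): never evaluate `None + offset`
# or `list + offset`; normalize the seed to an int (or None) first.
def normalize_seed(seed, offset):
    """Return an int (or None) that is safe to pass on as a child seed."""
    if seed is None:
        return None  # keep "unseeded" semantics
    if isinstance(seed, (list, tuple)):
        # Mix the list entries into a single int before applying the offset.
        mixed = 0
        for s in seed:
            mixed = (mixed * 1000003 + int(s)) & 0xFFFFFFFF
        return mixed + offset
    return int(seed) + offset
```

Nesting child_seed calls, as in the suggested fix, achieves the same effect because child_seed itself already handles all three input kinds.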

Comment on lines +544 to +568
old_model_params = self.wrapper.state_dict()["_extra_state"][
    "model_params"
]
try:
    self.wrapper.load_state_dict(state_dict)
except RuntimeError as e:
    # init from direct fitting
    rm_list = []
    for kk in state_dict:
        # delete direct heads
        if (
            "fitting_net.force_embed." in kk
            or "fitting_net.noise_embed" in kk
        ):
            rm_list.append(kk)
    for kk in rm_list:
        state_dict.pop(kk)
    state_dict["_extra_state"]["model_params"] = old_model_params
    out_shape_list = [
        "model.Default.atomic_model.out_bias",
        "model.Default.atomic_model.out_std",
    ]
    for kk in out_shape_list:
        state_dict[kk] = state_dict[kk][:1, :, :1]
    self.wrapper.load_state_dict(state_dict)

⚠️ Potential issue | 🟠 Major

The checkpoint fallback is too shape-specific.

This recovery path catches any load_state_dict() RuntimeError, then hard-codes model.Default.atomic_model.out_bias/out_std and truncates them to [:1, :, :1]. That only matches one branch/output shape; multitask checkpoints or multi-output heads will either miss the right keys or silently load corrupted statistics.
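A less shape-specific variant could match the statistics keys by suffix and trim each tensor to a per-key target shape instead of the fixed [:1, :, :1]. The sketch below is only an illustration (trim_stat_tensors and the target_shapes mapping are hypothetical, and the real state dict holds torch tensors rather than numpy arrays):

```python
# Hypothetical shape-agnostic trim (sketch only): truncate out_bias/out_std
# entries to whatever shape the target model head actually expects.
def trim_stat_tensors(state_dict, target_shapes):
    """Trim statistics tensors in-place to the per-key target shapes."""
    for key in list(state_dict):
        if key.endswith((".out_bias", ".out_std")) and key in target_shapes:
            # Build one slice per dimension, e.g. (1, 2, 1) -> [:1, :2, :1].
            slices = tuple(slice(0, n) for n in target_shapes[key])
            state_dict[key] = state_dict[key][slices]
    return state_dict
```

Driving the truncation from the target model's declared shapes would cover multitask branches and multi-output heads instead of assuming a single energy head.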

Comment on lines 1165 to +1168
 for item_key in batch_data:
     if item_key in input_keys:
-        input_dict[item_key] = batch_data[item_key]
+        if item_key != "fparam" or batch_data["find_fparam"] != 0.0:
+            input_dict[item_key] = batch_data[item_key]

⚠️ Potential issue | 🟠 Major

Guard find_fparam lookups for batches that only carry fparam.

This new gate unconditionally indexes batch_data["find_fparam"] whenever fparam is present. Other paths already treat find_fparam as optional, so data loaders that supply fparam without a find_fparam flag can now raise a KeyError.

Suggested fix
             if item_key in input_keys:
-                if item_key != "fparam" or batch_data["find_fparam"] != 0.0:
+                if item_key != "fparam" or batch_data.get("find_fparam", 1.0) != 0.0:
                     input_dict[item_key] = batch_data[item_key]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 for item_key in batch_data:
     if item_key in input_keys:
-        if item_key != "fparam" or batch_data["find_fparam"] != 0.0:
+        if item_key != "fparam" or batch_data.get("find_fparam", 1.0) != 0.0:
             input_dict[item_key] = batch_data[item_key]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/train/training.py` around lines 1165 - 1168, The conditional that
checks find_fparam currently assumes batch_data["find_fparam"] exists when
"fparam" is present; update the guard in the loop over input_keys (the block
using item_key, input_keys, batch_data, input_dict) to avoid KeyError by
checking for the key first or using a safe lookup (e.g.,
batch_data.get("find_fparam", 0.0)) so the condition becomes: if item_key !=
"fparam" or batch_data.get("find_fparam", 0.0) != 0.0, then assign
input_dict[item_key] = batch_data[item_key].
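The difference between the two lookups can be seen in a tiny standalone sketch (gate_fparam, the key names, and the 1.0 default mirror the suggestion above and are illustrative only):

```python
# Illustrative gate (hypothetical helper): a batch that carries only `fparam`,
# with no `find_fparam` flag, should still pass the fparam through.
def gate_fparam(batch, input_keys=("coord", "fparam")):
    """Copy allowed keys, skipping fparam only when find_fparam is explicitly 0.0."""
    out = {}
    for key in batch:
        if key in input_keys:
            if key != "fparam" or batch.get("find_fparam", 1.0) != 0.0:
                out[key] = batch[key]
    return out
```

With dict.get, the absence of find_fparam is treated as "fparam is present and wanted", while an explicit 0.0 still drops it.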

Comment on lines +139 to 149
+    frac_prob = (
+        model_key_prob_map[model_key_link]
+        / model_key_prob_map[model_key_base]
+    )
     link_class.share_params(
-        base_class, shared_level_link, resume=resume
+        base_class,
+        shared_level_link,
+        model_prob=frac_prob,
+        protection=data_stat_protect,
+        resume=resume,
     )

⚠️ Potential issue | 🟡 Minor

Potential division by zero when computing frac_prob.

If model_key_prob_map[model_key_base] is zero, this will raise a ZeroDivisionError. While the base model probability is likely always positive in practice, defensive coding would prevent unexpected failures.

🛡️ Proposed fix to add protection
                         frac_prob = (
                             model_key_prob_map[model_key_link]
                             / model_key_prob_map[model_key_base]
                         )
+                        if model_key_prob_map[model_key_base] == 0:
+                            raise ValueError(
+                                f"Base model '{model_key_base}' has zero probability in model_key_prob_map"
+                            )

Or preemptively check before the division:

+                        base_prob = model_key_prob_map[model_key_base]
+                        if base_prob == 0:
+                            raise ValueError(
+                                f"Base model '{model_key_base}' has zero probability"
+                            )
                         frac_prob = (
                             model_key_prob_map[model_key_link]
-                            / model_key_prob_map[model_key_base]
+                            / base_prob
                         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    frac_prob = (
-        model_key_prob_map[model_key_link]
-        / model_key_prob_map[model_key_base]
-    )
+    base_prob = model_key_prob_map[model_key_base]
+    if base_prob == 0:
+        raise ValueError(
+            f"Base model '{model_key_base}' has zero probability"
+        )
+    frac_prob = (
+        model_key_prob_map[model_key_link]
+        / base_prob
+    )
     link_class.share_params(
         base_class,
         shared_level_link,
         model_prob=frac_prob,
         protection=data_stat_protect,
         resume=resume,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/train/wrapper.py` around lines 139 - 149, The computation of
frac_prob may divide by zero when model_key_prob_map[model_key_base] is zero;
before computing frac_prob, check the denominator
(model_key_prob_map[model_key_base]) and handle the zero case defensively (e.g.,
set frac_prob to 0 or a small epsilon, or log/raise a clear exception) and then
call link_class.share_params(base_class, shared_level_link,
model_prob=frac_prob, protection=data_stat_protect, resume=resume); ensure you
reference the same symbols model_key_prob_map, model_key_link, model_key_base,
frac_prob and link_class.share_params when making the change.
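A minimal standalone illustration of the defensive check (the function name and probability values are made up; only the guard pattern matters):

```python
# Hypothetical helper (sketch only): fail loudly on a zero base probability
# instead of letting a ZeroDivisionError surface from deep inside share_params.
def compute_frac_prob(prob_map, link_key, base_key):
    """Return the link/base probability ratio with a defensive zero check."""
    base_prob = prob_map[base_key]
    if base_prob == 0:
        raise ValueError(f"Base model '{base_key}' has zero probability")
    return prob_map[link_key] / base_prob
```

Raising a named error makes the misconfiguration (a zero-probability base branch in model_prob) obvious at the call site.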

Comment on lines +2877 to +2907
@fitting_args_plugin.register("ener_readout")
def fitting_ener_readout():
    doc_numb_fparam = "The dimension of the frame parameter. If set to >0, file `fparam.npy` should be included to provide the input fparams."
    doc_numb_aparam = "The dimension of the atomic parameter. If set to >0, file `aparam.npy` should be included to provide the input aparams."
    doc_dim_case_embd = "The dimension of the case embedding. When training or fine-tuning a multitask model with case embeddings, this number should be set to the number of model branches."
    doc_neuron = "The number of neurons in each hidden layer of the fitting net. When two hidden layers are of the same size, a skip connection is built."
    doc_activation_function = f'The activation function in the fitting net. Supported activation functions are {list_to_doc(ACTIVATION_FN_DICT.keys())} Note that "gelu" denotes the custom operator version, and "gelu_tf" denotes the TF standard version. If you set "None" or "none" here, no activation function will be used.'
    doc_precision = f"The precision of the fitting net parameters, supported options are {list_to_doc(PRECISION_DICT.keys())} Default follows the interface precision."
    doc_resnet_dt = 'Whether to use a "Timestep" in the skip connection'
    doc_trainable = f"Whether the parameters in the fitting net are trainable. This option can be\n\n\
- bool: True if all parameters of the fitting net are trainable, False otherwise.\n\n\
- list of bool{doc_only_tf_supported}: Specifies if each layer is trainable. Since the fitting net is composed of hidden layers followed by an output layer, the length of this list should be equal to len(`neuron`)+1."
    doc_rcond = "The condition number used to determine the initial energy shift for each type of atoms. See `rcond` in :py:meth:`numpy.linalg.lstsq` for more details."
    doc_seed = "Random seed for parameter initialization of the fitting net"
    doc_atom_ener = "Specify the atomic energy in vacuum for each type"
    doc_layer_name = (
        "The name of each layer. The length of this list should be equal to n_neuron + 1. "
        "If two layers, either in the same fitting or different fittings, "
        "have the same name, they will share the same neural network parameters. "
        "The shape of these layers should be the same. "
        "If null is given for a layer, parameters will not be shared."
    )
    doc_use_aparam_as_mask = (
        "Whether to use the aparam as a mask in input. "
        "If True, the aparam will not be used in the fitting net for embedding. "
        "When descrpt is se_a_mask, the aparam will be used as a mask to indicate whether the input atom is real/virtual. And use_aparam_as_mask should be set to True."
    )

    return [
        Argument("numb_fparam", int, optional=True, default=0, doc=doc_numb_fparam),
        Argument("numb_aparam", int, optional=True, default=0, doc=doc_numb_aparam),

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if EnergyFittingNetReadout implementation supports fparam/default_fparam
rg -n "class.*EnergyFittingNetReadout|default_fparam" --type py -A 5 -B 2

Repository: deepmodeling/deepmd-kit

Length of output: 50379


🏁 Script executed:

#!/bin/bash
# View the complete fitting_ener_readout function to see all arguments
sed -n '2877,2950p' deepmd/utils/argcheck.py

Repository: deepmodeling/deepmd-kit

Length of output: 4348


🏁 Script executed:

#!/bin/bash
# View lines beyond 2950 to see if there are more arguments
sed -n '2940,3000p' deepmd/utils/argcheck.py

Repository: deepmodeling/deepmd-kit

Length of output: 1974


🏁 Script executed:

#!/bin/bash
# View the EnergyFittingNetReadout class definition and its __init__ method
sed -n '275,330p' deepmd/pt/model/task/ener.py

Repository: deepmodeling/deepmd-kit

Length of output: 1961


🏁 Script executed:

#!/bin/bash
# Search for InvarFitting class definition and its __init__
rg -n "class InvarFitting" deepmd/pt/model/task/ -A 50 | head -80

Repository: deepmodeling/deepmd-kit

Length of output: 4279


🏁 Script executed:

#!/bin/bash
# View the InvarFitting __init__ method signature
sed -n '88,140p' deepmd/pt/model/task/invar_fitting.py

Repository: deepmodeling/deepmd-kit

Length of output: 1891


Add default_fparam argument to fitting_ener_readout() for consistency.

The fitting_ener_readout() function is missing the default_fparam argument that is present in all other fitting functions (fitting_ener, fitting_dos, fitting_property, fitting_polar, fitting_dipole). Since EnergyFittingNetReadout extends InvarFitting, which supports default_fparam, and the function already defines numb_fparam, the default_fparam argument should be added to maintain consistency.

Additionally, the EnergyFittingNetReadout.__init__() signature needs to include the default_fparam parameter and pass it to the parent class.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/utils/argcheck.py` around lines 2877 - 2907, The function
fitting_ener_readout() is missing the default_fparam argument; add an
Argument("default_fparam", type=appropriate, optional=True, default=None,
doc=...) alongside numb_fparam/numb_aparam in fitting_ener_readout() so its
signature matches other fitting_* functions, and then update the
EnergyFittingNetReadout.__init__() to accept default_fparam and pass it through
to the parent InvarFitting (or to super().__init__(...,
default_fparam=default_fparam)) so the readout class forwards the value to the
base class.
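The forwarding half of the fix can be sketched with simplified stand-in classes (hypothetical, not the real InvarFitting/EnergyFittingNetReadout signatures):

```python
# Simplified stand-ins (sketch only): the point is that the readout subclass
# accepts default_fparam and forwards it to the parent rather than dropping it.
class InvarFittingSketch:
    def __init__(self, numb_fparam=0, default_fparam=None):
        self.numb_fparam = numb_fparam
        self.default_fparam = default_fparam

class EnergyReadoutSketch(InvarFittingSketch):
    def __init__(self, numb_fparam=0, default_fparam=None, **kwargs):
        # Forward the new argument instead of silently losing it.
        super().__init__(numb_fparam=numb_fparam, default_fparam=default_fparam)
```

On the argcheck side, the matching Argument("default_fparam", ...) entry would then mirror what fitting_ener and the other fitting_* functions already register.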
