
test(pt_expt): add .pt2 (AOTInductor) unit tests and bug fixes #5334

Open
wanghan-iapcm wants to merge 48 commits into deepmodeling:master from wanghan-iapcm:feat-pt-expt-pt2-uts

Conversation

@wanghan-iapcm
Collaborator

@wanghan-iapcm wanghan-iapcm commented Mar 23, 2026

Summary

Comprehensive .pt2 test coverage for the pt_expt backend, plus bug fixes discovered during testing.

Bug fixes

  • Default fparam segfault: When fparam is omitted on a model with dim_fparam > 0, the C++ and Python inference paths now substitute default values from metadata instead of passing a shape-mismatched tensor (which caused a segfault in AOTInductor)
  • NoPBC negative coordinates: Shift coordinates so minimum is at rcut before building the fake box, ensuring atoms with negative coordinates are properly handled by copy_coord/build_nlist
  • CodeQL integer overflow: Use ptrdiff_t for all operands in fold_back iterator arithmetic
  • Custom op fallback: Add missing tabulate_fusion_se_t_tebd Python fallback stub (all other tabulate ops had one)
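The default-fparam substitution in the first bug fix can be sketched in plain Python; `resolve_fparam`, the metadata keys, and the list-based broadcast below are illustrative stand-ins, not the actual deepmd API:

```python
def resolve_fparam(fparam, metadata, nframes):
    """Return an fparam of shape (nframes, dim_fparam), substituting the
    model's stored default when the caller omits it (illustrative sketch)."""
    dim_fparam = metadata["dim_fparam"]
    if fparam is not None:
        return fparam
    if dim_fparam == 0:
        return None  # model takes no frame parameter
    default = metadata.get("default_fparam")
    if default is None:
        # previously a shape-mismatched tensor reached AOTInductor here
        # and segfaulted; failing early gives a clear error instead
        raise ValueError("model requires fparam but no default is stored")
    # broadcast the per-model default to every frame
    return [list(default) for _ in range(nframes)]

meta = {"dim_fparam": 2, "default_fparam": [0.25, 0.75]}
print(resolve_fparam(None, meta, nframes=3))
```

Evaluating without fparam then matches evaluating with the explicit default, which is exactly what the new tests assert.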

New Python tests

  • Freeze: .pt2 freeze, .pte vs .pt2 eval consistency, NoPBC with negative coords
  • Default fparam: Eval without fparam matches eval with explicit default value
  • Compress: Freeze → compress → .pt2 output → DeepPot eval, min_nbor_dist roundtrip
  • Finetune: Train → freeze to .pt2 → finetune from .pt2, with --use-pretrain-script
  • Change-bias: Change-bias on .pt2 with data, user-defined values, .pte/.pt2 consistency

New C++ tests

  • NoPBC atomic: Verify atomic energy/virial for NoPBC path
  • Multi-frame: 2-frame PBC and NoPBC via compute_mixed_type
  • Default fparam: Internal nlist + external nlist (LAMMPS path) — eval without fparam matches eval with explicit default
  • Parser/metadata: Invalid ZIP/file errors, metadata accessors (type_map, cutoff, ntypes, dim_fparam/aparam, has_default_fparam), JSON type coverage across 3 models

Refactoring

  • Extract common gen script helpers into gen_common.py (eliminates duplication across 5 scripts)
  • Use CXX env var instead of hardcoded compiler fallback list
  • Simplify custom op .so install in test_cc_local.sh

Test plan

  • All Python tests pass locally (CPU)
  • All C++ tests pass locally (CPU)
  • Python tests pass on GPU (freeze, finetune, change-bias)
  • Compression tests skip gracefully when custom ops unavailable

Summary by CodeRabbit

  • New Features

    • Models can include a default frame parameter used automatically during inference when none is provided.
    • Exported metadata now records default-frame-parameter data for frozen PT2 artifacts.
  • Bug Fixes

    • Inference now raises a clear error if a PT2 model requires a default frame parameter but none is available.
  • Tests

    • Expanded coverage for default-frame-parameter behavior, multi-frame/PT2 inference, per-atom outputs, freeze/load/eval, change-bias, finetune-from-PT2, and compression round-trips.

Han Wang added 30 commits March 21, 2026 21:28
- add support for serialization to .pt2
- deep eval with .pt2
- C/C++ inference api with .pt2
- add ut for python inference
- add ut for c++/c inference: se_e2_a, fparam/aparam, multi-frames, dpa1
Replace flat (nf*nall,) indexing in dpa1.py and exclude_mask.py with
xp_take_along_axis and xp_take_first_n. The flat indexing creates
Ne(nall, nloc) assertions during torch.export tracing that fail when
nall == nloc (NoPbc). The new gather-based approach avoids these
shape constraints.
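The difference can be illustrated with pure-Python stand-ins for the two indexing styles (xp_take_along_axis operates on array-API tensors; these list-based helpers are illustrative only):

```python
def take_along_axis1(a, idx):
    """Per-row gather: out[i][j] = a[i][idx[i][j]] (2-D lists)."""
    return [[row[j] for j in ind] for row, ind in zip(a, idx)]

def flat_gather(a, idx):
    """Flat indexing: flatten to (nf*nall,) and offset each row by i*nall.
    This bakes an nall-vs-nloc shape relation into the trace, which
    torch.export turns into an Ne(nall, nloc) guard that fails when
    nall == nloc (the NoPBC case)."""
    nall = len(a[0])
    flat = [x for row in a for x in row]
    return [[flat[i * nall + j] for j in ind] for i, ind in enumerate(idx)]

a = [[10, 11, 12], [20, 21, 22]]
idx = [[2, 0], [1, 1]]
# both styles compute the same values; only the traced shape guards differ
assert take_along_axis1(a, idx) == flat_gather(a, idx) == [[12, 10], [21, 21]]
```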

Unify PT/pt_expt test models so .pth and .pt2 are generated from the
same dpmodel weights (type_one_side=True) via gen scripts. Remove
deeppot_sea.pth, deeppot_dpa.pth, and fparam_aparam.pth from git
tracking — they are now regenerated in CI via convert-models.sh.

Changes:
- dpa1.py, exclude_mask.py: xp_take_along_axis replaces flat indexing
- gen_dpa1.py: generates deeppot_dpa1.pth + .pt2 from dpmodel
- gen_fparam_aparam.py: generates fparam_aparam.pth + .pt2 from dpmodel
- convert-models.sh: regenerate .pth from yaml and run gen scripts in CI
- test_deeppot_dpa_ptexpt.cc: add NoPbc test fixture
- test_deeppot_dpa1_pt.cc: new DPA1 .pth test (PBC + NoPbc)
- test_deeppot_a_fparam_aparam_pt.cc: update reference values
…ckends

Fix a latent bug in all 5 C++ backends (DeepPotPT, DeepPotPTExpt,
DeepPotJAX, DeepSpinPT, DeepPotPD) where the ghost-to-local mapping
was incorrectly indexed through fwd_map when virtual atoms are present.
The old code `mapping[ii] = lmp_list.mapping[fwd_map[ii]]` treats the
post-filter index as an original index, causing undefined behavior when
fwd_map contains -1 entries for filtered atoms. The fix uses bkw_map
to correctly translate: new_idx -> orig_idx -> mapped_orig -> new_mapped.
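With plain lists, the corrected chain looks like this; fwd_map maps original→filtered index (-1 for removed virtual atoms) and bkw_map maps filtered→original, following the commit message, while the example data is hypothetical:

```python
def remap(lmp_mapping, fwd_map, bkw_map):
    """new_idx -> orig_idx -> mapped_orig -> new_mapped."""
    mapping = []
    for ii in range(len(bkw_map)):
        orig = bkw_map[ii]               # filtered index back to original
        mapped_orig = lmp_mapping[orig]  # ghost -> local, original numbering
        mapping.append(fwd_map[mapped_orig])  # local original -> filtered
    return mapping

# 3 original atoms; atom 1 is virtual and gets filtered out.
# Ghost atom 2 is a periodic image of local atom 0.
lmp_mapping = [0, 1, 0]
fwd_map = [0, -1, 1]  # original -> filtered, -1 for the removed atom
bkw_map = [0, 2]      # filtered -> original
print(remap(lmp_mapping, fwd_map, bkw_map))
# The old code, lmp_mapping[fwd_map[ii]], would index with -1 here:
# harmless-looking in Python, undefined behavior in C++.
```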

Also fix a use-after-free in DeepPotPTExpt.cc where torch::from_blob
referenced a local vector's data after the vector went out of scope.

Additionally fix dpmodel code (dpa2.py, repformers.py, make_model.py,
nlist.py) to use xp_take_first_n and xp_expand_dims instead of slice
indexing that creates shape constraints incompatible with torch.export.

Add DPA2 C++ test files for both .pth and .pt2 backends with PBC,
NoPbc, and type_sel test cases. Reduce DPA2 test model size for
machine-precision agreement between jit and export paths.
Fix torch.export-incompatible slicing in dpmodel DPA3/repflows code
([:, :nloc] → xp_take_first_n, reshape with -1 → expand_dims), add
gen_dpa3.py model generation script, and create C++ test files for
both .pth and .pt2 backends with PBC and NoPbc fixtures.
Restore deeppot_sea.pth and deeppot_dpa.pth to git tracking — CI
cannot regenerate them because tabulate_fusion_se_t_tebd custom op
is not available in the Python-only CI environment. Remove the
dp convert-backend .pth lines from convert-models.sh.

Add try/except around .pth export in gen_dpa1.py and
gen_fparam_aparam.py so the scripts don't fail when custom ops
are unavailable — only .pt2 generation is required.
make_fx tracing through torch.autograd.grad fails on CUDA with
"opt_ready_stream && opt_parent_stream INTERNAL ASSERT FAILED".
Fix by running make_fx on CPU (the graph is device-agnostic), then
moving sample inputs to the target device before torch.export.export
and AOTInductor compilation so the compiled package targets the
correct device (GPU when available).
make_fx with _allow_non_fake_inputs=True on CUDA triggers autograd
stream assertion (opt_ready_stream && opt_parent_stream). Fix by:
1. Tracing via make_fx on CPU (decomposes autograd.grad into aten ops)
2. Extracting output keys from CPU-traced module
3. Moving traced module + inputs to target device (CUDA if available)
4. Running torch.export.export on target device (the traced graph has
   no autograd.grad calls, so no CUDA stream issue)
This ensures AOTInductor compiles for the correct device.
The -fsanitize=leak flag produces sanitized .so files that fail to
dlopen in Python processes lacking the LSAN runtime.  Preload liblsan.so
via LD_PRELOAD when CXXFLAGS contains the leak sanitizer.

Also use os.path.realpath in gen scripts for robust path resolution and
print diagnostic on library loading failure instead of silently passing.
Replace manual traced-module-to-device approach with the official
torch.export.passes.move_to_device_pass API. This properly rewrites
device='cpu' constants baked into the FX graph by make_fx, avoiding
FakeTensor device propagation errors on CUDA (pytorch/pytorch#121761).
- C++ JSON/ZIP parser: add EOF checks, bounds validation, stod
  exception handling, int64_t loop var to prevent unsigned underflow
- C++ from_blob: add .clone() to all calls to prevent use-after-free
- C++ translate_error: catch std::exception instead of std::runtime_error
- Python deep_eval/serialization: validate ZIP entries before reading
- gen_*.py: move custom op loading after deepmd imports to avoid double
  registration; add inductor compiler fallbacks for CI; remove unused
  subprocess imports
LSAN runtime inherited by g++ subprocesses causes false leak reports
and non-zero exit codes, making torch._inductor think no C++ compiler
works. Clear LD_PRELOAD after custom ops are loaded since it's no
longer needed.
Add has_default_fparam support to DeepPotPTExpt:
- Embed has_default_fparam in model_def_script.json metadata during
  freeze (_collect_metadata)
- Read it in C++ init from metadata, with fallback to false for older
  .pt2 files
- Add C++ tests for both has_default_fparam==false and ==true
- Generate fparam_aparam_default.pt2 test model with default_fparam set
- Guard EOCD scan against short/corrupt .pt2 files (P1)
- Parse ZIP64 fields using full 64-bit widths (P2)
- Match ZIP entry names by suffix for archives with dir prefixes
- Reject unsupported in-memory file_content in DeepPotPTExpt::init
- Replace debug-only assert with runtime exception for output count
- Fix virtual atom coordinate copy: nvir -> nvir*3 in 3 test files
- Tighten DPA2 float EPSILON from 1e-1 to 1e-4, add TODO for ghost
- Narrow exception handling in gen_dpa3.py custom op loading
- Add parity assertions (not just prints) in gen_dpa3.py/.py exports
- Guard .pth parity check on successful export in gen_fparam_aparam.py
- Extend parity check to virial and atomic outputs
- Update deep_eval.py comment on private torch API
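The EOCD guard from the first bullet can be sketched in Python: scan the file tail for the end-of-central-directory signature and reject inputs too short or corrupt to contain one. This is a simplified stand-in for the C++ parser (no ZIP64 locator handling):

```python
import struct

EOCD_SIG = b"PK\x05\x06"
EOCD_MIN = 22  # fixed EOCD record size without the trailing comment

def find_eocd(data: bytes) -> int:
    """Return the EOCD offset, or raise ValueError for short/corrupt input."""
    if len(data) < EOCD_MIN:
        raise ValueError("file too short to be a ZIP archive")
    # the EOCD may be followed by a comment of up to 65535 bytes,
    # so the backward scan is bounded rather than unbounded
    start = max(0, len(data) - EOCD_MIN - 0xFFFF)
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0 or pos + EOCD_MIN > len(data):
        raise ValueError("no valid end-of-central-directory record")
    return pos

# a minimal empty archive is just a zeroed EOCD record
empty_zip = EOCD_SIG + struct.pack("<4H2LH", 0, 0, 0, 0, 0, 0, 0)
assert find_eocd(empty_zip) == 0
```

Python's own zipfile module performs an equivalent bounded backward scan, which is a useful cross-check for the C++ behavior.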
The gen_*.py scripts import torch, which has internal pybind/JIT
allocations that LSAN reports as leaks.  These are not our leaks.
Previously, LD_PRELOAD=liblsan.so was set for the gen scripts to
handle dlopen of the sanitizer-built custom op .so, but this causes
LSAN to track all Python/torch allocations and fail on exit.

Fix: run gen scripts without LD_PRELOAD.  The gen scripts don't
need LSAN — only the C++ test binary (ctest) does, and ctest
already inherits LSAN via the -fsanitize=leak compile flag.

Also revert the DPA2 cpu_lmp_nlist test (needs further investigation).
- Remove unused variable nloc in gen_fparam_aparam.py
- Cast int multiplications to size_t before use as iterator offsets
  in DeepPotPTExpt multi-frame compute to prevent overflow
The gen scripts need LD_PRELOAD=liblsan.so to dlopen the custom op
.so built with -fsanitize=leak.  But LSAN leak detection must be
disabled (detect_leaks=0) to avoid false reports from torch/paddle
internal allocations.
The AOTInductor model_package_loader.h header may not exist on all
platforms (e.g. macOS x86_64 torch builds).  Use __has_include to
detect it and define BUILD_PT_EXPT accordingly, so DeepPotPTExpt
compiles as an empty TU when the header is missing.
The _load_custom_ops() call was placed before deepmd.pt was imported,
so the guard `hasattr(torch.ops.deepmd, "border_op")` didn't see the
op registered by the standard install path. This caused the build
directory's .so to be loaded, and then when deepmd.pt was imported
later for .pth export, it loaded the same op again — crash.

Fix: move _load_custom_ops() to after the deepmd.pt import so the
guard correctly detects the already-registered op and skips.
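The ordering problem can be reproduced with a stand-in op namespace (the real registry is torch.ops.deepmd; OpNamespace and the two loader functions here are hypothetical):

```python
class OpNamespace:
    """Stand-in for torch.ops.deepmd; registering the same op twice crashes
    the real extension, so the stand-in raises instead."""
    def register(self, name):
        if hasattr(self, name):
            raise RuntimeError(f"duplicate registration of {name}")
        setattr(self, name, lambda *args: None)

def import_deepmd_pt(ops):
    # the standard install path registers the op as a side effect of import
    ops.register("border_op")

def load_custom_ops(ops):
    # correct order: run AFTER the deepmd.pt import, so the guard
    # sees the already-registered op and skips the second load
    if not hasattr(ops, "border_op"):
        ops.register("border_op")

ops = OpNamespace()
import_deepmd_pt(ops)  # deepmd.pt imported first...
load_custom_ops(ops)   # ...so the guard detects the op and skips
```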
The gen scripts import deepmd.pt which loads custom ops from
deepmd/lib/.  Without the .so installed there, the custom op
(tabulate_fusion_se_t_tebd) is unavailable and .pth export fails
for DPA2/DPA3 models, causing C++ tests to fail on missing files.

Fix: copy libdeepmd_op_pt.so from the build directory to the
deepmd package's lib/ directory after cmake --build, before
running gen scripts.
The custom op .so (libdeepmd_op_pt.so) has undefined symbols for
compute kernels (tabulate_fusion_se_t_tebd_cpu etc.) that live in
libdeepmd.so.  Without dp_test/lib/ in LD_LIBRARY_PATH, dlopen
succeeds (lazy binding) but torch.jit.script fails when it tries
to resolve the symbols.
The script was copying libdeepmd_op_pt.so to
os.path.dirname(deepmd.__file__)+"/lib" which resolves to the workspace
source tree, but cxx_op.py loads from SHARED_LIB_DIR (deepmd.lib.__path__)
which resolves to site-packages. For editable installs these diverge,
so the custom ops were never loaded and ENABLE_CUSTOMIZED_OP was False.

This caused 10 C++ test failures:
- 8 DPA2 .pth tests: tabulate_fusion_se_t_tebd has no Python fallback
- 2 DPA3 .pth NoPbc tests: border_op fallback raises NotImplementedError
…_local.sh

Move custom op .so install, LSAN setup, and gen script invocations
into a single ENABLE_PYTORCH guard block. Update comment to reference
SHARED_LIB_DIR instead of deepmd/lib/.
- common.h fold_back: use ptrdiff_t for loop variables to prevent
  int multiplication overflow before widening to difference_type
- serialization.py: unify import style for deepmd.pt_expt.utils.env
  (use `import ... as _env` consistently, remove `from ... import DEVICE`)
Without the custom op .so loaded, torch.jit.script fails on se_t_tebd
with "object has no attribute tabulate_fusion_se_t_tebd". Add the same
NotImplementedError fallback pattern used by tabulate_fusion_se_t,
tabulate_fusion_se_a, tabulate_fusion_se_atten, and border_op.
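The fallback amounts to supplying a Python stub that satisfies symbol lookup at script time but raises if ever executed; a simplified sketch of the pattern (`get_op` and the registry dict are illustrative, not the deepmd registration code):

```python
def tabulate_fusion_se_t_tebd_fallback(*args):
    """Python fallback used when the compiled custom op is absent: keeps
    torch.jit.script symbol resolution happy, fails clearly if called."""
    raise NotImplementedError(
        "tabulate_fusion_se_t_tebd requires the compiled customized ops"
    )

def get_op(registry, name):
    # return the compiled op if it was loaded, otherwise the raising stub
    return registry.get(name, tabulate_fusion_se_t_tebd_fallback)

op = get_op({}, "tabulate_fusion_se_t_tebd")
try:
    op()
except NotImplementedError as err:
    print(err)
```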
The previous fix only changed loop variables but nall*ndim and
nloc*ndim were still computed as int*int before widening. Cast
all parameters to ptrdiff_t at function entry.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50507c6af7


@codecov

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 77.67857% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.25%. Comparing base (e97967b) to head (808b9c7).

Files with missing lines Patch % Lines
source/api_cc/tests/test_deeppot_ptexpt.cc 73.94% 37 Missing ⚠️
source/api_cc/src/DeepPotPTExpt.cc 80.00% 2 Missing and 4 partials ⚠️
...pi_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc 87.23% 6 Missing ⚠️
deepmd/pt_expt/infer/deep_eval.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5334      +/-   ##
==========================================
- Coverage   82.28%   82.25%   -0.03%     
==========================================
  Files         797      797              
  Lines       82100    82323     +223     
  Branches     4003     4020      +17     
==========================================
+ Hits        67557    67717     +160     
- Misses      13336    13393      +57     
- Partials     1207     1213       +6     


Han Wang added 9 commits March 24, 2026 08:27
- Store default_fparam values in .pt2 metadata during serialization
- Python inference: when fparam is omitted and model has default,
  substitute default values from metadata instead of passing None
- C++ inference: read default_fparam from metadata and use when
  fparam is empty (without fix: segfault from shape mismatch)
- NoPBC: shift coordinates so minimum is at rcut, ensuring atoms
  with negative coords are inside [0, L) for copy_coord/build_nlist
- Add C++ tests: eval_default_vs_explicit (internal nlist) and
  eval_default_vs_explicit_lmp (external nlist)
- Add Python tests: test_pt2_eval_default_fparam and
  test_freeze_pt2_nopbc_negative_coords
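The NoPBC shift described above can be sketched in plain Python: translate atoms so the per-axis minimum sits at rcut, making every coordinate strictly positive before the fake box is constructed (illustrative, not the C++ implementation):

```python
def shift_to_rcut(coords, rcut):
    """coords: list of (x, y, z) tuples; returns shifted coords whose
    per-axis minimum equals rcut, so atoms with negative coordinates
    end up inside [0, L) for copy_coord/build_nlist."""
    mins = [min(c[d] for c in coords) for d in range(3)]
    return [tuple(c[d] - mins[d] + rcut for d in range(3)) for c in coords]

coords = [(-1.5, 0.0, 2.0), (0.5, -3.0, 1.0)]
shifted = shift_to_rcut(coords, rcut=6.0)
# every axis now has its minimum exactly at rcut
assert all(min(c[d] for c in shifted) == 6.0 for d in range(3))
```

A uniform translation leaves all interatomic vectors unchanged, so energies and forces are unaffected; only the neighbor-list bookkeeping benefits.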
Resolve conflict in serialization.py: keep .pt2 dispatch in
serialize_from_file and add model_params.json support from upstream
(finetune --use-pretrain-script) to both .pte and .pt2 paths.
- NoPbc cpu_build_nlist_atomic: verify atomic energy/virial for NoPBC
- PBC cpu_build_nlist_nframes: 2-frame PBC via compute_mixed_type
- NoPbc cpu_build_nlist_nframes: 2-frame NoPBC via compute_mixed_type

These cover previously untested code paths in DeepPotPTExpt::compute
(NoPBC internal nlist) and compute_nframes (multi-frame wrapper).
- test_finetune_from_pt2: train → freeze to .pt2 → finetune from
  .pt2 → verify inherited weights match pretrained
- test_finetune_from_pt2_use_pretrain_script: train → freeze to
  .pt2 → finetune with --use-pretrain-script → verify config updated
…Expt

- Parser error tests: nonexistent file, invalid ZIP, truncated file
- Metadata accessor tests: type_map, cutoff, ntypes, dim_fparam/aparam
- JSON type coverage: integers, booleans, strings, float, arrays
  via SE_A model (no fparam) and fparam_aparam model
- Default fparam JSON parsing: boolean true + float array
  via fparam_aparam_default model
Resolve conflicts: keep default_fparam logic, NoPBC coord shift,
and new tests from our branch on top of upstream's merged feat-pt-expt-c.
- test_change_bias_frozen_pt2: change-bias with data on .pt2 model
- test_change_bias_frozen_pt2_user_defined: user-defined bias on .pt2
- test_change_bias_pt2_pte_consistency: .pte and .pt2 produce same bias
@wanghan-iapcm wanghan-iapcm changed the title test(pt_expt): add .pt2 (AOTInductor) unit tests test(pt_expt): add .pt2 (AOTInductor) unit tests and bug fixes Mar 25, 2026
@wanghan-iapcm wanghan-iapcm marked this pull request as ready for review March 25, 2026 01:47
@wanghan-iapcm wanghan-iapcm requested a review from njzjz March 25, 2026 01:47

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a45b88af9


@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

📝 Walkthrough

Adds an exporter-stored model default_fparam; the Python inference path and C++ runtime use that default when callers omit fparam for PT2 models with a nonzero fparam dimension. Tests and serializers are updated across the Python and C++ layers.

Changes

Cohort / File(s) — Summary

  • Python inference & exporter (deepmd/pt_expt/infer/deep_eval.py, deepmd/pt_expt/utils/serialization.py): The exporter now includes "default_fparam" in metadata. _eval_model constructs fparam_t from metadata["default_fparam"] (float64, broadcast to (nframes, dim_fparam)) when the caller omits fparam for .pt2; raises if it is missing.
  • C++ inference API (source/api_cc/include/DeepPotPTExpt.h, source/api_cc/src/DeepPotPTExpt.cc): Added a private default_fparam_ member initialized from metadata. compute overloads build fparam_tensor from default_fparam_ when the caller's fparam is empty, and throw when dfparam > 0 and no default exists. Adjusted the NoPBC fake-box path to shift coordinates so the per-axis minima equal rcut.
  • C++ unit tests (source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc, source/api_cc/tests/test_deeppot_ptexpt.cc): Added typed tests for default-vs-explicit fparam equivalence (including external/LAMMPS nlist), multi-frame PT2 inference, atomic outputs, metadata parsing, and PT2 init error cases.
  • Python integration tests (source/tests/pt_expt/model/test_model_compression.py, source/tests/pt_expt/test_dp_freeze.py, source/tests/pt_expt/test_finetune.py, source/tests/pt_expt/test_change_bias.py): Added tests covering freeze→.pt2 output and eval parity, NoPBC/negative-coordinate PT2 behavior, finetune-from-.pt2 flows, change-bias on .pt2, compression round-trips preserving min_nbor_dist, and default-fparam eval behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Exporter as Exporter\n(Py)
    participant Metadata as Metadata\n(JSON)
    participant InferPy as Infer\n(Py)
    participant Runtime as Runtime\n(C++)
    participant Runner as Runner\n(Model)

    Exporter->>Metadata: write model + "default_fparam"
    InferPy->>Metadata: load model + metadata
    alt caller omits fparam and dim_fparam>0
        InferPy->>InferPy: build fparam_t from metadata["default_fparam"]\n(dtype=float64, broadcast to nframes)
    end
    InferPy->>Runtime: call compute(fparam_t)
    alt fparam empty in C++ compute
        Runtime->>Runtime: construct fparam tensor from default_fparam_\n(device float64, broadcast)
    end
    Runtime->>Runner: invoke model runner with fparam tensor
    Runner-->>Runtime: energies, forces, virials
    Runtime-->>InferPy: return results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • njzjz
  • caic99
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 41.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The title clearly summarizes the main change: adding .pt2 unit tests and bug fixes for the pt_expt backend, which aligns with the comprehensive test additions and bug fixes described in the PR objectives.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepmd/pt_expt/utils/serialization.py (1)

188-199: ⚠️ Potential issue | 🔴 Critical

Convert default_fparam tensor to JSON-serializable format before storage.

model.get_default_fparam() returns a torch.Tensor | None, but the metadata dict is serialized via json.dumps() which cannot serialize tensors. When a model with a default fparam is frozen, this will raise TypeError: Object of type Tensor is not JSON serializable. Convert the tensor to a list or None:

Suggested fix
+    default_fparam = model.get_default_fparam()
     return {
         "type_map": model.get_type_map(),
         "rcut": model.get_rcut(),
         "sel": model.get_sel(),
         "model_output_type": model.model_output_type(),
         "dim_fparam": model.get_dim_fparam(),
         "dim_aparam": model.get_dim_aparam(),
         "mixed_types": model.mixed_types(),
         "sel_type": model.get_sel_type(),
         "has_default_fparam": model.has_default_fparam(),
-        "default_fparam": model.get_default_fparam(),
+        "default_fparam": (
+            None
+            if default_fparam is None
+            else default_fparam.detach().cpu().tolist()
+        ),
         "fitting_output_defs": fitting_output_defs,
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/serialization.py` around lines 188 - 199, The metadata
dict contains "default_fparam" which is assigned model.get_default_fparam() (a
torch.Tensor or None) and will fail json.dumps; update the serialization to call
model.get_default_fparam(), check for None, and if not None convert the tensor
to a plain Python list (e.g., via .tolist()) before inserting into the dict so
"default_fparam" is JSON-serializable; modify the code around the return dict
(the keys "default_fparam" and related model.get_default_fparam()) to perform
this conversion.
🧹 Nitpick comments (1)
source/api_cc/tests/test_deeppot_ptexpt.cc (1)

656-669: Drop TYPED_TEST from the metadata-only suites.

TypeParam is unused in these fixtures and assertions, so the same .pt2 files get loaded once per ValueTypes entry without increasing coverage. Converting these to TEST_F would keep the checks and reduce duplicate runtime/noise.

Also applies to: 699-712, 735-748

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/tests/test_deeppot_ptexpt.cc` around lines 656 - 669, The test
fixture TestDeepPotPTExptMetadata (and the other metadata-only fixtures noted)
doesn't use TypeParam, so keep a single instantiation: remove
TYPED_TEST_SUITE/TYPED_TEST usage and convert the suite to a plain TEST_F-style
test class; e.g., replace the TYPED_TEST_SUITE(TestDeepPotPTExptMetadata,
ValueTypes) and any TYPED_TESTs that use it with regular TEST_F tests that call
dp.init("../../tests/infer/deeppot_sea.pt2") in SetUp, and delete any unused
TypeParam references; apply the same change for the other metadata-only fixtures
referenced in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 828-836: The code currently creates a zero-length fparam_tensor
when default_fparam_ is empty which breaks the AOTInductor model if dfparam > 0;
modify the fparam_tensor construction (where default_fparam_ and fparam_tensor
are handled) to check dfparam: if dfparam > 0 create a zeros tensor shaped {1,
dfparam} with the same options and .to(device), otherwise keep the zero-length
zeros({0}) behavior; apply the identical change to both places where
fparam_tensor is built (the two occurrences around the original diff and the
later one referenced in the comment) and ensure this aligns with how run_model()
conditions inclusion of fparam in the input list.

In `@source/api_cc/tests/test_deeppot_ptexpt.cc`:
- Around line 631-638: The test writes a temporary file then calls
deepmd::DeepPot::init(tmpfile) but doesn't assert the write succeeded, so
failures to create the file can mask invalid-ZIP handling; modify the test to
(1) use a unique temp path (e.g., include PID/timestamp or tmpnam-style string)
for tmpfile, (2) after creating std::ofstream ofs(tmpfile, std::ios::binary)
assert ofs.is_open() and that the write succeeded (e.g., check ofs.good() or
verify bytes written) before closing the scope, and then call dp.init(tmpfile)
and remove the file; apply the same change to the other similar block around the
dp.init call.
- Around line 541-578: The test cpu_build_nlist_nframes is appending an
identical second frame so it won't catch frame-specific bugs; change the
second-frame inputs (coord_2f, atype_2f, and box_2f for PBC case) to be a
perturbed copy (e.g., shift one atom coordinate and adjust box if needed), then
after calling dp.compute_mixed_type(...) verify each frame by calling
dp.compute(...) on the single modified frame and comparing that single-frame
energy/force/virial to the corresponding slice from the batched outputs; apply
the same change to the other batched test referenced (the block around 581-615)
and use symbol names dp.compute_mixed_type and dp.compute to locate the code to
modify.

In `@source/tests/pt_expt/model/test_model_compression.py`:
- Around line 340-343: The test unpacks e_pt2, f_pt2, v_pt2 from dp.eval(coord,
box, atype) but only asserts energy, leaving f_pt2 and v_pt2 unused (Ruff error
and missing derivative checks); either add assertions comparing f_pt2 and v_pt2
to ret_frozen["forces"] and ret_frozen["virial"] (using
np.testing.assert_allclose with appropriate atol and err_msg) or, if you intend
energy-only coverage, change the unpack to e_pt2, _, _ = dp.eval(coord, box,
atype) to avoid unused variables; update the lines around dp.eval and the
existing energy assertion accordingly.

In `@source/tests/pt_expt/test_dp_freeze.py`:
- Around line 124-126: The test calls dp.eval(coord, box, atype) and assigns (e,
f, v) but never uses v, which triggers RUF059 and omits the virial shape check;
fix by either asserting the virial shape (e.g., use self.assertEqual(v.shape,
(1, 1)) or the correct expected shape) or, if you intentionally don't check it,
rename v to _v to signal it's unused; update the lines around dp.eval and the
subsequent self.assertEqual calls accordingly and run ruff check . and ruff
format . before committing.

In `@source/tests/pt_expt/test_finetune.py`:
- Around line 895-935: The test is reloading the finetuned checkpoint as the
"pretrained" baseline because ckpt_path (created by _train_pretrained) points to
the same model.ckpt.pt that the finetune run overwrites; before calling
main([...,"--finetune", pt2_path,...]) capture and cache the pretrained state
returned by torch.load(ckpt_path, map_location=DEVICE, weights_only=True) into
pre_model_state (or save it to a uniquely named file) and then run main so that
_assert_inherited_weights_match(ft_model_state, pre_model_state,
random_fitting=False) compares the original pretrained weights to the finetuned
output instead of reloading the overwritten file; alternatively, change the
finetune output path (ft_ckpt) to a different filename so ckpt_path remains
untouched.



📥 Commits

Reviewing files that changed from the base of the PR and between 97e9c9e and 1a45b88.

📒 Files selected for processing (9)
  • deepmd/pt_expt/infer/deep_eval.py
  • deepmd/pt_expt/utils/serialization.py
  • source/api_cc/include/DeepPotPTExpt.h
  • source/api_cc/src/DeepPotPTExpt.cc
  • source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
  • source/api_cc/tests/test_deeppot_ptexpt.cc
  • source/tests/pt_expt/model/test_model_compression.py
  • source/tests/pt_expt/test_dp_freeze.py
  • source/tests/pt_expt/test_finetune.py

Han Wang added 2 commits March 25, 2026 13:05
- deep_eval.py: add .contiguous() on expanded default fparam; raise
  ValueError when no default available instead of silent zero injection
- test_deeppot_ptexpt.cc: use distinct coord_alt for frame 1 in
  multi-frame tests; assert temp file creation in parser error tests
- test_model_compression.py: assert force/virial on .pt2 compress path
- test_dp_freeze.py: assert virial shape in .pt2 smoke test
- test_finetune.py: copy pretrained checkpoint before finetune overwrites
…fault

Consistent with the Python side: throw deepmd_exception instead of
passing a shape-mismatched {0} tensor that would crash in AOTInductor.
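The Python-side counterpart of this substitute-default-or-raise behavior might look like the following sketch; `resolve_fparam` is a hypothetical helper illustrating the fix, not the actual `deep_eval.py` code.

```python
import numpy as np


def resolve_fparam(fparam, dim_fparam, default_fparam, nframes):
    # No frame parameters expected: nothing to pass
    if dim_fparam == 0:
        return None
    # Caller supplied fparam explicitly: just shape it per frame
    if fparam is not None:
        return np.asarray(fparam, dtype=np.float64).reshape(nframes, dim_fparam)
    # Omitted fparam: substitute the metadata default instead of a
    # shape-mismatched tensor, or fail loudly when no default exists
    if default_fparam is None:
        raise ValueError(
            "fparam required (dim_fparam > 0) but neither a value "
            "nor a default was provided"
        )
    expanded = np.tile(np.reshape(default_fparam, (1, dim_fparam)), (nframes, 1))
    return np.ascontiguousarray(expanded)  # mirrors the .contiguous() fix
```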
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 25, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 25, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Around line 311-315: The test only asserts the first two entries of the
deserialized bias (using updated_bias.flatten()[:2]) which can miss
extra/mis-shaped values; replace that partial check with a full-array comparison
by reshaping the expected user bias to match updated_bias.shape and using
np.testing.assert_allclose(updated_bias, expected_bias) so the entire bias
tensor from BaseModel.get_out_bias (after serialize_from_file and
BaseModel.deserialize) is validated.
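The full-array check the reviewer asks for can be sketched as follows (variable names follow the test; the helper itself is hypothetical):

```python
import numpy as np


def assert_full_bias_match(updated_bias, user_bias_values):
    # Compare every entry, not just the first two: reshape the expected
    # user bias to the deserialized tensor's shape first
    expected = np.asarray(
        user_bias_values, dtype=updated_bias.dtype
    ).reshape(updated_bias.shape)
    np.testing.assert_allclose(updated_bias, expected)
```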

In `@source/tests/pt_expt/test_dp_freeze.py`:
- Around line 159-191: The test test_freeze_pt2_nopbc_negative_coords currently
only compares pt2 vs pte (dp_pt2.eval vs dp_pte.eval) so it won't detect
regressions in the shared frontend; instead create an independent oracle by
manually shifting the negative coord array into the [0,L) range and using that
shifted_coord as the reference for energy/force/virial, then compare
dp_pt2.eval(coord, None, atype) against the oracle results; concretely, compute
an offset = -coord.min() (per axis or global as appropriate) to produce
shifted_coord = coord + offset, call DeepPot.eval on shifted_coord (or another
trusted reference evaluator) to produce e_ref,f_ref,v_ref, and assert
np.testing.assert_allclose(e_pt2, e_ref, ...), etc.; update the
test_freeze_pt2_nopbc_negative_coords function and keep calls to freeze(...) and
dp_pt2 so the C++ NoPBC fake-box path (DeepPot.eval with box=None) is exercised
while using the shifted_coord oracle for independent validation.
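The shift step of that oracle is simple; the DeepPot eval calls are placeholders, but the coordinate shift itself can be sketched as:

```python
import numpy as np


def shift_to_nonnegative(coord):
    # Per-axis offset so the minimum coordinate lands at 0; under NoPBC
    # the model's outputs should be invariant to this translation, which
    # makes eval(shifted_coord) a trusted reference for eval(coord)
    coord = np.asarray(coord, dtype=np.float64).reshape(-1, 3)
    offset = -coord.min(axis=0, keepdims=True)
    return coord + offset
```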

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 69223829-8114-46ce-9b68-0b038a29b438

📥 Commits

Reviewing files that changed from the base of the PR and between 1a45b88 and b6e7db3.

📒 Files selected for processing (6)
  • deepmd/pt_expt/infer/deep_eval.py
  • source/api_cc/tests/test_deeppot_ptexpt.cc
  • source/tests/pt_expt/model/test_model_compression.py
  • source/tests/pt_expt/test_change_bias.py
  • source/tests/pt_expt/test_dp_freeze.py
  • source/tests/pt_expt/test_finetune.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • source/tests/pt_expt/model/test_model_compression.py
  • source/api_cc/tests/test_deeppot_ptexpt.cc
  • source/tests/pt_expt/test_finetune.py

- test_change_bias: assert full bias tensor shape, not just first 2 values
- test_dp_freeze: clarify NoPBC test compares independent implementations
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 25, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 25, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
source/tests/pt_expt/test_dp_freeze.py (1)

237-244: Add a non-default fparam control so this test cannot pass vacuously.

Right now this only proves omitted == explicit default. Because the checkpoint is synthetic, the model could become effectively insensitive to fparam, and this would still pass even if the fallback plumbing stopped being exercised. One extra non-default call plus a “not all equal” assertion would make the regression meaningful.

🧪 Possible hardening
         # Eval WITHOUT fparam — model should use default (0.5)
         e_no, f_no, v_no = dp.eval(coord, box, atype)
         # Eval WITH explicit default value
         e_ex, f_ex, v_ex = dp.eval(coord, box, atype, fparam=[0.5])
+        e_alt, f_alt, v_alt = dp.eval(coord, box, atype, fparam=[0.0])
+
+        self.assertFalse(
+            np.allclose(e_ex, e_alt, rtol=0, atol=1e-12)
+            and np.allclose(f_ex, f_alt, rtol=0, atol=1e-12)
+            and np.allclose(v_ex, v_alt, rtol=0, atol=1e-12)
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/test_dp_freeze.py` around lines 237 - 244, The test
currently only checks omitted fparam equals the explicit default using
dp.eval(coord, box, atype) vs dp.eval(..., fparam=[0.5]) so it can pass
vacuously; add one more call using a non-default fparam (e.g., fparam=[0.7]) via
dp.eval(coord, box, atype, fparam=[...]) to produce e_alt, f_alt, v_alt and then
assert that at least one of e_alt, f_alt, v_alt is not allclose to the default
outputs (e_no/f_no/v_no) to ensure the fparam branch is actually exercised. Use
the existing variable names (dp.eval, e_no/f_no/v_no, e_ex/f_ex/v_ex) to locate
where to insert the extra call and the “not all equal” assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/tests/pt_expt/test_dp_freeze.py`:
- Around line 152-157: The parity asserts between outputs of dp_pte.eval and
dp_pt2.eval (variables e_pte, f_pte, v_pte vs e_pt2, f_pt2, v_pt2) use
np.testing.assert_allclose with implicit rtol and may allow NaN-equality; change
each comparison to explicitly pass rtol=0 and keep atol=1e-10, and before
asserting closeness add finiteness checks (e.g. assert np.all(np.isfinite(...))
for both sides of each pair) so NaNs/Infs fail fast; make the same changes for
the other two analogous blocks that compare dp_pte vs dp_pt2 outputs.
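Factored out as a helper, the hardened parity check might look like this sketch (the name `assert_parity` is hypothetical):

```python
import numpy as np


def assert_parity(pte_outputs, pt2_outputs, atol=1e-10):
    # Fail fast on NaN/Inf, then compare with absolute tolerance only
    # (rtol=0), so matching NaNs can no longer slip through allclose
    for x, y in zip(pte_outputs, pt2_outputs):
        x, y = np.asarray(x), np.asarray(y)
        assert np.all(np.isfinite(x)), "non-finite values in .pte output"
        assert np.all(np.isfinite(y)), "non-finite values in .pt2 output"
        np.testing.assert_allclose(x, y, rtol=0, atol=atol)
```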

---

Nitpick comments:
In `@source/tests/pt_expt/test_dp_freeze.py`:
- Around line 237-244: The test currently only checks omitted fparam equals the
explicit default using dp.eval(coord, box, atype) vs dp.eval(..., fparam=[0.5])
so it can pass vacuously; add one more call using a non-default fparam (e.g.,
fparam=[0.7]) via dp.eval(coord, box, atype, fparam=[...]) to produce e_alt,
f_alt, v_alt and then assert that at least one of e_alt, f_alt, v_alt is not
allclose to the default outputs (e_no/f_no/v_no) to ensure the fparam branch is
actually exercised. Use the existing variable names (dp.eval, e_no/f_no/v_no,
e_ex/f_ex/v_ex) to locate where to insert the extra call and the “not all equal”
assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f9ba4faf-0b6d-445e-b8ef-88bc27a8e961

📥 Commits

Reviewing files that changed from the base of the PR and between 196bcca and 808b9c7.

📒 Files selected for processing (2)
  • source/tests/pt_expt/test_change_bias.py
  • source/tests/pt_expt/test_dp_freeze.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt_expt/test_change_bias.py

Han Wang added 2 commits March 25, 2026 18:01
get_out_bias() returns shape (n_out, ntypes, max_out_size) but the
test expected (ntypes, max_out_size). Use .reshape(updated_bias.shape)
to match the actual shape, consistent with the .pt test pattern.
Add np.isfinite assertions before allclose comparisons to catch NaN
regressions that would otherwise pass if both paths produce matching
NaNs. Set rtol=0 so only absolute tolerance is used.