
test(pt_expt): add .pt2 (AOTInductor) unit tests and bug fixes #5334

Open
wanghan-iapcm wants to merge 48 commits into deepmodeling:master from wanghan-iapcm:feat-pt-expt-pt2-uts

Conversation

@wanghan-iapcm
Collaborator

@wanghan-iapcm wanghan-iapcm commented Mar 23, 2026

Summary

Comprehensive .pt2 test coverage for the pt_expt backend, plus bug fixes discovered during testing.

Bug fixes

  • Default fparam segfault: When fparam is omitted on a model with dim_fparam > 0, the C++ and Python inference paths now substitute default values from metadata instead of passing a shape-mismatched tensor (which caused a segfault in AOTInductor)
  • NoPBC negative coordinates: Shift coordinates so minimum is at rcut before building the fake box, ensuring atoms with negative coordinates are properly handled by copy_coord/build_nlist
  • CodeQL integer overflow: Use ptrdiff_t for all operands in fold_back iterator arithmetic
  • Custom op fallback: Add missing tabulate_fusion_se_t_tebd Python fallback stub (all other tabulate ops had one)
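The default-fparam substitution in the first bug fix can be sketched in plain Python; `resolve_fparam`, the metadata keys, and the list-based broadcast below are illustrative stand-ins, not the actual deepmd API:

```python
def resolve_fparam(fparam, metadata, nframes):
    """Return an fparam of shape (nframes, dim_fparam), substituting the
    model's stored default when the caller omits it (illustrative sketch)."""
    dim_fparam = metadata["dim_fparam"]
    if fparam is not None:
        return fparam
    if dim_fparam == 0:
        return None  # model takes no frame parameter
    default = metadata.get("default_fparam")
    if default is None:
        # previously a shape-mismatched tensor reached AOTInductor here
        # and segfaulted; failing early gives a clear error instead
        raise ValueError("model requires fparam but no default is stored")
    # broadcast the per-model default to every frame
    return [list(default) for _ in range(nframes)]

meta = {"dim_fparam": 2, "default_fparam": [0.25, 0.75]}
print(resolve_fparam(None, meta, nframes=3))
```

Evaluating without fparam then matches evaluating with the explicit default, which is exactly what the new tests assert.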

New Python tests

  • Freeze: .pt2 freeze, .pte vs .pt2 eval consistency, NoPBC with negative coords
  • Default fparam: Eval without fparam matches eval with explicit default value
  • Compress: Freeze → compress → .pt2 output → DeepPot eval, min_nbor_dist roundtrip
  • Finetune: Train → freeze to .pt2 → finetune from .pt2, with --use-pretrain-script
  • Change-bias: Change-bias on .pt2 with data, user-defined values, .pte/.pt2 consistency

New C++ tests

  • NoPBC atomic: Verify atomic energy/virial for NoPBC path
  • Multi-frame: 2-frame PBC and NoPBC via compute_mixed_type
  • Default fparam: Internal nlist + external nlist (LAMMPS path) — eval without fparam matches eval with explicit default
  • Parser/metadata: Invalid ZIP/file errors, metadata accessors (type_map, cutoff, ntypes, dim_fparam/aparam, has_default_fparam), JSON type coverage across 3 models

Refactoring

  • Extract common gen script helpers into gen_common.py (eliminates duplication across 5 scripts)
  • Use CXX env var instead of hardcoded compiler fallback list
  • Simplify custom op .so install in test_cc_local.sh

Test plan

  • All Python tests pass locally (CPU)
  • All C++ tests pass locally (CPU)
  • Python tests pass on GPU (freeze, finetune, change-bias)
  • Compression tests skip gracefully when custom ops unavailable

Summary by CodeRabbit

  • New Features

    • Models can include a default frame parameter used automatically during inference when none is provided.
    • Exported metadata now records default-frame-parameter data for frozen PT2 artifacts.
  • Bug Fixes

    • Inference now raises a clear error if a PT2 model requires a default frame parameter but none is available.
  • Tests

    • Expanded coverage for default-frame-parameter behavior, multi-frame/PT2 inference, per-atom outputs, freeze/load/eval, change-bias, finetune-from-PT2, and compression round-trips.

Han Wang added 30 commits March 21, 2026 21:28
- add support for serialization to .pt2
- deep eval with .pt2
- C/C++ inference api with .pt2
- add ut for python inference
- add ut for c++/c inference: se_e2_a, fparam/aparam, multi-frames, dpa1
Replace flat (nf*nall,) indexing in dpa1.py and exclude_mask.py with
xp_take_along_axis and xp_take_first_n. The flat indexing creates
Ne(nall, nloc) assertions during torch.export tracing that fail when
nall == nloc (NoPbc). The new gather-based approach avoids these
shape constraints.
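The difference can be illustrated with pure-Python stand-ins for the two indexing styles (xp_take_along_axis operates on array-API tensors; these list-based helpers are illustrative only):

```python
def take_along_axis1(a, idx):
    """Per-row gather: out[i][j] = a[i][idx[i][j]] (2-D lists)."""
    return [[row[j] for j in ind] for row, ind in zip(a, idx)]

def flat_gather(a, idx):
    """Flat indexing: flatten to (nf*nall,) and offset each row by i*nall.
    This bakes an nall-vs-nloc shape relation into the trace, which
    torch.export turns into an Ne(nall, nloc) guard that fails when
    nall == nloc (the NoPBC case)."""
    nall = len(a[0])
    flat = [x for row in a for x in row]
    return [[flat[i * nall + j] for j in ind] for i, ind in enumerate(idx)]

a = [[10, 11, 12], [20, 21, 22]]
idx = [[2, 0], [1, 1]]
# both styles compute the same values; only the traced shape guards differ
assert take_along_axis1(a, idx) == flat_gather(a, idx) == [[12, 10], [21, 21]]
```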

Unify PT/pt_expt test models so .pth and .pt2 are generated from the
same dpmodel weights (type_one_side=True) via gen scripts. Remove
deeppot_sea.pth, deeppot_dpa.pth, and fparam_aparam.pth from git
tracking — they are now regenerated in CI via convert-models.sh.

Changes:
- dpa1.py, exclude_mask.py: xp_take_along_axis replaces flat indexing
- gen_dpa1.py: generates deeppot_dpa1.pth + .pt2 from dpmodel
- gen_fparam_aparam.py: generates fparam_aparam.pth + .pt2 from dpmodel
- convert-models.sh: regenerate .pth from yaml and run gen scripts in CI
- test_deeppot_dpa_ptexpt.cc: add NoPbc test fixture
- test_deeppot_dpa1_pt.cc: new DPA1 .pth test (PBC + NoPbc)
- test_deeppot_a_fparam_aparam_pt.cc: update reference values
…ckends

Fix a latent bug in all 5 C++ backends (DeepPotPT, DeepPotPTExpt,
DeepPotJAX, DeepSpinPT, DeepPotPD) where the ghost-to-local mapping
was incorrectly indexed through fwd_map when virtual atoms are present.
The old code `mapping[ii] = lmp_list.mapping[fwd_map[ii]]` treats the
post-filter index as an original index, causing undefined behavior when
fwd_map contains -1 entries for filtered atoms. The fix uses bkw_map
to correctly translate: new_idx -> orig_idx -> mapped_orig -> new_mapped.
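With plain lists, the corrected chain looks like this; fwd_map maps original→filtered index (-1 for removed virtual atoms) and bkw_map maps filtered→original, following the commit message, while the example data is hypothetical:

```python
def remap(lmp_mapping, fwd_map, bkw_map):
    """new_idx -> orig_idx -> mapped_orig -> new_mapped."""
    mapping = []
    for ii in range(len(bkw_map)):
        orig = bkw_map[ii]               # filtered index back to original
        mapped_orig = lmp_mapping[orig]  # ghost -> local, original numbering
        mapping.append(fwd_map[mapped_orig])  # local original -> filtered
    return mapping

# 3 original atoms; atom 1 is virtual and gets filtered out.
# Ghost atom 2 is a periodic image of local atom 0.
lmp_mapping = [0, 1, 0]
fwd_map = [0, -1, 1]  # original -> filtered, -1 for the removed atom
bkw_map = [0, 2]      # filtered -> original
print(remap(lmp_mapping, fwd_map, bkw_map))
# The old code, lmp_mapping[fwd_map[ii]], would index with -1 here:
# harmless-looking in Python, undefined behavior in C++.
```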

Also fix a use-after-free in DeepPotPTExpt.cc where torch::from_blob
referenced a local vector's data after the vector went out of scope.

Additionally fix dpmodel code (dpa2.py, repformers.py, make_model.py,
nlist.py) to use xp_take_first_n and xp_expand_dims instead of slice
indexing that creates shape constraints incompatible with torch.export.

Add DPA2 C++ test files for both .pth and .pt2 backends with PBC,
NoPbc, and type_sel test cases. Reduce DPA2 test model size for
machine-precision agreement between jit and export paths.
Fix torch.export-incompatible slicing in dpmodel DPA3/repflows code
([:, :nloc] → xp_take_first_n, reshape with -1 → expand_dims), add
gen_dpa3.py model generation script, and create C++ test files for
both .pth and .pt2 backends with PBC and NoPbc fixtures.
Restore deeppot_sea.pth and deeppot_dpa.pth to git tracking — CI
cannot regenerate them because tabulate_fusion_se_t_tebd custom op
is not available in the Python-only CI environment. Remove the
dp convert-backend .pth lines from convert-models.sh.

Add try/except around .pth export in gen_dpa1.py and
gen_fparam_aparam.py so the scripts don't fail when custom ops
are unavailable — only .pt2 generation is required.
make_fx tracing through torch.autograd.grad fails on CUDA with
"opt_ready_stream && opt_parent_stream INTERNAL ASSERT FAILED".
Fix by running make_fx on CPU (the graph is device-agnostic), then
moving sample inputs to the target device before torch.export.export
and AOTInductor compilation so the compiled package targets the
correct device (GPU when available).
make_fx with _allow_non_fake_inputs=True on CUDA triggers autograd
stream assertion (opt_ready_stream && opt_parent_stream). Fix by:
1. Tracing via make_fx on CPU (decomposes autograd.grad into aten ops)
2. Extracting output keys from CPU-traced module
3. Moving traced module + inputs to target device (CUDA if available)
4. Running torch.export.export on target device (the traced graph has
   no autograd.grad calls, so no CUDA stream issue)
This ensures AOTInductor compiles for the correct device.
The -fsanitize=leak flag produces sanitized .so files that fail to
dlopen in Python processes lacking the LSAN runtime.  Preload liblsan.so
via LD_PRELOAD when CXXFLAGS contains the leak sanitizer.

Also use os.path.realpath in gen scripts for robust path resolution and
print diagnostic on library loading failure instead of silently passing.
Replace manual traced-module-to-device approach with the official
torch.export.passes.move_to_device_pass API. This properly rewrites
device='cpu' constants baked into the FX graph by make_fx, avoiding
FakeTensor device propagation errors on CUDA (pytorch/pytorch#121761).
- C++ JSON/ZIP parser: add EOF checks, bounds validation, stod
  exception handling, int64_t loop var to prevent unsigned underflow
- C++ from_blob: add .clone() to all calls to prevent use-after-free
- C++ translate_error: catch std::exception instead of std::runtime_error
- Python deep_eval/serialization: validate ZIP entries before reading
- gen_*.py: move custom op loading after deepmd imports to avoid double
  registration; add inductor compiler fallbacks for CI; remove unused
  subprocess imports
LSAN runtime inherited by g++ subprocesses causes false leak reports
and non-zero exit codes, making torch._inductor think no C++ compiler
works. Clear LD_PRELOAD after custom ops are loaded since it's no
longer needed.
Add has_default_fparam support to DeepPotPTExpt:
- Embed has_default_fparam in model_def_script.json metadata during
  freeze (_collect_metadata)
- Read it in C++ init from metadata, with fallback to false for older
  .pt2 files
- Add C++ tests for both has_default_fparam==false and ==true
- Generate fparam_aparam_default.pt2 test model with default_fparam set
- Guard EOCD scan against short/corrupt .pt2 files (P1)
- Parse ZIP64 fields using full 64-bit widths (P2)
- Match ZIP entry names by suffix for archives with dir prefixes
- Reject unsupported in-memory file_content in DeepPotPTExpt::init
- Replace debug-only assert with runtime exception for output count
- Fix virtual atom coordinate copy: nvir -> nvir*3 in 3 test files
- Tighten DPA2 float EPSILON from 1e-1 to 1e-4, add TODO for ghost
- Narrow exception handling in gen_dpa3.py custom op loading
- Add parity assertions (not just prints) in gen_dpa3.py/.py exports
- Guard .pth parity check on successful export in gen_fparam_aparam.py
- Extend parity check to virial and atomic outputs
- Update deep_eval.py comment on private torch API
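The EOCD guard from the first bullet can be sketched in Python: scan the file tail for the end-of-central-directory signature and reject inputs too short or corrupt to contain one. This is a simplified stand-in for the C++ parser (no ZIP64 locator handling):

```python
import struct

EOCD_SIG = b"PK\x05\x06"
EOCD_MIN = 22  # fixed EOCD record size without the trailing comment

def find_eocd(data: bytes) -> int:
    """Return the EOCD offset, or raise ValueError for short/corrupt input."""
    if len(data) < EOCD_MIN:
        raise ValueError("file too short to be a ZIP archive")
    # the EOCD may be followed by a comment of up to 65535 bytes,
    # so the backward scan is bounded rather than unbounded
    start = max(0, len(data) - EOCD_MIN - 0xFFFF)
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0 or pos + EOCD_MIN > len(data):
        raise ValueError("no valid end-of-central-directory record")
    return pos

# a minimal empty archive is just a zeroed EOCD record
empty_zip = EOCD_SIG + struct.pack("<4H2LH", 0, 0, 0, 0, 0, 0, 0)
assert find_eocd(empty_zip) == 0
```

Python's own zipfile module performs an equivalent bounded backward scan, which is a useful cross-check for the C++ behavior.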
The gen_*.py scripts import torch, which has internal pybind/JIT
allocations that LSAN reports as leaks.  These are not our leaks.
Previously, LD_PRELOAD=liblsan.so was set for the gen scripts to
handle dlopen of the sanitizer-built custom op .so, but this causes
LSAN to track all Python/torch allocations and fail on exit.

Fix: run gen scripts without LD_PRELOAD.  The gen scripts don't
need LSAN — only the C++ test binary (ctest) does, and ctest
already inherits LSAN via the -fsanitize=leak compile flag.

Also revert the DPA2 cpu_lmp_nlist test (needs further investigation).
- Remove unused variable nloc in gen_fparam_aparam.py
- Cast int multiplications to size_t before use as iterator offsets
  in DeepPotPTExpt multi-frame compute to prevent overflow
The gen scripts need LD_PRELOAD=liblsan.so to dlopen the custom op
.so built with -fsanitize=leak.  But LSAN leak detection must be
disabled (detect_leaks=0) to avoid false reports from torch/paddle
internal allocations.
The AOTInductor model_package_loader.h header may not exist on all
platforms (e.g. macOS x86_64 torch builds).  Use __has_include to
detect it and define BUILD_PT_EXPT accordingly, so DeepPotPTExpt
compiles as an empty TU when the header is missing.
The _load_custom_ops() call was placed before deepmd.pt was imported,
so the guard `hasattr(torch.ops.deepmd, "border_op")` didn't see the
op registered by the standard install path. This caused the build
directory's .so to be loaded, and then when deepmd.pt was imported
later for .pth export, it loaded the same op again — crash.

Fix: move _load_custom_ops() to after the deepmd.pt import so the
guard correctly detects the already-registered op and skips.
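The ordering problem can be reproduced with a stand-in op namespace (the real registry is torch.ops.deepmd; OpNamespace and the two loader functions here are hypothetical):

```python
class OpNamespace:
    """Stand-in for torch.ops.deepmd; registering the same op twice crashes
    the real extension, so the stand-in raises instead."""
    def register(self, name):
        if hasattr(self, name):
            raise RuntimeError(f"duplicate registration of {name}")
        setattr(self, name, lambda *args: None)

def import_deepmd_pt(ops):
    # the standard install path registers the op as a side effect of import
    ops.register("border_op")

def load_custom_ops(ops):
    # correct order: run AFTER the deepmd.pt import, so the guard
    # sees the already-registered op and skips the second load
    if not hasattr(ops, "border_op"):
        ops.register("border_op")

ops = OpNamespace()
import_deepmd_pt(ops)  # deepmd.pt imported first...
load_custom_ops(ops)   # ...so the guard detects the op and skips
```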
The gen scripts import deepmd.pt which loads custom ops from
deepmd/lib/.  Without the .so installed there, the custom op
(tabulate_fusion_se_t_tebd) is unavailable and .pth export fails
for DPA2/DPA3 models, causing C++ tests to fail on missing files.

Fix: copy libdeepmd_op_pt.so from the build directory to the
deepmd package's lib/ directory after cmake --build, before
running gen scripts.
The custom op .so (libdeepmd_op_pt.so) has undefined symbols for
compute kernels (tabulate_fusion_se_t_tebd_cpu etc.) that live in
libdeepmd.so.  Without dp_test/lib/ in LD_LIBRARY_PATH, dlopen
succeeds (lazy binding) but torch.jit.script fails when it tries
to resolve the symbols.
The script was copying libdeepmd_op_pt.so to
os.path.dirname(deepmd.__file__)+"/lib" which resolves to the workspace
source tree, but cxx_op.py loads from SHARED_LIB_DIR (deepmd.lib.__path__)
which resolves to site-packages. For editable installs these diverge,
so the custom ops were never loaded and ENABLE_CUSTOMIZED_OP was False.

This caused 10 C++ test failures:
- 8 DPA2 .pth tests: tabulate_fusion_se_t_tebd has no Python fallback
- 2 DPA3 .pth NoPbc tests: border_op fallback raises NotImplementedError
…_local.sh

Move custom op .so install, LSAN setup, and gen script invocations
into a single ENABLE_PYTORCH guard block. Update comment to reference
SHARED_LIB_DIR instead of deepmd/lib/.
- common.h fold_back: use ptrdiff_t for loop variables to prevent
  int multiplication overflow before widening to difference_type
- serialization.py: unify import style for deepmd.pt_expt.utils.env
  (use `import ... as _env` consistently, remove `from ... import DEVICE`)
Without the custom op .so loaded, torch.jit.script fails on se_t_tebd
with "object has no attribute tabulate_fusion_se_t_tebd". Add the same
NotImplementedError fallback pattern used by tabulate_fusion_se_t,
tabulate_fusion_se_a, tabulate_fusion_se_atten, and border_op.
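The fallback amounts to supplying a Python stub that satisfies symbol lookup at script time but raises if ever executed; a simplified sketch of the pattern (`get_op` and the registry dict are illustrative, not the deepmd registration code):

```python
def tabulate_fusion_se_t_tebd_fallback(*args):
    """Python fallback used when the compiled custom op is absent: keeps
    torch.jit.script symbol resolution happy, fails clearly if called."""
    raise NotImplementedError(
        "tabulate_fusion_se_t_tebd requires the compiled customized ops"
    )

def get_op(registry, name):
    # return the compiled op if it was loaded, otherwise the raising stub
    return registry.get(name, tabulate_fusion_se_t_tebd_fallback)

op = get_op({}, "tabulate_fusion_se_t_tebd")
try:
    op()
except NotImplementedError as err:
    print(err)
```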
The previous fix only changed loop variables but nall*ndim and
nloc*ndim were still computed as int*int before widening. Cast
all parameters to ptrdiff_t at function entry.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50507c6af7


@codecov

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 77.67857% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.25%. Comparing base (e97967b) to head (808b9c7).

Files with missing lines Patch % Lines
source/api_cc/tests/test_deeppot_ptexpt.cc 73.94% 37 Missing ⚠️
source/api_cc/src/DeepPotPTExpt.cc 80.00% 2 Missing and 4 partials ⚠️
...pi_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc 87.23% 6 Missing ⚠️
deepmd/pt_expt/infer/deep_eval.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5334      +/-   ##
==========================================
- Coverage   82.28%   82.25%   -0.03%     
==========================================
  Files         797      797              
  Lines       82100    82323     +223     
  Branches     4003     4020      +17     
==========================================
+ Hits        67557    67717     +160     
- Misses      13336    13393      +57     
- Partials     1207     1213       +6     


Han Wang added 9 commits March 24, 2026 08:27
- Store default_fparam values in .pt2 metadata during serialization
- Python inference: when fparam is omitted and model has default,
  substitute default values from metadata instead of passing None
- C++ inference: read default_fparam from metadata and use when
  fparam is empty (without fix: segfault from shape mismatch)
- NoPBC: shift coordinates so minimum is at rcut, ensuring atoms
  with negative coords are inside [0, L) for copy_coord/build_nlist
- Add C++ tests: eval_default_vs_explicit (internal nlist) and
  eval_default_vs_explicit_lmp (external nlist)
- Add Python tests: test_pt2_eval_default_fparam and
  test_freeze_pt2_nopbc_negative_coords
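The NoPBC shift described above can be sketched in plain Python: translate atoms so the per-axis minimum sits at rcut, making every coordinate strictly positive before the fake box is constructed (illustrative, not the C++ implementation):

```python
def shift_to_rcut(coords, rcut):
    """coords: list of (x, y, z) tuples; returns shifted coords whose
    per-axis minimum equals rcut, so atoms with negative coordinates
    end up inside [0, L) for copy_coord/build_nlist."""
    mins = [min(c[d] for c in coords) for d in range(3)]
    return [tuple(c[d] - mins[d] + rcut for d in range(3)) for c in coords]

coords = [(-1.5, 0.0, 2.0), (0.5, -3.0, 1.0)]
shifted = shift_to_rcut(coords, rcut=6.0)
# every axis now has its minimum exactly at rcut
assert all(min(c[d] for c in shifted) == 6.0 for d in range(3))
```

A uniform translation leaves all interatomic vectors unchanged, so energies and forces are unaffected; only the neighbor-list bookkeeping benefits.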
Resolve conflict in serialization.py: keep .pt2 dispatch in
serialize_from_file and add model_params.json support from upstream
(finetune --use-pretrain-script) to both .pte and .pt2 paths.
- NoPbc cpu_build_nlist_atomic: verify atomic energy/virial for NoPBC
- PBC cpu_build_nlist_nframes: 2-frame PBC via compute_mixed_type
- NoPbc cpu_build_nlist_nframes: 2-frame NoPBC via compute_mixed_type

These cover previously untested code paths in DeepPotPTExpt::compute
(NoPBC internal nlist) and compute_nframes (multi-frame wrapper).
- test_finetune_from_pt2: train → freeze to .pt2 → finetune from
  .pt2 → verify inherited weights match pretrained
- test_finetune_from_pt2_use_pretrain_script: train → freeze to
  .pt2 → finetune with --use-pretrain-script → verify config updated
…Expt

- Parser error tests: nonexistent file, invalid ZIP, truncated file
- Metadata accessor tests: type_map, cutoff, ntypes, dim_fparam/aparam
- JSON type coverage: integers, booleans, strings, float, arrays
  via SE_A model (no fparam) and fparam_aparam model
- Default fparam JSON parsing: boolean true + float array
  via fparam_aparam_default model
Resolve conflicts: keep default_fparam logic, NoPBC coord shift,
and new tests from our branch on top of upstream's merged feat-pt-expt-c.
- test_change_bias_frozen_pt2: change-bias with data on .pt2 model
- test_change_bias_frozen_pt2_user_defined: user-defined bias on .pt2
- test_change_bias_pt2_pte_consistency: .pte and .pt2 produce same bias
@wanghan-iapcm wanghan-iapcm changed the title test(pt_expt): add .pt2 (AOTInductor) unit tests test(pt_expt): add .pt2 (AOTInductor) unit tests and bug fixes Mar 25, 2026
@wanghan-iapcm wanghan-iapcm marked this pull request as ready for review March 25, 2026 01:47
@wanghan-iapcm wanghan-iapcm requested a review from njzjz March 25, 2026 01:47

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a45b88af9


@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

📝 Walkthrough

Adds an exporter-stored model default_fparam; the Python inference path and C++ runtime use that default when callers omit fparam for PT2 models with a nonzero fparam dimension. Tests and serializers are updated across the Python and C++ layers.

Changes

Cohort / File(s) — Summary

  • Python inference & exporter (deepmd/pt_expt/infer/deep_eval.py, deepmd/pt_expt/utils/serialization.py): The exporter now includes "default_fparam" in metadata. _eval_model constructs fparam_t from metadata["default_fparam"] (float64, broadcast to (nframes, dim_fparam)) when the caller omits fparam for .pt2; raises if it is missing.
  • C++ inference API (source/api_cc/include/DeepPotPTExpt.h, source/api_cc/src/DeepPotPTExpt.cc): Added a private default_fparam_ member initialized from metadata. compute overloads build fparam_tensor from default_fparam_ when the caller's fparam is empty, and throw when dfparam > 0 and no default exists. Adjusted the NoPBC fake-box path to shift coordinates so the per-axis minima equal rcut.
  • C++ unit tests (source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc, source/api_cc/tests/test_deeppot_ptexpt.cc): Added typed tests for default-vs-explicit fparam equivalence (including external/LAMMPS nlist), multi-frame PT2 inference, atomic outputs, metadata parsing, and PT2 init error cases.
  • Python integration tests (source/tests/pt_expt/model/test_model_compression.py, source/tests/pt_expt/test_dp_freeze.py, source/tests/pt_expt/test_finetune.py, source/tests/pt_expt/test_change_bias.py): Added tests covering freeze→.pt2 output and eval parity, NoPBC/negative-coordinate PT2 behavior, finetune-from-.pt2 flows, change-bias on .pt2, compression round-trips preserving min_nbor_dist, and default-fparam eval behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Exporter as Exporter\n(Py)
    participant Metadata as Metadata\n(JSON)
    participant InferPy as Infer\n(Py)
    participant Runtime as Runtime\n(C++)
    participant Runner as Runner\n(Model)

    Exporter->>Metadata: write model + "default_fparam"
    InferPy->>Metadata: load model + metadata
    alt caller omits fparam and dim_fparam>0
        InferPy->>InferPy: build fparam_t from metadata["default_fparam"]\n(dtype=float64, broadcast to nframes)
    end
    InferPy->>Runtime: call compute(fparam_t)
    alt fparam empty in C++ compute
        Runtime->>Runtime: construct fparam tensor from default_fparam_\n(device float64, broadcast)
    end
    Runtime->>Runner: invoke model runner with fparam tensor
    Runner-->>Runtime: energies, forces, virials
    Runtime-->>InferPy: return results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • njzjz
  • caic99
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 41.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The title clearly summarizes the main change: adding .pt2 unit tests and bug fixes for the pt_expt backend, which aligns with the comprehensive test additions and bug fixes described in the PR objectives.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepmd/pt_expt/utils/serialization.py (1)

188-199: ⚠️ Potential issue | 🔴 Critical

Convert default_fparam tensor to JSON-serializable format before storage.

model.get_default_fparam() returns a torch.Tensor | None, but the metadata dict is serialized via json.dumps() which cannot serialize tensors. When a model with a default fparam is frozen, this will raise TypeError: Object of type Tensor is not JSON serializable. Convert the tensor to a list or None:

Suggested fix
+    default_fparam = model.get_default_fparam()
     return {
         "type_map": model.get_type_map(),
         "rcut": model.get_rcut(),
         "sel": model.get_sel(),
         "model_output_type": model.model_output_type(),
         "dim_fparam": model.get_dim_fparam(),
         "dim_aparam": model.get_dim_aparam(),
         "mixed_types": model.mixed_types(),
         "sel_type": model.get_sel_type(),
         "has_default_fparam": model.has_default_fparam(),
-        "default_fparam": model.get_default_fparam(),
+        "default_fparam": (
+            None
+            if default_fparam is None
+            else default_fparam.detach().cpu().tolist()
+        ),
         "fitting_output_defs": fitting_output_defs,
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/serialization.py` around lines 188 - 199, The metadata
dict contains "default_fparam" which is assigned model.get_default_fparam() (a
torch.Tensor or None) and will fail json.dumps; update the serialization to call
model.get_default_fparam(), check for None, and if not None convert the tensor
to a plain Python list (e.g., via .tolist()) before inserting into the dict so
"default_fparam" is JSON-serializable; modify the code around the return dict
(the keys "default_fparam" and related model.get_default_fparam()) to perform
this conversion.
🧹 Nitpick comments (1)
source/api_cc/tests/test_deeppot_ptexpt.cc (1)

656-669: Drop TYPED_TEST from the metadata-only suites.

TypeParam is unused in these fixtures and assertions, so the same .pt2 files get loaded once per ValueTypes entry without increasing coverage. Converting these to TEST_F would keep the checks and reduce duplicate runtime/noise.

Also applies to: 699-712, 735-748

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/tests/test_deeppot_ptexpt.cc` around lines 656 - 669, The test
fixture TestDeepPotPTExptMetadata (and the other metadata-only fixtures noted)
doesn't use TypeParam, so keep a single instantiation: remove
TYPED_TEST_SUITE/TYPED_TEST usage and convert the suite to a plain TEST_F-style
test class; e.g., replace the TYPED_TEST_SUITE(TestDeepPotPTExptMetadata,
ValueTypes) and any TYPED_TESTs that use it with regular TEST_F tests that call
dp.init("../../tests/infer/deeppot_sea.pt2") in SetUp, and delete any unused
TypeParam references; apply the same change for the other metadata-only fixtures
referenced in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 828-836: The code currently creates a zero-length fparam_tensor
when default_fparam_ is empty which breaks the AOTInductor model if dfparam > 0;
modify the fparam_tensor construction (where default_fparam_ and fparam_tensor
are handled) to check dfparam: if dfparam > 0 create a zeros tensor shaped {1,
dfparam} with the same options and .to(device), otherwise keep the zero-length
zeros({0}) behavior; apply the identical change to both places where
fparam_tensor is built (the two occurrences around the original diff and the
later one referenced in the comment) and ensure this aligns with how run_model()
conditions inclusion of fparam in the input list.

In `@source/api_cc/tests/test_deeppot_ptexpt.cc`:
- Around line 631-638: The test writes a temporary file then calls
deepmd::DeepPot::init(tmpfile) but doesn't assert the write succeeded, so
failures to create the file can mask invalid-ZIP handling; modify the test to
(1) use a unique temp path (e.g., include PID/timestamp or tmpnam-style string)
for tmpfile, (2) after creating std::ofstream ofs(tmpfile, std::ios::binary)
assert ofs.is_open() and that the write succeeded (e.g., check ofs.good() or
verify bytes written) before closing the scope, and then call dp.init(tmpfile)
and remove the file; apply the same change to the other similar block around the
dp.init call.
- Around line 541-578: The test cpu_build_nlist_nframes is appending an
identical second frame so it won't catch frame-specific bugs; change the
second-frame inputs (coord_2f, atype_2f, and box_2f for PBC case) to be a
perturbed copy (e.g., shift one atom coordinate and adjust box if needed), then
after calling dp.compute_mixed_type(...) verify each frame by calling
dp.compute(...) on the single modified frame and comparing that single-frame
energy/force/virial to the corresponding slice from the batched outputs; apply
the same change to the other batched test referenced (the block around 581-615)
and use symbol names dp.compute_mixed_type and dp.compute to locate the code to
modify.

In `@source/tests/pt_expt/model/test_model_compression.py`:
- Around line 340-343: The test unpacks e_pt2, f_pt2, v_pt2 from dp.eval(coord,
box, atype) but only asserts energy, leaving f_pt2 and v_pt2 unused (Ruff error
and missing derivative checks); either add assertions comparing f_pt2 and v_pt2
to ret_frozen["forces"] and ret_frozen["virial"] (using
np.testing.assert_allclose with appropriate atol and err_msg) or, if you intend
energy-only coverage, change the unpack to e_pt2, _, _ = dp.eval(coord, box,
atype) to avoid unused variables; update the lines around dp.eval and the
existing energy assertion accordingly.

In `@source/tests/pt_expt/test_dp_freeze.py`:
- Around line 124-126: The test calls dp.eval(coord, box, atype) and assigns (e,
f, v) but never uses v, which triggers RUF059 and omits the virial shape check;
fix by either asserting the virial shape (e.g., use self.assertEqual(v.shape,
(1, 1)) or the correct expected shape) or, if you intentionally don't check it,
rename v to _v to signal it's unused; update the lines around dp.eval and the
subsequent self.assertEqual calls accordingly and run ruff check . and ruff
format . before committing.

In `@source/tests/pt_expt/test_finetune.py`:
- Around line 895-935: The test is reloading the finetuned checkpoint as the
"pretrained" baseline because ckpt_path (created by _train_pretrained) points to
the same model.ckpt.pt that the finetune run overwrites; before calling
main([...,"--finetune", pt2_path,...]) capture and cache the pretrained state
returned by torch.load(ckpt_path, map_location=DEVICE, weights_only=True) into
pre_model_state (or save it to a uniquely named file) and then run main so that
_assert_inherited_weights_match(ft_model_state, pre_model_state,
random_fitting=False) compares the original pretrained weights to the finetuned
output instead of reloading the overwritten file; alternatively, change the
finetune output path (ft_ckpt) to a different filename so ckpt_path remains
untouched.



📥 Commits

Reviewing files that changed from the base of the PR and between 97e9c9e and 1a45b88.

📒 Files selected for processing (9)
  • deepmd/pt_expt/infer/deep_eval.py
  • deepmd/pt_expt/utils/serialization.py
  • source/api_cc/include/DeepPotPTExpt.h
  • source/api_cc/src/DeepPotPTExpt.cc
  • source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
  • source/api_cc/tests/test_deeppot_ptexpt.cc
  • source/tests/pt_expt/model/test_model_compression.py
  • source/tests/pt_expt/test_dp_freeze.py
  • source/tests/pt_expt/test_finetune.py

Han Wang added 2 commits March 25, 2026 13:05
- deep_eval.py: add .contiguous() on expanded default fparam; raise
  ValueError when no default available instead of silent zero injection
- test_deeppot_ptexpt.cc: use distinct coord_alt for frame 1 in
  multi-frame tests; assert temp file creation in parser error tests
- test_model_compression.py: assert force/virial on .pt2 compress path
- test_dp_freeze.py: assert virial shape in .pt2 smoke test
- test_finetune.py: copy pretrained checkpoint before finetune overwrites
…fault

Consistent with the Python side: throw deepmd_exception instead of
passing a shape-mismatched {0} tensor that would crash in AOTInductor.
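The Python-side counterpart of this substitute-default-or-raise behavior might look like the following sketch; `resolve_fparam` is a hypothetical helper illustrating the fix, not the actual `deep_eval.py` code.

```python
import numpy as np


def resolve_fparam(fparam, dim_fparam, default_fparam, nframes):
    # No frame parameters expected: nothing to pass
    if dim_fparam == 0:
        return None
    # Caller supplied fparam explicitly: just shape it per frame
    if fparam is not None:
        return np.asarray(fparam, dtype=np.float64).reshape(nframes, dim_fparam)
    # Omitted fparam: substitute the metadata default instead of a
    # shape-mismatched tensor, or fail loudly when no default exists
    if default_fparam is None:
        raise ValueError(
            "fparam required (dim_fparam > 0) but neither a value "
            "nor a default was provided"
        )
    expanded = np.tile(np.reshape(default_fparam, (1, dim_fparam)), (nframes, 1))
    return np.ascontiguousarray(expanded)  # mirrors the .contiguous() fix
```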
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 25, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 25, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Around line 311-315: The test only asserts the first two entries of the
deserialized bias (using updated_bias.flatten()[:2]) which can miss
extra/mis-shaped values; replace that partial check with a full-array comparison
by reshaping the expected user bias to match updated_bias.shape and using
np.testing.assert_allclose(updated_bias, expected_bias) so the entire bias
tensor from BaseModel.get_out_bias (after serialize_from_file and
BaseModel.deserialize) is validated.
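The full-array check the reviewer asks for can be sketched as follows (variable names follow the test; the helper itself is hypothetical):

```python
import numpy as np


def assert_full_bias_match(updated_bias, user_bias_values):
    # Compare every entry, not just the first two: reshape the expected
    # user bias to the deserialized tensor's shape first
    expected = np.asarray(
        user_bias_values, dtype=updated_bias.dtype
    ).reshape(updated_bias.shape)
    np.testing.assert_allclose(updated_bias, expected)
```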

In `@source/tests/pt_expt/test_dp_freeze.py`:
- Around line 159-191: The test test_freeze_pt2_nopbc_negative_coords currently
only compares pt2 vs pte (dp_pt2.eval vs dp_pte.eval) so it won't detect
regressions in the shared frontend; instead create an independent oracle by
manually shifting the negative coord array into the [0,L) range and using that
shifted_coord as the reference for energy/force/virial, then compare
dp_pt2.eval(coord, None, atype) against the oracle results; concretely, compute
an offset = -coord.min() (per axis or global as appropriate) to produce
shifted_coord = coord + offset, call DeepPot.eval on shifted_coord (or another
trusted reference evaluator) to produce e_ref,f_ref,v_ref, and assert
np.testing.assert_allclose(e_pt2, e_ref, ...), etc.; update the
test_freeze_pt2_nopbc_negative_coords function and keep calls to freeze(...) and
dp_pt2 so the C++ NoPBC fake-box path (DeepPot.eval with box=None) is exercised
while using the shifted_coord oracle for independent validation.
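The shift step of that oracle is simple; the DeepPot eval calls are placeholders, but the coordinate shift itself can be sketched as:

```python
import numpy as np


def shift_to_nonnegative(coord):
    # Per-axis offset so the minimum coordinate lands at 0; under NoPBC
    # the model's outputs should be invariant to this translation, which
    # makes eval(shifted_coord) a trusted reference for eval(coord)
    coord = np.asarray(coord, dtype=np.float64).reshape(-1, 3)
    offset = -coord.min(axis=0, keepdims=True)
    return coord + offset
```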

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 69223829-8114-46ce-9b68-0b038a29b438

📥 Commits

Reviewing files that changed from the base of the PR and between 1a45b88 and b6e7db3.

📒 Files selected for processing (6)
  • deepmd/pt_expt/infer/deep_eval.py
  • source/api_cc/tests/test_deeppot_ptexpt.cc
  • source/tests/pt_expt/model/test_model_compression.py
  • source/tests/pt_expt/test_change_bias.py
  • source/tests/pt_expt/test_dp_freeze.py
  • source/tests/pt_expt/test_finetune.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • source/tests/pt_expt/model/test_model_compression.py
  • source/api_cc/tests/test_deeppot_ptexpt.cc
  • source/tests/pt_expt/test_finetune.py

- test_change_bias: assert full bias tensor shape, not just first 2 values
- test_dp_freeze: clarify NoPBC test compares independent implementations
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 25, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 25, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
source/tests/pt_expt/test_dp_freeze.py (1)

237-244: Add a non-default fparam control so this test cannot pass vacuously.

Right now this only proves omitted == explicit default. Because the checkpoint is synthetic, the model could become effectively insensitive to fparam, and this would still pass even if the fallback plumbing stopped being exercised. One extra non-default call plus a “not all equal” assertion would make the regression meaningful.

🧪 Possible hardening
         # Eval WITHOUT fparam — model should use default (0.5)
         e_no, f_no, v_no = dp.eval(coord, box, atype)
         # Eval WITH explicit default value
         e_ex, f_ex, v_ex = dp.eval(coord, box, atype, fparam=[0.5])
+        e_alt, f_alt, v_alt = dp.eval(coord, box, atype, fparam=[0.0])
+
+        self.assertFalse(
+            np.allclose(e_ex, e_alt, rtol=0, atol=1e-12)
+            and np.allclose(f_ex, f_alt, rtol=0, atol=1e-12)
+            and np.allclose(v_ex, v_alt, rtol=0, atol=1e-12)
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/test_dp_freeze.py` around lines 237 - 244, The test
currently only checks omitted fparam equals the explicit default using
dp.eval(coord, box, atype) vs dp.eval(..., fparam=[0.5]) so it can pass
vacuously; add one more call using a non-default fparam (e.g., fparam=[0.7]) via
dp.eval(coord, box, atype, fparam=[...]) to produce e_alt, f_alt, v_alt and then
assert that at least one of e_alt, f_alt, v_alt is not allclose to the default
outputs (e_no/f_no/v_no) to ensure the fparam branch is actually exercised. Use
the existing variable names (dp.eval, e_no/f_no/v_no, e_ex/f_ex/v_ex) to locate
where to insert the extra call and the “not all equal” assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/tests/pt_expt/test_dp_freeze.py`:
- Around line 152-157: The parity asserts between outputs of dp_pte.eval and
dp_pt2.eval (variables e_pte, f_pte, v_pte vs e_pt2, f_pt2, v_pt2) use
np.testing.assert_allclose with implicit rtol and may allow NaN-equality; change
each comparison to explicitly pass rtol=0 and keep atol=1e-10, and before
asserting closeness add finiteness checks (e.g. assert np.all(np.isfinite(...))
for both sides of each pair) so NaNs/Infs fail fast; make the same changes for
the other two analogous blocks that compare dp_pte vs dp_pt2 outputs.
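Factored out as a helper, the hardened parity check might look like this sketch (the name `assert_parity` is hypothetical):

```python
import numpy as np


def assert_parity(pte_outputs, pt2_outputs, atol=1e-10):
    # Fail fast on NaN/Inf, then compare with absolute tolerance only
    # (rtol=0), so matching NaNs can no longer slip through allclose
    for x, y in zip(pte_outputs, pt2_outputs):
        x, y = np.asarray(x), np.asarray(y)
        assert np.all(np.isfinite(x)), "non-finite values in .pte output"
        assert np.all(np.isfinite(y)), "non-finite values in .pt2 output"
        np.testing.assert_allclose(x, y, rtol=0, atol=atol)
```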

---

Nitpick comments:
In `@source/tests/pt_expt/test_dp_freeze.py`:
- Around line 237-244: The test currently only checks omitted fparam equals the
explicit default using dp.eval(coord, box, atype) vs dp.eval(..., fparam=[0.5])
so it can pass vacuously; add one more call using a non-default fparam (e.g.,
fparam=[0.7]) via dp.eval(coord, box, atype, fparam=[...]) to produce e_alt,
f_alt, v_alt and then assert that at least one of e_alt, f_alt, v_alt is not
allclose to the default outputs (e_no/f_no/v_no) to ensure the fparam branch is
actually exercised. Use the existing variable names (dp.eval, e_no/f_no/v_no,
e_ex/f_ex/v_ex) to locate where to insert the extra call and the “not all equal”
assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f9ba4faf-0b6d-445e-b8ef-88bc27a8e961

📥 Commits

Reviewing files that changed from the base of the PR and between 196bcca and 808b9c7.

📒 Files selected for processing (2)
  • source/tests/pt_expt/test_change_bias.py
  • source/tests/pt_expt/test_dp_freeze.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt_expt/test_change_bias.py

Han Wang added 2 commits March 25, 2026 18:01
get_out_bias() returns shape (n_out, ntypes, max_out_size) but the
test expected (ntypes, max_out_size). Use .reshape(updated_bias.shape)
to match the actual shape, consistent with the .pt test pattern.
Add np.isfinite assertions before allclose comparisons to catch NaN
regressions that would otherwise pass if both paths produce matching
NaNs. Set rtol=0 so only absolute tolerance is used.