
feat: add WSD LR Scheduler #5326

Open
OutisLi wants to merge 3 commits into deepmodeling:master from OutisLi:pr/wsd

Conversation

@OutisLi
Collaborator

@OutisLi OutisLi commented Mar 18, 2026

The doc will be added through PR #5276

Summary by CodeRabbit

  • New Features

    • Added a "wsd" (warmup → stable → decay) learning-rate schedule with configurable warmup, stable duration, decay-phase ratio, and three decay modes: inverse_linear, cosine, linear. Validation prevents invalid parameter combinations and enforces sensible ranges.
    • Exposed the new schedule in the public API.
  • Tests

    • Added comprehensive tests for decay modes, warmup/stable behavior, edge cases, array/JIT inputs, and beyond-endstep behavior.
  • Documentation

    • Documented the new "wsd" schedule, parameters, examples, and mathematical definitions.
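The summarized warmup → stable → decay behavior can be sketched as a standalone function. This is a hedged approximation for illustration only: the name `wsd_lr`, its defaults, and the exact warmup/decay formulas are assumptions, not the deepmd implementation.

```python
import numpy as np

def wsd_lr(step, start_lr=1e-3, stop_lr=1e-5, num_steps=10000,
           warmup_steps=0, decay_phase_ratio=0.1, decay_type="inverse_linear"):
    """Illustrative warmup -> stable -> decay schedule (not the deepmd code)."""
    # The last decay_phase_ratio fraction of the run is the decay phase.
    decay_phase_steps = max(1, int(decay_phase_ratio * num_steps))
    decay_start = num_steps - decay_phase_steps
    if step < warmup_steps:
        # Linear warmup from 0 to start_lr (one plausible warmup shape).
        return start_lr * step / max(warmup_steps, 1)
    if step < decay_start:
        # Stable plateau at start_lr.
        return start_lr
    # Normalized progress through the decay phase, clamped to [0, 1].
    tau = min((step - decay_start) / decay_phase_steps, 1.0)
    if decay_type == "linear":
        return start_lr + (stop_lr - start_lr) * tau
    if decay_type == "cosine":
        return stop_lr + (start_lr - stop_lr) * 0.5 * (1 + np.cos(np.pi * tau))
    # "inverse_linear": interpolate linearly in 1/lr space.
    return 1.0 / (1.0 / start_lr + (1.0 / stop_lr - 1.0 / start_lr) * tau)
```

All three decay modes start at `start_lr` when `tau = 0` and reach `stop_lr` at `tau = 1`; they differ only in the shape of the transition.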

Copilot AI review requested due to automatic review settings March 18, 2026 04:51
@dosubot dosubot bot added the new feature label Mar 18, 2026
@OutisLi OutisLi marked this pull request as draft March 18, 2026 04:54
Contributor

Copilot AI left a comment


Pull request overview

Adds a new warmup-stable-decay (WSD) learning-rate scheduler to the backend-agnostic BaseLR registry, wires it into argument checking and backend exports, and expands cross-backend test coverage to validate behavior and consistency.

Changes:

  • Implement LearningRateWSD (type="wsd") with warmup support plus configurable decay modes (inverse_linear, cosine, linear).
  • Extend CLI/config arg validation to accept and validate WSD-specific parameters.
  • Add/extend tests across universal, TF, PT, PD, and consistency suites for WSD behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:
  • deepmd/dpmodel/utils/learning_rate.py: Adds the LearningRateWSD scheduler implementation and registers it under BaseLR.
  • deepmd/utils/argcheck.py: Adds WSD-specific validation and exposes WSD arguments via the lr args plugin registry.
  • deepmd/pt/utils/learning_rate.py: Re-exports LearningRateWSD for the PyTorch backend utilities.
  • deepmd/pd/utils/learning_rate.py: Re-exports LearningRateWSD for the Paddle backend utilities.
  • source/tests/universal/dpmodel/utils/test_learning_rate.py: Adds extensive unit tests for WSD (modes, warmup, array input, boundary conditions).
  • source/tests/tf/test_lr.py: Adds TF wrapper build/value tests for WSD (default + cosine).
  • source/tests/pt/test_lr.py: Adds PT-side WSD curve tests for all decay modes.
  • source/tests/pd/test_lr.py: Adds PD-side WSD curve tests for all decay modes.
  • source/tests/consistent/test_learning_rate.py: Extends consistency parameterization to include WSD and adjusts the sampling step to hit the decay phase.


@coderabbitai
Contributor

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Adds a new plugin-registered learning-rate schedule LearningRateWSD ("wsd") implementing warmup → stable plateau → decay with three decay modes, plus argument validation, re-exports, documentation, and cross-backend tests.

Changes

Cohort / File(s) Summary
Core Implementation
deepmd/dpmodel/utils/learning_rate.py
Added LearningRateWSD (registered as "wsd") implementing warmup, stable plateau, and decay; validates params, computes decay_phase_steps/stable_steps, and implements _decay_value() with "inverse_linear", "cosine", and "linear" modes. (Class appears inserted twice in diff.)
Public Re-exports
deepmd/pd/utils/learning_rate.py, deepmd/pt/utils/learning_rate.py
Imported and added LearningRateWSD to __all__ in PD and PT public modules.
Argument Validation
deepmd/utils/argcheck.py
Added _check_wsd_args() and learning_rate_wsd() plugin registration; integrated WSD-specific validations (start_lr, stop_lr/ratio positivity, decay_phase_ratio ∈ (0,1], allowed decay_type).
Cross-backend & Unit Tests
source/tests/universal/dpmodel/utils/test_learning_rate.py, source/tests/consistent/test_learning_rate.py
Added extensive WSD tests covering stable/decay boundaries, warmup interaction, invalid params, clamping, JIT/array inputs, and beyond-num_steps behavior.
Framework Tests (PD / PT / TF)
source/tests/pd/test_lr.py, source/tests/pt/test_lr.py, source/tests/tf/test_lr.py
Added TestLearningRateWSD suites for PD/PT and TF wrapper tests that build "wsd" schedules and assert value correctness across decay types.
Documentation
doc/train/learning-rate.md
Documented new wsd schedule, added parameters decay_phase_ratio and decay_type, provided examples and mathematical definitions for stable/decay phases.
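The validations attributed to _check_wsd_args() above could look roughly as follows. This is a hedged sketch: the actual function signature and error messages in deepmd/utils/argcheck.py may differ.

```python
def check_wsd_args(start_lr: float, stop_lr: float,
                   decay_phase_ratio: float, decay_type: str) -> None:
    """Sketch of the WSD-specific validations described in the walkthrough."""
    # Learning rates must be strictly positive.
    if start_lr <= 0 or stop_lr <= 0:
        raise ValueError("start_lr and stop_lr must be positive")
    # The decay phase occupies a fraction of the run in (0, 1].
    if not 0.0 < decay_phase_ratio <= 1.0:
        raise ValueError("decay_phase_ratio must be in (0, 1]")
    # Only the three documented decay modes are accepted.
    if decay_type not in ("inverse_linear", "cosine", "linear"):
        raise ValueError(f"unknown decay_type: {decay_type!r}")
```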

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Docs

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 71.79%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
  • Title Check: ✅ Passed. The PR title 'feat: add WSD LR Scheduler' accurately and concisely summarizes the main change: introducing a new WSD (Warmup-Stable-Decay) learning rate scheduler across multiple modules.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
source/tests/tf/test_lr.py (1)

106-140: These assertions don't exercise the TF graph path.

lr_schedule.value() just delegates to base_lr.value(), so both new tests still pass if LearningRateSchedule.build() is wrong for type="wsd". Please evaluate the tensor returned by build() at a mid-decay step and compare that result instead.

🧪 Suggested coverage improvement
-        global_step = tf.constant(0, dtype=tf.int64)
-        lr_schedule.build(global_step, num_steps=10000)
+        g = tf.Graph()
+        with g.as_default():
+            global_step = tf.placeholder(shape=[], dtype=tf.int64)
+            lr_tensor = lr_schedule.build(global_step, num_steps=10000)
 
         self.assertIsInstance(lr_schedule.base_lr, LearningRateWSD)
-        np.testing.assert_allclose(
-            lr_schedule.value(9500), lr_schedule.base_lr.value(9500), rtol=1e-10
-        )
+        with tf.Session(graph=g) as sess:
+            tensor_value = sess.run(lr_tensor, feed_dict={global_step: 9500})
+        np.testing.assert_allclose(
+            tensor_value, lr_schedule.base_lr.value(9500), rtol=1e-10
+        )

Apply the same pattern to the cosine WSD variant as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/tf/test_lr.py` around lines 106 - 140, The tests are only
exercising the Python path because lr_schedule.value(9500) passes a plain int;
update both tests to pass a TF tensor so the built TF graph is evaluated: after
calling LearningRateSchedule.build(global_step, num_steps=10000), call
lr_schedule.value(tf.constant(9500, dtype=tf.int64)) and compare it to
lr_schedule.base_lr.value(tf.constant(9500, dtype=tf.int64)) (and do the same
change in test_wsd_cosine_build_and_value) to ensure LearningRateSchedule.build,
base_lr and LearningRateWSD TF-paths are exercised.
source/tests/universal/dpmodel/utils/test_learning_rate.py (1)

142-165: Consider splitting mixed validation assertions into separate tests.

test_invalid_decay_phase_ratio also checks invalid decay_type; splitting improves failure localization.

♻️ Suggested test split
-    def test_invalid_decay_phase_ratio(self) -> None:
-        """Test invalid WSD decay_phase_ratio values."""
+    def test_invalid_decay_phase_ratio(self) -> None:
+        """Test invalid WSD decay_phase_ratio values."""
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_phase_ratio=0.0,
             )
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_phase_ratio=1.1,
             )
+
+    def test_invalid_decay_type(self) -> None:
+        """Test invalid WSD decay_type values."""
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_type="bad_mode",
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/universal/dpmodel/utils/test_learning_rate.py` around lines 142
- 165, test_invalid_decay_phase_ratio currently asserts multiple unrelated
invalid cases (invalid decay_phase_ratio and invalid decay_type) in one test;
split it into separate tests so each assertion checks a single validation rule.
Create one test (e.g., test_invalid_decay_phase_ratio_values) that calls
LearningRateWSD with decay_phase_ratio=0.0 and decay_phase_ratio=1.1 and asserts
ValueError, and another test (e.g., test_invalid_decay_type) that calls
LearningRateWSD with decay_type="bad_mode" and asserts ValueError; keep
function/class names LearningRateWSD, decay_phase_ratio, and decay_type to
locate the code. Ensure each new test has a clear name and only one
responsibility for better failure localization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/dpmodel/utils/learning_rate.py`:
- Around line 514-525: The current int(self.decay_phase_ratio * self.num_steps)
can produce 0 or exceed post-warmup steps; change the computation of
decay_phase_steps to derive from and clamp against decay_num_steps instead of
raising: compute e.g. desired = max(0, int(round(self.decay_phase_ratio *
self.num_steps))) (or 0 if you prefer), then set self.decay_phase_steps =
min(desired, self.decay_num_steps) and if self.decay_phase_steps == 0 and
self.decay_num_steps > 0 set it to 1 (or otherwise handle the zero-decay case),
replacing the two ValueError checks; update code around decay_phase_steps,
decay_phase_ratio, num_steps and decay_num_steps to use this clamped value so
short runs or heavy-warmup configs don’t raise at runtime.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a283f5a5-f445-4ee4-858e-4f089ee4287d

📥 Commits

Reviewing files that changed from the base of the PR and between b2805fb and f271bf8.

📒 Files selected for processing (9)
  • deepmd/dpmodel/utils/learning_rate.py
  • deepmd/pd/utils/learning_rate.py
  • deepmd/pt/utils/learning_rate.py
  • deepmd/utils/argcheck.py
  • source/tests/consistent/test_learning_rate.py
  • source/tests/pd/test_lr.py
  • source/tests/pt/test_lr.py
  • source/tests/tf/test_lr.py
  • source/tests/universal/dpmodel/utils/test_learning_rate.py

@codecov

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 70.31250% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.42%. Comparing base (034e613) to head (156dc05).

Files with missing lines:
  • deepmd/utils/argcheck.py: 38.46% patch coverage, 16 lines missing ⚠️
  • deepmd/dpmodel/utils/learning_rate.py: 92.10% patch coverage, 3 lines missing ⚠️
Additional details and impacted files
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5326      +/-   ##
==========================================
- Coverage   82.42%   82.42%   -0.01%     
==========================================
  Files         784      784              
  Lines       79125    79189      +64     
  Branches     3676     3676              
==========================================
+ Hits        65220    65268      +48     
- Misses      12732    12749      +17     
+ Partials     1173     1172       -1     

☔ View full report in Codecov by Sentry.


@OutisLi OutisLi marked this pull request as ready for review March 19, 2026 01:28
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
deepmd/dpmodel/utils/learning_rate.py (1)

514-518: ⚠️ Potential issue | 🟡 Minor

Handle edge case when decay_num_steps is zero.

The clamping logic at line 515-516 ensures decay_phase_steps >= 1, but when decay_num_steps = 0 (i.e., num_steps = warmup_steps = 0), this results in stable_steps = -1. While the _decay_value logic handles this gracefully by returning stop_lr immediately, having negative stable_steps is semantically incorrect.

Consider adding early validation or adjusting the clamping:

🛡️ Suggested fix
         # === Derive stable and decay phase lengths ===
         self.decay_phase_ratio = decay_phase_ratio
         self.decay_type = decay_type
+        # When decay_num_steps is 0, there's no room for decay
+        if self.decay_num_steps == 0:
+            self.decay_phase_steps = 0
+            self.stable_steps = 0
+            return
         # Clamp decay_phase_steps to valid range [1, decay_num_steps]
         self.decay_phase_steps = max(
             1, min(int(self.decay_phase_ratio * self.num_steps), self.decay_num_steps)
         )
         self.stable_steps = self.decay_num_steps - self.decay_phase_steps
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/utils/learning_rate.py` around lines 514 - 518, The current
clamping forces decay_phase_steps >= 1 which yields stable_steps negative when
decay_num_steps == 0; update the logic so decay_phase_steps is clamped to the
range [0, decay_num_steps] (allowing zero) or explicitly handle the
decay_num_steps == 0 case by setting decay_phase_steps = 0 and stable_steps = 0
and/or validating decay_num_steps >= 0; adjust the block that sets
self.decay_phase_steps and self.stable_steps (and any callers like _decay_value)
to rely on non-negative stable_steps and ensure no negative values are produced.
🧹 Nitpick comments (1)
deepmd/dpmodel/utils/learning_rate.py (1)

559-562: Redundant xp.asarray wrapping.

Since tau is already an array with step_dtype, the expression xp.pi * tau will produce an array of the same dtype. The xp.asarray(..., dtype=step_dtype) call is redundant.

♻️ Suggested simplification
         elif self.decay_type == "cosine":
             decay_lr = stop_lr + (start_lr - stop_lr) * 0.5 * (
-                one + xp.cos(xp.asarray(xp.pi * tau, dtype=step_dtype))
+                one + xp.cos(xp.pi * tau)
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/utils/learning_rate.py` around lines 559 - 562, In the cosine
decay branch (where self.decay_type == "cosine") the expression xp.asarray(xp.pi
* tau, dtype=step_dtype) is redundant because xp.pi * tau already yields an
array with the correct dtype; replace that subexpression with xp.pi * tau so
decay_lr is computed as stop_lr + (start_lr - stop_lr) * 0.5 * (one +
xp.cos(xp.pi * tau)), keeping references to decay_lr, start_lr, stop_lr, tau,
xp.cos and step_dtype unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1363b61c-f8fb-4804-b834-420b56c86723

📥 Commits

Reviewing files that changed from the base of the PR and between f271bf8 and 689d357.

📒 Files selected for processing (2)
  • deepmd/dpmodel/utils/learning_rate.py
  • source/tests/universal/dpmodel/utils/test_learning_rate.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/universal/dpmodel/utils/test_learning_rate.py

@OutisLi OutisLi changed the title feat: add WSD scheduler feat(pt): add WSD LR Scheduler Mar 19, 2026
@OutisLi OutisLi changed the title feat(pt): add WSD LR Scheduler feat: add WSD LR Scheduler Mar 19, 2026
@OutisLi OutisLi requested a review from iProzd March 24, 2026 03:15
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
deepmd/dpmodel/utils/learning_rate.py (1)

476-483: The Raises section is stale.

The constructor now clamps oversized/empty derived decay phases instead of raising for them, so this docstring no longer matches the implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/utils/learning_rate.py` around lines 476 - 483, Update the
stale "Raises" docstring in the constructor of the learning rate class
(referenced by decay_phase_ratio and the derived decay phase logic in
deepmd/dpmodel/utils/learning_rate.py) to reflect current behavior: remove the
claim that an empty or oversized derived decay phase raises ValueError and
instead document that such cases are clamped to valid bounds; keep the existing
notes about non-positive learning rates and invalid decay_type raising
ValueError. Mention the clamping behavior and the valid range for
decay_phase_ratio (0, 1] so the docstring matches the implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3c30e1f9-f232-4bbc-8eac-79c3c14cfbe3

📥 Commits

Reviewing files that changed from the base of the PR and between 689d357 and 156dc05.

📒 Files selected for processing (10)
  • deepmd/dpmodel/utils/learning_rate.py
  • deepmd/pd/utils/learning_rate.py
  • deepmd/pt/utils/learning_rate.py
  • deepmd/utils/argcheck.py
  • doc/train/learning-rate.md
  • source/tests/consistent/test_learning_rate.py
  • source/tests/pd/test_lr.py
  • source/tests/pt/test_lr.py
  • source/tests/tf/test_lr.py
  • source/tests/universal/dpmodel/utils/test_learning_rate.py
✅ Files skipped from review due to trivial changes (3)
  • source/tests/tf/test_lr.py
  • doc/train/learning-rate.md
  • source/tests/consistent/test_learning_rate.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • deepmd/pd/utils/learning_rate.py
  • deepmd/pt/utils/learning_rate.py
  • source/tests/pd/test_lr.py

Comment on lines +514 to +518
        # Clamp decay_phase_steps to valid range [1, decay_num_steps]
        self.decay_phase_steps = max(
            1, min(int(self.decay_phase_ratio * self.num_steps), self.decay_num_steps)
        )
        self.stable_steps = self.decay_num_steps - self.decay_phase_steps
Contributor


⚠️ Potential issue | 🟡 Minor

Handle num_steps == 0 before forcing a one-step decay.

Lines 515-517 turn decay_num_steps == 0 into decay_phase_steps == 1, which makes stable_steps negative and causes Lines 565-566 to return stop_lr immediately for value(0). LearningRateExp and LearningRateCosine already special-case zero-decay runs, so WSD should do the same or reject num_steps == 0 explicitly.
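The failure mode can be reproduced with plain arithmetic. The variable names mirror the commented snippet; this is an illustration of the described pitfall, not the library code:

```python
# Reproduce the clamping pitfall for a zero-decay run (warmup consumes the whole run).
num_steps = 100
warmup_steps = 100
decay_num_steps = num_steps - warmup_steps  # 0: no steps left to decay
decay_phase_ratio = 0.1

# The clamp forces at least one decay step even though none fit:
decay_phase_steps = max(
    1, min(int(decay_phase_ratio * num_steps), decay_num_steps)
)
stable_steps = decay_num_steps - decay_phase_steps

print(decay_phase_steps)  # 1
print(stable_steps)       # -1: negative plateau length
```

With `stable_steps == -1`, step 0 is already "past" the plateau, so the schedule jumps straight to the decay endpoint.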

💡 Possible fix
         # === Derive stable and decay phase lengths ===
         self.decay_phase_ratio = decay_phase_ratio
         self.decay_type = decay_type
+        if self.decay_num_steps == 0:
+            self.decay_phase_steps = 0
+            self.stable_steps = 0
+            return
         # Clamp decay_phase_steps to valid range [1, decay_num_steps]
         self.decay_phase_steps = max(
             1, min(int(self.decay_phase_ratio * self.num_steps), self.decay_num_steps)
         )
         self.stable_steps = self.decay_num_steps - self.decay_phase_steps
@@
         step_dtype = (
             step.dtype
             if xp.isdtype(step.dtype, "real floating")
             else get_xp_precision(xp, "global")
         )
+        if self.decay_num_steps == 0:
+            return xp.full_like(step, self._start_lr, dtype=step_dtype)

Also applies to: 565-566

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/utils/learning_rate.py` around lines 514 - 518, The WSD
scheduler currently forces decay_phase_steps to at least 1 even when
decay_num_steps == 0, producing negative stable_steps and making value(0) return
stop_lr; modify the LearningRateWSD initialization to special-case zero-decay
runs (or explicitly reject num_steps == 0): if self.decay_num_steps == 0 then
set self.decay_phase_steps = 0 and self.stable_steps = 0 (and ensure
value(index) treats all indices as pre-decay/start_lr), or raise a ValueError
when self.num_steps == 0; update references to decay_phase_steps,
decay_num_steps, stable_steps and the value(...) method so zero-decay behavior
matches LearningRateExp/LearningRateCosine.


4 participants