
Revamp Message Pool Nonce calculation #6788

Open
sudo-shashank wants to merge 30 commits into main from shashank/revamp-mpool-nonce-calc

Conversation

@sudo-shashank
Contributor

@sudo-shashank commented Mar 25, 2026

Summary of changes

Changes introduced in this pull request:

  • Nonce calculation:

    • get_state_sequence now scans messages included in the current tipset to find the true highest nonce, rather than relying solely on actor.sequence, which can lag.
    • Add a (TipsetKey, Address)-keyed state nonce cache to avoid redundant tipset scans.
  • Nonce serialization:

    • Add per-sender MpoolLocker preventing concurrent requests from reading stale state.
    • Add NonceTracker that assigns nonces sequentially and persists them to the datastore. This bridges the gap between "messages broadcast to the network but not yet on-chain" and "node restart wiping in-memory state": without it, any restart while messages are in-flight would cause nonce collisions.
  • Nonce gap enforcement:

    • MsgSet::add gains a strict flag that rejects messages exceeding MAX_NONCE_GAP (4 for trusted, 0 for untrusted) and blocks replace-by-fee when a gap exists. Reorg re-additions use strict=false.
  • Early balance check in MpoolPushMessage to reject before consuming a nonce.

  • forest-wallet list cmd now shows a nonce column with table formatting.

Reference issue to close (if applicable)

Closes #4899
Closes #3628

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed message pool nonce calculation to align with Lotus
  • New Features

    • Added nonce tracking and persistence for message pool operations
    • Implemented nonce gap detection in message pool submissions
    • Wallet balance validation now occurs before message push operations
    • Enhanced wallet list display with formatted table showing address balances and nonces
    • Improved delegated address message signing support

@sudo-shashank added the RPC (requires calibnet RPC checks to run on CI) label Mar 25, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

Walkthrough

This PR implements message pool nonce calculation alignment with Lotus by introducing a NonceTracker component that persists nonces across restarts, adds address resolution with nonce gap detection, and implements per-address async locking via MpoolLocker for RPC serialization.

Changes

Cohort / File(s) Summary
CI & Changelog
.github/workflows/forest.yml, CHANGELOG.md
Added concurrency policies to wallet check jobs and documented nonce calculation fix for message pool alignment with Lotus.
Wallet Test Scripts
scripts/tests/calibnet_wallet_check.sh, scripts/tests/calibnet_delegated_wallet_check.sh
Updated wallet address field extraction, added Filecoin-to-Ethereum address conversion with balance verification and remote wallet testing, and introduced timeout handling for balance polling.
Nonce Tracking & Concurrency
src/message_pool/nonce_tracker.rs, src/message_pool/mpool_locker.rs
Introduced NonceTracker for persisting per-address nonce state in SettingsStore with sign_and_push workflow, and MpoolLocker for per-address async locking to serialize RPC operations.
Message Pool Re-exports
src/message_pool/mod.rs, src/message_pool/errors.rs
Added public exports for NonceTracker and MpoolLocker; added NonceGap error variant for nonce gap detection.
Message Pool Core Logic
src/message_pool/msgpool/mod.rs, src/message_pool/msgpool/msg_pool.rs
Implemented address resolution to canonical key form, added key_cache and state_nonce_cache LRU caches, introduced nonce gap detection with strict/untrusted modes, threaded caches through head_change and message selection, and updated pending map keying to use resolved addresses.
Message Pool Provider
src/message_pool/msgpool/provider.rs, src/message_pool/msgpool/test_provider.rs
Extended Provider trait with resolve_to_key (ID-to-key resolution) and messages_for_tipset methods; updated test implementation with address mapping and message retrieval.
Message Pool Selection
src/message_pool/msgpool/selection.rs
Updated head-change simulation to accept key_cache parameter and refactored test setup to use apply_head_change wrapper instead of direct head_change calls.
Key Management Signing
src/key_management/wallet_helpers.rs
Added public sign_message helper for delegated and standard key types with EIP-1559 transaction support and comprehensive unit tests.
RPC State & Daemon
src/rpc/mod.rs, src/daemon/mod.rs, src/rpc/methods/sync.rs
Added mpool_locker and nonce_tracker fields to RPCState struct; initialized both components in daemon, offline server, and test contexts.
RPC Methods
src/rpc/methods/mpool.rs, src/rpc/methods/wallet.rs
Updated MpoolPushMessage to acquire per-address locks, delegate nonce handling to NonceTracker::sign_and_push, and validate wallet balance; refactored WalletSignMessage to use the new sign_message helper.
Tool Commands
src/tool/offline_server/server.rs, src/tool/subcommands/api_cmd/generate_test_snapshot.rs, src/tool/subcommands/api_cmd/test_snapshot.rs
Extended RPC state initialization in offline server and test snapshot tools with NonceTracker and MpoolLocker components.
Wallet CLI
src/wallet/subcommands/wallet_cmd.rs
Simplified message signing logic to use the new sign_message helper; refactored WalletCommands::List to fetch actor state concurrently, derive nonce from state, and render output as a formatted table.

Sequence Diagram

sequenceDiagram
    actor Client
    participant RPC Handler
    participant MpoolLocker
    participant NonceTracker
    participant MessagePool
    participant Provider
    participant SettingsStore
    
    Client->>RPC Handler: MpoolPushMessage(message)
    RPC Handler->>MpoolLocker: take_lock(sender_address)
    MpoolLocker->>MpoolLocker: insert or clone per-address Mutex
    MpoolLocker-->>RPC Handler: OwnedMutexGuard (locked)
    
    RPC Handler->>NonceTracker: sign_and_push(message, key)
    NonceTracker->>NonceTracker: acquire global Mutex
    NonceTracker->>MessagePool: get current sequence
    MessagePool->>Provider: resolve_to_key(address)
    Provider-->>MessagePool: canonical key address
    NonceTracker->>SettingsStore: retrieve persisted nonce
    SettingsStore-->>NonceTracker: nonce or default
    NonceTracker->>NonceTracker: compute next_nonce = max(mpool, persisted)
    NonceTracker->>NonceTracker: sign message with nonce
    NonceTracker->>MessagePool: push SignedMessage
    MessagePool-->>NonceTracker: success
    NonceTracker->>SettingsStore: save_nonce(nonce + 1)
    NonceTracker-->>RPC Handler: SignedMessage
    RPC Handler-->>Client: result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • LesnyRumcajs
  • hanabi1224
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Revamp Message Pool Nonce calculation' directly corresponds to the main objective of the PR and accurately summarizes the primary changes across nonce handling, serialization, gap enforcement, and persistence.
Linked Issues check ✅ Passed The PR comprehensively addresses both linked issues: #4899 (nonce calculation refactoring with state sequence scanning, caching, strict gap enforcement) and #3628 (reorg handling via nonce persistence and mpool resend logic) with supporting code changes, tests, and CLI updates.
Out of Scope Changes check ✅ Passed All changes are directly aligned with stated objectives: nonce calculation improvements, mpool/nonce synchronization primitives, delegated wallet signing support, and CI/test infrastructure updates to support the new functionality.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@sudo-shashank marked this pull request as ready for review March 25, 2026 06:56
@sudo-shashank requested a review from a team as a code owner March 25, 2026 06:56
@sudo-shashank requested reviews from LesnyRumcajs and hanabi1224 and removed request for a team March 25, 2026 06:56
Contributor

@coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/tool/subcommands/api_cmd/generate_test_snapshot.rs (1)

146-160: ⚠️ Potential issue | 🟠 Major

Don't bind NonceTracker to the shared snapshot-generation DB.

load_db() gives every snapshot run the same Arc<ReadOpsTrackingStore<ManyCar<ParityDb>>>, and this wrapper writes settings through to inner, not tracker. With nonce_tracker pointed at state_manager.blockstore_owned(), persisted nonce-cache updates will bleed into later snapshot generations and may still be absent from the exported minimal snapshot unless those keys get read again. This will make stateful fixtures order-dependent once MpoolPushMessage / MpoolGetNonce snapshots are added.

Based on learnings, "when using ReadOpsTrackingStore to generate minimal snapshots, HEAD_KEY should be written to db.tracker (not db itself) before calling export_forest_car(), because the export reads from the tracker MemoryDB which accumulates only the accessed data during computation."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tool/subcommands/api_cmd/generate_test_snapshot.rs` around lines 146 -
160, NonceTracker is being constructed against state_manager.blockstore_owned()
which points at the full DB; instead bind the nonce cache to the
ReadOpsTrackingStore tracker used for snapshot generation so persisted nonce
writes go into the ephemeral tracker not the underlying DB. Change the
NonceTracker creation to use the tracking store (the db.tracker returned by
load_db() / the ReadOpsTrackingStore wrapper) rather than
state_manager.blockstore_owned(), and ensure HEAD_KEY is written to db.tracker
(tracker) before calling export_forest_car() so the export reads the tracked
(accessed) keys only.
src/message_pool/msgpool/mod.rs (1)

324-342: ⚠️ Potential issue | 🟠 Major

Canonicalize the rmsgs key as well.

After this refactor pending is keyed by resolved key address, but Lines 333-342 still probe rmsgs by raw from. On a revert/apply where the same actor shows up as f0… on one side and key-address form on the other, the apply path misses the reverted entry, removes the pending entry instead, and then re-adds the reverted message at the end.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/message_pool/msgpool/mod.rs` around lines 324 - 342, The function
remove_from_selected_msgs is checking rmsgs by the raw from address while
pending is keyed by resolved key addresses; call resolve_to_key(api, key_cache,
from, cur_ts) once at the start of the lookup and then use the resolved Address
for both rmsgs and pending accesses (i.e., replace rmsgs.get_mut(from) with
rmsgs.get_mut(&resolved) and use resolved in the remove(...) calls), ensuring
you still propagate the Result error from resolve_to_key and keep the existing
logic for removing the sequence from the temp map when present.
🧹 Nitpick comments (3)
scripts/tests/calibnet_wallet_check.sh (1)

76-76: Avoid scraping the human list table for ADDR_ONE.

This script already had to change once because the list layout moved. Pulling the address out of tail | cut keeps the test coupled to presentation-only output and will break again on the next formatting tweak. Prefer a command or output mode that returns just the address.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/tests/calibnet_wallet_check.sh` at line 76, The current extraction of
ADDR_ONE by piping "$FOREST_WALLET_PATH list | tail -1 | cut -d ' ' -f2" scrapes
the human-formatted table and is fragile; change the ADDR_ONE assignment to use
a machine-readable/listing option or output mode from $FOREST_WALLET_PATH (e.g.,
a flag that emits just addresses or JSON) and parse that output (or use a
--quiet/--format option) to reliably select the desired address, so the script
uses the wallet CLI's non-presentational output instead of scraping the printed
table.
src/message_pool/mpool_locker.rs (1)

17-22: Document MpoolLocker::new().

The public constructor is the only exposed MpoolLocker API here without rustdoc.

As per coding guidelines, "Document public functions and structs with doc comments".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/message_pool/mpool_locker.rs` around lines 17 - 22, Add a doc comment for
the public constructor MpoolLocker::new() describing what a MpoolLocker
represents and what the new() instance contains/initializes (e.g., an empty
inner Mutex<HashMap> used to track per-message-pool locks), include any
thread-safety or ownership notes relevant to callers, and place the triple-slash
/// doc immediately above the impl block or the new() function so rustdoc will
document it alongside the MpoolLocker API.
src/message_pool/msgpool/test_provider.rs (1)

229-244: Consider adding BLS message support for completeness.

The messages_for_tipset implementation only collects signed messages. While this is consistent with TestApi's internal storage (which only stores SignedMessage), the production ChainStore::messages_for_tipset collects both unsigned BLS and signed SECP messages via BlockMessages::for_tipset.

This discrepancy is acceptable for current tests since they primarily use SECP messages, but if future tests require BLS message handling during head changes, this may need enhancement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/message_pool/msgpool/test_provider.rs` around lines 229 - 244,
messages_for_tipset currently only returns signed messages (using inner.bmsgs
and ChainMessage::Signed), so add support for BLS/unsigned messages by extending
TestApi's storage and collecting them in the same loop: add a container for
BLS/unsigned messages (e.g., inner.bls_msgs or similar) and in
messages_for_tipset iterate blocks' CIDs to append both signed messages
(ChainMessage::Signed(Arc::new(...))) and unsigned/BLS messages
(ChainMessage::Unsigned or the appropriate ChainMessage variant for BLS) into
msgs before returning; update any TestApi methods that populate messages so
tests can insert BLS messages into the new storage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 40: The changelog entry currently references the PR number; update the
entry in CHANGELOG.md so it links to the issue `#4899` instead of PR `#6788`
(replace the PR link "[`#6788`](...)" with the issue link "[`#4899`](...)" and keep
the description "Fixed message pool nonce calculation to align with Lotus.") to
follow the repo convention of preferring issue references when both issue and PR
exist.

In `@scripts/tests/calibnet_delegated_wallet_check.sh`:
- Around line 87-100: The loop that polls DELEGATE_ADDR_REMOTE_THREE_BALANCE can
false-pass because it only compares to the prior observed balance and doesn’t
ensure MSG_DELEGATE_FOUR actually landed; change the logic in the block using
MSG_DELEGATE_FOUR, DELEGATE_ADDR_REMOTE_THREE_BALANCE and
DELEGATE_ADDR_THREE_BALANCE to either (A) wait for the message CID in
MSG_DELEGATE_FOUR to be included/confirmed before returning (preferred), or (B)
if keeping balance-polling, fail the script when the loop reaches the 20 retry
timeout instead of continuing silently (exit non-zero and log an error), and
keep updating DELEGATE_ADDR_REMOTE_THREE_BALANCE via $FOREST_WALLET_PATH
--remote-wallet balance each iteration; ensure the chosen approach references
MSG_DELEGATE_FOUR for waiting or uses an explicit exit on timeout.

In `@scripts/tests/calibnet_wallet_check.sh`:
- Around line 157-181: The loops currently only compare balances
(ETH_ADDR_TWO_BALANCE / ETH_ADDR_THREE_BALANCE) and will silently succeed after
retries even if the send commands (MSG_ETH, MSG_ETH_REMOTE) never produced a
message CID; change the logic to wait on the actual returned message CIDs from
MSG_ETH and MSG_ETH_REMOTE (extract the CID from MSG_ETH / MSG_ETH_REMOTE) and
poll the node/wallet for that CID confirmation, or at minimum make the 20-retry
timeout fatal by exiting non-zero if the CID was not observed; update the two
loops that reference ETH_ADDR_TWO_BALANCE and ETH_ADDR_THREE_BALANCE to use the
extracted CID variables and fail with exit 1 on timeout so the script cannot
pass silently.

In `@src/message_pool/mpool_locker.rs`:
- Around line 52-74: Tests use timing sleeps which makes them flaky; replace the
sleep-based handshakes in the two test blocks (the tasks spawned as t1/t2 that
call locker.take_lock and use first_entered/first_released/second_saw_first)
with a deterministic sync primitive such as tokio::sync::Notify or
tokio::sync::Barrier: have the first task signal (notify.notify_one() or
barrier.wait().await) immediately after it acquires the lock (where it currently
sets entered.store) and have the second task await that signal before attempting
its assertion (instead of sleeping), and likewise use a notify or barrier to
signal release instead of the 20ms/100ms sleeps; apply the same change to the
other test block referenced (the one around lines 94-116) so both tests no
longer rely on timing.

In `@src/message_pool/msgpool/msg_pool.rs`:
- Around line 307-315: The code masks resolve_to_key failures by using
unwrap_or(msg.from()), which can undercount nonces; change the scan to propagate
resolution errors instead: call resolve_to_key(api, key_cache, &msg.from(),
cur_ts)? (or otherwise handle the Result and return Err) instead of unwrap_or,
and adjust the surrounding function's signature/return path to propagate the
error; refer to resolve_to_key, messages_for_tipset, msg.from(), and next_nonce
when making this change so the failure during the current tipset scan is not
silently converted to the original address.

In `@src/message_pool/nonce_tracker.rs`:
- Around line 81-87: The code currently pushes the signed message with
mpool.push(smsg.clone()).await? and then calls self.save_nonce(&message.from,
nonce)? which can return an error and cause MpoolPushMessage to report failure
despite the message already being in the mempool; change this to make
persistence best-effort: call self.save_nonce(&message.from, nonce) but do not
propagate its error — instead catch/log the error (e.g. with error!/warn!) and
still return Ok(smsg). Keep the same call order (sign_message, mpool.push,
save_nonce) but ensure save_nonce failures do not convert the whole operation
into an error.

In `@src/rpc/methods/mpool.rs`:
- Around line 301-306: MpoolGetNonce is reading the nonce directly from mpool
via mpool.get_sequence while WalletSignMessage/MpoolPush use
ctx.nonce_tracker.sign_and_push, causing mismatch after restarts; change
MpoolGetNonce to query the same NonceTracker (e.g., call the nonce-tracking
getter on ctx.nonce_tracker instead of mpool.get_sequence) so both flows use the
same source, and ensure any other code paths that return sequence numbers
(mpool.get_sequence, MpoolPushMessage) are routed through NonceTracker APIs to
keep nonce state consistent.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9c53ddd6-e120-4024-a80f-227d8c503f0b

📥 Commits

Reviewing files that changed from the base of the PR and between 7868455 and e25fabd.

📒 Files selected for processing (24)
  • .github/workflows/forest.yml
  • CHANGELOG.md
  • docs/docs/developers/guides/nonce_handling.md
  • scripts/tests/calibnet_delegated_wallet_check.sh
  • scripts/tests/calibnet_wallet_check.sh
  • src/daemon/mod.rs
  • src/key_management/wallet_helpers.rs
  • src/message_pool/errors.rs
  • src/message_pool/mod.rs
  • src/message_pool/mpool_locker.rs
  • src/message_pool/msgpool/mod.rs
  • src/message_pool/msgpool/msg_pool.rs
  • src/message_pool/msgpool/provider.rs
  • src/message_pool/msgpool/selection.rs
  • src/message_pool/msgpool/test_provider.rs
  • src/message_pool/nonce_tracker.rs
  • src/rpc/methods/mpool.rs
  • src/rpc/methods/sync.rs
  • src/rpc/methods/wallet.rs
  • src/rpc/mod.rs
  • src/tool/offline_server/server.rs
  • src/tool/subcommands/api_cmd/generate_test_snapshot.rs
  • src/tool/subcommands/api_cmd/test_snapshot.rs
  • src/wallet/subcommands/wallet_cmd.rs
💤 Files with no reviewable changes (1)
  • .github/workflows/forest.yml

@sudo-shashank marked this pull request as draft March 25, 2026 07:15
Contributor

@coderabbitai bot left a comment


🧹 Nitpick comments (1)
src/message_pool/nonce_tracker.rs (1)

193-215: Good test coverage for sequential nonce assignment.

The test verifies that consecutive sign_and_push calls assign sequential nonces (0, 1). Consider adding a concurrent test that spawns multiple tasks calling sign_and_push simultaneously to verify the global mutex correctly prevents nonce collisions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/message_pool/nonce_tracker.rs` around lines 193 - 215, Add a new async
test alongside test_sign_and_push_assigns_sequential_nonces that spawns multiple
concurrent tasks which call NonceTracker::sign_and_push using the same tracker,
mpool, wallet key (from make_test_pool_and_wallet) and sender to ensure no two
returned SignedMessage instances share the same sequence; use
make_test_nonce_store() to create the tracker and await all tasks (e.g., via
join_all) then assert that the set of message().sequence values is contiguous
and unique (0..n-1) to verify the global mutex prevents nonce collisions under
concurrency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ce841c06-e607-4d24-9a46-75e1d2eef067

📥 Commits

Reviewing files that changed from the base of the PR and between e25fabd and de77141.

📒 Files selected for processing (6)
  • scripts/tests/calibnet_delegated_wallet_check.sh
  • scripts/tests/calibnet_wallet_check.sh
  • src/daemon/mod.rs
  • src/message_pool/mpool_locker.rs
  • src/message_pool/msgpool/msg_pool.rs
  • src/message_pool/nonce_tracker.rs
✅ Files skipped from review due to trivial changes (1)
  • src/message_pool/msgpool/msg_pool.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • scripts/tests/calibnet_delegated_wallet_check.sh
  • src/message_pool/mpool_locker.rs
  • scripts/tests/calibnet_wallet_check.sh

@codecov

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 89.12249% with 119 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.24%. Comparing base (5146687) to head (a5f884a).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/wallet/subcommands/wallet_cmd.rs 0.00% 43 Missing ⚠️
src/rpc/methods/mpool.rs 0.00% 18 Missing ⚠️
src/message_pool/msgpool/mod.rs 91.20% 10 Missing and 6 partials ⚠️
src/message_pool/msgpool/msg_pool.rs 97.86% 5 Missing and 4 partials ⚠️
src/message_pool/msgpool/selection.rs 81.25% 9 Missing ⚠️
src/message_pool/nonce_tracker.rs 94.66% 3 Missing and 5 partials ⚠️
src/key_management/wallet_helpers.rs 93.57% 2 Missing and 5 partials ⚠️
src/daemon/mod.rs 0.00% 3 Missing ⚠️
...tool/subcommands/api_cmd/generate_test_snapshot.rs 0.00% 3 Missing ⚠️
src/rpc/methods/wallet.rs 0.00% 2 Missing ⚠️
... and 1 more
Additional details and impacted files
Files with missing lines Coverage Δ
src/message_pool/errors.rs 66.66% <ø> (ø)
src/message_pool/mpool_locker.rs 100.00% <100.00%> (ø)
src/message_pool/msgpool/test_provider.rs 99.02% <100.00%> (+0.71%) ⬆️
src/rpc/methods/sync.rs 72.39% <100.00%> (+0.69%) ⬆️
src/rpc/mod.rs 89.20% <ø> (+0.71%) ⬆️
src/tool/offline_server/server.rs 31.27% <100.00%> (+0.92%) ⬆️
src/tool/subcommands/api_cmd/test_snapshot.rs 85.51% <100.00%> (+0.30%) ⬆️
src/message_pool/msgpool/provider.rs 57.14% <90.00%> (+16.11%) ⬆️
src/rpc/methods/wallet.rs 33.15% <0.00%> (+0.87%) ⬆️
src/daemon/mod.rs 27.82% <0.00%> (-0.15%) ⬇️
... and 8 more

... and 10 files with indirect coverage changes



Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5146687...a5f884a.


@LesnyRumcajs
Member

@sudo-shashank Can you please summarise what are the effective logic changes?

> proceed in parallel for different senders while the nonce-critical section
> remains serialized.
>
> ## Nonce persistence
Member


Why do we need to persist them?

Contributor Author

@sudo-shashank Mar 31, 2026


Not a regular case, but if the node restarts while messages are broadcast but not yet on-chain, the mpool would reassign their nonces, causing collisions. Persisting the last-assigned nonce to disk ensures we never reuse an in-flight nonce after a node restart.

@sudo-shashank
Contributor Author

@sudo-shashank Can you please summarise what are the effective logic changes?

updated PR description

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
src/message_pool/mpool_locker.rs (1)

26-32: Consider deriving Default alongside new().

clippy::new_without_default suggests implementing Default when a type has a new() constructor taking no arguments.

♻️ Proposed fix
+impl Default for MpoolLocker {
+    fn default() -> Self {
+        Self::new()
+    }
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/message_pool/mpool_locker.rs` around lines 26 - 32, Implement Default for
MpoolLocker to satisfy clippy::new_without_default: add an impl Default for
MpoolLocker { fn default() -> Self { Self::new() } } (or derive Default on the
struct if you prefer and the inner types support it) so that
MpoolLocker::default() is equivalent to MpoolLocker::new(); reference symbols:
MpoolLocker and its new() constructor and the Default trait.
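For context, the per-sender locker being reviewed here can be sketched roughly as below. Field and method names are hypothetical, and `std` sync primitives stand in for whatever async types Forest actually uses; the point is only the shape: one lock per sender address, created on demand.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Sketch of a per-sender lock map: each address gets its own mutex,
// so pushes from different senders proceed in parallel while the
// nonce-critical section for a single sender stays serialized.
pub struct MpoolLocker {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl MpoolLocker {
    pub fn new() -> Self {
        Self {
            locks: Mutex::new(HashMap::new()),
        }
    }

    // Fetch (or lazily create) the lock guarding this sender's
    // nonce assignment.
    pub fn lock_for(&self, addr: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(addr.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

fn main() {
    let locker = MpoolLocker::new();
    let a = locker.lock_for("t1abc");
    let b = locker.lock_for("t1abc");
    // Same sender -> same underlying lock object.
    assert!(Arc::ptr_eq(&a, &b));
    let c = locker.lock_for("t1xyz");
    // Different sender -> an independent lock.
    assert!(!Arc::ptr_eq(&a, &c));
    // Hold the guard while reading state and assigning the nonce.
    let _guard = a.lock().unwrap();
    println!("ok");
}
```

A `Default` impl delegating to `new()`, as the comment above suggests, would bolt onto this shape without any other change.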
src/message_pool/msgpool/mod.rs (2)

267-292: Consider hoisting the tipset read outside the inner loop.

cur_tipset.read().clone() is called for every message in the loop. While not incorrect, this acquires and releases the read lock repeatedly. Hoisting the read outside the block iteration would be slightly more efficient.

Proposed improvement
         for b in ts.block_headers() {
             let Ok((msgs, smsgs)) = api.messages_for_block(b) else {
                 tracing::error!("error retrieving messages for block");
                 continue;
             };
+            let cur_ts = cur_tipset.read().clone();

             for msg in smsgs {
-                let cur_ts = cur_tipset.read().clone();
                 remove_from_selected_msgs(
                     api,
                     key_cache,
                     &cur_ts,
                     // ... rest unchanged
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/message_pool/msgpool/mod.rs` around lines 267 - 292, The loops call
cur_tipset.read().clone() for every message, causing repeated read-lock
acquires; hoist the tipset snapshot outside the inner loops by reading and
cloning cur_tipset once (e.g., let cur_ts = cur_tipset.read().clone();) before
iterating over smsgs and msgs, then pass that cur_ts into
remove_from_selected_msgs; update both loops that call remove_from_selected_msgs
(the one iterating smsgs and the one iterating msgs) to use the single
precomputed cur_ts and keep existing logic around repub/republished and rmsgs
unchanged.

361-368: Consider using entry API to avoid the unwrap().

While the unwrap() on line 366 cannot actually fail (since the key was just inserted on line 365), it would be cleaner to use the entry API to avoid the pattern entirely. As per coding guidelines, unwrap() should be avoided in production code.

Cleaner entry-based approach
 pub(in crate::message_pool) fn add_to_selected_msgs(
     m: SignedMessage,
     rmsgs: &mut HashMap<Address, HashMap<u64, SignedMessage>>,
 ) {
-    let s = rmsgs.get_mut(&m.from());
-    if let Some(s) = s {
-        s.insert(m.sequence(), m);
-    } else {
-        rmsgs.insert(m.from(), HashMap::new());
-        rmsgs.get_mut(&m.from()).unwrap().insert(m.sequence(), m);
-    }
+    rmsgs
+        .entry(m.from())
+        .or_insert_with(HashMap::new)
+        .insert(m.sequence(), m);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/message_pool/msgpool/mod.rs` around lines 361 - 368, Replace the manual
get_mut/insert/unwrap pattern on rmsgs with the HashMap entry API: use
rmsgs.entry(m.from()).or_insert_with(HashMap::new) (or similar) to obtain a
mutable inner map and then insert m using m.sequence() as the key; update the
block around rmsgs, m.from(), and m.sequence() to avoid any unwrap() calls and
make the insertion atomic via entry().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 99615cba-8153-4242-a701-3f306c25e967

📥 Commits

Reviewing files that changed from the base of the PR and between 856be21 and 8294ad5.

📒 Files selected for processing (9)
  • .github/workflows/forest.yml
  • CHANGELOG.md
  • scripts/tests/calibnet_delegated_wallet_check.sh
  • src/message_pool/mpool_locker.rs
  • src/message_pool/msgpool/mod.rs
  • src/message_pool/msgpool/msg_pool.rs
  • src/message_pool/msgpool/selection.rs
  • src/message_pool/nonce_tracker.rs
  • src/tool/subcommands/api_cmd/generate_test_snapshot.rs
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/tool/subcommands/api_cmd/generate_test_snapshot.rs
  • scripts/tests/calibnet_delegated_wallet_check.sh
  • src/message_pool/msgpool/selection.rs
  • src/message_pool/nonce_tracker.rs

Comment on lines +280 to +285
concurrency:
group: calibnet-wallet-tests
cancel-in-progress: false
needs:
- build-ubuntu
- calibnet-wallet-check
Contributor


⚠️ Potential issue | 🟠 Major

Wire this re-enabled job into integration-tests-status.

calibnet-delegated-wallet-check now runs, but the umbrella job at Lines 672-693 still does not list it in needs. integration-tests-status can therefore pass while delegated wallet coverage is failing or skipped.

Patch for the umbrella job
   integration-tests-status:
     needs:
       - build-macos
       - build-ubuntu
       - cargo-publish-dry-run
       - forest-cli-check
       - calibnet-check
       - calibnet-stateless-mode-check
       - calibnet-stateless-rpc-check
       - state-migrations-check
       - calibnet-wallet-check
+      - calibnet-delegated-wallet-check
       - calibnet-export-check-v2
       - calibnet-no-discovery-checks
       - calibnet-kademlia-checks
       - calibnet-eth-mapping-check
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/forest.yml around lines 280 - 285, The umbrella workflow
job integration-tests-status is missing the newly re-enabled
calibnet-delegated-wallet-check in its needs list; update the
integration-tests-status job definition to include
"calibnet-delegated-wallet-check" in its needs array so the umbrella job waits
for and reflects the delegated wallet check result (ensure the job name matches
exactly "calibnet-delegated-wallet-check" when adding it to
integration-tests-status's needs).

Comment on lines +323 to +337
if let (Ok(resolved), Ok(messages)) = (
resolve_to_key(api, key_cache, addr, cur_ts),
api.messages_for_tipset(cur_ts),
) {
for msg in &messages {
if let Ok(from) = resolve_to_key(api, key_cache, &msg.from(), cur_ts)
&& from == resolved
{
let n = msg.sequence() + 1;
if n > next_nonce {
next_nonce = n;
}
}
}
}
Contributor


⚠️ Potential issue | 🟠 Major

Don't silently skip the tipset message scan when operations fail.

The if let (Ok(...), Ok(...)) pattern silently ignores errors from resolve_to_key and messages_for_tipset. If either fails, the scan is skipped entirely, potentially undercounting nonces and leading to nonce reuse — the exact failure this refactor aims to prevent.

This is similar to the concern raised in a previous review about masking sender-resolution failures.

Propagate errors instead of silently skipping
-    if let (Ok(resolved), Ok(messages)) = (
-        resolve_to_key(api, key_cache, addr, cur_ts),
-        api.messages_for_tipset(cur_ts),
-    ) {
-        for msg in &messages {
-            if let Ok(from) = resolve_to_key(api, key_cache, &msg.from(), cur_ts)
-                && from == resolved
-            {
-                let n = msg.sequence() + 1;
-                if n > next_nonce {
-                    next_nonce = n;
-                }
-            }
-        }
-    }
+    let resolved = resolve_to_key(api, key_cache, addr, cur_ts)?;
+    let messages = api.messages_for_tipset(cur_ts)?;
+    for msg in &messages {
+        let from = resolve_to_key(api, key_cache, &msg.from(), cur_ts)?;
+        if from == resolved {
+            let n = msg.sequence() + 1;
+            if n > next_nonce {
+                next_nonce = n;
+            }
+        }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/message_pool/msgpool/msg_pool.rs` around lines 323 - 337, The current
combined if-let on resolve_to_key and api.messages_for_tipset silently skips the
tipset scan on any error; change this to explicitly handle errors from
resolve_to_key(api, key_cache, addr, cur_ts) and api.messages_for_tipset(cur_ts)
(e.g. use match or the ? operator) so failures are returned/propagated instead
of ignored; then perform the loop over messages (examining msg.from(),
resolve_to_key for each sender, comparing to the resolved sender, and updating
next_nonce from msg.sequence()+1) only when both calls succeed, ensuring caller
sees the error rather than silently undercounting nonces.


Labels

RPC requires calibnet RPC checks to run on CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Revamp Message Pool Nonce calculation
Forest mpool should handle chain reorganisation

2 participants