Skip to content

feat: add sliding block history to executor#1047

Draft
bmuddha wants to merge 2 commits intobmuddha/replication/protocolfrom
bmuddha/executor/block-tracker
Draft

feat: add sliding block history to executor#1047
bmuddha wants to merge 2 commits intobmuddha/replication/protocolfrom
bmuddha/executor/block-tracker

Conversation

@bmuddha
Copy link
Collaborator

@bmuddha bmuddha commented Mar 13, 2026

Summary

Keep limited block history in transaction executor. This allows for the executor to set the environment per each transaction individually.

Compatibility

  • No breaking changes

Checklist

Summary by CodeRabbit

  • Refactor
    • Improved per-slot block history and slot transition handling to make state alignment more reliable across transactions.
    • Optimized synchronization during transaction processing to reduce contention and better support cross-slot workloads.
  • Logging
    • Added warnings when attempting to transition to an untracked slot to aid troubleshooting and visibility.

Copy link
Collaborator Author

bmuddha commented Mar 13, 2026

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

The PR implements a per-slot block history in the executor using a BTreeMap<Slot, LatestBlockInner> bounded by BLOCK_HISTORY_SIZE. It adds register_new_block and transition_to_slot to insert/truncate history and align the processor/environment to a requested slot, emitting a warning for unregistered slots. Transaction processing now calls transition_to_slot when a transaction's slot differs from the processor slot. Synchronization was changed to use an accountsdb write lock/read guard. LatestBlockInner now derives Clone.

Assessment against linked issues

Objective (issue#) Addressed Explanation
Maintain a sliding window / bounded history of recent blocks (#[1046])
Use the block history to set up executor environment per transaction slot (#[1046])
Drive slot/state updates from history for deterministic replication (#[1046])

Out-of-scope changes

Code Change (file_path) Explanation
Added Clone derive to LatestBlockInner (magicblock-ledger/src/lib.rs) This derive change affects a ledger struct and is not required by the executor block-history objectives; it is unrelated to implementing per-slot history.

Suggested reviewers

  • thlorenz
  • GabrielePicco
✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bmuddha/executor/block-tracker
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bmuddha bmuddha self-assigned this Mar 13, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@magicblock-processor/src/executor/mod.rs`:
- Around line 175-177: The code currently calls
transition_to_slot(transaction.slot) and proceeds to execute the transaction
even if the slot transition failed; modify transition_to_slot (or its caller) so
it returns a clear success/failure indicator (e.g., Result<(), SlotError> or
bool) and update the call site in the executor loop to check that return value
and early-return/skip execution when the transition fails (do not call
execute_transaction or any downstream logic). Ensure you update all call sites
noted (the block around where transaction.slot != self.processor.slot and the
nearby ranges referenced) to handle the failure case consistently and preserve
existing logging from transition_to_slot.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0c01457c-e664-4bda-bd60-3db752062f14

📥 Commits

Reviewing files that changed from the base of the PR and between edf70ca and cae6fe8.

📒 Files selected for processing (2)
  • magicblock-ledger/src/lib.rs
  • magicblock-processor/src/executor/mod.rs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
magicblock-processor/src/executor/mod.rs (1)

171-177: ⚠️ Potential issue | 🔴 Critical

Prevent stale-context execution when slot transition is unavailable.

Line 175 currently proceeds even if transition_to_slot cannot resolve the requested slot (Line 213), and biased selection prioritizes transactions over pending block updates (Line 171), which can trigger this path under normal timing. This can execute transactions with the wrong slot/blockhash/sysvars.

Proposed fix
             tokio::select! {
                 biased;
+                _ = block_updated.recv() => {
+                    // Unlock to allow global ops (snapshots), then
+                    // register the new block for future transactions
+                    RwLockReadGuard::unlock_fair(guard);
+                    self.register_new_block();
+                    guard = sync.read();
+                }
                 txn = self.rx.recv() => {
                     let Some(transaction) = txn else { break };
-                    if transaction.slot != self.processor.slot {
-                        self.transition_to_slot(transaction.slot);
+                    if transaction.slot != self.processor.slot
+                        && !self.transition_to_slot(transaction.slot)
+                    {
+                        warn!(
+                            requested_slot = transaction.slot,
+                            current_slot = self.processor.slot,
+                            "cannot execute transaction: slot context unavailable"
+                        );
+                        let _ = self.ready_tx.try_send(self.id);
+                        continue;
                     }
                     match transaction.txn.mode {
                         TransactionProcessingMode::Execution(_) => {
                             self.execute(transaction, None);
                         }
@@
-                _ = block_updated.recv() => {
-                    // Unlock to allow global ops (snapshots), then
-                    // register the new block for future transactions
-                    RwLockReadGuard::unlock_fair(guard);
-                    self.register_new_block();
-                    guard = sync.read();
-                }
                 else => break,
             }
@@
-    fn transition_to_slot(&mut self, slot: Slot) {
+    fn transition_to_slot(&mut self, slot: Slot) -> bool {
         let Some(block) = self.block_history.get(&slot) else {
             // should never happen in practice
             warn!(slot, "tried to transition to slot which wasn't registered");
-            return;
+            return false;
         };
         self.environment.blockhash = block.blockhash;
         self.processor.slot = block.slot;
         self.set_sysvars(block);
+        true
     }

Also applies to: 191-197, 212-217

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-processor/src/executor/mod.rs` around lines 171 - 177, The code
currently proceeds to handle a received transaction immediately after calling
transition_to_slot(transaction.slot), which can fail and leave execution using
stale slot/blockhash/sysvars; update the txn handling paths that call
transition_to_slot (the tokio::select branch reading from self.rx.recv and the
other similar branches around the txn handling at lines 191-197 and 212-217) to
check the result of transition_to_slot and abort/skipping processing when it
fails: after calling transition_to_slot(transaction.slot) verify its return
(Result/Option/boolean) and if it indicates failure, log or drop the transaction
and continue (do not execute the transaction or call downstream processing using
the old context); apply the same guard to every branch that consumes a
transaction from self.rx.recv so no transaction is processed unless
transition_to_slot succeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@magicblock-processor/src/executor/mod.rs`:
- Around line 171-177: The code currently proceeds to handle a received
transaction immediately after calling transition_to_slot(transaction.slot),
which can fail and leave execution using stale slot/blockhash/sysvars; update
the txn handling paths that call transition_to_slot (the tokio::select branch
reading from self.rx.recv and the other similar branches around the txn handling
at lines 191-197 and 212-217) to check the result of transition_to_slot and
abort/skipping processing when it fails: after calling
transition_to_slot(transaction.slot) verify its return (Result/Option/boolean)
and if it indicates failure, log or drop the transaction and continue (do not
execute the transaction or call downstream processing using the old context);
apply the same guard to every branch that consumes a transaction from
self.rx.recv so no transaction is processed unless transition_to_slot succeeded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 722e34ae-51a5-4d53-a1a0-e6131dafee9f

📥 Commits

Reviewing files that changed from the base of the PR and between cae6fe8 and d59919b.

📒 Files selected for processing (1)
  • magicblock-processor/src/executor/mod.rs

bmuddha added 2 commits March 13, 2026 14:57
Block history can be used to set block related environment
on a per transaction basis.
@bmuddha bmuddha force-pushed the bmuddha/executor/block-tracker branch from d59919b to 9743ae6 Compare March 13, 2026 13:58
@bmuddha bmuddha force-pushed the bmuddha/replication/protocol branch from edf70ca to 1e97a73 Compare March 13, 2026 13:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
magicblock-processor/src/executor/mod.rs (2)

156-157: ⚠️ Potential issue | 🟠 Major

Avoid .expect() in production code.

While runtime creation failure is likely fatal, the coding guidelines require proper error handling for .expect() calls.

As per coding guidelines: "Treat any usage of .unwrap() or .expect() in production Rust code as a MAJOR issue."

Proposed fix - propagate error or log and abort cleanly
     pub(super) fn spawn(self) {
         std::thread::spawn(move || {
-            let runtime = Builder::new_current_thread()
+            let Ok(runtime) = Builder::new_current_thread()
                 .thread_name(format!("txn-executor-{}", self.id))
                 .build()
-                .expect("Failed to build executor runtime");
+            else {
+                tracing::error!(executor_id = self.id, "Failed to build executor runtime");
+                return;
+            };
 
             runtime.block_on(tokio::task::unconstrained(self.run()));
         });
     }

Alternatively, if spawn can return a Result, propagate the error to the caller.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-processor/src/executor/mod.rs` around lines 156 - 157, The
.expect() on the executor runtime construction (the call ending in
.build().expect("Failed to build executor runtime")) must be replaced with
proper error handling: change the code that calls .build() to capture the
Result, then either return/propagate the error to the caller (e.g., convert to a
suitable Error/Result from the surrounding function) or log the error via your
logger and exit cleanly; ensure you refer to the executor runtime build call and
any surrounding function (the function that spawns or initializes the executor)
when adding the Result handling so the failure path is handled without
panicking.

225-225: ⚠️ Potential issue | 🟠 Major

Handle potential lock poisoning instead of using .unwrap().

The .unwrap() on the sysvar cache lock violates the coding guideline for this path. While lock poisoning is rare, production code should handle it gracefully.

As per coding guidelines: "Treat any usage of .unwrap() or .expect() in production Rust code as a MAJOR issue."

Proposed fix
-        let mut cache = self.processor.writable_sysvar_cache().write().unwrap();
+        let Ok(mut cache) = self.processor.writable_sysvar_cache().write() else {
+            warn!("sysvar cache lock poisoned, skipping sysvar update");
+            return;
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@magicblock-processor/src/executor/mod.rs` at line 225, The current call let
mut cache = self.processor.writable_sysvar_cache().write().unwrap(); uses
unwrap() and must handle PoisonError instead; replace the unwrap with explicit
handling of the Result from write() on the RwLock returned by
writable_sysvar_cache(): match or map the Result, log the poisoning (using the
existing logger or process context available on self.processor) and recover with
poison_error.into_inner() if safe to continue, or return/propagate a descriptive
error from the enclosing function; update any code that assumes cache to use the
recovered guard variable. Ensure you reference writable_sysvar_cache(), the
write() call, and the local cache variable when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@magicblock-processor/src/executor/mod.rs`:
- Around line 156-157: The .expect() on the executor runtime construction (the
call ending in .build().expect("Failed to build executor runtime")) must be
replaced with proper error handling: change the code that calls .build() to
capture the Result, then either return/propagate the error to the caller (e.g.,
convert to a suitable Error/Result from the surrounding function) or log the
error via your logger and exit cleanly; ensure you refer to the executor runtime
build call and any surrounding function (the function that spawns or initializes
the executor) when adding the Result handling so the failure path is handled
without panicking.
- Line 225: The current call let mut cache =
self.processor.writable_sysvar_cache().write().unwrap(); uses unwrap() and must
handle PoisonError instead; replace the unwrap with explicit handling of the
Result from write() on the RwLock returned by writable_sysvar_cache(): match or
map the Result, log the poisoning (using the existing logger or process context
available on self.processor) and recover with poison_error.into_inner() if safe
to continue, or return/propagate a descriptive error from the enclosing
function; update any code that assumes cache to use the recovered guard
variable. Ensure you reference writable_sysvar_cache(), the write() call, and
the local cache variable when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d579261d-e3ed-4fd2-81d2-0c1ddfab8eff

📥 Commits

Reviewing files that changed from the base of the PR and between d59919b and 9743ae6.

📒 Files selected for processing (2)
  • magicblock-ledger/src/lib.rs
  • magicblock-processor/src/executor/mod.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant