
feat: isolate and track forester worker concurrency #2343

Open
sergeytimoshin wants to merge 8 commits into main from sergey/forester-concurrency

Conversation

@sergeytimoshin
Contributor

@sergeytimoshin sergeytimoshin commented Mar 14, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Replaced panics with graceful error handling for invalid configurations and proof validation failures.
    • Removed unsafe code blocks with safe deserialization alternatives.
    • Enhanced error propagation throughout proof generation pipeline.
  • New Features

    • Added comprehensive error diagnostics for proof operations.
    • Introduced health checks for prover initialization.
    • Improved batch proof processing with validation and error recovery.
  • Refactor

    • Enhanced concurrent processing with improved parallelism and task orchestration.
    • Optimized proof validation and compression workflows.
    • Streamlined error handling with consistent propagation patterns.

@coderabbitai
Contributor

coderabbitai bot commented Mar 14, 2026

Warning

Rate limit exceeded

@sergeytimoshin has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 29 minutes and 40 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 15f300bf-f39f-4c7e-a414-2fd3f8631e25

📥 Commits

Reviewing files that changed from the base of the PR and between fbf48e5 and 786422b.

⛔ Files ignored due to path filters (61)
  • Cargo.lock is excluded by !**/*.lock and included by none
  • forester-utils/src/account_zero_copy.rs is excluded by none and included by none
  • forester-utils/src/address_merkle_tree_config.rs is excluded by none and included by none
  • forester-utils/src/forester_epoch.rs is excluded by none and included by none
  • forester-utils/src/instructions/compress_and_close_mint.rs is excluded by none and included by none
  • forester-utils/src/rate_limiter.rs is excluded by none and included by none
  • forester-utils/src/registry.rs is excluded by none and included by none
  • program-tests/account-compression-test/tests/address_merkle_tree_tests.rs is excluded by none and included by none
  • program-tests/account-compression-test/tests/merkle_tree_tests.rs is excluded by none and included by none
  • program-tests/batched-merkle-tree-test/tests/merkle_tree.rs is excluded by none and included by none
  • program-tests/compressed-token-test/tests/freeze/functional.rs is excluded by none and included by none
  • program-tests/compressed-token-test/tests/v1.rs is excluded by none and included by none
  • program-tests/system-cpi-v2-test/tests/event.rs is excluded by none and included by none
  • program-tests/system-cpi-v2-test/tests/invoke_cpi_with_read_only.rs is excluded by none and included by none
  • program-tests/utils/src/account_zero_copy.rs is excluded by none and included by none
  • program-tests/utils/src/actions/legacy/instructions/transfer2.rs is excluded by none and included by none
  • program-tests/utils/src/address_tree_rollover.rs is excluded by none and included by none
  • program-tests/utils/src/assert_compressed_tx.rs is excluded by none and included by none
  • program-tests/utils/src/assert_merkle_tree.rs is excluded by none and included by none
  • program-tests/utils/src/assert_queue.rs is excluded by none and included by none
  • program-tests/utils/src/batched_address_tree.rs is excluded by none and included by none
  • program-tests/utils/src/e2e_test_env.rs is excluded by none and included by none
  • program-tests/utils/src/lib.rs is excluded by none and included by none
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • program-tests/utils/src/state_tree_rollover.rs is excluded by none and included by none
  • program-tests/utils/src/test_batch_forester.rs is excluded by none and included by none
  • program-tests/utils/src/test_forester.rs is excluded by none and included by none
  • sdk-tests/anchor-manual-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/anchor-semi-manual-test/tests/shared/mod.rs is excluded by none and included by none
  • sdk-tests/anchor-semi-manual-test/tests/stress_test.rs is excluded by none and included by none
  • sdk-tests/client-test/tests/light_client.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/amm_stress_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/amm_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d10_ata_idempotent_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d10_token_accounts_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d11_zero_copy_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/failing_tests.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/integration_tests.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/pinocchio-light-program-test/tests/shared/mod.rs is excluded by none and included by none
  • sdk-tests/pinocchio-light-program-test/tests/stress_test.rs is excluded by none and included by none
  • sdk-tests/pinocchio-manual-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v1-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v2-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/ctoken_pda.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/decompress_full_cpi.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/pda_ctoken.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_invocations.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_transfer2.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_deposit.rs is excluded by none and included by none
  • sdk-tests/sdk-v1-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-account-loader-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-ata-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-mint-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-pda-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-token-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (63)
  • forester/src/config.rs
  • forester/src/epoch_manager.rs
  • forester/src/forester_status.rs
  • forester/src/main.rs
  • forester/src/metrics.rs
  • forester/src/priority_fee.rs
  • forester/src/processor/v2/proof_worker.rs
  • forester/src/queue_helpers.rs
  • forester/tests/e2e_test.rs
  • forester/tests/legacy/batched_state_async_indexer_test.rs
  • forester/tests/legacy/test_utils.rs
  • forester/tests/test_batch_append_spent.rs
  • forester/tests/test_compressible_pda.rs
  • forester/tests/test_indexer_interface.rs
  • forester/tests/test_utils.rs
  • prover/client/src/constants.rs
  • prover/client/src/errors.rs
  • prover/client/src/helpers.rs
  • prover/client/src/proof.rs
  • prover/client/src/proof_client.rs
  • prover/client/src/proof_types/batch_address_append/json.rs
  • prover/client/src/proof_types/batch_append/json.rs
  • prover/client/src/proof_types/batch_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_update/json.rs
  • prover/client/src/proof_types/batch_update/proof_inputs.rs
  • prover/client/src/proof_types/combined/v1/json.rs
  • prover/client/src/proof_types/combined/v2/json.rs
  • prover/client/src/proof_types/combined/v2/proof_inputs.rs
  • prover/client/src/proof_types/inclusion/v1/json.rs
  • prover/client/src/proof_types/inclusion/v1/proof_inputs.rs
  • prover/client/src/proof_types/inclusion/v2/json.rs
  • prover/client/src/proof_types/inclusion/v2/proof_inputs.rs
  • prover/client/src/proof_types/non_inclusion/v1/json.rs
  • prover/client/src/proof_types/non_inclusion/v1/proof_inputs.rs
  • prover/client/src/proof_types/non_inclusion/v2/json.rs
  • prover/client/src/proof_types/non_inclusion/v2/proof_inputs.rs
  • prover/client/src/prover.rs
  • prover/client/tests/batch_append.rs
  • prover/client/tests/batch_update.rs
  • prover/client/tests/combined.rs
  • prover/client/tests/inclusion.rs
  • prover/client/tests/init_merkle_tree.rs
  • prover/client/tests/non_inclusion.rs
  • sdk-libs/client/src/fee.rs
  • sdk-libs/client/src/indexer/error.rs
  • sdk-libs/client/src/indexer/options.rs
  • sdk-libs/client/src/indexer/photon_indexer.rs
  • sdk-libs/client/src/indexer/types/proof.rs
  • sdk-libs/client/src/indexer/types/queue.rs
  • sdk-libs/client/src/indexer/types/token.rs
  • sdk-libs/client/src/interface/initialize_config.rs
  • sdk-libs/client/src/interface/instructions.rs
  • sdk-libs/client/src/interface/load_accounts.rs
  • sdk-libs/client/src/interface/mod.rs
  • sdk-libs/client/src/interface/pack.rs
  • sdk-libs/client/src/interface/serialize.rs
  • sdk-libs/client/src/local_test_validator.rs
  • sdk-libs/client/src/rpc/client.rs
  • sdk-libs/client/src/utils.rs
  • sdk-libs/macros/src/light_pdas/seeds/extract.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs
  • sdk-libs/program-test/src/program_test/compressible_setup.rs
  • sdk-libs/program-test/src/program_test/light_program_test.rs
📝 Walkthrough

Walkthrough

This PR introduces comprehensive error handling improvements, async concurrency refactoring, and type system enhancements across the forester, prover, and SDK libraries. Key changes include converting panic-prone unwraps to Result-based error propagation, refactoring EpochManager to use Arc for shared task ownership, introducing const generic HEIGHT parameters for proof handling, adding new worker orchestration, and stabilizing proof serialization APIs.
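The const generic HEIGHT pattern mentioned above can be sketched roughly as follows. This is an illustrative, self-contained example, not the actual `queue.rs` API: `MerkleProof` and `reconstruct_root` are hypothetical names, and the hash is a toy stand-in. The point it shows is that `HEIGHT` fixes the proof length at compile time, so array sizing is checked by the type system instead of at runtime.

```rust
// Illustrative sketch: a Merkle proof whose sibling count is a const generic.
struct MerkleProof<const HEIGHT: usize> {
    leaf: [u8; 32],
    leaf_index: usize,
    siblings: [[u8; 32]; HEIGHT], // exactly HEIGHT sibling hashes, enforced by the type
}

fn reconstruct_root<const HEIGHT: usize>(proof: &MerkleProof<HEIGHT>) -> [u8; 32] {
    // Toy hash: XOR-fold the pair. A real implementation would use Poseidon/SHA.
    fn hash_pair(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
        let mut out = [0u8; 32];
        for i in 0..32 {
            out[i] = a[i] ^ b[i].rotate_left(1);
        }
        out
    }

    let mut node = proof.leaf;
    let mut index = proof.leaf_index;
    for sibling in &proof.siblings {
        // Parity of the index decides whether the current node is left or right.
        node = if index % 2 == 0 {
            hash_pair(&node, sibling)
        } else {
            hash_pair(sibling, &node)
        };
        index /= 2;
    }
    node
}

fn main() {
    let proof = MerkleProof::<3> {
        leaf: [1u8; 32],
        leaf_index: 5,
        siblings: [[2u8; 32], [3u8; 32], [4u8; 32]],
    };
    let root = reconstruct_root(&proof);
    assert_eq!(root.len(), 32);
}
```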

Changes

Cohort / File(s) Summary
Error Handling & Task Management
prover/client/src/errors.rs, prover/client/src/prover.rs
Added 6 new ProverClientError variants (IntegerConversion, JsonSerialization, ProcessStart, ProcessWait, ProjectRootNotFound, HealthCheckFailed). Refactored spawn_prover to return Result with health checks, project root detection, and loading guard logic instead of panicking.
Proof Serialization & Validation
prover/client/src/helpers.rs, prover/client/src/proof.rs, prover/client/src/proof_types/batch_*/json.rs, prover/client/src/proof_types/batch_*/proof_inputs.rs
Converted proof serialization methods to return Result<String, ProverClientError>. Updated functions like bigint_to_u8_32, compute_root_from_merkle_proof, compress_proof, proof_from_json_struct to propagate errors via ? instead of unwrapping. Introduced CompressedProofParts and UncompressedProofParts type aliases.
Proof Type Updates
prover/client/src/proof_types/{inclusion,non_inclusion,combined}/*/json.rs, prover/client/src/proof_types/{inclusion,non_inclusion,combined}/*/proof_inputs.rs
Systematically updated all proof type methods (to_string, from__inputs, public_inputs) to return Results and handle integer conversions gracefully with ProverClientError::IntegerConversion rather than panicking. Added error propagation for JSON serialization failures.
Proof Client & Worker
prover/client/src/proof_client.rs, prover/client/src/proof_worker.rs
Updated poll_proof_completion to take &str instead of String. Changed ProofInput::to_json to return Result. Added fallible proof deserialization and compression with proper error mapping in submit_and_poll_proof paths.
Batch Address Append Processing
prover/client/src/proof_types/batch_address_append/proof_inputs.rs, forester/src/processor/v2/helpers.rs, forester/src/processor/v2/strategy/address.rs
Introduced AddressBatchSnapshot struct. Refactored get_batch_data to get_batch_snapshot with proper hashchain derivation. Changed batch processing to use streaming snapshots and staging tree. Updated function signatures to accept slices instead of owned vectors and use const generics for proof heights.
EpochManager Concurrency Refactoring
forester/src/epoch_manager.rs
Major refactoring: converted core async methods to use Arc receiver for shared task ownership. Introduced NewTreeWorker for background task orchestration with timeout-based shutdown. Reworked epoch processing loop with FuturesUnordered for concurrent task handling. Added panic payload capture and propagation via panic_payload_message. Updated maybe_refinalize to return updated state (ForesterEpochPda, TreeForesterSchedule) instead of mutating in place.
Config & Status Parsing
forester/src/config.rs, forester/src/forester_status.rs, forester/src/queue_helpers.rs
Replaced tree_id parsing unwraps with Result collection and graceful ConfigError mapping. Removed unsafe blocks in parse_tree_status, replacing with safe parse_hash_set_from_bytes. Updated queue deserialization to use safe parsing instead of manual unsafe extraction.
HTTP & Rate Limiting
forester/src/main.rs, forester/src/priority_fee.rs, forester/src/metrics.rs
Added proper error handling for RateLimiter construction with transpose() and ForesterError mapping. Changed HTTP POST to use url.as_str() instead of cloning. Eliminated buffer cloning in metrics_handler by moving buffers into from_utf8.
Forester Tests
forester/tests/e2e_test.rs, forester/tests/legacy/batched_state_async_indexer_test.rs, forester/tests/legacy/test_utils.rs, forester/tests/test_batch_append_spent.rs, forester/tests/test_compressible_pda.rs, forester/tests/test_indexer_interface.rs, forester/tests/test_utils.rs
Added unwraps on spawn_prover results for explicit error propagation. Changed pipeline execution from direct tokio::spawn to spawn_blocking with local runtime for thread safety. Added error assertions on validator spawning.
Prover Tests
prover/client/tests/*.rs
Updated all test files (batch_address_append, batch_append, batch_update, combined, inclusion, non_inclusion, init_merkle_tree) to unwrap Results from spawn_prover and JSON string conversions, converting silent failures to explicit panics on error.
Indexer & Queue Proof Reconstruction
sdk-libs/client/src/indexer/types/queue.rs
Major refactoring: added const generic HEIGHT support for proof reconstruction. Introduced reconstruct_proofs for batch reconstruction. Added build_node_lookup for optimized proof indexing. Added reconstruct_all_proofs and reconstruct_proof_with_lookup. Changed encode_node_index parameter from u8 to usize.
Indexer & Pack Error Handling
sdk-libs/client/src/indexer/types/proof.rs, sdk-libs/client/src/interface/pack.rs, sdk-libs/program-test/src/indexer/test_indexer.rs
Updated pack_tree_infos to return Result<PackedTreeInfos, IndexerError>. Added PackError::Indexer variant with IndexerError conversion. Added comprehensive error mapping in test_indexer for all proof construction paths to use CustomError propagation.
Load Accounts & PDA Handling
sdk-libs/client/src/interface/load_accounts.rs
Introduced PDA batching with MAX_PDAS_PER_IX and group_pda_specs. Added MissingPdaCompressedData error variant. Replaced single fetch_proofs with fetch_proof_batches and fetch_individual_proofs for grouped batch processing.
Instruction Building & Serialization
sdk-libs/client/src/interface/initialize_config.rs, sdk-libs/client/src/interface/instructions.rs
Updated instruction builders to return std::io::Result. Replaced direct try_to_vec().expect() with serialize_anchor_data helper for proper error propagation. Added new serialize module with serialize_anchor_data function.
Validator & CLI Utilities
sdk-libs/client/src/local_test_validator.rs, sdk-libs/client/src/utils.rs
Refactored spawn_validator to return io::Result<()> with explicit project root detection and error handling. Replaced panicking expects in command execution with safe .ok()? patterns. Removed unnecessary clones in RPC transaction decoding and token conversions.
Indexer Options & Conversions
sdk-libs/client/src/indexer/options.rs, sdk-libs/client/src/indexer/photon_indexer.rs, sdk-libs/client/src/indexer/error.rs
Changed GetCompressedAccountsFilter Into impl to reference-based From impl. Updated photon_indexer to use merkle_tree_pubkey parameter (no longer unused) and return NotImplemented with formatted error. Fixed CustomError Clone to preserve payload instead of replacing with generic message. Improved proof decoding with explicit Base58DecodeError mapping.
LightProgram Test Integration
sdk-libs/program-test/src/program_test/compressible_setup.rs, sdk-libs/program-test/src/program_test/light_program_test.rs
Updated instruction construction calls to propagate errors with ?. Added error mapping for spawn_prover in both devenv and production branches with CustomError conversion.
Macro & Cleanup
sdk-libs/macros/src/light_pdas/seeds/extract.rs, sdk-libs/client/src/fee.rs
Removed unnecessary clone in token stream collection. Added graceful None-to-RpcError handling for missing payer accounts instead of unwrapping get_account results.
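The `Arc<Self>` shared-task-ownership pattern described for `EpochManager` can be sketched with std threads (the real code uses tokio tasks and `FuturesUnordered`; `EpochManager` here is a hypothetical stand-in with none of the real fields):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical manager; the real EpochManager holds RPC clients, schedules, etc.
struct EpochManager {
    processed: Mutex<Vec<usize>>,
}

impl EpochManager {
    // Taking `self: Arc<Self>` lets each spawned worker own a handle to the
    // manager without any single task owning it exclusively.
    fn process_light_slot(self: Arc<Self>, slot: usize) {
        self.processed.lock().unwrap().push(slot);
    }
}

fn main() {
    let manager = Arc::new(EpochManager {
        processed: Mutex::new(Vec::new()),
    });

    // Each task clones the Arc, so the manager outlives all workers.
    let handles: Vec<_> = (0..4)
        .map(|slot| {
            let mgr = Arc::clone(&manager);
            thread::spawn(move || mgr.process_light_slot(slot))
        })
        .collect();

    for h in handles {
        h.join().expect("worker panicked");
    }

    assert_eq!(manager.processed.lock().unwrap().len(), 4);
}
```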

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant EpochManager
    participant TaskPool as FuturesUnordered<Tasks>
    participant TreeWorker as NewTreeWorker
    participant ProcessQueue

    Client->>EpochManager: process_queue(Arc<Self>, epoch_info, ...)
    EpochManager->>EpochManager: Create FuturesUnordered for concurrent tasks
    
    loop For each epoch task
        EpochManager->>TaskPool: Spawn async task with Arc clone
        TaskPool->>ProcessQueue: process_light_slot(Arc<Self>, ...)
        ProcessQueue->>ProcessQueue: Execute processing
        ProcessQueue-->>TaskPool: Return Result
    end
    
    par Task Monitoring
        EpochManager->>TaskPool: Poll next completed task
        TaskPool-->>EpochManager: Task result or panic
    and Tree Worker Management
        EpochManager->>TreeWorker: spawn NewTreeWorker for background processing
        TreeWorker->>TreeWorker: Run independently
        EpochManager->>TreeWorker: shutdown with timeout on cleanup
    end
    
    EpochManager-->>Client: Return updated state (ForesterEpochPda, TreeForesterSchedule)
sequenceDiagram
    participant ProofClient
    participant Serializer as Proof Serializer
    participant Compressor
    participant ProverBackend

    ProofClient->>Serializer: to_json(proof_input)?
    
    alt Serialization Success
        Serializer->>Serializer: create_json_from_struct returns Result
        Serializer-->>ProofClient: Ok(json_string)
    else Serialization Failure
        Serializer-->>ProofClient: Err(ProverClientError::JsonSerialization)
        ProofClient->>ProofClient: Propagate error with ?
    end

    ProofClient->>ProverBackend: submit_and_poll_proof(json)
    
    alt Proof Generation Success
        ProverBackend-->>ProofClient: Ok(proof_json)
        ProofClient->>Compressor: compress_proof(proof_json)?
        Compressor-->>ProofClient: Ok(CompressedProofParts)
    else Error in Any Step
        ProofClient->>ProofClient: Return ProofJobResult with error
        ProofClient-->>ProofClient: Log error and terminate
    end
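The error path in the second diagram boils down to ordinary `Result` chaining with `?`. A minimal, self-contained sketch follows; the types and checks are mocks (the real pipeline uses `ProverClientError` and the gnark proof structs), but the propagation shape is the same:

```rust
// Mock error mirroring the to_json -> prove -> compress chain.
#[derive(Debug)]
enum ProofError {
    JsonSerialization(String),
    Compression(String),
}

fn to_json(input: &[u8]) -> Result<String, ProofError> {
    if input.is_empty() {
        return Err(ProofError::JsonSerialization("empty input".into()));
    }
    Ok(format!("{{\"len\":{}}}", input.len()))
}

fn compress_proof(json: &str) -> Result<Vec<u8>, ProofError> {
    if json.len() < 2 {
        return Err(ProofError::Compression("truncated proof".into()));
    }
    Ok(json.as_bytes().to_vec())
}

// `?` propagates the first failure; callers see which stage failed via the variant.
fn generate(input: &[u8]) -> Result<Vec<u8>, ProofError> {
    let json = to_json(input)?;
    let compressed = compress_proof(&json)?;
    Ok(compressed)
}

fn main() {
    assert!(generate(&[1, 2, 3]).is_ok());
    assert!(matches!(generate(&[]), Err(ProofError::JsonSerialization(_))));
}
```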

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The review involves substantial logic changes across multiple systems. Key areas demanding careful attention:

  1. EpochManager Arc refactoring (15 min) - Verify async task spawning, Arc cloning, and shared ownership doesn't introduce race conditions or lifetime issues
  2. Proof reconstruction with const generics (15 min) - Check HEIGHT parameter threading, array sizing, and batch reconstruction correctness
  3. Error propagation paths (15 min) - Validate Result chains don't mask underlying errors and error variants are correctly mapped
  4. Task orchestration & panic handling (10 min) - Ensure NewTreeWorker shutdown completes properly and panic payloads capture useful context
  5. Indexer queue changes (5 min) - Verify proof lookup optimization doesn't break existing correctness
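For item 4, capturing a readable message from a panic payload usually means downcasting, since `panic!` carries either a `&'static str` or a `String`. The helper below is a guess at the shape of the `panic_payload_message` function named in the walkthrough, built only on std:

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

// Sketch: extract a readable message from a panic payload. Payloads raised via
// panic!("...") are &'static str or String; anything else stays opaque.
fn panic_payload_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

fn main() {
    // Silence the default hook so the expected panic doesn't spam stderr.
    panic::set_hook(Box::new(|_| {}));

    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        panic!("queue worker died: tree 42");
    }));
    let msg = match result {
        Ok(()) => unreachable!(),
        Err(payload) => panic_payload_message(payload.as_ref()),
    };
    assert_eq!(msg, "queue worker died: tree 42");
}
```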

Possibly related PRs

Suggested reviewers

  • SwenSchaeferjohann
  • ananas-block

🔄 From Chaos, Order Emerges

Arc and Result hand in hand,
Panics banished—error handling stands,
Proofs now sung in generic HEIGHT,
Where Tasks dance concurrent in moonlight ✨
Error chains guide the weary knight.


@sergeytimoshin sergeytimoshin force-pushed the sergey/forester-concurrency branch 4 times, most recently from dc3b8b4 to fe08bcd on March 14, 2026 at 19:12
@sergeytimoshin sergeytimoshin force-pushed the sergey/forester-concurrency branch from fe08bcd to ef9d7fe on March 14, 2026 at 19:21
@sergeytimoshin sergeytimoshin force-pushed the sergey/forester-concurrency branch 3 times, most recently from aaaa67c to fbf48e5 on March 14, 2026 at 21:07
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
prover/client/src/helpers.rs (1)

38-48: ⚠️ Potential issue | 🟠 Major

Reject negative values in bigint_to_u8_32.

to_bytes_be() returns (sign, magnitude). Right now -1 serializes as 0x…01 instead of failing, which can turn invalid proof inputs into a different root/hash without any error.

Suggested fix
 pub fn bigint_to_u8_32(n: &BigInt) -> Result<[u8; 32], ProverClientError> {
-    let (_, bytes_be) = n.to_bytes_be();
+    let (sign, bytes_be) = n.to_bytes_be();
+    if sign == num_bigint::Sign::Minus {
+        return Err(ProverClientError::InvalidProofData(
+            "Negative values are not valid 32-byte field elements".to_string(),
+        ));
+    }
     if bytes_be.len() > 32 {
         return Err(ProverClientError::InvalidProofData(
             "Number too large to fit in [u8; 32]".to_string(),
         ));
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/helpers.rs` around lines 38 - 48, The function
bigint_to_u8_32 currently ignores the sign from BigInt::to_bytes_be and
therefore accepts negative values; update bigint_to_u8_32 to inspect the sign
returned by to_bytes_be and return Err(ProverClientError::InvalidProofData(...))
when the sign is negative (i.e., not Sign::Plus), leaving the rest of the logic
unchanged for non-negative values; reference the bigint_to_u8_32 function and
the ProverClientError::InvalidProofData variant when implementing this check.
forester/tests/legacy/batched_state_async_indexer_test.rs (1)

79-90: ⚠️ Potential issue | 🟠 Major

Remove the second prover startup.

init() already starts the local validator with enable_prover: true. Spawning another prover here can race for port 3001 and fail the test before the worker-concurrency path is exercised.

Simplest fix
-    spawn_prover().await.unwrap();
Based on learnings, the light test validator must spawn three background processes: the Solana test validator on port 8899, the prover server on port 3001, and the photon indexer on port 8784.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/tests/legacy/batched_state_async_indexer_test.rs` around lines 79 -
90, The test starts a prover twice causing a port race: remove the explicit
spawn_prover().await.unwrap() call so the prover started by
init(Some(LightValidatorConfig { enable_prover: true, ... })) is the only
prover; ensure init() is called with enable_indexer: true and the
LightValidatorConfig remains as shown so the test validator still spawns the
Solana validator, prover (port 3001) and photon indexer (port 8784) in the
background.
prover/client/src/proof_client.rs (1)

671-677: ⚠️ Potential issue | 🔴 Critical

Serialize low_element_next_indices in the address-append payload.

The BatchAddressAppendInputs struct carries low_element_next_indices, but the JSON serializer at prover/client/src/proof_types/batch_address_append/json.rs omits this field entirely. The prover circuit needs both the values and indices of boundary elements to validate indexed Merkle tree insertions (verifying that new addresses fit between low_value and low_next_value at the correct tree positions). Dropping the indices breaks this validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_client.rs` around lines 671 - 677, The JSON
serializer for BatchAddressAppendInputs is omitting the low_element_next_indices
field, breaking prover validation; update the batch_address_append JSON
serialization (the module that produces the payload consumed by
generate_batch_address_append_proof) to include low_element_next_indices
alongside low_element_values/low_next_values, serialize it with the exact
key/name the prover circuit expects, and ensure the values are converted to JSON
(e.g., a numeric array of indices) so to_json(&inputs) used in
generate_batch_address_append_proof contains both the indices and values
required by the circuit.
forester/src/forester_status.rs (1)

674-685: ⚠️ Potential issue | 🟠 Major

Don’t silently mask queue parse failures as “empty queue.”

At Line 674 and Line 729, deserialization errors are collapsed into (0, 0). That makes malformed/unreadable queue data indistinguishable from truly empty queues and can produce misleading status/metrics output.

Please either propagate the parse error (preferred) or at least emit a warning with tree/queue identifiers before fallback.

Also applies to: 729-740

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/forester_status.rs` around lines 674 - 685, The code currently
swallows deserialization errors by calling
parse_hash_set_from_bytes::<QueueAccount>(&acc.data).ok().map(...).unwrap_or((0,0)),
which makes malformed queue data indistinguishable from an empty queue; replace
these silent fallbacks (the two occurrences using
parse_hash_set_from_bytes::<QueueAccount>) with proper error handling: ideally
change the enclosing function to return Result and propagate the parse error
(use parse_hash_set_from_bytes::<QueueAccount>(&acc.data)? and compute (len,
cap) on Ok), otherwise at minimum catch Err(e) and emit a warning including
identifying info (e.g., the tree/queue identifiers available in scope such as
acc.pubkey or tree_id) before returning the fallback (0,0); apply the same fix
at both locations (the parse_hash_set_from_bytes::<QueueAccount> usages around
the shown block and the 729–740 block).
prover/client/src/proof_types/batch_append/proof_inputs.rs (1)

115-130: ⚠️ Potential issue | 🟠 Major

Reject length mismatches before the zipped loop.

zip() stops at the shortest input. If old_leaves, leaves, or merkle_proofs is shorter than batch_size, this still returns batch_size unchanged but computes the root from only a partial batch. That will generate inconsistent public inputs and very opaque prover failures.

Suggested guard
 ) -> Result<(BatchAppendsCircuitInputs, Vec<ChangelogEntry<HEIGHT>>), ProverClientError> {
+    let expected_len = batch_size as usize;
+    if leaves.len() != expected_len
+        || old_leaves.len() != expected_len
+        || merkle_proofs.len() != expected_len
+    {
+        return Err(ProverClientError::GenericError(format!(
+            "batch append input length mismatch: leaves={}, old_leaves={}, merkle_proofs={}, expected={}",
+            leaves.len(),
+            old_leaves.len(),
+            merkle_proofs.len(),
+            expected_len,
+        )));
+    }
+
     let mut new_root = [0u8; 32];

Also applies to: 135-200

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` around lines 115
- 130, The function get_batch_append_inputs currently zips old_leaves, leaves,
and merkle_proofs and can silently process a partial batch if any of those
vectors are shorter than batch_size; add explicit length checks before the
zipped loop: verify old_leaves.len() == batch_size as usize, leaves.len() ==
batch_size as usize, and merkle_proofs.len() == batch_size as usize (or
otherwise reject mismatches), and return an appropriate ProverClientError when
any mismatch is detected so the function fails fast and never computes
roots/inputs from a partial batch.
sdk-libs/client/src/interface/initialize_config.rs (1)

20-27: ⚠️ Potential issue | 🔴 Critical

Consolidate to shared InitializeLightConfigParams payload and use anchor_lang::Discriminator.

The InitializeRentFreeConfig::build() method will produce an incompatible instruction payload. It serializes InitializeCompressionConfigAnchorData (which lacks config_bump), but the sibling initialize_config() function in instructions.rs correctly uses InitializeLightConfigParams with config_bump included. Both target the same on-chain instruction, so they must produce identical wire formats—presently they do not.

Additionally, line 122 hardcodes the discriminator bytes. Replace both issues by:

  1. Reusing InitializeLightConfigParams here instead of the local struct
  2. Using anchor_lang::Discriminator trait to derive the discriminator (as required by coding guidelines for sdk-libs/**/src/**/*.rs) rather than hardcoding bytes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/initialize_config.rs` around lines 20 - 27, The
build path is serializing the wrong payload and hardcodes the discriminator:
replace the local struct InitializeCompressionConfigAnchorData with the shared
InitializeLightConfigParams so InitializeRentFreeConfig::build() emits the same
wire format as the existing initialize_config() function, and remove the manual
discriminator bytes in favor of deriving and using anchor_lang::Discriminator
(call the trait implementation to obtain the discriminator) rather than
hardcoding values; update any serialization calls in
InitializeRentFreeConfig::build() to use InitializeLightConfigParams and the
Discriminator API so both instruction creators produce identical payloads.
sdk-libs/program-test/src/indexer/test_indexer.rs (1)

2502-2535: ⚠️ Potential issue | 🟠 Major

Handle prover HTTP failures explicitly instead of spinning or panicking.

This loop only decrements retries on transport errors. A prover that keeps returning 4xx/5xx will spin forever, and a 200 with an unreadable body still hits unwrap() on text() / deserialize_gnark_proof_json(). Please convert both HTTP-status failures and parse failures into IndexerError and consume a retry in those branches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2502 - 2535,
The HTTP loop currently only decrements retries on transport errors and uses
unwrap() on response.text() and deserialize_gnark_proof_json(), so server
4xx/5xx and parse failures can spin or panic; update the client.post handling
(the block around client.post(...).send().await and the success branch that
calls response.text(), deserialize_gnark_proof_json, proof_from_json_struct, and
compress_proof) to: treat non-success status codes as IndexerError::CustomError
(include status and body if available), convert any failures from
response.text(), deserialize_gnark_proof_json, proof_from_json_struct, or
compress_proof into IndexerError::CustomError rather than unwrap/panic, and
ensure each of those error branches decrements retries and follows the same
retry/sleep logic so the function eventually returns
Err(IndexerError::CustomError("Failed to get proof from server")) after
exhausting retries; reference functions/values: client.post, response.text(),
deserialize_gnark_proof_json, proof_from_json_struct, compress_proof,
IndexerError, and ValidityProofWithContext.
forester/src/epoch_manager.rs (2)

4775-4785: ⚠️ Potential issue | 🟠 Major

Shutdown path can bypass tracked new-tree worker teardown.

When shutdown wins the tokio::select! in Line 4775, epoch_manager.run() is dropped before reaching Line 680 cleanup, so tracked std::thread workers may outlive service shutdown and continue processing.

💡 Proposed fix
-                        let result = tokio::select! {
-                            result = epoch_manager.run() => result,
+                        let epoch_manager_for_run = epoch_manager.clone();
+                        let result = tokio::select! {
+                            result = epoch_manager_for_run.run() => result,
                             _ = shutdown => {
                                 info!(
                                     event = "shutdown_received",
                                     run_id = %run_id_for_logs,
                                     phase = "service_run",
                                     "Received shutdown signal. Stopping the service."
                                 );
+                                epoch_manager
+                                    .shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT)
+                                    .await;
                                 Ok(())
                             }
                         };

Also applies to: 680-681

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/epoch_manager.rs` around lines 4775 - 4785, The shutdown branch
in the tokio::select! currently just returns Ok(()) which drops
epoch_manager.run() and skips the cleanup that joins tracked std::thread
workers; instead, on shutdown you must trigger a graceful stop on the
EpochManager and await its run future to complete so its cleanup code (the
tracked-worker teardown at/around epoch_manager.run() and the cleanup near lines
~680) runs: send the same shutdown signal or call the
epoch_manager.shutdown()/stop() method (or set a shutdown flag) and then await
epoch_manager.run() (or call a provided join/await method) before returning
Ok(()) so all tracked std::thread workers are torn down and joined. Ensure you
reference and use the epoch_manager.run() future and any explicit join/teardown
methods on EpochManager to perform the worker joins instead of dropping the
future.
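The teardown pattern being asked for — signal shutdown, then join every tracked worker rather than dropping their handles — can be sketched with std threads in place of the tokio-based EpochManager. `WorkerPool` and its methods are illustrative, not the forester API:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};
use std::thread::{self, JoinHandle};

// Hedged sketch: the shutdown path flips a shared flag and then *joins*
// every tracked worker, so none of them outlives service shutdown.
struct WorkerPool {
    shutdown: Arc<AtomicBool>,
    handles: Vec<JoinHandle<u64>>,
}

impl WorkerPool {
    fn spawn(n: usize) -> Self {
        let shutdown = Arc::new(AtomicBool::new(false));
        let handles = (0..n)
            .map(|_| {
                let flag = Arc::clone(&shutdown);
                thread::spawn(move || {
                    let mut iterations = 0u64;
                    while !flag.load(Ordering::Relaxed) {
                        iterations += 1; // stand-in for processing one tree batch
                        thread::yield_now();
                    }
                    iterations
                })
            })
            .collect();
        Self { shutdown, handles }
    }

    // Signal shutdown, then block until every worker has actually exited.
    fn shutdown_and_join(self) -> Vec<u64> {
        self.shutdown.store(true, Ordering::Relaxed);
        self.handles.into_iter().map(|h| h.join().unwrap()).collect()
    }
}

fn main() {
    let pool = WorkerPool::spawn(3);
    let counts = pool.shutdown_and_join();
    assert_eq!(counts.len(), 3);
}
```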

2181-2192: ⚠️ Potential issue | 🔴 Critical

RegistrationTracker claim can get stuck on early error paths.

After try_claim_refinalize() succeeds, Line 2188 and Line 2191 use ?. If either fails, complete_refinalize() is never called, leaving refinalize_in_progress=true and potentially blocking other workers forever in wait_for_refinalize().

💡 Proposed fix
         if registration_tracker.try_claim_refinalize() {
-                // This task sends the finalize_registration tx
-                let ix = create_finalize_registration_instruction(
-                    &self.config.payer_keypair.pubkey(),
-                    &self.config.derivation_pubkey,
-                    epoch_info.epoch,
-                );
-                let priority_fee = self
-                    .resolve_epoch_priority_fee(&*rpc, epoch_info.epoch)
-                    .await?;
-                let current_slot = rpc.get_slot().await?;
-                let Some(confirmation_deadline) = scheduled_confirmation_deadline(
-                    epoch_info.phases.active.end.saturating_sub(current_slot),
-                ) else {
-                    info!(...);
-                    registration_tracker.complete_refinalize(cached_weight);
-                    return Ok((forester_epoch_pda, tree_schedule));
-                };
-                ...
-                match send_smart_transaction(...).await.map_err(RpcError::from) {
-                    Ok(_) => { ... registration_tracker.complete_refinalize(post_finalize_weight); }
-                    Err(e) => {
-                        registration_tracker.complete_refinalize(cached_weight);
-                        return Err(e.into());
-                    }
-                }
+                let mut completion_weight = cached_weight;
+                let claim_result: Result<()> = async {
+                    let ix = create_finalize_registration_instruction(
+                        &self.config.payer_keypair.pubkey(),
+                        &self.config.derivation_pubkey,
+                        epoch_info.epoch,
+                    );
+                    let priority_fee = self
+                        .resolve_epoch_priority_fee(&*rpc, epoch_info.epoch)
+                        .await?;
+                    let current_slot = rpc.get_slot().await?;
+                    let Some(confirmation_deadline) = scheduled_confirmation_deadline(
+                        epoch_info.phases.active.end.saturating_sub(current_slot),
+                    ) else {
+                        info!(...);
+                        return Ok(());
+                    };
+                    match send_smart_transaction(...).await.map_err(RpcError::from) {
+                        Ok(_) => {
+                            completion_weight = match rpc.get_anchor_account::<EpochPda>(&epoch_pda_address).await {
+                                Ok(Some(pda)) => pda.registered_weight,
+                                _ => on_chain_weight,
+                            };
+                            Ok(())
+                        }
+                        Err(e) => Err(e.into()),
+                    }
+                }.await;
+
+                registration_tracker.complete_refinalize(completion_weight);
+                claim_result?;
             } else {
                 registration_tracker.wait_for_refinalize().await;
             }

Also applies to: 2246-2250

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/epoch_manager.rs` around lines 2181 - 2192, After a successful
RegistrationTracker::try_claim_refinalize() return ensure
RegistrationTracker::complete_refinalize() is always called even if subsequent
calls like resolve_epoch_priority_fee(), rpc.get_slot(), or
scheduled_confirmation_deadline(...) fail; wrap the post-claim work in a scope
that guarantees cleanup (e.g., a scope guard or an explicit match/let guard) so
complete_refinalize() runs on all early-return/error paths, and apply the same
pattern around the other similar block (lines ~2246-2250) to avoid leaving
refinalize_in_progress = true.
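The scope-guard alternative mentioned in the prompt can be sketched with a `Drop` impl: cleanup runs on every exit path, including early `?` returns. `RefinalizeGuard` and the `Rc<Cell<bool>>` state are stand-ins for the real `RegistrationTracker`:

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hedged sketch: Drop guarantees the equivalent of complete_refinalize()
// runs on all paths, so refinalize_in_progress can never stay stuck.
struct RefinalizeGuard {
    completed: Rc<Cell<bool>>, // stand-in for RegistrationTracker state
}

impl Drop for RefinalizeGuard {
    fn drop(&mut self) {
        // Equivalent to registration_tracker.complete_refinalize(weight).
        self.completed.set(true);
    }
}

fn refinalize(fail_early: bool, completed: Rc<Cell<bool>>) -> Result<(), &'static str> {
    let _guard = RefinalizeGuard { completed };
    if fail_early {
        return Err("rpc.get_slot failed"); // early error path: guard still drops
    }
    Ok(())
}

fn main() {
    let flag = Rc::new(Cell::new(false));
    assert!(refinalize(true, Rc::clone(&flag)).is_err());
    // Even on the error path the claim was released.
    assert!(flag.get());
}
```

A crate like `scopeguard` offers the same pattern off the shelf; the hand-rolled guard above just shows the mechanism.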
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@forester/src/config.rs`:
- Around line 341-347: The current use of partition() makes the Err branch
already returned, so the map_err fallback on valid.into_iter().collect(...) is
unreachable and gives a misleading generic error; change the logic in the tree
ID parsing flow (the variables/blocks using partition(), valid, invalid and the
ConfigError::InvalidArguments with field "tree_ids") to either directly
unwrap/collect the already-validated `valid` Vec<Pubkey> (since `invalid` was
handled) or perform a single-pass parse that collects invalid inputs and returns
ConfigError::InvalidArguments with the actual invalid_values; update the code
around the partition() usage to return the real invalid strings instead of the
generic "failed to parse tree_ids".
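The single-pass parse this item describes can be sketched with `u64` standing in for `Pubkey` (whose `FromStr` lives in solana-sdk). The point is that the error carries the actual invalid strings, and that unwrapping the already-validated entries cannot fail, so no unreachable `map_err` fallback is needed:

```rust
// Hedged sketch of tree_ids parsing; `parse_tree_ids` is a hypothetical name.
fn parse_tree_ids(raw: &[&str]) -> Result<Vec<u64>, String> {
    let (valid, invalid): (Vec<_>, Vec<_>) = raw
        .iter()
        .map(|s| (*s, s.parse::<u64>()))
        .partition(|(_, r)| r.is_ok());
    if !invalid.is_empty() {
        let bad: Vec<&str> = invalid.into_iter().map(|(s, _)| s).collect();
        // The error names the offending inputs, not a generic parse failure.
        return Err(format!("invalid tree_ids: {bad:?}"));
    }
    // `valid` only contains Ok entries, so unwrap here cannot fail.
    Ok(valid.into_iter().map(|(_, r)| r.unwrap()).collect())
}

fn main() {
    assert_eq!(parse_tree_ids(&["1", "2"]).unwrap(), vec![1, 2]);
    let err = parse_tree_ids(&["1", "nope", "xyz"]).unwrap_err();
    assert!(err.contains("nope") && err.contains("xyz"));
}
```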

In `@forester/src/processor/v2/helpers.rs`:
- Around line 495-524: The function currently treats truncated batches as valid
by using actual_end = end.min(available) and proceeding; change the guard so
that if wait_for_batch(end) returns available < end the function returns
Ok(None) (i.e., do not proceed with partial batches). Specifically, in
get_batch_snapshot replace the current if start >= available { return Ok(None);
} / actual_end logic with a check like if available < end { return Ok(None); }
(still keep the start >= available check as needed), so code paths that call
lock_recover, read data.addresses[start..actual_end], or derive
leaves_hash_chain (data.leaves_hash_chains, create_hash_chain_from_slice) never
run for undersized batches.
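The guard change can be reduced to a small range computation: instead of clamping `end` to what is available and proceeding with a truncated batch, return `None` whenever fewer than `end` elements are ready. `batch_range` is an illustrative stand-in for the snapshot logic:

```rust
// Hedged sketch: undersized batches are skipped entirely, never truncated.
fn batch_range(start: usize, end: usize, available: usize) -> Option<std::ops::Range<usize>> {
    if start >= available {
        return None;
    }
    // Previously: let actual_end = end.min(available); — which let a
    // partial batch through. Now a short batch yields None.
    if available < end {
        return None;
    }
    Some(start..end)
}

fn main() {
    assert_eq!(batch_range(0, 10, 12), Some(0..10));
    // Only 7 of 10 elements ready: previously this yielded 0..7, now None.
    assert_eq!(batch_range(0, 10, 7), None);
    assert_eq!(batch_range(12, 20, 10), None);
}
```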

In `@forester/src/processor/v2/proof_worker.rs`:
- Around line 197-214: The serialization-failure path uses two different
timestamps (submitted_at = Instant::now() here and round_trip_start elsewhere)
and silently ignores closed result_tx; consolidate this by adding a helper
(e.g., send_error_result or report_serialization_failure) used by both the
job.inputs.to_json error branch and the other serialization-before-submit branch
that: captures one Instant::now() into a single variable (round_trip_start or
submitted_at) at the failure site, logs the error with error! including the
error details, constructs the ProofJobResult with that single timestamp and
consistent round_trip_ms/proof_duration_ms values, and attempts to send via
job.result_tx while handling a closed channel by logging the send failure;
update both locations (the error branch around job.inputs.to_json and the other
branch that currently reuses round_trip_start) to call this helper so latency
metrics and logging are consistent.

In `@forester/tests/test_batch_append_spent.rs`:
- Around line 332-346: The spawned closure passed to tokio::task::spawn_blocking
(bound to service_handle) can fail at runtime construction
(tokio::runtime::Builder::build), during
runtime.block_on(run_pipeline::<LightClient>(...)), and when joining the spawned
task; currently all results are ignored—update the test to explicitly handle
each error: unwrap or propagate the runtime build Result (ensure the build error
is surfaced), propagate/unwrap the Result returned by
run_pipeline::<LightClient> so pipeline failures fail the test, and await/join
the JoinHandle and check its JoinError (or unwrap) instead of discarding with
let _ = ..., failing the test on panic or join failure; reference
service_handle, runtime.block_on, run_pipeline::<LightClient>,
shutdown_receiver, shutdown_compressible_receiver, and work_report_sender when
making these changes.

In `@forester/tests/test_compressible_pda.rs`:
- Around line 312-313: Replace the bare .unwrap() calls on builder results with
.expect(...) that includes a descriptive message; specifically change the
.build().unwrap() invocations that call InitializeRentFreeConfig::build() (and
the two other similar setup builder calls in this test) to
.build().expect("InitializeRentFreeConfig::build() failed") or an equivalent
message that names the failing builder so failures in setup are clear. Ensure
each replaced call uses a distinct message if helpful to identify which builder
failed.

In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs`:
- Around line 190-195: The code currently slices
new_element_values[..zkp_batch_size] and then indexes parallel inputs
(low_element_values, low_element_next_values, low_element_indices,
low_element_next_indices, low_element_proofs) by that length, which can panic if
any of those slices are shorter; add explicit guards at the start of the
function that compute required_len = zkp_batch_size and verify each parallel
input's len() >= required_len (for low_element_proofs check len() >=
required_len and each inner-proof has HEIGHT elements) and return a
ProverClientError on failure instead of allowing a panic; reference the
parameter names low_element_values, low_element_next_values,
low_element_indices, low_element_next_indices, low_element_proofs,
new_element_values and zkp_batch_size when adding these checks.

In `@prover/client/src/proof_types/batch_append/json.rs`:
- Around line 76-79: Rename the inherent fallible method `to_string(&self) ->
Result<String, ProverClientError>` to a clearly-fallible name like
`try_to_json_string(&self) -> Result<String, ProverClientError>` (or
`to_json_string` if you prefer), update all call sites that invoke `to_string()`
to call the new name, keep the same return type and body
(`create_json_from_struct(&self)`), and remove the
`#[allow(clippy::inherent_to_string)]` attribute since the method no longer
shadows the conventional `to_string`. Ensure unit tests and any trait impls or
uses are updated to the new identifier.

In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 115-118: The public_inputs method currently indexes self.0[0],
which will panic on an empty batch and only returns the first entry; instead
iterate over the inner slice (self.0), call public_inputs_arr() on each item,
collect the results into the Vec<[u8; 32]>, and propagate any errors as
ProverClientError; update public_inputs to map/try_collect (or equivalent) over
self.0 so all batch entries are included and empty slices are handled without
panicking (or explicitly return an error if an empty batch is considered
invalid).
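The map-and-collect shape this item asks for can be sketched with stand-in types: collecting an iterator of `Result` into `Result<Vec<_>, _>` propagates per-entry errors, covers every batch entry, and handles an empty batch without the `self.0[0]` panic. `Entry`/`Batch` are hypothetical, not the prover-client types:

```rust
// Hedged sketch of public_inputs over all batch entries.
struct Entry(u8);

impl Entry {
    fn public_inputs_arr(&self) -> Result<[u8; 32], String> {
        if self.0 == 0 {
            return Err("invalid entry".into());
        }
        Ok([self.0; 32])
    }
}

struct Batch(Vec<Entry>);

impl Batch {
    fn public_inputs(&self) -> Result<Vec<[u8; 32]>, String> {
        // collect() on an iterator of Result flattens to Result<Vec<_>, _>.
        self.0.iter().map(|e| e.public_inputs_arr()).collect()
    }
}

fn main() {
    let batch = Batch(vec![Entry(1), Entry(2)]);
    let inputs = batch.public_inputs().unwrap();
    assert_eq!(inputs.len(), 2);
    assert_eq!(inputs[1], [2u8; 32]);
    // Empty batch: no panic, just an empty Vec (or return an explicit
    // error here if an empty batch is considered invalid).
    assert!(Batch(vec![]).public_inputs().unwrap().is_empty());
    // A failing entry propagates instead of being skipped.
    assert!(Batch(vec![Entry(0)]).public_inputs().is_err());
}
```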

In `@prover/client/src/proof_types/non_inclusion/v2/json.rs`:
- Around line 54-62: The JSON serialization currently forces path_index to a u32
by converting input.index_hashed_indexed_element_leaf via to_u32(), which will
fail for trees deeper than 32 bits; update the contract to allow larger indices
by switching path_index to a wider representation (for example: use to_u64() and
a u64 field, or serialize the index as a string/BigInt) and change the
conversion call from to_u32() to the chosen conversion (e.g., to_u64() or
to_string()) and update any consumers of path_index accordingly so indices >
u32::MAX are supported; ensure error handling still returns
ProverClientError::IntegerConversion with an appropriate message if conversion
to the new representation fails.

In `@prover/client/src/proof.rs`:
- Around line 167-170: The negate_g1 function currently deserializes the
incoming G1 point with Validate::No allowing malformed/off-curve points through;
change the call to G1::deserialize_with_mode(g1_le.as_slice(), Compress::No,
Validate::Yes) so the point is validated at deserialization and any error still
maps to ProverClientError::InvalidProofData; this uses the existing
convert_endianness and error mapping, so no further changes to error handling
are needed.

In `@sdk-libs/client/src/indexer/types/proof.rs`:
- Around line 214-223: The loop that computes output tree indices (using
account.tree_info.next_tree_info and account.tree_info.pack_output_tree_index on
each Account) currently keeps the first index in output_tree_index and ignores
mismatches; update PackedStateTreeInfos construction to validate that every call
to pack_output_tree_index returns the same index and return an error if any
subsequent index differs. Specifically, in the code paths that set
output_tree_index (the blocks referencing output_tree_index,
account.tree_info.next_tree_info, and pack_output_tree_index), compare the newly
computed index to the previously stored Some(index) and propagate a descriptive
Err when they diverge instead of silently ignoring the difference; apply the
same validation to the other occurrence noted (around lines 243-248).
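The consistency check can be sketched over a plain slice of indices: the first computed index is kept, and any later divergence becomes a descriptive error instead of being silently ignored. The function name and `u8` index type are illustrative:

```rust
// Hedged sketch: every packed output tree index must match the first one seen.
fn consistent_output_tree_index(indices: &[u8]) -> Result<Option<u8>, String> {
    let mut output_tree_index: Option<u8> = None;
    for (i, &idx) in indices.iter().enumerate() {
        match output_tree_index {
            None => output_tree_index = Some(idx),
            Some(first) if first != idx => {
                return Err(format!(
                    "output tree index mismatch at account {i}: expected {first}, got {idx}"
                ));
            }
            Some(_) => {}
        }
    }
    Ok(output_tree_index)
}

fn main() {
    assert_eq!(consistent_output_tree_index(&[3, 3, 3]).unwrap(), Some(3));
    assert_eq!(consistent_output_tree_index(&[]).unwrap(), None);
    // A divergent index is surfaced as an error, not dropped.
    assert!(consistent_output_tree_index(&[3, 3, 5]).is_err());
}
```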

In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 120-124: The missing-PDA error loses the original position because
collect_pda_hashes restarts enumeration for each group; update the flow so the
original flat index from cold_pdas is preserved: modify
group_pda_specs/collect_pda_hashes usage so each group carries a base_index (or
have collect_pda_hashes return (hash, original_index) pairs) and use that
original index when constructing MissingPdaCompressed; apply the same change to
the other occurrence referenced (lines ~164-175) so every MissingPdaCompressed
contains the flattened index from cold_pdas rather than the per-group 0-based
index.
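The base-index threading suggested here can be sketched as follows: each group carries the flat offset into the original list, so every emitted pair reports the caller-visible position rather than a per-group 0-based one. `Group`, `group_specs`, and `collect_with_flat_index` are hypothetical names:

```rust
// Hedged sketch: preserve the flat index from the original cold_pdas list.
struct Group<'a> {
    base_index: usize,
    hashes: &'a [u32], // stand-in for PDA hashes
}

fn group_specs(cold_pdas: &[u32], group_size: usize) -> Vec<Group<'_>> {
    cold_pdas
        .chunks(group_size)
        .enumerate()
        .map(|(g, hashes)| Group { base_index: g * group_size, hashes })
        .collect()
}

// Returns (hash, original flat index) pairs for every entry.
fn collect_with_flat_index(groups: &[Group]) -> Vec<(u32, usize)> {
    groups
        .iter()
        .flat_map(|g| {
            g.hashes
                .iter()
                .enumerate()
                .map(move |(i, &h)| (h, g.base_index + i))
        })
        .collect()
}

fn main() {
    let cold_pdas = [10, 11, 12, 13, 14];
    let groups = group_specs(&cold_pdas, 2);
    let flat = collect_with_flat_index(&groups);
    // The 4th entry (hash 13) keeps flat index 3, not per-group index 1.
    assert_eq!(flat[3], (13, 3));
    assert_eq!(flat.len(), 5);
}
```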

In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 127-136: The non-surfpool branch currently spawns the validator
with Command::new(...).spawn() and immediately drops the Child, so failures
during the sleep are ignored; change this to keep the Child (e.g., let mut child
= Command::new(...).spawn()?), await the
tokio::time::sleep(Duration::from_secs(config.wait_time)).await while retaining
the child, then call child.try_wait() (or child.wait() if blocking is
acceptable) and inspect the ExitStatus; if the child has exited with a
non-success status return an io::Error (or map the ExitStatus to Err) so callers
receive an error instead of Ok(()). Ensure you mirror the surfpool branch
pattern (spawn -> keep handle -> wait/try_wait -> check status) and propagate
any non-zero exit as an io::Error.
- Around line 69-115: The current code concatenates command pieces into a single
shell string and spawns it via Command::new("sh").arg("-c").arg(path), which
breaks on spaces/metacharacters (see path construction and uses of
config.enable_indexer, config.limit_ledger_size, config.sbf_programs,
config.upgradeable_programs, config.validator_args and config.use_surfpool).
Change to build a Vec<String> (or Vec<&str>) of argv: start with the executable
built from project_root + "/cli/test_bin/run", then push "test-validator" and
each flag/value as separate .arg(...) entries (one per sbf_program pair and per
upgradeable_program fields, and each validator_arg as its own arg) and call
Command::new(executable).args(&args).spawn() instead of using "sh -c". Also make
the non-surfpool branch behave like the surfpool branch: keep the child handle,
call child.wait(), check the exit Status and propagate or log failures instead
of dropping the child and sleeping unconditionally.

---

Outside diff comments:

In `@forester/src/forester_status.rs`:
- Around line 674-685: The code currently swallows deserialization errors by
calling
parse_hash_set_from_bytes::<QueueAccount>(&acc.data).ok().map(...).unwrap_or((0,0)),
which makes malformed queue data indistinguishable from an empty queue; replace
these silent fallbacks (the two occurrences using
parse_hash_set_from_bytes::<QueueAccount>) with proper error handling: ideally
change the enclosing function to return Result and propagate the parse error
(use parse_hash_set_from_bytes::<QueueAccount>(&acc.data)? and compute (len,
cap) on Ok), otherwise at minimum catch Err(e) and emit a warning including
identifying info (e.g., the tree/queue identifiers available in scope such as
acc.pubkey or tree_id) before returning the fallback (0,0); apply the same fix
at both locations (the parse_hash_set_from_bytes::<QueueAccount> usages around
the shown block and the 729–740 block).

In `@forester/tests/legacy/batched_state_async_indexer_test.rs`:
- Around line 79-90: The test starts a prover twice causing a port race: remove
the explicit spawn_prover().await.unwrap() call so the prover started by
init(Some(LightValidatorConfig { enable_prover: true, ... })) is the only
prover; ensure init() is called with enable_indexer: true and the
LightValidatorConfig remains as shown so the test validator still spawns the
Solana validator, prover (port 3001) and photon indexer (port 8784) in the
background.

In `@prover/client/src/helpers.rs`:
- Around line 38-48: The function bigint_to_u8_32 currently ignores the sign
from BigInt::to_bytes_be and therefore accepts negative values; update
bigint_to_u8_32 to inspect the sign returned by to_bytes_be and return
Err(ProverClientError::InvalidProofData(...)) when the sign is negative (i.e.,
not Sign::Plus), leaving the rest of the logic unchanged for non-negative
values; reference the bigint_to_u8_32 function and the
ProverClientError::InvalidProofData variant when implementing this check.
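The sign check can be illustrated with `i128` standing in for num-bigint's `BigInt` (whose `to_bytes_be` returns a `(Sign, Vec<u8>)` pair): a negative value becomes an explicit error instead of being silently accepted. `int_to_u8_32` and the error string are stand-ins:

```rust
// Hedged sketch of bigint_to_u8_32's sign handling.
fn int_to_u8_32(value: i128) -> Result<[u8; 32], String> {
    if value < 0 {
        // Mirrors returning ProverClientError::InvalidProofData for Sign::Minus.
        return Err("negative value cannot be a field element".into());
    }
    let be = value.to_be_bytes(); // 16 big-endian bytes
    let mut out = [0u8; 32];
    out[32 - be.len()..].copy_from_slice(&be);
    Ok(out)
}

fn main() {
    let bytes = int_to_u8_32(1).unwrap();
    assert_eq!(bytes[31], 1);
    assert_eq!(bytes[0], 0);
    // Negative inputs are rejected rather than wrapped into a bogus value.
    assert!(int_to_u8_32(-1).is_err());
}
```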

In `@prover/client/src/proof_client.rs`:
- Around line 671-677: The JSON serializer for BatchAddressAppendInputs is
omitting the low_element_next_indices field, breaking prover validation; update
the batch_address_append JSON serialization (the module that produces the
payload consumed by generate_batch_address_append_proof) to include
low_element_next_indices alongside low_element_values/low_next_values, serialize
it with the exact key/name the prover circuit expects, and ensure the values are
converted to JSON (e.g., a numeric array of indices) so to_json(&inputs) used in
generate_batch_address_append_proof contains both the indices and values
required by the circuit.

In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Around line 115-130: The function get_batch_append_inputs currently zips
old_leaves, leaves, and merkle_proofs and can silently process a partial batch
if any of those vectors are shorter than batch_size; add explicit length checks
before the zipped loop: verify old_leaves.len() == batch_size as usize,
leaves.len() == batch_size as usize, and merkle_proofs.len() == batch_size as
usize (or otherwise reject mismatches), and return an appropriate
ProverClientError when any mismatch is detected so the function fails fast and
never computes roots/inputs from a partial batch.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b8aeca30-1184-4e0f-84c8-f3cb3237bc1f

📥 Commits

Reviewing files that changed from the base of the PR and between 2644529 and fbf48e5.

⛔ Files ignored due to path filters (63)
  • forester-utils/src/account_zero_copy.rs is excluded by none and included by none
  • forester-utils/src/address_merkle_tree_config.rs is excluded by none and included by none
  • forester-utils/src/address_staging_tree.rs is excluded by none and included by none
  • forester-utils/src/forester_epoch.rs is excluded by none and included by none
  • forester-utils/src/instructions/compress_and_close_mint.rs is excluded by none and included by none
  • forester-utils/src/rate_limiter.rs is excluded by none and included by none
  • forester-utils/src/registry.rs is excluded by none and included by none
  • program-tests/account-compression-test/tests/address_merkle_tree_tests.rs is excluded by none and included by none
  • program-tests/account-compression-test/tests/merkle_tree_tests.rs is excluded by none and included by none
  • program-tests/batched-merkle-tree-test/tests/merkle_tree.rs is excluded by none and included by none
  • program-tests/compressed-token-test/tests/freeze/functional.rs is excluded by none and included by none
  • program-tests/compressed-token-test/tests/v1.rs is excluded by none and included by none
  • program-tests/system-cpi-v2-test/tests/event.rs is excluded by none and included by none
  • program-tests/system-cpi-v2-test/tests/invoke_cpi_with_read_only.rs is excluded by none and included by none
  • program-tests/utils/src/account_zero_copy.rs is excluded by none and included by none
  • program-tests/utils/src/actions/legacy/instructions/transfer2.rs is excluded by none and included by none
  • program-tests/utils/src/address_tree_rollover.rs is excluded by none and included by none
  • program-tests/utils/src/assert_compressed_tx.rs is excluded by none and included by none
  • program-tests/utils/src/assert_merkle_tree.rs is excluded by none and included by none
  • program-tests/utils/src/assert_queue.rs is excluded by none and included by none
  • program-tests/utils/src/batched_address_tree.rs is excluded by none and included by none
  • program-tests/utils/src/e2e_test_env.rs is excluded by none and included by none
  • program-tests/utils/src/lib.rs is excluded by none and included by none
  • program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
  • program-tests/utils/src/state_tree_rollover.rs is excluded by none and included by none
  • program-tests/utils/src/test_batch_forester.rs is excluded by none and included by none
  • program-tests/utils/src/test_forester.rs is excluded by none and included by none
  • sdk-tests/anchor-manual-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/anchor-semi-manual-test/tests/shared/mod.rs is excluded by none and included by none
  • sdk-tests/anchor-semi-manual-test/tests/stress_test.rs is excluded by none and included by none
  • sdk-tests/client-test/tests/light_client.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/amm_stress_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/amm_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d10_ata_idempotent_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d10_token_accounts_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/d11_zero_copy_test.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/failing_tests.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/integration_tests.rs is excluded by none and included by none
  • sdk-tests/csdk-anchor-full-derived-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/pinocchio-light-program-test/tests/shared/mod.rs is excluded by none and included by none
  • sdk-tests/pinocchio-light-program-test/tests/stress_test.rs is excluded by none and included by none
  • sdk-tests/pinocchio-manual-test/tests/shared.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
  • sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v1-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-pinocchio-v2-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/ctoken_pda.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/decompress_full_cpi.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/pda_ctoken.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_invocations.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_4_transfer2.rs is excluded by none and included by none
  • sdk-tests/sdk-token-test/tests/test_deposit.rs is excluded by none and included by none
  • sdk-tests/sdk-v1-native-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-account-loader-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-ata-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-mint-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-pda-test/tests/test.rs is excluded by none and included by none
  • sdk-tests/single-token-test/tests/test.rs is excluded by none and included by none
  • sparse-merkle-tree/src/indexed_changelog.rs is excluded by none and included by none
  • sparse-merkle-tree/tests/indexed_changelog.rs is excluded by none and included by none
📒 Files selected for processing (66)
  • forester/src/config.rs
  • forester/src/epoch_manager.rs
  • forester/src/forester_status.rs
  • forester/src/main.rs
  • forester/src/metrics.rs
  • forester/src/priority_fee.rs
  • forester/src/processor/v2/helpers.rs
  • forester/src/processor/v2/proof_worker.rs
  • forester/src/processor/v2/strategy/address.rs
  • forester/src/queue_helpers.rs
  • forester/tests/e2e_test.rs
  • forester/tests/legacy/batched_state_async_indexer_test.rs
  • forester/tests/legacy/test_utils.rs
  • forester/tests/test_batch_append_spent.rs
  • forester/tests/test_compressible_pda.rs
  • forester/tests/test_indexer_interface.rs
  • forester/tests/test_utils.rs
  • prover/client/src/errors.rs
  • prover/client/src/helpers.rs
  • prover/client/src/proof.rs
  • prover/client/src/proof_client.rs
  • prover/client/src/proof_types/batch_address_append/json.rs
  • prover/client/src/proof_types/batch_address_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_append/json.rs
  • prover/client/src/proof_types/batch_append/proof_inputs.rs
  • prover/client/src/proof_types/batch_update/json.rs
  • prover/client/src/proof_types/batch_update/proof_inputs.rs
  • prover/client/src/proof_types/combined/v1/json.rs
  • prover/client/src/proof_types/combined/v2/json.rs
  • prover/client/src/proof_types/combined/v2/proof_inputs.rs
  • prover/client/src/proof_types/inclusion/v1/json.rs
  • prover/client/src/proof_types/inclusion/v1/proof_inputs.rs
  • prover/client/src/proof_types/inclusion/v2/json.rs
  • prover/client/src/proof_types/inclusion/v2/proof_inputs.rs
  • prover/client/src/proof_types/non_inclusion/v1/json.rs
  • prover/client/src/proof_types/non_inclusion/v1/proof_inputs.rs
  • prover/client/src/proof_types/non_inclusion/v2/json.rs
  • prover/client/src/proof_types/non_inclusion/v2/proof_inputs.rs
  • prover/client/src/prover.rs
  • prover/client/tests/batch_address_append.rs
  • prover/client/tests/batch_append.rs
  • prover/client/tests/batch_update.rs
  • prover/client/tests/combined.rs
  • prover/client/tests/inclusion.rs
  • prover/client/tests/init_merkle_tree.rs
  • prover/client/tests/non_inclusion.rs
  • sdk-libs/client/src/fee.rs
  • sdk-libs/client/src/indexer/error.rs
  • sdk-libs/client/src/indexer/options.rs
  • sdk-libs/client/src/indexer/photon_indexer.rs
  • sdk-libs/client/src/indexer/types/proof.rs
  • sdk-libs/client/src/indexer/types/queue.rs
  • sdk-libs/client/src/indexer/types/token.rs
  • sdk-libs/client/src/interface/initialize_config.rs
  • sdk-libs/client/src/interface/instructions.rs
  • sdk-libs/client/src/interface/load_accounts.rs
  • sdk-libs/client/src/interface/mod.rs
  • sdk-libs/client/src/interface/pack.rs
  • sdk-libs/client/src/interface/serialize.rs
  • sdk-libs/client/src/local_test_validator.rs
  • sdk-libs/client/src/rpc/client.rs
  • sdk-libs/client/src/utils.rs
  • sdk-libs/macros/src/light_pdas/seeds/extract.rs
  • sdk-libs/program-test/src/indexer/test_indexer.rs
  • sdk-libs/program-test/src/program_test/compressible_setup.rs
  • sdk-libs/program-test/src/program_test/light_program_test.rs

Comment on lines +341 to +347
valid
.into_iter()
.collect::<std::result::Result<Vec<_>, _>>()
.map_err(|_| ConfigError::InvalidArguments {
field: "tree_ids",
invalid_values: vec!["failed to parse tree_ids".to_string()],
})?

🧹 Nitpick | 🔵 Trivial

This fallback parse error is unreachable after the partition() above.

valid can only contain Ok(Pubkey) values once the invalid branch has returned, so the generic "failed to parse tree_ids" path never executes. I'd either unwrap the partitioned values directly or collapse the parse into a single pass so the error surface stays aligned with the real invalid inputs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/config.rs` around lines 341 - 347, The current use of
partition() makes the Err branch already returned, so the map_err fallback on
valid.into_iter().collect(...) is unreachable and gives a misleading generic
error; change the logic in the tree ID parsing flow (the variables/blocks using
partition(), valid, invalid and the ConfigError::InvalidArguments with field
"tree_ids") to either directly unwrap/collect the already-validated `valid`
Vec<Pubkey> (since `invalid` was handled) or perform a single-pass parse that
collects invalid inputs and returns ConfigError::InvalidArguments with the
actual invalid_values; update the code around the partition() usage to return
the real invalid strings instead of the generic "failed to parse tree_ids".
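The single-pass shape the comment suggests could look like the sketch below. This is a simplified stand-in: `parse_id` substitutes for `Pubkey::from_str`, and the error side returns the actual invalid strings that the caller would wrap in `ConfigError::InvalidArguments`.

```rust
// Sketch: one pass over the raw inputs, collecting both the parsed values
// and the actual invalid strings, so no unreachable fallback error exists.
// `parse_id` is a hypothetical stand-in for `Pubkey::from_str`.
fn parse_id(s: &str) -> Result<u64, String> {
    s.parse::<u64>().map_err(|_| s.to_string())
}

fn parse_tree_ids(raw: &[&str]) -> Result<Vec<u64>, Vec<String>> {
    let mut valid = Vec::new();
    let mut invalid = Vec::new();
    for s in raw {
        match parse_id(s) {
            Ok(id) => valid.push(id),
            Err(bad) => invalid.push(bad),
        }
    }
    if invalid.is_empty() {
        Ok(valid)
    } else {
        // The caller reports these as ConfigError::InvalidArguments.
        Err(invalid)
    }
}

fn main() {
    assert_eq!(parse_tree_ids(&["1", "2"]), Ok(vec![1, 2]));
    assert_eq!(parse_tree_ids(&["1", "x"]), Err(vec!["x".to_string()]));
}
```

Because the invalid strings are carried through, the error surface always names the real offending inputs rather than a generic message.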

Comment on lines 495 to +524
let available = self.wait_for_batch(end);
if start >= available {
return None;
return Ok(None);
}
let actual_end = end.min(available);
let data = lock_recover(&self.data, "streaming_address_queue.data");
Some(BatchDataSlice {
addresses: data.addresses[start..actual_end].to_vec(),

let addresses = data.addresses[start..actual_end].to_vec();
if addresses.is_empty() {
return Err(anyhow!("Empty batch at start={}", start));
}

let leaves_hashchain = match data.leaves_hash_chains.get(hashchain_idx).copied() {
Some(hashchain) => hashchain,
None => {
tracing::debug!(
"Missing leaves_hash_chain for batch {} (available: {}), deriving from addresses",
hashchain_idx,
data.leaves_hash_chains.len()
);
create_hash_chain_from_slice(&addresses).map_err(|error| {
anyhow!(
"Failed to derive leaves_hash_chain for batch {} from {} addresses: {}",
hashchain_idx,
addresses.len(),
error
)
})?
}
};

⚠️ Potential issue | 🟠 Major

get_batch_snapshot should not return a partial batch as a successful snapshot.

When wait_for_batch(end) returns with available < end, this code truncates to actual_end and still returns Some(...). Downstream callers treat that as a real ZKP batch and can derive a hashchain/proof inputs for an undersized batch instead of waiting or cleanly skipping it.

Suggested guard
-        if start >= available {
+        if available < end || start >= available {
             return Ok(None);
         }
-        let actual_end = end.min(available);
+        let actual_end = end;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v2/helpers.rs` around lines 495 - 524, The function
currently treats truncated batches as valid by using actual_end =
end.min(available) and proceeding; change the guard so that if
wait_for_batch(end) returns available < end the function returns Ok(None) (i.e.,
do not proceed with partial batches). Specifically, in get_batch_snapshot
replace the current if start >= available { return Ok(None); } / actual_end
logic with a check like if available < end { return Ok(None); } (still keep the
start >= available check as needed), so code paths that call lock_recover, read
data.addresses[start..actual_end], or derive leaves_hash_chain
(data.leaves_hash_chains, create_hash_chain_from_slice) never run for undersized
batches.
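The guard logic can be isolated into a small pure function, which makes the partial-batch case easy to test. This is a sketch under the assumption that only the range arithmetic matters; the real code also locks and slices the data.

```rust
// Sketch of the suggested guard: a snapshot is only produced when the full
// requested range [start, end) is available. Partial batches return None so
// callers wait or skip instead of proving an undersized batch.
fn snapshot_range(start: usize, end: usize, available: usize) -> Option<(usize, usize)> {
    if available < end || start >= available {
        return None;
    }
    Some((start, end))
}

fn main() {
    assert_eq!(snapshot_range(0, 10, 10), Some((0, 10)));
    // Previously this would have truncated to (0, 7) and proceeded.
    assert_eq!(snapshot_range(0, 10, 7), None);
    assert_eq!(snapshot_range(10, 20, 10), None);
}
```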

Comment on lines +197 to +214
let inputs_json = match job.inputs.to_json(&job.tree_id, job.seq) {
Ok(inputs_json) => inputs_json,
Err(error) => {
let _ = job
.result_tx
.send(ProofJobResult {
seq: job.seq,
result: Err(format!("Failed to serialize proof inputs: {}", error)),
old_root: [0u8; 32],
new_root: [0u8; 32],
proof_duration_ms: 0,
round_trip_ms: 0,
submitted_at: std::time::Instant::now(),
})
.await;
return;
}
};

⚠️ Potential issue | 🟡 Minor

Use one timestamp/logging path for serialization failures.

Line 209 stamps submitted_at with Instant::now(), while Line 306 reuses round_trip_start for the same “serialization failed before submit” outcome. That makes latency metrics path-dependent, and these branches also go silent if result_tx is closed. Factor this into one helper with a single timestamp convention and an error! before the send.

Suggested direction
+async fn send_serialization_failure(
+    result_tx: &mpsc::Sender<ProofJobResult>,
+    seq: u64,
+    error: &ProverClientError,
+    submitted_at: std::time::Instant,
+    round_trip_ms: u64,
+) {
+    error!("Failed to serialize proof inputs for seq={}: {}", seq, error);
+    let _ = result_tx
+        .send(ProofJobResult {
+            seq,
+            result: Err(format!("Failed to serialize proof inputs: {}", error)),
+            old_root: [0u8; 32],
+            new_root: [0u8; 32],
+            proof_duration_ms: 0,
+            round_trip_ms,
+            submitted_at,
+        })
+        .await;
+}

Also applies to: 296-311

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/src/processor/v2/proof_worker.rs` around lines 197 - 214, The
serialization-failure path uses two different timestamps (submitted_at =
Instant::now() here and round_trip_start elsewhere) and silently ignores closed
result_tx; consolidate this by adding a helper (e.g., send_error_result or
report_serialization_failure) used by both the job.inputs.to_json error branch
and the other serialization-before-submit branch that: captures one
Instant::now() into a single variable (round_trip_start or submitted_at) at the
failure site, logs the error with error! including the error details, constructs
the ProofJobResult with that single timestamp and consistent
round_trip_ms/proof_duration_ms values, and attempts to send via job.result_tx
while handling a closed channel by logging the send failure; update both
locations (the error branch around job.inputs.to_json and the other branch that
currently reuses round_trip_start) to call this helper so latency metrics and
logging are consistent.

Comment on lines +332 to +346
let service_handle = tokio::task::spawn_blocking(move || {
let runtime = tokio::runtime::Builder::new_multi_thread()
.worker_threads(2)
.enable_all()
.build()?;
runtime.block_on(run_pipeline::<LightClient>(
config,
None,
None,
shutdown_receiver,
Some(shutdown_compressible_receiver),
None, // shutdown_bootstrap
work_report_sender,
))
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the task result is currently ignored and where it happens.
rg -n "spawn_blocking|timeout\\(Duration::from_secs\\(5\\), service_handle\\)\\.await" forester/tests/test_batch_append_spent.rs

Repository: Lightprotocol/light-protocol

Length of output: 209


🏁 Script executed:

sed -n '330,355p' forester/tests/test_batch_append_spent.rs | cat -n



🏁 Script executed:

rg -B2 "fn run_pipeline" forester/src/lib.rs | head -20



🏁 Script executed:

rg -A8 "pub async fn run_pipeline<R: Rpc" forester/src/lib.rs | head -30



Surface pipeline task failures instead of dropping them.

This closure can fail at three levels—runtime construction (line 334-337), run_pipeline execution, and task joining—but the caller discards the join result with let _ = ... at line 352. This silently hides startup and runtime failures, producing false-positive test passes.

The timeout wrapping a JoinHandle<Result<()>> creates nested error layers that all need explicit handling. Currently all are ignored.

💡 Suggested fix
-    let _ = timeout(Duration::from_secs(5), service_handle).await;
+    match timeout(Duration::from_secs(5), service_handle).await {
+        Ok(Ok(Ok(()))) => {}
+        Ok(Ok(Err(e))) => panic!("forester pipeline returned error: {e:#}"),
+        Ok(Err(e)) => panic!("forester task join error: {e}"),
+        Err(_) => panic!("timed out waiting for forester shutdown"),
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/tests/test_batch_append_spent.rs` around lines 332 - 346, The
spawned closure passed to tokio::task::spawn_blocking (bound to service_handle)
can fail at runtime construction (tokio::runtime::Builder::build), during
runtime.block_on(run_pipeline::<LightClient>(...)), and when joining the spawned
task; currently all results are ignored—update the test to explicitly handle
each error: unwrap or propagate the runtime build Result (ensure the build error
is surfaced), propagate/unwrap the Result returned by
run_pipeline::<LightClient> so pipeline failures fail the test, and await/join
the JoinHandle and check its JoinError (or unwrap) instead of discarding with
let _ = ..., failing the test on panic or join failure; reference
service_handle, runtime.block_on, run_pipeline::<LightClient>,
shutdown_receiver, shutdown_compressible_receiver, and work_report_sender when
making these changes.
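The three nested error layers the comment describes can be modeled with plain std types: the timeout layer, the join layer, and the pipeline's own `Result`. The sketch below uses stand-in error types (strings for the pipeline/join errors, unit for the timeout) just to show the exhaustive match shape.

```rust
// Stand-ins for the real error types: run_pipeline's anyhow::Error,
// tokio's JoinError, and tokio::time::error::Elapsed.
#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,
    Pipeline(String),
    Join(String),
    Timeout,
}

type Pipeline = Result<(), String>;     // what run_pipeline returns
type Joined = Result<Pipeline, String>; // JoinHandle layer
type Timed = Result<Joined, ()>;        // timeout layer

fn classify(t: Timed) -> Outcome {
    match t {
        Ok(Ok(Ok(()))) => Outcome::Ok,
        Ok(Ok(Err(e))) => Outcome::Pipeline(e),
        Ok(Err(e)) => Outcome::Join(e),
        Err(()) => Outcome::Timeout,
    }
}

fn main() {
    assert_eq!(classify(Ok(Ok(Ok(())))), Outcome::Ok);
    assert_eq!(
        classify(Ok(Ok(Err("pipeline failed".into())))),
        Outcome::Pipeline("pipeline failed".into())
    );
    assert_eq!(classify(Err(())), Outcome::Timeout);
}
```

In the test, each non-`Ok` arm would panic with a distinct message, so a startup failure, a join failure, and a shutdown timeout are all distinguishable instead of being discarded.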

Comment on lines +312 to +313
.build()
.unwrap();

🧹 Nitpick | 🔵 Trivial

Prefer expect(...) over bare unwrap() in these setup builders.

These are high-churn setup steps, and the rest of the file already uses descriptive expects. Keeping the message here makes it much faster to tell whether InitializeRentFreeConfig::build() failed or the later transaction path broke.

Minimal tweak
-    .build()
-    .unwrap();
+    .build()
+    .expect("InitializeRentFreeConfig::build should succeed");

Also applies to: 508-509, 750-751

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@forester/tests/test_compressible_pda.rs` around lines 312 - 313, Replace the
bare .unwrap() calls on builder results with .expect(...) that includes a
descriptive message; specifically change the .build().unwrap() invocations that
call InitializeRentFreeConfig::build() (and the two other similar setup builder
calls in this test) to .build().expect("InitializeRentFreeConfig::build()
failed") or an equivalent message that names the failing builder so failures in
setup are clear. Ensure each replaced call uses a distinct message if helpful to
identify which builder failed.

Comment on lines +35 to +70
if health_check(10, 1).await {
return Ok(());
}

let _ = command.wait_with_output();
let loading_guard = IS_LOADING
.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
.is_ok();

let health_result = health_check(120, 1).await;
if health_result {
info!("Prover started successfully");
} else {
panic!("Failed to start prover, health check failed.");
}
if !loading_guard {
return if health_check(120, 1).await {
Ok(())
} else {
Err(ProverClientError::HealthCheckFailed)
};
}

let spawn_result = async {
let command = Command::new(&prover_path)
.arg("start-prover")
.spawn()
.map_err(|error| ProverClientError::ProcessStart(error.to_string()))?;

command
.wait_with_output()
.map_err(|error| ProverClientError::ProcessWait(error.to_string()))?;

if health_check(120, 1).await {
info!("Prover started successfully");
Ok(())
} else {
Err(ProverClientError::HealthCheckFailed)
}
} else {
panic!("Failed to find project root.");
};
}
.await;

IS_LOADING.store(false, Ordering::Release);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== spawn_prover / health_check blocking calls =="
rg -n --glob '*.rs' 'spawn_prover\s*\(|async fn health_check|wait_with_output|thread::sleep|std::thread::sleep'

echo
echo "== start-prover command implementation =="
rg -n --glob '*.rs' 'start-prover|start_prover'



🏁 Script executed:

cat -n prover/client/src/prover.rs | head -100



🏁 Script executed:

find . -name "*.rs" -type f | xargs grep -l "start-prover" | grep -v test | grep -v target



🏁 Script executed:

find . -type f -name "*.rs" | xargs grep -l "start_prover\|start-prover" | head -20



🏁 Script executed:

find . -name "cli" -type d | head -5



🏁 Script executed:

find ./cli -type f -name "*.rs" | head -20



🏁 Script executed:

ls -la ./cli/



🏁 Script executed:

ls -la ./cli/test_bin/



🏁 Script executed:

cat ./cli/test_bin/run



🏁 Script executed:

find ./cli/src -type f \( -name "*.ts" -o -name "*.js" \) | xargs grep -l "start-prover\|start_prover"



🏁 Script executed:

find ./cli/src -type f -name "*.ts" | head -20



🏁 Script executed:

cat ./cli/src/commands/start-prover/index.ts



🏁 Script executed:

cat ./cli/src/utils/processProverServer.ts



The async startup path blocks the Tokio executor with synchronous I/O; this will deadlock if the prover daemonizes.

The health_check() function blocks on std::thread::sleep() (line 89) within an async context, and more critically, wait_with_output() (line 58) blocks the entire async block waiting for the spawned process to exit. If start-prover is designed to daemonize (as the TypeScript CLI suggests), this call never returns, IS_LOADING never clears, and subsequent calls retry for 120 seconds each while blocking a Tokio worker thread.

Three specific problems:

  1. Blocking sleep in async: Each retry loop in health_check() sleeps synchronously, wasting a worker thread
  2. Child process wait blocks executor: wait_with_output() blocks indefinitely if the prover detaches
  3. State machine wedge: If the child blocks indefinitely, line 70 never executes, leaving IS_LOADING=true permanently; any concurrent caller then runs health_check(120, 1), sleeping up to 120 seconds in the executor

Replace std::thread::sleep with tokio::time::sleep, and either spawn the prover as a detached child (if it daemonizes) or refactor to poll readiness without blocking the executor. Note: the code passes start-prover as the subcommand, but the TypeScript CLI expects start—verify which is correct.

Comment on lines 214 to 223
if let Some(next) = account.tree_info.next_tree_info {
// SAFETY: account will always have a state Merkle tree context.
// pack_output_tree_index only panics on an address Merkle tree context.
let index = next.pack_output_tree_index(packed_accounts).unwrap();
let index = next.pack_output_tree_index(packed_accounts)?;
if output_tree_index.is_none() {
output_tree_index = Some(index);
}
} else {
// SAFETY: account will always have a state Merkle tree context.
// pack_output_tree_index only panics on an address Merkle tree context.
let index = account
.tree_info
.pack_output_tree_index(packed_accounts)
.unwrap();
let index = account.tree_info.pack_output_tree_index(packed_accounts)?;
if output_tree_index.is_none() {
output_tree_index = Some(index);
}

⚠️ Potential issue | 🟠 Major

Validate that every state input resolves to the same output tree.

PackedStateTreeInfos only carries one output_tree_index, but this loop keeps the first computed index and silently ignores different indices from later accounts. A proof that mixes accounts whose current/next output trees diverge will therefore pack the wrong output tree without error.

Also applies to: 243-248

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/indexer/types/proof.rs` around lines 214 - 223, The loop
that computes output tree indices (using account.tree_info.next_tree_info and
account.tree_info.pack_output_tree_index on each Account) currently keeps the
first index in output_tree_index and ignores mismatches; update
PackedStateTreeInfos construction to validate that every call to
pack_output_tree_index returns the same index and return an error if any
subsequent index differs. Specifically, in the code paths that set
output_tree_index (the blocks referencing output_tree_index,
account.tree_info.next_tree_info, and pack_output_tree_index), compare the newly
computed index to the previously stored Some(index) and propagate a descriptive
Err when they diverge instead of silently ignoring the difference; apply the
same validation to the other occurrence noted (around lines 243-248).
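The validation the comment asks for is a simple fold over the per-account indices that errors on the first divergence. A minimal sketch, assuming the indices have already been computed by `pack_output_tree_index` (here modeled as plain `u8` values):

```rust
// Sketch: resolve a single output tree index across all state inputs,
// returning an error instead of silently keeping the first index.
fn resolve_output_tree(indices: &[u8]) -> Result<Option<u8>, String> {
    let mut resolved: Option<u8> = None;
    for (i, &idx) in indices.iter().enumerate() {
        match resolved {
            None => resolved = Some(idx),
            Some(prev) if prev != idx => {
                return Err(format!(
                    "account {} resolves to output tree index {}, expected {}",
                    i, idx, prev
                ));
            }
            Some(_) => {}
        }
    }
    Ok(resolved)
}

fn main() {
    assert_eq!(resolve_output_tree(&[3, 3, 3]), Ok(Some(3)));
    assert!(resolve_output_tree(&[3, 4]).is_err()); // divergent trees now fail loudly
    assert_eq!(resolve_output_tree(&[]), Ok(None));
}
```

With this shape, a proof mixing accounts whose current/next trees diverge produces a descriptive error rather than packing the wrong output tree.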

Comment on lines +120 to +124
let pda_groups = group_pda_specs(&cold_pdas, MAX_PDAS_PER_IX);
let pda_hashes = pda_groups
.iter()
.map(|group| collect_pda_hashes(group))
.collect::<Result<Vec<_>, _>>()?;

⚠️ Potential issue | 🟡 Minor

Preserve the original PDA index in MissingPdaCompressed.

After group_pda_specs, each collect_pda_hashes(group) starts enumerating from zero again, so the 9th/17th/etc. cold PDA now reports index 0 instead of its position in cold_pdas. Carry the flattened index through grouping so this error stays actionable once there is more than one PDA batch.

Also applies to: 164-175

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/interface/load_accounts.rs` around lines 120 - 124, The
missing-PDA error loses the original position because collect_pda_hashes
restarts enumeration for each group; update the flow so the original flat index
from cold_pdas is preserved: modify group_pda_specs/collect_pda_hashes usage so
each group carries a base_index (or have collect_pda_hashes return (hash,
original_index) pairs) and use that original index when constructing
MissingPdaCompressed; apply the same change to the other occurrence referenced
(lines ~164-175) so every MissingPdaCompressed contains the flattened index from
cold_pdas rather than the per-group 0-based index.
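One way to carry the flat index through grouping is to enumerate before chunking, so each group element keeps its original position. A minimal sketch (the `group_with_indices` helper is hypothetical, standing in for `group_pda_specs`):

```rust
// Sketch: enumerate first, then chunk, so every element carries its
// original flat index into its group.
fn group_with_indices<T: Clone>(items: &[T], per_group: usize) -> Vec<Vec<(usize, T)>> {
    items
        .iter()
        .cloned()
        .enumerate()
        .collect::<Vec<_>>()
        .chunks(per_group)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    let pdas = ["a", "b", "c", "d", "e"];
    let groups = group_with_indices(&pdas, 2);
    // The 5th item keeps flat index 4 even though it is first in group 2.
    assert_eq!(groups[2][0], (4, "e"));
}
```

An error constructed from `groups[2][0].0` would then report index 4, the position in the original `cold_pdas`, instead of a per-group 0.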

Comment on lines +69 to +115
let path = "cli/test_bin/run test-validator";
let mut path = format!("{}/{}", project_root.trim(), path);
if !config.enable_indexer {
path.push_str(" --skip-indexer");
}

for sbf_program in config.sbf_programs.iter() {
path.push_str(&format!(
" --sbf-program {} {}",
sbf_program.0, sbf_program.1
));
}
if let Some(limit_ledger_size) = config.limit_ledger_size {
path.push_str(&format!(" --limit-ledger-size {}", limit_ledger_size));
}

for upgradeable_program in config.upgradeable_programs.iter() {
path.push_str(&format!(
" --upgradeable-program {} {} {}",
upgradeable_program.program_id,
upgradeable_program.program_path,
upgradeable_program.upgrade_authority
));
}

if !config.enable_prover {
path.push_str(" --skip-prover");
}

if config.use_surfpool {
path.push_str(" --use-surfpool");
}

for arg in config.validator_args.iter() {
path.push_str(&format!(" {}", arg));
}

println!("Starting validator with command: {}", path);

if config.use_surfpool {
// The CLI starts surfpool, prover, and photon, then exits once all
// services are ready. Wait for it to finish so we know everything
// is up before the test proceeds.
let mut child = Command::new("sh")
.arg("-c")
.arg(path)
.stdin(Stdio::null())
.stdout(Stdio::inherit())
.stderr(Stdio::inherit())
.spawn()
.expect("Failed to start server process");
let status = child.wait().expect("Failed to wait for CLI process");
assert!(status.success(), "CLI exited with error: {}", status);
} else {
let child = Command::new("sh")
.arg("-c")
.arg(path)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn()
.expect("Failed to start server process");
std::mem::drop(child);
tokio::time::sleep(tokio::time::Duration::from_secs(config.wait_time)).await;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n sdk-libs/client/src/local_test_validator.rs | head -150



🏁 Script executed:

rg "validator_args" sdk-libs/client/src/ -A 2 -B 2



🏁 Script executed:

rg "LightValidatorConfig" sdk-libs/client/src/ -A 5 -B 2



Build and pass validator command as structured argv, not a concatenated shell string.

Lines 69–105 concatenate project_root, program paths, and validator_args into a single shell string. This breaks when paths or arguments contain spaces or shell metacharacters—for example, a project root like /home/user/light protocol/... or a validator arg like --account ADDRESS /path/with spaces/file.json. The documented example for validator_args (line 42) explicitly shows file paths, confirming this is a real use case.

Instead of passing the full command to sh -c, invoke cli/test_bin/run directly as the executable and build the argument list as separate items: Command::new("cli/test_bin/run").arg("test-validator").arg("--skip-indexer")... This avoids shell tokenization entirely.

Additionally, the non-surfpool branch (lines 127–136) drops the child process and sleeps unconditionally, so startup failures are hidden. The surfpool branch (lines 109–126) correctly waits for the process and checks the exit status; apply the same pattern to the non-surfpool path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 69 - 115, The
current code concatenates command pieces into a single shell string and spawns
it via Command::new("sh").arg("-c").arg(path), which breaks on
spaces/metacharacters (see path construction and uses of config.enable_indexer,
config.limit_ledger_size, config.sbf_programs, config.upgradeable_programs,
config.validator_args and config.use_surfpool). Change to build a Vec<String>
(or Vec<&str>) of argv: start with the executable built from project_root +
"/cli/test_bin/run", then push "test-validator" and each flag/value as separate
.arg(...) entries (one per sbf_program pair and per upgradeable_program fields,
and each validator_arg as its own arg) and call
Command::new(executable).args(&args).spawn() instead of using "sh -c". Also make
the non-surfpool branch behave like the surfpool branch: keep the child handle,
call child.wait(), check the exit Status and propagate or log failures instead
of dropping the child and sleeping unconditionally.
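Building the argument list as discrete items makes the space-in-path case a non-issue. The sketch below uses a simplified stand-in `Cfg` struct whose fields mirror the config described above; in the real code the resulting vector would be passed via `Command::new(executable).args(&argv)` instead of `sh -c`.

```rust
// Simplified stand-in for LightValidatorConfig; only the fields needed to
// illustrate structured argv construction.
struct Cfg {
    enable_indexer: bool,
    sbf_programs: Vec<(String, String)>,
    validator_args: Vec<String>,
}

fn build_argv(cfg: &Cfg) -> Vec<String> {
    let mut args = vec!["test-validator".to_string()];
    if !cfg.enable_indexer {
        args.push("--skip-indexer".into());
    }
    for (id, path) in &cfg.sbf_programs {
        args.push("--sbf-program".into());
        args.push(id.clone());
        // A path with spaces stays a single argument: no shell tokenization.
        args.push(path.clone());
    }
    args.extend(cfg.validator_args.iter().cloned());
    args
}

fn main() {
    let cfg = Cfg {
        enable_indexer: false,
        sbf_programs: vec![("Prog111".into(), "/path/with spaces/prog.so".into())],
        validator_args: vec![],
    };
    let argv = build_argv(&cfg);
    assert_eq!(argv[4], "/path/with spaces/prog.so");
}
```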

Comment on lines +127 to +136
} else {
let child = Command::new("sh")
.arg("-c")
.arg(path)
.stdin(Stdio::null())
.stdout(Stdio::null())
.stderr(Stdio::null())
.spawn()?;
std::mem::drop(child);
tokio::time::sleep(tokio::time::Duration::from_secs(config.wait_time)).await;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd sdk-libs/client && head -200 src/local_test_validator.rs | tail -100



🏁 Script executed:

cd sdk-libs/client && cat -n src/local_test_validator.rs | sed -n '100,160p'



🏁 Script executed:

cd sdk-libs/client && cat -n src/local_test_validator.rs | sed -n '1,50p'



Keep the child handle through the delay and check for process failures in the non-surfpool branch.

The non-surfpool branch spawns the validator process but immediately drops the handle without checking if it succeeds. If the process dies during wait_time due to a bad flag, missing binary, occupied port, or other issue, callers still receive Ok(()). The surfpool branch (lines 109-126) shows the correct pattern: hold the child handle, call wait(), and check the exit status. Apply the same approach here: keep the child handle through the sleep, then call try_wait() afterward and surface any non-zero exit status as an io::Error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk-libs/client/src/local_test_validator.rs` around lines 127 - 136, The
non-surfpool branch currently spawns the validator with
Command::new(...).spawn() and immediately drops the Child, so failures during
the sleep are ignored; change this to keep the Child (e.g., let mut child =
Command::new(...).spawn()?), await the
tokio::time::sleep(Duration::from_secs(config.wait_time)).await while retaining
the child, then call child.try_wait() (or child.wait() if blocking is
acceptable) and inspect the ExitStatus; if the child has exited with a
non-success status return an io::Error (or map the ExitStatus to Err) so callers
receive an error instead of Ok(()). Ensure you mirror the surfpool branch
pattern (spawn -> keep handle -> wait/try_wait -> check status) and propagate
any non-zero exit as an io::Error.
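The spawn → keep handle → sleep → `try_wait` pattern can be sketched with a blocking std-only version (the real code would await `tokio::time::sleep` instead of `thread::sleep`). The helper name `spawn_and_check` is hypothetical; `sh -c` is used here only to simulate a process that exits early.

```rust
use std::process::{Command, Stdio};
use std::{thread, time::Duration};

// Sketch: keep the child handle through the wait window, then surface an
// early non-zero exit as an io::Error instead of returning Ok(()).
fn spawn_and_check(cmd: &str, wait: Duration) -> std::io::Result<()> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    thread::sleep(wait); // real code: tokio::time::sleep(...).await
    // try_wait is non-blocking: Some(status) only if the child has exited.
    if let Some(status) = child.try_wait()? {
        if !status.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                format!("validator exited early: {}", status),
            ));
        }
    }
    Ok(())
}

fn main() {
    // A command that dies immediately is now reported as an error...
    assert!(spawn_and_check("exit 3", Duration::from_millis(100)).is_err());
    // ...while a still-running process is treated as healthy.
    assert!(spawn_and_check("sleep 2", Duration::from_millis(100)).is_ok());
}
```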

@sergeytimoshin sergeytimoshin force-pushed the sergey/forester-concurrency branch from fbf48e5 to 46b80e9 Compare March 14, 2026 21:34
@sergeytimoshin sergeytimoshin force-pushed the sergey/forester-concurrency branch from 46b80e9 to 786422b Compare March 15, 2026 00:37