Add context files for Claude Code covering build, test, lint, architecture, code conventions, and PR requirements. Subdirectory files cover arbos storage model, precompile three-layer pattern, and system test NodeBuilder usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WaitForTx previously relied on bind.WaitMined which polls for transaction receipts every 1 second. For setup.DeployRollup, which sends 5 sequential transactions, this added ~5 seconds of pure polling latency to every test that deploys a rollup. When the backend supports SubscribeNewHead (ethclient.Client, protocol.ChainBackend), use head subscriptions for near-instant notification instead. Falls back to bind.WaitMined for backends without subscription support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test used a hardcoded sleep of MaxEmptyBatchDelay + 15s, which was insufficient because the simulated beacon's block timestamps race ahead of wall clock time (each block gets lastBlockTime+1 when mined faster than 1/second). After ~27 blocks during setup, L1 timestamps are ~27s ahead, but the batch poster compares against time.Now(). Replace the hardcoded sleep with a dynamic calculation based on the actual L1 head timestamp offset, and use the lighter legacy deployment path to reduce setup blocks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 1-second sleep ran 15 times per test iteration, adding ~15 seconds of pure wait time. The comment claimed gas estimation could underestimate if done in the same block, but gas estimation runs in an isolated EVM context and the simulated beacon already guarantees distinct timestamps per block. The SendWaitTestTransactions call that follows is sufficient to produce a new L1 block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split DeployFullRollupStack and DeployLegacyOnParentChain into separate DeployCreator/DeployRollup steps so the expensive creator deployment (~20 contracts: BridgeCreator, OSPs, EdgeChallengeManager, RollupCreator, etc.) can be cached across tests while still running CreateRollup at test time to produce the L1 events the inbox reader needs. The cache deploys creator contracts on a temporary L1 chain once per test run, dumps the state via RawDump, and injects contract accounts into subsequent L1 chain genesis allocs. A unique deployer key avoids CREATE address collisions with RollupOwner. In-process deduplication uses sync.Map + sync.Once. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract shared deploy cache infrastructure from system_tests/ into util/testhelpers/deploycache/ for reuse by bold tests. When mock OSP is enabled, ChainsWithEdgeChallengeManager now uses a cached creator deployment (initialized eagerly from TestMain) and calls DeployRollup instead of DeployFullRollupStack, skipping ~12 expensive creator contract deploys. Batch UpgradeExecutor transactions in DeployRollup into a single block to reduce simulated time drift on test backends. Replace time.Sleep + Commit patterns in TestPostAssertion with require.Eventually polling that commits blocks on each iteration. TestPostAssertion drops from ~30s to ~7s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The deploy cache optimization hardcoded boldCreatorCache.creator and legacyCreatorCache.creator in deployOnParentChain, but those addresses only exist on L1 (via genesis alloc). When BuildL3OnL2 called deployOnParentChain with L2 as the parent chain, it tried to bind to creator contracts that don't exist on L2, producing "no contract code at given address". Add optional BoldCreator/LegacyCreator fields to DeployConfig so BuildL1 can pass cached addresses while BuildL3OnL2 leaves them nil, falling back to DeployFullRollupStack/DeployLegacyOnParentChain which deploy the creator from scratch on the parent chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test was producing hundreds of tiny batches that never hit the calldata size limit, passing without actually validating resizing. Block the batch poster with a custom error during tx submission so messages accumulate in the streamer, then trigger fallback with the full backlog available. Replace the heavyweight two-phase approach (time.Sleep, manual L1 block loops, Phase 1 CustomDA verification) with checkBatchPosting + WaitForTx. Add per-batch assertions that every non-final batch is near-full, catching the prior failure mode where message arrival timing rather than the size constraint determined batch boundaries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test claimed to verify that ErrMessageTooLarge triggers in-place batch resizing, but SetMaxMessageSize changes both Store() rejection and GetMaxMessageSize() simultaneously. The batch poster calls GetMaxMessageSize() before Store(), so it picked up the new limit before ever building an oversized batch — never triggering ErrMessageTooLarge. Add SetStoreRejectSize to controllableWriter: Store() rejects messages exceeding this size and atomically sets overrideMaxSize, simulating a DA provider that reports its new smaller limit only after rejection. This creates the necessary gap where GetMaxMessageSize() still returns 10KB while Store() rejects at 5KB. Also remove MaxDelay=60s override (use TestBatchPosterConfig default of 0) and adopt the block-accumulate-release pattern from 42efbf4, reducing test runtime from 2+ minutes to ~3 seconds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace bind.WaitMined (1s polling) with ethutil.WaitForTx (200ms polling) across 11 setup calls and the auctioneer's resolveAuction. Reduce auction round duration from 12s to 3s, express lane advantage from 5s to 500ms, and background chain-progression ticker from 750ms to 200ms. Relax roundtiminginfo validation minimums to allow shorter test rounds. Restructure TestTimeboostExpressLaneTransactionHandling to verify bad-nonce tx rejection via missing receipt instead of waiting for a 5s EnsureTxSucceeded timeout, and front-load express lane client setup before bid placement to maximize time within the active round. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move waitForL1DelayBlocks out of the insertRetriables loop so it runs once after all retryables are submitted instead of 50 times. Use WaitForTx instead of EnsureTxSucceeded for L1 receipts to skip the safe-block polling overhead. Add KeepL1Advancing helper to common_test.go that produces background L1 blocks so the header reader and delayed sequencer keep advancing while the test waits for L2 receipts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 65-second time.Sleep with a synchronous Freeze() call on the underlying freezerdb. The freezerdb has a trigger channel designed for test determinism, but it's hidden behind wrapper types (dbWithWasmEntry, closeTrackingDB) that don't forward the method. Use reflect to unwrap through the embedded Database fields to reach it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
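The reflective unwrapping can be sketched as below. This is a simplified model — `Database` stands in for `ethdb.Database`, and the sketch assumes the wrappers embed it under the exported field name `Database`; unexported fields in the real types would need extra care:

```go
package main

import (
	"fmt"
	"reflect"
)

// Database is a stand-in for ethdb.Database; the wrappers below mimic
// dbWithWasmEntry / closeTrackingDB embedding it without forwarding Freeze.
type Database interface{ Get(key string) string }

type freezerTrigger interface{ Freeze() }

type freezerDB struct{ frozen bool }

func (f *freezerDB) Get(string) string { return "" }
func (f *freezerDB) Freeze()           { f.frozen = true }

type closeTrackingDB struct{ Database }
type dbWithWasmEntry struct{ Database }

// unwrapFreezer walks embedded Database fields by reflection until it
// reaches a value that actually exposes Freeze.
func unwrapFreezer(db Database) (freezerTrigger, bool) {
	for db != nil {
		if f, ok := db.(freezerTrigger); ok {
			return f, true
		}
		v := reflect.ValueOf(db)
		if v.Kind() == reflect.Pointer {
			v = v.Elem()
		}
		if v.Kind() != reflect.Struct {
			return nil, false
		}
		field := v.FieldByName("Database")
		if !field.IsValid() || field.IsNil() {
			return nil, false
		}
		db = field.Interface().(Database)
	}
	return nil, false
}

func main() {
	inner := &freezerDB{}
	wrapped := &closeTrackingDB{Database: &dbWithWasmEntry{Database: inner}}
	if f, ok := unwrapFreezer(wrapped); ok {
		f.Freeze() // synchronous: replaces the 65-second sleep
	}
	fmt.Println(inner.frozen) // true
}
```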
Replace sleep-based polling loops (500ms × 91 iterations) in testBatchPosterParallel and TestRedisBatchPosterHandoff with KeepL1Advancing (100ms background L1 blocks) and WaitForTx (header-subscription-based waiting). This speeds up TestBatchPosterParallel, TestRedisBatchPosterParallel, and TestRedisBatchPosterHandoff from ~50-64s to ~3s each. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add KeepL1Advancing to prevent L1 stalls under parallel test contention, and validate only the receipt block instead of all blocks from 1. The JIT override in validateBlocks/validateBlockRange forced validation of every block, causing 5 sequential WASM replays at 5-10s each under contention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace synchronous L1 transaction burst loops (30-100 iterations of SendWaitTestTransactions) and waitForL1DelayBlocks helper with background KeepL1Advancing + subscription-based waiting. This sends L1 transactions at 100ms intervals only as long as needed, instead of bursting unnecessary transactions upfront. Files changed: retryable_test.go (13 call sites + deleted waitForL1DelayBlocks), twonodes_test.go, seqcompensation_test.go, block_validator_test.go, delayedinboxlong_test.go, debugapi_test.go, multi_constraint_pricer_test.go. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the hard-coded 20s sleep with a polling loop that checks for finalized/safe block propagation every 100ms. Add KeepL1Advancing so the simulated L1 reaches the epoch boundary (every 32 blocks) promptly. Reduce L2 block generation from 100 to 20 since the test only needs finality data to exist, not a large chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 450 sequential L1 transactions (15 loops x 30 txs) with KeepL1Advancing running in the background. This eliminates the main source of contention-sensitive overhead that caused the test to balloon from ~5s to ~43s when run alongside other tests. Also wait for the last direct L2 transfer on node B to ensure all batches are synced before checking balances. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests Extract keepL1Advancing(ctx, l1info, l1client) as a lower-level helper so Send*ViaL1 functions can use background L1 advancement without a *NodeBuilder. Convert checkBatchPosting, SendSignedTxesInBatchViaL1, SendSignedTxViaL1, SendUnsignedTxViaL1, and tests in eth_sync, inbox_blob_failure, multi_writer_fallback, delayed_message_filter, and validation_inputs_at from synchronous AdvanceL1 + time.Sleep to KeepL1Advancing + condition polling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- TestBatchPosterWithDelayProofsAndBacklog: Use KeepL1Advancing instead of relying on synchronous L1 block production so the batch poster has time to detect delay buffer threshold crossings asynchronously. - TestMultiWriterFallback_CustomDAToCalldataWithBatchResizing and TestBatchResizingWithoutFallback_MessageTooLarge: Flush pending batch poster L1 txs with AdvanceL1 after SetCustomError to prevent stale batches from contaminating the measurement range. - TestProgramCacheManager: Increase gas limit for WASM activation which requires more gas than the default TransferGas. - TestStylusOpcodeTraceCreate: Use gas estimation with fallback for transactions that may include variable L1 poster costs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r iteration SendSignedTxViaL1 internally calls EnsureTxSucceeded on the L2 delayed tx. When the same tx is reused across iterations, the receipt from the first call is found immediately, causing keepL1Advancing to stop before the batch poster can detect the delay buffer threshold. Creating a fresh delayed tx each iteration ensures the receipt wait produces enough L1 blocks for the batch poster to react. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TestBlocksReExecutorCommitState & TestTrieGCTimestampCondition: Replace StateAt-based assertions with NodeSource checks. The hashdb dirty cache may retain roots under heavy parallel load, but dirty entries are transient (lost on restart). Only disk/clean presence indicates committed state. TestValidationInputsAtWithWasmTarget: Fix off-by-one in batch message count comparison (MessageCount is an exclusive upper bound) and add explicit timeout detection. TestNitroNodeVersionAlerter: Set initial upgrade deadline far in future to avoid race with Build() startup time, use live config updates for Warn/Error cases. Eliminates 6s sleep. go-ethereum: Add NodeSource method to distinguish dirty-cache from disk persistence. Add CLAUDE.md documenting trie GC architecture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge independent test functions that each spin up their own node into parent tests with subtests sharing a single node. This avoids redundant node setup and WASM deployment, reducing total test time. - program_ink_test.go: 14 TestXxxInkUsage -> TestInkUsage with subtests - program_test.go: TestProgram* -> TestProgramJIT + TestProgramPebble - debugapi_test.go: TestDebugAPI + TestPrestateTracingSimple -> TestDebugTracing - retryable_test.go: consolidated into shared subtests - stylus_trace_test.go: consolidated into shared subtests - batch_poster_test.go: use batch.Serialize directly - test_info.go: add helpers for shared setup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##            master    #4393      +/- ##
==========================================
- Coverage    34.73%   33.67%    -1.06%
==========================================
  Files          489      493        +4
  Lines        58064    58358      +294
==========================================
- Hits         20170    19654      -516
- Misses       34307    35315     +1008
+ Partials      3587     3389      -198
❌ 18 Tests Failed:
The pathdb step takes ~16 min with ~25 flaky failures from context deadline exceeded under CPU/memory contention. Tests taking 4s locally balloon to 300-430s in CI because Go's -parallel flag (defaulting to GOMAXPROCS) unblocks far more tests than can run productively. Add --reduce-parallelism (which sets -p 1 -parallel nproc/4) to the pathdb, hashdb A-batch, hashdb B-batch, and race test steps, matching what pebble tests already use for the same reason. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge individual TestProgramArbitrator{Errors,Storage,TransientStorage,
Math,Calls,ReturnData,Logs,ActivateFails,EarlyExit} into a single
TestProgramArbitrator test with subtests. This shares one node and
deploys shared WASMs once instead of per-test, reducing total setup
overhead.
Also remove a stray blank line in retryable_test.go.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eljobe reviewed Feb 17, 2026
> ## Test
>
> **Prerequisites**: `make test-go-deps` must run first -- it builds WASM artifacts, stylus test wasms, and replay environment. Without it, tests fail with missing file errors.
I don't think that's true anymore. I think everything will work if you just clone recursively with submodules and make test.
> - Add a changelog fragment to `changelog/` (keepachangelog format: `### Added`, `### Changed`, `### Fixed`). Use `### Ignored` for non-noteworthy changes (CI, deps). Filename convention: `<author>-<ticket>.md`
> - CI validates the changelog via `unclog`
> - Branch naming: `<author>/<ticket>-<description>`
I don't actually use this convention.
With -p 1, all ~90 packages run sequentially, adding 24 min of serial compilation overhead. Meanwhile -parallel nproc/4 (=8) still caused 22 context deadline exceeded failures in system_tests. Drop -p 1 so packages compile and run in parallel as normal. Reduce -parallel from nproc/4 to nproc/8 (=4 on 32-core CI). Since only system_tests has many t.Parallel() calls, this effectively throttles only the heavyweight package without penalizing the rest. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a test flag that sets the testCollection semaphore room directly, replacing the previous approach of only using Go's -parallel flag. Both mechanisms now work together with the same value (nproc/8): - -parallel N: prevents Go from unblocking more than N tests past t.Parallel(), so test timers reflect actual execution time instead of including time blocked in WaitAndRun - --test_max_concurrent=N: sets the semaphore room to N, where weight-2 tests (L1 builds) consume 2 slots, naturally reducing concurrency further for heavier tests Previously the semaphore room was always GOMAXPROCS (32 on CI), which never blocked anything when -parallel was <= 32. Dropped -p 1 which was serializing all ~90 packages and adding 24 min of overhead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ency" This reverts commit 1e1e797.
Replace flat semaphore weights with nodeWeight(n) that scales proportionally to GOMAXPROCS/8, so concurrency adapts to machine size. Add t.Logf timing in BuildL1/BuildL2 when parallel+semaphore wait exceeds 1s to aid CI debugging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With maxConcurrentNodes=8 on a 32-core runner, nodeWeight(1)=4 which over-serializes lightweight test suites like the flaky suite (9 tests totaling ~84 weight against room=32). Increasing to 16 gives nodeWeight(1)=2, allowing better concurrency while still providing weighted resource control that adapts to test heaviness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…odes Instead of inflating weights via nodeWeight(n) = n * GOMAXPROCS / max to fill an oversized room, set room = min(GOMAXPROCS, maxConcurrentNodes) and use flat weights. Same concurrency limits, simpler code, and flat weights preserve proportionality on small machines where nodeWeight clamped everything to 1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d weights Replace the custom testCollection (atomic.Int64 + sync.Cond + maps) with golang.org/x/sync/semaphore.Weighted. Each test now pre-declares its total weight via WithExtraWeight(n) before Build(), and secondary builders (Build2ndNode, BuildL3OnL2, etc.) validate at runtime via useExtraWeight(). Key changes: - Add WithExtraWeight/computeWeight/useExtraWeight to NodeBuilder - Replace WaitAndRun/DontWaitAndRun/CurrentlyRunning with semaphore.Weighted - Add timing logs: semaphore wait (>1s) and actual test duration - Add WithExtraWeight(n) to ~35 test files based on their secondary nodes - Remove --reduce-parallelism from CI, use unconditional -parallel cap Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
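The real code uses `golang.org/x/sync/semaphore.Weighted`; the minimal stand-in below (mutex + condition variable) is only meant to illustrate the room/weight semantics the builders rely on, with `newWeighted`, `Acquire`, `TryAcquire`, and `Release` as illustrative names:

```go
package main

import (
	"fmt"
	"sync"
)

// weighted is a minimal stand-in for semaphore.Weighted: Acquire blocks
// until n units of capacity are free, Release returns them.
type weighted struct {
	mu   sync.Mutex
	cond *sync.Cond
	cur  int64
	size int64
}

func newWeighted(size int64) *weighted {
	w := &weighted{size: size}
	w.cond = sync.NewCond(&w.mu)
	return w
}

func (w *weighted) Acquire(n int64) {
	w.mu.Lock()
	for w.cur+n > w.size {
		w.cond.Wait()
	}
	w.cur += n
	w.mu.Unlock()
}

func (w *weighted) TryAcquire(n int64) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.cur+n > w.size {
		return false
	}
	w.cur += n
	return true
}

func (w *weighted) Release(n int64) {
	w.mu.Lock()
	w.cur -= n
	w.cond.Broadcast()
	w.mu.Unlock()
}

func main() {
	// A test that will start extra nodes pre-declares its total weight up
	// front (WithExtraWeight-style), so one Acquire covers the whole test.
	sem := newWeighted(4)          // room, e.g. min(GOMAXPROCS, maxConcurrentNodes)
	sem.Acquire(3)                 // heavyweight test: primary node + extras
	fmt.Println(sem.TryAcquire(2)) // false: would exceed the room
	sem.Release(3)
	fmt.Println(sem.TryAcquire(2)) // true
}
```

Pre-declaring the total weight before Build() avoids deadlocks where a test holds part of its capacity while blocking on the rest.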
Extract the time-warping logic from retryable_test.go's warpL1Time into a common AdvanceL2Time helper that works with or without L1. This sequences a dummy transaction with a future timestamp to advance L2 block time deterministically without wall-clock delay. Replace the 15-second time.Sleep in TestNativeTokenManagementDisabledByDefault with AdvanceL2Time and rewrite the test to use block timestamps instead of time.Now(), making the time relationships exact and deterministic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 63-case nested loop in TestSkippingSavingStateAndRecreatingAfterRestart with TestSparseArchiveCommit in go-ethereum/core/blockchain_arbitrum_test.go. The unit test verifies the same commit/skip patterns using NodeSource on an in-memory DB (~0.5s total) instead of spinning up Nitro nodes with Pebble and restarting (~5s each under contention). Keep 4 representative system test cases for end-to-end confidence. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move wasm recompilation logic coverage from the heavyweight system test into a focused unit test (TestGetCompiledProgram) that exercises: - all targets already present (no recompilation needed) - some targets missing (recompile only missing, reuse existing) - all targets missing (full recompilation from on-chain code) - fewer targets requested than stored (target shrink) - multiple modules (no cross-contamination between module hashes) Simplify the system test to a smoke test with 2 configs (with/without wasmDB removal). The original 16 sub-tests were illusory — both testWasmRecreateWithCall and testWasmRecreateWithDelegatecall ignored their targetsBefore/targetsAfter/removeWasmDBBetween parameters and used hardcoded values. Also fix checkWasmStoreContent assertion after restart with preserved wasmDB: prior targets persist alongside newly compiled ones. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
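The "recompile only missing targets" decision reduces to a set difference. A sketch with a hypothetical helper name (the actual logic lives inside the code that `TestGetCompiledProgram` exercises):

```go
package main

import "fmt"

// missingTargets returns the requested compilation targets that are not
// already stored for a module, so only those get recompiled and the
// existing artifacts are reused.
func missingTargets(stored, requested []string) []string {
	have := make(map[string]bool, len(stored))
	for _, t := range stored {
		have[t] = true
	}
	var missing []string
	for _, t := range requested {
		if !have[t] {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	stored := []string{"wavm", "arm64"}
	fmt.Println(missingTargets(stored, []string{"wavm", "arm64"}))          // all present
	fmt.Println(missingTargets(stored, []string{"wavm", "arm64", "amd64"})) // [amd64]
	fmt.Println(missingTargets(stored, []string{"wavm"}))                   // target shrink: nothing to build
}
```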
…ming - Remove pruning test mode/parallel subtests, hardcode full mode, reduce transaction count from 200 to 50. Move parallel storage traversal coverage to a geth-level unit test (go-ethereum submodule). - Fix retryable expiry tests: add +1 second to AdvanceL2Time calls to avoid landing exactly on the timeout boundary where the retryable is still considered alive. - Add phase timing instrumentation to TestAnyTrustRekey for profiling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move ArbitrumContractTx and RedeemBlockGasUsage out of TestRetryableBasic into standalone test functions. When these shared the same simulated backend, KeepL1Advancing accumulated L1 timestamp drift (each block adds ~1s but is produced every 100ms), eventually causing the sequencer to reject transactions with "L1 timestamp too far from local clock time". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add subscription-based watcher triggering, headerProvider in tests, execution run caching across subchallenge levels, and subscription-based WaitMined to eliminate polling latency from bisection round-trips. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lenge tail Extract createSecondL2Node from the duplicated node-creation logic in create2ndNodeWithConfigForBoldProtocol and createNodeBWithSharedContracts. This also fixes a bug in createNodeBWithSharedContracts where the L1 chain ID was hardcoded as big.NewInt(1337) instead of queried from the client. Simplify create2ndNodeWithConfigForBoldProtocol: remove unused stackConfig parameter (always nil), remove dead assertion chain creation code (caller discarded it), and delegate to createSecondL2Node for the shared core. Delete createNodeBWithSharedContracts entirely, replacing its single call site with a direct call to createSecondL2Node. Extract runFastChallengeAndAssertHonestWin from the identical challenge-run tail in both low-level tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the L3-specific `startL3BoldChallengeManager` with the shared `startBoldChallengeManager` via `boldChallengeManagerParams`, and replace the inline OSP wait loop with `waitForHonestOSPWin`. Removes ~150 lines of duplicated code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoid re-creating the machine for consecutive requests with the same validation input by caching the execution run. Also reduce the empty queue polling interval from 1s to 100ms and remove the 1s delay after successful result submission. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
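A single-entry cache keyed by the validation input is enough for the "consecutive requests" case. The shape below is a hypothetical sketch, not the validator's actual types:

```go
package main

import "fmt"

type executionRun struct{ input string }

// runCache keeps the most recent execution run keyed by its validation
// input, so consecutive requests with the same input reuse the machine
// instead of re-creating it.
type runCache struct {
	key     string
	run     *executionRun
	creates int // counts expensive machine creations
}

func (c *runCache) get(input string, create func() *executionRun) *executionRun {
	if c.run != nil && c.key == input {
		return c.run // cache hit: skip expensive machine creation
	}
	c.creates++
	c.key, c.run = input, create()
	return c.run
}

func main() {
	c := &runCache{}
	mk := func(in string) func() *executionRun {
		return func() *executionRun { return &executionRun{input: in} }
	}
	c.get("input-A", mk("input-A"))
	c.get("input-A", mk("input-A")) // hit: same input back-to-back
	c.get("input-B", mk("input-B")) // miss: evicts the single entry
	fmt.Println(c.creates)          // 2
}
```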