Add context files for Claude Code covering build, test, lint, architecture, code conventions, and PR requirements. Subdirectory files cover arbos storage model, precompile three-layer pattern, and system test NodeBuilder usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WaitForTx previously relied on bind.WaitMined which polls for transaction receipts every 1 second. For setup.DeployRollup, which sends 5 sequential transactions, this added ~5 seconds of pure polling latency to every test that deploys a rollup. When the backend supports SubscribeNewHead (ethclient.Client, protocol.ChainBackend), use head subscriptions for near-instant notification instead. Falls back to bind.WaitMined for backends without subscription support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test used a hardcoded sleep of MaxEmptyBatchDelay + 15s, which was insufficient because the simulated beacon's block timestamps race ahead of wall clock time (each block gets lastBlockTime+1 when mined faster than 1/second). After ~27 blocks during setup, L1 timestamps are ~27s ahead, but the batch poster compares against time.Now(). Replace the hardcoded sleep with a dynamic calculation based on the actual L1 head timestamp offset, and use the lighter legacy deployment path to reduce setup blocks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 1-second sleep ran 15 times per test iteration, adding ~15 seconds of pure wait time. The comment claimed gas estimation could underestimate if done in the same block, but gas estimation runs in an isolated EVM context and the simulated beacon already guarantees distinct timestamps per block. The SendWaitTestTransactions call that follows is sufficient to produce a new L1 block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split DeployFullRollupStack and DeployLegacyOnParentChain into separate DeployCreator/DeployRollup steps so the expensive creator deployment (~20 contracts: BridgeCreator, OSPs, EdgeChallengeManager, RollupCreator, etc.) can be cached across tests while still running CreateRollup at test time to produce the L1 events the inbox reader needs. The cache deploys creator contracts on a temporary L1 chain once per test run, dumps the state via RawDump, and injects contract accounts into subsequent L1 chain genesis allocs. A unique deployer key avoids CREATE address collisions with RollupOwner. In-process deduplication uses sync.Map + sync.Once. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract shared deploy cache infrastructure from system_tests/ into util/testhelpers/deploycache/ for reuse by bold tests. When mock OSP is enabled, ChainsWithEdgeChallengeManager now uses a cached creator deployment (initialized eagerly from TestMain) and calls DeployRollup instead of DeployFullRollupStack, skipping ~12 expensive creator contract deploys. Batch UpgradeExecutor transactions in DeployRollup into a single block to reduce simulated time drift on test backends. Replace time.Sleep + Commit patterns in TestPostAssertion with require.Eventually polling that commits blocks on each iteration. TestPostAssertion drops from ~30s to ~7s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The deploy cache optimization hardcoded boldCreatorCache.creator and legacyCreatorCache.creator in deployOnParentChain, but those addresses only exist on L1 (via genesis alloc). When BuildL3OnL2 called deployOnParentChain with L2 as the parent chain, it tried to bind to creator contracts that don't exist on L2, producing "no contract code at given address". Add optional BoldCreator/LegacyCreator fields to DeployConfig so BuildL1 can pass cached addresses while BuildL3OnL2 leaves them nil, falling back to DeployFullRollupStack/DeployLegacyOnParentChain which deploy the creator from scratch on the parent chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test was producing hundreds of tiny batches that never hit the calldata size limit, passing without actually validating resizing. Block the batch poster with a custom error during tx submission so messages accumulate in the streamer, then trigger fallback with the full backlog available. Replace the heavyweight two-phase approach (time.Sleep, manual L1 block loops, Phase 1 CustomDA verification) with checkBatchPosting + WaitForTx. Add per-batch assertions that every non-final batch is near-full, catching the prior failure mode where message arrival timing rather than the size constraint determined batch boundaries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test claimed to verify that ErrMessageTooLarge triggers in-place batch resizing, but SetMaxMessageSize changes both Store() rejection and GetMaxMessageSize() simultaneously. The batch poster calls GetMaxMessageSize() before Store(), so it picked up the new limit before ever building an oversized batch — never triggering ErrMessageTooLarge. Add SetStoreRejectSize to controllableWriter: Store() rejects messages exceeding this size and atomically sets overrideMaxSize, simulating a DA provider that reports its new smaller limit only after rejection. This creates the necessary gap where GetMaxMessageSize() still returns 10KB while Store() rejects at 5KB. Also remove MaxDelay=60s override (use TestBatchPosterConfig default of 0) and adopt the block-accumulate-release pattern from 42efbf4, reducing test runtime from 2+ minutes to ~3 seconds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace bind.WaitMined (1s polling) with ethutil.WaitForTx (200ms polling) across 11 setup calls and the auctioneer's resolveAuction. Reduce auction round duration from 12s to 3s, express lane advantage from 5s to 500ms, and background chain-progression ticker from 750ms to 200ms. Relax roundtiminginfo validation minimums to allow shorter test rounds. Restructure TestTimeboostExpressLaneTransactionHandling to verify bad-nonce tx rejection via missing receipt instead of waiting for a 5s EnsureTxSucceeded timeout, and front-load express lane client setup before bid placement to maximize time within the active round. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move waitForL1DelayBlocks out of the insertRetriables loop so it runs once after all retryables are submitted instead of 50 times. Use WaitForTx instead of EnsureTxSucceeded for L1 receipts to skip the safe-block polling overhead. Add KeepL1Advancing helper to common_test.go that produces background L1 blocks so the header reader and delayed sequencer keep advancing while the test waits for L2 receipts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 65-second time.Sleep with a synchronous Freeze() call on the underlying freezerdb. The freezerdb has a trigger channel designed for test determinism, but it's hidden behind wrapper types (dbWithWasmEntry, closeTrackingDB) that don't forward the method. Use reflect to unwrap through the embedded Database fields to reach it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
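The reflective unwrapping can be sketched as below. This is a simplified model — `Database` stands in for `ethdb.Database`, and the sketch assumes the wrappers embed it under the exported field name `Database`; unexported fields in the real types would need extra care:

```go
package main

import (
	"fmt"
	"reflect"
)

// Database is a stand-in for ethdb.Database; the wrappers below mimic
// dbWithWasmEntry / closeTrackingDB embedding it without forwarding Freeze.
type Database interface{ Get(key string) string }

type freezerTrigger interface{ Freeze() }

type freezerDB struct{ frozen bool }

func (f *freezerDB) Get(string) string { return "" }
func (f *freezerDB) Freeze()           { f.frozen = true }

type closeTrackingDB struct{ Database }
type dbWithWasmEntry struct{ Database }

// unwrapFreezer walks embedded Database fields by reflection until it
// reaches a value that actually exposes Freeze.
func unwrapFreezer(db Database) (freezerTrigger, bool) {
	for db != nil {
		if f, ok := db.(freezerTrigger); ok {
			return f, true
		}
		v := reflect.ValueOf(db)
		if v.Kind() == reflect.Pointer {
			v = v.Elem()
		}
		if v.Kind() != reflect.Struct {
			return nil, false
		}
		field := v.FieldByName("Database")
		if !field.IsValid() || field.IsNil() {
			return nil, false
		}
		db = field.Interface().(Database)
	}
	return nil, false
}

func main() {
	inner := &freezerDB{}
	wrapped := &closeTrackingDB{Database: &dbWithWasmEntry{Database: inner}}
	if f, ok := unwrapFreezer(wrapped); ok {
		f.Freeze() // synchronous: replaces the 65-second sleep
	}
	fmt.Println(inner.frozen) // true
}
```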
Replace sleep-based polling loops (500ms × 91 iterations) in testBatchPosterParallel and TestRedisBatchPosterHandoff with KeepL1Advancing (100ms background L1 blocks) and WaitForTx (header-subscription-based waiting). This speeds up TestBatchPosterParallel, TestRedisBatchPosterParallel, and TestRedisBatchPosterHandoff from ~50-64s to ~3s each. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add KeepL1Advancing to prevent L1 stalls under parallel test contention, and validate only the receipt block instead of all blocks from 1. The JIT override in validateBlocks/validateBlockRange forced validation of every block, causing 5 sequential WASM replays at 5-10s each under contention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace synchronous L1 transaction burst loops (30-100 iterations of SendWaitTestTransactions) and waitForL1DelayBlocks helper with background KeepL1Advancing + subscription-based waiting. This sends L1 transactions at 100ms intervals only as long as needed, instead of bursting unnecessary transactions upfront. Files changed: retryable_test.go (13 call sites + deleted waitForL1DelayBlocks), twonodes_test.go, seqcompensation_test.go, block_validator_test.go, delayedinboxlong_test.go, debugapi_test.go, multi_constraint_pricer_test.go. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the hard-coded 20s sleep with a polling loop that checks for finalized/safe block propagation every 100ms. Add KeepL1Advancing so the simulated L1 reaches the epoch boundary (every 32 blocks) promptly. Reduce L2 block generation from 100 to 20 since the test only needs finality data to exist, not a large chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 450 sequential L1 transactions (15 loops x 30 txs) with KeepL1Advancing running in the background. This eliminates the main source of contention-sensitive overhead that caused the test to balloon from ~5s to ~43s when run alongside other tests. Also wait for the last direct L2 transfer on node B to ensure all batches are synced before checking balances. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests Extract keepL1Advancing(ctx, l1info, l1client) as a lower-level helper so Send*ViaL1 functions can use background L1 advancement without a *NodeBuilder. Convert checkBatchPosting, SendSignedTxesInBatchViaL1, SendSignedTxViaL1, SendUnsignedTxViaL1, and tests in eth_sync, inbox_blob_failure, multi_writer_fallback, delayed_message_filter, and validation_inputs_at from synchronous AdvanceL1 + time.Sleep to KeepL1Advancing + condition polling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- TestBatchPosterWithDelayProofsAndBacklog: Use KeepL1Advancing instead of relying on synchronous L1 block production so the batch poster has time to detect delay buffer threshold crossings asynchronously. - TestMultiWriterFallback_CustomDAToCalldataWithBatchResizing and TestBatchResizingWithoutFallback_MessageTooLarge: Flush pending batch poster L1 txs with AdvanceL1 after SetCustomError to prevent stale batches from contaminating the measurement range. - TestProgramCacheManager: Increase gas limit for WASM activation which requires more gas than the default TransferGas. - TestStylusOpcodeTraceCreate: Use gas estimation with fallback for transactions that may include variable L1 poster costs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r iteration SendSignedTxViaL1 internally calls EnsureTxSucceeded on the L2 delayed tx. When the same tx is reused across iterations, the receipt from the first call is found immediately, causing keepL1Advancing to stop before the batch poster can detect the delay buffer threshold. Creating a fresh delayed tx each iteration ensures the receipt wait produces enough L1 blocks for the batch poster to react. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TestBlocksReExecutorCommitState & TestTrieGCTimestampCondition: Replace StateAt-based assertions with NodeSource checks. The hashdb dirty cache may retain roots under heavy parallel load, but dirty entries are transient (lost on restart). Only disk/clean presence indicates committed state. TestValidationInputsAtWithWasmTarget: Fix off-by-one in batch message count comparison (MessageCount is an exclusive upper bound) and add explicit timeout detection. TestNitroNodeVersionAlerter: Set initial upgrade deadline far in future to avoid race with Build() startup time, use live config updates for Warn/Error cases. Eliminates 6s sleep. go-ethereum: Add NodeSource method to distinguish dirty-cache from disk persistence. Add CLAUDE.md documenting trie GC architecture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge independent test functions that each spin up their own node into parent tests with subtests sharing a single node. This avoids redundant node setup and WASM deployment, reducing total test time. - program_ink_test.go: 14 TestXxxInkUsage -> TestInkUsage with subtests - program_test.go: TestProgram* -> TestProgramJIT + TestProgramPebble - debugapi_test.go: TestDebugAPI + TestPrestateTracingSimple -> TestDebugTracing - retryable_test.go: consolidated into shared subtests - stylus_trace_test.go: consolidated into shared subtests - batch_poster_test.go: use batch.Serialize directly - test_info.go: add helpers for shared setup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##            master    #4393      +/- ##
==========================================
- Coverage    34.73%   33.67%    -1.06%
==========================================
  Files          489      493        +4
  Lines        58064    58358      +294
==========================================
- Hits         20170    19654      -516
- Misses       34307    35315     +1008
+ Partials      3587     3389      -198
❌ 18 Tests Failed:
The pathdb step takes ~16 min with ~25 flaky failures from context deadline exceeded under CPU/memory contention. Tests taking 4s locally balloon to 300-430s in CI because Go's -parallel flag (defaulting to GOMAXPROCS) unblocks far more tests than can run productively. Add --reduce-parallelism (which sets -p 1 -parallel nproc/4) to the pathdb, hashdb A-batch, hashdb B-batch, and race test steps, matching what pebble tests already use for the same reason. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge individual TestProgramArbitrator{Errors,Storage,TransientStorage,
Math,Calls,ReturnData,Logs,ActivateFails,EarlyExit} into a single
TestProgramArbitrator test with subtests. This shares one node and
deploys shared WASMs once instead of per-test, reducing total setup
overhead.
Also remove a stray blank line in retryable_test.go.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
eljobe reviewed Feb 17, 2026
> ## Test
>
> **Prerequisites**: `make test-go-deps` must run first -- it builds WASM artifacts, stylus test wasms, and replay environment. Without it, tests fail with missing file errors.
I don't think that's true anymore. I think everything will work if you just clone recursively with submodules and make test.
> - Add a changelog fragment to `changelog/` (keepachangelog format: `### Added`, `### Changed`, `### Fixed`). Use `### Ignored` for non-noteworthy changes (CI, deps). Filename convention: `<author>-<ticket>.md`
> - CI validates the changelog via `unclog`
> - Branch naming: `<author>/<ticket>-<description>`
I don't actually use this convention.
With -p 1, all ~90 packages run sequentially, adding 24 min of serial compilation overhead. Meanwhile -parallel nproc/4 (=8) still caused 22 context deadline exceeded failures in system_tests. Drop -p 1 so packages compile and run in parallel as normal. Reduce -parallel from nproc/4 to nproc/8 (=4 on 32-core CI). Since only system_tests has many t.Parallel() calls, this effectively throttles only the heavyweight package without penalizing the rest. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a test flag that sets the testCollection semaphore room directly, replacing the previous approach of only using Go's -parallel flag. Both mechanisms now work together with the same value (nproc/8): - -parallel N: prevents Go from unblocking more than N tests past t.Parallel(), so test timers reflect actual execution time instead of including time blocked in WaitAndRun - --test_max_concurrent=N: sets the semaphore room to N, where weight-2 tests (L1 builds) consume 2 slots, naturally reducing concurrency further for heavier tests Previously the semaphore room was always GOMAXPROCS (32 on CI), which never blocked anything when -parallel was <= 32. Dropped -p 1 which was serializing all ~90 packages and adding 24 min of overhead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ency" This reverts commit 1e1e797.
Replace flat semaphore weights with nodeWeight(n) that scales proportionally to GOMAXPROCS/8, so concurrency adapts to machine size. Add t.Logf timing in BuildL1/BuildL2 when parallel+semaphore wait exceeds 1s to aid CI debugging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With maxConcurrentNodes=8 on a 32-core runner, nodeWeight(1)=4 which over-serializes lightweight test suites like the flaky suite (9 tests totaling ~84 weight against room=32). Increasing to 16 gives nodeWeight(1)=2, allowing better concurrency while still providing weighted resource control that adapts to test heaviness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…odes Instead of inflating weights via nodeWeight(n) = n * GOMAXPROCS / max to fill an oversized room, set room = min(GOMAXPROCS, maxConcurrentNodes) and use flat weights. Same concurrency limits, simpler code, and flat weights preserve proportionality on small machines where nodeWeight clamped everything to 1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d weights Replace the custom testCollection (atomic.Int64 + sync.Cond + maps) with golang.org/x/sync/semaphore.Weighted. Each test now pre-declares its total weight via WithExtraWeight(n) before Build(), and secondary builders (Build2ndNode, BuildL3OnL2, etc.) validate at runtime via useExtraWeight(). Key changes: - Add WithExtraWeight/computeWeight/useExtraWeight to NodeBuilder - Replace WaitAndRun/DontWaitAndRun/CurrentlyRunning with semaphore.Weighted - Add timing logs: semaphore wait (>1s) and actual test duration - Add WithExtraWeight(n) to ~35 test files based on their secondary nodes - Remove --reduce-parallelism from CI, use unconditional -parallel cap Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
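The real code uses `golang.org/x/sync/semaphore.Weighted`; the minimal stand-in below (mutex + condition variable) is only meant to illustrate the room/weight semantics the builders rely on, with `newWeighted`, `Acquire`, `TryAcquire`, and `Release` as illustrative names:

```go
package main

import (
	"fmt"
	"sync"
)

// weighted is a minimal stand-in for semaphore.Weighted: Acquire blocks
// until n units of capacity are free, Release returns them.
type weighted struct {
	mu   sync.Mutex
	cond *sync.Cond
	cur  int64
	size int64
}

func newWeighted(size int64) *weighted {
	w := &weighted{size: size}
	w.cond = sync.NewCond(&w.mu)
	return w
}

func (w *weighted) Acquire(n int64) {
	w.mu.Lock()
	for w.cur+n > w.size {
		w.cond.Wait()
	}
	w.cur += n
	w.mu.Unlock()
}

func (w *weighted) TryAcquire(n int64) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.cur+n > w.size {
		return false
	}
	w.cur += n
	return true
}

func (w *weighted) Release(n int64) {
	w.mu.Lock()
	w.cur -= n
	w.cond.Broadcast()
	w.mu.Unlock()
}

func main() {
	// A test that will start extra nodes pre-declares its total weight up
	// front (WithExtraWeight-style), so one Acquire covers the whole test.
	sem := newWeighted(4)          // room, e.g. min(GOMAXPROCS, maxConcurrentNodes)
	sem.Acquire(3)                 // heavyweight test: primary node + extras
	fmt.Println(sem.TryAcquire(2)) // false: would exceed the room
	sem.Release(3)
	fmt.Println(sem.TryAcquire(2)) // true
}
```

Pre-declaring the total weight before Build() avoids deadlocks where a test holds part of its capacity while blocking on the rest.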
Extract the time-warping logic from retryable_test.go's warpL1Time into a common AdvanceL2Time helper that works with or without L1. This sequences a dummy transaction with a future timestamp to advance L2 block time deterministically without wall-clock delay. Replace the 15-second time.Sleep in TestNativeTokenManagementDisabledByDefault with AdvanceL2Time and rewrite the test to use block timestamps instead of time.Now(), making the time relationships exact and deterministic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 63-case nested loop in TestSkippingSavingStateAndRecreatingAfterRestart with TestSparseArchiveCommit in go-ethereum/core/blockchain_arbitrum_test.go. The unit test verifies the same commit/skip patterns using NodeSource on an in-memory DB (~0.5s total) instead of spinning up Nitro nodes with Pebble and restarting (~5s each under contention). Keep 4 representative system test cases for end-to-end confidence. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move wasm recompilation logic coverage from the heavyweight system test into a focused unit test (TestGetCompiledProgram) that exercises: - all targets already present (no recompilation needed) - some targets missing (recompile only missing, reuse existing) - all targets missing (full recompilation from on-chain code) - fewer targets requested than stored (target shrink) - multiple modules (no cross-contamination between module hashes) Simplify the system test to a smoke test with 2 configs (with/without wasmDB removal). The original 16 sub-tests were illusory — both testWasmRecreateWithCall and testWasmRecreateWithDelegatecall ignored their targetsBefore/targetsAfter/removeWasmDBBetween parameters and used hardcoded values. Also fix checkWasmStoreContent assertion after restart with preserved wasmDB: prior targets persist alongside newly compiled ones. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
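The "recompile only missing targets" decision reduces to a set difference. A sketch with a hypothetical helper name (the actual logic lives inside the code that `TestGetCompiledProgram` exercises):

```go
package main

import "fmt"

// missingTargets returns the requested compilation targets that are not
// already stored for a module, so only those get recompiled and the
// existing artifacts are reused.
func missingTargets(stored, requested []string) []string {
	have := make(map[string]bool, len(stored))
	for _, t := range stored {
		have[t] = true
	}
	var missing []string
	for _, t := range requested {
		if !have[t] {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	stored := []string{"wavm", "arm64"}
	fmt.Println(missingTargets(stored, []string{"wavm", "arm64"}))          // all present
	fmt.Println(missingTargets(stored, []string{"wavm", "arm64", "amd64"})) // [amd64]
	fmt.Println(missingTargets(stored, []string{"wavm"}))                   // target shrink: nothing to build
}
```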
…ming - Remove pruning test mode/parallel subtests, hardcode full mode, reduce transaction count from 200 to 50. Move parallel storage traversal coverage to a geth-level unit test (go-ethereum submodule). - Fix retryable expiry tests: add +1 second to AdvanceL2Time calls to avoid landing exactly on the timeout boundary where the retryable is still considered alive. - Add phase timing instrumentation to TestAnyTrustRekey for profiling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move ArbitrumContractTx and RedeemBlockGasUsage out of TestRetryableBasic into standalone test functions. When these shared the same simulated backend, KeepL1Advancing accumulated L1 timestamp drift (each block adds ~1s but is produced every 100ms), eventually causing the sequencer to reject transactions with "L1 timestamp too far from local clock time". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add subscription-based watcher triggering, headerProvider in tests, execution run caching across subchallenge levels, and subscription-based WaitMined to eliminate polling latency from bisection round-trips. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lenge tail Extract createSecondL2Node from the duplicated node-creation logic in create2ndNodeWithConfigForBoldProtocol and createNodeBWithSharedContracts. This also fixes a bug in createNodeBWithSharedContracts where the L1 chain ID was hardcoded as big.NewInt(1337) instead of queried from the client. Simplify create2ndNodeWithConfigForBoldProtocol: remove unused stackConfig parameter (always nil), remove dead assertion chain creation code (caller discarded it), and delegate to createSecondL2Node for the shared core. Delete createNodeBWithSharedContracts entirely, replacing its single call site with a direct call to createSecondL2Node. Extract runFastChallengeAndAssertHonestWin from the identical challenge-run tail in both low-level tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the L3-specific `startL3BoldChallengeManager` with the shared `startBoldChallengeManager` via `boldChallengeManagerParams`, and replace the inline OSP wait loop with `waitForHonestOSPWin`. Removes ~150 lines of duplicated code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoid re-creating the machine for consecutive requests with the same validation input by caching the execution run. Also reduce the empty queue polling interval from 1s to 100ms and remove the 1s delay after successful result submission. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
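A single-entry cache keyed by the validation input is enough for the "consecutive requests" case. The shape below is a hypothetical sketch, not the validator's actual types:

```go
package main

import "fmt"

type executionRun struct{ input string }

// runCache keeps the most recent execution run keyed by its validation
// input, so consecutive requests with the same input reuse the machine
// instead of re-creating it.
type runCache struct {
	key     string
	run     *executionRun
	creates int // counts expensive machine creations
}

func (c *runCache) get(input string, create func() *executionRun) *executionRun {
	if c.run != nil && c.key == input {
		return c.run // cache hit: skip expensive machine creation
	}
	c.creates++
	c.key, c.run = input, create()
	return c.run
}

func main() {
	c := &runCache{}
	mk := func(in string) func() *executionRun {
		return func() *executionRun { return &executionRun{input: in} }
	}
	c.get("input-A", mk("input-A"))
	c.get("input-A", mk("input-A")) // hit: same input back-to-back
	c.get("input-B", mk("input-B")) // miss: evicts the single entry
	fmt.Println(c.creates)          // 2
}
```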