System Test Optimization Exploration#4393

Draft
hkalodner wants to merge 40 commits into master from
hkalodner/test-optimization

Conversation

@hkalodner
Contributor

No description provided.

hkalodner and others added 22 commits February 15, 2026 22:29
Add context files for Claude Code covering build, test, lint, architecture,
code conventions, and PR requirements. Subdirectory files cover arbos storage
model, precompile three-layer pattern, and system test NodeBuilder usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WaitForTx previously relied on bind.WaitMined which polls for
transaction receipts every 1 second. For setup.DeployRollup, which
sends 5 sequential transactions, this added ~5 seconds of pure polling
latency to every test that deploys a rollup.

When the backend supports SubscribeNewHead (ethclient.Client,
protocol.ChainBackend), use head subscriptions for near-instant
notification instead. Falls back to bind.WaitMined for backends
without subscription support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test used a hardcoded sleep of MaxEmptyBatchDelay + 15s, which was
insufficient because the simulated beacon's block timestamps race ahead
of wall clock time (each block gets lastBlockTime+1 when mined faster
than 1/second). After ~27 blocks during setup, L1 timestamps are ~27s
ahead, but the batch poster compares against time.Now().

Replace the hardcoded sleep with a dynamic calculation based on the
actual L1 head timestamp offset, and use the lighter legacy deployment
path to reduce setup blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 1-second sleep ran 15 times per test iteration, adding ~15 seconds
of pure wait time. The comment claimed gas estimation could underestimate
if done in the same block, but gas estimation runs in an isolated EVM
context and the simulated beacon already guarantees distinct timestamps
per block. The SendWaitTestTransactions call that follows is sufficient
to produce a new L1 block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split DeployFullRollupStack and DeployLegacyOnParentChain into separate
DeployCreator/DeployRollup steps so the expensive creator deployment
(~20 contracts: BridgeCreator, OSPs, EdgeChallengeManager, RollupCreator,
etc.) can be cached across tests while still running CreateRollup at
test time to produce the L1 events the inbox reader needs.

The cache deploys creator contracts on a temporary L1 chain once per
test run, dumps the state via RawDump, and injects contract accounts
into subsequent L1 chain genesis allocs. A unique deployer key avoids
CREATE address collisions with RollupOwner. In-process deduplication
uses sync.Map + sync.Once.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
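The sync.Map + sync.Once dedup mentioned above is a standard once-per-key pattern. A minimal sketch (`onceMap` is a hypothetical name; the real cache also dumps and injects chain state, which is omitted here):

```go
package main

import (
	"fmt"
	"sync"
)

// onceMap runs an expensive initialization at most once per key, the way
// the deploy cache ensures creator contracts are deployed once per test
// run no matter how many tests race to trigger it.
type onceMap struct {
	m sync.Map // key -> *sync.Once
}

func (o *onceMap) Do(key string, fn func()) {
	v, _ := o.m.LoadOrStore(key, new(sync.Once))
	v.(*sync.Once).Do(fn)
}

func main() {
	var om onceMap
	deploys := 0
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// sync.Once serializes fn and blocks latecomers until it finishes,
			// so the unsynchronized increment is safe here.
			om.Do("creator", func() { deploys++ })
		}()
	}
	wg.Wait()
	fmt.Println("deploys:", deploys)
}
```

LoadOrStore guarantees all callers for a key share one `sync.Once`, and `Once.Do` both dedupes and makes waiters block until the first caller's deployment completes.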
Extract shared deploy cache infrastructure from system_tests/ into
util/testhelpers/deploycache/ for reuse by bold tests. When mock OSP
is enabled, ChainsWithEdgeChallengeManager now uses a cached creator
deployment (initialized eagerly from TestMain) and calls DeployRollup
instead of DeployFullRollupStack, skipping ~12 expensive creator
contract deploys.

Batch UpgradeExecutor transactions in DeployRollup into a single block
to reduce simulated time drift on test backends. Replace time.Sleep +
Commit patterns in TestPostAssertion with require.Eventually polling
that commits blocks on each iteration.

TestPostAssertion drops from ~30s to ~7s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The deploy cache optimization hardcoded boldCreatorCache.creator and
legacyCreatorCache.creator in deployOnParentChain, but those addresses
only exist on L1 (via genesis alloc). When BuildL3OnL2 called
deployOnParentChain with L2 as the parent chain, it tried to bind to
creator contracts that don't exist on L2, producing "no contract code
at given address".

Add optional BoldCreator/LegacyCreator fields to DeployConfig so
BuildL1 can pass cached addresses while BuildL3OnL2 leaves them nil,
falling back to DeployFullRollupStack/DeployLegacyOnParentChain which
deploy the creator from scratch on the parent chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test was producing hundreds of tiny batches that never hit the
calldata size limit, passing without actually validating resizing.
Block the batch poster with a custom error during tx submission so
messages accumulate in the streamer, then trigger fallback with the
full backlog available. Replace the heavyweight two-phase approach
(time.Sleep, manual L1 block loops, Phase 1 CustomDA verification)
with checkBatchPosting + WaitForTx. Add per-batch assertions that
every non-final batch is near-full, catching the prior failure mode
where message arrival timing rather than the size constraint determined
batch boundaries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test claimed to verify that ErrMessageTooLarge triggers in-place
batch resizing, but SetMaxMessageSize changes both Store() rejection and
GetMaxMessageSize() simultaneously. The batch poster calls
GetMaxMessageSize() before Store(), so it picked up the new limit before
ever building an oversized batch — never triggering ErrMessageTooLarge.

Add SetStoreRejectSize to controllableWriter: Store() rejects messages
exceeding this size and atomically sets overrideMaxSize, simulating a DA
provider that reports its new smaller limit only after rejection. This
creates the necessary gap where GetMaxMessageSize() still returns 10KB
while Store() rejects at 5KB.

Also remove MaxDelay=60s override (use TestBatchPosterConfig default of
0) and adopt the block-accumulate-release pattern from 42efbf4,
reducing test runtime from 2+ minutes to ~3 seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
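The rejection-then-shrink gap described above can be captured in a tiny state machine. This is an illustrative sketch only; the names are not the test's actual controllableWriter API, and sizes are scaled down:

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var errMessageTooLarge = errors.New("message too large")

// rejectingWriter sketches the SetStoreRejectSize idea: a DA writer whose
// Store() rejects messages above rejectSize and only then publishes the
// smaller limit, so GetMaxMessageSize() still reports the old value when
// the batch poster builds its first oversized batch.
type rejectingWriter struct {
	advertised atomic.Int64 // what GetMaxMessageSize reports
	rejectSize int64        // what Store actually accepts
}

func (w *rejectingWriter) GetMaxMessageSize() int64 { return w.advertised.Load() }

func (w *rejectingWriter) Store(msg []byte) error {
	if int64(len(msg)) > w.rejectSize {
		w.advertised.Store(w.rejectSize) // shrink the limit only after rejecting
		return errMessageTooLarge
	}
	return nil
}

func main() {
	w := &rejectingWriter{rejectSize: 5 * 1024}
	w.advertised.Store(10 * 1024)
	err := w.Store(make([]byte, 8*1024)) // oversized: rejected, limit shrinks
	fmt.Println(err, w.GetMaxMessageSize())
}
```

Because the advertised limit only changes inside the rejection path, a caller that checks `GetMaxMessageSize()` before `Store()` is guaranteed to hit the error once, which is exactly the resizing path the test needs to exercise.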
Replace bind.WaitMined (1s polling) with ethutil.WaitForTx (200ms polling)
across 11 setup calls and the auctioneer's resolveAuction. Reduce auction
round duration from 12s to 3s, express lane advantage from 5s to 500ms,
and background chain-progression ticker from 750ms to 200ms. Relax
roundtiminginfo validation minimums to allow shorter test rounds.

Restructure TestTimeboostExpressLaneTransactionHandling to verify bad-nonce
tx rejection via missing receipt instead of waiting for a 5s
EnsureTxSucceeded timeout, and front-load express lane client setup before
bid placement to maximize time within the active round.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move waitForL1DelayBlocks out of the insertRetriables loop so it runs
once after all retryables are submitted instead of 50 times. Use
WaitForTx instead of EnsureTxSucceeded for L1 receipts to skip the
safe-block polling overhead.

Add KeepL1Advancing helper to common_test.go that produces background
L1 blocks so the header reader and delayed sequencer keep advancing
while the test waits for L2 receipts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 65-second time.Sleep with a synchronous Freeze() call on the
underlying freezerdb. The freezerdb has a trigger channel designed for
test determinism, but it's hidden behind wrapper types (dbWithWasmEntry,
closeTrackingDB) that don't forward the method. Use reflect to unwrap
through the embedded Database fields to reach it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
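The reflect-based unwrapping above can be illustrated with stand-in types. In this sketch, `realDB` and `wrapper` are hypothetical substitutes for freezerdb and the dbWithWasmEntry/closeTrackingDB wrappers; the real code walks embedded Database fields of the concrete geth types.

```go
package main

import (
	"fmt"
	"reflect"
)

// freezer is the capability hidden behind the wrapper types.
type freezer interface{ Freeze() }

type realDB struct{ frozen bool }

func (r *realDB) Freeze() { r.frozen = true }

// wrapper stands in for a DB wrapper that holds but does not forward the
// underlying database's methods.
type wrapper struct{ Database interface{} }

// findFreezer unwraps through exported "Database" fields via reflect until
// it reaches a value implementing freezer, or returns nil if none exists.
func findFreezer(v interface{}) freezer {
	for v != nil {
		if f, ok := v.(freezer); ok {
			return f
		}
		rv := reflect.ValueOf(v)
		if rv.Kind() == reflect.Ptr {
			rv = rv.Elem()
		}
		if rv.Kind() != reflect.Struct {
			return nil
		}
		field := rv.FieldByName("Database")
		if !field.IsValid() || !field.CanInterface() {
			return nil
		}
		v = field.Interface()
	}
	return nil
}

func main() {
	db := &realDB{}
	wrapped := wrapper{Database: wrapper{Database: db}}
	if f := findFreezer(wrapped); f != nil {
		f.Freeze() // synchronous trigger, no 65-second sleep
	}
	fmt.Println("frozen:", db.frozen)
}
```

The alternative, forwarding the trigger method through every wrapper type, would be cleaner but touches more code; reflection keeps the workaround local to the test.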
Replace sleep-based polling loops (500ms × 91 iterations) in
testBatchPosterParallel and TestRedisBatchPosterHandoff with
KeepL1Advancing (100ms background L1 blocks) and WaitForTx
(header-subscription-based waiting). This speeds up
TestBatchPosterParallel, TestRedisBatchPosterParallel, and
TestRedisBatchPosterHandoff from ~50-64s to ~3s each.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add KeepL1Advancing to prevent L1 stalls under parallel test contention,
and validate only the receipt block instead of all blocks from 1. The JIT
override in validateBlocks/validateBlockRange forced validation of every
block, causing 5 sequential WASM replays at 5-10s each under contention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace synchronous L1 transaction burst loops (30-100 iterations of
SendWaitTestTransactions) and waitForL1DelayBlocks helper with
background KeepL1Advancing + subscription-based waiting. This sends
L1 transactions at 100ms intervals only as long as needed, instead of
bursting unnecessary transactions upfront.

Files changed: retryable_test.go (13 call sites + deleted
waitForL1DelayBlocks), twonodes_test.go, seqcompensation_test.go,
block_validator_test.go, delayedinboxlong_test.go, debugapi_test.go,
multi_constraint_pricer_test.go.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the hard-coded 20s sleep with a polling loop that checks for
finalized/safe block propagation every 100ms. Add KeepL1Advancing so the
simulated L1 reaches the epoch boundary (every 32 blocks) promptly.
Reduce L2 block generation from 100 to 20 since the test only needs
finality data to exist, not a large chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 450 sequential L1 transactions (15 loops x 30 txs) with
KeepL1Advancing running in the background. This eliminates the main
source of contention-sensitive overhead that caused the test to balloon
from ~5s to ~43s when run alongside other tests.

Also wait for the last direct L2 transfer on node B to ensure all
batches are synced before checking balances.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests

Extract keepL1Advancing(ctx, l1info, l1client) as a lower-level helper so
Send*ViaL1 functions can use background L1 advancement without a *NodeBuilder.
Convert checkBatchPosting, SendSignedTxesInBatchViaL1, SendSignedTxViaL1,
SendUnsignedTxViaL1, and tests in eth_sync, inbox_blob_failure,
multi_writer_fallback, delayed_message_filter, and validation_inputs_at
from synchronous AdvanceL1 + time.Sleep to KeepL1Advancing + condition polling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- TestBatchPosterWithDelayProofsAndBacklog: Use KeepL1Advancing instead
  of relying on synchronous L1 block production so the batch poster has
  time to detect delay buffer threshold crossings asynchronously.

- TestMultiWriterFallback_CustomDAToCalldataWithBatchResizing and
  TestBatchResizingWithoutFallback_MessageTooLarge: Flush pending batch
  poster L1 txs with AdvanceL1 after SetCustomError to prevent stale
  batches from contaminating the measurement range.

- TestProgramCacheManager: Increase gas limit for WASM activation which
  requires more gas than the default TransferGas.

- TestStylusOpcodeTraceCreate: Use gas estimation with fallback for
  transactions that may include variable L1 poster costs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r iteration

SendSignedTxViaL1 internally calls EnsureTxSucceeded on the L2 delayed
tx. When the same tx is reused across iterations, the receipt from the
first call is found immediately, causing keepL1Advancing to stop before
the batch poster can detect the delay buffer threshold. Creating a fresh
delayed tx each iteration ensures the receipt wait produces enough L1
blocks for the batch poster to react.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TestBlocksReExecutorCommitState & TestTrieGCTimestampCondition: Replace
StateAt-based assertions with NodeSource checks. The hashdb dirty cache
may retain roots under heavy parallel load, but dirty entries are
transient (lost on restart). Only disk/clean presence indicates
committed state.

TestValidationInputsAtWithWasmTarget: Fix off-by-one in batch message
count comparison (MessageCount is an exclusive upper bound) and add
explicit timeout detection.

TestNitroNodeVersionAlerter: Set initial upgrade deadline far in future
to avoid race with Build() startup time, use live config updates for
Warn/Error cases. Eliminates 6s sleep.

go-ethereum: Add NodeSource method to distinguish dirty-cache from disk
persistence. Add CLAUDE.md documenting trie GC architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge independent test functions that each spin up their own node into
parent tests with subtests sharing a single node. This avoids redundant
node setup and WASM deployment, reducing total test time.

- program_ink_test.go: 14 TestXxxInkUsage -> TestInkUsage with subtests
- program_test.go: TestProgram* -> TestProgramJIT + TestProgramPebble
- debugapi_test.go: TestDebugAPI + TestPrestateTracingSimple -> TestDebugTracing
- retryable_test.go: consolidated into shared subtests
- stylus_trace_test.go: consolidated into shared subtests
- batch_poster_test.go: use batch.Serialize directly
- test_info.go: add helpers for shared setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 53.80577% with 176 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.67%. Comparing base (d29fc5c) to head (9b56cb4).
⚠️ Report is 29 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4393      +/-   ##
==========================================
- Coverage   34.73%   33.67%   -1.06%     
==========================================
  Files         489      493       +4     
  Lines       58064    58358     +294     
==========================================
- Hits        20170    19654     -516     
- Misses      34307    35315    +1008     
+ Partials     3587     3389     -198     

@github-actions
Contributor

github-actions Bot commented Feb 17, 2026

❌ 18 Tests Failed:

Tests completed Failed Passed Skipped
4059 18 4041 0
View the top 3 failed tests by shortest run time
TestNewEmptyStake
Stack Traces | 0.620s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
INFO [02-18|20:51:04.371] Stopping work on payload                 id=0x0349e076c171e019 reason=delivery
INFO [02-18|20:51:04.372] Imported new potential chain segment     number=15 hash=980d61..458526 blocks=1 txs=1 mgas=5.314  elapsed=2.046ms     mgasps=2596.773 triediffs=29.51KiB triedirty=0.00B
INFO [02-18|20:51:04.372] Chain head was updated                   number=15 hash=980d61..458526 root=bdd552..d4d7f4 elapsed="455.357µs"
INFO [02-18|20:51:04.380] Creating rollup
INFO [02-18|20:51:04.425] Submitted transaction                    hash=0x16828a7e157797723c335ae4cf0f1747e81aebd4af8bc8f552bdb64f52fe8589 from=0xBaA31Eb3c2AF54AEd2fb1FBB9033eF525c8029d1 nonce=25 recipient=0x2D8C73033A8A0220c4390EEcEE056754EADEEE75 value=0
INFO [02-18|20:51:04.427] Starting work on payload                 id=0x03f58e5fe4105b4e
INFO [02-18|20:51:04.438] Updated payload                          id=0x03f58e5fe4105b4e number=16 hash=b00c7d..43dc76 txs=1 withdrawals=0 gas=7,003,006  fees=7.003006e-06  root=3c3b60..7c64ae elapsed=11.354ms
INFO [02-18|20:51:04.444] Stopping work on payload                 id=0x03f58e5fe4105b4e reason=delivery
INFO [02-18|20:51:04.462] Imported new potential chain segment     number=16 hash=b00c7d..43dc76 blocks=1 txs=1 mgas=7.003  elapsed=22.862ms    mgasps=306.315  triediffs=57.83KiB triedirty=0.00B
INFO [02-18|20:51:04.462] Chain head was updated                   number=16 hash=b00c7d..43dc76 root=3c3b60..7c64ae elapsed="373.095µs"
INFO [02-18|20:51:04.525] Submitted transaction                    hash=0x1636e245fede65a569971fc41f53a31584c729cb12a2728c4025396fa2e15be9 from=0xBaA31Eb3c2AF54AEd2fb1FBB9033eF525c8029d1 nonce=26 recipient=0x6D72409f7A5B5a2348666dEC8af16EE796AB1A04 value=0
INFO [02-18|20:51:04.530] Submitted transaction                    hash=0xbb3dbff2e2f2dd232e3af9916e9e058db1514b4a3f7633c7e9ff8b21e9ca9e87 from=0xBaA31Eb3c2AF54AEd2fb1FBB9033eF525c8029d1 nonce=27 recipient=0x6D72409f7A5B5a2348666dEC8af16EE796AB1A04 value=0
INFO [02-18|20:51:04.535] Submitted transaction                    hash=0x1677ded879519d340b2da59579578c903ba67dd85329f29357e307e49dc59d2c from=0xBaA31Eb3c2AF54AEd2fb1FBB9033eF525c8029d1 nonce=28 recipient=0x6D72409f7A5B5a2348666dEC8af16EE796AB1A04 value=0
WARN [02-18|20:51:04.539] Served eth_sendRawTransaction            reqid=235 duration="216.453µs" err="replacement transaction underpriced"
    assertion_chain_test.go:32: 
        	Error Trace:	/home/runner/work/nitro/nitro/bold/protocol/sol/assertion_chain_test.go:32
        	Error:      	Received unexpected error:
        	            	sending UpgradeExecutor call: replacement transaction underpriced
        	Test:       	TestNewEmptyStake
--- FAIL: TestNewEmptyStake (0.62s)
TestDelayInboxSimple
Stack Traces | 8.310s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
        github.com/offchainlabs/nitro/system_tests.(*TestClient).SendSignedTx(0xc06e1e9980, 0xc001ae6fc0, 0xc0f015d5a0, 0xc0a8a48f00, 0xc00d14b720)
        	/home/runner/work/nitro/nitro/system_tests/common_test.go:151 +0x5d
        github.com/offchainlabs/nitro/system_tests.TestDelayInboxSimple(0xc001ae6fc0)
        	/home/runner/work/nitro/nitro/system_tests/delayedinbox_test.go:49 +0x1ee
        testing.tRunner(0xc001ae6fc0, 0x4032f98)
        	/opt/hostedtoolcache/go/1.25.7/x64/src/testing/testing.go:1934 +0xea
        created by testing.(*T).Run in goroutine 1
        	/opt/hostedtoolcache/go/1.25.7/x64/src/testing/testing.go:1997 +0x465
        
    common_test.go:1548: [] waitForTx (tx=0x81a559c5d245ed40fa6295671cc849d3785606adbbc9523c2eae8c1503c00779) got: context deadline exceeded
INFO [02-18|21:06:49.140] Persisting dirty state                   head=2    root=b961ab..0e8164 layers=2
INFO [02-18|21:06:49.141] Persisted dirty state to disk            size=11.11KiB   elapsed="223.456µs"
INFO [02-18|21:06:49.141] Blockchain stopped
INFO [02-18|21:06:49.142] Ethereum protocol stopped
INFO [02-18|21:06:49.174] Transaction pool stopped
INFO [02-18|21:06:49.174] Persisting dirty state                   head=23   root=942f2d..5dd88e layers=23
INFO [02-18|21:06:49.177] Persisted dirty state to disk            size=131.80KiB  elapsed=2.098ms
INFO [02-18|21:06:49.177] Blockchain stopped
    common_test.go:629: test ran for 6.332897319s (weight 2)
--- FAIL: TestDelayInboxSimple (8.31s)
TestDelayedMessageFilterNonFilteredPasses
Stack Traces | 11.140s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
INFO [02-18|21:06:54.228] Chain head was updated                   number=7013 hash=cfcf11..e8586d root=e6ec5b..f1fe44 elapsed="76.412µs"
INFO [02-18|21:06:54.229] Stopping work on payload                 id=0x03ef648c24b59ed4 reason=delivery
INFO [02-18|21:06:54.229] Updated payload                          id=0x037756762ff42efc number=7016 hash=461c99..23cb80 txs=1   withdrawals=0 gas=21000     fees=0.0021         root=0b3877..6b9ba6 elapsed=1.231ms
INFO [02-18|21:06:54.229] Stopping work on payload                 id=0x037756762ff42efc reason=delivery
INFO [02-18|21:06:54.230] Imported new potential chain segment     number=7021 hash=927c68..54fa6d blocks=1  txs=1   mgas=0.021 elapsed=1.905ms     mgasps=11.018  triediffs=797.57KiB  triedirty=3.16MiB
INFO [02-18|21:06:54.230] Chain head was updated                   number=7021 hash=927c68..54fa6d root=7665e3..f32bf6 elapsed="78.517µs"
INFO [02-18|21:06:54.231] Imported new potential chain segment     number=7016 hash=461c99..23cb80 blocks=1  txs=1   mgas=0.021 elapsed=1.543ms     mgasps=13.604  triediffs=793.37KiB  triedirty=3.15MiB
INFO [02-18|21:06:54.231] Chain head was updated                   number=7016 hash=461c99..23cb80 root=0b3877..6b9ba6 elapsed="72.024µs"
INFO [02-18|21:06:54.239] Submitted transaction                    hash=0x1944deaf470182443099b8b7220f16afb1e29c9b63b3ea516588e8a9d9938c80 from=0xb386a74Dcab67b66F8AC07B4f08365d37495Dd23 nonce=172  recipient=0xbE95a754961BE37BF76a0ae7a3a47bae29105757 value=0
INFO [02-18|21:06:54.239] DataPoster sent transaction              nonce=172  hash=1944de..938c80 feeCap=50,000,000,080  tipCap=5,000,000,000 blobFeeCap=<nil> gas=153,128
INFO [02-18|21:06:54.239] BatchPoster: batch sent                  sequenceNumber=173 from=174 to=175 prevDelayed=3   currentDelayed=3   totalSegments=3   numBlobs=0
INFO [02-18|21:06:54.241] Starting work on payload                 id=0x031894e34ee8d64d
INFO [02-18|21:06:54.242] Transaction pool stopped
INFO [02-18|21:06:54.242] Persisting dirty state                   head=19   root=84fd7b..4ff47f layers=19
INFO [02-18|21:06:54.244] Updated payload                          id=0x031894e34ee8d64d number=213  hash=c24fb5..c87366 txs=1   withdrawals=0 gas=140,843   fees=0.000704215    root=4bad0c..8c630a elapsed=2.442ms
INFO [02-18|21:06:54.244] Persisted dirty state to disk            size=116.11KiB  elapsed=1.796ms
INFO [02-18|21:06:54.245] Blockchain stopped
INFO [02-18|21:06:54.245] HTTP server stopped                      endpoint=127.0.0.1:41577
    common_test.go:629: test ran for 1.833193153s (weight 2)
--- FAIL: TestDelayedMessageFilterNonFilteredPasses (11.14s)


hkalodner and others added 2 commits February 17, 2026 01:44
The pathdb step takes ~16 min with ~25 flaky failures from context
deadline exceeded under CPU/memory contention. Tests taking 4s locally
balloon to 300-430s in CI because Go's -parallel flag (defaulting to
GOMAXPROCS) unblocks far more tests than can run productively.

Add --reduce-parallelism (which sets -p 1 -parallel nproc/4) to the
pathdb, hashdb A-batch, hashdb B-batch, and race test steps, matching
what pebble tests already use for the same reason.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge individual TestProgramArbitrator{Errors,Storage,TransientStorage,
Math,Calls,ReturnData,Logs,ActivateFails,EarlyExit} into a single
TestProgramArbitrator test with subtests. This shares one node and
deploys shared WASMs once instead of per-test, reducing total setup
overhead.

Also remove a stray blank line in retryable_test.go.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread CLAUDE.md

## Test

**Prerequisites**: `make test-go-deps` must run first -- it builds WASM artifacts, stylus test wasms, and replay environment. Without it, tests fail with missing file errors.
Member

I don't think that's true anymore. I think everything will work if you just clone recursively with submodules and make test.

Comment thread CLAUDE.md

- Add a changelog fragment to `changelog/` (keepachangelog format: `### Added`, `### Changed`, `### Fixed`). Use `### Ignored` for non-noteworthy changes (CI, deps). Filename convention: `<author>-<ticket>.md`
- CI validates the changelog via `unclog`
- Branch naming: `<author>/<ticket>-<description>`
Member

I don't actually use this convention.

hkalodner and others added 3 commits February 17, 2026 07:38
With -p 1, all ~90 packages run sequentially, adding 24 min of serial
compilation overhead. Meanwhile -parallel nproc/4 (=8) still caused 22
context deadline exceeded failures in system_tests.

Drop -p 1 so packages compile and run in parallel as normal. Reduce
-parallel from nproc/4 to nproc/8 (=4 on 32-core CI). Since only
system_tests has many t.Parallel() calls, this effectively throttles
only the heavyweight package without penalizing the rest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a test flag that sets the testCollection semaphore room directly,
replacing the previous approach of only using Go's -parallel flag.
Both mechanisms now work together with the same value (nproc/8):

- -parallel N: prevents Go from unblocking more than N tests past
  t.Parallel(), so test timers reflect actual execution time instead
  of including time blocked in WaitAndRun
- --test_max_concurrent=N: sets the semaphore room to N, where
  weight-2 tests (L1 builds) consume 2 slots, naturally reducing
  concurrency further for heavier tests

Previously the semaphore room was always GOMAXPROCS (32 on CI), which
never blocked anything when -parallel was <= 32. Dropped -p 1 which
was serializing all ~90 packages and adding 24 min of overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
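The weighted-semaphore mechanism behind --test_max_concurrent can be sketched with stdlib primitives. This is a hedged illustration, not the repository's actual testCollection: `weightedSem` and its methods are invented names, and the real implementation carries extra bookkeeping.

```go
package main

import (
	"fmt"
	"sync"
)

// weightedSem is a minimal weighted semaphore: room slots total, and a
// weight-2 test (one that builds an L1) consumes two of them, naturally
// throttling heavier tests harder than light ones.
type weightedSem struct {
	mu   sync.Mutex
	cond *sync.Cond
	free int
}

func newWeightedSem(room int) *weightedSem {
	s := &weightedSem{free: room}
	s.cond = sync.NewCond(&s.mu)
	return s
}

func (s *weightedSem) Acquire(w int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for s.free < w {
		s.cond.Wait() // block until enough room frees up
	}
	s.free -= w
}

func (s *weightedSem) Release(w int) {
	s.mu.Lock()
	s.free += w
	s.mu.Unlock()
	s.cond.Broadcast()
}

func main() {
	s := newWeightedSem(4) // e.g. nproc/8 on a 32-core runner
	s.Acquire(2)           // a weight-2 test: L1 build
	s.Acquire(1)           // two weight-1 tests fit alongside it
	s.Acquire(1)
	fmt.Println("room left:", s.free)
}
```

Acquiring the full weight atomically under one lock (rather than one slot at a time) is what avoids deadlock between two partially-acquired heavy tests.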
hkalodner and others added 13 commits February 17, 2026 08:11
Replace flat semaphore weights with nodeWeight(n) that scales
proportionally to GOMAXPROCS/8, so concurrency adapts to machine size.
Add t.Logf timing in BuildL1/BuildL2 when parallel+semaphore wait
exceeds 1s to aid CI debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With maxConcurrentNodes=8 on a 32-core runner, nodeWeight(1)=4 which
over-serializes lightweight test suites like the flaky suite (9 tests
totaling ~84 weight against room=32). Increasing to 16 gives
nodeWeight(1)=2, allowing better concurrency while still providing
weighted resource control that adapts to test heaviness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…odes

Instead of inflating weights via nodeWeight(n) = n * GOMAXPROCS / max to
fill an oversized room, set room = min(GOMAXPROCS, maxConcurrentNodes)
and use flat weights. Same concurrency limits, simpler code, and flat
weights preserve proportionality on small machines where nodeWeight
clamped everything to 1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d weights

Replace the custom testCollection (atomic.Int64 + sync.Cond + maps) with
golang.org/x/sync/semaphore.Weighted. Each test now pre-declares its total
weight via WithExtraWeight(n) before Build(), and secondary builders
(Build2ndNode, BuildL3OnL2, etc.) validate at runtime via useExtraWeight().

Key changes:
- Add WithExtraWeight/computeWeight/useExtraWeight to NodeBuilder
- Replace WaitAndRun/DontWaitAndRun/CurrentlyRunning with semaphore.Weighted
- Add timing logs: semaphore wait (>1s) and actual test duration
- Add WithExtraWeight(n) to ~35 test files based on their secondary nodes
- Remove --reduce-parallelism from CI, use unconditional -parallel cap

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract the time-warping logic from retryable_test.go's warpL1Time into
a common AdvanceL2Time helper that works with or without L1. This
sequences a dummy transaction with a future timestamp to advance L2
block time deterministically without wall-clock delay.

Replace the 15-second time.Sleep in TestNativeTokenManagementDisabledByDefault
with AdvanceL2Time and rewrite the test to use block timestamps instead
of time.Now(), making the time relationships exact and deterministic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 63-case nested loop in TestSkippingSavingStateAndRecreatingAfterRestart
with TestSparseArchiveCommit in go-ethereum/core/blockchain_arbitrum_test.go. The
unit test verifies the same commit/skip patterns using NodeSource on an in-memory DB
(~0.5s total) instead of spinning up Nitro nodes with Pebble and restarting (~5s each
under contention). Keep 4 representative system test cases for end-to-end confidence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move wasm recompilation logic coverage from the heavyweight system test
into a focused unit test (TestGetCompiledProgram) that exercises:
- all targets already present (no recompilation needed)
- some targets missing (recompile only missing, reuse existing)
- all targets missing (full recompilation from on-chain code)
- fewer targets requested than stored (target shrink)
- multiple modules (no cross-contamination between module hashes)

Simplify the system test to a smoke test with 2 configs (with/without
wasmDB removal). The original 16 sub-tests were illusory — both
testWasmRecreateWithCall and testWasmRecreateWithDelegatecall ignored
their targetsBefore/targetsAfter/removeWasmDBBetween parameters and
used hardcoded values.

Also fix checkWasmStoreContent assertion after restart with preserved
wasmDB: prior targets persist alongside newly compiled ones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ming

- Remove pruning test mode/parallel subtests, hardcode full mode, reduce
  transaction count from 200 to 50. Move parallel storage traversal
  coverage to a geth-level unit test (go-ethereum submodule).
- Fix retryable expiry tests: add +1 second to AdvanceL2Time calls to
  avoid landing exactly on the timeout boundary where the retryable is
  still considered alive.
- Add phase timing instrumentation to TestAnyTrustRekey for profiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move ArbitrumContractTx and RedeemBlockGasUsage out of TestRetryableBasic
into standalone test functions. When these shared the same simulated
backend, KeepL1Advancing accumulated L1 timestamp drift (each block adds
~1s but is produced every 100ms), eventually causing the sequencer to
reject transactions with "L1 timestamp too far from local clock time".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add subscription-based watcher triggering, headerProvider in tests,
execution run caching across subchallenge levels, and subscription-based
WaitMined to eliminate polling latency from bisection round-trips.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lenge tail

Extract createSecondL2Node from the duplicated node-creation logic in
create2ndNodeWithConfigForBoldProtocol and createNodeBWithSharedContracts.
This also fixes a bug in createNodeBWithSharedContracts where the L1 chain
ID was hardcoded as big.NewInt(1337) instead of queried from the client.

Simplify create2ndNodeWithConfigForBoldProtocol: remove unused stackConfig
parameter (always nil), remove dead assertion chain creation code (caller
discarded it), and delegate to createSecondL2Node for the shared core.

Delete createNodeBWithSharedContracts entirely, replacing its single call
site with a direct call to createSecondL2Node.

Extract runFastChallengeAndAssertHonestWin from the identical challenge-run
tail in both low-level tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the L3-specific `startL3BoldChallengeManager` with the shared
`startBoldChallengeManager` via `boldChallengeManagerParams`, and replace
the inline OSP wait loop with `waitForHonestOSPWin`. Removes ~150 lines
of duplicated code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoid re-creating the machine for consecutive requests with the same
validation input by caching the execution run. Also reduce the empty
queue polling interval from 1s to 100ms and remove the 1s delay after
successful result submission.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>