vm: opcode-scoped intern cache on CallContext to eliminate duplicate unique.Make() #20552
AskAlexSharov wants to merge 9 commits into main
Conversation
yperbasis
left a comment
Suggestions
- opExtCodeCopy — _ = stack.pop() in a var block: this is valid Go but unusual to read. Consider pulling it out as a standalone statement:

      addr := scope.peekAddress()
      stack := &scope.Stack
      stack.pop() // addr already consumed above
      var (
          memOffset  = stack.pop()
          codeOffset = stack.pop()
          length     = stack.pop()
      )

  Or just drop the var block entirely, since it no longer groups a clean set of pops. Minor style nit.
- PR description: the body is empty. Worth adding a sentence about the motivation (avoid double unique.Make() on gas+execute) and any benchmark delta. The prior callAddrTmp commit showed up to -23.6% on repeated-CALL microbenchmarks — it'd be good to show numbers for the storage-key side too.
- makeCallVariantGasCallEIP2929 (line 180): this still does accounts.InternAddress(callContext.Stack.Back(1).Bytes20()) — reading from stack position 1, not 0, so peekAddress() can't help. Might be worth a follow-up for a position-aware cache, but that's a separate concern.
- No coverage of getCallContext initialization: the new cache fields start zero-valued (cachedKeyOk = false, cachedAddrOk = false), which is correct by default. But getCallContext doesn't explicitly reset them — it relies on put() having already cleared them. This is fine as long as every CallContext goes through put() before being returned to the pool, which it does (checked run() in interpreter.go). Just noting for awareness.
Verdict
The logic is correct. No semantic changes to EVM behavior — purely a performance optimization that avoids redundant intern-table lookups. Once the WIP items are resolved (description, benchmarks, possibly the
style nit), this is ready to merge.
Pull request overview
This PR reduces repeated interning work in the EVM by adding small caches on CallContext for the current top-of-stack storage key and address, and then routing relevant gas/opcode paths through those cached helpers.
Changes:
- Add CallContext.peekStorageKey()/CallContext.peekAddress() with per-context caching to avoid double-interning across dynamic gas + execution.
- Replace direct accounts.InternKey/InternAddress calls in several opcode and EIP-2929 gas paths with the cached helpers.
- Reset cache-valid flags when returning CallContext to the pool.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| execution/vm/operations_acl.go | Switch EIP-2929 access-list gas paths to use cached stack-to-interned conversions. |
| execution/vm/interpreter.go | Add cached interned key/address fields + helper methods on CallContext. |
| execution/vm/instructions.go | Use cached conversions in BALANCE / EXTCODE* / SLOAD / SSTORE / SELFDESTRUCT opcode implementations. |
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
yperbasis
left a comment
Review: vm: opcode-scoped intern cache on CallContext
Summary
Adds a generation-counter cache (cacheGen + cachedKeyGen/cachedAddrGen) to CallContext so that opcodes like SLOAD, SSTORE, BALANCE, EXTCODE*, and SELFDESTRUCT don't call unique.Make twice per dispatch (once in the gas function, once in execute). The approach is clean — cacheGen increments at the top of the interpreter loop, and peekStorageKey()/peekAddress() only call InternKey/InternAddress when their local generation doesn't match.
Correctness: Verified all modified opcodes (SLOAD, SSTORE, BALANCE, EXTCODESIZE, EXTCODECOPY, EXTCODEHASH, SELFDESTRUCT, SELFDESTRUCT6780). Each consistently reads from stack position 0 in both the gas and execute phases, so the single-slot cache is sufficient. Generation invalidation works correctly: cacheGen starts at 0, gets incremented to 1 before the first opcode, and the pool put() resets all three counters to 0. No window where a stale cache value could be read.
The opExtCodeCopy rewrite from pop-first to peek-then-pop is semantically equivalent — peekAddress() reads the top entry, then the explicit pop() discards it.
Concrete concerns
1. put() doesn't nil the handle fields
    func (c *CallContext) put() {
        c.Memory.reset()
        c.Stack.Reset()
        c.cacheGen = 0
        c.cachedKeyGen = 0
        c.cachedAddrGen = 0
        // cachedKey and cachedAddr still hold unique.Handle values
        contextPool.Put(c)
    }

cachedKey (unique.Handle[common.Hash]) and cachedAddr (unique.Handle[common.Address]) aren't zeroed. While the generation counters prevent them from being used, the live handles keep their entries pinned in the global canonMap (preventing GC) for as long as the CallContext sits idle in the pool. In practice this is negligible — pool size is bounded by goroutine count — but for hygiene consider adding:
    var zeroKey accounts.StorageKey
    var zeroAddr accounts.Address
    c.cachedKey = zeroKey
    c.cachedAddr = zeroAddr

Or define package-level zero vars to avoid the allocation. Non-blocking, just flagging.
2. Small regressions on pure-compute benchmarks
PureArithmetic/mul shows +3.16%, MemoryOps/mstore-mload +1.34%, DelegateCallProxy +1.9%. These opcodes don't touch the cache but pay the cacheGen++ cost every dispatch. The increment itself is one instruction (non-atomic u64 on a local struct), so the regression likely comes from the changed struct layout: inserting ~56 bytes of cache fields between Memory and Stack shifts the 32KB Stack.data array, altering cache-line alignment for the hot loop. The -4.4% geomean makes this an acceptable tradeoff overall, but worth being aware of.
Observations (non-blocking)
- makeCallVariantGasCallEIP2929 (operations_acl.go:180) still does raw InternAddress(callContext.Stack.Back(1).Bytes20()). Since it reads from position 1 instead of 0, the current single-slot cache can't help. A position-aware cache could address this in a follow-up.
- uint64 overflow on cacheGen: at 1 gas per opcode and max ~30M gas per block, overflow would require ~6×10¹¹ blocks. Not a concern.
Verdict
The logic is correct, the benchmark data is compelling (SLOAD/SSTORE/ERC20 improvements are significant), and the implementation is minimal. The handle-pinning nit in put() is the only concrete suggestion.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
    // peekStorageKey returns the top-of-stack value as an interned StorageKey.
    // The result is cached for the lifetime of one opcode dispatch (gas phase +
    // execute phase share the same cacheGen), so unique.Make is called at most
    // once per opcode.
    func (ctx *CallContext) peekStorageKey() accounts.StorageKey {
        if ctx.cachedKeyGen == ctx.cacheGen {
            return ctx.cachedKey
        }
        ctx.cachedKey = accounts.InternKey(ctx.Stack.peek().Bytes32())
        ctx.cachedKeyGen = ctx.cacheGen
        return ctx.cachedKey
    }

    // peekAddress returns the top-of-stack value as an interned Address.
    // Cached like peekStorageKey.
    func (ctx *CallContext) peekAddress() accounts.Address {
        if ctx.cachedAddrGen == ctx.cacheGen {
            return ctx.cachedAddr
        }
        ctx.cachedAddr = accounts.InternAddress(ctx.Stack.peek().Bytes20())
        ctx.cachedAddrGen = ctx.cacheGen
        return ctx.cachedAddr
    }
peekStorageKey/peekAddress cache validity is keyed only by cacheGen. Within the same opcode dispatch, if the stack top is modified (e.g., pop, swap, dup, or writing to *scope.Stack.peek()), subsequent calls in the same generation may return a stale interned key/address while the comment says it returns the current top-of-stack value. To prevent subtle misuse, either (a) explicitly document that these helpers must be called before any stack mutation in that opcode, or (b) include an additional validity check (e.g., stack index/value fingerprint) so the cache is invalidated when the top-of-stack changes.
    c.cachedKeyGen = 0
    c.cachedAddrGen = 0
put() resets cacheGen, cachedKeyGen, and cachedAddrGen all to 0, which makes the cache look valid (cached*Gen == cacheGen) on a freshly pooled CallContext before the interpreter loop has incremented cacheGen. If peekStorageKey/peekAddress is ever called before the first cacheGen++, it can return stale cachedKey/cachedAddr from a prior use. Consider initializing cachedKeyGen/cachedAddrGen to a sentinel value (e.g. ^uint64(0)) on put(), or initializing cacheGen to 1 in getCallContext, so the first peek is always a miss unless explicitly populated in the current dispatch.
Suggested change:

    -    c.cachedKeyGen = 0
    -    c.cachedAddrGen = 0
    +    c.cachedKeyGen = ^uint64(0)
    +    c.cachedAddrGen = ^uint64(0)
Problem
Every EVM opcode that reads or writes storage (SLOAD, SSTORE,
BALANCE, EXTCODESIZE, etc.) calls accounts.InternKey or
accounts.InternAddress to convert a raw stack value into a
unique.Handle. For opcodes like SLOAD and SSTORE, this conversion
happens twice per dispatch: once in the gas function (e.g.
gasSLoadEIP2929) and once in the execute function (e.g. opSload).
These two phases are separated by the dispatch table and cannot
directly share a local variable.
unique.Make is not free: it traverses a global lock-free hash-trie
(canonMap) with several atomic loads per call. Under parallel
execution with many goroutines, cache misses on the global map become
a measurable bottleneck. Profiling BenchmarkSLOADWarm shows
canonMap.Load consuming ~16% of execution time.
Solution
Add a generation-counter cache to CallContext:
cacheGen uint64 // incremented once per opcode dispatch
cachedKeyGen uint64 // generation at which cachedKey was populated
cachedAddrGen uint64 // generation at which cachedAddr was populated
cachedKey accounts.StorageKey
cachedAddr accounts.Address