
Conversation

@BrzVlad (Member) commented Jan 30, 2026

We create a simple hashtable (InterpDispatchCache) that maps DispatchToken + target MethodTable to the target method to call. The cache is similar to the DispatchCache used by VSD. It holds a single mapping per index; when a collision happens, the entry is replaced with the new one. Replaced entries are freed during GC. The expectation is that there will be few collisions, given that only a subset of methods are being interpreted.

This makes a microbenchmark from the suite 4x faster.
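As a rough sketch of the scheme described above — a fixed-size table with a single mapping per index and replace-on-collision — the following is illustrative only; the entry layout and hash are stand-ins, not the actual CoreCLR code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int    kCacheBits = 12;                      // mirrors DISPATCH_CACHE_BITS
constexpr size_t kCacheSize = size_t(1) << kCacheBits;
constexpr size_t kCacheMask = kCacheSize - 1;

struct Entry {
    size_t      token;   // DispatchToken stand-in
    const void* pMT;     // target MethodTable stand-in
    const void* pMD;     // resolved target method stand-in
};

struct Cache {
    Entry* slots[kCacheSize] = {};  // single mapping per index

    // Placeholder hash; the real code derives the index from the DispatchToken hash.
    static size_t Hash(size_t token, const void* pMT) {
        return (token ^ (reinterpret_cast<uintptr_t>(pMT) >> 4)) & kCacheMask;
    }

    const void* Lookup(size_t token, const void* pMT) const {
        Entry* e = slots[Hash(token, pMT)];
        if (e != nullptr && e->token == token && e->pMT == pMT)
            return e->pMD;
        return nullptr;
    }

    void Insert(size_t token, const void* pMT, const void* pMD) {
        size_t idx = Hash(token, pMT);
        delete slots[idx];  // replace on collision; the real code defers frees to GC
        slots[idx] = new Entry{token, pMT, pMD};
    }
};
```

The real cache additionally has to publish and read entries with the proper atomics, which is what most of the review discussion below is about.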

Copilot AI review requested due to automatic review settings, January 30, 2026 20:52
@dotnet-policy-service (Contributor) commented

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

AddIns(tailcall ? INTOP_CALLVIRT_TAIL : INTOP_CALLVIRT);
m_pLastNewIns->data[0] = GetDataItemIndex(callInfo.hMethod);
// Reserved slot for caching DispatchToken
m_pLastNewIns->data[1] = GetNewDataItemIndex(nullptr);
@BrzVlad (Member, Author):

Computing the DispatchToken here didn't seem to provide much benefit, and it would have required adding a lot of code to the EE interface only for the interpreter's use. Unless there is some existing API that makes it easy to compute something useful for the dispatch cache.

{
LIMITED_METHOD_CONTRACT;

uint16_t dispatchTokenHash = DispatchToken::From_SIZE_T(dispatchToken).GetHash();
@BrzVlad (Member, Author):

This is rather expensive, would be great to use something else.

A member replied:

At the very least we could store the precomputed dispatch token hash into the instruction.

Copilot AI (Contributor) left a comment

Pull request overview

This pull request implements cached virtual and interface method dispatch for the CoreCLR interpreter to improve performance of virtual method calls. The implementation adds a simple hash table (InterpDispatchCache) that caches the mapping from (DispatchToken, MethodTable*) to the resolved MethodDesc*, similar to the DispatchCache used by Virtual Stub Dispatch (VSD).

Changes:

  • Implements a new InterpDispatchCache structure that holds 4096 entries (12-bit cache) with collision replacement strategy
  • Refactors DispatchToken::GetHash() to be shared between VSD and interpreter caches
  • Integrates cache cleanup into GC sync points
  • Adds dispatch token caching slots to INTOP_CALLVIRT and INTOP_CALLVIRT_TAIL instructions

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Summary per file:

  • src/coreclr/vm/interpexec.cpp — Adds the InterpDispatchCache implementation with lookup/insert/reclaim logic and the CreateDispatchTokenForMethod helper, and integrates the cache into the virtual call execution path
  • src/coreclr/vm/virtualcallstub.cpp — Moves the token hashing logic into DispatchToken::GetHash() so it can be shared with the interpreter cache
  • src/coreclr/vm/contractimpl.h — Adds the GetHash() method declaration to the DispatchToken struct for computing the 12-bit cache hash
  • src/coreclr/vm/syncclean.cpp — Calls InterpDispatchCache_ReclaimAll() during the GC sync point to clean up dead cache entries
  • src/coreclr/interpreter/inc/intops.def — Changes the instruction size from 4 to 5 bytes for INTOP_CALLVIRT and INTOP_CALLVIRT_TAIL to accommodate the dispatch token cache slot
  • src/coreclr/interpreter/compiler.h — Adds the GetNewDataItemIndex() helper to allocate non-shared data item slots for runtime caching
  • src/coreclr/interpreter/compiler.cpp — Updates EmitCall to allocate dispatch token cache slots for virtual call instructions

Comment on lines +2810 to +2814
size_t dispatchToken = (size_t)pMethod->pDataItems[ip[4]];
if (dispatchToken == 0)
{
dispatchToken = CreateDispatchTokenForMethod(pMD);
pMethod->pDataItems[ip[4]] = (void*)dispatchToken;
Copilot AI commented Jan 30, 2026:

The dispatch token is cached by writing to pMethod->pDataItems[ip[4]] without synchronization. If multiple threads execute the same callvirt instruction simultaneously, they might race to initialize the same dispatch token slot. While writing the same value should be benign, consider using InterlockedCompareExchangeT with nullptr as the expected value to ensure only one thread performs the initialization, similar to common double-checked locking patterns. This would make the intent clearer and avoid potential issues with torn writes on some architectures.
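For illustration, a minimal sketch of the CAS-based lazy initialization this comment suggests, using std::atomic in place of CoreCLR's InterlockedCompareExchangeT (the names and slot representation here are hypothetical stand-ins for the pDataItems slot):

```cpp
#include <atomic>
#include <cassert>

// Stands in for pMethod->pDataItems[ip[4]].
std::atomic<void*> g_slot{nullptr};

// Stand-in for CreateDispatchTokenForMethod; returns a stable non-null value.
void* ComputeToken() { static int token; return &token; }

void* GetOrInitToken() {
    void* t = g_slot.load(std::memory_order_acquire);
    if (t == nullptr) {
        void* computed = ComputeToken();
        void* expected = nullptr;
        // Only one thread's store wins; losers adopt the value already published.
        if (!g_slot.compare_exchange_strong(expected, computed))
            computed = expected;
        t = computed;
    }
    return t;
}
```

Whether the CAS is strictly needed depends on whether two threads can ever compute different token values for the same slot; if the value is deterministic, a plain racy store of the same value is benign, as the comment notes.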

#define DISPATCH_CACHE_BITS 12
#define DISPATCH_CACHE_SIZE (1 << DISPATCH_CACHE_BITS)
#define DISPATCH_CACHE_MASK (DISPATCH_CACHE_SIZE - 1)

Copilot AI commented Jan 30, 2026:

Consider adding a static assertion to verify that DISPATCH_CACHE_BITS equals CALL_STUB_CACHE_NUM_BITS, since the hash function in DispatchToken::GetHash assumes a 12-bit cache. This ensures consistency between the interpreter dispatch cache and the VSD cache hash function, and would catch any future changes that inadvertently modify only one of these constants.

Suggested change
static_assert(DISPATCH_CACHE_BITS == CALL_STUB_CACHE_NUM_BITS, "Interpreter dispatch cache size must match VSD dispatch cache size");

#include "threadsuspend.h"

#ifdef FEATURE_INTERPRETER
void InterpDispatchCache_ReclaimAll();
Copilot AI commented Jan 30, 2026:

The forward declaration for InterpDispatchCache_ReclaimAll could be moved to a header file (e.g., interpexec.h) to make it more maintainable and discoverable. This would avoid duplication if other parts of the codebase need to reference this function in the future, and follows the general pattern of declaring public functions in headers rather than using forward declarations at call sites.

Suggested change
void InterpDispatchCache_ReclaimAll();
#include "interpexec.h"

};

// Global interpreter dispatch cache instance
static InterpDispatchCache g_InterpDispatchCache;
Copilot AI commented Jan 30, 2026:

The global InterpDispatchCache variable g_InterpDispatchCache lacks explicit initialization. While C++ zero-initializes global variables (so m_cache will be all nullptr and m_pDeadList will be nullptr), it would be more explicit and safer to add a constructor to InterpDispatchCache that initializes these members, similar to how DispatchCache has an explicit constructor. This makes the initialization contract clear and prevents potential issues if the struct is used in other contexts.

Comment on lines +88 to +97
InterpDispatchCacheEntry* pDeadList = m_pDeadList;

while (pDeadList != nullptr)
{
InterpDispatchCacheEntry* pNext = pDeadList->pNextDead;
delete pDeadList;
pDeadList = pNext;
}

m_pDeadList = nullptr;
Copilot AI commented Jan 30, 2026:

The ReclaimDeadEntries method reads m_pDeadList without synchronization before clearing it. While this is called during GC sync point when no other threads are running (as documented), the assignment m_pDeadList = nullptr on line 97 should use a volatile write or InterlockedExchangeT for consistency with how m_pDeadList is updated elsewhere via InterlockedCompareExchangeT. This ensures proper memory ordering and makes the code more robust if calling contexts change in the future.
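A sketch of the dead-list pattern under discussion — concurrent CAS pushes of displaced entries, with reclamation taking the whole list via an explicit exchange — modeled with std::atomic rather than the runtime's interlocked helpers (names are illustrative):

```cpp
#include <atomic>
#include <cassert>

struct DeadEntry { DeadEntry* pNextDead = nullptr; };

std::atomic<DeadEntry*> g_deadList{nullptr};

// Any thread: push a displaced entry with a CAS loop (Treiber-stack style).
void PushDead(DeadEntry* e) {
    DeadEntry* head = g_deadList.load(std::memory_order_relaxed);
    do {
        e->pNextDead = head;
    } while (!g_deadList.compare_exchange_weak(head, e));
}

// GC sync point only (no other threads running); the exchange makes the
// handoff of the whole list explicit, which is what the comment asks for.
int ReclaimDeadEntries() {
    DeadEntry* p = g_deadList.exchange(nullptr);
    int freed = 0;
    while (p != nullptr) {
        DeadEntry* next = p->pNextDead;
        delete p;
        p = next;
        ++freed;
    }
    return freed;
}
```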

Comment on lines +48 to +52
InterpDispatchCacheEntry* pEntry = m_cache[idx];
// Data dependency ensures field reads are ordered after loading of `pEntry`
// The entry struct is immutable once created, so these reads are safe
if (pEntry != nullptr && pEntry->pMT == pMT && pEntry->dispatchToken == dispatchToken)
return pEntry->pTargetMD;
Copilot AI commented Jan 30, 2026:

The Lookup method reads m_cache[idx] without acquire semantics, but the Insert method publishes entries using InterlockedExchangeT which has release semantics. To ensure proper memory ordering and guarantee that the fields of pEntry are visible after the pointer is loaded, consider using VolatileLoad to read the cache entry, similar to how VSD's DispatchCache::GetCacheEntry uses VolatileLoad. This ensures the data dependency chain is correctly established across all compiler optimizations and architectures.
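A sketch of the publish-with-release / read-with-acquire pairing the comment describes, with std::atomic standing in for InterlockedExchangeT and VolatileLoad (the entry type and slot are illustrative, not the PR's actual layout):

```cpp
#include <atomic>
#include <cassert>

struct CacheEntry { int key; int value; };  // immutable once published

std::atomic<CacheEntry*> g_cslot{nullptr};

// Publisher: the release store makes the entry's fields visible before the pointer.
void Publish(CacheEntry* e) {
    CacheEntry* old = g_cslot.exchange(e, std::memory_order_release);
    (void)old;  // the real code would push the displaced entry onto a dead list
}

// Reader: an acquire load plays the role VolatileLoad plays in the VSD code,
// ordering the field reads after the pointer load on every architecture.
int LookupOrDefault(int key, int dflt) {
    CacheEntry* e = g_cslot.load(std::memory_order_acquire);
    if (e != nullptr && e->key == key)
        return e->value;
    return dflt;
}
```

On most targets the data dependency the PR's comment relies on does order the field reads, but an explicit acquire (or VolatileLoad) also constrains the compiler, which is the concern raised here and in the VolatileLoadWithoutBarrier comments below.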

@BrzVlad (Member, Author) commented Jan 30, 2026

@jkotas @davidwrighton Simplistic implementation from the discussion from the other day. Let me know if this approach seems ok.

@davidwrighton (Member) left a comment:

I want to look at this more on Monday, but this is about what I thought we should build here.


size_t idx = Hash(dispatchToken, pMT);

InterpDispatchCacheEntry* pEntry = m_cache[idx];
A member commented:

This needs to be a VolatileLoadWithoutBarrier or the compiler can do bad things like make this not a single load.


ip += 4;
// Obtain the cached dispatch token or initialize it
size_t dispatchToken = (size_t)pMethod->pDataItems[ip[4]];
A member commented:

Since this is a read of something lazy initialized, this needs to be a VolatileLoadWithoutBarrier to prevent compiler shenanigans.

