Fix ref count cache inconsistencies#3176
Fix ref count cache inconsistencies#3176andrewbranch wants to merge 15 commits intomicrosoft:mainfrom
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| mu sync.Mutex | ||
| value V | ||
| refCount int | ||
| deleted bool |
There was a problem hiding this comment.
No longer needed now that new entries always start with refCount == 1, at least that's nice
There was a problem hiding this comment.
Pull request overview
This PR addresses cache lifetime/ref-counting races in the project system (parse cache + extended tsconfig cache) that can lead to premature cache entry deletion and panics during concurrent snapshot cloning and auto-import preparation.
Changes:
- Restore “acquire-on-load” semantics for the parse cache so new source file cache entries start with
refCount == 1, and adjust program cloning to ref unchanged files whenUpdateProgramreuses them. - Replace extended tsconfig ref-counting with snapshot-ID ownership tracking (
OwnerCache) and update clone/dispose paths accordingly. - Add/adjust tests covering cloning disposed snapshots and correct cache entry release behavior.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/project/snapshot.go | Updates program counter semantics for reused vs new programs; switches extended-config tracking to snapshot ownership acquire/release. |
| internal/project/session.go | Updates cache stats logging to reflect OwnerCache entry type. |
| internal/project/refcountcache.go | Reintroduces Acquire semantics for ref-counted cache entries (used by parse cache). |
| internal/project/parsecache.go | Updates parse cache to use the new RefCountCache.Acquire-only API. |
| internal/project/compilerhost.go | Ensures program construction acquires parse cache entries immediately via Acquire. |
| internal/project/project.go | When UpdateProgram clones, refs unchanged files since only the changed file is acquired during update. |
| internal/project/programcounter.go | Splits “new program” (Add) vs “ref existing only” (Ref returning bool) to avoid double-dispose patterns. |
| internal/project/autoimport.go | Uses Acquire instead of Load+Ref for auto-import registry source file access and balances via Deref on dispose. |
| internal/project/ownercache.go | Introduces OwnerCache for snapshot ownership tracking (used by extended config cache). |
| internal/project/extendedconfigcache.go | Switches extended config caching from ref-counting to OwnerCache with expiration logic. |
| internal/project/configfileregistrybuilder.go | Acquires extended configs using the current snapshot ID during config parsing. |
| internal/project/projectcollectionbuilder.go | Passes snapshot ID into config file registry builder to enable per-snapshot ownership. |
| internal/project/refcountcache_test.go | Expands tests for parse cache lifetime and cloning disposed snapshots. |
| internal/project/extendedconfigcache_test.go | Updates test expectations from ref counts to owner counts and adds ownership scenarios. |
jakebailey
left a comment
There was a problem hiding this comment.
I definitely like the invariant of ensuring we start with refcount at 1
|
Then again, the copilot comments do sound plausible |
|
Yes, they're correct. The last commit is a solution but I don't like it. It's kind of pointing towards "no, snapshots really should have a ref count," but we saw in #2905 that it's pretty confusing trying to track where to release snapshots since they're sometimes used explicitly and sometimes used implicitly by the language service. Language service objects would have be disposed too, or always come with a snapshot release func, or be used inside a closure, or whatever. It's not impossible to implement carefully and correctly... maybe that's better? Ugh |
internal/project/ownercache.go
Outdated
| type OwnerCacheOptions struct { | ||
| DisableDeletion bool | ||
| } | ||
|
|
There was a problem hiding this comment.
Can we add documentation on what this is for?
There was a problem hiding this comment.
It wasn't used for anything, it was just copied from the ref count cache without its doc comment. Deleted.
There was a problem hiding this comment.
I meant on owner cache in general: what an owner cache is for. Also, I think we could probably merge the parse cache and this if we abstract the whole owner/ref counting thing into an interface or something, but not sure if it's worth it instead of maybe just adding a comment saying if you update one, you should update the other.
There was a problem hiding this comment.
Yeah, I thought about sharing some code with an interface, but didn't want to introduce any unnecessary overhead into the parse cache. I can add a doc comment.
That was where my PR was headed; I just never committed the wrapper closure code since we opted to not go that way. I don't really mind so long as things are working 😄 |
|
General question: I thought the reason the parse cache didn't make entries start with ref count 1 was that we could somehow parse a file that was temporary and didn't make it into a program. That's not possible, is it? |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@gabritto not exactly, you might be thinking of the awkwardness around updating a program:
|
Ah yes, I think that was it. Thanks for the explanation. Seems there's no ideal solution 😅 |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
I hate it all, but I don't have better ideas. We might consider going back to snapshot ref counts, but that would only eliminate the program counter's Add/Ref split and the extended config cache's TryAcquire. The parse cache complications would remain. |
|
|
||
| var missingFiles []string | ||
| files := make([]*ast.SourceFile, 0, totalFileCount-libFileCount) | ||
| duplicateSourceFiles := make([]*ast.SourceFile, 0) |
There was a problem hiding this comment.
Might as well just use var duplicateSourceFiles []*ast.SourceFile, otherwise it allocates an empty slice that's immediately overwritten on append.
Fixes #3086
Unfortunately, the combination of simplifications in #2483 and #3062 interacted badly and introduced two possible ways of releasing source files from the cache too early.
Bug 1: building auto-imports while building a program
#2483 made the LoadOrStore operation of the ref counting cache not actually increment the ref count of the entry so we could remove some special handling of source file ref counts around program cloning. Instead, we would ref everything in a dedicated pass at the end of snapshot cloning. That was intended as a simplification to make the code easier to reason about, but having any period of time where entries in the cache have
refCount == 0but can't be deleted was a bad idea. The failure mode:refCount == 0; theparseCache.Ref()call hasn't happened yet.The fix here is to revert the simplification from #2483 and have parse cache entries begin life with
refCount == 1.The actual bug fixed in #2483 was not about parse cache entries at all, but about extended config cache entries, which used the same underlying ref counting cache infrastructure. The issue with extended config entries was their natural pattern of being pulled on by the config parser was extremely hard to follow, such that it was incredibly difficult to impossible to ensure that every snapshot refs every extended config exactly one time if incrementing the counter happened as a side effect of config parsing—that was where a single dedicated loop of reffing instead of reffing on the fly really paid off. As an alternative, I've switched extended config caches to store a set of owners instead of a count. This way, config parsing can immediately and atomically add the snapshot as an owner of anything it pulls on, but we can also do a pass at the end of snapshot cloning to ref every extended config without being concerned about double-reffing.
These two changes are in 55c29ba.
Note: I considered using the an owner cache for the parse cache too (owner key for that being program ID, not snapshot ID), but because there are typically orders of magnitude more source files than extended config files, I wanted to avoid the overhead of allocating a map on every parse cache entry.
Bug 2: building auto-imports by cloning a disposed snapshot
#3062 was another simplification that asserted that we don't need ref counting on snapshots, because disposing a snapshot doesn't make it unclonable—at worst, it throws away parse cache entries that could have been reused, but they're reconstructable, and we don't have any code paths that would benefit from keeping the cache entries around.
That was true, but the implementation created an asymmetry when cloning a snapshot that's already been disposed:
Dispose()on the previous one, still held by Request A. The disposal drops the ref count for the old program to zero, which triggers the release of some files from the parse cache.refCount == 1for it, an operation that can normally only happen in conjunction with reffing all its files in the parse cache.The fix: for newly created programs, unconditionally add them to the program counter. Otherwise, increment their ref count only if they still have an entry in the counter. If the entry is gone, they've been previously disposed, and re-adding them would cause an eventual double-dispose. 07af679
Alternatives considered: many. Most tempting is to delete the program counter entirely and unconditionally ref every source file across every program in every snapshot clone exactly once per occurrence, and do the same exact loop in snapshot disposal. This was in an earlier commit of #3062, but I benchmarked it as adding about 10ms per 100k files. 100k files is a lot and 10ms is a little, but at the same time, snapshot clones are very hot (typically one per keystroke plus change), so at the time I decided it was worth the added mental load to keep it as cheap as possible. I could now be convinced to take on that cost, but, I think the fix here is correct, and actually not too much code. But I am getting tired of this stuff being a problem, so simplifying is tempting.