Fix ref count cache inconsistencies by andrewbranch · Pull Request #3176 · microsoft/typescript-go

andrewbranch · 2026-03-20T19:32:07Z

Unfortunately, the combination of simplifications in #2483 and #3062 interacted badly and introduced two possible ways of releasing source files from the cache too early.

Bug 1: building auto-imports while building a program

#2483 made the LoadOrStore operation of the ref counting cache not actually increment the ref count of the entry so we could remove some special handling of source file ref counts around program cloning. Instead, we would ref everything in a dedicated pass at the end of snapshot cloning. That was intended as a simplification to make the code easier to reason about, but having any period of time where entries in the cache have refCount == 0 but can't be deleted was a bad idea. The failure mode:

Request A causes a snapshot clone that builds a new program. Execution advances past program creation such that at least one source file is present in the parse cache with refCount == 0; the parseCache.Ref() call hasn't happened yet.
Request B meanwhile triggers an auto-import registry build. It gets the same source file created by the program in Request A, increments its ref count to 1, finishes work, and decrements its ref count to 0, causing it to be deleted from the cache.
Request A finally tries to ref every file included in the program it built, which panics because that file has been deleted from the cache in the meantime.

The fix here is to revert the simplification from #2483 and have parse cache entries begin life with refCount == 1.

The actual bug fixed in #2483 was not about parse cache entries at all, but about extended config cache entries, which used the same underlying ref counting cache infrastructure. The issue with extended config entries was their natural pattern of being pulled on by the config parser was extremely hard to follow, such that it was incredibly difficult to impossible to ensure that every snapshot refs every extended config exactly one time if incrementing the counter happened as a side effect of config parsing—that was where a single dedicated loop of reffing instead of reffing on the fly really paid off. As an alternative, I've switched extended config caches to store a set of owners instead of a count. This way, config parsing can immediately and atomically add the snapshot as an owner of anything it pulls on, but we can also do a pass at the end of snapshot cloning to ref every extended config without being concerned about double-reffing.

These two changes are in 55c29ba.

Note: I considered using the an owner cache for the parse cache too (owner key for that being program ID, not snapshot ID), but because there are typically orders of magnitude more source files than extended config files, I wanted to avoid the overhead of allocating a map on every parse cache entry.

Bug 2: building auto-imports by cloning a disposed snapshot

#3062 was another simplification that asserted that we don't need ref counting on snapshots, because disposing a snapshot doesn't make it unclonable—at worst, it throws away parse cache entries that could have been reused, but they're reconstructable, and we don't have any code paths that would benefit from keeping the cache entries around.

That was true, but the implementation created an asymmetry when cloning a snapshot that's already been disposed:

Request A asks for completions, obtains the current snapshot, and discovers that it needs to clone it to add auto-imports.
Request B meanwhile processes a document change and produces a new snapshot, calling Dispose() on the previous one, still held by Request A. The disposal drops the ref count for the old program to zero, which triggers the release of some files from the parse cache.
Request A clones the disposed snapshot to add auto-import info to it. No programs are updated, so parse cache entries are not incremented, but the old program still in the snapshot gets reffed in the program counter, recreating a new entry with refCount == 1 for it, an operation that can normally only happen in conjunction with reffing all its files in the parse cache.
Request A finishes, and since the session state has already moved on, the snapshot with added auto-imports gets disposed, not adopted by the session. Disposal derefs the program and its ref count reaches zero for the second time, triggering an extra, unbalanced deref of all its source files.
Request C processes a single-file change. Each file in the new program is reffed, but any that reached a ref count of zero in the previous step are missing, triggering the panic.

The fix: for newly created programs, unconditionally add them to the program counter. Otherwise, increment their ref count only if they still have an entry in the counter. If the entry is gone, they've been previously disposed, and re-adding them would cause an eventual double-dispose. 07af679

Alternatives considered: many. Most tempting is to delete the program counter entirely and unconditionally ref every source file across every program in every snapshot clone exactly once per occurrence, and do the same exact loop in snapshot disposal. This was in an earlier commit of #3062, but I benchmarked it as adding about 10ms per 100k files. 100k files is a lot and 10ms is a little, but at the same time, snapshot clones are very hot (typically one per keystroke plus change), so at the time I decided it was worth the added mental load to keep it as cheap as possible. I could now be convinced to take on that cost, but, I think the fix here is correct, and actually not too much code. But I am getting tired of this stuff being a problem, so simplifying is tempting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

andrewbranch · 2026-03-20T19:45:08Z

internal/project/refcountcache.go

 	mu       sync.Mutex
 	value    V
 	refCount int
-	deleted  bool


No longer needed now that new entries always start with refCount == 1, at least that's nice

Copilot

Pull request overview

This PR addresses cache lifetime/ref-counting races in the project system (parse cache + extended tsconfig cache) that can lead to premature cache entry deletion and panics during concurrent snapshot cloning and auto-import preparation.

Changes:

Restore “acquire-on-load” semantics for the parse cache so new source file cache entries start with refCount == 1, and adjust program cloning to ref unchanged files when UpdateProgram reuses them.
Replace extended tsconfig ref-counting with snapshot-ID ownership tracking (OwnerCache) and update clone/dispose paths accordingly.
Add/adjust tests covering cloning disposed snapshots and correct cache entry release behavior.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
internal/project/snapshot.go	Updates program counter semantics for reused vs new programs; switches extended-config tracking to snapshot ownership acquire/release.
internal/project/session.go	Updates cache stats logging to reflect `OwnerCache` entry type.
internal/project/refcountcache.go	Reintroduces `Acquire` semantics for ref-counted cache entries (used by parse cache).
internal/project/parsecache.go	Updates parse cache to use the new `RefCountCache.Acquire`-only API.
internal/project/compilerhost.go	Ensures program construction acquires parse cache entries immediately via `Acquire`.
internal/project/project.go	When `UpdateProgram` clones, refs unchanged files since only the changed file is acquired during update.
internal/project/programcounter.go	Splits “new program” (`Add`) vs “ref existing only” (`Ref` returning bool) to avoid double-dispose patterns.
internal/project/autoimport.go	Uses `Acquire` instead of `Load`+`Ref` for auto-import registry source file access and balances via `Deref` on dispose.
internal/project/ownercache.go	Introduces `OwnerCache` for snapshot ownership tracking (used by extended config cache).
internal/project/extendedconfigcache.go	Switches extended config caching from ref-counting to `OwnerCache` with expiration logic.
internal/project/configfileregistrybuilder.go	Acquires extended configs using the current snapshot ID during config parsing.
internal/project/projectcollectionbuilder.go	Passes snapshot ID into config file registry builder to enable per-snapshot ownership.
internal/project/refcountcache_test.go	Expands tests for parse cache lifetime and cloning disposed snapshots.
internal/project/extendedconfigcache_test.go	Updates test expectations from ref counts to owner counts and adds ownership scenarios.

internal/project/snapshot.go

internal/project/ownercache.go

jakebailey

I definitely like the invariant of ensuring we start with refcount at 1

jakebailey · 2026-03-20T20:10:58Z

Then again, the copilot comments do sound plausible

…ethod

andrewbranch · 2026-03-20T20:27:02Z

Yes, they're correct. The last commit is a solution but I don't like it. It's kind of pointing towards "no, snapshots really should have a ref count," but we saw in #2905 that it's pretty confusing trying to track where to release snapshots since they're sometimes used explicitly and sometimes used implicitly by the language service. Language service objects would have be disposed too, or always come with a snapshot release func, or be used inside a closure, or whatever. It's not impossible to implement carefully and correctly... maybe that's better? Ugh

gabritto · 2026-03-20T20:28:32Z

internal/project/ownercache.go

+type OwnerCacheOptions struct {
+	DisableDeletion bool
+}
+


Can we add documentation on what this is for?

It wasn't used for anything, it was just copied from the ref count cache without its doc comment. Deleted.

I meant on owner cache in general: what an owner cache is for. Also, I think we could probably merge the parse cache and this if we abstract the whole owner/ref counting thing into an interface or something, but not sure if it's worth it instead of maybe just adding a comment saying if you update one, you should update the other.

Yeah, I thought about sharing some code with an interface, but didn't want to introduce any unnecessary overhead into the parse cache. I can add a doc comment.

jakebailey · 2026-03-20T20:35:52Z

Language service objects would have be disposed too, or always come with a snapshot release func, or be used inside a closure, or whatever. It's not impossible to implement carefully and correctly... maybe that's better? Ugh

That was where my PR was headed; I just never committed the wrapper closure code since we opted to not go that way. I don't really mind so long as things are working 😄

gabritto · 2026-03-20T20:43:53Z

General question: I thought the reason the parse cache didn't make entries start with ref count 1 was that we could somehow parse a file that was temporary and didn't make it into a program. That's not possible, is it?

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

andrewbranch · 2026-03-20T21:19:49Z

@gabritto not exactly, you might be thinking of the awkwardness around updating a program:

if multiple files have changed, we don't try to do a cheap clone and every file in the new program gets acquired once
if one file has changed, UpdateProgram acquires it to determine if it allows for a cheap clone, then:
- if cloning is allowed, all other source files get copied in without being acquired, so we need to ref every file except that changed one afterwards
- if cloning is not allowed, all files including the changed one get acquired again, resulting in the changed one being reffed twice, so we need to deref it afterwards. This was missing until the last commit, including prior to Fix extended config ref counting #2483—it was an undiscovered memory leak. That's a great example of why building programs without reffing, and then just reffing whatever ends up in the program, is easier 😞

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

internal/project/compilerhost.go

gabritto · 2026-03-20T22:16:59Z

@gabritto not exactly, you might be thinking of the awkwardness around updating a program:

if multiple files have changed, we don't try to do a cheap clone and every file in the new program gets acquired once

if one file has changed, UpdateProgram acquires it to determine if it allows for a cheap clone, then:

if cloning is allowed, all other source files get copied in without being acquired, so we need to ref every file except that changed one afterwards

if cloning is not allowed, all files including the changed one get acquired again, resulting in the changed one being reffed twice, so we need to deref it afterwards. This was missing until the last commit, including prior to Fix extended config ref counting #2483—it was an undiscovered memory leak. That's a great example of why building programs without reffing, and then just reffing whatever ends up in the program, is easier 😞

Ah yes, I think that was it. Thanks for the explanation. Seems there's no ideal solution 😅

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…Files

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.

andrewbranch · 2026-03-20T23:39:30Z

I hate it all, but I don't have better ideas. We might consider going back to snapshot ref counts, but that would only eliminate the program counter's Add/Ref split and the extended config cache's TryAcquire. The parse cache complications would remain.

jakebailey · 2026-03-21T00:11:09Z

internal/compiler/filesparser.go


 	var missingFiles []string
 	files := make([]*ast.SourceFile, 0, totalFileCount-libFileCount)
+	duplicateSourceFiles := make([]*ast.SourceFile, 0)


Might as well just use var duplicateSourceFiles []*ast.SourceFile, otherwise it allocates an empty slice that's immediately overwritten on append.

andrewbranch and others added 2 commits March 20, 2026 09:07

Use owner cache and refcount=1

55c29ba

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Prevent stale clone program resurrection

07af679

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

andrewbranch changed the title ~~Copilot/general herring~~ Fix ref count cache inconsistencies Mar 20, 2026

andrewbranch added 2 commits March 20, 2026 12:36

isExpired never used for RefCountCache

458d63b

AcquireWithStatus never used

293e49e

andrewbranch commented Mar 20, 2026

View reviewed changes

Comment up program counter usage

790f0cd

andrewbranch marked this pull request as ready for review March 20, 2026 19:50

Copilot AI review requested due to automatic review settings March 20, 2026 19:50

Copilot started reviewing on behalf of andrewbranch March 20, 2026 19:51 View session

andrewbranch requested review from gabritto and jakebailey March 20, 2026 19:52

Lint

05a6c03

Copilot AI reviewed Mar 20, 2026

View reviewed changes

internal/project/snapshot.go Show resolved Hide resolved

internal/project/ownercache.go Show resolved Hide resolved

jakebailey approved these changes Mar 20, 2026

View reviewed changes

andrewbranch added 2 commits March 20, 2026 13:12

Add test for stale clone handling in extended config cache

326e86c

Implement TryAcquire method in OwnerCache and update usage in Clone m…

ab69701

…ethod

gabritto reviewed Mar 20, 2026

View reviewed changes

Delete unused options

2b7495d

Refine cache ownership handling

e1906c4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

andrewbranch requested a review from Copilot March 20, 2026 21:22

Copilot started reviewing on behalf of andrewbranch March 20, 2026 21:23 View session

Copilot AI reviewed Mar 20, 2026

View reviewed changes

internal/project/compilerhost.go Show resolved Hide resolved

Add doc comment for OwnerCache

3b7fe0f

andrewbranch and others added 3 commits March 20, 2026 15:41

Ensure Program only acquires files once per tspath.Path

3c07d9f

Format

301f7ab

Track duplicate source files in Program

e8d1785

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

andrewbranch requested a review from Copilot March 20, 2026 23:31

Copilot started reviewing on behalf of andrewbranch March 20, 2026 23:32 View session

Add comment to clarify package deduplication behavior in getProcessed…

993bf26

…Files

Copilot AI reviewed Mar 20, 2026

View reviewed changes

jakebailey reviewed Mar 21, 2026

View reviewed changes

Conversation

andrewbranch commented Mar 20, 2026

Bug 1: building auto-imports while building a program

Bug 2: building auto-imports by cloning a disposed snapshot

Uh oh!

andrewbranch Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

jakebailey left a comment

Choose a reason for hiding this comment

Uh oh!

jakebailey commented Mar 20, 2026

Uh oh!

andrewbranch commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gabritto Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

andrewbranch Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

gabritto Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

andrewbranch Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

jakebailey commented Mar 20, 2026

Uh oh!

gabritto commented Mar 20, 2026

Uh oh!

andrewbranch commented Mar 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

gabritto commented Mar 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

andrewbranch commented Mar 20, 2026

Uh oh!

jakebailey Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

andrewbranch commented Mar 20, 2026 •

edited

Loading