Stalled REST fallback streaming reads can leave a Firestore ClientPool client checked out indefinitely, and later reads keep reacquiring it #7960
Summary
This is a separate but related Firestore-side consequence of the REST fallback transport defect reported in #7959.
We are not claiming Firestore causes the original transport stall. We are claiming Firestore ClientPool behavior amplifies the impact of a single stuck REST server-streaming read:
- one stuck REST fallback read can keep a client permanently active
- later reads continue reacquiring that same client
- readiness checks and low-concurrency reads can remain poisoned until the process restarts
The production symptom for us was that one instance became persistently unhealthy on readiness while a fresh process on the same host was healthy.
What we confirmed
In traced soak logs we observed:
Healthy path:
ACQUIRE -> SEND -> STUB_CALL -> RESPONSE -> STREAM_END -> RELEASE
Poisoned path:
ACQUIRE -> SEND -> STUB_CALL
then no RESPONSE, no STREAM_END, no RELEASE
Later reads continued to reacquire the same client while its active count rose.
Our local repro also showed Firestore ClientPool preferring the most-full existing client, which lets a single poisoned client keep attracting future reads.
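To make the selection behavior concrete, here is a toy model of a "prefer the most-full client" policy. It is a sketch only, not the actual ClientPool code: `ToyClient`, `ToyPool`, and the limit of 100 are hypothetical names and values chosen for illustration. It models only how reads are routed, not the transport stall itself.

```typescript
// Toy model only: ToyClient, ToyPool, and the limit value are hypothetical
// and do not reproduce the real Firestore ClientPool implementation.
interface ToyClient {
  id: number;
  active: number; // requests currently checked out on this client
}

class ToyPool {
  private clients: ToyClient[] = [];
  private nextId = 0;
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Pick the busiest client that still has capacity.
  // A new client is created only when no existing client fits.
  acquire(): ToyClient {
    let chosen: ToyClient | undefined;
    for (const c of this.clients) {
      if (c.active < this.limit && (chosen === undefined || c.active > chosen.active)) {
        chosen = c;
      }
    }
    if (chosen === undefined) {
      chosen = { id: this.nextId++, active: 0 };
      this.clients.push(chosen);
    }
    chosen.active++;
    return chosen;
  }

  release(client: ToyClient): void {
    client.active--;
  }
}

const pool = new ToyPool(100);
const stuck = pool.acquire(); // simulates the stalled read: never released

const laterReads: number[] = [];
for (let i = 0; i < 3; i++) {
  const c = pool.acquire();
  laterReads.push(c.id);
  pool.release(c); // healthy reads complete and release
}

// Every later read lands on the client that still holds the stuck request.
console.log(laterReads, stuck.active);
```

In this model the stuck client's active count never returns to zero, so it always looks like the best candidate and never becomes idle.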
Scope
This report is specifically about Firestore pool reuse / stale-client behavior after a stalled REST fallback streaming read.
It is not a claim that Firestore causes the original transport defect. That transport defect is tracked separately in #7959.
Why this matters
Even if the original network stall is transient, the process can remain operationally poisoned for much longer because later reads keep landing on the same stuck client.
Requested maintainer feedback
- Should ClientPool avoid reacquiring clients with very old active streaming requests?
- Is there any current or planned client eviction / stale-stream detection for this case?
- Would maintainers prefer this tracked separately, or folded into an existing internal/public Firestore thread?
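To illustrate what we mean by stale-stream detection in the first question, here is a minimal sketch of an age-aware selection step. Everything in it is an assumption for discussion: `TrackedClient`, `pickClient`, and `maxActiveAgeMs` are hypothetical names, not part of the Firestore SDK, and a real implementation would likely need to exempt intentionally long-lived streams such as listeners.

```typescript
// Hypothetical sketch: TrackedClient, pickClient, and maxActiveAgeMs are
// illustrative names and are not part of the Firestore SDK.
interface TrackedClient {
  id: number;
  activeStartTimes: number[]; // start time (ms) of each in-flight request
}

// Age of the oldest in-flight request, or 0 if the client is idle.
function oldestActiveAgeMs(client: TrackedClient, nowMs: number): number {
  if (client.activeStartTimes.length === 0) return 0;
  return nowMs - Math.min(...client.activeStartTimes);
}

// Prefer the most-full client with capacity, but skip any client whose
// oldest in-flight request is older than maxActiveAgeMs. Returns undefined
// when the caller should create a fresh client instead.
function pickClient(
  clients: TrackedClient[],
  limit: number,
  maxActiveAgeMs: number,
  nowMs: number
): TrackedClient | undefined {
  let chosen: TrackedClient | undefined;
  for (const c of clients) {
    const hasCapacity = c.activeStartTimes.length < limit;
    const isStale = oldestActiveAgeMs(c, nowMs) > maxActiveAgeMs;
    if (!hasCapacity || isStale) continue;
    if (chosen === undefined || c.activeStartTimes.length > chosen.activeStartTimes.length) {
      chosen = c;
    }
  }
  return chosen;
}

const nowMs = 600_000;
const poisoned: TrackedClient = { id: 0, activeStartTimes: [0] }; // in flight for 10 minutes
const healthy: TrackedClient = { id: 1, activeStartTimes: [599_000] }; // in flight for 1 second

const picked = pickClient([poisoned, healthy], 100, 60_000, nowMs);
const pickedWhenOnlyPoisoned = pickClient([poisoned], 100, 60_000, nowMs);
console.log(picked?.id, pickedWhenOnlyPoisoned);
```

With a 60-second threshold, the client holding the 10-minute-old request is skipped, and when it is the only client the caller is told to create a new one.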