Stalled REST fallback streaming reads can leave a Firestore ClientPool client checked out indefinitely, and later reads keep reacquiring it #7960

@akshatbaranwal

Summary

This is a separate but related Firestore-side consequence of the REST fallback transport defect reported in #7959.

We are not claiming Firestore causes the original transport stall. We are claiming Firestore ClientPool behavior amplifies the impact of a single stuck REST server-streaming read:

  • one stuck REST fallback read can keep a client permanently active
  • later reads continue reacquiring that same client
  • readiness checks and other low-concurrency reads can keep stalling on that client until the process restarts

In production, the symptom was that one instance persistently failed its readiness check while a fresh process on the same host stayed healthy.

What we confirmed

In traced soak logs we observed:

Healthy path:

  • ACQUIRE -> SEND -> STUB_CALL -> RESPONSE -> STREAM_END -> RELEASE

Poisoned path:

  • ACQUIRE -> SEND -> STUB_CALL
  • then no RESPONSE, no STREAM_END, no RELEASE
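The two paths above can be told apart mechanically. The following is a minimal sketch, assuming each trace line carries a request identifier; the `TraceLine` shape and `requestId` field are hypothetical and are not the format of our actual soak logs. It reports every request that was acquired but never released.

```typescript
// Event names are taken from the traces above; the line shape is an assumption.
type TraceEvent =
  | "ACQUIRE"
  | "SEND"
  | "STUB_CALL"
  | "RESPONSE"
  | "STREAM_END"
  | "RELEASE";

interface TraceLine {
  requestId: string; // hypothetical correlation id
  event: TraceEvent;
}

// Returns the ids of requests that have an ACQUIRE with no matching RELEASE.
function findUnreleased(lines: TraceLine[]): string[] {
  const open: string[] = [];
  for (const line of lines) {
    if (line.event === "ACQUIRE") {
      open.push(line.requestId);
    } else if (line.event === "RELEASE") {
      const i = open.indexOf(line.requestId);
      if (i !== -1) {
        open.splice(i, 1);
      }
    }
  }
  return open;
}

const healthy: TraceEvent[] = [
  "ACQUIRE", "SEND", "STUB_CALL", "RESPONSE", "STREAM_END", "RELEASE",
];
const poisoned: TraceEvent[] = ["ACQUIRE", "SEND", "STUB_CALL"];

const trace: TraceLine[] = [
  ...healthy.map((event) => ({ requestId: "r1", event })),
  ...poisoned.map((event) => ({ requestId: "r2", event })),
];

const unreleased = findUnreleased(trace);
console.log(unreleased); // [ 'r2' ]
```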

Later reads continued to reacquire the same client while its active count rose.

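Our local repro also showed Firestore ClientPool preferring the most-full existing client, meaning the one with the most in-flight requests that still has spare capacity. Because a stuck request never releases, the poisoned client always looks at least as full as any healthy one, so it keeps attracting future reads.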
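To illustrate the selection behavior we saw, here is a simplified model of a pool that prefers the most-full client. It is a sketch, not the library's code: `MostFullPool`, `PooledClient`, and the limit of 100 are assumptions made for this example. Reads that land on the stuck client are modeled as never releasing, matching the rising active count in our logs.

```typescript
// Simplified model only; names and the concurrency limit are assumptions.
interface PooledClient {
  id: number;
  active: number; // in-flight requests attributed to this client
}

class MostFullPool {
  private clients: PooledClient[] = [];
  private nextId = 0;

  constructor(private readonly limit: number) {}

  // Pick the client with the most in-flight requests that still has capacity;
  // create a new client only when none qualifies.
  acquire(): PooledClient {
    let chosen: PooledClient | undefined;
    for (const c of this.clients) {
      if (c.active < this.limit && (!chosen || c.active > chosen.active)) {
        chosen = c;
      }
    }
    if (!chosen) {
      chosen = { id: this.nextId++, active: 0 };
      this.clients.push(chosen);
    }
    chosen.active++;
    return chosen;
  }

  release(c: PooledClient): void {
    c.active--;
  }
}

function simulate(): { ids: number[]; active: number } {
  const pool = new MostFullPool(100);
  const stuck = pool.acquire(); // models the stalled stream: never released
  const ids: number[] = [];
  for (let i = 0; i < 5; i++) {
    // Later reads are also never released, as observed on the poisoned client.
    ids.push(pool.acquire().id);
  }
  return { ids, active: stuck.active };
}

const result = simulate();
console.log(result.ids, result.active); // [ 0, 0, 0, 0, 0 ] 6
```

In this model no second client is ever created until the stuck one reaches the limit, which is consistent with a low-concurrency workload staying pinned to the poisoned client.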

Scope

This report is specifically about Firestore pool reuse / stale-client behavior after a stalled REST fallback streaming read.

It is not a claim that Firestore causes the original transport defect. That transport defect is tracked separately in #7959.

Why this matters

Even if the original network stall is transient, the process can remain operationally poisoned for much longer because later reads keep landing on the same stuck client.

Requested maintainer feedback

  1. Should ClientPool avoid reacquiring clients with very old active streaming requests?
  2. Is there any current or planned client eviction / stale-stream detection for this case?
  3. Would maintainers prefer this tracked separately, or folded into an existing internal/public Firestore thread?
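To make question 1 concrete, here is one possible shape for age-aware selection. It is a sketch under stated assumptions, not a proposed patch: `AgeAwarePool`, `maxAgeMs`, and the injected `now` clock are hypothetical names, and the real ClientPool may track requests differently.

```typescript
// Hypothetical sketch: skip clients whose oldest in-flight request is too old.
interface TrackedClient {
  id: number;
  active: number;
  starts: { [requestId: number]: number }; // start time (ms) per request
}

class AgeAwarePool {
  private clients: TrackedClient[] = [];
  private nextClientId = 0;
  private nextRequestId = 0;

  constructor(
    private readonly limit: number,
    private readonly maxAgeMs: number,
    private readonly now: () => number = () => Date.now()
  ) {}

  // A client is stale if any of its in-flight requests exceeds maxAgeMs.
  private isStale(c: TrackedClient): boolean {
    const t = this.now();
    return Object.keys(c.starts).some(
      (k) => t - c.starts[Number(k)] > this.maxAgeMs
    );
  }

  acquire(): { client: TrackedClient; requestId: number } {
    let chosen: TrackedClient | undefined;
    for (const c of this.clients) {
      if (c.active >= this.limit || this.isStale(c)) {
        continue; // full or stale clients are not eligible for new reads
      }
      if (!chosen || c.active > chosen.active) {
        chosen = c;
      }
    }
    if (!chosen) {
      chosen = { id: this.nextClientId++, active: 0, starts: {} };
      this.clients.push(chosen);
    }
    const requestId = this.nextRequestId++;
    chosen.starts[requestId] = this.now();
    chosen.active++;
    return { client: chosen, requestId };
  }

  release(client: TrackedClient, requestId: number): void {
    if (requestId in client.starts) {
      delete client.starts[requestId];
      client.active--;
    }
  }
}

// Usage with a fake clock so the behavior is deterministic.
let clock = 0;
const pool = new AgeAwarePool(100, 1000, () => clock);
const stuckRead = pool.acquire(); // never released
clock = 500;
const early = pool.acquire(); // stuck request is 500 ms old: same client
pool.release(early.client, early.requestId);
clock = 5000;
const late = pool.acquire(); // stuck request is 5000 ms old: fresh client
console.log(stuckRead.client.id, early.client.id, late.client.id); // 0 0 1
```

This only steers new reads away from the stale client; it does not cancel the stuck request or evict the client. A real implementation would also need to exempt intentionally long-lived streams, such as listeners, or use a separate threshold for them.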
