[copilot-finds] Bug: gRPC stream error handler does not recover, causing silent worker hang #167

@github-actions

Problem

The stream.on("error", ...) handler in TaskHubGrpcWorker.internalRunWorker() (packages/durabletask-js/src/worker/task-hub-grpc-worker.ts, line 384) only logs the error but does not clean up the stream or attempt to reconnect:

stream.on("error", (err: Error) => {
  if (this._stopWorker) {
    return;
  }
  WorkerLogs.streamErrorInfo(this._logger, err);
  // ← No cleanup, no retry
});

In contrast, the adjacent stream.on("end", ...) handler (line 370) correctly calls removeAllListeners(), destroy(), and _createNewClientAndRetry().

Root Cause

In Node.js with @grpc/grpc-js, gRPC stream errors — especially transport-level failures like UNAVAILABLE or abrupt network disconnections — may emit an "error" event without a subsequent "end" event. When this happens, the worker logs the error and then silently stops processing work items forever, because no recovery path is triggered.

The stop() method (line 424) already accounts for this asymmetry by listening for "end", "close", or "error", which suggests the developers were aware that "error" can fire without a following "end".

Proposed Fix

Add stream cleanup and retry logic to the error handler, mirroring the "end" handler pattern:

  1. Call stream.removeAllListeners() to prevent double recovery if both "error" and "end" fire.
  2. Re-attach a no-op stream.on("error", () => {}) guard. With all listeners removed, a late "error" event would otherwise have no handler, and Node.js throws on an unhandled "error" event.
  3. Call stream.destroy() to clean up the stream.
  4. Call _createNewClientAndRetry() to reconnect.
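
The four steps above can be sketched as follows. This is a minimal illustration, not the actual patch: attachErrorRecovery, RecoverableStream, and the isStopping, reconnect, and logError callbacks are hypothetical stand-ins for this._stopWorker, _createNewClientAndRetry(), and WorkerLogs.streamErrorInfo(), and a Node.js PassThrough stream stands in for the gRPC stream so the snippet runs on its own.

```typescript
import { PassThrough } from "node:stream";

// Hypothetical minimal shape of the stream; the real worker holds a grpc-js stream.
interface RecoverableStream {
  on(event: string, listener: (...args: any[]) => void): unknown;
  removeAllListeners(): unknown;
  destroy(): unknown;
}

// Sketch of an "error" handler that recovers instead of only logging.
function attachErrorRecovery(
  stream: RecoverableStream,
  isStopping: () => boolean,      // stand-in for this._stopWorker
  reconnect: () => void,          // stand-in for _createNewClientAndRetry()
  logError: (err: Error) => void, // stand-in for WorkerLogs.streamErrorInfo()
): void {
  stream.on("error", (err: Error) => {
    if (isStopping()) {
      return; // shutdown in progress: do not reconnect
    }
    logError(err);
    stream.removeAllListeners();  // 1. no second recovery from a later "end"
    stream.on("error", () => {}); // 2. swallow stale errors on the dead stream
    stream.destroy();             // 3. release the stream
    reconnect();                  // 4. start a fresh connection
  });
}

// Usage: simulate a transport failure that emits "error" with no "end".
const demoStream = new PassThrough();
let demoReconnects = 0;
const demoLogged: string[] = [];

attachErrorRecovery(
  demoStream,
  () => false,
  () => { demoReconnects += 1; },
  (err) => { demoLogged.push(err.message); },
);

demoStream.emit("error", new Error("14 UNAVAILABLE"));
demoStream.emit("error", new Error("stale event")); // hits the no-op guard
```

After the first simulated error the stream is destroyed and one reconnect has been requested. The second error lands on the no-op guard, so it neither throws nor triggers another reconnect.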

Impact

Severity: High. Affected workers silently stop processing orchestration, activity, and entity work items after a transport-level gRPC error. This can happen in production when network connectivity is temporarily lost, load balancers reset connections, or the sidecar restarts. The worker appears healthy (no crash, nothing logged at error level) but is effectively dead.
