Description
Problem
The `stream.on("error", ...)` handler in `TaskHubGrpcWorker.internalRunWorker()` (`packages/durabletask-js/src/worker/task-hub-grpc-worker.ts`, line 384) only logs the error; it neither cleans up the stream nor attempts to reconnect:

```typescript
stream.on("error", (err: Error) => {
  if (this._stopWorker) {
    return;
  }
  WorkerLogs.streamErrorInfo(this._logger, err);
  // ← No cleanup, no retry
});
```
In contrast, the adjacent `stream.on("end", ...)` handler (line 370) correctly calls `removeAllListeners()`, `destroy()`, and `_createNewClientAndRetry()`.
Root Cause
In Node.js with `@grpc/grpc-js`, gRPC stream errors, especially transport-level failures such as `UNAVAILABLE` or abrupt network disconnections, may emit an `"error"` event without a subsequent `"end"` event. When this happens, the worker logs the error and then silently stops processing work items forever, because no recovery path is ever triggered.
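The asymmetry is easy to reproduce with a plain Node.js stream standing in for the gRPC call stream (a hypothetical illustration, not the worker's actual code): destroying a stream with an error emits `"error"` (and `"close"`), but never `"end"`, so recovery logic wired only to `"end"` never runs.

```typescript
import { PassThrough } from "node:stream";

// Track which lifecycle events actually fire after a transport-style failure.
const events: string[] = [];
const stream = new PassThrough();
stream.on("end", () => events.push("end"));
stream.on("close", () => events.push("close"));
stream.on("error", () => events.push("error"));

// Simulate an abrupt transport failure (e.g. UNAVAILABLE).
stream.destroy(new Error("transport failure"));

setImmediate(() => {
  // "error" fires; "end" never does.
  console.log(events.join(","));
});
```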
The `stop()` method (line 424) already accounts for this asymmetry by listening for `"end"`, `"close"`, or `"error"`, which confirms the developers know that `"error"` can fire alone.
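As a rough sketch (assumed shape, not the library's actual code), the pattern `stop()` reportedly uses can be expressed as a helper that resolves on whichever terminal event fires first:

```typescript
import { EventEmitter } from "node:events";

// Resolve once any terminal stream event fires, since "error" can
// arrive without "end" ever being emitted.
function waitForStreamClose(stream: EventEmitter): Promise<string> {
  return new Promise((resolve) => {
    for (const ev of ["end", "close", "error"]) {
      stream.once(ev, () => resolve(ev));
    }
  });
}

// Demonstration: an "error" alone is enough to settle the promise.
const s = new EventEmitter();
const done = waitForStreamClose(s);
s.emit("error", new Error("boom"));
done.then((ev) => console.log(ev));
```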
Proposed Fix
Add stream cleanup and retry logic to the error handler, mirroring the `"end"` handler pattern:
- Call `stream.removeAllListeners()` to prevent double recovery if both events fire
- Add a no-op `stream.on("error", () => {})` guard to prevent unhandled-error crashes from stale events
- Call `stream.destroy()` to clean up the stream
- Call `_createNewClientAndRetry()` to reconnect
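The steps above can be sketched in a self-contained form. A `PassThrough` stands in for the gRPC stream, and `MockWorker` is a hypothetical stand-in for `TaskHubGrpcWorker` (the names `_stopWorker` and `_createNewClientAndRetry` are taken from the issue; the surrounding code is assumed):

```typescript
import { PassThrough } from "node:stream";

class MockWorker {
  _stopWorker = false;
  retries = 0;

  attach(stream: PassThrough): void {
    stream.on("error", (err: Error) => {
      if (this._stopWorker) {
        return;
      }
      console.error("stream error:", err.message); // stand-in for WorkerLogs.streamErrorInfo
      stream.removeAllListeners();   // prevent double recovery if "end" also fires
      stream.on("error", () => {});  // no-op guard against stale late errors
      stream.destroy();              // release the broken stream
      this._createNewClientAndRetry(); // reconnect
    });
  }

  _createNewClientAndRetry(): void {
    this.retries += 1;
  }
}

const worker = new MockWorker();
const s = new PassThrough();
worker.attach(s);
s.emit("error", new Error("UNAVAILABLE")); // triggers cleanup + reconnect
s.emit("error", new Error("stale event")); // swallowed by the no-op guard
console.log(worker.retries);               // reconnect triggered exactly once
```

Because `removeAllListeners()` runs before the no-op guard is attached, a late duplicate `"error"` (or a trailing `"end"`) cannot trigger a second reconnect or crash the process.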
Impact
Severity: High. Affected workers silently stop processing orchestration, activity, and entity work items after a transport-level gRPC error. This can happen in production when network connectivity is temporarily lost, a load balancer resets connections, or the sidecar restarts. The worker appears healthy (no crash, no error logged at error level) but is effectively dead.