fix: Sub-orchestration watcher uses unintended 30s default timeout in InMemoryOrchestrationBackend#156
Conversation
… InMemoryOrchestrationBackend The watchSubOrchestration method in InMemoryOrchestrationBackend intended to wait indefinitely for sub-orchestrations to complete (as documented by the inline comment 'No timeout'), but actually used the default 30-second timeout from waitForState. After 30 seconds, the watcher silently dropped the completion event via .catch(), causing the parent orchestration to hang indefinitely. Fix: - Modified waitForState to support no-timeout mode when timeoutMs is 0 - Updated watchSubOrchestration to pass timeoutMs=0 matching the documented intent - Added tests for sub-orchestrations with timer delays and failure propagation - Added tests for waitForState zero-timeout behavior and cleanup on reset Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes an in-memory testing backend bug where sub-orchestration completion watching unintentionally timed out after 30 seconds (and silently swallowed the timeout), preventing the parent orchestration from receiving the child completion/failure event.
Changes:
- Add “no-timeout” support to
InMemoryOrchestrationBackend.waitForStatewhentimeoutMs === 0(skipsetTimeoutcreation). - Update sub-orchestration watcher to pass
0towaitForStateto truly wait indefinitely. - Add unit tests covering sub-orchestration timer delays/failure propagation and
waitForState(..., 0)behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/durabletask-js/src/testing/in-memory-backend.ts | Implements no-timeout mode for state waiting and applies it to sub-orchestration watching. |
| packages/durabletask-js/test/in-memory-backend.spec.ts | Adds regression/behavior tests around sub-orchestrations and zero-timeout waiting. |
| this.waitForState( | ||
| subInstanceId, | ||
| (inst) => this.isTerminalStatus(inst.status), | ||
| // No timeout - sub-orchestration will eventually complete, fail, or be terminated | ||
| // If parent is terminated, we check that when delivering the event | ||
| 0, // No timeout — sub-orchestration will eventually complete, fail, or be terminated | ||
| ) |
There was a problem hiding this comment.
Now that waitForState(..., 0) can’t time out, the .catch comment below that mentions “Timeout” is misleading. Consider updating the comment to reflect the only expected cancellation path here (backend reset), or explicitly note any other errors you intend to swallow.
| timer = setTimeout(() => { | ||
| const waiters = this.stateWaiters.get(instanceId); | ||
| if (waiters) { | ||
| const index = waiters.findIndex((w) => w.resolve === resolve); | ||
| if (index >= 0) { | ||
| waiters.splice(index, 1); | ||
| } | ||
| } | ||
| } | ||
| reject(new Error(`Timeout waiting for orchestration '${instanceId}'`)); | ||
| }, timeoutMs); | ||
| reject(new Error(`Timeout waiting for orchestration '${instanceId}'`)); |
There was a problem hiding this comment.
The timeout cleanup logic won’t remove the correct waiter: waiters.findIndex((w) => w.resolve === resolve) compares the waiter wrapper function to the Promise’s resolve, so the index will be -1. This leaves timed-out waiters in stateWaiters, which can cause memory leaks and later resolve() calls on an already-settled promise. Consider tracking/removing the waiter by reference (e.g., capture waiter in the timeout callback and findIndex((w) => w === waiter)) and rejecting via waiter.reject(...) so cleanup is centralized.
| // When timeoutMs is 0, no timeout is applied — the waiter will only be | ||
| // resolved by a matching state change or rejected by reset(). | ||
| let timer: ReturnType<typeof setTimeout> | undefined; |
There was a problem hiding this comment.
timeoutMs=0 now means “no timeout”. Since this is a non-obvious sentinel value (and differs from common “0 means immediate timeout” expectations), it would be good to reflect it in the waitForState JSDoc/parameter description so callers don’t accidentally create an unbounded wait.
| const childWithTimer: TOrchestrator = async function* (ctx: OrchestrationContext): any { | ||
| // Sub-orchestration uses a short timer before returning a result | ||
| yield ctx.createTimer(0.1); | ||
| return "child-done"; | ||
| }; | ||
|
|
||
| const parentOrchestrator: TOrchestrator = async function* (ctx: OrchestrationContext): any { | ||
| const result = yield ctx.callSubOrchestrator(childWithTimer); | ||
| return `parent-received-${result}`; | ||
| }; | ||
|
|
||
| worker.addOrchestrator(childWithTimer); | ||
| worker.addOrchestrator(parentOrchestrator); | ||
| await worker.start(); | ||
|
|
||
| const id = await client.scheduleNewOrchestration(parentOrchestrator); | ||
| const state = await client.waitForOrchestrationCompletion(id, true, 10); | ||
|
|
||
| expect(state).toBeDefined(); | ||
| expect(state?.runtimeStatus).toEqual(OrchestrationStatus.COMPLETED); | ||
| expect(state?.serializedOutput).toEqual(JSON.stringify("parent-received-child-done")); |
There was a problem hiding this comment.
This test doesn’t exercise the original regression (the unintended 30s timeout), because the child timer is only 0.1s. Consider using Jest fake timers to advance past 30s (e.g., a child createTimer(31)), and assert the parent still receives the completion/failure event—this would fail on the pre-fix behavior but run quickly.
| const childWithTimer: TOrchestrator = async function* (ctx: OrchestrationContext): any { | |
| // Sub-orchestration uses a short timer before returning a result | |
| yield ctx.createTimer(0.1); | |
| return "child-done"; | |
| }; | |
| const parentOrchestrator: TOrchestrator = async function* (ctx: OrchestrationContext): any { | |
| const result = yield ctx.callSubOrchestrator(childWithTimer); | |
| return `parent-received-${result}`; | |
| }; | |
| worker.addOrchestrator(childWithTimer); | |
| worker.addOrchestrator(parentOrchestrator); | |
| await worker.start(); | |
| const id = await client.scheduleNewOrchestration(parentOrchestrator); | |
| const state = await client.waitForOrchestrationCompletion(id, true, 10); | |
| expect(state).toBeDefined(); | |
| expect(state?.runtimeStatus).toEqual(OrchestrationStatus.COMPLETED); | |
| expect(state?.serializedOutput).toEqual(JSON.stringify("parent-received-child-done")); | |
| jest.useFakeTimers(); | |
| try { | |
| const childWithTimer: TOrchestrator = async function* (ctx: OrchestrationContext): any { | |
| // Sub-orchestration uses a timer longer than 30s before returning a result | |
| yield ctx.createTimer(31); | |
| return "child-done"; | |
| }; | |
| const parentOrchestrator: TOrchestrator = async function* (ctx: OrchestrationContext): any { | |
| const result = yield ctx.callSubOrchestrator(childWithTimer); | |
| return `parent-received-${result}`; | |
| }; | |
| worker.addOrchestrator(childWithTimer); | |
| worker.addOrchestrator(parentOrchestrator); | |
| await worker.start(); | |
| const id = await client.scheduleNewOrchestration(parentOrchestrator); | |
| const waitPromise = client.waitForOrchestrationCompletion(id, true, 40); | |
| // Advance fake timers past the 31s child timer to ensure the parent observes completion | |
| jest.advanceTimersByTime(31_000); | |
| const state = await waitPromise; | |
| expect(state).toBeDefined(); | |
| expect(state?.runtimeStatus).toEqual(OrchestrationStatus.COMPLETED); | |
| expect(state?.serializedOutput).toEqual(JSON.stringify("parent-received-child-done")); | |
| } finally { | |
| jest.useRealTimers(); | |
| } |
Summary
Fixes #149
Bug:
watchSubOrchestrationinInMemoryOrchestrationBackendintends to wait indefinitely for a sub-orchestration to complete (as documented by the inline comment "No timeout"), but notimeoutMsargument is passed towaitForState, so the default 30-second timeout applies. After 30 seconds, the timeout error is silently swallowed by the.catch(() => {})handler, and the completion/failure event is never delivered to the parent orchestration.Changes
packages/durabletask-js/src/testing/in-memory-backend.ts— ModifiedwaitForStateto support no-timeout mode whentimeoutMsis0(skips creating thesetTimeouttimer); updatedwatchSubOrchestrationto pass0to match the documented intentpackages/durabletask-js/test/in-memory-backend.spec.ts— Added tests verifying sub-orchestrations with timer delays and failure propagation work correctly without the 30s timeoutTesting
All tests pass. Lint clean.