Skip to content

fix: Sub-orchestration watcher uses unintended 30s default timeout in InMemoryOrchestrationBackend#156

Open
YunchuWang wants to merge 1 commit intomainfrom
copilot-finds/bug/fix-sub-orchestration-watcher-timeout
Open

fix: Sub-orchestration watcher uses unintended 30s default timeout in InMemoryOrchestrationBackend#156
YunchuWang wants to merge 1 commit intomainfrom
copilot-finds/bug/fix-sub-orchestration-watcher-timeout

Conversation

@YunchuWang
Copy link
Member

Summary

Fixes #149

Bug: watchSubOrchestration in InMemoryOrchestrationBackend intends to wait indefinitely for a sub-orchestration to complete (as documented by the inline comment "No timeout"), but no timeoutMs argument is passed to waitForState, so the default 30-second timeout applies. After 30 seconds, the timeout error is silently swallowed by the .catch(() => {}) handler, and the completion/failure event is never delivered to the parent orchestration.

Changes

  • packages/durabletask-js/src/testing/in-memory-backend.ts — Modified waitForState to support no-timeout mode when timeoutMs is 0 (skips creating the setTimeout timer); updated watchSubOrchestration to pass 0 to match the documented intent
  • packages/durabletask-js/test/in-memory-backend.spec.ts — Added tests verifying sub-orchestrations with timer delays and failure propagation work correctly without the 30s timeout

Testing

All tests pass. Lint clean.

… InMemoryOrchestrationBackend

The watchSubOrchestration method in InMemoryOrchestrationBackend intended to wait
indefinitely for sub-orchestrations to complete (as documented by the inline comment
'No timeout'), but actually used the default 30-second timeout from waitForState.
After 30 seconds, the watcher silently dropped the completion event via .catch(),
causing the parent orchestration to hang indefinitely.

Fix:
- Modified waitForState to support no-timeout mode when timeoutMs is 0
- Updated watchSubOrchestration to pass timeoutMs=0 matching the documented intent
- Added tests for sub-orchestrations with timer delays and failure propagation
- Added tests for waitForState zero-timeout behavior and cleanup on reset

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 10, 2026 16:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an in-memory testing backend bug where sub-orchestration completion watching unintentionally timed out after 30 seconds (and silently swallowed the timeout), preventing the parent orchestration from receiving the child completion/failure event.

Changes:

  • Add “no-timeout” support to InMemoryOrchestrationBackend.waitForState when timeoutMs === 0 (skip setTimeout creation).
  • Update sub-orchestration watcher to pass 0 to waitForState to truly wait indefinitely.
  • Add unit tests covering sub-orchestration timer delays/failure propagation and waitForState(..., 0) behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
packages/durabletask-js/src/testing/in-memory-backend.ts Implements no-timeout mode for state waiting and applies it to sub-orchestration watching.
packages/durabletask-js/test/in-memory-backend.spec.ts Adds regression/behavior tests around sub-orchestrations and zero-timeout waiting.

Comment on lines 595 to 599
this.waitForState(
subInstanceId,
(inst) => this.isTerminalStatus(inst.status),
// No timeout - sub-orchestration will eventually complete, fail, or be terminated
// If parent is terminated, we check that when delivering the event
0, // No timeout — sub-orchestration will eventually complete, fail, or be terminated
)
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that waitForState(..., 0) can’t time out, the .catch comment below that mentions “Timeout” is misleading. Consider updating the comment to reflect the only expected cancellation path here (backend reset), or explicitly note any other errors you intend to swallow.

Copilot uses AI. Check for mistakes.
Comment on lines +344 to +352
timer = setTimeout(() => {
const waiters = this.stateWaiters.get(instanceId);
if (waiters) {
const index = waiters.findIndex((w) => w.resolve === resolve);
if (index >= 0) {
waiters.splice(index, 1);
}
}
}
reject(new Error(`Timeout waiting for orchestration '${instanceId}'`));
}, timeoutMs);
reject(new Error(`Timeout waiting for orchestration '${instanceId}'`));
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout cleanup logic won’t remove the correct waiter: waiters.findIndex((w) => w.resolve === resolve) compares the waiter wrapper function to the Promise’s resolve, so the index will be -1. This leaves timed-out waiters in stateWaiters, which can cause memory leaks and later resolve() calls on an already-settled promise. Consider tracking/removing the waiter by reference (e.g., capture waiter in the timeout callback and findIndex((w) => w === waiter)) and rejecting via waiter.reject(...) so cleanup is centralized.

Copilot uses AI. Check for mistakes.
Comment on lines +340 to +342
// When timeoutMs is 0, no timeout is applied — the waiter will only be
// resolved by a matching state change or rejected by reset().
let timer: ReturnType<typeof setTimeout> | undefined;
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeoutMs=0 now means “no timeout”. Since this is a non-obvious sentinel value (and differs from common “0 means immediate timeout” expectations), it would be good to reflect it in the waitForState JSDoc/parameter description so callers don’t accidentally create an unbounded wait.

Copilot uses AI. Check for mistakes.
Comment on lines +150 to +170
const childWithTimer: TOrchestrator = async function* (ctx: OrchestrationContext): any {
// Sub-orchestration uses a short timer before returning a result
yield ctx.createTimer(0.1);
return "child-done";
};

const parentOrchestrator: TOrchestrator = async function* (ctx: OrchestrationContext): any {
const result = yield ctx.callSubOrchestrator(childWithTimer);
return `parent-received-${result}`;
};

worker.addOrchestrator(childWithTimer);
worker.addOrchestrator(parentOrchestrator);
await worker.start();

const id = await client.scheduleNewOrchestration(parentOrchestrator);
const state = await client.waitForOrchestrationCompletion(id, true, 10);

expect(state).toBeDefined();
expect(state?.runtimeStatus).toEqual(OrchestrationStatus.COMPLETED);
expect(state?.serializedOutput).toEqual(JSON.stringify("parent-received-child-done"));
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn’t exercise the original regression (the unintended 30s timeout), because the child timer is only 0.1s. Consider using Jest fake timers to advance past 30s (e.g., a child createTimer(31)), and assert the parent still receives the completion/failure event—this would fail on the pre-fix behavior but run quickly.

Suggested change
const childWithTimer: TOrchestrator = async function* (ctx: OrchestrationContext): any {
// Sub-orchestration uses a short timer before returning a result
yield ctx.createTimer(0.1);
return "child-done";
};
const parentOrchestrator: TOrchestrator = async function* (ctx: OrchestrationContext): any {
const result = yield ctx.callSubOrchestrator(childWithTimer);
return `parent-received-${result}`;
};
worker.addOrchestrator(childWithTimer);
worker.addOrchestrator(parentOrchestrator);
await worker.start();
const id = await client.scheduleNewOrchestration(parentOrchestrator);
const state = await client.waitForOrchestrationCompletion(id, true, 10);
expect(state).toBeDefined();
expect(state?.runtimeStatus).toEqual(OrchestrationStatus.COMPLETED);
expect(state?.serializedOutput).toEqual(JSON.stringify("parent-received-child-done"));
jest.useFakeTimers();
try {
const childWithTimer: TOrchestrator = async function* (ctx: OrchestrationContext): any {
// Sub-orchestration uses a timer longer than 30s before returning a result
yield ctx.createTimer(31);
return "child-done";
};
const parentOrchestrator: TOrchestrator = async function* (ctx: OrchestrationContext): any {
const result = yield ctx.callSubOrchestrator(childWithTimer);
return `parent-received-${result}`;
};
worker.addOrchestrator(childWithTimer);
worker.addOrchestrator(parentOrchestrator);
await worker.start();
const id = await client.scheduleNewOrchestration(parentOrchestrator);
const waitPromise = client.waitForOrchestrationCompletion(id, true, 40);
// Advance fake timers past the 31s child timer to ensure the parent observes completion
jest.advanceTimersByTime(31_000);
const state = await waitPromise;
expect(state).toBeDefined();
expect(state?.runtimeStatus).toEqual(OrchestrationStatus.COMPLETED);
expect(state?.serializedOutput).toEqual(JSON.stringify("parent-received-child-done"));
} finally {
jest.useRealTimers();
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[copilot-finds] Bug: Sub-orchestration watcher in InMemoryOrchestrationBackend uses unintended 30s default timeout

2 participants