-
Notifications
You must be signed in to change notification settings - Fork 2.9k
fix: harden delegation lifecycle against race conditions with per-task metadata, mutual-exclusion guards, and multi-layer failure recovery #11379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
All previously flagged issues are now resolved. No new issues found in the latest commits.
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
daniel-lxs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is a well-reasoned fix. The root cause analysis is correct -- saveClineMessages rebuilding the full HistoryItem (including status) via taskMetadata() was clobbering delegation state transitions. The five fixes address distinct failure modes coherently.
A few observations:
- The double-write pattern (globalState + per-task file) could diverge, but since
getTaskWithIdalways merges the file on top, it's self-healing on the next read. Acceptable tradeoff. - The early clear of
delegationInProgressinreopenParentFromDelegationbeforeresumeAfterDelegation()is a nice detail -- avoids deadlock when the resumed parent immediately re-delegates. A brief inline comment explaining why it's not just relying on thefinallywould help future readers (the existing comment is good but could be slightly more explicit about thefinallyalso setting it). - Note: PR has merge conflicts to resolve.
- Expand inline comment in reopenParentFromDelegation explaining why delegationInProgress is cleared before resume and how the finally block acts as a safety net on error paths. - Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>) so DELEGATION_META_KEYS cannot silently drift from the interface. - Remove unused initialStatus from CreateTaskOptions (packages/types). The field was never consumed; delegation status is persisted via saveDelegationMeta / updateTaskHistory instead.
Two issues found in the current revision.
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
hannesrudolph
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review: fix/delegation-race-conditions
Solid work addressing the core race conditions in the delegation system. The delegationInProgress mutex, per-task delegation metadata files, retry-once pattern in AttemptCompletionTool, and the pWaitFor timeout are all well-motivated improvements.
Summary of findings
| Priority | Count | Description |
|---|---|---|
| P1 | 1 | Unsafe partial HistoryItem cast |
| P2 | 3 | Value validation gap, silent metadata failure, missing test coverage |
| P3 | 1 | Minor redundancy |
All 7 previously-raised threads are resolved and not repeated here.
- Expand inline comment in reopenParentFromDelegation explaining why delegationInProgress is cleared before resume and how the finally block acts as a safety net on error paths. - Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>) so DELEGATION_META_KEYS cannot silently drift from the interface. - Remove unused initialStatus from CreateTaskOptions (packages/types). The field was never consumed; delegation status is persisted via saveDelegationMeta / updateTaskHistory instead.
1d050ae to
03d4b44
Compare
- Expand inline comment in reopenParentFromDelegation explaining why delegationInProgress is cleared before resume and how the finally block acts as a safety net on error paths. - Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>) so DELEGATION_META_KEYS cannot silently drift from the interface. - Remove unused initialStatus from CreateTaskOptions (packages/types). The field was never consumed; delegation status is persisted via saveDelegationMeta / updateTaskHistory instead.
59f7e67 to
3ac05aa
Compare
- Expand inline comment in reopenParentFromDelegation explaining why delegationInProgress is cleared before resume and how the finally block acts as a safety net on error paths. - Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>) so DELEGATION_META_KEYS cannot silently drift from the interface. - Remove unused initialStatus from CreateTaskOptions (packages/types). The field was never consumed; delegation status is persisted via saveDelegationMeta / updateTaskHistory instead.
98fc2e0 to
fac6e2b
Compare
- Expand inline comment in reopenParentFromDelegation explaining why delegationInProgress is cleared before resume and how the finally block acts as a safety net on error paths. - Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>) so DELEGATION_META_KEYS cannot silently drift from the interface. - Remove unused initialStatus from CreateTaskOptions (packages/types). The field was never consumed; delegation status is persisted via saveDelegationMeta / updateTaskHistory instead.
67e1d8a to
ac12404
Compare
- Expand inline comment in reopenParentFromDelegation explaining why delegationInProgress is cleared before resume and how the finally block acts as a safety net on error paths. - Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>) so DELEGATION_META_KEYS cannot silently drift from the interface. - Remove unused initialStatus from CreateTaskOptions (packages/types). The field was never consumed; delegation status is persisted via saveDelegationMeta / updateTaskHistory instead.
5701bd5 to
a223bd5
Compare
Comprehensive fix for race conditions and error handling gaps in the subtask
delegation system. Addresses multiple failure modes that could leave parent
tasks permanently stuck in 'delegated' status, causing nested subtasks to hang.
Key fixes:
- Remove initialStatus from taskMetadata rebuild (eliminates status overwrites)
- Persist delegation metadata to per-task files (resolves globalState eviction)
- Add delegationInProgress mutex guard (prevents concurrent delegation ops)
- TOCTOU race fixes with fresh re-reads before writes
- Abort-aware pWaitFor predicate (prevents false 60s timeout on user input)
- Remove silent .catch(() => {}) — all errors now logged unless task is aborting
- Single-attempt delegation with parent repair on failure (no retry band-aids)
- Cancel debouncedEmitTokenUsage in dispose() (prevents zombie callbacks)
- new_task isolation truncation for parallel tool calls
a223bd5 to
e01cd31
Compare
Summary
Fixes multiple race conditions and reliability issues in the parent↔child task delegation system by introducing per-task delegation metadata files, mutual-exclusion guards, explicit status transitions, and multi-layer failure recovery.
Changes
New Per-Task Delegation Metadata File (
delegation_metadata.json)A new persistence layer (
delegationMeta.ts) stores delegation state (status,awaitingChildId,delegatedToId,childIds, etc.) as a per-task JSON file on disk, complementing the monolithicglobalStatetaskHistory array. Provides:safeWriteJson(proper-lockfile) for cross-process safetynullfor old tasks without the file_delegationMetaKeyRecord) that forces a TypeScript error if the interface drifts from the key listDelegation Guard — Preventing Concurrent Delegations
A
delegationInProgressboolean flag onClineProvideracts as a mutex-like guard:delegateParentAndOpenChildandreopenParentFromDelegationboth check and set this flag, throwing if delegation is already in progressshowTaskWithId,deleteTaskWithId,createTask(non-child), andcancelTaskall early-return with a user message when delegation is in progressfinallyblocks to ensure cleanup on error pathsTask Creation Double-Click Guard
isTaskCreationInProgressflag prevents the user from creating duplicate tasks via rapid button clicks in the webview message handler.Removal of
initialStatus— Status Set Explicitly Post-CreationThe
initialStatusoption is removed fromCreateTaskOptions,TaskOptions, andtaskMetadata. Instead of baking status into the task at construction time, the child's"active"status is now explicitly persisted via a separateupdateTaskHistorycall aftercreateTaskreturns. This removes ambiguity about when status is first written and makes ordering explicit.Task.start()ReturnsPromise<void> | voidPreviously ran
startTaskas fire-and-forget. Now returns the promise directly, enabling callers to.catch(). IndelegateParentAndOpenChild, ifchild.start()fails, the parent's status is repaired back to"active"to prevent it from being stuck in"delegated"state.Unhandled Promise Rejection Prevention
Every call to
presentAssistantMessage(this)(~12 call sites inTask.tsandpresentAssistantMessage.ts) now has.catch()handlers that suppress errors whenthis.abortis true.Abandoned Task Early-Exit
presentAssistantMessagecheckscline.abandonedalongsidecline.abortat entry, and adds a mid-function early-exit check after tool execution. ThepWaitForinTask.tsnow includesthis.abortin its condition, preventing indefinite hangs when a task is aborted during stream processing.Delegation Failure Recovery in
AttemptCompletionTooldelegateToParent()is rewritten with comprehensive error recovery:reopenParentFromDelegationfails, it repairs the parent back to"active"statusNewTaskToolNo Longer Pushes a Tool ResultRemoves the orphaned
pushToolResult("Delegated to child task ...")call. The tool_result is instead injected byreopenParentFromDelegationwhen the child completes, fixing duplicate/orphaned tool results.Disk Fallback in
removeClineFromStackWhen the parent task is missing from globalState, reads the delegation metadata file and repairs the parent's on-disk metadata to
"active"if it was"delegated"withawaitingChildIdmatching the removed child.getTaskWithIdMerges Delegation Metadata from DiskReads the per-task
delegation_metadata.jsonand merges its fields into thehistoryItembefore returning, ensuring delegation state is always sourced from the per-task file (which may be more current than globalState after a race).Stale-Read Fix in
reopenParentFromDelegationThe parent's
historyItemis re-read just before updating it in step 5, fixing a TOCTOU bug where async operations between the initial read and the final write could make the data stale.Additional Fixes
abortTasknow cancels thedebouncedEmitTokenUsagetimer, preventing zombie callbacks after task disposalstartTaskverifies the task appears inglobalStateafter its firstsay()call; if missing, retries persistenceTest Coverage
New test files:
delegationMeta.spec.ts— read/write, filtering, corruption handlingattemptCompletionDelegation.spec.ts— retry, parent repair, disk fallbackremoveClineFromStack-delegation.spec.ts— disk fallback repairUpdated tests: Existing delegation tests updated to mock the new
delegationMetamodule and reflect the changed flow (no moreinitialStatus, no morepushToolResultfromNewTaskTool, updated call ordering expectations).Files Changed
packages/types/src/task.tssrc/core/task-persistence/delegationMeta.ts,src/core/task-persistence/index.tssrc/core/task/Task.ts,src/core/task-persistence/taskMetadata.tssrc/core/webview/ClineProvider.tssrc/core/tools/AttemptCompletionTool.ts,src/core/tools/NewTaskTool.tssrc/core/assistant-message/presentAssistantMessage.ts,src/core/webview/webviewMessageHandler.tssrc/shared/globalFileNames.ts