Skip to content

Conversation

@hannesrudolph
Copy link
Collaborator

@hannesrudolph hannesrudolph commented Feb 10, 2026

Summary

Fixes multiple race conditions and reliability issues in the parent↔child task delegation system by introducing per-task delegation metadata files, mutual-exclusion guards, explicit status transitions, and multi-layer failure recovery.

Changes

New Per-Task Delegation Metadata File (delegation_metadata.json)

A new persistence layer (delegationMeta.ts) stores delegation state (status, awaitingChildId, delegatedToId, childIds, etc.) as a per-task JSON file on disk, complementing the monolithic globalState taskHistory array. Provides:

  • Read/write functions using safeWriteJson (proper-lockfile) for cross-process safety
  • Input sanitization: only known keys are read/written, with type validation on each field
  • Backward compatibility: returns null for old tasks without the file
  • Compile-time safeguard (_delegationMetaKeyRecord) that forces a TypeScript error if the interface drifts from the key list

Delegation Guard — Preventing Concurrent Delegations

A delegationInProgress boolean flag on ClineProvider acts as a mutex-like guard:

  • delegateParentAndOpenChild and reopenParentFromDelegation both check and set this flag, throwing if delegation is already in progress
  • showTaskWithId, deleteTaskWithId, createTask (non-child), and cancelTask all early-return with a user message when delegation is in progress
  • Released in finally blocks to ensure cleanup on error paths

Task Creation Double-Click Guard

isTaskCreationInProgress flag prevents the user from creating duplicate tasks via rapid button clicks in the webview message handler.

Removal of initialStatus — Status Set Explicitly Post-Creation

The initialStatus option is removed from CreateTaskOptions, TaskOptions, and taskMetadata. Instead of baking status into the task at construction time, the child's "active" status is now explicitly persisted via a separate updateTaskHistory call after createTask returns. This removes ambiguity about when status is first written and makes ordering explicit.

Task.start() Returns Promise<void> | void

Previously ran startTask as fire-and-forget. Now returns the promise directly, enabling callers to .catch(). In delegateParentAndOpenChild, if child.start() fails, the parent's status is repaired back to "active" to prevent it from being stuck in "delegated" state.

Unhandled Promise Rejection Prevention

Every call to presentAssistantMessage(this) (~12 call sites in Task.ts and presentAssistantMessage.ts) now has .catch() handlers that suppress errors when this.abort is true.

Abandoned Task Early-Exit

presentAssistantMessage checks cline.abandoned alongside cline.abort at entry, and adds a mid-function early-exit check after tool execution. The pWaitFor in Task.ts now includes this.abort in its condition, preventing indefinite hangs when a task is aborted during stream processing.

Delegation Failure Recovery in AttemptCompletionTool

delegateToParent() is rewritten with comprehensive error recovery:

  • If reopenParentFromDelegation fails, it repairs the parent back to "active" status
  • If the parent isn't in globalState ("Task not found"), falls back to disk-only read-merge-write using the delegation metadata file
  • Child completes as standalone with an error tool result rather than crashing

NewTaskTool No Longer Pushes a Tool Result

Removes the orphaned pushToolResult("Delegated to child task ...") call. The tool_result is instead injected by reopenParentFromDelegation when the child completes, fixing duplicate/orphaned tool results.

Disk Fallback in removeClineFromStack

When the parent task is missing from globalState, reads the delegation metadata file and repairs the parent's on-disk metadata to "active" if it was "delegated" with awaitingChildId matching the removed child.

getTaskWithId Merges Delegation Metadata from Disk

Reads the per-task delegation_metadata.json and merges its fields into the historyItem before returning, ensuring delegation state is always sourced from the per-task file (which may be more current than globalState after a race).

Stale-Read Fix in reopenParentFromDelegation

The parent's historyItem is re-read just before updating it in step 5, fixing a TOCTOU bug where async operations between the initial read and the final write could make the data stale.

Additional Fixes

  • abortTask now cancels the debouncedEmitTokenUsage timer, preventing zombie callbacks after task disposal
  • startTask verifies the task appears in globalState after its first say() call; if missing, retries persistence

Test Coverage

New test files:

  • delegationMeta.spec.ts — read/write, filtering, corruption handling
  • attemptCompletionDelegation.spec.ts — retry, parent repair, disk fallback
  • removeClineFromStack-delegation.spec.ts — disk fallback repair

Updated tests: Existing delegation tests updated to mock the new delegationMeta module and reflect the changed flow (no more initialStatus, no more pushToolResult from NewTaskTool, updated call ordering expectations).

Files Changed

Area Files
Types packages/types/src/task.ts
New module src/core/task-persistence/delegationMeta.ts, src/core/task-persistence/index.ts
Task lifecycle src/core/task/Task.ts, src/core/task-persistence/taskMetadata.ts
Delegation flow src/core/webview/ClineProvider.ts
Tools src/core/tools/AttemptCompletionTool.ts, src/core/tools/NewTaskTool.ts
Message handling src/core/assistant-message/presentAssistantMessage.ts, src/core/webview/webviewMessageHandler.ts
Constants src/shared/globalFileNames.ts
Tests 6 new/updated test files

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. bug Something isn't working labels Feb 10, 2026
@roomote
Copy link
Contributor

roomote bot commented Feb 10, 2026

Rooviewer Clock   See task

All previously flagged issues are now resolved. No new issues found in the latest commits.

  • File collision: saveDelegationMeta() and FileContextTracker.saveTaskMetadata() both do full overwrites to the same task_metadata.json file with incompatible data structures, causing mutual data loss. -- Fixed: delegation metadata now uses a dedicated delegation_metadata.json file.
  • Dead code in public API: initialStatus was removed from TaskOptions and the Task class but still existed in CreateTaskOptions. -- Fixed: initialStatus removed from CreateTaskOptions in packages/types/src/task.ts.
  • undefined delegation fields not surviving JSON round-trip: saveDelegationMeta writes undefined values (e.g. awaitingChildId: undefined) which JSON.stringify silently strips. -- Fixed: clearable fields now use null instead of undefined, and getTaskWithId converts null back to undefined on read.
  • child.start() error handling is unreachable: Task.start() is synchronous and fires startTask() without awaiting. -- Fixed: return value is now captured as startPromise and error handling is attached via .catch().
  • Double pushToolResult on delegation failure fallthrough: When both delegation attempts fail in delegateToParent, pushToolResult(toolError(...)) is called, then return false falls through to the normal completion ask flow. -- Fixed: changed to return true so the caller exits immediately after pushing the error tool result.
  • awaitingChildId: undefined in AttemptCompletionTool repair: persistDelegationMeta at line 219 passes awaitingChildId: undefined which gets stripped by JSON.stringify, failing to clear a previously-set value in the delegation file after extension restart. All other repair paths correctly use null. -- Fixed: changed to null.
  • getTaskWithId(child.taskId) throws in step 4b: The child task was created with startTask: false, so saveClineMessages() has never run and the child's historyItem does not exist in globalState. -- Fixed: step 4b now builds the HistoryItem directly from the child Task object instead of calling getTaskWithId.
  • ClineProvider disk fallback drops completion fields: The disk fallback in removeClineFromStack reads from disk but only carried forward delegatedToId and childIds, dropping completedByChildId and completionResultSummary during the full overwrite. -- Fixed: now preserves all six delegation fields.
  • createTask delegation guard deadlocks delegation flow: The guard at line 3031 threw when delegationInProgress was true, but delegateParentAndOpenChild sets the flag and then calls this.createTask(...). -- Fixed: guard now checks && !parentTask, allowing internal delegation calls through.
  • AttemptCompletionTool disk fallback still does blind overwrite: The "Task not found" fallback in delegateToParent wrote partial fields via saveDelegationMeta which does a full overwrite. -- Fixed: readDelegationMeta added to DelegationProvider interface and disk fallback now uses read-merge-write to preserve all existing fields.
  • Missing toolDenied mock in delegation test: attemptCompletionDelegation.spec.ts omits toolDenied from the formatResponse mock. The "approval denied" test passes accidentally because the missing function throws a TypeError caught by the outer error handler, rather than validating the actual denial path. -- Fixed: toolDenied mock added.
  • AttemptCompletionTool primary repair path drops completion fields: The persistDelegationMeta call in delegateToParent's primary repair path writes only 4 of 6 delegation fields, dropping completedByChildId and completionResultSummary. -- Fixed: both fields now preserved via parentHistory.completedByChildId ?? null and parentHistory.completionResultSummary ?? null.
  • reopenParentFromDelegation meta save drops delegatedToId: The saveDelegationMeta call at line 3770 writes 5 of 6 delegation fields but omits delegatedToId. Since it does a full overwrite, every successful delegation completion erases delegatedToId from the disk file. -- Fixed: delegatedToId: freshParent.delegatedToId ?? null now included.
  • delegateParentAndOpenChild meta saves drop completion fields: The saveDelegationMeta calls at lines 3508 and 3540 omit completedByChildId and completionResultSummary. -- Fixed: both fields now preserved with ?? null fallback in all call sites.
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

@hannesrudolph hannesrudolph marked this pull request as draft February 10, 2026 20:58
Copy link
Member

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a well-reasoned fix. The root cause analysis is correct -- saveClineMessages rebuilding the full HistoryItem (including status) via taskMetadata() was clobbering delegation state transitions. The five fixes address distinct failure modes coherently.

A few observations:

  • The double-write pattern (globalState + per-task file) could diverge, but since getTaskWithId always merges the file on top, it's self-healing on the next read. Acceptable tradeoff.
  • The early clear of delegationInProgress in reopenParentFromDelegation before resumeAfterDelegation() is a nice detail -- avoids deadlock when the resumed parent immediately re-delegates. A brief inline comment explaining why it's not just relying on the finally would help future readers (the existing comment is good but could be slightly more explicit about the finally also setting it).
  • Note: PR has merge conflicts to resolve.

hannesrudolph added a commit that referenced this pull request Feb 10, 2026
- Expand inline comment in reopenParentFromDelegation explaining why
  delegationInProgress is cleared before resume and how the finally
  block acts as a safety net on error paths.
- Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>)
  so DELEGATION_META_KEYS cannot silently drift from the interface.
- Remove unused initialStatus from CreateTaskOptions (packages/types).
  The field was never consumed; delegation status is persisted via
  saveDelegationMeta / updateTaskHistory instead.
@hannesrudolph hannesrudolph marked this pull request as ready for review February 11, 2026 00:04
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 11, 2026
@roomote
Copy link
Contributor

roomote bot commented Feb 11, 2026

Rooviewer Clock   See task

Two issues found in the current revision.

  • undefined delegation fields not surviving JSON round-trip: saveDelegationMeta writes undefined values (e.g. awaitingChildId: undefined) which JSON.stringify silently strips. The delegation file cannot clear fields previously set in globalState, undermining its "source of truth" role.
  • child.start() error handling is unreachable: Task.start() is synchronous and fires startTask() without awaiting. The try/catch around child.start() in delegateParentAndOpenChild cannot catch async errors -- the parent-repair code inside it is dead code.
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

Copy link
Collaborator Author

@hannesrudolph hannesrudolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: fix/delegation-race-conditions

Solid work addressing the core race conditions in the delegation system. The delegationInProgress mutex, per-task delegation metadata files, retry-once pattern in AttemptCompletionTool, and the pWaitFor timeout are all well-motivated improvements.

Summary of findings

Priority Count Description
P1 1 Unsafe partial HistoryItem cast
P2 3 Value validation gap, silent metadata failure, missing test coverage
P3 1 Minor redundancy

All 7 previously-raised threads are resolved and not repeated here.

hannesrudolph added a commit that referenced this pull request Feb 11, 2026
- Expand inline comment in reopenParentFromDelegation explaining why
  delegationInProgress is cleared before resume and how the finally
  block acts as a safety net on error paths.
- Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>)
  so DELEGATION_META_KEYS cannot silently drift from the interface.
- Remove unused initialStatus from CreateTaskOptions (packages/types).
  The field was never consumed; delegation status is persisted via
  saveDelegationMeta / updateTaskHistory instead.
@hannesrudolph hannesrudolph force-pushed the fix/delegation-race-conditions branch from 1d050ae to 03d4b44 Compare February 11, 2026 16:59
hannesrudolph added a commit that referenced this pull request Feb 11, 2026
- Expand inline comment in reopenParentFromDelegation explaining why
  delegationInProgress is cleared before resume and how the finally
  block acts as a safety net on error paths.
- Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>)
  so DELEGATION_META_KEYS cannot silently drift from the interface.
- Remove unused initialStatus from CreateTaskOptions (packages/types).
  The field was never consumed; delegation status is persisted via
  saveDelegationMeta / updateTaskHistory instead.
@hannesrudolph hannesrudolph force-pushed the fix/delegation-race-conditions branch from 59f7e67 to 3ac05aa Compare February 11, 2026 20:19
hannesrudolph added a commit that referenced this pull request Feb 12, 2026
- Expand inline comment in reopenParentFromDelegation explaining why
  delegationInProgress is cleared before resume and how the finally
  block acts as a safety net on error paths.
- Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>)
  so DELEGATION_META_KEYS cannot silently drift from the interface.
- Remove unused initialStatus from CreateTaskOptions (packages/types).
  The field was never consumed; delegation status is persisted via
  saveDelegationMeta / updateTaskHistory instead.
@hannesrudolph hannesrudolph force-pushed the fix/delegation-race-conditions branch from 98fc2e0 to fac6e2b Compare February 12, 2026 00:06
hannesrudolph added a commit that referenced this pull request Feb 12, 2026
- Expand inline comment in reopenParentFromDelegation explaining why
  delegationInProgress is cleared before resume and how the finally
  block acts as a safety net on error paths.
- Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>)
  so DELEGATION_META_KEYS cannot silently drift from the interface.
- Remove unused initialStatus from CreateTaskOptions (packages/types).
  The field was never consumed; delegation status is persisted via
  saveDelegationMeta / updateTaskHistory instead.
@hannesrudolph hannesrudolph force-pushed the fix/delegation-race-conditions branch from 67e1d8a to ac12404 Compare February 12, 2026 01:18
hannesrudolph added a commit that referenced this pull request Feb 12, 2026
- Expand inline comment in reopenParentFromDelegation explaining why
  delegationInProgress is cleared before resume and how the finally
  block acts as a safety net on error paths.
- Add compile-time safeguard (Record<keyof Required<DelegationMeta>, true>)
  so DELEGATION_META_KEYS cannot silently drift from the interface.
- Remove unused initialStatus from CreateTaskOptions (packages/types).
  The field was never consumed; delegation status is persisted via
  saveDelegationMeta / updateTaskHistory instead.
@hannesrudolph hannesrudolph force-pushed the fix/delegation-race-conditions branch from 5701bd5 to a223bd5 Compare February 12, 2026 07:24
Comprehensive fix for race conditions and error handling gaps in the subtask
delegation system. Addresses multiple failure modes that could leave parent
tasks permanently stuck in 'delegated' status, causing nested subtasks to hang.

Key fixes:
- Remove initialStatus from taskMetadata rebuild (eliminates status overwrites)
- Persist delegation metadata to per-task files (resolves globalState eviction)
- Add delegationInProgress mutex guard (prevents concurrent delegation ops)
- TOCTOU race fixes with fresh re-reads before writes
- Abort-aware pWaitFor predicate (prevents false 60s timeout on user input)
- Remove silent .catch(() => {}) — all errors now logged unless task is aborting
- Single-attempt delegation with parent repair on failure (no retry band-aids)
- Cancel debouncedEmitTokenUsage in dispose() (prevents zombie callbacks)
- new_task isolation truncation for parallel tool calls
@hannesrudolph hannesrudolph force-pushed the fix/delegation-race-conditions branch from a223bd5 to e01cd31 Compare February 12, 2026 07:25
@hannesrudolph hannesrudolph changed the title fix: race conditions in subtask delegation system fix: harden delegation lifecycle against race conditions with per-task metadata, mutual-exclusion guards, and multi-layer failure recovery Feb 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants