
fix(taskctl): jobs/tasks go stale after binary restart — need auto-recovery on startup #343

@randomm

Description


Problem

When opencode is restarted (e.g. with a new binary), any running taskctl jobs are left in a stale state:

  • The job remains marked running even though its Pulse PID is dead
  • Tasks are stuck in developing/reviewing/adversarial-running with no active agents
  • taskctl resume refuses to run because the job is still marked running
  • taskctl stop sends a signal, but Pulse is dead, so nothing processes it
  • The in-memory listener registered by enableAutoWakeup() dies with the process, so the PM never wakes up, even after a restart
  • The user must manually edit the JSON files to force the job to stopped

Fix

On taskctl start and taskctl resume, run a recovery scan that:

  1. Checks whether the job's Pulse PID is alive; if not, immediately marks the job stopped
  2. Resets tasks in developing/reviewing/adversarial-running back to open
  3. Cleans up orphaned worktrees for those tasks
  4. Re-calls enableAutoWakeup(pmSessionId) so the PM wakeup listener is re-registered
  5. Logs a system comment on each reset task
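The five steps above could look roughly like the sketch below. It is illustrative only: Job, Task, RecoveryDeps, recoverJob, pulsePid, worktree, removeWorktree, addSystemComment and the "done" status are hypothetical names, not taskctl's actual schema or helpers. Only the developing/reviewing/adversarial-running statuses and enableAutoWakeup(pmSessionId) come from this issue. Side effects are injected so the scan can be unit-tested, and the liveness probe uses Node's process.kill(pid, 0), which checks that a process exists without sending it a signal.

```typescript
// Hypothetical shapes: the real taskctl job/task files may use different field names.
type JobStatus = "running" | "stopped"
type TaskStatus = "open" | "developing" | "reviewing" | "adversarial-running" | "done"

interface Task {
  id: string
  status: TaskStatus
  worktree?: string
}

interface Job {
  id: string
  status: JobStatus
  pulsePid?: number
  pmSessionId: string
  tasks: Task[]
}

// Side effects are injected so the scan can run in tests without real worktrees or sessions.
interface RecoveryDeps {
  isPidAlive(pid: number): boolean
  removeWorktree(path: string): void
  enableAutoWakeup(pmSessionId: string): void
  addSystemComment(taskId: string, text: string): void
}

// Statuses that imply an agent was working on the task when the process died.
const IN_FLIGHT: ReadonlySet<TaskStatus> = new Set<TaskStatus>(["developing", "reviewing", "adversarial-running"])

// Signal 0 only checks whether the process exists; nothing is delivered.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0)
    return true
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return (err as { code?: string }).code === "EPERM"
  }
}

// Returns the ids of tasks that were reset; an empty array means nothing was stale.
function recoverJob(job: Job, deps: RecoveryDeps): string[] {
  // Step 1: a live Pulse still owns the job, so leave everything alone.
  if (job.pulsePid !== undefined && deps.isPidAlive(job.pulsePid)) return []
  job.status = "stopped"

  const reset: string[] = []
  for (const task of job.tasks) {
    if (!IN_FLIGHT.has(task.status)) continue
    const previous = task.status
    task.status = "open" // Step 2: back to open
    if (task.worktree !== undefined) {
      deps.removeWorktree(task.worktree) // Step 3: orphaned worktree
      delete task.worktree
    }
    // Step 5: leave an audit trail on the task.
    deps.addSystemComment(
      task.id,
      `Reset from ${previous} to open: Pulse PID ${job.pulsePid ?? "unknown"} was not alive at startup.`,
    )
    reset.push(task.id)
  }

  // Step 4: the old in-memory wakeup listener died with the previous process.
  deps.enableAutoWakeup(job.pmSessionId)
  return reset
}
```

In taskctl resume this would run before Pulse is restarted, with isPidAlive and the real worktree/comment helpers passed in. One design caveat: PIDs can be reused by unrelated processes (especially after a reboot), so persisting the Pulse start time next to the PID would make step 1 stricter.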

Also: taskctl stop on a dead-PID job should immediately mark the job stopped rather than sending a signal.
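A minimal sketch of that stop rule, assuming a hypothetical stopJob function and StoppableJob shape. The issue does not say which signal taskctl sends, so the liveness check and the signal are passed in as callbacks rather than hard-coded. The key point is that the signal path is only taken when a live Pulse exists to handle it.

```typescript
// Hypothetical shapes; the real taskctl job record may differ.
type StopOutcome = "marked-stopped" | "signalled" | "already-stopped"

interface StoppableJob {
  id: string
  status: "running" | "stopped"
  pulsePid?: number
}

function stopJob(
  job: StoppableJob,
  isPidAlive: (pid: number) => boolean,
  signal: (pid: number) => void,
): StopOutcome {
  if (job.status === "stopped") return "already-stopped"
  if (job.pulsePid === undefined || !isPidAlive(job.pulsePid)) {
    // Nobody is left to process a signal, so transition the job directly.
    job.status = "stopped"
    return "marked-stopped"
  }
  // Live Pulse: keep the normal path and let Pulse shut itself down and
  // update the job status when it handles the signal.
  signal(job.pulsePid)
  return "signalled"
}
```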

Acceptance Criteria

  • After a binary restart, taskctl resume <jobId> auto-detects the dead PID, resets stale tasks, and restarts Pulse
  • taskctl stop on a dead-PID job immediately transitions it to stopped
  • Tasks in flight at restart are reset to open, each with a system comment
  • enableAutoWakeup is re-registered on resume so the PM wakes up on task completion
  • No manual JSON editing is needed after a restart
  • Tests cover the restart/recovery scenario
  • Adversarial: APPROVED
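For the restart/recovery tests, one way to get a genuinely dead "previous binary" PID is to run a short-lived child process to completion and reuse its PID. The helper names deadPulsePid and isPidAlive below are hypothetical; spawnSync and process.kill(pid, 0) are standard Node APIs (also available under Bun). This is a sketch under those assumptions, not taskctl's existing test setup.

```typescript
import { spawnSync } from "node:child_process"

// Same liveness probe as a recovery scan would use: signal 0 checks existence only.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0)
    return true
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM"
  }
}

// Runs a no-op child to completion (spawnSync waits and reaps it), so the
// returned PID stands in for the Pulse process of the restarted binary.
function deadPulsePid(): number {
  const result = spawnSync(process.execPath, ["-e", ""])
  if (result.pid === undefined || result.status !== 0) {
    throw new Error("could not spawn helper process")
  }
  return result.pid
}
```

A restart test can persist a job whose pulsePid comes from deadPulsePid(), invoke the resume path, and assert that the job ends up stopped and its in-flight tasks are open. The OS could in principle hand the same PID to a new process, but that is unlikely in the short window of a test.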

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
