Bug
When the reconciler garbage-collects idle agents (>24h, no hook), deleteAgent() re-opens ALL beads assigned to that agent — including beads that are closed or failed. This bypasses the terminal state guard in updateBeadStatus() and mass-resurrects completed work.
Incident
Town 98172328 was idle. At 18:02:55 UTC on Apr 3, the reconciler GC'd 6 stale polecats. deleteAgent() re-opened 32 beads that were already closed/failed. The reconciler then tried to dispatch 30 agents to these zombie beads (60-second wall clock spike), all failed, triggering a cascade of dispatch failures, container evictions, and re-opening cycles. The town had to be manually recovered by having the Mayor close everything.
Root Cause
agents.ts:204-216 — deleteAgent() uses a raw SQL UPDATE that sets status = 'open' on every bead assigned to the deleted agent, with no terminal state check:
export function deleteAgent(sql, agentId) {
  query(sql, `
    UPDATE beads
    SET assignee_agent_bead_id = NULL,
        status = 'open', -- BYPASSES TERMINAL STATE GUARD
        updated_at = ?
    WHERE assignee_agent_bead_id = ?
  `, [now(), agentId]);
  deleteBead(sql, agentId);
}
The terminal state guard in updateBeadStatus() (beads.ts:278-287) correctly blocks closed -> open transitions. But deleteAgent() bypasses this by writing raw SQL.
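For readers unfamiliar with the guard, here is a minimal sketch of what a terminal state check of this kind might look like. It is not the code at beads.ts:278-287. The names `BeadStatus`, `isTerminal`, and `guardedTransition` are hypothetical, the `in_progress` status is assumed for illustration, and the real guard may throw or log rather than silently keep the current status.

```typescript
// Hypothetical sketch of a terminal state guard; the real one lives in
// updateBeadStatus() and may behave differently (for example, by throwing).
type BeadStatus = "open" | "in_progress" | "closed" | "failed"; // "in_progress" is assumed

const TERMINAL_STATUSES: ReadonlySet<BeadStatus> = new Set<BeadStatus>([
  "closed",
  "failed",
]);

function isTerminal(status: BeadStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Returns the status the bead should have after the requested transition.
// A bead in a terminal state keeps its status regardless of the request.
function guardedTransition(
  current: BeadStatus,
  requested: BeadStatus
): BeadStatus {
  if (isTerminal(current) && requested !== current) {
    return current; // blocked, e.g. closed -> open
  }
  return requested;
}
```

Any code path that writes `status` without going through a check like this, as the raw UPDATE in deleteAgent() does, is not covered by it.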
Fix
Exclude terminal beads from the status reset. Clear the assignee on terminal beads without changing their status:
export function deleteAgent(sql, agentId) {
  // Re-open non-terminal beads so they can be reassigned
  query(sql, `
    UPDATE beads
    SET assignee_agent_bead_id = NULL,
        status = 'open',
        updated_at = ?
    WHERE assignee_agent_bead_id = ?
      AND status NOT IN ('closed', 'failed')
  `, [now(), agentId]);
  // Clear assignee on terminal beads without changing status
  query(sql, `
    UPDATE beads
    SET assignee_agent_bead_id = NULL
    WHERE assignee_agent_bead_id = ?
      AND status IN ('closed', 'failed')
  `, [agentId]);
  deleteBead(sql, agentId);
}
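The intended behavior of the fix can be pinned down with a small in-memory model, which may also be useful as the shape of a regression test. This is a sketch only: the `Bead` interface, the `releaseBeads` helper, the sample IDs, and the `in_progress` status are all hypothetical, and a real test would run against the actual SQL in deleteAgent().

```typescript
// In-memory model of the two UPDATE statements in the proposed fix.
// All names here are illustrative and do not exist in the codebase.
interface Bead {
  id: string;
  status: string;
  assignee: string | null;
}

const TERMINAL = new Set(["closed", "failed"]);

// Unassign every bead owned by the agent; re-open only non-terminal ones.
function releaseBeads(beads: Bead[], agentId: string): Bead[] {
  return beads.map((bead) => {
    if (bead.assignee !== agentId) return bead; // other agents' beads are untouched
    return {
      id: bead.id,
      assignee: null,
      status: TERMINAL.has(bead.status) ? bead.status : "open",
    };
  });
}

const before: Bead[] = [
  { id: "b1", status: "closed", assignee: "agent-1" },
  { id: "b2", status: "failed", assignee: "agent-1" },
  { id: "b3", status: "in_progress", assignee: "agent-1" }, // assumed status name
  { id: "b4", status: "in_progress", assignee: "agent-2" },
];

const after = releaseBeads(before, "agent-1");
```

In this model, `b1` and `b2` keep their terminal status and lose their assignee, `b3` is re-opened and unassigned, and `b4` is unchanged.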
Broader Concern
This is the same class of bug identified in audit #1986 (finding B5): raw SQL mutations that bypass updateBeadStatus(). Every direct UPDATE beads SET status = ... in the codebase should be audited. The close_sibling_mrs and close_convoy action handlers were fixed in an earlier PR, but deleteAgent() was missed.
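One way to start that audit is a rough text scan for `UPDATE beads` statements whose SET clause assigns `status`. The sketch below is a heuristic, not a SQL parser: `findRawStatusUpdates` and `Finding` are hypothetical names, and it assumes each statement ends at its `WHERE` or at a line containing `;`, so the hits still need manual review.

```typescript
// Heuristic scan for raw `UPDATE beads ... SET ... status = ...` statements.
// Illustrative only; it does not parse SQL and can miss unusual layouts.
interface Finding {
  line: number; // 1-based line number within the scanned source
  text: string; // the offending line, trimmed
}

function findRawStatusUpdates(source: string): Finding[] {
  const findings: Finding[] = [];
  let insideSetClause = false;

  source.split("\n").forEach((text, index) => {
    if (/\bUPDATE\s+beads\b/i.test(text)) insideSetClause = true;
    if (!insideSetClause) return;

    // Only the part of the line before WHERE belongs to the SET clause.
    const whereAt = text.search(/\bWHERE\b/i);
    const setPart = whereAt === -1 ? text : text.slice(0, whereAt);
    if (/\bstatus\s*=/i.test(setPart)) {
      findings.push({ line: index + 1, text: text.trim() });
    }

    // The SET clause ends at WHERE, or at the end of the JS statement.
    if (whereAt !== -1 || text.includes(";")) insideSetClause = false;
  });

  return findings;
}

// The buggy statement from deleteAgent(), used as sample input.
const sample = [
  "query(sql, `",
  "  UPDATE beads",
  "  SET assignee_agent_bead_id = NULL,",
  "      status = 'open',",
  "      updated_at = ?",
  "  WHERE assignee_agent_bead_id = ?",
  "`, [now(), agentId]);",
].join("\n");

const hits = findRawStatusUpdates(sample);
```

Running this over each source file would give a list of candidate sites to check against updateBeadStatus(); statements that only filter on `status` in their WHERE clause are not flagged.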
Files
src/dos/town/agents.ts:204-216 — deleteAgent() (the bug)
src/dos/town/beads.ts:278-287 — terminal state guard (bypassed)
src/dos/town/reconciler.ts:1773-1781 — GC rule that triggers delete_agent actions
Impact
Critical: mass-resurrects completed beads, which causes agents to be dispatched to already-done work (wasting credits), 60+ second alarm tick spikes, container eviction cascades, and towns that stay unusable until manually recovered.
Acceptance Criteria
deleteAgent() excludes closed/failed beads from status reset
Audit other raw UPDATE beads SET status queries for the same gap