
audit(gastown): Reconciler code quality audit — bugs, gaps, and hardening recommendations #1986

@jrf0110

Reconciler Code Quality Audit

Exhaustive line-by-line audit of the Gastown reconciler (reconciler.ts, actions.ts, events.ts) and supporting modules (agents.ts, beads.ts, review-queue.ts, Town.do.ts), compared against reconciliation-spec.md.

Files audited:

  • src/dos/town/reconciler.ts (2130 lines)
  • src/dos/town/actions.ts (974 lines)
  • src/dos/town/events.ts (213 lines)
  • src/dos/town/agents.ts (578 lines)
  • src/dos/town/beads.ts (950 lines)
  • src/dos/town/review-queue.ts (1453+ lines)
  • src/dos/Town.do.ts (alarm method, ~300 lines)
  • test/integration/reconciler.test.ts (489 lines)
  • docs/gt/reconciliation-spec.md (1035 lines)

1. Bugs Found

B1: SQL interpolation in circuit breaker query — potential injection / broken query

File: reconciler.ts:63
Impact: HIGH — could break the circuit breaker or cause SQL injection

WHERE ... > strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-${CIRCUIT_BREAKER_WINDOW_MINUTES} minutes')

CIRCUIT_BREAKER_WINDOW_MINUTES (30) is interpolated directly into the SQL string via a JavaScript template literal, not via a parameterized query argument. While this constant is hardcoded (not user-supplied), this violates the project convention ("use parameterized queries") and creates a maintenance hazard — if this constant were ever derived from user input, it would be a SQL injection vector.

Fix: Use a parameterized approach: compute the cutoff timestamp in JS (new Date(Date.now() - CIRCUIT_BREAKER_WINDOW_MINUTES * 60_000).toISOString()) and pass it as a query parameter.
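A minimal sketch of the parameterized approach (the table and column names below are placeholders, and the commented-out `sql.exec` call stands in for the Durable Object SqlStorage API):

```typescript
// Compute the window cutoff in JS and bind it as a query parameter
// instead of interpolating the constant into the SQL string.
const CIRCUIT_BREAKER_WINDOW_MINUTES = 30;

function circuitBreakerCutoff(nowMs: number = Date.now()): string {
  return new Date(nowMs - CIRCUIT_BREAKER_WINDOW_MINUTES * 60_000).toISOString();
}

// Illustrative query shape (table/column names are assumptions):
// sql.exec(
//   "SELECT COUNT(*) AS n FROM dispatch_attempts WHERE attempted_at > ?",
//   circuitBreakerCutoff(),
// );
```

Since the stored timestamps are ISO 8601 (see E4), a lexicographic comparison against a JS-computed ISO cutoff is equivalent to the `strftime` arithmetic.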

B2: dispatch_agent side effect doesn't roll back agent status on failure

File: actions.ts:550-561
Impact: HIGH — agent stuck in 'working' with no container running

The spec (§5.4) says: "If [dispatch] not started: set agent back to idle. Bead stays in_progress. Reconciler will retry on next tick." But the implementation only catches the error and logs it — it does NOT set the agent back to idle:

return async () => {
  await ctx.dispatchAgent(capturedAgentId, beadId, rigId).catch(err => {
    console.warn(...); // No rollback!
  });
};

The comment above claims "the reconciler detects the mismatch on the next tick (idle agent hooked to in_progress bead) and retries dispatch" — but the agent is NOT idle; it is 'working', and Rule 2 only fires for status = 'idle' agents. reconcileAgents would eventually catch this via heartbeat loss, but that imposes a 90-second delay on every failed dispatch.

Fix: In the .catch() handler, call agentOps.updateAgentStatus(sql, capturedAgentId, 'idle').
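A hedged sketch of the rollback, with in-memory stubs standing in for `agentOps.updateAgentStatus` and `ctx.dispatchAgent`:

```typescript
type AgentStatus = 'idle' | 'working';
const agentStatuses = new Map<string, AgentStatus>();

// Stub for agentOps.updateAgentStatus(sql, agentId, status)
function updateAgentStatus(agentId: string, status: AgentStatus): void {
  agentStatuses.set(agentId, status);
}

// Stub for ctx.dispatchAgent — simulates a failed container start
async function dispatchAgent(_agentId: string, _beadId: string): Promise<void> {
  throw new Error('container start failed');
}

function makeDispatchSideEffect(agentId: string, beadId: string) {
  updateAgentStatus(agentId, 'working'); // synchronous SQL phase
  return async () => {
    try {
      await dispatchAgent(agentId, beadId);
    } catch (err) {
      console.warn(`dispatch failed for agent ${agentId}`, err);
      // Rollback: Rule 2 (idle agent hooked to in_progress bead) retries next tick
      updateAgentStatus(agentId, 'idle');
    }
  };
}
```

With the rollback in place, the comment quoted above becomes accurate: the agent really is idle on the next tick.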

B3: merge_pr side effect writes SQL inside async closure

File: actions.ts:780-798, 812-831, 838-856
Impact: MEDIUM — violates phase separation (spec §1.5: "SQL mutations are synchronous and atomic")

The merge_pr action's deferred async function writes SQL (UPDATE beads, UPDATE review_metadata) in multiple places:

  • Line 783-789: Clears auto_merge_pending when fresh feedback found
  • Line 815-830: Clears auto_merge_pending on merge failure
  • Line 838-856: Clears on catch

These SQL writes happen AFTER Promise.allSettled() in the alarm loop (Phase 2/4), potentially racing with the next alarm tick's Phase 0 event drain. If the alarm fires quickly, a concurrent drain could read stale auto_merge_pending state.

Fix: Extract the "clear auto_merge_pending" into a synchronous pre-check, or emit events from the async handler instead of direct SQL writes. The same issue exists in poll_pr (line 646-654: SQL writes inside async context).

B4: poll_pr calls checkPRFeedback twice per tick

File: actions.ts:608, 663
Impact: LOW — wasted API calls, doubled latency

When both auto_resolve_pr_feedback and auto_merge_delay_minutes are configured, checkPRFeedback is called once at line 608 and again at line 663. Each call is a GitHub API request.

Fix: Cache the result of the first call and reuse it.
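One way to sketch the cache (checkPRFeedback is stubbed here; in the real action it issues a GitHub API request, and the result type is illustrative):

```typescript
type PRFeedback = { hasFreshFeedback: boolean };

let apiCalls = 0;
async function checkPRFeedback(_prUrl: string): Promise<PRFeedback> {
  apiCalls++; // stub standing in for one GitHub API request
  return { hasFreshFeedback: false };
}

async function pollPr(prUrl: string): Promise<void> {
  // Per-invocation memoization: both paths share a single fetch.
  let cached: Promise<PRFeedback> | undefined;
  const getFeedback = () => (cached ??= checkPRFeedback(prUrl));

  // auto_resolve_pr_feedback path
  await getFeedback();
  // auto_merge_delay_minutes path — reuses the cached promise
  await getFeedback();
}
```

Caching the promise (rather than the resolved value) also collapses concurrent callers onto one in-flight request.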

B5: reconcileReviewQueue double-evaluates circuit breaker

File: reconciler.ts:1024 vs reconciler.ts:614
Impact: LOW — unnecessary query per tick

checkDispatchCircuitBreaker(sql) is called in reconcileBeads (line 614) and again independently in reconcileReviewQueue (line 1021). The circuit breaker does a COUNT query on every call. These run in the same synchronous tick.

Fix: Pass the circuit breaker result from reconcile() into each sub-function, or hoist it to reconcile() and pass it down.
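A sketch of the hoisted shape (all three functions are stubs; the real checkDispatchCircuitBreaker runs a COUNT query):

```typescript
let breakerQueries = 0;

// Stub: real version runs a COUNT over recent dispatch attempts
function checkDispatchCircuitBreaker(): boolean {
  breakerQueries++;
  return false; // false = breaker closed, dispatches allowed
}

function reconcileBeads(breakerOpen: boolean): void {
  if (breakerOpen) return; // skip dispatch rules when tripped
}

function reconcileReviewQueue(breakerOpen: boolean): void {
  if (breakerOpen) return;
}

function reconcile(): void {
  const breakerOpen = checkDispatchCircuitBreaker(); // evaluated once per tick
  reconcileBeads(breakerOpen);
  reconcileReviewQueue(breakerOpen);
}
```

Since the whole tick is synchronous (spec §1.5), one evaluation per tick cannot go stale between sub-functions.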

B6: Rule 4 (orphaned PR review) only checks in_progress, but spec says open too

File: reconciler.ts:1118-1134
Impact: MEDIUM — spec says Rule 4 applies to status in ('open', 'in_progress') for PR-strategy beads

The spec (§5.3 reconcileReviewQueue Rule 4) says:

for each bead where status in ('open', 'in_progress') and pr_url is not null ...

But the implementation only checks mr.status === 'in_progress' (line 1119). An open MR bead with a pr_url that is stale >30min would not be caught. This could happen if a PR URL is set on an MR bead but the bead is never transitioned to in_progress (e.g., via a race in setReviewPrUrl).

Fix: Change the guard to also handle mr.status === 'open'.


2. Missing Edge Cases

E1: No poison event protection

File: Town.do.ts:3463-3477
Impact: HIGH — a single malformed event blocks the entire reconciler queue

The alarm loop catches applyEvent errors but just skips to the next event, leaving the poison event unprocessed. The comment says "Mark it processed anyway after 3 consecutive failures" but this is NOT implemented. On every tick, the poison event is drained again and fails again, consuming resources forever.

Fix: Track failure count per event (e.g., in a retry_count column or in-memory map). After N failures, mark the event as processed with an error log / Sentry alert.
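An in-memory sketch of the retry tracking (a persisted retry_count column would survive DO restarts; the caller is assumed to mark the event processed and alert when this returns true):

```typescript
const MAX_EVENT_FAILURES = 3;
const eventFailures = new Map<string, number>();

/**
 * Record an applyEvent failure. Returns true when the event should be
 * marked processed anyway (poison event: give up and alert).
 */
function recordEventFailure(eventId: string): boolean {
  const failures = (eventFailures.get(eventId) ?? 0) + 1;
  eventFailures.set(eventId, failures);
  if (failures >= MAX_EVENT_FAILURES) {
    console.error(`poison event ${eventId}: skipping after ${failures} failures`);
    eventFailures.delete(eventId);
    return true; // caller marks the event processed + alerts Sentry
  }
  return false;
}
```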

E2: bead_cancelled event doesn't stop a working agent's container

File: reconciler.ts:259-289
Impact: MEDIUM — agent continues consuming credits after bead is cancelled

When bead_cancelled is applied, it unhooks the agent but doesn't emit a stop_agent side effect. The agent's container keeps running until the heartbeat times out (90s) or the reconciler's stale-heartbeat check fires. During that window, the agent consumes compute resources working on a cancelled bead.

Fix: After unhooking agents in bead_cancelled, also insert a stop_agent action (or defer a stop side effect).

E3: No handling of agent_done + agent_completed in same drain batch

File: reconciler.ts:201-228
Impact: MEDIUM — order-dependent behavior within a batch

If both agent_done and agent_completed events arrive for the same agent in the same drain batch, the outcome depends on insertion order. Normally agent_done runs first (gt_done fires before process exit), but if agent_completed runs first:

  • Non-refinery: agentCompleted unhooks the agent → agentDone finds no hook → logs warning and returns (for polecats). The MR is never created.
  • This is currently handled for refineries (line 507-547 in review-queue.ts), but not for polecats.

Fix: Document this as an accepted trade-off, or add recovery logic to agentDone for polecats similar to the refinery recovery path.

E4: container_status event with not_found uses JS timestamp comparison, not SQLite

File: reconciler.ts:324-326
Impact: LOW — potential timezone mismatch (though unlikely in practice since both use ISO 8601)

The 3-minute grace period is computed in JS (Date.now() - new Date(agent.last_activity_at).getTime()). This is correct as long as last_activity_at is stored as ISO 8601. The touchAgent function uses new Date().toISOString(), so this is consistent.

No fix needed, but worth noting for consistency.

E5: pr_feedback_detected event uses beadOps.createBead directly in applyEvent

File: reconciler.ts:392-407
Impact: LOW — spec violation: events should only update state, not create new entities

The spec says "Events represent facts that have occurred; applying them updates state to reflect those facts." Creating a new bead is a side effect, not a state update. This blurs the boundary between event application and reconciliation.

Fix: Consider moving feedback bead creation to a reconciler rule instead of the event handler.

E6: Convoy progress double-counting path

File: reconciler.ts:1440-1470 and beads.ts:330-469
Impact: LOW — redundant but not harmful

reconcileConvoys recounts closed beads and emits update_convoy_progress. But beads.updateBeadStatus also calls updateConvoyProgress inline when a bead reaches a terminal state. Both paths update the same counter. The reconciler path is a safety net, which is good, but the dual-write means the counter is updated twice on terminal transitions.

No fix needed — redundancy is acceptable for safety.


3. Race Conditions

R1: dispatch_agent action creates agent + hooks + sets working — all non-atomic with container start

File: actions.ts:522-561
Impact: HIGH — window where agent is 'working' but container hasn't started

Between the synchronous SQL mutations (agent → working, bead → in_progress) and the async container dispatch, there's a window where:

  1. Agent is 'working' + bead is 'in_progress'
  2. Container start fails
  3. Agent stays 'working' (see B2 above) until heartbeat timeout

During this window, the reconciler won't retry dispatch because the agent appears healthy (status = 'working'). Combined with B2, this creates a 90-second dead zone per failed dispatch.

Fix: See B2. Additionally, consider checking container health in the pre-dispatch phase rather than relying solely on post-dispatch heartbeat monitoring.

R2: set_convoy_ready_to_land and create_landing_mr in same tick

File: reconciler.ts:1508-1584
Impact: LOW — landing MR creation delayed by one tick

When a convoy first becomes ready to land, the reconciler emits set_convoy_ready_to_land (line 1510-1513). The ready_to_land flag is checked on line 1516 to decide whether to create the landing MR. But since set_convoy_ready_to_land is an action that hasn't been applied yet (it's in the same action batch), the check at line 1516 reads stale state from the database.

Result: the landing MR is created on the NEXT tick, not the current one. This is a one-tick delay (5 seconds), not a bug.

Fix: Either accept the one-tick delay (harmless) or restructure to emit both actions in the same check (when closed_count >= total_count and no landing MR exists, emit both).

R3: Concurrent agentDone and agentCompleted for refineries

File: review-queue.ts:681-782
Impact: MEDIUM — already mitigated but fragile

The agentCompleted handler deliberately does NOT unhook refineries (lines 691-703) to avoid racing with agentDone. This is correct. However, the guard at line 763-778 that sets the refinery to idle has a subtle race: if agentDone runs between agentCompleted's read and write, the WHERE NOT (status = 'working' AND current_hook_bead_id IS NOT NULL) guard prevents clobbering.

This is well-handled. No fix needed.

R4: poll_pr touches updated_at on every tick, defeating staleness checks

File: actions.ts:580-588
Impact: LOW — intentional, but creates a coupling

The poll_pr action synchronously updates updated_at on the MR bead to prevent the 30-minute orphaned PR timeout (Rule 4) from firing while the PR is actively being polled. This is correct but means Rule 4 can NEVER fire as long as polling continues every 5 seconds.

If polling fails silently (e.g., GitHub rate limit returns 200 with an error body that isn't detected), the MR bead will appear non-stale forever.

Fix: Consider a separate last_poll_at field on review_metadata instead of overloading updated_at.


4. Spec Divergences

S1: transition_bead.from is nullable in implementation, required in spec

File: actions.ts:31 vs spec §4.2
Impact: LOW — the implementation is more lenient

The spec defines from: BeadStatus (required), but the implementation uses from: z.string().nullable(). Several call sites pass from: null (e.g., reconciler.ts:794, 1335), meaning the from-state guard is skipped entirely. The applyAction for transition_bead (actions.ts:309-319) doesn't check from at all — it always applies the transition.

Fix: Either enforce the from-state guard as the spec requires, or update the spec to document that from is informational.

S2: Spec says GUPP warn at 30 min, implementation uses 15 min

File: reconciler.ts:1722 vs spec §5.3 (GUPP_WARN_MS = 30 min)
Impact: LOW — intentional tightening

The spec says GUPP warn threshold is 30 minutes. The implementation uses 15 minutes (line 1722: elapsed > 15 * 60_000). The comment says "Tighter warn threshold (15min vs old 30min) using SDK activity." This is an intentional deviation.

Fix: Update the spec to reflect the 15-minute threshold.

S3: Spec says agent_completed closes/fails bead, implementation doesn't for completed status

File: review-queue.ts:704-749 vs spec §5.2

The spec says:

if agent.current_hook_bead_id:
  status = event.payload.status == 'completed' ? 'closed' : 'failed'
  transitionBead(sql, agent.current_hook_bead_id, status)

But the implementation (lines 718-749) does NOT close the bead when status = 'completed' and the bead is still in_progress. Instead, it leaves the bead as in_progress and lets reconciler Rule 3 recover it after 5 minutes. This is intentional (documented in comments) — gt_done is the canonical completion path for polecats.

Fix: Update spec to match implementation.

S4: reconcileBeads Rule 3 timeout is 5 min, spec says 2 min

File: reconciler.ts:103 vs spec §5.3 Rule 3 (stale(updated_at, 2 minutes))
Impact: MEDIUM — longer recovery time than spec

STALE_IN_PROGRESS_TIMEOUT_MS is 5 minutes. The spec says 2 minutes. The code comment explains: "Must be longer than AGENT_IDLE_TIMEOUT_MS (2 min) + one alarm tick (5s) to avoid racing with the idle-timer → agentCompleted → reconciler flow."

Fix: Update spec to 5 minutes with the rationale.

S5: Spec event type list is incomplete

File: spec §3.1 vs reconciler.ts event handlers
Impact: LOW — spec is stale

The implementation handles additional event types not in the spec:

  • container_eviction (line 359-364)
  • pr_feedback_detected (line 372-408)
  • pr_auto_merge (line 410-434)

Fix: Update spec with the new event types.

S6: refineryCodeReview option is threaded through but unused [RETRACTED]

File: reconciler.ts:448, 455, 1018

Correction (verified against source at 30dfbde): This finding is inaccurate. The refineryCodeReview option IS actively used. At reconciler.ts:1141, it is read: const refineryCodeReview = opts?.refineryCodeReview ?? true. At lines 1143-1172, when !refineryCodeReview, open MR beads with PR URLs are fast-tracked to in_progress (bypassing refinery dispatch). At line 1176, it gates Rules 5-6 (refinery dispatch logic). This is functional, not dead code.


5. Code Quality Issues

C1: Repeated AgentRow query pattern (5+ copies)

File: reconciler.ts:478-493, 537-553, 740-757, 1179-1196, 1274-1291, 1370-1387, 1611-1630, 1709-1726
Impact: Maintainability

The same SELECT + JOIN + parse pattern for agent rows is duplicated ~8 times. Each copy selects slightly different columns but uses the same AgentRow schema. A helper function like queryWorkingAgents(sql, filters) would reduce duplication.

C2: hasExistingPrFeedbackBead / hasExistingFeedbackBead — similar but not identical

File: reconciler.ts:1881-1898 and actions.ts:939-956

Correction (verified against source at 30dfbde): These functions are similar but NOT identical. The reconciler version (hasExistingPrFeedbackBead at line 1881) queries bd.bead_id = ? (finds feedback beads that block the MR bead). The actions version (hasExistingFeedbackBead at line 939) uses the same direction but has a slightly different name and lives in a different module. The SQL is semantically equivalent in this case, but they are named differently (hasExistingPrFeedbackBead vs hasExistingFeedbackBead). Still worth deduplicating, but the original claim of "identical implementations" overstates it.

Fix: Consolidate into a shared helper and export from one module.

C3: Magic numbers for timeouts

File: reconciler.ts:324 (180-second grace), reconciler.ts:513 (90_000 ms heartbeat), reconciler.ts:1677 (15 * 60_000 GUPP warn)
Impact: Readability

Some timeouts are named constants (e.g., STUCK_REVIEW_TIMEOUT_MS), others are inline magic numbers. The 180-second cold start grace (line 325), 90-second heartbeat timeout (line 513), and 15-minute GUPP warn (line 1677) should be named constants for clarity.
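The inline values could be lifted into named constants alongside the existing ones (the names below are suggestions; the values are copied from the audit text above):

```typescript
// Named equivalents of the inline timeouts flagged above.
const COLD_START_GRACE_MS = 180 * 1_000;   // container_status not_found grace
const HEARTBEAT_TIMEOUT_MS = 90 * 1_000;   // working-agent heartbeat loss
const GUPP_WARN_MS = 15 * 60_000;          // GUPP warn tier (see S2)
```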

C4: now() helper duplicated in 5 files

File: actions.ts:290-292, agents.ts:51-53, beads.ts:60-62, events.ts:22-24, review-queue.ts:36-38

Every module defines its own now() function. This should be a shared utility.


6. Missing Tests

T1: No test for circuit breaker

Impact: HIGH — a broken circuit breaker could either block all dispatches or allow infinite retries

The circuit breaker (checkDispatchCircuitBreaker) has no test coverage. Tests should verify:

  • Breaker opens when threshold is exceeded
  • Breaker allows dispatches below threshold
  • Breaker emits notify_mayor action

T2: No test for GUPP (reconcileGUPP)

Impact: HIGH — GUPP bugs cause stuck agents consuming credits

No tests for any GUPP tier (warn, escalate, force_stop). Tests should verify:

  • 15-min idle → warn nudge
  • 1-hour → escalate + triage request
  • 2-hour → force stop + triage request
  • Draining skips GUPP
  • hasRecentNudge dedup works correctly

T3: No test for event idempotency

Impact: HIGH — duplicate events could cause double transitions

No test verifies what happens when the same event is applied twice. Key scenarios:

  • Double agent_done for the same agent
  • Double pr_status_changed (merged) for the same MR
  • bead_cancelled for an already-cancelled bead
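A reduced sketch of the property such tests would pin down (the state shape and handler are stand-ins, not the real applyEvent):

```typescript
type AgentState = { status: 'working' | 'idle'; hookBeadId: string | null };

// Stand-in for the agent_done handler: unhook + idle, but a no-op
// when the agent is already unhooked (the duplicate-event case).
function applyAgentDone(agent: AgentState): AgentState {
  if (agent.hookBeadId === null) return agent; // duplicate: nothing to do
  return { status: 'idle', hookBeadId: null };
}
```

An idempotency test applies the handler twice and asserts the second application changes nothing.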

T4: No test for reconcileReviewQueue Rules 2, 3, 4, and 7

Impact: HIGH — stuck MR recovery is the most common failure mode

Only Rule 5 (refinery dispatch) and Rule 6 (dispatch limits) have test coverage. Missing:

  • Rule 2: Stuck review recovery (no working agent, no PR, >30 min)
  • Rule 3: Abandoned MR recovery (no agent hooked, >2 min)
  • Rule 4: Orphaned PR review (>30 min, no working agent)
  • Rule 7: Working refinery with terminal MR → stop

T5: No test for convoy landing MR creation

Impact: MEDIUM — broken convoy landing blocks all review-then-land convoys

reconcileConvoys has tests for auto-close (review-and-merge) and for confirming that review-then-land convoys don't auto-close, but no tests for:

  • Landing MR creation when ready_to_land is set
  • Landing MR merged → convoy closes
  • Failed landing MR handling

T6: No test for applyEvent edge cases

Impact: MEDIUM — no coverage for:

  • pr_feedback_detected → feedback bead creation
  • pr_auto_merge → auto_merge_pending metadata
  • container_status with not_found during cold start grace period
  • bead_cancelled with hooked agents

T7: No unit tests for reconciler rules in isolation

Impact: MEDIUM — all tests are integration tests

The integration tests are valuable but slow and can't easily test timing-dependent scenarios (staleness thresholds, cooldown backoff). Pure unit tests with a mock SqlStorage could test individual rules with controlled timestamps.

T8: No test for invariant checker

Impact: LOW — invariant checker is observability-only (no auto-correction)

checkInvariants is never tested. Should verify all 5 invariant checks detect violations.


7. Hardening Recommendations

H1: Implement poison event protection (priority: P0)

Add a retry_count column to town_events or track retries in-memory. After 3 failures, mark the event processed and alert via Sentry. This prevents a single bad event from blocking all reconciliation.

H2: Fix dispatch rollback (priority: P0)

In dispatch_agent's async side effect catch handler (actions.ts:555), set the agent back to 'idle' on failure. This eliminates the 90-second dead zone described in B2/R1.

H3: Add from-state guards to transition_bead (priority: P1)

The spec requires checking action.from against the current bead status. The implementation skips this check. Without it, concurrent rule firings can produce unexpected transitions (e.g., a bead goes open→in_progress→open→in_progress in rapid succession). Add a guard that logs a warning and skips the transition when action.from doesn't match.
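A hedged sketch of the guard (the action shape follows S1 above; returning the unchanged status to signal a skip is an assumption about the surrounding code):

```typescript
type TransitionBead = { from: string | null; to: string };

// Apply a transition only when the from-state still matches; a null
// `from` keeps today's lenient behavior (informational only).
function applyTransitionBead(current: string, action: TransitionBead): string {
  if (action.from !== null && action.from !== current) {
    console.warn(`transition_bead skipped: expected '${action.from}', bead is '${current}'`);
    return current;
  }
  return action.to;
}
```

This also resolves S1: existing `from: null` call sites keep working, while callers that supply `from` get the compare-and-swap semantics the spec requires.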

H4: Add last_poll_at to review_metadata (priority: P1)

Stop overloading updated_at for poll liveness (see R4). A dedicated last_poll_at field on review_metadata would decouple poll freshness from general bead update semantics, making staleness checks more reliable.

H5: Add auto-recovery for invariant violations (priority: P1)

Currently, checkInvariants only logs violations. Add auto-corrective actions:

  • Invariant 7 (working + no hook): transition agent to idle
  • Invariant 6 (multi-hooked bead): unhook all but the most recent agent
  • Invariant 3 (multiple in_progress MRs per rig): fail all but the oldest

H6: Hoist circuit breaker to reconcile() (priority: P2)

Call checkDispatchCircuitBreaker once in reconcile() and pass the result to reconcileBeads and reconcileReviewQueue. Saves one COUNT query per tick.

H7: Add bead_cancelled → stop_agent (priority: P2)

When a bead is cancelled while an agent is working on it, stop the agent's container immediately instead of waiting for heartbeat timeout. See E2.

H8: Parameterize the circuit breaker SQL (priority: P2)

Replace the template literal interpolation of CIRCUIT_BREAKER_WINDOW_MINUTES with a computed ISO timestamp and parameterized query. See B1.

H9: Extract shared helpers (priority: P3)

  • Deduplicate now() into util/time.util.ts
  • Consolidate hasExistingPrFeedbackBead / hasExistingFeedbackBead between reconciler.ts and actions.ts
  • Extract repeated AgentRow query patterns into helper functions

H10: Add comprehensive reconciler unit tests (priority: P1)

Create test/unit/reconciler.test.ts with controlled state setup (direct SQL inserts) to test:

  • Each reconcile rule in isolation with boundary timestamps
  • Circuit breaker open/close thresholds
  • GUPP tier progression
  • Cooldown backoff calculations
  • Idempotency of each rule (running twice produces same result)

Labels: P1 (Should fix before soft launch), gt:core (Reconciler, state machine, bead lifecycle, convoy flow)
