fix(reconciler): deduplicate checkPRFeedback and checkDispatchCircuitBreaker calls#2029
Open
jrf0110 wants to merge 17 commits intoconvoy/fix-reconciler-p0-p1-bug-fixes-from-audi/892c7c4a/headfrom
Conversation
* fix(kilo-app): resolve Android manifest merger conflict between expo-secure-store and AppsFlyer Both expo-secure-store and af-android-sdk declare dataExtractionRules and fullBackupContent on <application>, causing the manifest merger to fail. Add a config plugin that sets tools:replace on the application element. * style: format config plugin
…me (#1919) * fix(cloud-agent-sdk): recover stale WebSocket connections on tab resume Add application-level ping/pong staleness detection for WebSocket connections that silently die when tabs are backgrounded. On tab resume, sends a ping and reconnects if no response within 5s. - base-connection: visibilitychange/pageshow/online handlers, ping timeout, proactive ticket refresh before reconnect - cloud-agent-transport/cli-live-transport: snapshot refetch on reconnect via onReconnected callback - session.ts: route completed sessions to historical transport - CloudAgentProvider: determine isLive from DO execution status * fix(cloud-agent-sdk): replace client ping with passive staleness detection Address review feedback on stale WebSocket recovery: - Remove ws.send('ping') — server never responds; staleness detection now relies on server heartbeats canceling the timeout - Make staleness timeout configurable (stalenessTimeoutMs) so the transport layer that knows the heartbeat interval controls the value - Increase default from 5s to 30s to exceed server heartbeat intervals - Track lastMessageTime to skip the check when a recent message proves the connection is alive - Wire heartbeatTimeoutMs through cloud-agent-connection - disconnect() now removes visibility/pageshow/online listeners, fixing a leak when transports disconnect without calling destroy() * fix(cloud-agent-sdk): reset staleness clock when creating a replacement socket * refactor(cloud-agent-sdk): replace ResolvedSession struct with discriminated union Replace the flat { cloudAgentSessionId, isLive } shape with a discriminated union ('remote' | 'cloud-agent' | 'read-only') so transport routing is explicit and type-safe. Simplify session resolution in CloudAgentProvider by removing the runtime-state liveness check. Add runtime exhaustive check in pickTransportFactory.
* feat(triggers): add scheduled trigger support
* feat(claw): add amber upgrade banner to Settings tab Add a new amber-colored Banner component to the Settings tab that appears when a newer OpenClaw version is available. The banner uses the same show/hide logic as the existing 'Update available' badge and triggers the same upgrade action via onRequestUpgrade. * feat(claw): move upgrade banner to page level in ClawDashboard Place the amber upgrade banner at the page level in ClawDashboardInner, colocated with the Free Trial billing banner, instead of inside the Settings tab. The banner uses the same updateAvailable condition and onRequestUpgrade handler as the existing badge in SettingsTab. * refactor(claw): extract useClawUpdateAvailable hook to centralize version detection Extract the duplicated updateAvailable logic into a shared useClawUpdateAvailable hook consumed by both ClawDashboard (page-level upgrade banner) and SettingsTab (inline badge). This eliminates the mirrored version detection code. * fix(claw): remove leftover upgrade banner from SettingsTab The banner now lives at the page level in ClawDashboard; remove the duplicate that was left behind in SettingsTab along with its unused Banner and ArrowUpCircle imports. * refactor(claw): move upgrade banner into InstanceControls Move the amber upgrade banner from the page level in ClawDashboard into InstanceControls, directly below the 'Manage power state and gateway lifecycle' description and above the action buttons. Extend Banner.Button to accept an onClick prop as an alternative to href so the upgrade button uses the Banner's default color-aware styling from BannerContext rather than hardcoded classes. * style: apply oxfmt formatting * feat(claw): white button text and 24hr dismissable upgrade banner Make the 'Upgrade now' button text explicitly white. Add a dismiss button (X) that hides the banner for 24 hours using localStorage, keyed by the available version so a newer release re-shows it. * fix(claw): re-sync banner dismiss state when version changes Replace useState initializer with useMemo keyed on dismissKey so the localStorage check re-runs when the target version loads asynchronously. Reset the in-session dismiss flag via useEffect when the key changes so a newer release is never hidden by a stale dismiss. --------- Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com>
* the actual issue appears to be in the forked streamchat repo itself. * fix(kiloclaw): align streamchat plugin entries key with manifest id The fork's openclaw.plugin.json now declares id 'openclaw-channel-streamchat' to match the idHint OpenClaw 3.24 derives from the scoped package name. The entries key must match the manifest id for config lookup to work, so update it from 'streamchat' to 'openclaw-channel-streamchat'. Load paths unchanged since the package name is still @wunderchat/openclaw-channel-streamchat. * Remove cache buster * re-add cache buster with better name, update comment * fix(kiloclaw): add plugins.allow for streamchat to suppress auto-load warning OpenClaw warns when plugins.allow is empty and non-bundled plugins are discovered. Add openclaw-channel-streamchat to plugins.allow when Stream Chat is configured. * ci(kiloclaw): pass STREAM_CHAT_CACHE_BUSTER to remote Docker build Forces rebuild of the npm install layer that includes the streamchat plugin fork, ensuring the fix/plugin-name-resolution branch is fetched. * Revert "ci(kiloclaw): pass STREAM_CHAT_CACHE_BUSTER to remote Docker build" This reverts commit ea2cc16.
* remove default allowlist configuration * Update tests to match changes
* add changelog * update guidance * Fix unnecessary escape char * Update src/app/(app)/claw/components/changelog-data.ts Co-authored-by: Florian Hines <florian@kilocode.ai> --------- Co-authored-by: Florian Hines <florian@kilocode.ai>
* feat(dev): add opt-in kiloclaw grafana dashboards * fix(dev): quote grafana textbox filters * style(dev): format grafana dashboards
Bump to 2026.3.28 Co-authored-by: Evan Jacobson <evanjacobson3@gmail.com>
…1987) * refactor(lib): drop QWEN36_PLUS_FREE_MODEL_ID import and usage remove unused QWEN36_PLUS_FREE_MODEL_ID from the qwen provider export and its usage in preferredModels this cleans up imports and keeps the model list in sync with the current provider surface * fix(models): update approval file preferredIndex after removing Qwen 3.6 Plus from preferred list * fix(models): remove trailing newline from approval file to match JSON.stringify output --------- Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com> Co-authored-by: Joshua Lambert <25085430+lambertjosh@users.noreply.github.com>
* docs(gastown): add local debug testing guide and drain test script - docs/local-debug-testing.md: comprehensive guide covering debug endpoints, container monitoring, drain testing, wrangler management, drain architecture, and common issues - scripts/test-drain.sh: automated end-to-end drain test that sends work, waits for an agent, triggers graceful stop, monitors the drain, and prints PASS/FAIL with diagnostics * fix(gastown): refresh container token when mayor is alive but waiting (#1979) fix(gastown): also refresh container token when mayor is alive (waiting) Refine the refreshContainerToken guard to check for an alive mayor (working/stalled/waiting) in addition to hasActiveWork(). A waiting mayor is alive in the container but hasActiveWork() returns false — without this fix the 8h container token can expire, stranding GT tool calls when the user sends the next message via sendMayorMessage (which reuses the container without calling ensureContainerToken). Co-authored-by: John Fawcett <john@kilcoode.ai> * feat(gastown): skip review queue for gt:pr-fixup beads (#1985) * feat(gastown): skip review queue for gt:pr-fixup beads (#1983) * feat(gastown): skip review queue for gt:pr-fixup beads in agentDone() * style: fix formatting in local-debug-testing.md --------- Co-authored-by: John Fawcett <john@kilcoode.ai> * feat(gastown): thread labels parameter through gt_sling tool chain Add optional labels[] parameter to gt_sling so the mayor can tag beads at creation time. The parameter flows through: mayor-tools.ts → client.ts → mayor-tools.handler.ts → Town.do.ts slingBead() → createBead() createBead() already accepts labels, so no further changes needed. * feat(gastown): add pr_fixup_context to PrimeContext and buildPrimeContext Add pr_fixup_context field to PrimeContext type and populate it in buildPrimeContext when the hooked bead has the gt:pr-fixup label, mirroring the existing rework_context pattern. Extracts pr_url, branch, and target_branch from bead metadata. --------- Co-authored-by: John Fawcett <john@kilcoode.ai> * feat(container): expand apt-get block with build tools, ripgrep, and dev libraries (#1978) * feat(container): expand apt-get block with build tools, ripgrep, and dev libraries Add comprehensive package list to both Dockerfile and Dockerfile.dev: - Build toolchain: build-essential, autoconf - Search tools: ripgrep, jq - Network: wget, gnupg, unzip - Compression: bzip2, zstd - SSL/crypto: libssl-dev, libffi-dev - Database client libs: libdb-dev, libgdbm-dev, libgdbm6 - Python build deps: libbz2-dev, liblzma-dev, libncurses5-dev, libreadline-dev, zlib1g-dev - Ruby build deps: libyaml-dev - Image processing: libvips-dev - Browser/rendering: libgbm1 - Native addon support: libc++1, libgmp-dev - Timezone data: tzdata Closes #1976 * fix(container): remove shell comments from inside backslash-continued apt-get install Shell comments inside backslash-continued lines break the apt-get install command — the # starts a comment that swallows the rest of the logical line, dropping all subsequent packages. Move category info to a block comment above the RUN statement instead. --------- Co-authored-by: John Fawcett <john@kilcoode.ai> * feat(gastown): polecat creates PRs, refinery reviews via GitHub, code_review toggle (#1586) * feat(gastown): auto-resolve PR feedback and auto-merge with grace period Implements the core infrastructure for automatically addressing unresolved review comments and failing CI checks on PRs, plus configurable auto-merge with a grace period timer. Closes #1002 * fix(gastown): address PR review feedback — pagination safety, merge failure recovery, repo validation - GraphQL reviewThreads query now includes pageInfo.hasNextPage; if there are more than 100 threads, conservatively treats PR as having unresolved comments to prevent auto-merging with unchecked feedback - merge_pr action clears auto_merge_pending flag on failure (405/409/error) and resets the auto_merge_ready_since timer so normal polling resumes - reconcileReviewQueue now emits poll_pr alongside merge_pr so PR status changes (merged/closed by human) are still observed during merge attempts - merge_pr validates the PR URL's owner/repo against the rig's git_url before calling the merge API, preventing merges targeting unrelated repos * fix(gastown): check-runs pagination and fresh feedback check before merge - check-runs API now uses per_page=100 and compares total_count to detect unpaginated runs; if more runs exist than returned, allChecksPass is conservatively set to false - merge_pr re-checks feedback (comments + CI) immediately before calling the merge API; if state regressed since the last poll, clears auto_merge_pending and resets the timer * fix(gastown): conservative hasFailingChecks for unpaginated check-runs, fix CI failures - hasFailingChecks is now conservatively true when total_count > returned runs, so PRs with >100 check-runs correctly trigger feedback handling - Replace `as` casts with Zod safeParse for GitHub API responses (checkPRFeedback) to satisfy oxlint no-unnecessary-type-assertion - Fix test constant expressions flagged by TS2871 and oxlint - Run oxfmt on changed files * fix(gastown): separate hasUncheckedRuns from hasFailingChecks in PR feedback hasFailingChecks now only reflects actual failures from inspected runs. A new hasUncheckedRuns field tracks when the check-runs response was paginated. allChecksPass is still conservatively false when runs are unchecked (blocking auto-merge), but the feedback bead is no longer dispatched for phantom CI failures on repos with >100 passing checks. * style: auto-format local-debug-testing.md * feat(gastown): polecat creates PRs, refinery reviews via GitHub, code_review toggle - Polecat creates PRs directly when merge_strategy is 'pr', passing pr_url to gt_done. Removes a full agent dispatch cycle from the flow. - Refinery reviews existing PRs via gh pr review, adding inline comments that go through the same auto-resolve feedback loop as human reviews. - New refinery.code_review toggle (default: true) lets users disable refinery code review when they have an external review bot. - Fix: allChecksPass treats 0 check-runs as passing for repos without CI. - Fix: reconciler Rule 7 stops refineries hooked to terminal MR beads, preventing post-merge review comments. - Convoy-aware: review-then-land intermediate beads skip PR creation. * fix(gastown): include hasUncheckedRuns in pr_feedback_detected condition When check-runs are paginated (>100), the first page may all pass but later pages may have failures. Previously only hasFailingChecks was checked, which only reflects the first page. Now hasUncheckedRuns also triggers feedback detection so the auto-resolve path can fix CI. * fix(gastown): include auto_merge columns in REVIEW_JOIN query REVIEW_JOIN was missing auto_merge_ready_since and last_feedback_check_at from the SELECT, causing MergeRequestBeadRecord parse failures on pr_status_changed events. * fix(gastown): refinery uses gh pr comment instead of --approve The bot account that created the PR cannot approve its own PRs. The refinery now posts a comment when the review passes instead of using gh pr review --approve. * fix(gastown): set parent_bead_id on rework and PR feedback beads Rework beads (from refinery review) and PR feedback beads (from poll_pr comment detection) now set parent_bead_id to the MR bead that triggered them. This links them into the bead chain so the relationship is visible in the UI and queryable via parent traversal. * fix(gastown): skip orphaned-review timeout when code_review disabled, fix inline comment syntax - Rule 4 orphaned-review timeout now skips when refineryCodeReview is false — poll_pr keeps MR beads alive, no refinery expected. - Removed invalid 'gh pr review --comment path:line' syntax from refinery prompt; consolidated on the GitHub API example which correctly creates inline review threads. * fix(gastown): check commit statuses, propagate hasUncheckedRuns, gate auto-merge on toggle - Also check combined commit status API alongside check-runs; repos using legacy status checks are no longer missed. - Propagate hasUncheckedRuns through pr_feedback_detected event so the feedback bead title and prompt reflect the actual reason. - Gate auto-merge timer on refineryConfig.auto_merge in addition to auto_merge_delay_minutes to respect the toggle. * style: auto-format files for CI * fix(gastown): bug fixes — org billing (#1756), reconciler spam (#1364), stuck MR beads (#1632), session leak (#1341) * fix(gastown): preserve org billing context across model changes (#1756) Store organizationId as a standalone env var (GASTOWN_ORGANIZATION_ID) that survives KILO_CONFIG_CONTENT rebuilds. Three defense layers: 1. Set GASTOWN_ORGANIZATION_ID on /agents/start and read it first in extractOrganizationId() before falling back to config extraction. 2. Pass organizationId in the PATCH /model request body so the container receives it even if X-Town-Config parsing fails. 3. Include organization_id in buildContainerConfig and sync it to process.env in the PATCH handler's X-Town-Config processing. * fix(gastown): stop false-positive invariant violations spamming logs every 5s Invariant 7: Exclude mayors from the 'working agent must have a hook' check — mayors are always working and intentionally hookless. Invariant 5: Widen valid convoy bead states to include in_progress and in_review, which are legitimate transient states while child beads are being worked on. Only flag truly unexpected states. Fixes #1364 * fix(gastown): prevent MR beads stuck in_progress when github_token missing (#1632) - Fall back to github_cli_pat then platform integration token in checkPRStatus - Track consecutive null poll results; fail MR bead after 10 nulls - Rate-limit PR polling to once per minute per MR bead via last_poll_at metadata - Store failureReason and failureMessage in bead metadata on permanent failure * fix: format and lint errors in reconciler.ts - Add explicit 'unknown' type annotation to lastPollAt to fix no-unsafe-assignment lint error - Auto-format ternary expression per oxfmt rules * fix(gastown): prevent duplicate session leak when restarting a starting agent Thread an AbortController through the startAgent() startup sequence so that when a restart is requested for an agent still in 'starting' status, the in-flight startup is cancelled before session.create() can produce an orphaned session. - Add startupAbortController field to ManagedAgent type - In startAgent: abort existing startup controller when agent is 'starting' - Check signal.aborted after each async step (ensureSDKServer, session.create, before session.prompt) - On abort: decrement sessionCount, remove agent entry, clean up SDK instance if no sessions remain - In stopAgent: also abort startupAbortController for 'starting' agents - Introduce StartupAbortedError to distinguish abort from real failures Fixes #1341 * fix(gastown): address review comments — reset poll_null_count on all non-null polls, abort orphaned session on startup abort, guard agents.delete with identity check * fix(gastown): store session ID before abort check to prevent orphaned session leak After session.create() resolves, parse and store the session ID on the agent object BEFORE checking signal.aborted. This ensures the catch block has agent.sessionId available to call session.abort(), closing the race window where an abort during session.create() could leak an orphaned session. Also fixes format-check CI by running oxfmt. * feat(gastown): add PR-fixup workflow prompts, accept metadata as object in gt_sling/gt_escalate (#1996) * feat(gastown): add PR-fixup workflow to polecat and mayor system prompts (#1989) * feat(gastown): add PR-fixup workflow to polecat and mayor system prompts Add instructions for the gt:pr-fixup workflow: - Polecat prompt: how to handle PR fixup beads (check out branch, address review comments, resolve threads, push to existing branch) - Mayor prompt: how to dispatch PR fixup beads using gt_sling with the gt:pr-fixup label and appropriate metadata * WIP: container eviction save * fix: address PR #1989 review comments — metadata type and branch conflict - Remove JSON.stringify wrapper from metadata example in mayor prompt so it passes gt_sling's z.record(z.string(), z.unknown()) validation as a plain object. - Clarify polecat PR fixup workflow as the explicit exception to the 'do not switch branches' rule, and add exception note in Commit & Push Hygiene section. --------- Co-authored-by: John Fawcett <john@kilcoode.ai> * fix(gastown): gt_sling and gt_escalate accept metadata as object instead of JSON string The metadata parameter on gt_sling (mayor) and gt_escalate (polecat) now uses z.record(z.string(), z.unknown()) so the LLM can pass a plain object directly. Removes the parseJsonObject indirection and dead code. Updates tests to match. --------- Co-authored-by: John Fawcett <john@kilcoode.ai> * feat(gastown): add debug/beads/:beadId endpoint for bead inspection New debug endpoint returns the full bead record, review_metadata, and bead_dependencies for a given bead ID. Useful for verifying parent_bead_id linkage, pr_url storage, and auto_merge_ready_since timer state during local testing. * docs(gastown): add debug endpoint docs, code_review toggle test, container networking tips - Document the new debug/beads/:beadId endpoint for bead inspection - Add Test C: code_review toggle verification procedure - Add bead chain verification examples - Document container networking workaround for local dev * fix(gastown): clear stale org ID, improve poll failure reason, unify token fallback, guard debug endpoint - Clear GASTOWN_ORGANIZATION_ID when org_id is absent (prevents stale billing context when a town moves from org to personal). - Change poll failure reason from 'no_github_token' to 'pr_poll_failed' with a message that covers all null-poll causes (token, rate limits, 5xx). - checkPRFeedback and mergePR now use the same token fallback chain as checkPRStatus (github_token → github_cli_pat). - Guard debug/beads/:beadId endpoint with dev-only check. * style(settings): align toggle switches to the right side of cards Swap Switch and label/description div positions in all 5 toggle cards on the town settings page so toggles are right-aligned. * fix(gastown): extract resolveGitHubToken helper, unify token fallback across all PR operations All three GitHub PR operations (checkPRStatus, checkPRFeedback, mergePR) now use the same resolveGitHubToken helper with the full fallback chain: github_token → github_cli_pat → platform integration (GitHub App). Towns relying only on a GitHub App installation token will now have auto-resolve and auto-merge working correctly. --------- Co-authored-by: John Fawcett <john@kilcoode.ai>
Compute the cutoff timestamp in JS and pass it as a bound SQL parameter instead of interpolating CIRCUIT_BREAKER_WINDOW_MINUTES into the query string via a template literal. Same 30-minute window, no behavioral change. Co-authored-by: John Fawcett <john@kilcoode.ai>
…Breaker calls B4: Hoist checkPRFeedback above both auto-resolve and auto-merge blocks in poll_pr so it is called at most once per tick instead of twice when both features are configured. B5: Move checkDispatchCircuitBreaker to the top-level reconcile() and pass the boolean result down to reconcileBeads and reconcileReviewQueue as an optional parameter, eliminating the redundant COUNT query. Both functions retain backward compatibility by falling back to an inline evaluation when the parameter is not provided.
Contributor
Code Review SummaryStatus: No Issues Found | Recommendation: Merge Files Reviewed (2 files)
Reviewed by gpt-5.4-20260305 · 266,375 tokens |
dbf82c3 to
4bd4fea
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Eliminates redundant API and SQL calls in two hot paths:
B4 (actions.ts): Hoists
checkPRFeedbackabove both the auto-resolve and auto-merge blocks inpoll_pr. AneedsFeedbackCheckflag determines whether the call is needed, and the single result is reused for both code paths. This removes the duplicate GitHub API call that occurred when bothauto_resolve_pr_feedbackandauto_merge_delay_minuteswere configured.B5 (reconciler.ts): Moves
checkDispatchCircuitBreaker(sql)to the top-levelreconcile()function. The boolean result is computed once and passed down toreconcileBeadsandreconcileReviewQueuevia optionalcircuitBreakerOpenparameters. Both functions retain backward compatibility by falling back to inline evaluation when called standalone. Thenotify_mayoraction emission is also moved toreconcile()to ensure it fires exactly once per reconcile pass.Verification
Visual Changes
N/A
Reviewer Notes
reconcileBeadsandreconcileReviewQueueaccept an optionalcircuitBreakerOpenparameter for backward compatibility when called outside ofreconcile().needsFeedbackCheckcondition in actions.ts is the logical OR of the two original guard conditions, ensuringcheckPRFeedbackis called exactly when either block needs it.