fix(cloud-agent-sdk): recover stale WebSocket connections on tab resume#1919
Merged
Add application-level ping/pong staleness detection for WebSocket connections that silently die when tabs are backgrounded. On tab resume, sends a ping and reconnects if no response within 5s.

- base-connection: visibilitychange/pageshow/online handlers, ping timeout, proactive ticket refresh before reconnect
- cloud-agent-transport/cli-live-transport: snapshot refetch on reconnect via onReconnected callback
- session.ts: route completed sessions to historical transport
- CloudAgentProvider: determine isLive from DO execution status
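The "proactive ticket refresh before reconnect" step can be pictured roughly as follows. This is a minimal sketch, not the SDK's code: `Ticket`, `refreshThenConnect`, `fetchTicket`, `connect`, and the safety margin are all hypothetical names and values, and the PR does not state whether the real implementation checks expiry or always fetches a new ticket.

```typescript
// All names here are hypothetical; only "refresh the ticket, then reconnect" comes from the PR.
interface Ticket {
  token: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

async function refreshThenConnect(
  current: Ticket | null,
  fetchTicket: () => Promise<Ticket>,
  connect: (token: string) => Promise<void>,
  nowMs: number = Date.now(),
  safetyMarginMs: number = 10_000, // assumed margin, not a value from the PR
): Promise<Ticket> {
  let ticket = current;
  // Refresh when there is no ticket or it expires within the safety margin,
  // so the reconnect never presents a JWT that is about to be rejected.
  if (ticket === null || ticket.expiresAtMs - nowMs <= safetyMarginMs) {
    ticket = await fetchTicket();
  }
  await connect(ticket.token);
  return ticket;
}
```

Returning the ticket lets the caller keep it for the next reconnect attempt instead of fetching again.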
Code Review Summary
Status: 4 Issues Found | Recommendation: Address before merge
Files Reviewed (8 files)
Reviewed by gpt-5.4-20260305 · 595,751 tokens
…ction
Address review feedback on stale WebSocket recovery:
- Remove ws.send('ping') — server never responds; staleness detection
now relies on server heartbeats canceling the timeout
- Make staleness timeout configurable (stalenessTimeoutMs) so the
transport layer that knows the heartbeat interval controls the value
- Increase default from 5s to 30s to exceed server heartbeat intervals
- Track lastMessageTime to skip the check when a recent message proves
the connection is alive
- Wire heartbeatTimeoutMs through cloud-agent-connection
- disconnect() now removes visibility/pageshow/online listeners,
fixing a leak when transports disconnect without calling destroy()
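The revised staleness check described in this commit (no client ping, a timeout that any server message cancels, and a `lastMessageTime` shortcut) might look roughly like the sketch below. `StalenessDetector` and its method names are hypothetical; only `stalenessTimeoutMs` and `lastMessageTime` appear in the commit message, and the real `base-connection` code may be structured differently.

```typescript
// Hypothetical class; only stalenessTimeoutMs and lastMessageTime are names from the commit message.
interface StalenessOptions {
  stalenessTimeoutMs: number; // should exceed the server heartbeat interval
  onStale: () => void;        // e.g. close the socket and trigger a reconnect
  now?: () => number;         // injectable clock, handy for tests
}

class StalenessDetector {
  private readonly stalenessTimeoutMs: number;
  private readonly onStale: () => void;
  private readonly now: () => number;
  private lastMessageTime: number;
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(opts: StalenessOptions) {
    this.stalenessTimeoutMs = opts.stalenessTimeoutMs;
    this.onStale = opts.onStale;
    this.now = opts.now ?? (() => Date.now());
    this.lastMessageTime = this.now();
  }

  /** Call for every inbound frame, heartbeats included: it proves the socket is alive. */
  noteMessage(): void {
    this.lastMessageTime = this.now();
    this.cancel();
  }

  /** Call on tab resume. Returns true if a staleness timeout is now pending. */
  checkOnResume(): boolean {
    // A recent message already proves liveness, so skip the check entirely.
    if (this.now() - this.lastMessageTime < this.stalenessTimeoutMs) return false;
    if (this.timer === null) {
      this.timer = setTimeout(() => {
        this.timer = null;
        this.onStale();
      }, this.stalenessTimeoutMs);
    }
    return true;
  }

  /** Call from disconnect()/destroy() so no timer outlives the connection. */
  cancel(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

In a browser, `checkOnResume()` would be called from the `visibilitychange` (when the document becomes visible), `pageshow`, and `online` handlers, and those listeners would be removed with `removeEventListener` alongside `cancel()` on disconnect.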
jeanduplessis approved these changes on Apr 2, 2026
…minated union
Replace the flat { cloudAgentSessionId, isLive } shape with a
discriminated union ('remote' | 'cloud-agent' | 'read-only') so
transport routing is explicit and type-safe. Simplify session
resolution in CloudAgentProvider by removing the runtime-state
liveness check. Add runtime exhaustive check in pickTransportFactory.
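A discriminated union with a runtime exhaustive check, as this commit describes, could be sketched as below. The three tags come from the commit message; the discriminant name `kind`, the payload fields, the transport names, and the mapping from tag to transport are assumptions for illustration, and `pickTransportName` is a stand-in for the real `pickTransportFactory`.

```typescript
// Only the three tags are from the commit message; field names and the mapping are assumed.
type ResolvedSession =
  | { kind: 'remote'; sessionId: string }
  | { kind: 'cloud-agent'; cloudAgentSessionId: string }
  | { kind: 'read-only'; sessionId: string };

type TransportName = 'cli-live' | 'cloud-agent' | 'historical';

// Compile time: fails to type-check if a union member is unhandled.
// Run time: throws if an unexpected value slips through (e.g. from untyped JSON).
function assertNever(value: never): never {
  throw new Error(`Unhandled session kind: ${JSON.stringify(value)}`);
}

function pickTransportName(resolved: ResolvedSession): TransportName {
  switch (resolved.kind) {
    case 'remote':
      return 'cli-live';
    case 'cloud-agent':
      return 'cloud-agent';
    case 'read-only':
      return 'historical';
    default:
      return assertNever(resolved);
  }
}
```

Compared with a flat `{ cloudAgentSessionId, isLive }` shape, each variant carries only the fields that make sense for it, so adding a fourth kind produces a compile error at the `default` branch until it is handled.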
Summary
Fixes half-open WebSocket connections in cloud agent sessions. When a user backgrounds a tab and returns after the TCP socket has silently died, `onclose` never fires and the session appears frozen. This adds application-level staleness detection and recovery.

Connection layer (`base-connection.ts`):
- Handlers for `visibilitychange`, `pageshow` (BFCache), and `online` events
- Proactive ticket refresh (`refreshAndConnect`), fixing expired JWT failures on reconnect
- Listeners removed in `destroy()`

Transport layer (`cloud-agent-transport.ts`, `cli-live-transport.ts`):
- `onReconnected` callback refetches the session snapshot and replays it into the sink, avoiding blank screens after reconnection

Session routing (`session.ts`, `CloudAgentProvider.tsx`):
- `pickTransportFactory` now checks `resolved.isLive` for Cloud Agent sessions: completed sessions route to the read-only historical transport instead of opening a live WebSocket
- `resolveSession` queries `getWithRuntimeState` to determine actual session liveness from execution status, instead of hardcoding `isLive: true`

Verification
- `npx jest src/lib/cloud-agent-sdk/ --no-coverage`: 18 suites, 551 tests, all pass
- `pnpm typecheck`: passes, no errors
- … `[Connection]` messages

Visual Changes
N/A
Reviewer Notes
- `[Connection]` messages in dev: All connection log lines appear in pairs due to React StrictMode double-mounting effects. This is expected in development only and does not occur in production.
- … "ping") gracefully.
- `isLive` detection in `CloudAgentProvider.tsx` handles multiple edge cases (terminal execution, active execution, null execution with/without `initiatedAt`). The `initiatedAt` heuristic covers the case where the DO has cleaned up the execution record after completion.
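For the transport-layer behavior in the Summary (refetch the snapshot on reconnect and replay it into the sink), a rough sketch under stated assumptions is shown below. `SessionEvent`, `Sink`, `makeOnReconnected`, and `fetchSnapshot` are hypothetical names; the actual sink interface and snapshot shape in the SDK are not shown in this PR text. The generation counter is an added safeguard of this sketch, not something the PR describes.

```typescript
// Hypothetical types: the real snapshot and sink shapes are not shown in this PR.
interface SessionEvent {
  id: number;
  text: string;
}

interface Sink {
  reset(): void;
  push(event: SessionEvent): void;
}

function makeOnReconnected(
  fetchSnapshot: () => Promise<SessionEvent[]>,
  sink: Sink,
): () => Promise<void> {
  let generation = 0;
  return async () => {
    const mine = ++generation;
    const snapshot = await fetchSnapshot();
    // If another reconnect started while this fetch was in flight, drop the older result.
    if (mine !== generation) return;
    sink.reset();
    for (const event of snapshot) {
      sink.push(event);
    }
  };
}
```

Resetting the sink before replaying avoids duplicated events when messages received before the disconnect are also present in the refetched snapshot.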