fix(slack): add streaming keepalive to prevent session timeout #240
gakonst wants to merge 1 commit into vercel:main
Conversation
Slack's streaming API expires after ~5 min of inactivity. When the textStream iterable pauses during long-running agent work (tool calls, reasoning, etc.), the session expires and subsequent append/stop calls fail with message_not_in_streaming_state. Race each chunk against a 2-minute keepalive timer. If no chunk arrives in time, append a zero-width space to keep the session alive. The same pending iterator promise is re-raced after each keepalive, so no chunks are ever dropped.
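A minimal sketch of the race pattern described above, with timer cleanup and iterator close folded in. The names here (`pumpWithKeepalive`, `onChunk`, `onKeepalive`) are illustrative, not the SDK's actual API; in the PR the keepalive action is the existing `flushMarkdownDelta("\u200B")` call.

```typescript
const KEEPALIVE_MS = 2 * 60 * 1000; // well under Slack's ~5-minute session TTL

async function pumpWithKeepalive<T>(
  iterable: AsyncIterable<T>,
  onChunk: (chunk: T) => Promise<void>,
  onKeepalive: () => Promise<void>,
  keepaliveMs = KEEPALIVE_MS,
): Promise<void> {
  const iter = iterable[Symbol.asyncIterator]();
  // Hold the pending next() promise across keepalive ticks so a chunk that
  // arrives late is still consumed exactly once (never dropped or duplicated).
  let pending: Promise<IteratorResult<T>> | null = null;
  try {
    while (true) {
      pending ??= iter.next();
      let timer: ReturnType<typeof setTimeout> | undefined;
      const raced = await Promise.race([
        pending.then((result) => ({ kind: "value" as const, result })),
        new Promise<{ kind: "keepalive" }>((resolve) => {
          timer = setTimeout(() => resolve({ kind: "keepalive" }), keepaliveMs);
        }),
      ]);
      // Clear the timer either way, so it can't keep the event loop alive.
      if (timer) clearTimeout(timer);
      if (raced.kind === "keepalive") {
        await onKeepalive(); // e.g. append "\u200B" to refresh the session
        continue; // re-race the SAME pending promise
      }
      pending = null;
      if (raced.result.done) return;
      await onChunk(raced.result.value);
    }
  } finally {
    // Mirror for-await semantics: close the iterator even if an error is thrown.
    await iter.return?.();
  }
}
```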
Someone is attempting to deploy a commit to the Vercel Team on Vercel. A member of the Team first needs to authorize it.
haydenbleasel left a comment
Nice fix @gakonst — this is a real problem and the Promise.race keepalive approach is solid. The PR description is super clear too, appreciate the before/after. A few things I'd want addressed before merging:
Timer leak on every keepalive cycle
Each iteration creates a new setTimeout via Promise.race, but previous timers are never cleared. If chunks arrive quickly, you'll accumulate orphaned timers. More importantly, when the stream ends the last keepalive timer keeps the event loop alive for up to 2 minutes unnecessarily.
Suggestion — use a clearable timer pattern:
```ts
let keepaliveTimer: ReturnType<typeof setTimeout> | null = null;

const startKeepalive = () =>
  new Promise<{ kind: "keepalive" }>((resolve) => {
    keepaliveTimer = setTimeout(() => resolve({ kind: "keepalive" }), KEEPALIVE_MS);
  });

// In the loop:
const raced = await Promise.race([
  pending.then((r) => ({ kind: "value" as const, result: r })),
  startKeepalive(),
]);
if (keepaliveTimer) clearTimeout(keepaliveTimer);
```

Zero-width spaces accumulate in the final message
Each keepalive appends \u200B via flushMarkdownDelta. Over a long agent turn (say 20+ minutes), that's 10+ invisible characters baked into the message. While individually invisible, they could affect text selection, copy-paste, search, and screen readers. Worth considering whether these should be stripped in the final streamer.stop() call, or whether the keepalive should use a different mechanism.
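One option for the stripping approach, as a sketch: assuming the final message text is available as a string at `stop()` time (`stripKeepaliveChars` and its wiring are hypothetical, not existing SDK code):

```typescript
// Hypothetical post-processing before the final Slack update: strip the
// zero-width spaces that keepalive ticks appended during streaming.
const KEEPALIVE_CHAR = "\u200B";

function stripKeepaliveChars(text: string): string {
  // Remove every zero-width space; real message content should not contain them.
  return text.split(KEEPALIVE_CHAR).join("");
}

// Example: a message that accumulated three keepalive characters.
const streamed = `Working on it${KEEPALIVE_CHAR}${KEEPALIVE_CHAR}... done${KEEPALIVE_CHAR}`;
console.log(stripKeepaliveChars(streamed)); // "Working on it... done"
```

The trade-off is that stripping is lossy if legitimate content could ever contain `\u200B`; a distinct sentinel or a non-text keepalive mechanism would avoid that.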
No cleanup of the async iterator
If an error is thrown mid-stream, iter.return?.() is never called. The original for await loop handles iterator cleanup automatically via the protocol. This version should wrap in try/finally:
```ts
try {
  while (true) { /* ... */ }
} finally {
  await iter.return?.();
}
```

Minor
- The `// eslint-disable-next-line no-constant-condition` comment on `while (true)` is a no-op — this project uses Biome, not ESLint. Either use `// biome-ignore` if Biome flags it, or just remove the comment.
Core approach is great, just want the timer leak and iterator cleanup addressed, and the ZWS accumulation at least discussed. Thanks for the contribution! 🙏
Problem
Slack's streaming API expires the session after ~5 minutes of inactivity. When the `textStream` iterable pauses for extended periods — which is common during long-running agent tool calls, multi-step reasoning, or external API waits — the session expires silently. All subsequent `streamer.append()` or `streamer.stop()` calls then fail with `message_not_in_streaming_state`.

This is fatal: the SDK's `sendStructuredChunk` catch handler disables structured chunks for the rest of the stream, and if text streaming also fails, the entire response is lost. The user sees only an error message.

Currently the SDK has no keepalive or heartbeat mechanism — the `for await` loop in `stream()` simply blocks waiting for the next chunk with no timeout awareness.

Fix

Replace the `for await` loop with a `Promise.race` pattern that races each `iter.next()` against a 2-minute keepalive timer (well under Slack's ~5-minute TTL). If no chunk arrives within 2 minutes, a zero-width space (`\u200B`) is appended via the existing `flushMarkdownDelta` helper to keep the session alive. The same pending iterator promise is re-raced after each keepalive, so no chunks are ever dropped or duplicated.
Before
After
Testing
- `pnpm --filter @chat-adapter/slack build` ✅
- `pnpm --filter @chat-adapter/slack typecheck` ✅
- `pnpm --filter @chat-adapter/slack test` — 296/297 pass (1 pre-existing network-dependent failure unrelated to this change)