fix(slack): add streaming keepalive to prevent session timeout #240
gakonst wants to merge 1 commit into vercel:main
Conversation
Slack's streaming API expires after ~5 min of inactivity. When the textStream iterable pauses during long-running agent work (tool calls, reasoning, etc.), the session expires and subsequent append/stop calls fail with message_not_in_streaming_state. Race each chunk against a 2-minute keepalive timer. If no chunk arrives in time, append a zero-width space to keep the session alive. The same pending iterator promise is re-raced after each keepalive, so no chunks are ever dropped.
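A minimal sketch of the race pattern described above, with timer cleanup and iterator close folded in. The names here (`pumpWithKeepalive`, `onChunk`, `onKeepalive`) are illustrative, not the SDK's actual API; in the PR the keepalive action is the existing `flushMarkdownDelta("\u200B")` call.

```typescript
const KEEPALIVE_MS = 2 * 60 * 1000; // well under Slack's ~5-minute session TTL

async function pumpWithKeepalive<T>(
  iterable: AsyncIterable<T>,
  onChunk: (chunk: T) => Promise<void>,
  onKeepalive: () => Promise<void>,
  keepaliveMs = KEEPALIVE_MS,
): Promise<void> {
  const iter = iterable[Symbol.asyncIterator]();
  // Hold the pending next() promise across keepalive ticks so a chunk that
  // arrives late is still consumed exactly once (never dropped or duplicated).
  let pending: Promise<IteratorResult<T>> | null = null;
  try {
    while (true) {
      pending ??= iter.next();
      let timer: ReturnType<typeof setTimeout> | undefined;
      const raced = await Promise.race([
        pending.then((result) => ({ kind: "value" as const, result })),
        new Promise<{ kind: "keepalive" }>((resolve) => {
          timer = setTimeout(() => resolve({ kind: "keepalive" }), keepaliveMs);
        }),
      ]);
      // Clear the timer either way, so it can't keep the event loop alive.
      if (timer) clearTimeout(timer);
      if (raced.kind === "keepalive") {
        await onKeepalive(); // e.g. append "\u200B" to refresh the session
        continue; // re-race the SAME pending promise
      }
      pending = null;
      if (raced.result.done) return;
      await onChunk(raced.result.value);
    }
  } finally {
    // Mirror for-await semantics: close the iterator even if an error is thrown.
    await iter.return?.();
  }
}
```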
Someone is attempting to deploy a commit to the Vercel Team on Vercel. A member of the Team first needs to authorize it.
haydenbleasel left a comment
Nice fix @gakonst — this is a real problem and the Promise.race keepalive approach is solid. The PR description is super clear too, appreciate the before/after. A few things I'd want addressed before merging:
Timer leak on every keepalive cycle
Each iteration creates a new setTimeout via Promise.race, but previous timers are never cleared. If chunks arrive quickly, you'll accumulate orphaned timers. More importantly, when the stream ends the last keepalive timer keeps the event loop alive for up to 2 minutes unnecessarily.
Suggestion — use a clearable timer pattern:
```ts
let keepaliveTimer: ReturnType<typeof setTimeout> | null = null;

const startKeepalive = () =>
  new Promise<{ kind: "keepalive" }>((resolve) => {
    keepaliveTimer = setTimeout(() => resolve({ kind: "keepalive" }), KEEPALIVE_MS);
  });

// In the loop:
const raced = await Promise.race([
  pending.then((r) => ({ kind: "value" as const, result: r })),
  startKeepalive(),
]);
if (keepaliveTimer) clearTimeout(keepaliveTimer);
```

Zero-width spaces accumulate in the final message
Each keepalive appends \u200B via flushMarkdownDelta. Over a long agent turn (say 20+ minutes), that's 10+ invisible characters baked into the message. While individually invisible, they could affect text selection, copy-paste, search, and screen readers. Worth considering whether these should be stripped in the final streamer.stop() call, or whether the keepalive should use a different mechanism.
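One option for the stripping approach, as a sketch: assuming the final message text is available as a string at `stop()` time (`stripKeepaliveChars` and its wiring are hypothetical, not existing SDK code):

```typescript
// Hypothetical post-processing before the final Slack update: strip the
// zero-width spaces that keepalive ticks appended during streaming.
const KEEPALIVE_CHAR = "\u200B";

function stripKeepaliveChars(text: string): string {
  // Remove every zero-width space; real message content should not contain them.
  return text.split(KEEPALIVE_CHAR).join("");
}

// Example: a message that accumulated three keepalive characters.
const streamed = `Working on it${KEEPALIVE_CHAR}${KEEPALIVE_CHAR}... done${KEEPALIVE_CHAR}`;
console.log(stripKeepaliveChars(streamed)); // "Working on it... done"
```

The trade-off is that stripping is lossy if legitimate content could ever contain `\u200B`; a distinct sentinel or a non-text keepalive mechanism would avoid that.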
No cleanup of the async iterator
If an error is thrown mid-stream, iter.return?.() is never called. The original for await loop handles iterator cleanup automatically via the protocol. This version should wrap in try/finally:
```ts
try {
  while (true) { /* ... */ }
} finally {
  await iter.return?.();
}
```

Minor
- The `// eslint-disable-next-line no-constant-condition` comment on `while (true)` is a no-op — this project uses Biome, not ESLint. Either use `// biome-ignore` if Biome flags it, or just remove the comment.
Core approach is great, just want the timer leak and iterator cleanup addressed, and the ZWS accumulation at least discussed. Thanks for the contribution! 🙏
Problem
Slack's streaming API expires the session after ~5 minutes of inactivity. When the `textStream` iterable pauses for extended periods — which is common during long-running agent tool calls, multi-step reasoning, or external API waits — the session expires silently. All subsequent `streamer.append()` or `streamer.stop()` calls then fail with `message_not_in_streaming_state`.

This is fatal: the SDK's `sendStructuredChunk` catch handler disables structured chunks for the rest of the stream, and if text streaming also fails, the entire response is lost. The user sees only an error message.

Currently the SDK has no keepalive or heartbeat mechanism — the `for await` loop in `stream()` simply blocks waiting for the next chunk with no timeout awareness.

Fix

Replace the `for await` loop with a `Promise.race` pattern that races each `iter.next()` against a 2-minute keepalive timer (well under Slack's ~5-minute TTL). If no chunk arrives within 2 minutes, a zero-width space (`\u200B`) is appended via the existing `flushMarkdownDelta` helper to keep the session alive. The same pending iterator promise is re-raced after each keepalive, so no chunks are ever dropped or duplicated.
Before
After
Testing
- `pnpm --filter @chat-adapter/slack build` ✅
- `pnpm --filter @chat-adapter/slack typecheck` ✅
- `pnpm --filter @chat-adapter/slack test` — 296/297 pass (1 pre-existing network-dependent failure unrelated to this change)