fix(slack): add streaming keepalive to prevent session timeout #240

Open
gakonst wants to merge 1 commit into vercel:main from gakonst:fix/slack-streaming-keepalive

@gakonst (Contributor) commented Mar 14, 2026

Problem

Slack's streaming API expires the session after ~5 minutes of inactivity. When the textStream iterable pauses for extended periods — which is common during long-running agent tool calls, multi-step reasoning, or external API waits — the session expires silently. All subsequent streamer.append() or streamer.stop() calls then fail with:

Error: An API error occurred: message_not_in_streaming_state

This is fatal: the SDK's sendStructuredChunk catch handler disables structured chunks for the rest of the stream, and if text streaming also fails, the entire response is lost. The user sees only an error message.

Currently the SDK has no keepalive or heartbeat mechanism — the for await loop in stream() simply blocks waiting for the next chunk with no timeout awareness.

Fix

Replace the for await loop with a Promise.race pattern that races each iter.next() against a 2-minute keepalive timer (well under Slack's ~5-minute TTL). If no chunk arrives within 2 minutes, a zero-width space (\u200B) is appended via the existing flushMarkdownDelta helper to keep the session alive.

The same pending iterator promise is re-raced after each keepalive, so no chunks are ever dropped or duplicated.

Before

for await (const chunk of textStream) { ... }

After

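const iter = textStream[Symbol.asyncIterator]();
let pending: ReturnType<typeof iter.next> | null = null;
while (true) {
  // Reuse the in-flight next() so a keepalive never drops or duplicates a chunk
  if (!pending) pending = iter.next();
  const raced = await Promise.race([
    pending.then((r) => ({ kind: 'value' as const, result: r })),
    new Promise<{ kind: 'keepalive' }>((r) =>
      setTimeout(() => r({ kind: 'keepalive' }), 120_000),
    ),
  ]);
  if (raced.kind === 'keepalive') {
    await flushMarkdownDelta('\u200B');  // invisible keepalive
    continue;
  }
  pending = null;
  if (raced.result.done) break;
  const chunk = raced.result.value;
  // ... process chunk as before
}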
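To check the "no chunks dropped or duplicated" claim in isolation, here is a minimal, self-contained sketch of the same loop. Assumptions: `flush` is a hypothetical stand-in for the adapter's flushMarkdownDelta, the stream yields strings, and `raceKeepalive` / `streamWithKeepalive` are illustrative names, not SDK APIs. Unlike the snippet above, it also cancels each round's timer in a `finally` block, so a rejected `next()` does not leave a timer behind.

```typescript
// Sketch only: `flush` stands in for the adapter's flushMarkdownDelta, and
// `raceKeepalive` / `streamWithKeepalive` are hypothetical names.
type Raced =
  | { kind: "value"; result: IteratorResult<string> }
  | { kind: "keepalive" };

// Race one pending next() against a keepalive timer; always cancel the timer.
async function raceKeepalive(
  pending: Promise<IteratorResult<string>>,
  ms: number,
): Promise<Raced> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    return await Promise.race([
      pending.then((result): Raced => ({ kind: "value", result })),
      new Promise<Raced>((resolve) => {
        timer = setTimeout(() => resolve({ kind: "keepalive" }), ms);
      }),
    ]);
  } finally {
    clearTimeout(timer); // runs however the race settled, including a rejection
  }
}

async function streamWithKeepalive(
  textStream: AsyncIterable<string>,
  flush: (delta: string) => Promise<void>,
  keepaliveMs = 120_000, // 2 minutes, well under Slack's ~5-minute TTL
): Promise<void> {
  const iter = textStream[Symbol.asyncIterator]();
  // The in-flight next() call survives keepalive rounds, so no chunk is lost.
  let pending: Promise<IteratorResult<string>> | null = null;
  while (true) {
    if (!pending) pending = iter.next();
    const raced = await raceKeepalive(pending, keepaliveMs);
    if (raced.kind === "keepalive") {
      await flush("\u200B"); // invisible keepalive; re-race the same `pending`
      continue;
    }
    pending = null; // chunk consumed; the next round calls next() again
    if (raced.result.done) break;
    await flush(raced.result.value);
  }
}
```

Because `next()` is only called again after its previous result has been consumed, every chunk is delivered exactly once, however many keepalive rounds happen while it is in flight.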

Testing

  • pnpm --filter @chat-adapter/slack build
  • pnpm --filter @chat-adapter/slack typecheck
  • pnpm --filter @chat-adapter/slack test — 296/297 pass (1 pre-existing network-dependent failure unrelated to this change)

Slack's streaming API expires after ~5 min of inactivity. When the
textStream iterable pauses during long-running agent work (tool calls,
reasoning, etc.), the session expires and subsequent append/stop calls
fail with message_not_in_streaming_state.

Race each chunk against a 2-minute keepalive timer. If no chunk arrives
in time, append a zero-width space to keep the session alive. The same
pending iterator promise is re-raced after each keepalive, so no chunks
are ever dropped.

vercel bot commented Mar 14, 2026

Someone is attempting to deploy a commit to the Vercel Team on Vercel.

A member of the Team first needs to authorize it.

@haydenbleasel (Contributor) left a comment

Nice fix @gakonst — this is a real problem and the Promise.race keepalive approach is solid. The PR description is super clear too, appreciate the before/after. A few things I'd want addressed before merging:

Timer leak on every keepalive cycle

Each loop iteration passes a fresh setTimeout into Promise.race, but the timers are never cleared. When chunks arrive quickly, every losing timer stays scheduled for its full 2 minutes, so orphaned timers accumulate. More importantly, when the stream ends, the last keepalive timer keeps the event loop alive for up to 2 minutes unnecessarily.

Suggestion: use a clearable timer, e.g.:

const KEEPALIVE_MS = 120_000; // same 2-minute interval as the PR
let keepaliveTimer: ReturnType<typeof setTimeout> | null = null;
const startKeepalive = () =>
  new Promise<{ kind: "keepalive" }>((resolve) => {
    keepaliveTimer = setTimeout(() => resolve({ kind: "keepalive" }), KEEPALIVE_MS);
  });

// In the loop:
const raced = await Promise.race([
  pending.then((r) => ({ kind: "value" as const, result: r })),
  startKeepalive(),
]);
// Cancel the timer as soon as the race settles, whichever side won
if (keepaliveTimer) clearTimeout(keepaliveTimer);

Zero-width spaces accumulate in the final message

Each keepalive appends \u200B via flushMarkdownDelta. Over a long agent turn with 20+ minutes of idle time, that's 10+ invisible characters baked into the message. Although each one is invisible, they could affect text selection, copy-paste, search, and screen readers. Worth considering whether these should be stripped in the final streamer.stop() call, or whether the keepalive should use a different mechanism.
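If stripping is the route taken, a blanket replace of U+200B would also delete zero-width spaces the model legitimately produced. A minimal sketch, assuming the streamed text is mirrored locally as it is sent (`KeepaliveTracker` is a hypothetical helper, not part of the SDK), records the offset of each keepalive and removes only those. Whether the cleaned text can then be handed to streamer.stop() or a follow-up message update depends on the Slack API and is not assumed here.

```typescript
// Hypothetical helper: remembers where keepalives were inserted so only those
// characters are removed; U+200B emitted by the model itself is kept.
const ZWS = "\u200B";

class KeepaliveTracker {
  private text = "";
  private readonly keepaliveOffsets: number[] = [];

  /** Record a normal text delta. */
  append(delta: string): void {
    this.text += delta;
  }

  /** Record a keepalive at the current end of the message. */
  appendKeepalive(): string {
    this.keepaliveOffsets.push(this.text.length);
    this.text += ZWS;
    return ZWS; // the delta that would be sent to Slack
  }

  /** Everything that was streamed, keepalives included. */
  get raw(): string {
    return this.text;
  }

  /** The streamed text with only the keepalive characters removed. */
  cleaned(): string {
    let out = this.text;
    // Walk offsets from last to first so earlier offsets stay valid.
    for (const at of [...this.keepaliveOffsets].reverse()) {
      out = out.slice(0, at) + out.slice(at + 1);
    }
    return out;
  }
}
```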

No cleanup of the async iterator

If an error is thrown mid-stream, iter.return?.() is never called. The original for await loop calls return() on the iterator automatically when the loop exits early, including when the loop body throws. This version should wrap the loop in try/finally:

try {
  while (true) { /* ... */ }
} finally {
  await iter.return?.();
}
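The difference can be observed with a source that records whether return() ran. A sketch, where `trackedSource`, `forAwaitConsumer`, and `manualConsumer` are hypothetical names used only for illustration:

```typescript
// Hypothetical test source that records whether return() was called.
function trackedSource(chunks: string[]) {
  const state = { returned: false };
  let i = 0;
  const iterable: AsyncIterable<string> = {
    [Symbol.asyncIterator](): AsyncIterator<string> {
      return {
        async next(): Promise<IteratorResult<string>> {
          const value = chunks[i++];
          return value === undefined
            ? { done: true, value: undefined }
            : { done: false, value };
        },
        async return(): Promise<IteratorResult<string>> {
          state.returned = true;
          return { done: true, value: undefined };
        },
      };
    },
  };
  return { iterable, state };
}

// for await...of closes the iterator when its body throws.
async function forAwaitConsumer(src: AsyncIterable<string>): Promise<void> {
  for await (const chunk of src) {
    if (chunk === "boom") throw new Error("processing failed");
  }
}

// A manual next() loop only closes the iterator if we do it ourselves.
async function manualConsumer(src: AsyncIterable<string>, withFinally: boolean): Promise<void> {
  const iter = src[Symbol.asyncIterator]();
  const loop = async (): Promise<void> => {
    while (true) {
      const r = await iter.next();
      if (r.done) return;
      if (r.value === "boom") throw new Error("processing failed");
    }
  };
  if (!withFinally) return loop();
  try {
    await loop();
  } finally {
    await iter.return?.(); // mirrors the suggested try/finally
  }
}
```

Note that for await...of closes the iterator only when the loop body exits early (break, return, or a throw); if next() itself rejects, it does not call return().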

Minor

  • The // eslint-disable-next-line no-constant-condition comment on while (true) is a no-op — this project uses Biome, not ESLint. Either use // biome-ignore if Biome flags it, or just remove the comment.

Core approach is great, just want the timer leak and iterator cleanup addressed, and the ZWS accumulation at least discussed. Thanks for the contribution! 🙏
