
Document SSE/chunked streaming roadmap scope and status#4196

Open
Copilot wants to merge 4 commits into develop from copilot/streaming-token-output-sse-chunked-response

Conversation

Contributor

Copilot AI commented Mar 13, 2026

Description

  • Clarifies the “Streaming Token Output (SSE / Chunked Response)” roadmap detail in src/llm/FUTURE_ENHANCEMENTS.md with scoped goals, constraints, interfaces, phase tasks, performance targets, and security considerations.
  • Specifies thread-safety/ownership for stream_callback and StreamingHandler, adds dedup-cache bypass rationale, and captures cancellation/deadline behavior and terminal markers.
  • Marks streaming tests as blocked due to existing llm_deployment_plugin.cpp incomplete-type build failure, keeping status transparent.

Type of Change

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Other:

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed

📚 Research & Knowledge (if applicable)

  • Is this PR based on scientific paper(s) or best practices?
    • If YES: research files created in /docs/research/?
    • If YES: linked in the module README under "Scientific Foundations"?
    • If YES: recorded in /docs/research/implementation_influence/?

Relevante Quellen:

  • Paper:
  • Best Practice:
  • Architecture Decision:

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed)
  • No new warnings introduced
Original prompt

This section details the original issue you should resolve

<issue_title>Streaming Token Output (SSE / Chunked Response)</issue_title>
<issue_description>### Context

This issue implements the roadmap item 'Streaming Token Output (SSE / Chunked Response)' for the llm domain. It is sourced from the consolidated roadmap under 🟠 High Priority — Near-term (v1.5.0 – v1.8.0) and targets milestone v1.7.0.

Primary detail section: Detail section not found

Goal

Deliver the scoped changes for Streaming Token Output (SSE / Chunked Response) in src/llm/ and complete the linked detail section in a release-ready state for v1.7.0.

Detailed Scope

No matching section was found for anchor 'streaming-token-output-sse--chunked-response' in src/llm/FUTURE_ENHANCEMENTS.md.

Use this issue to restore roadmap/detail consistency and implement the scope described in the roadmap title.

Acceptance Criteria

  • Implement the scoped changes described in the linked detail section.
  • Add or update tests that verify the intended behaviour.

Relationships

References

  • src/ROADMAP.md
  • src/llm/FUTURE_ENHANCEMENTS.md#streaming-token-output-sse--chunked-response
  • Source key: roadmap:76:llm:v1.7.0:streaming-token-output-sse-chunked-response

Generated from the consolidated source roadmap. Keep the roadmap and issue in sync when scope changes.

</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits March 13, 2026 18:27
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Updates the LLM module’s future-enhancement documentation to clearly scope and describe the “Streaming Token Output (SSE / Chunked Response)” roadmap item, including constraints, interfaces, phases, and test status.

Changes:

  • Replaces the prior streaming roadmap blurb with a structured section (scope/constraints/interfaces/phases/test strategy/perf/security).
  • Documents thread-safety/ownership expectations for InferenceRequest::stream_callback and llm::StreamingHandler.
  • Records test execution as blocked due to an unrelated build failure, to keep status transparent.


Comment on lines +77 to +94
### Scope
- Deliver OpenAI-style streaming for LLM responses via SSE framing and HTTP chunked responses.
- Expose token-level callbacks through `InferenceRequest::stream_callback` for both engines while keeping engines output-format agnostic.
- Provide reusable formatting helpers in `llm::StreamingHandler` for SSE events, `[DONE]` sentinel, and chunked-transfer frames.

### Design Constraints
- SSE payloads must be valid JSON per RFC 8259 with control-character escaping; framing must end with `\n\n`.
- Terminal events must emit the canonical `data: [DONE]\n\n` sentinel; chunked responses must end with the zero-length chunk `0\r\n\r\n`.
- Streaming callbacks run on worker threads and must respect cancellation/deadlines before emitting tokens.
- Deduplication caching must be skipped for streaming requests to avoid serving partial cached content.
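The framing rules above can be sketched as small formatting helpers. This is an illustrative sketch only, not the actual `llm::StreamingHandler` implementation; the helper names here mirror the constraints but are assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escape JSON control characters per RFC 8259 (minimal subset shown).
std::string escapeJson(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (static_cast<unsigned char>(c) < 0x20) {
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", c);
                    out += buf;
                } else {
                    out += c;
                }
        }
    }
    return out;
}

// SSE event: "data: " prefix, JSON payload, blank-line terminator (\n\n).
std::string formatSseEvent(const std::string& token) {
    return "data: {\"token\":\"" + escapeJson(token) + "\"}\n\n";
}

// Canonical terminal sentinel for OpenAI-style streams.
std::string formatDoneEvent() { return "data: [DONE]\n\n"; }

// HTTP/1.1 chunked-transfer frame: hex length, CRLF, payload, CRLF.
std::string formatChunkedData(const std::string& body) {
    char len[16];
    std::snprintf(len, sizeof(len), "%zx", body.size());
    return std::string(len) + "\r\n" + body + "\r\n";
}

// Zero-length chunk that terminates a chunked response.
std::string chunkedTerminator() { return "0\r\n\r\n"; }
```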

### Required Interfaces
| Interface | Consumer | Notes |
|-----------|----------|-------|
| `InferenceRequest::stream_callback` | `AsyncInferenceEngine`, `InferenceEngineEnhanced`, HTTP SSE writers | Serial invocation on the producing worker thread; sink must be thread-safe when sharing state. |
| `llm::StreamingHandler::{formatSseEvent, formatDoneEvent, formatChunkedData, makeStreamCallback}` | HTTP layer (SSE endpoints, OpenAI compat adapter) | Static, reentrant helpers; atomic index for single-producer streams. |
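The table's `makeStreamCallback` row can be illustrated with a minimal sketch: build a callback that frames each token as an SSE event and forwards it to a writer, keeping a shared atomic index as the single-producer counter noted above. The payload shape and writer signature are assumptions, not the real API.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <string>

// Callback type matching InferenceRequest::stream_callback.
using StreamCallback = std::function<void(const std::string&)>;

// Wraps a writer (e.g. an HTTP SSE sink) in a token callback. The atomic
// index mirrors the "atomic index for single-producer streams" note and
// stays correct even if the producing worker thread changes between tokens.
StreamCallback makeStreamCallback(std::function<void(const std::string&)> write) {
    auto index = std::make_shared<std::atomic<int>>(0);
    return [write, index](const std::string& token) {
        int i = index->fetch_add(1);
        // Token is assumed to be pre-escaped JSON text in this sketch.
        write("data: {\"index\":" + std::to_string(i) +
              ",\"token\":\"" + token + "\"}\n\n");
    };
}
```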

### Implementation Phases
- **Phase 1 — Design / API Contract**
- [x] Expose `InferenceRequest::stream_callback` (`include/llm/llm_plugin_interface.h`) as `std::function<void(const std::string&)>`, invoked serially on the worker thread; sinks must be thread-safe when sharing state and must handle abrupt stop (no further callbacks, possibly without a terminal marker) without throwing.
- [x] Define SSE/chunked framing surface via `StreamingHandler` (JSON escaping, `[DONE]` sentinel, zero-length terminal chunk) to keep engines output-format agnostic.
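The producer-side contract from Phase 1 — check cancellation and the deadline before each token, and on abrupt stop simply cease invoking the callback (no terminal marker is guaranteed) — can be sketched as follows. Everything except `stream_callback` is an illustrative name.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-request state; the real engine's types may differ.
struct StreamContext {
    std::atomic<bool> cancelled{false};
    std::chrono::steady_clock::time_point deadline;
};

// Emits tokens serially on the calling worker thread. Returns true if the
// full sequence was streamed; false on cancellation or deadline expiry, in
// which case no further callbacks occur and no [DONE] sentinel is sent.
bool streamTokens(const std::vector<std::string>& tokens,
                  const std::function<void(const std::string&)>& stream_callback,
                  StreamContext& ctx) {
    for (const auto& tok : tokens) {
        if (ctx.cancelled.load() ||
            std::chrono::steady_clock::now() >= ctx.deadline) {
            return false;  // abrupt stop: sink must tolerate this silently
        }
        stream_callback(tok);  // serial invocation, must not throw through here
    }
    return true;
}
```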

Development

Successfully merging this pull request may close these issues.

Streaming Token Output (SSE / Chunked Response)