
[FEATURE] Make agent only yield final response #2055

@zhifanl

Problem Statement

When using stream_async with an agent that has tools, the agent goes through multiple model turns (think → tool_use → think → tool_use → final answer).

Currently, stream_async yields {"data": "..."} text chunks from every model turn, including intermediate "thinking" text before tool calls.
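The following minimal offline sketch shows the problem without calling a model. The event list is hypothetical. It is modeled on the dict keys that the buffering snippet in this issue inspects (start_event_loop, event with raw contentBlockStart/messageStop payloads, data, result) and was not captured from a real stream_async call. fake_stream_async is a stand-in for agent.stream_async. A consumer that forwards every data chunk passes the pre-tool text through along with the final answer.

```python
import asyncio
from typing import Any, AsyncIterator

# Hypothetical two-turn sequence (tool-calling turn, then final answer),
# modeled on the keys used in this issue; not real SDK output.
FAKE_EVENTS: list[dict[str, Any]] = [
    {"start_event_loop": True},
    {"data": "Let me look that up..."},
    {"event": {"contentBlockStart": {"start": {"toolUse": {"name": "search"}}}}},
    {"event": {"messageStop": {"stopReason": "tool_use"}}},
    {"start_event_loop": True},
    {"data": "The answer "},
    {"data": "is 42."},
    {"event": {"messageStop": {"stopReason": "end_turn"}}},
    {"result": "AgentResult placeholder"},
]


async def fake_stream_async(prompt: str) -> AsyncIterator[dict[str, Any]]:
    """Stand-in for agent.stream_async(prompt) that replays FAKE_EVENTS."""
    for event in FAKE_EVENTS:
        yield event


async def naive_consumer(prompt: str) -> list[str]:
    """Collect every text chunk, as a plain stream_async loop does today."""
    chunks = []
    async for event in fake_stream_async(prompt):
        if "data" in event:
            chunks.append(event["data"])
    return chunks


print(asyncio.run(naive_consumer("What is the answer?")))
# ['Let me look that up...', 'The answer ', 'is 42.']
```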

For many production use cases (chat UIs, SSE endpoints, API responses), consumers only want the final answer streamed to the end user — not the intermediate reasoning that precedes tool invocations. Today, the only way to achieve this is to implement custom buffering logic in the consumer:

# Hypothetical wrapper name: the yields below must live inside an async generator.
async def stream_final_answer(agent, prompt):
    turn_buffer = []
    turn_has_tool_use = False

    async for event in agent.stream_async(prompt):
        if isinstance(event, dict):
            # A new model turn starts: drop any text buffered from the previous turn.
            if event.get("start_event_loop"):
                turn_buffer = []
                turn_has_tool_use = False
                continue

            raw_event = event.get("event")
            if isinstance(raw_event, dict):
                # Remember whether this turn opened a toolUse content block.
                cb_start = raw_event.get("contentBlockStart", {}).get("start", {})
                if "toolUse" in cb_start:
                    turn_has_tool_use = True

                if "messageStop" in raw_event:
                    stop_reason = raw_event["messageStop"].get("stopReason", "")
                    # Mirror SDK's handle_message_stop override
                    if stop_reason == "end_turn" and turn_has_tool_use:
                        stop_reason = "tool_use"
                    if stop_reason == "end_turn":
                        # True final turn: release its buffered text.
                        for chunk in turn_buffer:
                            yield chunk
                        turn_buffer = []
                    continue

            # Flush anything still buffered when the final result arrives.
            if "result" in event and turn_buffer:
                for chunk in turn_buffer:
                    yield chunk
                turn_buffer = []

        chunk_text = event.get("data", "") if isinstance(event, dict) else ""
        if chunk_text:
            turn_buffer.append(chunk_text)

This requires deep knowledge of the internal event lifecycle (start_event_loop, raw Bedrock messageStop events, the SDK's end_turn → tool_use override when tool blocks are present, etc.) and is fragile to SDK changes.

Proposed Solution

Add a stream_final_turn_only parameter (or similar) to stream_async that buffers intermediate turn text internally and only yields {"data": "..."} events from the final model turn (where stopReason == "end_turn" and no tool use blocks are present).

# Only stream the final answer to the user
async for event in agent.stream_async(
    "Analyze this data",
    stream_final_turn_only=True,
):
    if "data" in event:
        yield event["data"]  # Only receives final turn text

The SDK already has all the signals needed to implement this:

  • StartEventLoopEvent marks the beginning of each model turn (event_loop.py).
  • messageStop with stopReason distinguishes end_turn from tool_use (streaming.py).
  • handle_message_stop already overrides end_turn → tool_use when tool blocks are present (streaming.py L310-L325).

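For consumers, the end_turn → tool_use override is the subtle part. A turn can report end_turn yet still contain a toolUse block, so it must not be treated as final. Below is a minimal consumer-side sketch of that rule. effective_stop_reason and is_final_turn are hypothetical helper names, and the code mirrors the behavior described for handle_message_stop rather than reproducing SDK code.

```python
def effective_stop_reason(reported: str, turn_has_tool_use: bool) -> str:
    """Hypothetical mirror of the end_turn -> tool_use override.

    A turn that reports end_turn but opened a toolUse content block is
    treated as tool_use, so it is not mistaken for the final turn.
    """
    if reported == "end_turn" and turn_has_tool_use:
        return "tool_use"
    return reported


def is_final_turn(reported: str, turn_has_tool_use: bool) -> bool:
    """A turn is final only when its effective stop reason is end_turn."""
    return effective_stop_reason(reported, turn_has_tool_use) == "end_turn"


print(is_final_turn("end_turn", False))  # True
print(is_final_turn("end_turn", True))   # False
print(is_final_turn("tool_use", True))   # False
```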
Implementation could live in stream_async in agent.py or as a wrapper around the event loop, buffering TextStreamEvents per cycle and flushing only when the cycle ends with a true end_turn.

Non-text events (lifecycle, tool use, result) could still be yielded as-is, or filtered based on a separate flag.
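If non-text events should keep flowing, only the text needs to be held back per turn. The hypothetical wrapper below is named buffer_intermediate_text, and its drop_non_text argument stands in for the "separate flag". It passes lifecycle, tool-use, and result events through immediately. It releases buffered text events just before the messageStop that closes a true end_turn, or just before the result event. Event shapes are the same assumptions as in the consumer-side snippet in this issue.

```python
import asyncio
from typing import Any, AsyncIterator


async def buffer_intermediate_text(
    events: AsyncIterator[dict[str, Any]],
    drop_non_text: bool = False,
) -> AsyncIterator[dict[str, Any]]:
    """Hold back text events per turn; pass other events through immediately.

    Hypothetical wrapper: drop_non_text stands in for the separate flag
    mentioned in this issue. Event keys follow the snippet in this issue.
    """
    pending: list[dict[str, Any]] = []
    saw_tool_use = False
    async for event in events:
        if "data" in event:
            # Text is released only once its turn is known to be final.
            pending.append(event)
            continue
        if event.get("start_event_loop"):
            pending, saw_tool_use = [], False
        raw = event.get("event")
        if isinstance(raw, dict):
            if "toolUse" in raw.get("contentBlockStart", {}).get("start", {}):
                saw_tool_use = True
            stop = raw.get("messageStop", {}).get("stopReason")
            if stop == "end_turn" and not saw_tool_use:
                for text_event in pending:
                    yield text_event
                pending = []
        if "result" in event:
            # Release leftovers from a last turn that did not stop on end_turn.
            for text_event in pending:
                yield text_event
            pending = []
        if not drop_non_text:
            yield event


# Hypothetical event sequence used only for the demo below.
EVENTS = [
    {"start_event_loop": True},
    {"data": "Checking..."},
    {"event": {"contentBlockStart": {"start": {"toolUse": {"name": "lookup"}}}}},
    {"event": {"messageStop": {"stopReason": "tool_use"}}},
    {"start_event_loop": True},
    {"data": "Done."},
    {"event": {"messageStop": {"stopReason": "end_turn"}}},
    {"result": "AgentResult placeholder"},
]


async def collect(drop_non_text: bool) -> list[dict[str, Any]]:
    async def fake_stream():
        for event in EVENTS:
            yield event

    return [e async for e in buffer_intermediate_text(fake_stream(), drop_non_text)]


print(asyncio.run(collect(drop_non_text=True)))  # [{'data': 'Done.'}]
```

Releasing buffered text right before the closing messageStop keeps text and lifecycle events in their original relative order. This matters for consumers that render both.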

Use Case

  • Chat applications that stream responses via SSE: users should see only the final answer, not intermediate "Let me look that up..." text before tool calls.

  • API endpoints wrapping agents: downstream consumers expect a single coherent streamed response.

  • Any production deployment where intermediate model reasoning is noise for the end user.

Alternative Solutions

  • A callback handler that filters events: callback handlers are synchronous and don't work with stream_async's async iterator pattern.

  • Consumer-side buffering (what we do today): works, but requires understanding SDK internals and breaks if the event format changes.

  • A hook-based approach: AfterModelCallEvent could signal turn boundaries, but hooks can't suppress events that have already been yielded.

Additional Context

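The added delay at the end of each turn is minimal. messageStop arrives immediately after the last contentBlockDelta in the same Bedrock HTTP stream, so the buffer is flushed nearly instantaneously after the last text token of each turn. The trade-off is that the final answer reaches the user in one burst when its turn ends, not token by token, because a turn cannot be recognized as final until its messageStop arrives.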

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
