[FEATURE] Make agent only yield final response #2055

### Problem Statement

When using stream_async with an agent that has tools, the agent goes through multiple model turns (think → tool_use → think → tool_use → final answer). 

Currently, stream_async yields {"data": "..."} text chunks from every model turn, including intermediate "thinking" text before tool calls.

For many production use cases (chat UIs, SSE endpoints, API responses), consumers only want the final answer streamed to the end user — not the intermediate reasoning that precedes tool invocations. Today, the only way to achieve this is to implement custom buffering logic in the consumer:

turn_buffer = []
turn_has_tool_use = False

async for event in agent.stream_async(prompt):
    if isinstance(event, dict):
        if event.get("start_event_loop"):
            turn_buffer = []
            turn_has_tool_use = False
            continue

        raw_event = event.get("event")
        if isinstance(raw_event, dict):
            cb_start = raw_event.get("contentBlockStart", {}).get("start", {})
            if "toolUse" in cb_start:
                turn_has_tool_use = True

            if "messageStop" in raw_event:
                stop_reason = raw_event["messageStop"].get("stopReason", "")
                # Mirror SDK's handle_message_stop override
                if stop_reason == "end_turn" and turn_has_tool_use:
                    stop_reason = "tool_use"
                if stop_reason == "end_turn":
                    for chunk in turn_buffer:
                        yield chunk
                    turn_buffer = []
                continue

        if "result" in event and turn_buffer:
            for chunk in turn_buffer:
                yield chunk
            turn_buffer = []

    chunk_text = event.get("data", "") if isinstance(event, dict) else ""
    if chunk_text:
        turn_buffer.append(chunk_text)

This requires deep knowledge of the internal event lifecycle (start_event_loop, raw Bedrock messageStop events, the SDK's end_turn → tool_use override when tool blocks are present, etc.) and is fragile to SDK changes.



### Proposed Solution

Add a stream_final_turn_only parameter (or similar) to stream_async that buffers intermediate turn text internally and only yields {"data": "..."} events from the final model turn (where stopReason == "end_turn" and no tool use blocks are present).

# Only stream the final answer to the user
async for event in agent.stream_async(
    "Analyze this data",
    stream_final_turn_only=True,
):
    if "data" in event:
        yield event["data"]  # Only receives final turn text

The SDK already has all the signals needed to implement this:

- StartEventLoopEvent marks the beginning of each model turn (event_loop.py)
- messageStop with stopReason distinguishes end_turn vs tool_use (streaming.py)
- handle_message_stop already overrides end_turn → tool_use when tool blocks are present (streaming.py L310-L325)

Implementation could live in stream_async in agent.py or as a wrapper around the event loop, buffering TextStreamEvents per cycle and flushing only when the cycle ends with a true end_turn.

Non-text events (lifecycle, tool use, result) could still be yielded as-is, or filtered based on a separate flag.
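To make the wrapper idea concrete, here is a minimal sketch of the buffering logic as a standalone async generator. It is not the SDK's implementation: the names final_turn_only and passthrough_non_text are hypothetical, and the event shapes (start_event_loop, event.contentBlockStart, event.messageStop, data, result) are assumed to match the consumer-side workaround shown in the Problem Statement.

```python
from typing import Any, AsyncIterator


async def final_turn_only(
    events: AsyncIterator[dict[str, Any]],
    passthrough_non_text: bool = True,  # hypothetical flag, not an SDK parameter
) -> AsyncIterator[dict[str, Any]]:
    """Hold back {"data": ...} events and release them only for the final turn.

    Assumes the dict event shapes used in the workaround above; this is an
    illustration of the proposal, not code taken from the SDK.
    """
    buffered: list[dict[str, Any]] = []
    saw_tool_use = False

    async for event in events:
        if "data" in event:
            # Text chunk: hold it until we know how this turn ends.
            buffered.append(event)
            continue

        flush = False
        if event.get("start_event_loop"):
            # New cycle: text held from the previous (tool-using) turn is dropped.
            buffered.clear()
            saw_tool_use = False

        raw = event.get("event")
        if isinstance(raw, dict):
            if "toolUse" in raw.get("contentBlockStart", {}).get("start", {}):
                saw_tool_use = True
            stop = raw.get("messageStop")
            if stop is not None:
                # A turn is final only if it ends with end_turn and no tool blocks.
                flush = stop.get("stopReason") == "end_turn" and not saw_tool_use

        if "result" in event:
            # Fallback mirroring the workaround: release anything still held.
            flush = True

        if flush:
            for held in buffered:
                yield held
            buffered.clear()

        if passthrough_non_text:
            # Lifecycle, tool, raw, and result events are forwarded unchanged.
            yield event
```

Under these assumptions a consumer would iterate over final_turn_only(agent.stream_async(prompt)) instead of the raw stream. Note that with passthrough_non_text=True the raw contentBlockDelta events are still forwarded, so a consumer that reads text from raw events would need them filtered as well.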



### Use Case

- Chat applications that stream responses via SSE: users should only see the final answer, not intermediate "Let me look that up..." text before tool calls

- API endpoints wrapping agents: downstream consumers expect a single coherent streamed response

- Any production deployment where intermediate model reasoning is noise for the end user


### Alternative Solutions

- A callback handler that filters events, but callback handlers are synchronous and don't work with stream_async's async iterator pattern
- Consumer-side buffering (what we do today), which works but requires understanding SDK internals and breaks if the event format changes
- A hook-based approach: AfterModelCallEvent could signal turn boundaries, but hooks can't suppress events that were already yielded


### Additional Context

The latency cost of this approach is minimal. messageStop arrives immediately after the last contentBlockDelta in the same Bedrock HTTP stream, so the buffer flush is nearly instantaneous after the last text token of each turn.
