Commit 8b4ca29

google-genai-bot authored and copybara-github committed
feat: fixing context propagation for agent transfers
PiperOrigin-RevId: 885871597
1 parent cd56902 commit 8b4ca29

3 files changed

Lines changed: 368 additions & 73 deletions

core/src/main/java/com/google/adk/flows/llmflows/BaseLlmFlow.java

Lines changed: 14 additions & 2 deletions
@@ -430,7 +430,12 @@ private Flowable<Event> runOneStep(Context spanContext, InvocationContext contex
                           "Agent not found: " + agentToTransfer)));
               }
               return postProcessedEvents.concatWith(
-                  Flowable.defer(() -> nextAgent.get().runAsync(context)));
+                  Flowable.defer(
+                      () -> {
+                        try (Scope s = spanContext.makeCurrent()) {
+                          return nextAgent.get().runAsync(context);
+                        }
+                      }));
             }
             return postProcessedEvents;
           });
@@ -488,6 +493,8 @@ private Flowable<Event> run(
   public Flowable<Event> runLive(InvocationContext invocationContext) {
     AtomicReference<LlmRequest> llmRequestRef = new AtomicReference<>(LlmRequest.builder().build());
     Flowable<Event> preprocessEvents = preprocess(invocationContext, llmRequestRef);
+    // Capture agent context at assembly time to use as parent for agent transfer at subscription
+    // time. See Flowable.defer() usages below.
     Context spanContext = Context.current();
 
     return preprocessEvents.concatWith(
@@ -608,7 +615,12 @@ public void onError(Throwable e) {
                             "Agent not found: " + event.actions().transferToAgent().get());
                       }
                       Flowable<Event> nextAgentEvents =
-                          nextAgent.get().runLive(invocationContext);
+                          Flowable.defer(
+                              () -> {
+                                try (Scope s = spanContext.makeCurrent()) {
+                                  return nextAgent.get().runLive(invocationContext);
+                                }
+                              });
                       events = Flowable.concat(events, nextAgentEvents);
                     }
                     return events;
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# ADK Telemetry and Tracing

This package contains classes for capturing and reporting telemetry data within
the ADK, primarily for tracing agent execution with OpenTelemetry.

## Overview

The `Tracing` utility class provides methods to trace various aspects of an
agent's execution, including:

* Agent invocations
* LLM requests and responses
* Tool calls and responses

These traces can be exported and visualized in telemetry backends like Google
Cloud Trace or Zipkin, or viewed through the ADK Dev Server UI, providing
observability into agent behavior.

## How Tracing is Used

Tracing is deeply integrated into the ADK's RxJava-based asynchronous workflows.

### Agent Invocations

Every agent's `runAsync` or `runLive` execution is wrapped in a span named
`invoke_agent <agent_name>`. The top-level agent invocation initiated by
`Runner.runAsync` or `Runner.runLive` is captured in a span named `invocation`.
Agent-specific metadata such as name and description is added as span
attributes, following OpenTelemetry semantic conventions (e.g.,
`gen_ai.agent.name`).

### LLM Calls

Calls to Large Language Models (LLMs) are traced within a `call_llm` span. The
`traceCallLlm` method attaches detailed attributes to this span, including:

* The LLM request (excluding large data like images) and response.
* Model name (`gen_ai.request.model`).
* Token usage (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`).
* Configuration parameters (`gen_ai.request.top_p`,
  `gen_ai.request.max_tokens`).
* Response finish reason (`gen_ai.response.finish_reasons`).

### Tool Calls and Responses

Tool executions triggered by the LLM are traced using `tool_call [<tool_name>]`
and `tool_response [<tool_name>]` spans.

* `traceToolCall` records tool arguments in the
  `gcp.vertex.agent.tool_call_args` attribute.
* `traceToolResponse` records tool output in the
  `gcp.vertex.agent.tool_response` attribute.
* If multiple tools are called in parallel, a single `tool_response` span may
  be created for the merged result.

### Context Propagation

ADK is built on RxJava and heavily uses asynchronous processing, which means
that work is often handed off between different threads. For tracing to work
correctly in such an environment, the active span's context must be propagated
across these thread boundaries. If context is not propagated, new spans may be
orphaned or attached to the wrong parent, making traces difficult to interpret.

OpenTelemetry stores the currently active span in a thread-local variable.
When an asynchronous operation switches threads, this thread-local context is
not carried over to the new thread. To solve this, ADK's `Tracing` class
provides functionality to capture the context on one thread and restore it on
another when an asynchronous operation resumes. This ensures that spans created
on different threads are correctly parented under the same trace.
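
The effect can be reproduced without any tracing library. The following is a minimal sketch, assuming a plain `ThreadLocal<String>` as a stand-in for OpenTelemetry's context storage; `ThreadLocalContextDemo`, `CURRENT_SPAN`, and `propagating` are hypothetical names for illustration and are not part of ADK or OpenTelemetry.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Illustration only: a ThreadLocal stands in for the "current span" slot. */
final class ThreadLocalContextDemo {
  private static final ThreadLocal<String> CURRENT_SPAN =
      ThreadLocal.withInitial(() -> "none");

  /** Captures the caller's span now and re-activates it while the task runs. */
  static <T> Supplier<T> propagating(Supplier<T> task) {
    String captured = CURRENT_SPAN.get(); // read on the calling thread
    return () -> {
      String previous = CURRENT_SPAN.get();
      CURRENT_SPAN.set(captured);
      try {
        return task.get();
      } finally {
        CURRENT_SPAN.set(previous); // put back whatever the worker had
      }
    };
  }

  /** Runs the task on a fresh worker thread and returns its result. */
  static <T> T runOnWorker(Supplier<T> task) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      Callable<T> call = task::get;
      return worker.submit(call).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      worker.shutdown();
    }
  }

  /** Reports which span a worker thread observes, with or without propagation. */
  static String spanSeenByWorker(boolean propagate) {
    CURRENT_SPAN.set("invoke_agent my_agent");
    try {
      Supplier<String> readSpan = CURRENT_SPAN::get;
      Supplier<String> task = propagate ? propagating(readSpan) : readSpan;
      return runOnWorker(task);
    } finally {
      CURRENT_SPAN.remove();
    }
  }
}
```

Calling `spanSeenByWorker(false)` yields `none` because the worker thread has its own empty slot, while `spanSeenByWorker(true)` yields `invoke_agent my_agent`.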

The primary mechanism for this is the `Tracing.withContext(context)` method,
which returns an RxJava transformer. When applied to an RxJava stream via
`.compose()`, this transformer ensures that the provided `Context` (containing
the parent span) is re-activated before any `onNext`, `onError`, `onComplete`,
or `onSuccess` signals are propagated downstream. It achieves this by wrapping
the downstream observer with a `TracingObserver`, which uses
`context.makeCurrent()` in a try-with-resources block around each callback,
guaranteeing that the correct span is active when downstream operators execute,
regardless of the thread.
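
The wrapping pattern described above can be sketched with the JDK's own `java.util.concurrent.Flow.Subscriber` in place of an RxJava observer. This is not ADK's `TracingObserver`; `ContextRestoringSubscriberDemo`, its `Scope` interface, and `makeCurrent` are hypothetical stand-ins, and a `ThreadLocal<String>` is assumed in place of the OpenTelemetry `Context`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

/** Illustration only: re-activates a captured context around every callback. */
final class ContextRestoringSubscriberDemo {
  static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "none");

  /** Closeable scope without a checked exception, for try-with-resources. */
  interface Scope extends AutoCloseable {
    @Override
    void close();
  }

  static Scope makeCurrent(String context) {
    String previous = CURRENT.get();
    CURRENT.set(context);
    return () -> CURRENT.set(previous);
  }

  static final class ContextRestoringSubscriber<T> implements Flow.Subscriber<T> {
    private final String context;
    private final Flow.Subscriber<T> downstream;

    ContextRestoringSubscriber(String context, Flow.Subscriber<T> downstream) {
      this.context = context;
      this.downstream = downstream;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
      try (Scope ignored = makeCurrent(context)) {
        downstream.onSubscribe(subscription);
      }
    }

    @Override
    public void onNext(T item) {
      try (Scope ignored = makeCurrent(context)) {
        downstream.onNext(item);
      }
    }

    @Override
    public void onError(Throwable error) {
      try (Scope ignored = makeCurrent(context)) {
        downstream.onError(error);
      }
    }

    @Override
    public void onComplete() {
      try (Scope ignored = makeCurrent(context)) {
        downstream.onComplete();
      }
    }
  }

  /** Records which context each callback observes. */
  static final class RecordingSubscriber implements Flow.Subscriber<String> {
    final List<String> seen = new ArrayList<>();

    @Override public void onSubscribe(Flow.Subscription subscription) {}
    @Override public void onNext(String item) { seen.add(item + "@" + CURRENT.get()); }
    @Override public void onError(Throwable error) { seen.add("error@" + CURRENT.get()); }
    @Override public void onComplete() { seen.add("complete@" + CURRENT.get()); }
  }

  /** Emits signals from a different thread than the one that built the chain. */
  static List<String> deliverFromOtherThread() {
    RecordingSubscriber recorder = new RecordingSubscriber();
    Flow.Subscriber<String> wrapped = new ContextRestoringSubscriber<>("call_llm", recorder);
    Thread emitter = new Thread(() -> {
      wrapped.onNext("event");
      wrapped.onComplete();
    });
    emitter.start();
    try {
      emitter.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return recorder.seen;
  }
}
```

Even though the emitting thread never set a context itself, the recorder observes `call_llm` in both callbacks because the wrapper activates it around each one and restores the previous value afterwards.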

### RxJava Integration

ADK integrates OpenTelemetry with RxJava streams to simplify span creation and
ensure context propagation:

* **Span Creation**: The `Tracing.trace(spanName)` method returns an RxJava
  transformer that can be applied to a `Flowable`, `Single`, `Maybe`, or
  `Completable` using `.compose()`. This transformer wraps the stream's
  execution in a new OpenTelemetry span.
* **Context Propagation**: The `Tracing.withContext(context)` transformer is
  used with `.compose()` to ensure that the correct OpenTelemetry `Context`
  (and thus the correct parent span) is active when stream operators or
  subscriptions are executed, even across thread boundaries.

## Trace Hierarchy Example

A typical agent interaction might produce a trace hierarchy like the following:

```
invocation
  └── invoke_agent my_agent
      ├── call_llm
      │   ├── tool_call [search_flights]
      │   └── tool_response [search_flights]
      └── call_llm
```

This shows:

1. The overall `invocation` started by the `Runner`.
2. The invocation of `my_agent`.
3. The first `call_llm` made by `my_agent`.
4. A `tool_call` to `search_flights` and its corresponding `tool_response`.
5. A second `call_llm` made by `my_agent` to generate the final user response.
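
The nesting follows from one rule: a new span becomes a child of whichever span is active when it starts. The sketch below models that rule with an in-memory stack and rebuilds the hierarchy above. `SpanTreeDemo` and its methods are hypothetical and do not use the OpenTelemetry SDK.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustration only: parents each new span under the currently active one. */
final class SpanTreeDemo {
  static final class Span {
    final String name;
    final List<Span> children = new ArrayList<>();

    Span(String name) {
      this.name = name;
    }
  }

  private final Deque<Span> active = new ArrayDeque<>();
  private Span root;

  /** Starts a span as a child of the active span, then makes it active. */
  void start(String name) {
    Span span = new Span(name);
    if (active.isEmpty()) {
      root = span;
    } else {
      active.peek().children.add(span);
    }
    active.push(span);
  }

  /** Ends the active span, re-activating its parent. */
  void end() {
    active.pop();
  }

  /** Renders the tree with two spaces of indentation per level. */
  String render() {
    StringBuilder out = new StringBuilder();
    render(root, 0, out);
    return out.toString();
  }

  private static void render(Span span, int depth, StringBuilder out) {
    out.append("  ".repeat(depth)).append(span.name).append('\n');
    for (Span child : span.children) {
      render(child, depth + 1, out);
    }
  }

  static String example() {
    SpanTreeDemo trace = new SpanTreeDemo();
    trace.start("invocation");
    trace.start("invoke_agent my_agent");
    trace.start("call_llm");
    trace.start("tool_call [search_flights]");
    trace.end();
    trace.start("tool_response [search_flights]");
    trace.end();
    trace.end(); // first call_llm
    trace.start("call_llm");
    trace.end(); // second call_llm
    trace.end(); // invoke_agent my_agent
    trace.end(); // invocation
    return trace.render();
  }
}
```

If the active span were lost between the first `call_llm` and the tool spans, the tool spans would attach elsewhere or become roots, which is the kind of broken trace that context propagation is meant to prevent.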

### Nested Agents

ADK supports nested agents, where one agent invokes another. If an agent has
sub-agents, it can transfer control to one of them using the built-in
`transfer_to_agent` tool. When `AgentA` calls `transfer_to_agent` to transfer
control to `AgentB`, the `invoke_agent AgentB` span will appear as a child of
the `invoke_agent AgentA` span, like so:

```
invocation
  └── invoke_agent AgentA
      ├── call_llm
      │   ├── tool_call [transfer_to_agent]
      │   └── tool_response [transfer_to_agent]
      └── invoke_agent AgentB
          ├── call_llm
          └── ...
```

This structure allows you to see how `AgentA` delegated work to `AgentB`.
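
Getting this parent-child relationship right depends on when the parent context is read. The transfer is deferred, so the next agent starts only when the stream is subscribed, by which time `AgentA`'s context may no longer be current. The sketch below contrasts reading the context at that later point with capturing it up front. It uses a `Supplier` as a stand-in for a deferred stream, and `DeferredTransferDemo` and its methods are hypothetical names, not ADK code.

```java
import java.util.function.Supplier;

/** Illustration only: capture the parent at assembly time, use it at run time. */
final class DeferredTransferDemo {
  static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "none");

  /** Pretends to start the next agent and reports the parent it would get. */
  static String startNextAgent() {
    return "invoke_agent AgentB <- " + CURRENT.get();
  }

  /** Reads the context only when the deferred work finally runs. */
  static Supplier<String> naiveTransfer() {
    return DeferredTransferDemo::startNextAgent;
  }

  /** Captures the context now and re-activates it when the work runs. */
  static Supplier<String> capturingTransfer() {
    String parent = CURRENT.get();
    return () -> {
      String previous = CURRENT.get();
      CURRENT.set(parent);
      try {
        return startNextAgent();
      } finally {
        CURRENT.set(previous);
      }
    };
  }

  static String[] run() {
    Supplier<String> naive;
    Supplier<String> capturing;
    CURRENT.set("invoke_agent AgentA");
    try {
      // Assembly: both transfers are built while AgentA's span is active.
      naive = naiveTransfer();
      capturing = capturingTransfer();
    } finally {
      CURRENT.remove();
    }
    // "Subscription": the deferred work runs after AgentA's span is gone.
    return new String[] {naive.get(), capturing.get()};
  }
}
```

Here `run()` returns `invoke_agent AgentB <- none` for the naive variant and `invoke_agent AgentB <- invoke_agent AgentA` for the capturing one.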

## Span Creation References

The following classes are the primary places where spans are created:

* **`com.google.adk.runner.Runner`**: Initiates the top-level `invocation`
  span for `runAsync` and `runLive`.
* **`com.google.adk.agents.BaseAgent`**: Creates the `invoke_agent
  <agent_name>` span for each agent execution.
* **`com.google.adk.flows.llmflows.BaseLlmFlow`**: Creates the `call_llm` span
  when the LLM is invoked.
* **`com.google.adk.flows.llmflows.Functions`**: Creates `tool_call [...]` and
  `tool_response [...]` spans when handling tool calls and responses.

## Configuration

**ADK_CAPTURE_MESSAGE_CONTENT_IN_SPANS**: This environment variable controls
whether LLM request/response content and tool arguments/responses are captured
in span attributes. It defaults to `true`. Set it to `false` to exclude
potentially large or sensitive data from traces; an empty `{}` JSON object is
then recorded in place of the content.
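
As a rough sketch of the behavior described above: the parsing rule here is an assumption (unset means `true`, and only the literal `false`, in any letter case, disables capture), and `CaptureContentConfigDemo` is a hypothetical helper rather than ADK's actual implementation.

```java
import java.util.Locale;

/** Illustration only: decides what to record based on the environment variable. */
final class CaptureContentConfigDemo {
  static final String ENV_VAR = "ADK_CAPTURE_MESSAGE_CONTENT_IN_SPANS";

  /** Assumption: unset defaults to true; only "false" (any case) disables capture. */
  static boolean captureEnabled(String rawValue) {
    return rawValue == null || !rawValue.trim().toLowerCase(Locale.ROOT).equals("false");
  }

  /** Returns the content itself, or an empty JSON object when capture is off. */
  static String attributeValue(String contentJson, boolean captureEnabled) {
    return captureEnabled ? contentJson : "{}";
  }

  /** Convenience wrapper that reads the real process environment. */
  static String attributeValueFromEnvironment(String contentJson) {
    return attributeValue(contentJson, captureEnabled(System.getenv(ENV_VAR)));
  }
}
```

Keeping the decision in a pure function of the raw string makes it testable without changing the process environment.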
