From 83325344c8b43897ebace2f4621096f59f8e1460 Mon Sep 17 00:00:00 2001 From: Paulo Borges Date: Tue, 3 Feb 2026 18:22:09 -0300 Subject: [PATCH 01/19] base --- .../pages/observability/concepts.adoc | 148 ++++++++++++++++++ 1 file changed, 148 insertions(+) create mode 100644 modules/ai-agents/pages/observability/concepts.adoc diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc new file mode 100644 index 000000000..18878a0dd --- /dev/null +++ b/modules/ai-agents/pages/observability/concepts.adoc @@ -0,0 +1,148 @@ += Agent Concepts +:description: Understand how agents execute, manage context, invoke tools, and handle errors. +:page-topic-type: concepts +:personas: agent_developer, streaming_developer, data_engineer +:learning-objective-1: Explain how agents execute reasoning loops and make tool invocation decisions +:learning-objective-2: Describe how agents manage context and state across interactions +:learning-objective-3: Identify error handling strategies for agent failures + +Agents execute through a reasoning loop where the LLM analyzes context, decides which tools to invoke, processes results, and repeats until the task completes. Understanding this execution model helps you design reliable agent systems. + +After reading this page, you will be able to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} +* [ ] {learning-objective-3} + +== Agent execution model + +Every agent request follows a reasoning loop. The agent doesn't execute all tool calls at once. Instead, it makes decisions iteratively. + +=== The reasoning loop + +When an agent receives a request: + +. The LLM receives the context, including system prompt, conversation history, user request, and previous tool results. +. The LLM chooses to invoke a tool, requests more information, or responds to user. +. The tool runs and returns results if invoked. +. The tool's results are added to conversation history. +. 
The LLM reasons again with an expanded context.

The loop continues until one of these conditions is met:

* The agent completes the task and responds to the user
* The agent reaches its maximum iteration limit
* The agent encounters an unrecoverable error

=== Why iterations matter

Each iteration includes three phases:

. **LLM reasoning**: The model processes the growing context to decide the next action.
. **Tool invocation**: If the agent decides to call a tool, execution happens and waits for results.
. **Context expansion**: Tool results are added to the conversation history for the next iteration.

With higher iteration limits, agents can complete complex tasks, but requests cost more and take longer.

With lower iteration limits, agents respond faster at lower cost, but may fail on complex requests.

==== Cost calculation

Calculate the approximate cost per request by estimating average context tokens per iteration:

----
Cost per request = (iterations x average context tokens x model price per token)
----

Example sampling three iterations of a 30-iteration request at $0.000002 per token (context grows as tool results accumulate):

----
Iteration 1: 500 tokens x $0.000002 = $0.001
Iteration 15: 2000 tokens x $0.000002 = $0.004
Iteration 30: 4000 tokens x $0.000002 = $0.008

Full request (all 30 iterations): on the order of $0.13
----

Actual costs vary based on:

* Tool result sizes (large results increase context)
* Model pricing (varies by provider and model tier)
* Task complexity (determines iteration count)

Setting max iterations creates a cost/capability trade-off:

[cols="1,1,2,1", options="header"]
|===
|Limit |Range |Use Case |Cost

|Low
|10-20
|Simple queries, single tool calls
|Cost-effective

|Medium
|30-50
|Multi-step workflows, tool chaining
|Balanced

|High
|50-100
|Complex analysis, exploratory tasks
|Higher
|===

Iteration limits prevent runaway costs when agents encounter complex or ambiguous requests.

== MCP tool invocation patterns

MCP tools extend agent capabilities beyond text generation. 
Understanding when and how tools execute helps you design effective tool sets. + +=== Synchronous tool execution + +In Redpanda Cloud, tool calls block the agent. When the agent decides to invoke a tool, it pauses and waits while the tool executes (querying a database, calling an API, or processing data). When the tool returns its result, the agent resumes reasoning. + +This synchronous model means latency adds up across multiple tool calls, the agent sees tool results sequentially rather than in parallel, and long-running tools can delay or fail agent requests due to timeouts. + +=== Tool selection decisions + +The LLM decides which tool to invoke based on system prompt guidance (such as "Use get_orders when customer asks about history"), tool descriptions from the MCP schema that define parameters and purpose, and conversation context where previous tool results influence the next tool choice. Agents can invoke the same tool multiple times with different parameters if the task requires it. + +=== Tool chaining + +Agents chain tools when one tool's output feeds another tool's input. For example, an agent might first call `get_customer_info(customer_id)` to retrieve details, then use that data to call `get_order_history(customer_email)`. + +Tool chaining requires sufficient max iterations because each step in the chain consumes one iteration. + +=== Tool granularity considerations + +Tool design affects agent behavior. Coarse-grained tools that do many things result in fewer tool calls but less flexibility and more complex implementation. Fine-grained tools that each do one thing require more tool calls but offer higher composability and simpler implementation. + +Choose granularity based on how often you'll reuse tool logic across workflows, whether intermediate results help with debugging, and how much control you want over tool invocation order. + +For tool design guidance, see xref:ai-agents:mcp/remote/best-practices.adoc[]. 
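The chaining pattern described above can be sketched in Python. The tool functions, parameter names, and return values here are hypothetical stand-ins for real MCP tool calls; the point is that each chained step consumes one iteration from the agent's budget.

```python
# Sketch of tool chaining under an iteration budget.
# get_customer_info and get_order_history are stubbed, hypothetical tools;
# a real agent routes these calls through its MCP client and reasoning loop.

def get_customer_info(customer_id):
    # Returns customer details, including the email the next tool needs.
    return {"customer_id": customer_id, "customer_email": "alice@example.com"}

def get_order_history(customer_email):
    # Looks up orders by the email produced by the upstream tool.
    return [{"order_id": "A-100", "email": customer_email}]

def run_chain(customer_id, max_iterations=30):
    iterations = 0

    # Step 1: the first tool call consumes one iteration.
    iterations += 1
    customer = get_customer_info(customer_id)

    # Step 2: its output feeds the next tool, consuming another iteration.
    iterations += 1
    orders = get_order_history(customer["customer_email"])

    assert iterations <= max_iterations, "chain exceeded iteration budget"
    return orders, iterations

orders, used = run_chain("cust-42")
print(used)  # 2 iterations consumed by a two-step chain
```

A deeper chain consumes proportionally more iterations, which is why the max iterations setting must leave headroom for the longest expected chain.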
+ +== Context and state management + +Agents handle two types of information: conversation context (what's been discussed) and state (persistent data across sessions). + +=== Conversation context + +The agent's context includes the system prompt (always present), user messages, agent responses, tool invocation requests, and tool results. + +As the conversation progresses, context grows. Each tool result adds tokens to the context window, which the LLM uses for reasoning in subsequent iterations. + +=== Context window limits + +LLM context windows limit how much history fits. Small models support 8K-32K tokens, medium models support 32K-128K tokens, and large models support 128K-1M+ tokens. + +When context exceeds the limit, the oldest tool results get truncated, the agent loses access to early conversation details, and may ask for information it already retrieved. + +Design workflows to complete within context limits. Avoid unbounded tool chaining. + +== Next steps + +* xref:ai-agents:agents/architecture-patterns.adoc[] +* xref:ai-agents:agents/quickstart.adoc[] +* xref:ai-agents:agents/prompt-best-practices.adoc[] +* xref:ai-agents:mcp/remote/best-practices.adoc[] \ No newline at end of file From 41169e5c4675ac57bd63d2fbcc3924d4cbfe3c5e Mon Sep 17 00:00:00 2001 From: Paulo Borges Date: Tue, 3 Feb 2026 18:27:54 -0300 Subject: [PATCH 02/19] DOC-1901 --- modules/ROOT/nav.adoc | 4 + .../pages/observability/concepts.adoc | 377 +++++++++++---- .../ai-agents/pages/observability/index.adoc | 5 + .../observability/ingest-custom-traces.adoc | 457 ++++++++++++++++++ .../pages/observability/view-transcripts.adoc | 104 ++++ 5 files changed, 861 insertions(+), 86 deletions(-) create mode 100644 modules/ai-agents/pages/observability/index.adoc create mode 100644 modules/ai-agents/pages/observability/ingest-custom-traces.adoc create mode 100644 modules/ai-agents/pages/observability/view-transcripts.adoc diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 
4380ada99..93a521c7a 100644
--- a/modules/ROOT/nav.adoc
+++ b/modules/ROOT/nav.adoc
@@ -42,6 +42,10 @@
 **** xref:ai-agents:mcp/local/overview.adoc[Overview]
 **** xref:ai-agents:mcp/local/quickstart.adoc[Quickstart]
 **** xref:ai-agents:mcp/local/configuration.adoc[Configure]
+** xref:ai-agents:observability/index.adoc[Transcripts]
+*** xref:ai-agents:observability/concepts.adoc[Concepts]
+*** xref:ai-agents:observability/view-transcripts.adoc[View Transcripts]
+*** xref:ai-agents:observability/ingest-custom-traces.adoc[Ingest Traces from Custom Agents]
 * xref:develop:connect/about.adoc[Redpanda Connect]
 ** xref:develop:connect/connect-quickstart.adoc[Quickstart]
diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc
index 18878a0dd..5b4d5f08c 100644
--- a/modules/ai-agents/pages/observability/concepts.adoc
+++ b/modules/ai-agents/pages/observability/concepts.adoc
@@ -1,12 +1,12 @@
-= Agent Concepts
-:description: Understand how agents execute, manage context, invoke tools, and handle errors.
+= Transcripts and AI Observability
+:description: Understand how Redpanda captures execution transcripts for agents and MCP servers using OpenTelemetry. 
:page-topic-type: concepts -:personas: agent_developer, streaming_developer, data_engineer -:learning-objective-1: Explain how agents execute reasoning loops and make tool invocation decisions -:learning-objective-2: Describe how agents manage context and state across interactions -:learning-objective-3: Identify error handling strategies for agent failures +:personas: agent_developer, platform_admin, data_engineer +:learning-objective-1: Explain how transcripts and spans capture execution flow +:learning-objective-2: Interpret transcript structure for debugging and monitoring +:learning-objective-3: Distinguish between transcripts and audit logs -Agents execute through a reasoning loop where the LLM analyzes context, decides which tools to invoke, processes results, and repeats until the task completes. Understanding this execution model helps you design reliable agent systems. +Redpanda automatically captures execution transcripts for both AI agents and MCP servers, providing complete observability into how your agentic systems operate. After reading this page, you will be able to: @@ -14,135 +14,340 @@ After reading this page, you will be able to: * [ ] {learning-objective-2} * [ ] {learning-objective-3} -== Agent execution model +== What are transcripts -Every agent request follows a reasoning loop. The agent doesn't execute all tool calls at once. Instead, it makes decisions iteratively. +Every agent and MCP server automatically emits OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts. -=== The reasoning loop +Transcripts capture: -When an agent receives a request: +* Tool invocations and results +* Agent reasoning steps +* Data processing operations +* External API calls +* Error conditions +* Performance metrics -. The LLM receives the context, including system prompt, conversation history, user request, and previous tool results. -. 
The LLM chooses to invoke a tool, requests more information, or responds to user. -. The tool runs and returns results if invoked. -. The tool's results are added to conversation history. -. The LLM reasons again with an expanded context. +With 100% sampling, every operation is captured, enabling comprehensive debugging, monitoring, and performance analysis. -The loop continues until one of these conditions is met: +== Traces and spans -* Agent completes the task and responds to the user -* Agent reaches max iterations limit -* Agent encounters an unrecoverable error +OpenTelemetry traces provide a complete picture of how a request flows through your system: -=== Why iterations matter +* A _trace_ represents the entire lifecycle of a request (for example, a tool invocation from start to finish). +* A _span_ represents a single unit of work within that trace (such as a data processing operation or an external API call). +* A trace contains one or more spans organized hierarchically, showing how operations relate to each other. -Each iteration includes three phases: +== Agent transcript hierarchy -. **LLM reasoning**: The model processes the growing context to decide the next action. -. **Tool invocation**: If the agent decides to call a tool, execution happens and waits for results. -. **Context expansion**: Tool results are added to the conversation history for the next iteration. +Agent executions create a hierarchy of spans that reflect how agents process requests. Understanding this hierarchy helps you interpret agent behavior and identify where issues occur. -With higher iteration limits, agents can complete complex tasks but costs more and takes longer. +=== Agent span types -With lower iteration limits, agents respond faster and cheaper but may fail on complex requests. 
+Agent transcripts contain these span types: -==== Cost calculation +[cols="2,3,3", options="header"] +|=== +| Span Type | Description | Use To + +| `ai-agent` +| Top-level span representing the entire agent invocation from start to finish. Includes all processing time, from receiving the request through executing the reasoning loop, calling tools, and returning the final response. +| Measure total request duration and identify slow agent invocations. + +| `agent` +| Internal agent processing that represents reasoning and decision-making. Shows time spent in the LLM reasoning loop, including context processing, tool selection, and response generation. Multiple `agent` spans may appear when the agent iterates through its reasoning loop. +| Track reasoning time and identify iteration patterns. -Calculate the approximate cost per request by estimating average context tokens per iteration: +| `invoke_agent` +| Agent and sub-agent invocation ( in multi-agent architectures). Represents one agent calling another via the A2A protocol. +| Trace calls between root agents and sub-agents, measure cross-agent latency, and identify which sub-agent was invoked. + +| `openai`, `anthropic`, or other LLM providers +| LLM provider API call showing calls to the language model. The span name matches the provider, and attributes typically include the model name (like `gpt-5.2` or `claude-sonnet-4-5`). +| Identify which model was called, measure LLM response time, and debug LLM API errors. + +| `rpcn-mcp` +| MCP tool invocation representing calls to Remote MCP servers. Shows tool execution time, including network latency and tool processing. Child spans with `instrumentationScope.name` set to `redpanda-connect` represent internal Redpanda Connect processing. +| Measure tool execution time and identify slow MCP tool calls. 
+|=== + +=== Typical agent execution flow + +A simple agent request creates this hierarchy: ---- -Cost per request = (iterations x context tokens x model price per token) +ai-agent (6.65 seconds) +├── agent (6.41 seconds) +│ ├── invoke_agent: customer-support-agent (6.39 seconds) +│ │ └── openai: chat gpt-5.2 (6.2 seconds) ---- -Example with 30 iterations at $0.000002 per token: +This shows: ----- -Iteration 1: 500 tokens x $0.000002 = $0.001 -Iteration 15: 2000 tokens x $0.000002 = $0.004 -Iteration 30: 4000 tokens x $0.000002 = $0.008 +1. Total agent invocation: 6.65 seconds +2. Agent reasoning: 6.41 seconds +3. Sub-agent call: 6.39 seconds (most of the time) +4. LLM API call: 6.2 seconds (the actual bottleneck) + +Examine span durations to identify where time is spent and optimize accordingly. + +== MCP server transcript hierarchy + +MCP server tool invocations produce a different span hierarchy focused on tool execution and internal processing. This structure reveals performance bottlenecks and helps debug tool-specific issues. + +=== MCP server span types + +MCP server transcripts contain these span types: + +[cols="2,3,3", options="header"] +|=== +| Span Type | Description | Use To + +| `mcp-{server-id}` +| Top-level span representing the entire MCP server invocation. The server ID uniquely identifies the MCP server instance. This span encompasses all tool execution from request receipt to response completion. +| Measure total MCP server response time and identify slow tool invocations. -Total: ~$0.013 per request +| `service` +| Internal service processing span that appears at multiple levels in the hierarchy. Represents Redpanda Connect service operations including routing, processing, and component execution. +| Track internal processing overhead and identify where time is spent in the service layer. + +| Tool name (e.g., `get_order_status`, `get_customer_history`) +| The specific MCP tool being invoked. 
This span name matches the tool name defined in the MCP server configuration. +| Identify which tool was called and measure tool-specific execution time. + +| `processors` +| Processor pipeline execution span showing the collection of processors that process the tool's data. Appears as a child of the tool invocation span. +| Measure total processor pipeline execution time. + +| Processor name (e.g., `mapping`, `http`, `branch`) +| Individual processor execution span representing a single Redpanda Connect processor. The span name matches the processor type. +| Identify slow processors and debug processing logic. +|=== + +=== Typical MCP server execution flow + +An MCP tool invocation creates this hierarchy: + +---- +mcp-d5mnvn251oos73 (4.00 seconds) +├── service > get_order_status (4.07 seconds) +│ └── service > processors (43 microseconds) +│ └── service > mapping (18 microseconds) ---- -Actual costs vary based on: +This shows: -* Tool result sizes (large results increase context) -* Model pricing (varies by provider and model tier) -* Task complexity (determines iteration count) +1. Total MCP server invocation: 4.00 seconds +2. Tool execution (get_order_status): 4.07 seconds +3. Processor pipeline: 43 microseconds +4. Mapping processor: 18 microseconds (data transformation) -Setting max iterations creates a cost/capability trade-off: +The majority of time (4+ seconds) is spent in tool execution, while internal processing (mapping) takes only microseconds. This indicates the tool itself (likely making external API calls or database queries) is the bottleneck, not Redpanda Connect's internal processing. -[cols="1,1,2,1", options="header"] +== Transcript layers and scope + +Transcripts contain multiple layers of instrumentation, from HTTP transport through application logic to external service calls. The `scope.name` field in each span identifies which instrumentation layer created that span. 
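As a quick illustration of working with `scope.name`, a sketch like the following tallies spans per instrumentation layer. The sample spans are hypothetical but follow the OTLP JSON shape used on this page.

```python
from collections import Counter

# Tally spans per instrumentation layer using the instrumentation scope name.
# The three sample spans below are hypothetical examples of the layers
# described on this page.
spans = [
    {"name": "invoke_agent",
     "instrumentationScope": {"name": "github.com/redpanda-data/ai-sdk-go/plugins/otel"}},
    {"name": "get_order_status",
     "instrumentationScope": {"name": "rpcn-mcp"}},
    {"name": "mapping",
     "instrumentationScope": {"name": "redpanda-connect"}},
]

by_layer = Counter(s["instrumentationScope"]["name"] for s in spans)
print(by_layer["rpcn-mcp"])  # 1 span from the MCP server layer
```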
+ +=== Instrumentation layers + +A complete agent transcript includes these layers: + +[cols="2,2,4", options="header"] |=== -|Limit |Range |Use Case |Cost - -|Low -|10-20 -|Simple queries, single tool calls -|Cost-effective - -|Medium -|30-50 -|Multi-step workflows, tool chaining -|Balanced - -|High -|50-100 -|Complex analysis, exploratory tasks -|Higher +| Layer | Scope Name | Purpose + +| HTTP Server +| `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` +| HTTP transport layer receiving requests. Shows request/response sizes, status codes, client addresses, and network details. + +| AI SDK (Agent) +| `github.com/redpanda-data/ai-sdk-go/plugins/otel` +| Agent application logic. Shows agent invocations, LLM calls, tool executions, conversation IDs, token usage, and model details. Includes `gen_ai.*` semantic convention attributes. + +| HTTP Client +| `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` +| Outbound HTTP calls from agent to MCP servers. Shows target URLs, request methods, and response codes. + +| MCP Server +| `rpcn-mcp` +| MCP server tool execution. Shows tool name, input parameters, result size, and execution time. Appears as a separate `service.name` in resource attributes. + +| Redpanda Connect +| `redpanda-connect` +| Internal Redpanda Connect component execution within MCP tools. Shows pipeline and individual component spans. |=== -Iteration limits prevent runaway costs when agents encounter complex or ambiguous requests. 
+=== How layers connect + +Layers connect through parent-child relationships in a single transcript: + +---- +ai-agent-http-server (HTTP Server layer) +└── invoke_agent customer-support-agent (AI SDK layer) + ├── chat gpt-5-nano (AI SDK layer, LLM call 1) + ├── execute_tool get_order_status (AI SDK layer) + │ └── HTTP POST (HTTP Client layer) + │ └── get_order_status (MCP Server layer, different service) + │ └── processors (Redpanda Connect layer) + └── chat gpt-5-nano (AI SDK layer, LLM call 2) +---- + +The request flow demonstrates: + +1. HTTP request arrives at agent +2. Agent invokes sub-agent +3. Agent makes first LLM call to decide what to do +4. Agent executes tool, making HTTP call to MCP server +5. MCP server processes tool through its pipeline +6. Agent makes second LLM call with tool results +7. Response returns through HTTP layer + +=== Cross-service transcripts + +When agents call MCP tools, the transcript spans multiple services. Each service has a different `service.name` in the resource attributes: + +* Agent spans: `"service.name": "ai-agent"` +* MCP server spans: `"service.name": "mcp-{server-id}"` + +Both use the same `traceId`, allowing you to follow a request across service boundaries. 
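Grouping spans on `traceId` is enough to reassemble a cross-service transcript. This sketch uses two hypothetical spans shaped like the examples on this page; in practice you would consume them from the `redpanda.otel_traces` topic.

```python
from collections import defaultdict

# Reassemble a cross-service transcript by grouping spans on traceId.
# The spans are hypothetical samples shaped like this page's examples.
spans = [
    {"traceId": "71cad555b35602fbb35f035d6114db54",
     "name": "invoke_agent",
     "resource": {"service.name": "ai-agent"}},
    {"traceId": "71cad555b35602fbb35f035d6114db54",
     "name": "get_order_status",
     "resource": {"service.name": "mcp-d5mnvn251oos73"}},
]

transcripts = defaultdict(list)
for span in spans:
    transcripts[span["traceId"]].append(span["resource"]["service.name"])

# Both services appear under the same trace.
print(transcripts["71cad555b35602fbb35f035d6114db54"])
```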
+ +=== Key attributes by layer + +Different layers expose different attributes: + +HTTP Server/Client layer: + +- `http.request.method`, `http.response.status_code` +- `server.address`, `url.path`, `url.full` +- `network.peer.address`, `network.peer.port` +- `http.request.body.size`, `http.response.body.size` + +AI SDK layer: -== MCP tool invocation patterns +- `gen_ai.operation.name`: Operation type (`invoke_agent`, `chat`, `execute_tool`) +- `gen_ai.conversation.id`: Links spans to the same conversation +- `gen_ai.agent.name`: Sub-agent name for multi-agent systems +- `gen_ai.provider.name`, `gen_ai.request.model`: LLM provider and model +- `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`: Token consumption +- `gen_ai.tool.name`, `gen_ai.tool.call.arguments`: Tool execution details +- `gen_ai.input.messages`, `gen_ai.output.messages`: Full LLM conversation context -MCP tools extend agent capabilities beyond text generation. Understanding when and how tools execute helps you design effective tool sets. +MCP Server layer: -=== Synchronous tool execution +- Tool-specific attributes like `order_id`, `customer_id` +- `result_prefix`, `result_length`: Tool result metadata -In Redpanda Cloud, tool calls block the agent. When the agent decides to invoke a tool, it pauses and waits while the tool executes (querying a database, calling an API, or processing data). When the tool returns its result, the agent resumes reasoning. +Redpanda Connect layer: -This synchronous model means latency adds up across multiple tool calls, the agent sees tool results sequentially rather than in parallel, and long-running tools can delay or fail agent requests due to timeouts. +- Component-specific attributes from your tool configuration -=== Tool selection decisions +Use `scope.name` to filter spans by layer when analyzing transcripts. 
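For example, the `gen_ai.usage.*` attributes from the AI SDK layer can be summed per conversation to track token consumption. The sketch below flattens span attributes into a plain dict for readability (OTLP encodes them as key/value lists); the sample values are hypothetical.

```python
# Sum token usage per conversation from AI SDK layer spans, using the
# gen_ai.* attributes listed above. Attributes are flattened into dicts
# for readability; sample values are hypothetical.
spans = [
    {"attributes": {"gen_ai.conversation.id": "conv-1",
                    "gen_ai.usage.input_tokens": 500,
                    "gen_ai.usage.output_tokens": 120}},
    {"attributes": {"gen_ai.conversation.id": "conv-1",
                    "gen_ai.usage.input_tokens": 2000,
                    "gen_ai.usage.output_tokens": 300}},
]

usage = {}
for span in spans:
    attrs = span["attributes"]
    conv = attrs["gen_ai.conversation.id"]
    usage[conv] = (usage.get(conv, 0)
                   + attrs["gen_ai.usage.input_tokens"]
                   + attrs["gen_ai.usage.output_tokens"])

print(usage["conv-1"])  # 2920 total tokens across both LLM calls
```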
-The LLM decides which tool to invoke based on system prompt guidance (such as "Use get_orders when customer asks about history"), tool descriptions from the MCP schema that define parameters and purpose, and conversation context where previous tool results influence the next tool choice. Agents can invoke the same tool multiple times with different parameters if the task requires it. +== Understand the transcript structure -=== Tool chaining +Each span captures a unit of work. Here's what a typical MCP tool invocation looks like: + +[,json] +---- +{ + "traceId": "71cad555b35602fbb35f035d6114db54", + "spanId": "43ad6bc31a826afd", + "name": "http_processor", + "attributes": [ + {"key": "city_name", "value": {"stringValue": "london"}}, + {"key": "result_length", "value": {"intValue": "198"}} + ], + "startTimeUnixNano": "1765198415253280028", + "endTimeUnixNano": "1765198424660663434", + "instrumentationScope": {"name": "rpcn-mcp"}, + "status": {"code": 0, "message": ""} +} +---- + +Key elements to understand: + +* `traceId`: Links all spans belonging to the same request. Use this to follow a tool invocation through its entire lifecycle. +* `name`: The tool or operation name (`http_processor` in this example). This tells you which component was invoked. +* `instrumentationScope.name`: When this is `rpcn-mcp`, the span represents an MCP tool. When it's `redpanda-connect`, it's internal processing. +* `attributes`: Context about the operation, like input parameters or result metadata. +* `status.code`: `0` means success, `2` means error. + +=== Parent-child relationships + +Transcripts show how operations relate. 
A tool invocation (parent) may trigger internal operations (children): + +[,json] +---- +{ + "traceId": "71cad555b35602fbb35f035d6114db54", + "spanId": "ed45544a7d7b08d4", + "parentSpanId": "43ad6bc31a826afd", + "name": "http", + "instrumentationScope": {"name": "redpanda-connect"}, + "status": {"code": 0, "message": ""} +} +---- + +The `parentSpanId` links this child span to the parent tool invocation. Both share the same `traceId` so you can reconstruct the complete operation. + +== Error events in transcripts + +When something goes wrong, transcripts capture error details: + +[,json] +---- +{ + "traceId": "71cad555b35602fbb35f035d6114db54", + "spanId": "ba332199f3af6d7f", + "parentSpanId": "43ad6bc31a826afd", + "name": "http_request", + "events": [ + { + "name": "event", + "timeUnixNano": "1765198420254169629", + "attributes": [{"key": "error", "value": {"stringValue": "type"}}] + } + ], + "status": {"code": 0, "message": ""} +} +---- -Agents chain tools when one tool's output feeds another tool's input. For example, an agent might first call `get_customer_info(customer_id)` to retrieve details, then use that data to call `get_order_history(customer_email)`. +The `events` array captures what happened and when. Use `timeUnixNano` to see exactly when the error occurred within the operation. -Tool chaining requires sufficient max iterations because each step in the chain consumes one iteration. +[[opentelemetry-traces-topic]] +== How Redpanda stores trace data -=== Tool granularity considerations +The `redpanda.otel_traces` topic stores OpenTelemetry spans using Redpanda's Schema Registry wire format, with a custom Protobuf schema named `redpanda.otel_traces-value` that follows the https://opentelemetry.io/docs/specs/otel/protocol/[OpenTelemetry Protocol (OTLP)^] specification. 
Spans include attributes following OpenTelemetry https://opentelemetry.io/docs/specs/semconv/gen-ai/[semantic conventions for generative AI^], such as `gen_ai.operation.name` and `gen_ai.conversation.id`. The schema is automatically registered in the Schema Registry with the topic, so Kafka clients can consume and deserialize trace data correctly. -Tool design affects agent behavior. Coarse-grained tools that do many things result in fewer tool calls but less flexibility and more complex implementation. Fine-grained tools that each do one thing require more tool calls but offer higher composability and simpler implementation. +Redpanda manages both the `redpanda.otel_traces` topic and its schema automatically. If you delete either the topic or the schema, they are recreated automatically. However, deleting the topic permanently deletes all trace data, and the topic comes back empty. Do not produce your own data to this topic. It is reserved for OpenTelemetry traces. -Choose granularity based on how often you'll reuse tool logic across workflows, whether intermediate results help with debugging, and how much control you want over tool invocation order. +=== Topic configuration and lifecycle -For tool design guidance, see xref:ai-agents:mcp/remote/best-practices.adoc[]. +The `redpanda.otel_traces` topic has a predefined retention policy. Configuration changes to this topic are not supported. If you modify settings, Redpanda reverts them to the default values. -== Context and state management +The topic persists in your cluster even after all agents and MCP servers are deleted, allowing you to retain historical trace data for analysis. -Agents handle two types of information: conversation context (what's been discussed) and state (persistent data across sessions). +Transcripts may contain sensitive information from your tool inputs and outputs. 
Consider implementing appropriate glossterm:ACL[access control lists (ACLs)] for the `redpanda.otel_traces` topic, and review the data in transcripts before sharing or exporting to external systems. -=== Conversation context +== Transcripts compared to audit logs -The agent's context includes the system prompt (always present), user messages, agent responses, tool invocation requests, and tool results. +Transcripts are designed for observability and debugging, not audit logging or compliance. -As the conversation progresses, context grows. Each tool result adds tokens to the context window, which the LLM uses for reasoning in subsequent iterations. +Transcripts provide: -=== Context window limits +* Hierarchical view of request flow through your system (parent-child span relationships) +* Detailed timing information for performance analysis +* Ability to reconstruct execution paths and identify bottlenecks +* Insights into how operations flow through distributed systems -LLM context windows limit how much history fits. Small models support 8K-32K tokens, medium models support 32K-128K tokens, and large models support 128K-1M+ tokens. +Transcripts are not: -When context exceeds the limit, the oldest tool results get truncated, the agent loses access to early conversation details, and may ask for information it already retrieved. +* Immutable audit records for compliance purposes +* Designed for "who did what" accountability tracking -Design workflows to complete within context limits. Avoid unbounded tool chaining. +For compliance and audit requirements, use the session and task topics for agents, which provide records of agent conversations and execution. 
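Assuming ACL-based authorization is enabled on your cluster, a starting point for restricting read access with `rpk` might look like the following. The principal name is a placeholder; adjust it to your own users and authorization model.

```shell
# Sketch: allow only a dedicated observability principal to read the
# traces topic. "observability-reader" is an example principal name.
rpk security acl create \
  --allow-principal User:observability-reader \
  --operation read \
  --topic redpanda.otel_traces
```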
== Next steps -* xref:ai-agents:agents/architecture-patterns.adoc[] -* xref:ai-agents:agents/quickstart.adoc[] -* xref:ai-agents:agents/prompt-best-practices.adoc[] -* xref:ai-agents:mcp/remote/best-practices.adoc[] \ No newline at end of file +* xref:ai-agents:observability/view-transcripts.adoc[] +* xref:ai-agents:agents/monitor-agents.adoc[] +* xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] \ No newline at end of file diff --git a/modules/ai-agents/pages/observability/index.adoc b/modules/ai-agents/pages/observability/index.adoc new file mode 100644 index 000000000..92ba2a5a5 --- /dev/null +++ b/modules/ai-agents/pages/observability/index.adoc @@ -0,0 +1,5 @@ += Transcripts +:page-layout: index +:description: Monitor agent and MCP server execution using complete OpenTelemetry traces captured by Redpanda. + +{description} diff --git a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc new file mode 100644 index 000000000..c66d4b617 --- /dev/null +++ b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc @@ -0,0 +1,457 @@ += Ingest OpenTelemetry Traces from Custom Agents +:description: Configure a Redpanda Connect pipeline to ingest OTEL traces from custom agents into Redpanda for unified observability. +:page-topic-type: how-to +:learning-objective-1: Configure a Redpanda Connect pipeline to receive OpenTelemetry traces from custom agents via HTTP and publish them to redpanda.otel_traces +:learning-objective-2: Validate trace data format and compatibility with existing MCP server traces +:learning-objective-3: Secure the ingestion endpoint using authentication mechanisms + +When you build custom agents or instrument applications outside of Remote MCP servers and declarative agents, you can send OpenTelemetry (OTEL) traces to Redpanda for centralized observability. 
Deploy a Redpanda Connect pipeline as an HTTP ingestion endpoint to collect and publish traces to the `redpanda.otel_traces` topic.
+
+After reading this page, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+== Prerequisites
+
+* A BYOC cluster
+* Ability to manage secrets in Redpanda Cloud
+* The latest version of `rpk` installed
+* Custom agent or application instrumented with OpenTelemetry SDK
+* Basic understanding of the https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry span format^] and https://opentelemetry.io/docs/specs/otlp/[OpenTelemetry Protocol (OTLP)^]
+
+== Quickstart for LangChain users
+
+If you're using LangChain with OpenTelemetry tracing, you can send traces to Redpanda's `redpanda.otel_traces` glossterm:topic[] to view them in the Transcripts view.
+
+. Configure LangChain's OpenTelemetry integration by following the https://docs.langchain.com/langsmith/trace-with-opentelemetry[LangChain documentation^].
+
+. Deploy a Redpanda Connect pipeline using the `otlp_http` input to receive OTLP traces over HTTP. Create the pipeline in the **Connect** page of your cluster, or see the <<configure-the-ingestion-pipeline,Configure the ingestion pipeline>> section below for a sample configuration.
+
+. Configure your OTEL exporter to send traces to your Redpanda Connect pipeline using environment variables:
+
+[,bash]
+----
+# Configure LangChain OTEL integration
+export LANGSMITH_OTEL_ENABLED=true
+export LANGSMITH_TRACING=true
+
+# Send traces to Redpanda Connect pipeline
+export OTEL_EXPORTER_OTLP_ENDPOINT="https://<your-pipeline-endpoint>:4318"
+export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <your-token>"
+----
+
+By default, traces are sent to both LangSmith and your Redpanda Connect pipeline.
If you want to send traces only to Redpanda (not LangSmith), set:
+
+[,bash]
+----
+export LANGSMITH_OTEL_ONLY="true"
+----
+
+Your LangChain application will send traces to the `redpanda.otel_traces` topic, making them visible in the Transcripts view in your cluster alongside Remote MCP server and declarative agent traces.
+
+For non-LangChain applications or custom instrumentation, continue with the sections below.
+
+== About custom trace ingestion
+
+Custom agents include applications you build with OpenTelemetry instrumentation that operate independently of Redpanda's Remote MCP servers or declarative agents. Examples include:
+
+* Custom AI agents built with LangChain, CrewAI, or other frameworks
+* Applications with manual OpenTelemetry instrumentation
+* Services that integrate with third-party AI platforms
+
+When these applications send traces to Redpanda's `redpanda.otel_traces` glossterm:topic[], you gain unified observability across all agentic components in your system. Custom agent transcripts appear alongside Remote MCP server and declarative agent transcripts in the Transcripts view, creating xref:ai-agents:observability/concepts.adoc#cross-service-transcripts[cross-service transcripts] that allow you to correlate operations and analyze end-to-end request flows.
+
+=== Trace format requirements
+
+Custom agents must emit traces in OTLP format. The `otlp_http` input accepts both OTLP Protobuf (`application/x-protobuf`) and JSON (`application/json`) payloads. For <<use-grpc,gRPC transport>>, use the `otlp_grpc` input.
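To make the expected payload shape concrete, the following Python sketch assembles a minimal OTLP/JSON trace containing a single span. The service name, instrumentation scope, and IDs are illustrative placeholders, not values your agent should reuse:

[,python]
----
import json
import time

now_ns = time.time_ns()

# Minimal OTLP/JSON payload with one span.
# traceId is 32 hex characters; spanId is 16 hex characters.
payload = {
    "resourceSpans": [{
        "resource": {
            "attributes": [
                {"key": "service.name", "value": {"stringValue": "my-custom-agent"}}
            ]
        },
        "scopeSpans": [{
            "scope": {"name": "my-agent-instrumentation", "version": "1.0.0"},
            "spans": [{
                "traceId": "5b8efff798038103d269b633813fc60c",
                "spanId": "eee19b7ec3c1b174",
                "name": "invoke_agent my-assistant",
                "startTimeUnixNano": str(now_ns - 250_000_000),
                "endTimeUnixNano": str(now_ns),
                "status": {"code": 1},  # 1 = OK, 2 = ERROR (0 = unset)
            }]
        }]
    }]
}

# POST this body to https://<your-endpoint>:4318/v1/traces with
# Content-Type: application/json and your Authorization header.
body = json.dumps(payload)
----

OTLP/JSON follows the Protobuf JSON mapping, so 64-bit timestamp fields are encoded as decimal strings; that is why the nanosecond values are wrapped in `str()`.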
+
+Each trace must follow the OTLP specification with these required fields:
+
+[cols="1,3", options="header"]
+|===
+| Field | Description
+
+| `traceId`
+| Hex-encoded unique identifier for the entire trace
+
+| `spanId`
+| Hex-encoded unique identifier for this span
+
+| `name`
+| Descriptive operation name
+
+| `startTimeUnixNano` and `endTimeUnixNano`
+| Timing information in nanoseconds
+
+| `instrumentationScope`
+| Identifies the library that created the span
+
+| `status`
+| Operation status with code (`0` = unset, `1` = OK, `2` = ERROR)
+|===
+
+Optional but recommended fields:
+
+* `parentSpanId` for hierarchical traces
+* `attributes` for contextual information
+
+For complete trace structure details, see xref:ai-agents:observability/concepts.adoc#understand-the-transcript-structure[Understand the transcript structure].
+
+== Configure the ingestion pipeline
+
+Create a Redpanda Connect pipeline that receives HTTP requests containing OTLP traces and publishes them to the `redpanda.otel_traces` topic. The pipeline uses the `otlp_http` input component, which is specifically designed to receive OpenTelemetry Protocol data.
+
+=== Create the pipeline configuration
+
+Create a pipeline configuration file that defines the OTLP HTTP ingestion endpoint.
+
+The `otlp_http` input component:
+
+* Exposes an OpenTelemetry Collector HTTP receiver
+* Accepts traces at the standard `/v1/traces` endpoint
+* Listens on port 4318 by default (standard OTLP/HTTP port)
+* Converts incoming OTLP data into individual Redpanda OTEL v1 Protobuf messages and publishes them to the `redpanda.otel_traces` topic
+
+Create a file named `trace-ingestion.yaml`:
+
+[,yaml]
+----
+input:
+  otlp_http:
+    address: "0.0.0.0:4318"
+    auth_token: "${secrets.TRACE_AUTH_TOKEN}"
+    max_body_size: 4194304 # 4MB default
+    read_timeout: "10s"
+    write_timeout: "10s"
+
+output:
+  redpanda:
+    seed_brokers: ["${REDPANDA_BROKERS}"]
+    topic: "redpanda.otel_traces"
+    compression: snappy
+    max_in_flight: 10
+----
+
+The `otlp_http` input automatically handles format conversion, so no processors are needed for basic trace ingestion. Each span becomes a separate message in the `redpanda.otel_traces` topic.
+
+[[use-grpc]]
+==== Alternative: Use gRPC instead of HTTP
+
+If your custom agent requires gRPC transport, use the `otlp_grpc` input instead:
+
+[,yaml]
+----
+input:
+  otlp_grpc:
+    address: "0.0.0.0:4317" # Standard OTLP/gRPC port
+    auth_token: "${secrets.TRACE_AUTH_TOKEN}"
+    max_recv_msg_size: 4194304
+
+output:
+  redpanda:
+    seed_brokers: ["${REDPANDA_BROKERS}"]
+    topic: "redpanda.otel_traces"
+    compression: snappy
+    max_in_flight: 10
+----
+
+The gRPC input works identically to HTTP but uses Protobuf encoding over gRPC. Clients must include the authentication token in gRPC metadata as `authorization: Bearer <token>`.
+
+=== Deploy the pipeline in Redpanda Cloud
+
+. In the *Connect* page of your Redpanda Cloud cluster, click *Create Pipeline*.
+. For the input, select the *otlp_http* (or *otlp_grpc*) component.
+. Skip to *Add a topic* and select `redpanda.otel_traces` from the list of existing topics. Leave the default advanced settings.
+. In the *Add permissions* step, you can create a service account with write access to the `redpanda.otel_traces` topic.
+.
In the *Create pipeline* step, enter a name for your ingestion pipeline and paste your `trace-ingestion.yaml` configuration. Ensure that you've created the `TRACE_AUTH_TOKEN` secret that the configuration references.
+
+== Send traces from your custom agent
+
+Configure your custom agent to send OpenTelemetry traces to the ingestion endpoint. The endpoint accepts traces in OTLP format via HTTP on port 4318 at the `/v1/traces` path.
+
+=== Configure your OTEL exporter
+
+Install the OpenTelemetry SDK for your language and configure the OTLP exporter to target your Redpanda Connect pipeline endpoint.
+
+The exporter configuration requires:
+
+* **Endpoint**: Your pipeline's URL including the `/v1/traces` path
+* **Headers**: Authorization header with your bearer token
+* **Protocol**: HTTP to match the `otlp_http` input (or gRPC for `otlp_grpc`)
+
+.Python example for OTLP HTTP exporter
+[,python]
+----
+from opentelemetry import trace
+from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.sdk.resources import Resource
+
+# Configure resource attributes to identify your agent
+resource = Resource(attributes={
+    "service.name": "my-custom-agent",
+    "service.version": "1.0.0"
+})
+
+# Configure the OTLP HTTP exporter
+exporter = OTLPSpanExporter(
+    endpoint="https://<your-pipeline-endpoint>:4318/v1/traces",
+    headers={"Authorization": "Bearer YOUR_TOKEN"}
+)
+
+# Set up tracing with batch processing
+provider = TracerProvider(resource=resource)
+processor = BatchSpanProcessor(exporter)
+provider.add_span_processor(processor)
+trace.set_tracer_provider(provider)
+
+# Use the tracer with GenAI semantic conventions
+tracer = trace.get_tracer(__name__)
+with tracer.start_as_current_span(
+    "invoke_agent my-assistant",
+    kind=trace.SpanKind.INTERNAL
+) as span:
+    # Set GenAI semantic convention attributes
span.set_attribute("gen_ai.operation.name", "invoke_agent") + span.set_attribute("gen_ai.agent.name", "my-assistant") + span.set_attribute("gen_ai.provider.name", "openai") + span.set_attribute("gen_ai.request.model", "gpt-4") + + # Your agent logic here + result = process_request() + + # Set token usage if available + span.set_attribute("gen_ai.usage.input_tokens", 150) + span.set_attribute("gen_ai.usage.output_tokens", 75) +---- + +.Node.js example for OTLP HTTP exporter +[,javascript] +---- +const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node'); +const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http'); +const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base'); +const { Resource } = require('@opentelemetry/resources'); +const { trace, SpanKind } = require('@opentelemetry/api'); + +// Configure resource +const resource = new Resource({ + 'service.name': 'my-custom-agent', + 'service.version': '1.0.0' +}); + +// Configure OTLP HTTP exporter +const exporter = new OTLPTraceExporter({ + url: 'https://your-pipeline-endpoint.redpanda.cloud:4318/v1/traces', + headers: { + 'Authorization': 'Bearer YOUR_TOKEN' + } +}); + +// Set up provider +const provider = new NodeTracerProvider({ resource }); +provider.addSpanProcessor(new BatchSpanProcessor(exporter)); +provider.register(); + +// Use the tracer with GenAI semantic conventions +const tracer = trace.getTracer('my-agent'); +const span = tracer.startSpan('invoke_agent my-assistant', { + kind: SpanKind.INTERNAL +}); + +// Set GenAI semantic convention attributes +span.setAttribute('gen_ai.operation.name', 'invoke_agent'); +span.setAttribute('gen_ai.agent.name', 'my-assistant'); +span.setAttribute('gen_ai.provider.name', 'openai'); +span.setAttribute('gen_ai.request.model', 'gpt-4'); + +// Your agent logic +processRequest().then(result => { + // Set token usage if available + span.setAttribute('gen_ai.usage.input_tokens', 150); + 
span.setAttribute('gen_ai.usage.output_tokens', 75); + span.end(); +}); +---- + +TIP: Use environment variables for the endpoint URL and authentication token to keep credentials out of your code. + +=== Use recommended semantic conventions + +The Transcripts view recognizes https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry semantic conventions for GenAI operations^]. Following these conventions ensures your traces display correctly with proper attribution, token usage, and operation identification. + +==== Required attributes for agent operations + +Following the OpenTelemetry semantic conventions, agent spans should include these attributes: + +* Operation identification: +** `gen_ai.operation.name` - Set to `"invoke_agent"` for agent execution spans +** `gen_ai.agent.name` - Human-readable name of your agent (displayed in Transcripts view) +* LLM provider details: +** `gen_ai.provider.name` - LLM provider identifier (e.g., `"openai"`, `"anthropic"`, `"gcp.vertex_ai"`) +** `gen_ai.request.model` - Model name (e.g., `"gpt-4"`, `"claude-sonnet-4"`) +* Token usage (for cost tracking): +** `gen_ai.usage.input_tokens` - Number of input tokens consumed +** `gen_ai.usage.output_tokens` - Number of output tokens generated +* Session correlation: +** `gen_ai.conversation.id` - Identifier linking related agent invocations in the same conversation + +==== Example with semantic conventions + +.Python example with GenAI semantic conventions +[,python] +---- +from opentelemetry import trace + +tracer = trace.get_tracer(__name__) + +# Create an agent invocation span +with tracer.start_as_current_span( + "invoke_agent my-assistant", + kind=trace.SpanKind.INTERNAL +) as span: + # Set required attributes + span.set_attribute("gen_ai.operation.name", "invoke_agent") + span.set_attribute("gen_ai.agent.name", "my-assistant") + span.set_attribute("gen_ai.provider.name", "openai") + span.set_attribute("gen_ai.request.model", "gpt-4") + 
    span.set_attribute("gen_ai.conversation.id", "session-abc-123")
+
+    # Your agent logic here
+    response = process_agent_request(user_input)
+
+    # Set token usage after completion
+    span.set_attribute("gen_ai.usage.input_tokens", response.usage.input_tokens)
+    span.set_attribute("gen_ai.usage.output_tokens", response.usage.output_tokens)
+----
+
+.Node.js example with GenAI semantic conventions
+[,javascript]
+----
+const { trace, SpanKind } = require('@opentelemetry/api');
+
+const tracer = trace.getTracer('my-agent');
+
+const span = tracer.startSpan('invoke_agent my-assistant', {
+  kind: SpanKind.INTERNAL
+});
+
+// Set required attributes
+span.setAttribute('gen_ai.operation.name', 'invoke_agent');
+span.setAttribute('gen_ai.agent.name', 'my-assistant');
+span.setAttribute('gen_ai.provider.name', 'openai');
+span.setAttribute('gen_ai.request.model', 'gpt-4');
+span.setAttribute('gen_ai.conversation.id', 'session-abc-123');
+
+// Your agent logic
+processAgentRequest(userInput).then((response) => {
+  // Set token usage
+  span.setAttribute('gen_ai.usage.input_tokens', response.usage.inputTokens);
+  span.setAttribute('gen_ai.usage.output_tokens', response.usage.outputTokens);
+  span.end();
+});
+----
+
+=== Validate trace format
+
+Before deploying to production, verify your traces match the expected format.
+
+////
+
+* How to validate trace format against schema
+* Common format issues and solutions
+* Tools for format validation
+====
+
+////
+
+Test your agent locally and inspect the traces it produces. For example, assuming a `test-span.json` file that contains an OTLP/JSON payload, send it to your pipeline with `curl` and confirm that the endpoint returns a success response:
+
+[,bash]
+----
+# Send a test span to the ingestion endpoint; replace the endpoint and token
+curl -sS -X POST "https://<your-pipeline-endpoint>:4318/v1/traces" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer <your-token>" \
+  --data @test-span.json
+----
+
+== Verify trace ingestion
+
+After deploying your pipeline and configuring your custom agent, verify traces are flowing correctly.
+
+=== Consume traces from the topic
+
+Check that traces are being published to the `redpanda.otel_traces` topic:
+
+[,bash]
+----
+rpk topic consume redpanda.otel_traces --offset end -n 10
+----
+
+You can also view the `redpanda.otel_traces` topic in the *Topics* page of the Redpanda Cloud UI.
+
+Look for spans with your custom `instrumentationScope.name` to identify traces from your agent.
+
+=== View traces in Transcripts
+
+After your custom agent sends traces through the pipeline, they appear in your cluster's *Agentic AI > Transcripts* view alongside traces from Remote MCP servers and declarative agents.
+
+==== Identify custom agent transcripts
+
+Custom agent transcripts are identified by the `service.name` resource attribute, which differs from Redpanda's built-in services (`ai-agent` for declarative agents, `mcp-{server-id}` for MCP servers). See xref:ai-agents:observability/concepts.adoc#cross-service-transcripts[Cross-service transcripts] to understand how the `service.name` attribute identifies transcript sources.
+
+Your custom agent transcripts display with:
+
+* **Service name** in the service filter dropdown (from your `service.name` resource attribute)
+* **Agent name** in span details (from the `gen_ai.agent.name` attribute)
+* **Operation names** like `"invoke_agent my-assistant"` indicating agent executions
+
+For detailed instructions on filtering, searching, and navigating transcripts in the UI, see xref:ai-agents:observability/view-transcripts.adoc[View Transcripts].
+
+==== Token usage tracking
+
+If your spans include the recommended token usage attributes (`gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens`), they display in the summary panel's token usage section. This enables cost tracking alongside Remote MCP server and declarative agent transcripts.
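As an illustration of how these attributes support cost tracking, a rough per-transcript estimate can be computed directly from them. The per-token prices below are hypothetical; substitute your provider's actual rates:

[,python]
----
# Hypothetical per-token prices in USD; replace with your provider's rates.
INPUT_PRICE = 0.000002
OUTPUT_PRICE = 0.000006

def span_cost(attributes):
    """Estimate LLM cost for one span from its GenAI usage attributes."""
    input_tokens = attributes.get("gen_ai.usage.input_tokens", 0)
    output_tokens = attributes.get("gen_ai.usage.output_tokens", 0)
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Usage attributes collected from two LLM spans in a transcript
spans = [
    {"gen_ai.usage.input_tokens": 150, "gen_ai.usage.output_tokens": 75},
    {"gen_ai.usage.input_tokens": 2000, "gen_ai.usage.output_tokens": 400},
]
total = sum(span_cost(s) for s in spans)
----

Spans that omit the usage attributes contribute zero, so the estimate is a lower bound when instrumentation is incomplete.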
+
+== Troubleshooting
+
+////
+* Common issues and solutions
+* How to monitor pipeline health
+* Log locations and debugging techniques
+* Failure modes and diagnostics
+
+////
+
+=== Pipeline not receiving requests
+
+If your custom agent cannot reach the ingestion endpoint:
+
+. Verify the endpoint URL includes the correct port and path:
+** HTTP: `https://your-endpoint:4318/v1/traces`
+** gRPC: `https://your-endpoint:4317`
+. Check network connectivity and firewall rules.
+. Ensure authentication tokens are valid and properly formatted in the `Authorization: Bearer <token>` header (HTTP) or `authorization` metadata field (gRPC).
+. Verify the `Content-Type` header matches your data format (`application/x-protobuf` or `application/json`).
+. Review pipeline logs for connection errors or authentication failures.
+
+=== Traces not appearing in topic
+
+If requests succeed but traces do not appear in `redpanda.otel_traces`:
+
+. Check the pipeline output configuration.
+. Verify topic permissions.
+. Validate that the trace format matches the OTLP specification.
+
+== Limitations
+
+* The `otlp_http` and `otlp_grpc` inputs accept only traces, logs, and metrics, not profiles.
+* Only traces are published to the `redpanda.otel_traces` topic.
+* Requests that exceed rate limits receive HTTP 429 (HTTP) or a `ResourceExhausted` status (gRPC).
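If your exporter does not already retry throttled exports, simple client-side backoff keeps trace delivery resilient to rate-limit responses. The sketch below uses a stand-in `send` callable rather than a real exporter, so the retry logic is self-contained and easy to test:

[,python]
----
import time

def send_with_backoff(send, max_retries=5, base_delay=0.5):
    """Retry a trace export while the endpoint returns HTTP 429.

    `send` is any callable returning an HTTP status code. It is a
    hypothetical stand-in for your exporter's export call, not part
    of the OpenTelemetry SDK.
    """
    for attempt in range(max_retries):
        status = send()
        if status != 429:
            return status
        # Exponential backoff before retrying the rate-limited request
        time.sleep(base_delay * (2 ** attempt))
    return 429

# Simulated endpoint: throttled twice, then the batch is accepted
responses = iter([429, 429, 200])
result = send_with_backoff(lambda: next(responses), base_delay=0.0)
----

For gRPC, the equivalent trigger is a `ResourceExhausted` status rather than a 429 response code.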
+ +== Next steps + +* xref:ai-agents:observability/view-transcripts.adoc[] +* xref:ai-agents:agents/monitor-agents.adoc[Observability for declarative agents] +* https://docs.redpanda.com/redpanda-connect/components/inputs/otlp_http/[OTLP HTTP input reference^] - Complete configuration options for the `otlp_http` component +* https://docs.redpanda.com/redpanda-connect/components/inputs/otlp_grpc/[OTLP gRPC input reference^] - Alternative gRPC-based trace ingestion \ No newline at end of file diff --git a/modules/ai-agents/pages/observability/view-transcripts.adoc b/modules/ai-agents/pages/observability/view-transcripts.adoc new file mode 100644 index 000000000..5d3506db6 --- /dev/null +++ b/modules/ai-agents/pages/observability/view-transcripts.adoc @@ -0,0 +1,104 @@ += View Transcripts +:description: Learn how to filter and navigate the Transcripts interface to investigate agent execution traces using multiple detail views and interactive timeline navigation. +:page-topic-type: how-to +:personas: agent_developer, platform_admin +:learning-objective-1: Filter transcripts to find specific execution traces +:learning-objective-2: Navigate between detail views to inspect span information at different levels +:learning-objective-3: Use the timeline interactively to navigate to specific time periods + +The Transcripts view provides filtering and navigation capabilities for investigating agent, MCP server, and AI Gateway execution glossterm:transcript[transcripts]. Use this view to quickly locate specific operations, analyze performance patterns, and debug issues across glossterm:tool[] invocations, LLM calls, and glossterm:agent[] reasoning steps. 
+
+After reading this page, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+For basic orientation on monitoring each Redpanda Agentic Data Plane component, see:
+
+* xref:ai-agents:ai-gateway/observability-metrics.adoc[]
+* xref:ai-agents:agents/monitor-agents.adoc[]
+* xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[]
+
+For conceptual background on what transcripts capture and how glossterm:span[spans] are organized hierarchically, see xref:ai-agents:observability/concepts.adoc[].
+
+== Prerequisites
+
+* xref:ai-agents:agents/create-agent.adoc[Running agent] or xref:ai-agents:mcp/remote/quickstart.adoc[MCP server] with at least one execution
+* Access to the Transcripts view (requires appropriate permissions to read the `redpanda.otel_traces` topic)
+
+== Navigate the Transcripts interface
+
+=== Use the interactive timeline
+
+Use the timeline visualization to quickly identify when errors began or patterns changed, and navigate directly to transcripts from particular timestamps.
+
+When viewing time periods with many transcripts (hundreds or thousands), the timeline displays a subset of the data to maintain performance and usability. The timeline bar indicates the actual time range of currently visible data, which may be narrower than your <<adjust-time-range,selected time range>>.
+
+TIP: See xref:ai-agents:agents/monitor-agents.adoc[] and xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] to learn basic execution patterns and health indicators to investigate.
+
+=== Filter transcripts
+
+Use filters to narrow down transcripts and quickly locate specific executions. When you use any of the filters, the transcript list updates to show only matching results. You can toggle *Full transcript* on to see the complete execution context, in grayed-out text, for the filtered transcripts.
+ +==== Filter by attribute + +// Add details when available + +==== Adjust time range + +Use the time range selector to focus on specific time periods (from the last five minutes up to the last 24 hours): + +* View recent executions (for example, over the last hour) to monitor real-time activity +* Expand to longer periods for trend analysis over the last day +* Narrow to specific time windows when investigating issues that occurred at known times + +== Inspect span details + +Each row in the transcript table represents a high-level agent or MCP server request flow. Expand each parent glossterm:span[] to see the xref:ai-agents:observability/concepts.adoc#agent-transcript-hierarchy[hierarchical structure] of nested operations, including tool calls, LLM interactions, and internal processing steps. Parent-child spans show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). + +When agents invoke remote MCP servers, transcripts fold together under a tree structure to provide a unified view of the complete operation across service boundaries. The glossterm:trace ID[] originates at the initial request touchpoint and propagates across all involved services, linking spans from both the agent and MCP server under a single transcript. Use the tree view to follow the trace flow across multiple services and understand the complete request lifecycle. + +If you use external agents that directly invoke MCP servers in the Redpanda Agentic Data Plane, you may only see MCP-level parent transcripts, unless you have configured the agents to also emit traces to the Redpanda glossterm:OpenTelemetry[OTEL] ingestion pipeline. 
+ +Selected spans display detailed information at multiple levels, from high-level summaries to complete raw data: + +* Start with summary view for quick assessment +* Inspect attributes for detailed investigation +* Use raw data when you need complete information + +=== Summary view + +The summary panel provides high-level span information: + +* Total nested operations (span count) and execution time +* Token usage for LLM operations +* Counts of LLM calls and tool calls + +Click on an individual span to drill down into the execution context: + +* View the full conversation history saved for that session, including user prompts, configured xref:ai-agents:agents/create-agent.adoc#write-the-system-prompt[system prompts] to guide agent behavior, and LLM outputs +* Inspect individual tool calls made by the agent and any of its sub-agents, including request arguments and responses + +TIP: Expand the summary panel to full view to easily read long conversations. + +=== Detailed attributes view + +The attributes view shows structured metadata for each transcript span. Use this view to inspect span attributes and understand the context of each operation. See xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[Transcripts and AI Observability] for details on standard attributes by instrumentation layer. + +=== Raw data view + +The raw data view provides the complete span structure: + +* Full OpenTelemetry span in JSON format +* All fields including those not displayed in summary or attributes views +* Structured data suitable for export or programmatic access + +You can also view the raw transcript data in the `redpanda.otel_traces` topic. 
+ +== Next steps + +* xref:ai-agents:agents/monitor-agents.adoc[] +* xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] +* xref:ai-agents:observability/concepts.adoc[] +* xref:ai-agents:agents/troubleshooting.adoc[] \ No newline at end of file From 5dc154612319298de5d02939324c7f79c3af6528 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 3 Feb 2026 19:54:54 -0800 Subject: [PATCH 03/19] Change filename and update with UI changes --- .../pages/observability/transcripts.adoc | 190 ++++++++++++++++++ .../pages/observability/view-transcripts.adoc | 104 ---------- 2 files changed, 190 insertions(+), 104 deletions(-) create mode 100644 modules/ai-agents/pages/observability/transcripts.adoc delete mode 100644 modules/ai-agents/pages/observability/view-transcripts.adoc diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc new file mode 100644 index 000000000..8b3d66add --- /dev/null +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -0,0 +1,190 @@ += View Transcripts +:description: Learn how to filter and navigate the Transcripts interface to investigate agent execution traces using multiple detail views and interactive timeline navigation. +:page-topic-type: how-to +:personas: agent_developer, platform_admin +:learning-objective-1: Filter transcripts to find specific execution traces +:learning-objective-2: Use the timeline interactively to navigate to specific time periods +:learning-objective-3: Navigate between detail views to inspect span information at different levels + +The Transcripts view provides filtering and navigation capabilities for investigating agent, MCP server, and AI Gateway execution glossterm:transcript[transcripts]. Use this view to quickly locate specific operations, analyze performance patterns, and debug issues across glossterm:tool[] invocations, LLM calls, and glossterm:agent[] reasoning steps. 
+ +After reading this page, you will be able to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} +* [ ] {learning-objective-3} + +For basic orientation on monitoring each Redpanda Agentic Data Plane component, see: + +* xref:ai-agents:ai-gateway/observability-metrics.adoc[] +* xref:ai-agents:agents/monitor-agents.adoc[] +* xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] + +For conceptual background on what transcripts capture and how glossterm:span[spans] are organized hierarchically, see xref:ai-agents:observability/concepts.adoc[]. + +== Prerequisites + +* xref:ai-agents:agents/create-agent.adoc[Running agent] or xref:ai-agents:mcp/remote/quickstart.adoc[MCP server] with at least one execution +* Access to the Transcripts view (requires appropriate permissions to read the `redpanda.otel_traces` topic) + +== Navigate the Transcripts interface + +=== Filter transcripts + +Use filters to narrow down transcripts and quickly locate specific executions. When you use any of the filters, the transcript list updates to show only matching results. + +The Transcripts view provides several quick-filter buttons to focus on specific operation types: + +* *Service*: Isolate operations from a particular component in your agentic data plane (agents, MCP servers, or AI Gateway) +* *LLM Calls*: Spans representing large language model (LLM) invocations, including chat completions and embeddings +* *Tool Calls*: Spans where tools were executed by agents +* *Agent Spans*: Agent invocation and reasoning spans +* *Errors Only*: Transcripts containing failed operations or errors +* *Slow (>5s)*: Operations that exceeded five seconds in duration, useful for performance investigation + +You can combine multiple filters to narrow results further. For example, use *Tool Calls* and *Errors Only* together to investigate failed tool executions. + +Toggle *Full traces* on to see the complete execution context, in grayed-out text, for the filtered transcripts. 
+
+==== Filter by attribute
+
+Click the *Attribute* button to query exact matches on specific span metadata such as the following:
+
+* Agent names
+* LLM model names, for example, `gemini-3-flash-preview`
+* Tool names
+* Span and trace IDs
+
+You can add multiple attribute filters to refine results.
+
+==== Adjust time range
+
+Use the time range selector to focus on specific time periods (from the last five minutes up to the last 24 hours):
+
+* View recent executions, for example, over the last hour, to monitor real-time activity
+* Expand to longer periods for trend analysis over the last day
+
+=== Use the interactive timeline
+
+Use the timeline visualization to quickly identify when errors began or patterns changed, and navigate directly to transcripts from specific time windows when investigating issues that occurred at known times.
+
+The timeline displays transcript volume as a bar chart. Each bar represents a time bucket that recalibrates dynamically based on your <<adjust-time-range,selected time range>>, with color-coded indicators:
+
+* Green: Successful operations
+* Red: Operations with errors
+
+Click on any bar in the timeline to zoom into transcripts from that specific time period. The transcript table automatically scrolls to show operations from the time bucket in view.
+
+[NOTE]
+====
+When viewing time ranges with many transcripts (hundreds or thousands), the table displays a subset of the data to maintain performance and usability. The timeline bar indicates the actual time range of currently loaded data, which may be narrower than your selected time range.
+
+Refer to the timeline header to check the exact range and count of visible transcripts, for example, "Showing 100 of 299 transcripts from 13:17 to 15:16".
+====
+
+== Inspect span details
+
+Each row in the transcript table represents a high-level agent or MCP server request flow.
The table displays the following: + +* Time: Timestamp when the span started (sortable) +* Span: Span type indicator and span name, with hierarchical tree structure +* Duration: Total duration, or duration of child spans relative to the parent span, represented as visual bars + +Expand each parent span to see the xref:ai-agents:observability/concepts.adoc#agent-transcript-hierarchy[hierarchical structure] of nested operations, including internal processing steps, LLM interactions, and tool calls. Parent-child spans show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). Use the *Collapse all* option to quickly fold all expanded spans. + +// TODO: Clarify MCP trace structure +When agents invoke remote MCP servers, transcripts fold together under a tree structure to provide a unified view of the complete operation across service boundaries. The glossterm:trace ID[] originates at the initial request touchpoint and propagates across all involved services, linking spans from both the agent and MCP server under a single transcript. Use the tree view to follow the trace flow across multiple services and understand the complete request lifecycle. + +// TODO: Confirm how transcripts from external agents appear +If you use external agents that directly invoke MCP servers in the Redpanda Agentic Data Plane, you may only see MCP-level parent transcripts, unless you have configured the agents to also emit traces to the Redpanda glossterm:OpenTelemetry[OTEL] ingestion pipeline. 
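The cross-service folding described above depends on trace context propagation. Over HTTP, OpenTelemetry SDKs propagate context in the W3C `traceparent` header by default, which carries the shared trace ID from the agent to the MCP server. The following Python sketch, using an illustrative header value, shows the fields involved:

[,python]
----
def parse_traceparent(header):
    """Split a W3C traceparent header into its four fields.

    Format: version-traceid-spanid-flags, all lowercase hex.
    """
    version, trace_id, parent_span_id, flags = header.split("-")
    return {
        "version": version,
        "trace_id": trace_id,  # shared by every span in the transcript
        "parent_span_id": parent_span_id,
        "sampled": int(flags, 16) & 0x1 == 1,
    }

# Illustrative header an agent could attach to an outgoing MCP request
ctx = parse_traceparent("00-5b8efff798038103d269b633813fc60c-eee19b7ec3c1b174-01")
----

Because every downstream service reuses the same `trace_id` while creating new span IDs, all spans for one request fold under a single transcript in the tree view.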
+ +// TODO: Confirm how gateway traces appear + +Selected spans display detailed information at multiple levels, from high-level summaries to complete raw data: + +* Start with summary tab for quick assessment +* Inspect attributes for detailed investigation using structured metadata +* Use raw data when you need complete information + +[NOTE] +==== +Rows labeled "awaiting root — waiting for parent span" indicate incomplete transcripts where child spans have been received but the parent span is missing or hasn't arrived yet. This can occur due to network latency between services, processing delays in the OpenTelemetry pipeline, or lost parent spans from service failures. +If you consistently see awaiting root entries, this suggests instrumentation or trace collection issues that should be investigated. +==== + +=== Summary tab + +Click on any span in the transcript table to open the detail panel on the right side of the interface. The first tab displays a context-specific summary based on the span type (for example, *Tool Call* for tool execution spans). + +For tool call spans, the detail panel shows: + +* *Description*: The purpose and context of the tool call +* *Arguments*: JSON showing the parameters passed to the tool +* *Response*: JSON showing the tool's output or result + +The summary panel for other span types provides high-level information such as: + +* Total nested operations (span count) and execution time +* Token usage for LLM operations +* Counts of LLM calls and tool calls +* Full conversation history for agent spans, including user prompts, configured xref:ai-agents:agents/create-agent.adoc#write-the-system-prompt[system prompts], and LLM outputs + +TIP: Expand the summary panel view to easily read long conversations and complex JSON structures. + +=== Attributes tab + +The attributes view shows structured metadata for each transcript span. Use this view to inspect span attributes and understand the context of each operation. 
See xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[Transcripts and AI Observability] for details on standard attributes by instrumentation layer. + +=== Raw data tab + +The raw data view provides the complete span structure: + +* Full OpenTelemetry span in JSON format +* All fields including those not displayed in summary or attributes views +* Structured data suitable for export or programmatic access + +You can also view the raw transcript data in the `redpanda.otel_traces` topic. + +== Investigate and analyze operations + +The following patterns demonstrate how to use the Transcripts view for understanding and troubleshooting your agentic systems. + +=== Debug errors + +. Use *Errors Only* to filter for failed operations, or review the timeline to identify and zoom in to when errors began occurring. +. Expand error spans to examine the failure context. +. Check preceding tool call arguments and LLM responses for root cause. + +=== Investigate performance issues + +. Use the *Slow (>5s)* filter to identify operations with high latency. +. Expand slow spans to identify bottlenecks in the execution tree. +. Compare duration bars across similar operations to spot anomalies. + +=== Analyze tool usage + +. Apply the *Tool Calls* filter and optionally use the *Attribute* filter to focus on a specific tool. +. Review tool execution frequency in the timeline. +. Click individual tool call spans to inspect arguments and responses. +.. Check the Description field to understand tool invocation context. +.. Use the Arguments field to verify correct parameter passing. + +=== Monitor LLM interactions + +. Click *LLM Calls* to focus on model invocations and optionally filter by model name and provider using the *Attribute* filter. +. Review token usage patterns across different time periods. +. Examine conversation history to understand model behavior. +. Spot unexpected model calls or token consumption spikes. + +=== Trace multi-service operations + +. 
Locate the parent agent or gateway span in the transcript table. +. Use the *Attribute* filter to follow the trace ID through agent and MCP server boundaries. +. Expand the transcript tree to reveal child spans across services. +. Review durations to understand where latency occurs in distributed calls. + +== Next steps + +* xref:ai-agents:agents/monitor-agents.adoc[] +* xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] +* xref:ai-agents:agents/troubleshooting.adoc[] \ No newline at end of file diff --git a/modules/ai-agents/pages/observability/view-transcripts.adoc b/modules/ai-agents/pages/observability/view-transcripts.adoc deleted file mode 100644 index 5d3506db6..000000000 --- a/modules/ai-agents/pages/observability/view-transcripts.adoc +++ /dev/null @@ -1,104 +0,0 @@ -= View Transcripts -:description: Learn how to filter and navigate the Transcripts interface to investigate agent execution traces using multiple detail views and interactive timeline navigation. -:page-topic-type: how-to -:personas: agent_developer, platform_admin -:learning-objective-1: Filter transcripts to find specific execution traces -:learning-objective-2: Navigate between detail views to inspect span information at different levels -:learning-objective-3: Use the timeline interactively to navigate to specific time periods - -The Transcripts view provides filtering and navigation capabilities for investigating agent, MCP server, and AI Gateway execution glossterm:transcript[transcripts]. Use this view to quickly locate specific operations, analyze performance patterns, and debug issues across glossterm:tool[] invocations, LLM calls, and glossterm:agent[] reasoning steps. 
- -After reading this page, you will be able to: - -* [ ] {learning-objective-1} -* [ ] {learning-objective-2} -* [ ] {learning-objective-3} - -For basic orientation on monitoring each Redpanda Agentic Data Plane component, see: - -* xref:ai-agents:ai-gateway/observability-metrics.adoc[] -* xref:ai-agents:agents/monitor-agents.adoc[] -* xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] - -For conceptual background on what transcripts capture and how glossterm:span[spans] are organized hierarchically, see xref:ai-agents:observability/concepts.adoc[]. - -== Prerequisites - -* xref:ai-agents:agents/create-agent.adoc[Running agent] or xref:ai-agents:mcp/remote/quickstart.adoc[MCP server] with at least one execution -* Access to the Transcripts view (requires appropriate permissions to read the `redpanda.otel_traces` topic) - -== Navigate the Transcripts interface - -=== Use the interactive timeline - -Use the timeline visualization to quickly identify when errors began or patterns changed, and navigate directly to transcripts from particular timestamps. - -When viewing time periods with many transcripts (hundreds or thousands), the timeline displays a subset of the data to maintain performance and usability. The timeline bar indicates the actual time range of currently visible data, which may be narrower than your <>. - -TIP: See xref:ai-agents:agents/monitor-agents.adoc[] and xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] to learn basic execution patterns and health indicators to investigate. - -=== Filter transcripts - -Use filters to narrow down transcripts and quickly locate specific executions. When you use any of the filters, the transcript list updates to show only matching results. You can toggle *Full transcript* on to see the complete execution context, in grayed-out text, for the filtered transcripts. 
- -==== Filter by attribute - -// Add details when available - -==== Adjust time range - -Use the time range selector to focus on specific time periods (from the last five minutes up to the last 24 hours): - -* View recent executions (for example, over the last hour) to monitor real-time activity -* Expand to longer periods for trend analysis over the last day -* Narrow to specific time windows when investigating issues that occurred at known times - -== Inspect span details - -Each row in the transcript table represents a high-level agent or MCP server request flow. Expand each parent glossterm:span[] to see the xref:ai-agents:observability/concepts.adoc#agent-transcript-hierarchy[hierarchical structure] of nested operations, including tool calls, LLM interactions, and internal processing steps. Parent-child spans show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). - -When agents invoke remote MCP servers, transcripts fold together under a tree structure to provide a unified view of the complete operation across service boundaries. The glossterm:trace ID[] originates at the initial request touchpoint and propagates across all involved services, linking spans from both the agent and MCP server under a single transcript. Use the tree view to follow the trace flow across multiple services and understand the complete request lifecycle. - -If you use external agents that directly invoke MCP servers in the Redpanda Agentic Data Plane, you may only see MCP-level parent transcripts, unless you have configured the agents to also emit traces to the Redpanda glossterm:OpenTelemetry[OTEL] ingestion pipeline. 
- -Selected spans display detailed information at multiple levels, from high-level summaries to complete raw data: - -* Start with summary view for quick assessment -* Inspect attributes for detailed investigation -* Use raw data when you need complete information - -=== Summary view - -The summary panel provides high-level span information: - -* Total nested operations (span count) and execution time -* Token usage for LLM operations -* Counts of LLM calls and tool calls - -Click on an individual span to drill down into the execution context: - -* View the full conversation history saved for that session, including user prompts, configured xref:ai-agents:agents/create-agent.adoc#write-the-system-prompt[system prompts] to guide agent behavior, and LLM outputs -* Inspect individual tool calls made by the agent and any of its sub-agents, including request arguments and responses - -TIP: Expand the summary panel to full view to easily read long conversations. - -=== Detailed attributes view - -The attributes view shows structured metadata for each transcript span. Use this view to inspect span attributes and understand the context of each operation. See xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[Transcripts and AI Observability] for details on standard attributes by instrumentation layer. - -=== Raw data view - -The raw data view provides the complete span structure: - -* Full OpenTelemetry span in JSON format -* All fields including those not displayed in summary or attributes views -* Structured data suitable for export or programmatic access - -You can also view the raw transcript data in the `redpanda.otel_traces` topic. 
- -== Next steps - -* xref:ai-agents:agents/monitor-agents.adoc[] -* xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] -* xref:ai-agents:observability/concepts.adoc[] -* xref:ai-agents:agents/troubleshooting.adoc[] \ No newline at end of file From c96ec25c29568a087211b008ca8fea17dd8508fa Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 3 Feb 2026 19:55:22 -0800 Subject: [PATCH 04/19] Update nav tree with correct filename --- modules/ROOT/nav.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 93a521c7a..2745814e6 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -44,7 +44,7 @@ **** xref:ai-agents:mcp/local/configuration.adoc[Configure] ** xref:ai-agents:observability/index.adoc[Transcripts] *** xref:ai-agents:observability/concepts.adoc[Concepts] -** xref:ai-agents:observability/view-transcripts.adoc[View Transcripts] +** xref:ai-agents:observability/transcripts.adoc[View Transcripts] ** xref:ai-agents:observability/ingest-custom-traces.adoc[Ingest Traces from Custom Agents] * xref:develop:connect/about.adoc[Redpanda Connect] From 43557cbdadc48838a28bcf0955c64bc60747d682 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 3 Feb 2026 20:20:29 -0800 Subject: [PATCH 05/19] Fix nav tree and add xrefs --- modules/ROOT/nav.adoc | 4 ++-- modules/ai-agents/pages/observability/transcripts.adoc | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 2745814e6..9f381d3b6 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -44,8 +44,8 @@ **** xref:ai-agents:mcp/local/configuration.adoc[Configure] ** xref:ai-agents:observability/index.adoc[Transcripts] *** xref:ai-agents:observability/concepts.adoc[Concepts] -** xref:ai-agents:observability/transcripts.adoc[View Transcripts] -** xref:ai-agents:observability/ingest-custom-traces.adoc[Ingest Traces from Custom Agents] +*** 
xref:ai-agents:observability/transcripts.adoc[View Transcripts] +*** xref:ai-agents:observability/ingest-custom-traces.adoc[Ingest Traces from Custom Agents] * xref:develop:connect/about.adoc[Redpanda Connect] ** xref:develop:connect/connect-quickstart.adoc[Quickstart] diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index 8b3d66add..74467546a 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -90,7 +90,7 @@ Each row in the transcript table represents a high-level agent or MCP server req * Span: Span type indicator and span name, with hierarchical tree structure * Duration: Total duration, or duration of child spans relative to the parent span, represented as visual bars -Expand each parent span to see the xref:ai-agents:observability/concepts.adoc#agent-transcript-hierarchy[hierarchical structure] of nested operations, including internal processing steps, LLM interactions, and tool calls. Parent-child spans show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). Use the *Collapse all* option to quickly fold all expanded spans. +Expand each parent span to see the hierarchical structure of nested operations, including internal processing steps, LLM interactions, and tool calls. xref:ai-agents/observability/concepts.adoc#parent-child-relationships[Parent-child spans] show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). Use the *Collapse all* option to quickly fold all expanded spans. // TODO: Clarify MCP trace structure When agents invoke remote MCP servers, transcripts fold together under a tree structure to provide a unified view of the complete operation across service boundaries. 
The glossterm:trace ID[] originates at the initial request touchpoint and propagates across all involved services, linking spans from both the agent and MCP server under a single transcript. Use the tree view to follow the trace flow across multiple services and understand the complete request lifecycle. @@ -114,9 +114,9 @@ If you consistently see awaiting root entries, this suggests instrumentation or === Summary tab -Click on any span in the transcript table to open the detail panel on the right side of the interface. The first tab displays a context-specific summary based on the span type (for example, *Tool Call* for tool execution spans). +Click on any span in the transcript table to open the detail panel on the right side of the interface. The first tab displays a context-specific summary based on the span type. -For tool call spans, the detail panel shows: +For example, for tool call spans, the summary shows: * *Description*: The purpose and context of the tool call * *Arguments*: JSON showing the parameters passed to the tool From 5d4f2dcc9413d2f25108948c6c90b8db4cb4c717 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 3 Feb 2026 20:21:43 -0800 Subject: [PATCH 06/19] Minor edits to concepts --- modules/ai-agents/pages/observability/concepts.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc index 5b4d5f08c..b8be057c7 100644 --- a/modules/ai-agents/pages/observability/concepts.adoc +++ b/modules/ai-agents/pages/observability/concepts.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Interpret transcript structure for debugging and monitoring :learning-objective-3: Distinguish between transcripts and audit logs -Redpanda automatically captures execution transcripts for both AI agents and MCP servers, providing complete observability into how your agentic systems operate. 
+Redpanda automatically captures transcripts (also referred to as execution logs or traces) for both AI agents and MCP servers, providing complete observability into how your agentic systems operate. After reading this page, you will be able to: @@ -330,6 +330,8 @@ Transcripts may contain sensitive information from your tool inputs and outputs. == Transcripts compared to audit logs +// TODO: Ask SME to review and confirm whether we want to rephrase or change +// "not designed for audit logging or compliance" Transcripts are designed for observability and debugging, not audit logging or compliance. Transcripts provide: From f3bd1bca4af18caccc80e66c88d650e72754c2da Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 3 Feb 2026 20:41:47 -0800 Subject: [PATCH 07/19] Minor edit --- .../ai-agents/pages/observability/transcripts.adoc | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index 74467546a..854046653 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -20,7 +20,7 @@ For basic orientation on monitoring each Redpanda Agentic Data Plane component, * xref:ai-agents:agents/monitor-agents.adoc[] * xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] -For conceptual background on what transcripts capture and how glossterm:span[spans] are organized hierarchically, see xref:ai-agents:observability/concepts.adoc[]. +For conceptual background on what transcripts capture, glossterm:span[] types, and how they are organized hierarchically, see xref:ai-agents:observability/concepts.adoc[]. == Prerequisites @@ -33,14 +33,14 @@ For conceptual background on what transcripts capture and how glossterm:span[spa Use filters to narrow down transcripts and quickly locate specific executions. 
When you use any of the filters, the transcript list updates to show only matching results. -The Transcripts view provides several quick-filter buttons to focus on specific operation types: +The Transcripts view provides several quick-filter buttons: * *Service*: Isolate operations from a particular component in your agentic data plane (agents, MCP servers, or AI Gateway) -* *LLM Calls*: Spans representing large language model (LLM) invocations, including chat completions and embeddings -* *Tool Calls*: Spans where tools were executed by agents -* *Agent Spans*: Agent invocation and reasoning spans -* *Errors Only*: Transcripts containing failed operations or errors -* *Slow (>5s)*: Operations that exceeded five seconds in duration, useful for performance investigation +* *LLM Calls*: Inspect large language model (LLM) invocations, including chat completions and embeddings +* *Tool Calls*: View tool executions by agents +* *Agent Spans*: Inspect agent invocation and reasoning +* *Errors Only*: Filter for failed operations or errors +* *Slow (>5s)*: Isolate operations that exceeded five seconds in duration, useful for performance investigation You can combine multiple filters to narrow results further. For example, use *Tool Calls* and *Errors Only* together to investigate failed tool executions. 
From a7d3e925de1b9d70ce864b77119364ae391de85f Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 3 Feb 2026 20:52:07 -0800 Subject: [PATCH 08/19] Fix build errors --- modules/ai-agents/pages/observability/concepts.adoc | 2 +- .../ai-agents/pages/observability/ingest-custom-traces.adoc | 4 ++-- modules/ai-agents/pages/observability/transcripts.adoc | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc index b8be057c7..b35ab3980 100644 --- a/modules/ai-agents/pages/observability/concepts.adoc +++ b/modules/ai-agents/pages/observability/concepts.adoc @@ -350,6 +350,6 @@ For compliance and audit requirements, use the session and task topics for agent == Next steps -* xref:ai-agents:observability/view-transcripts.adoc[] +* xref:ai-agents:observability/transcripts.adoc[] * xref:ai-agents:agents/monitor-agents.adoc[] * xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] \ No newline at end of file diff --git a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc index c66d4b617..ce8f11014 100644 --- a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc +++ b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc @@ -407,7 +407,7 @@ Your custom agent transcripts display with: * **Agent name** in span details (from the `gen_ai.agent.name` attribute) * **Operation names** like `"invoke_agent my-assistant"` indicating agent executions -For detailed instructions on filtering, searching, and navigating transcripts in the UI, see xref:ai-agents:observability/view-transcripts.adoc[View Transcripts]. +For detailed instructions on filtering, searching, and navigating transcripts in the UI, see xref:ai-agents:observability/transcripts.adoc[View Transcripts]. 
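These display fields come directly from the span name and GenAI attributes set by your instrumentation. The following sketch shows the minimal span shape the UI can render as an agent invocation; the agent and tracer names are illustrative, and the in-memory exporter stands in for the OTLP exporter configured earlier so the example is self-contained:

.Python example of GenAI span naming
[,python]
----
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# In-memory exporter keeps the example self-contained; in production,
# configure the OTLP exporter from the previous section instead.
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-agent")

# The span name and gen_ai.* attributes determine how the span is labeled.
with tracer.start_as_current_span("invoke_agent my-assistant") as span:
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    span.set_attribute("gen_ai.agent.name", "my-assistant")

spans = exporter.get_finished_spans()
----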
==== Token usage tracking @@ -451,7 +451,7 @@ If requests succeed but traces do not appear in `redpanda.otel_traces`: == Next steps -* xref:ai-agents:observability/view-transcripts.adoc[] +* xref:ai-agents:observability/transcripts.adoc[] * xref:ai-agents:agents/monitor-agents.adoc[Observability for declarative agents] * https://docs.redpanda.com/redpanda-connect/components/inputs/otlp_http/[OTLP HTTP input reference^] - Complete configuration options for the `otlp_http` component * https://docs.redpanda.com/redpanda-connect/components/inputs/otlp_grpc/[OTLP gRPC input reference^] - Alternative gRPC-based trace ingestion \ No newline at end of file diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index 854046653..9cf7f288e 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -90,7 +90,7 @@ Each row in the transcript table represents a high-level agent or MCP server req * Span: Span type indicator and span name, with hierarchical tree structure * Duration: Total duration, or duration of child spans relative to the parent span, represented as visual bars -Expand each parent span to see the hierarchical structure of nested operations, including internal processing steps, LLM interactions, and tool calls. xref:ai-agents/observability/concepts.adoc#parent-child-relationships[Parent-child spans] show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). Use the *Collapse all* option to quickly fold all expanded spans. +Expand each parent span to see the hierarchical structure of nested operations, including internal processing steps, LLM interactions, and tool calls. 
xref:ai-agents:observability/concepts.adoc#parent-child-relationships[Parent-child spans] show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). Use the *Collapse all* option to quickly fold all expanded spans. // TODO: Clarify MCP trace structure When agents invoke remote MCP servers, transcripts fold together under a tree structure to provide a unified view of the complete operation across service boundaries. The glossterm:trace ID[] originates at the initial request touchpoint and propagates across all involved services, linking spans from both the agent and MCP server under a single transcript. Use the tree view to follow the trace flow across multiple services and understand the complete request lifecycle. From c258c0cea90d13666948a7fbef54b9b75fc97b0c Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Tue, 3 Feb 2026 21:05:44 -0800 Subject: [PATCH 09/19] Minor edit --- modules/ai-agents/pages/observability/transcripts.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index 9cf7f288e..de536a2bc 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -14,7 +14,7 @@ After reading this page, you will be able to: * [ ] {learning-objective-2} * [ ] {learning-objective-3} -For basic orientation on monitoring each Redpanda Agentic Data Plane component, see: +For basic orientation on monitoring each Redpanda Agentic Data Plane (ADP) component, see: * xref:ai-agents:ai-gateway/observability-metrics.adoc[] * xref:ai-agents:agents/monitor-agents.adoc[] @@ -84,13 +84,13 @@ Refer to the timeline header to check the exact range and count of visible trans == Inspect span details -Each row in the transcript table represents a high-level agent or MCP server request flow. 
The table displays the following: +The transcript table displays the following: * Time: Timestamp when the span started (sortable) * Span: Span type indicator and span name, with hierarchical tree structure * Duration: Total duration, or duration of child spans relative to the parent span, represented as visual bars -Expand each parent span to see the hierarchical structure of nested operations, including internal processing steps, LLM interactions, and tool calls. xref:ai-agents:observability/concepts.adoc#parent-child-relationships[Parent-child spans] show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). Use the *Collapse all* option to quickly fold all expanded spans. +Each top-level row in the transcript table represents a service-level request flow in an ADP component. Expand each parent span to see the hierarchical structure of nested operations, including internal processing steps, LLM interactions, and tool calls. xref:ai-agents:observability/concepts.adoc#parent-child-relationships[Parent-child spans] show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). Use the *Collapse all* option to quickly fold all expanded spans. // TODO: Clarify MCP trace structure When agents invoke remote MCP servers, transcripts fold together under a tree structure to provide a unified view of the complete operation across service boundaries. The glossterm:trace ID[] originates at the initial request touchpoint and propagates across all involved services, linking spans from both the agent and MCP server under a single transcript. Use the tree view to follow the trace flow across multiple services and understand the complete request lifecycle. 
From 13c2b8bff1ed24664dcf7de6b0f53f76afeca582 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Johannes=20Br=C3=BCderl?= Date: Thu, 5 Feb 2026 16:06:11 +0100 Subject: [PATCH 10/19] Fix OTLP trace ingestion docs and add code examples (#502) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix endpoint URL format: use pipeline URL pattern instead of internal ports (4317/4318) - Simplify pipeline YAML config with Cloud-standard env vars - Add authentication section with link to Cloud API auth docs - Add Go examples for both HTTP and gRPC transports - Add gRPC examples for Python and Node.js - Remove internal implementation details users don't need Co-authored-by: Johannes Brüderl --- .../observability/ingest-custom-traces.adoc | 218 +++++++++++++++--- 1 file changed, 188 insertions(+), 30 deletions(-) diff --git a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc index ce8f11014..f6f2516b2 100644 --- a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc +++ b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc @@ -37,8 +37,8 @@ If you're using LangChain with OpenTelemetry tracing, you can send traces to Red export LANGSMITH_OTEL_ENABLED=true export LANGSMITH_TRACING=true -# Send traces to Redpanda Connect pipeline -export OTEL_EXPORTER_OTLP_ENDPOINT="https://:4318" +# Send traces to Redpanda Connect pipeline (use your pipeline URL) +export OTEL_EXPORTER_OTLP_ENDPOINT="https://.pipelines..clusters.rdpa.co" export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer " ---- @@ -110,27 +110,24 @@ The `otlp_http` input component: * Exposes an OpenTelemetry Collector HTTP receiver * Accepts traces at the standard `/v1/traces` endpoint -* Listens on port 4318 by default (standard OTLP/HTTP port) * Converts incoming OTLP data into individual Redpanda OTEL v1 Protobuf messages and publishes them to the `redpanda.otel_traces` topic -Create a 
file named `trace-ingestion.yaml`: +The following example shows a minimal pipeline configuration. Redpanda Cloud automatically injects authentication handling, so you don't need to configure `auth_token` in the input. [,yaml] ---- input: - otlp_http: - address: "0.0.0.0:4318" - auth_token: "${secrets.TRACE_AUTH_TOKEN}" - max_body_size: 4194304 # 4MB default - read_timeout: "10s" - write_timeout: "10s" + otlp_http: {} output: redpanda: - seed_brokers: ["${REDPANDA_BROKERS}"] + seed_brokers: + - "${PRIVATE_REDPANDA_BROKERS}" + tls: + enabled: ${PRIVATE_REDPANDA_TLS_ENABLED} + sasl: + - mechanism: "REDPANDA_CLOUD_SERVICE_ACCOUNT" topic: "redpanda.otel_traces" - compression: snappy - max_in_flight: 10 ---- The `otlp_http` input automatically handles format conversion, so no processors are needed for basic trace ingestion. Each span becomes a separate message in the `redpanda.otel_traces` topic. @@ -143,32 +140,46 @@ If your custom agent requires gRPC transport, use the `otlp_grpc` input instead: [,yaml] ---- input: - otlp_grpc: - address: "0.0.0.0:4317" # Standard OTLP/gRPC port - auth_token: "${secrets.TRACE_AUTH_TOKEN}" - max_recv_msg_size: 4194304 + otlp_grpc: {} output: redpanda: - seed_brokers: ["${REDPANDA_BROKERS}"] + seed_brokers: + - "${PRIVATE_REDPANDA_BROKERS}" + tls: + enabled: ${PRIVATE_REDPANDA_TLS_ENABLED} + sasl: + - mechanism: "REDPANDA_CLOUD_SERVICE_ACCOUNT" topic: "redpanda.otel_traces" - compression: snappy - max_in_flight: 10 ---- -The gRPC input works identically to HTTP but uses Protobuf encoding over gRPC. Clients must include the authentication token in gRPC metadata as `authorization: Bearer `. +Clients must include the authentication token in gRPC metadata as `authorization: Bearer `. === Deploy the pipeline in Redpanda Cloud . In the *Connect* page of your Redpanda Cloud cluster, click *Create Pipeline*. . For the input, select the *otlp_http* (or *otlp_grpc*) component. . 
Skip to *Add a topic* and select `redpanda.otel_traces` from the list of existing topics. Leave the default advanced settings.
-. In the *Add permissions* step, you can create a service account with write access to the `redpanda.otel_traces` topic.
-. In the *Create pipeline* step, enter a name for your ingestion pipeline and paste your `trace-ingestion.yaml` configuration. Ensure that you've created the TRACE_AUTH_TOKEN secret you're referencing in the configuration.
+. In the *Add permissions* step, create a service account with write access to the `redpanda.otel_traces` topic.
+. In the *Create pipeline* step, enter a name for your pipeline and paste the configuration. Redpanda Cloud automatically handles authentication for incoming requests.
 
 == Send traces from your custom agent
 
-Configure your custom agent to send OpenTelemetry traces to the ingestion endpoint. The endpoint accepts traces in OTLP format via HTTP on port 4318 at the `/v1/traces` path.
+Configure your custom agent to send OpenTelemetry traces to the pipeline endpoint. After deploying the pipeline, you can find its URL in the Redpanda Cloud UI on the pipeline details page.
+
+The endpoint URL format is:
+
+* **HTTP**: `https://<pipeline-id>.pipelines.<cluster-id>.clusters.rdpa.co/v1/traces`
+* **gRPC**: `<pipeline-id>.pipelines.<cluster-id>.clusters.rdpa.co:443`
+
+=== Authenticate to the pipeline
+
+The OTLP pipeline uses the same authentication mechanism as the Redpanda Cloud API. Obtain an access token using your service account credentials as described in xref:redpanda-cloud:security:cloud-authentication.adoc#authenticate-to-the-cloud-api[Authenticate to the Cloud API]. 
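Put together, an authenticated OTLP/HTTP export request has roughly this shape. The host and token are placeholders, and OTLP/HTTP accepts either Protobuf or JSON request bodies:

----
POST /v1/traces HTTP/1.1
Host: my-pipeline.pipelines.my-cluster.clusters.rdpa.co
Authorization: Bearer YOUR_TOKEN
Content-Type: application/x-protobuf

<serialized ExportTraceServiceRequest payload>
----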
+ +Include the token in your requests: + +* **HTTP**: Set the `Authorization` header to `Bearer ` +* **gRPC**: Set the `authorization` metadata field to `Bearer ` === Configure your OTEL exporter @@ -176,7 +187,7 @@ Install the OpenTelemetry SDK for your language and configure the OTLP exporter The exporter configuration requires: -* **Endpoint**: Your pipeline's URL including the `/v1/traces` path +* **Endpoint**: Your pipeline's URL (the SDK adds `/v1/traces` automatically for HTTP) * **Headers**: Authorization header with your bearer token * **Protocol**: HTTP to match the `otlp_http` input (or gRPC for `otlp_grpc`) @@ -197,7 +208,7 @@ resource = Resource(attributes={ # Configure the OTLP HTTP exporter exporter = OTLPSpanExporter( - endpoint=":4318/v1/traces", + endpoint="https://.pipelines..clusters.rdpa.co/v1/traces", headers={"Authorization": "Bearer YOUR_TOKEN"} ) @@ -244,7 +255,7 @@ const resource = new Resource({ // Configure OTLP HTTP exporter const exporter = new OTLPTraceExporter({ - url: 'https://your-pipeline-endpoint.redpanda.cloud:4318/v1/traces', + url: 'https://.pipelines..clusters.rdpa.co/v1/traces', headers: { 'Authorization': 'Bearer YOUR_TOKEN' } @@ -276,8 +287,155 @@ processRequest().then(result => { }); ---- +.Go example for OTLP HTTP exporter +[,go] +---- +package main + +import ( + "context" + "log" + + "go.opentelemetry.io/otel" + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp" + "go.opentelemetry.io/otel/sdk/resource" + sdktrace "go.opentelemetry.io/otel/sdk/trace" + semconv "go.opentelemetry.io/otel/semconv/v1.26.0" + "go.opentelemetry.io/otel/trace" +) + +func main() { + ctx := context.Background() + + // Configure OTLP HTTP exporter + exporter, err := otlptracehttp.New(ctx, + otlptracehttp.WithEndpoint(".pipelines..clusters.rdpa.co"), + otlptracehttp.WithHeaders(map[string]string{ + "Authorization": "Bearer YOUR_TOKEN", + }), + ) + if err != nil { + log.Fatalf("Failed to create 
exporter: %v", err) + } + + // Configure resource + res, _ := resource.New(ctx, + resource.WithAttributes( + semconv.ServiceName("my-custom-agent"), + semconv.ServiceVersion("1.0.0"), + ), + ) + + // Set up tracer provider + tp := sdktrace.NewTracerProvider( + sdktrace.WithBatcher(exporter), + sdktrace.WithResource(res), + ) + defer tp.Shutdown(ctx) + otel.SetTracerProvider(tp) + + tracer := tp.Tracer("my-agent") + + // Create span with GenAI semantic conventions + _, span := tracer.Start(ctx, "invoke_agent my-assistant", + trace.WithSpanKind(trace.SpanKindInternal), + ) + span.SetAttributes( + attribute.String("gen_ai.operation.name", "invoke_agent"), + attribute.String("gen_ai.agent.name", "my-assistant"), + attribute.String("gen_ai.provider.name", "openai"), + attribute.String("gen_ai.request.model", "gpt-4"), + attribute.Int("gen_ai.usage.input_tokens", 150), + attribute.Int("gen_ai.usage.output_tokens", 75), + ) + span.End() + + tp.ForceFlush(ctx) +} +---- + TIP: Use environment variables for the endpoint URL and authentication token to keep credentials out of your code. +==== gRPC transport examples + +If you're using the `otlp_grpc` input, configure your exporter to use gRPC transport. 
+
+.Python example for OTLP gRPC exporter
[,python]
----
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource

resource = Resource(attributes={
    "service.name": "my-custom-agent",
    "service.version": "1.0.0"
})

# gRPC endpoint without https:// prefix
exporter = OTLPSpanExporter(
    endpoint=".pipelines..clusters.rdpa.co:443",
    headers={"authorization": "Bearer YOUR_TOKEN"}
)

provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
----

.Node.js example for OTLP gRPC exporter
[,javascript]
----
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { Resource } = require('@opentelemetry/resources');
const grpc = require('@grpc/grpc-js');

const resource = new Resource({
  'service.name': 'my-custom-agent',
  'service.version': '1.0.0'
});

// The gRPC exporter takes authentication as gRPC metadata, not HTTP headers
const metadata = new grpc.Metadata();
metadata.set('authorization', 'Bearer YOUR_TOKEN');

// gRPC exporter with TLS
const exporter = new OTLPTraceExporter({
  url: 'https://.pipelines..clusters.rdpa.co:443',
  metadata
});

const provider = new NodeTracerProvider({ resource });
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();
----

.Go example for OTLP gRPC exporter
[,go]
----
package main

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// otlptracegrpc.New returns an *otlptrace.Exporter
func createGRPCExporter(ctx context.Context) (*otlptrace.Exporter, error) {
	return otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint(".pipelines..clusters.rdpa.co:443"),
		
otlptracegrpc.WithDialOption(grpc.WithTransportCredentials(credentials.NewTLS(nil))), + otlptracegrpc.WithHeaders(map[string]string{ + "authorization": "Bearer YOUR_TOKEN", + }), + ) +} +---- + === Use recommended semantic conventions The Transcripts view recognizes https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry semantic conventions for GenAI operations^]. Following these conventions ensures your traces display correctly with proper attribution, token usage, and operation identification. @@ -427,9 +585,9 @@ If your spans include the recommended token usage attributes (`gen_ai.usage.inpu If your custom agent cannot reach the ingestion endpoint: -. Verify the endpoint URL includes the correct port and path: - * HTTP: `https://your-endpoint:4318/v1/traces` - * gRPC: `https://your-endpoint:4317` +. Verify the endpoint URL format: + * HTTP: `https://.pipelines..clusters.rdpa.co/v1/traces` + * gRPC: `.pipelines..clusters.rdpa.co:443` (no `https://` prefix for gRPC clients) . Check network connectivity and firewall rules. . Ensure authentication tokens are valid and properly formatted in the `Authorization: Bearer ` header (HTTP) or `authorization` metadata field (gRPC). . Verify the Content-Type header matches your data format (`application/x-protobuf` or `application/json`). 
From bf1dccb2032719d9a06556dbbfc7a08ba9a62c03 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Wed, 4 Feb 2026 09:07:24 -0800 Subject: [PATCH 11/19] Edit for length --- .../pages/observability/transcripts.adoc | 32 +++---------------- 1 file changed, 5 insertions(+), 27 deletions(-) diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index de536a2bc..181dc92ce 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -57,27 +57,15 @@ Click the *Attribute* button to query exact matches on specific span metadata su You can add multiple attribute filters to refine results. -==== Adjust time range - -Use the time range selector to focus on specific time periods (from the last five minutes up to the last 24 hours): - -* View recent executions, for example, over the last hour, to monitor real-time activity -* Expand to longer periods for trend analysis over the last day - === Use the interactive timeline -Use the timeline visualization to quickly identify when errors began or patterns changed, and navigate directly to transcripts from specific time windows when investigating issues that occurred at known times - -The timeline displays transcript volume as a bar chart. Each bar represents a time bucket that recalibrates dynamically based on your <>, with color-coded indicators: - -* Green: Successful operations -* Red: Operations with errors +Use the timeline visualization to quickly identify when errors began or patterns changed, and navigate directly to transcripts from specific time windows when investigating issues that occurred at known times. Click on any bar in the timeline to zoom into transcripts from that specific time period. The transcript table automatically scrolls to show operations from the time bucket in view. 
[NOTE] ==== -When viewing time ranges with many transcripts (hundreds or thousands), the table displays a subset of the data to maintain performance and usability. The timeline bar indicates the actual time range of currently loaded data, which may be narrower than your selected time range. +When viewing time ranges with many transcripts (hundreds or thousands), the table displays a subset of the data to maintain performance and usability. The timeline bar indicates the actual time range of data currently loaded into the view, which may be narrower than your selected time range. Refer to the timeline header to check the exact range and count of visible transcripts, for example, "Showing 100 of 299 transcripts from 13:17 to 15:16". ==== @@ -90,13 +78,13 @@ The transcript table displays the following: * Span: Span type indicator and span name, with hierarchical tree structure * Duration: Total duration, or duration of child spans relative to the parent span, represented as visual bars -Each top-level row in the transcript table represents a service-level request flow in an ADP component. Expand each parent span to see the hierarchical structure of nested operations, including internal processing steps, LLM interactions, and tool calls. xref:ai-agents:observability/concepts.adoc#parent-child-relationships[Parent-child spans] show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). Use the *Collapse all* option to quickly fold all expanded spans. +Each top-level row in the transcript table represents a service-level request flow in an ADP component. Expand each parent span to see the hierarchical structure of nested operations, including internal processing steps, LLM interactions, and tool calls. xref:ai-agents:observability/concepts.adoc#parent-child-relationships[Parent-child spans] show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). 
// TODO: Clarify MCP trace structure When agents invoke remote MCP servers, transcripts fold together under a tree structure to provide a unified view of the complete operation across service boundaries. The glossterm:trace ID[] originates at the initial request touchpoint and propagates across all involved services, linking spans from both the agent and MCP server under a single transcript. Use the tree view to follow the trace flow across multiple services and understand the complete request lifecycle. // TODO: Confirm how transcripts from external agents appear -If you use external agents that directly invoke MCP servers in the Redpanda Agentic Data Plane, you may only see MCP-level parent transcripts, unless you have configured the agents to also emit traces to the Redpanda glossterm:OpenTelemetry[OTEL] ingestion pipeline. +If you use external agents that directly invoke MCP servers in the ADP, you may only see MCP-level parent transcripts, unless you have configured the agents to also emit traces to the Redpanda glossterm:OpenTelemetry[OTEL] ingestion pipeline. // TODO: Confirm how gateway traces appear @@ -114,23 +102,13 @@ If you consistently see awaiting root entries, this suggests instrumentation or === Summary tab -Click on any span in the transcript table to open the detail panel on the right side of the interface. The first tab displays a context-specific summary based on the span type. - -For example, for tool call spans, the summary shows: - -* *Description*: The purpose and context of the tool call -* *Arguments*: JSON showing the parameters passed to the tool -* *Response*: JSON showing the tool's output or result - -The summary panel for other span types provides high-level information such as: +Click on any span in the transcript table to open the detail panel on the right side of the interface. 
The first tab displays a context-specific summary based on the span type, which may include: * Total nested operations (span count) and execution time * Token usage for LLM operations * Counts of LLM calls and tool calls * Full conversation history for agent spans, including user prompts, configured xref:ai-agents:agents/create-agent.adoc#write-the-system-prompt[system prompts], and LLM outputs -TIP: Expand the summary panel view to easily read long conversations and complex JSON structures. - === Attributes tab The attributes view shows structured metadata for each transcript span. Use this view to inspect span attributes and understand the context of each operation. See xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[Transcripts and AI Observability] for details on standard attributes by instrumentation layer. From abfcd1652646e9cfdaf7f516da47588381463ca2 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Thu, 5 Feb 2026 14:34:00 -0500 Subject: [PATCH 12/19] Address SME feedback --- modules/ai-agents/pages/observability/concepts.adoc | 12 ++++++------ .../ai-agents/pages/observability/transcripts.adoc | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc index b35ab3980..04ceb472f 100644 --- a/modules/ai-agents/pages/observability/concepts.adoc +++ b/modules/ai-agents/pages/observability/concepts.adoc @@ -1,5 +1,5 @@ = Transcripts and AI Observability -:description: Understand how Redpanda captures execution transcripts for agents and MCP servers using OpenTelemetry. +:description: Understand how Redpanda captures execution transcripts using OpenTelemetry. 
:page-topic-type: concepts :personas: agent_developer, platform_admin, data_engineer :learning-objective-1: Explain how transcripts and spans capture execution flow @@ -16,7 +16,7 @@ After reading this page, you will be able to: == What are transcripts -Every agent and MCP server automatically emits OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts. +The AI Gateway and every agent and MCP server in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts. Transcripts capture: @@ -58,7 +58,7 @@ Agent transcripts contain these span types: | Track reasoning time and identify iteration patterns. | `invoke_agent` -| Agent and sub-agent invocation ( in multi-agent architectures). Represents one agent calling another via the A2A protocol. +| Agent and sub-agent invocation in multi-agent architectures, following the https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry agent invocation semantic conventions^]. Represents one agent calling another via the A2A protocol. | Trace calls between root agents and sub-agents, measure cross-agent latency, and identify which sub-agent was invoked. 
| `openai`, `anthropic`, or other LLM providers @@ -214,17 +214,17 @@ Both use the same `traceId`, allowing you to follow a request across service bou Different layers expose different attributes: -HTTP Server/Client layer: +HTTP Server/Client layer (following https://opentelemetry.io/docs/specs/semconv/http/http-spans/[OpenTelemetry semantic conventions for HTTP^]): - `http.request.method`, `http.response.status_code` - `server.address`, `url.path`, `url.full` - `network.peer.address`, `network.peer.port` - `http.request.body.size`, `http.response.body.size` -AI SDK layer: +AI SDK layer (following https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/[OpenTelemetry semantic conventions for generative AI^]): - `gen_ai.operation.name`: Operation type (`invoke_agent`, `chat`, `execute_tool`) -- `gen_ai.conversation.id`: Links spans to the same conversation +- `gen_ai.conversation.id`: Links spans to the same conversation session. A conversation may include multiple agent invocations (one per user request). Each invocation creates a separate trace that shares the same conversation ID. - `gen_ai.agent.name`: Sub-agent name for multi-agent systems - `gen_ai.provider.name`, `gen_ai.request.model`: LLM provider and model - `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`: Token consumption diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index 181dc92ce..0190cfc31 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -1,5 +1,5 @@ = View Transcripts -:description: Learn how to filter and navigate the Transcripts interface to investigate agent execution traces using multiple detail views and interactive timeline navigation. +:description: Learn how to filter and navigate the Transcripts interface to investigate execution traces using multiple detail views and interactive timeline navigation. 
:page-topic-type: how-to :personas: agent_developer, platform_admin :learning-objective-1: Filter transcripts to find specific execution traces From 8a1465c4f82308728a70738b458bfc86c3aa52d0 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Thu, 5 Feb 2026 14:57:12 -0500 Subject: [PATCH 13/19] Trim Transcripts UI doc --- .../pages/observability/transcripts.adoc | 62 ++++--------------- 1 file changed, 13 insertions(+), 49 deletions(-) diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index 0190cfc31..f8356a861 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -6,7 +6,9 @@ :learning-objective-2: Use the timeline interactively to navigate to specific time periods :learning-objective-3: Navigate between detail views to inspect span information at different levels -The Transcripts view provides filtering and navigation capabilities for investigating agent, MCP server, and AI Gateway execution glossterm:transcript[transcripts]. Use this view to quickly locate specific operations, analyze performance patterns, and debug issues across glossterm:tool[] invocations, LLM calls, and glossterm:agent[] reasoning steps. +Use the Transcripts view to investigate agent, MCP server, and AI Gateway execution traces. Filter by operation type, inspect span details, and trace issues across your agentic systems. + +For conceptual background on spans and trace structure, see xref:ai-agents:observability/concepts.adoc[]. 
After reading this page, you will be able to: @@ -14,14 +16,6 @@ After reading this page, you will be able to: * [ ] {learning-objective-2} * [ ] {learning-objective-3} -For basic orientation on monitoring each Redpanda Agentic Data Plane (ADP) component, see: - -* xref:ai-agents:ai-gateway/observability-metrics.adoc[] -* xref:ai-agents:agents/monitor-agents.adoc[] -* xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[] - -For conceptual background on what transcripts capture, glossterm:span[] types, and how they are organized hierarchically, see xref:ai-agents:observability/concepts.adoc[]. - == Prerequisites * xref:ai-agents:agents/create-agent.adoc[Running agent] or xref:ai-agents:mcp/remote/quickstart.adoc[MCP server] with at least one execution @@ -72,57 +66,27 @@ Refer to the timeline header to check the exact range and count of visible trans == Inspect span details -The transcript table displays the following: - -* Time: Timestamp when the span started (sortable) -* Span: Span type indicator and span name, with hierarchical tree structure -* Duration: Total duration, or duration of child spans relative to the parent span, represented as visual bars +The transcript table shows: -Each top-level row in the transcript table represents a service-level request flow in an ADP component. Expand each parent span to see the hierarchical structure of nested operations, including internal processing steps, LLM interactions, and tool calls. xref:ai-agents:observability/concepts.adoc#parent-child-relationships[Parent-child spans] show how operations relate: for example, an agent invocation (parent) triggers LLM calls and tool executions (children). 
+* **Time**: When the span started (sortable) +* **Span**: Span type and name with hierarchical tree structure +* **Duration**: Total time or relative duration shown as visual bars -// TODO: Clarify MCP trace structure -When agents invoke remote MCP servers, transcripts fold together under a tree structure to provide a unified view of the complete operation across service boundaries. The glossterm:trace ID[] originates at the initial request touchpoint and propagates across all involved services, linking spans from both the agent and MCP server under a single transcript. Use the tree view to follow the trace flow across multiple services and understand the complete request lifecycle. - -// TODO: Confirm how transcripts from external agents appear -If you use external agents that directly invoke MCP servers in the ADP, you may only see MCP-level parent transcripts, unless you have configured the agents to also emit traces to the Redpanda glossterm:OpenTelemetry[OTEL] ingestion pipeline. +To view nested operations, expand any parent span. To learn more about span hierarchies and cross-service traces, see xref:ai-agents:observability/concepts.adoc[]. // TODO: Confirm how gateway traces appear -Selected spans display detailed information at multiple levels, from high-level summaries to complete raw data: +Click any span to view details in the right panel: -* Start with summary tab for quick assessment -* Inspect attributes for detailed investigation using structured metadata -* Use raw data when you need complete information +* **Summary tab**: High-level overview with token usage, operation counts, and conversation history. +* **Attributes tab**: Structured metadata for debugging (see xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[standard attributes by layer]). +* **Raw data tab**: Complete OpenTelemetry span in JSON format. You can also view raw transcript data in the `redpanda.otel_traces` topic. 
[NOTE] ==== -Rows labeled "awaiting root — waiting for parent span" indicate incomplete transcripts where child spans have been received but the parent span is missing or hasn't arrived yet. This can occur due to network latency between services, processing delays in the OpenTelemetry pipeline, or lost parent spans from service failures. -If you consistently see awaiting root entries, this suggests instrumentation or trace collection issues that should be investigated. +Rows labeled "awaiting root — waiting for parent span" indicate incomplete traces. This occurs when child spans arrive before parent spans due to network latency or service failures. Consistent "awaiting root" entries suggest instrumentation issues. ==== -=== Summary tab - -Click on any span in the transcript table to open the detail panel on the right side of the interface. The first tab displays a context-specific summary based on the span type, which may include: - -* Total nested operations (span count) and execution time -* Token usage for LLM operations -* Counts of LLM calls and tool calls -* Full conversation history for agent spans, including user prompts, configured xref:ai-agents:agents/create-agent.adoc#write-the-system-prompt[system prompts], and LLM outputs - -=== Attributes tab - -The attributes view shows structured metadata for each transcript span. Use this view to inspect span attributes and understand the context of each operation. See xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[Transcripts and AI Observability] for details on standard attributes by instrumentation layer. - -=== Raw data tab - -The raw data view provides the complete span structure: - -* Full OpenTelemetry span in JSON format -* All fields including those not displayed in summary or attributes views -* Structured data suitable for export or programmatic access - -You can also view the raw transcript data in the `redpanda.otel_traces` topic. 
- == Investigate and analyze operations The following patterns demonstrate how to use the Transcripts view for understanding and troubleshooting your agentic systems. From 8010a20a73702e24d29abc1226fa2740a39bd28e Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Thu, 5 Feb 2026 15:41:22 -0500 Subject: [PATCH 14/19] Trim concepts content --- .../pages/observability/concepts.adoc | 24 +++++++------------ 1 file changed, 8 insertions(+), 16 deletions(-) diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc index 04ceb472f..e6ef6b7bc 100644 --- a/modules/ai-agents/pages/observability/concepts.adoc +++ b/modules/ai-agents/pages/observability/concepts.adoc @@ -81,14 +81,7 @@ ai-agent (6.65 seconds) │ │ └── openai: chat gpt-5.2 (6.2 seconds) ---- -This shows: - -1. Total agent invocation: 6.65 seconds -2. Agent reasoning: 6.41 seconds -3. Sub-agent call: 6.39 seconds (most of the time) -4. LLM API call: 6.2 seconds (the actual bottleneck) - -Examine span durations to identify where time is spent and optimize accordingly. +This hierarchy shows that the LLM API call (6.2 seconds) accounts for most of the total agent invocation time (6.65 seconds), revealing the bottleneck in this execution flow. == MCP server transcript hierarchy @@ -240,7 +233,7 @@ Redpanda Connect layer: - Component-specific attributes from your tool configuration -Use `scope.name` to filter spans by layer when analyzing transcripts. +The `scope.name` field identifies which instrumentation layer created each span. == Understand the transcript structure @@ -263,13 +256,12 @@ Each span captures a unit of work. Here's what a typical MCP tool invocation loo } ---- -Key elements to understand: - -* `traceId`: Links all spans belonging to the same request. Use this to follow a tool invocation through its entire lifecycle. -* `name`: The tool or operation name (`http_processor` in this example). This tells you which component was invoked. 
-* `instrumentationScope.name`: When this is `rpcn-mcp`, the span represents an MCP tool. When it's `redpanda-connect`, it's internal processing. -* `attributes`: Context about the operation, like input parameters or result metadata. -* `status.code`: `0` means success, `2` means error. +* **traceId** links all spans in the same request across services +* **spanId** uniquely identifies this span +* **name** identifies the operation or tool +* **instrumentationScope.name** identifies which layer created the span (`rpcn-mcp` for MCP tools, `redpanda-connect` for internal processing) +* **attributes** contain operation-specific metadata +* **status.code** indicates success (0) or error (2) === Parent-child relationships From 30b7b325dfc9efa3d0fb46e5a58c6930a6c0bffe Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Thu, 5 Feb 2026 16:39:28 -0500 Subject: [PATCH 15/19] Polish byoa telemetry doc --- .../observability/ingest-custom-traces.adoc | 199 +++++++++--------- 1 file changed, 94 insertions(+), 105 deletions(-) diff --git a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc index f6f2516b2..c2561d1b5 100644 --- a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc +++ b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc @@ -17,7 +17,7 @@ After reading this page, you will be able to: * A BYOC cluster * Ability to manage secrets in Redpanda Cloud -* The latest version of `rpk` installed +* The latest version of xref:manage:rpk/rpk-install.adoc[`rpk`] installed * Custom agent or application instrumented with OpenTelemetry SDK * Basic understanding of the https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry span format^] and https://opentelemetry.io/docs/specs/otlp/[OpenTelemetry Protocol (OTLP)^] @@ -55,17 +55,13 @@ For non-LangChain applications or custom instrumentation, continue with the sect == About custom trace ingestion -Custom 
agents include applications you build with OpenTelemetry instrumentation that operate independently of Redpanda's Remote MCP servers or declarative agents. Examples include:
+Custom agents are applications you build and instrument yourself with the OpenTelemetry SDK (for example, LangChain or CrewAI agents, or manually instrumented services) that operate independently of Redpanda's Remote MCP servers or declarative agents.

-* Custom AI agents built with LangChain, CrewAI, or other frameworks
-* Applications with manual OpenTelemetry instrumentation
-* Services that integrate with third-party AI platforms
-
-When these applications send traces to Redpanda's `redpanda.otel_traces` glossterm:topic[], you gain unified observability across all agentic components in your system. Custom agent transcripts appear alongside Remote MCP server and declarative agent transcripts in the Transcripts view, creating xref:ai-agents:observability/concepts.adoc#cross-service-transcripts[cross-service transcripts] that allow you to correlate operations and analyze end-to-end request flows.
+When these agents send traces to `redpanda.otel_traces`, you gain unified observability alongside Remote MCP server and declarative agent traces. See xref:ai-agents:observability/concepts.adoc#cross-service-transcripts[Cross-service transcripts] for details on how traces correlate across services.

=== Trace format requirements

-Custom agents must emit traces in OTLP format. The `otlp_http` input accepts both OTLP Protobuf (`application/x-protobuf`) and JSON (`application/json`) payloads. For <>, use the `otlp_grpc` input.
+Custom agents must emit traces in OTLP format. The xref:develop:connect/components/inputs/otlp_http.adoc[`otlp_http`] input accepts both OTLP Protobuf (`application/x-protobuf`) and JSON (`application/json`) payloads. For <>, use the xref:develop:connect/components/inputs/otlp_grpc.adoc[`otlp_grpc`] input. 
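+For orientation, the JSON flavor of an OTLP trace payload can be sketched in a few lines of Python. This is a hedged sketch, not a complete schema: the service name, span name, IDs, and timestamps below are illustrative values, not values Redpanda requires.

```python
import json
import time

# Minimal OTLP/JSON trace export payload. Real SDKs generate the
# traceId (16-byte hex) and spanId (8-byte hex) for you.
now_ns = time.time_ns()
payload = {
    "resourceSpans": [{
        "resource": {
            "attributes": [
                {"key": "service.name",
                 "value": {"stringValue": "my-custom-agent"}}
            ]
        },
        "scopeSpans": [{
            "scope": {"name": "my-agent"},
            "spans": [{
                "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
                "spanId": "051581bf3cb55c13",
                "name": "invoke_agent my-assistant",
                "kind": 1,
                "startTimeUnixNano": str(now_ns),
                "endTimeUnixNano": str(now_ns + 1_000_000),
                "status": {"code": 0}
            }]
        }]
    }]
}

body = json.dumps(payload)
```

Posting a body like this to the pipeline's `/v1/traces` endpoint with `Content-Type: application/json` and your bearer token header is a quick way to smoke-test ingestion before wiring up a full SDK exporter.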
Each trace must follow the OTLP specification with these required fields: @@ -93,6 +89,7 @@ Each trace must follow the OTLP specification with these required fields: |=== Optional but recommended fields: + - `parentSpanId` for hierarchical traces - `attributes` for contextual information @@ -167,10 +164,16 @@ Clients must include the authentication token in gRPC metadata as `authorization Configure your custom agent to send OpenTelemetry traces to the pipeline endpoint. After deploying the pipeline, you can find its URL in the Redpanda Cloud UI on the pipeline details page. -The endpoint URL format is: +[cols="1,3", options="header"] +|=== +| Transport | URL Format -* **HTTP**: `https://.pipelines..clusters.rdpa.co/v1/traces` -* **gRPC**: `.pipelines..clusters.rdpa.co:443` +| HTTP +| `+https://.pipelines..clusters.rdpa.co/v1/traces+` + +| gRPC +| `.pipelines..clusters.rdpa.co:443` +|=== === Authenticate to the pipeline @@ -178,8 +181,8 @@ The OTLP pipeline uses the same authentication mechanism as the Redpanda Cloud A Include the token in your requests: -* **HTTP**: Set the `Authorization` header to `Bearer ` -* **gRPC**: Set the `authorization` metadata field to `Bearer ` +* HTTP: Set the `Authorization` header to `Bearer ` +* gRPC: Set the `authorization` metadata field to `Bearer ` === Configure your OTEL exporter @@ -187,11 +190,18 @@ Install the OpenTelemetry SDK for your language and configure the OTLP exporter The exporter configuration requires: -* **Endpoint**: Your pipeline's URL (the SDK adds `/v1/traces` automatically for HTTP) -* **Headers**: Authorization header with your bearer token -* **Protocol**: HTTP to match the `otlp_http` input (or gRPC for `otlp_grpc`) - -.Python example for OTLP HTTP exporter +* Endpoint: Your pipeline's URL (the SDK adds `/v1/traces` automatically for HTTP) +* Headers: Authorization header with your bearer token +* Protocol: HTTP to match the `otlp_http` input (or gRPC for `otlp_grpc`) + +[tabs] +====== +HTTP:: ++ +-- 
+.View Python example +[%collapsible] +==== [,python] ---- from opentelemetry import trace @@ -237,8 +247,11 @@ with tracer.start_as_current_span( span.set_attribute("gen_ai.usage.input_tokens", 150) span.set_attribute("gen_ai.usage.output_tokens", 75) ---- +==== -.Node.js example for OTLP HTTP exporter +.View Node.js example +[%collapsible] +==== [,javascript] ---- const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node'); @@ -286,8 +299,11 @@ processRequest().then(result => { span.end(); }); ---- +==== -.Go example for OTLP HTTP exporter +.View Go example +[%collapsible] +==== [,go] ---- package main @@ -354,14 +370,15 @@ func main() { tp.ForceFlush(ctx) } ---- +==== +-- -TIP: Use environment variables for the endpoint URL and authentication token to keep credentials out of your code. - -==== gRPC transport examples - -If you're using the `otlp_grpc` input, configure your exporter to use gRPC transport. - -.Python example for OTLP gRPC exporter +gRPC:: ++ +-- +.View Python example +[%collapsible] +==== [,python] ---- from opentelemetry import trace @@ -385,8 +402,11 @@ provider = TracerProvider(resource=resource) provider.add_span_processor(BatchSpanProcessor(exporter)) trace.set_tracer_provider(provider) ---- +==== -.Node.js example for OTLP gRPC exporter +.View Node.js example +[%collapsible] +==== [,javascript] ---- const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node'); @@ -411,8 +431,11 @@ const provider = new NodeTracerProvider({ resource }); provider.addSpanProcessor(new BatchSpanProcessor(exporter)); provider.register(); ---- +==== -.Go example for OTLP gRPC exporter +.View Go example +[%collapsible] +==== [,go] ---- package main @@ -435,6 +458,11 @@ func createGRPCExporter(ctx context.Context) (*otlptracegrpc.Exporter, error) { ) } ---- +==== +-- +====== + +TIP: Use environment variables for the endpoint URL and authentication token to keep credentials out of your code. 
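As a concrete sketch of that tip, read the endpoint and token from the environment at startup. `OTEL_EXPORTER_OTLP_ENDPOINT` is a standard OpenTelemetry SDK variable (the OTLP exporters also read it automatically); `OTLP_BEARER_TOKEN` is a name invented for this example, not a standard one.

```python
import os

# OTEL_EXPORTER_OTLP_ENDPOINT is a standard OpenTelemetry SDK variable;
# OTLP_BEARER_TOKEN is an example name chosen for this sketch.
endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
token = os.environ.get("OTLP_BEARER_TOKEN", "")

# Pass these to your exporter instead of hardcoded literals
headers = {"Authorization": f"Bearer {token}"} if token else {}
```

Set both variables in your deployment environment (for example, from a secrets manager) so rotating the token never requires a code change.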
=== Use recommended semantic conventions @@ -456,83 +484,50 @@ Following the OpenTelemetry semantic conventions, agent spans should include the * Session correlation: ** `gen_ai.conversation.id` - Identifier linking related agent invocations in the same conversation -==== Example with semantic conventions - -.Python example with GenAI semantic conventions -[,python] ----- -from opentelemetry import trace - -tracer = trace.get_tracer(__name__) - -# Create an agent invocation span -with tracer.start_as_current_span( - "invoke_agent my-assistant", - kind=trace.SpanKind.INTERNAL -) as span: - # Set required attributes - span.set_attribute("gen_ai.operation.name", "invoke_agent") - span.set_attribute("gen_ai.agent.name", "my-assistant") - span.set_attribute("gen_ai.provider.name", "openai") - span.set_attribute("gen_ai.request.model", "gpt-4") - span.set_attribute("gen_ai.conversation.id", "session-abc-123") +==== Required attributes for proper display - # Your agent logic here - response = process_agent_request(user_input) +Set these attributes on your spans for proper display and filtering in the Transcripts view: - # Set token usage after completion - span.set_attribute("gen_ai.usage.input_tokens", response.usage.input_tokens) - span.set_attribute("gen_ai.usage.output_tokens", response.usage.output_tokens) ----- +[cols="2,3", options="header"] +|=== +| Attribute | Purpose -.Node.js example with GenAI semantic conventions -[,javascript] ----- -const { trace } = require('@opentelemetry/api'); +| `gen_ai.operation.name` +| Set to `"invoke_agent"` for agent execution spans -const tracer = trace.getTracer('my-agent'); +| `gen_ai.agent.name` +| Human-readable name displayed in Transcripts view -const span = tracer.startSpan('invoke_agent my-assistant', { - kind: SpanKind.INTERNAL -}); +| `gen_ai.provider.name` +| LLM provider (e.g., `"openai"`, `"anthropic"`) -// Set required attributes -span.setAttribute('gen_ai.operation.name', 'invoke_agent'); 
-span.setAttribute('gen_ai.agent.name', 'my-assistant'); -span.setAttribute('gen_ai.provider.name', 'openai'); -span.setAttribute('gen_ai.request.model', 'gpt-4'); -span.setAttribute('gen_ai.conversation.id', 'session-abc-123'); +| `gen_ai.request.model` +| Model name (e.g., `"gpt-4"`, `"claude-sonnet-4"`) -// Your agent logic -const response = await processAgentRequest(userInput); +| `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` +| Token counts for cost tracking -// Set token usage -span.setAttribute('gen_ai.usage.input_tokens', response.usage.inputTokens); -span.setAttribute('gen_ai.usage.output_tokens', response.usage.outputTokens); +| `gen_ai.conversation.id` +| Links related agent invocations in the same conversation +|=== -span.end(); ----- +See the code examples earlier in this page for how to set these attributes in Python, Node.js, or Go. === Validate trace format -Before deploying to production, verify your traces match the expected format. - -//// - -* How to validate trace format against schema -* Common format issues and solutions -* Tools for format validation -==== - -//// - -Test your agent locally and inspect the traces it produces: - -[,bash] ----- -# Example validation steps +Before deploying to production, verify your traces match the expected format: ----- +. Run your agent locally and enable debug logging in your OpenTelemetry SDK to inspect outgoing spans. +. Verify required fields are present: + * `traceId`, `spanId`, `name` + * `startTimeUnixNano`, `endTimeUnixNano` + * `instrumentationScope` with a `name` field + * `status` with a `code` field (0 for success, 2 for error) +. Check that `service.name` is set in the resource attributes to identify your agent in the Transcripts view. +. 
Verify GenAI semantic convention attributes if you want proper display in the Transcripts view: + * `gen_ai.operation.name` set to `"invoke_agent"` for agent spans + * `gen_ai.agent.name` for agent identification + * Token usage attributes if tracking costs == Verify trace ingestion @@ -553,7 +548,7 @@ Look for spans with your custom `instrumentationScope.name` to identify traces f === View traces in Transcripts -After your custom agent sends traces through the pipeline, they appear in your cluster's *Agentic AI > Transcripts* view alongside traces from Remote MCP servers and declarative agents. +After your custom agent sends traces through the pipeline, they appear in your cluster's *Agentic AI > Transcripts* view alongside traces from Remote MCP servers, declarative agents, and AI Gateway. ==== Identify custom agent transcripts @@ -573,20 +568,14 @@ If your spans include the recommended token usage attributes (`gen_ai.usage.inpu == Troubleshooting -//// -* Common issues and solutions -* How to monitor pipeline health -* Log locations and debugging techniques -* Failure modes and diagnostics - -//// +If traces from your custom agent aren't appearing in the Transcripts view, use these diagnostic steps to identify and resolve common ingestion issues. === Pipeline not receiving requests If your custom agent cannot reach the ingestion endpoint: . Verify the endpoint URL format: - * HTTP: `https://.pipelines..clusters.rdpa.co/v1/traces` + * HTTP: `+https://.pipelines..clusters.rdpa.co/v1/traces` * gRPC: `.pipelines..clusters.rdpa.co:443` (no `https://` prefix for gRPC clients) . Check network connectivity and firewall rules. . Ensure authentication tokens are valid and properly formatted in the `Authorization: Bearer ` header (HTTP) or `authorization` metadata field (gRPC). 
@@ -610,6 +599,6 @@ If requests succeed but traces do not appear in `redpanda.otel_traces`: == Next steps * xref:ai-agents:observability/transcripts.adoc[] -* xref:ai-agents:agents/monitor-agents.adoc[Observability for declarative agents] -* https://docs.redpanda.com/redpanda-connect/components/inputs/otlp_http/[OTLP HTTP input reference^] - Complete configuration options for the `otlp_http` component -* https://docs.redpanda.com/redpanda-connect/components/inputs/otlp_grpc/[OTLP gRPC input reference^] - Alternative gRPC-based trace ingestion \ No newline at end of file +* xref:ai-agents:agents/monitor-agents.adoc[Observability for declarative agents] +* xref:develop:connect/components/inputs/otlp_http.adoc[OTLP HTTP input reference] - Complete configuration options for the `otlp_http` component +* xref:develop:connect/components/inputs/otlp_grpc.adoc[OTLP gRPC input reference] - Alternative gRPC-based trace ingestion \ No newline at end of file From 422526a92a63540acc0917a85e519abd8dae90c9 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Thu, 5 Feb 2026 17:45:01 -0500 Subject: [PATCH 16/19] Add glossary terms --- modules/ai-agents/pages/observability/concepts.adoc | 8 ++++---- .../pages/observability/ingest-custom-traces.adoc | 2 +- modules/ai-agents/pages/observability/transcripts.adoc | 10 +++++----- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc index e6ef6b7bc..e250db64d 100644 --- a/modules/ai-agents/pages/observability/concepts.adoc +++ b/modules/ai-agents/pages/observability/concepts.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Interpret transcript structure for debugging and monitoring :learning-objective-3: Distinguish between transcripts and audit logs -Redpanda automatically captures transcripts (also referred to as execution logs or traces) for both AI agents and MCP servers, providing complete observability into how your agentic 
systems operate. +Redpanda automatically captures glossterm:transcript[transcripts] (also referred to as execution logs or traces) for both AI agents and MCP servers, providing complete glossterm:observability[observability] into how your agentic systems operate. After reading this page, you will be able to: @@ -16,7 +16,7 @@ After reading this page, you will be able to: == What are transcripts -The AI Gateway and every agent and MCP server in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts. +The AI Gateway and every glossterm:ai-agent[agent] and glossterm:mcp-server[MCP server] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts. Transcripts capture: @@ -31,7 +31,7 @@ With 100% sampling, every operation is captured, enabling comprehensive debuggin == Traces and spans -OpenTelemetry traces provide a complete picture of how a request flows through your system: +glossterm:opentelemetry[OpenTelemetry] traces provide a complete picture of how a request flows through your system: * A _trace_ represents the entire lifecycle of a request (for example, a tool invocation from start to finish). * A _span_ represents a single unit of work within that trace (such as a data processing operation or an external API call). @@ -308,7 +308,7 @@ The `events` array captures what happened and when. 
Use `timeUnixNano` to see ex [[opentelemetry-traces-topic]] == How Redpanda stores trace data -The `redpanda.otel_traces` topic stores OpenTelemetry spans using Redpanda's Schema Registry wire format, with a custom Protobuf schema named `redpanda.otel_traces-value` that follows the https://opentelemetry.io/docs/specs/otel/protocol/[OpenTelemetry Protocol (OTLP)^] specification. Spans include attributes following OpenTelemetry https://opentelemetry.io/docs/specs/semconv/gen-ai/[semantic conventions for generative AI^], such as `gen_ai.operation.name` and `gen_ai.conversation.id`. The schema is automatically registered in the Schema Registry with the topic, so Kafka clients can consume and deserialize trace data correctly. +The `redpanda.otel_traces` topic stores OpenTelemetry spans using Redpanda's glossterm:schema-registry[Schema Registry] wire format, with a custom Protobuf schema named `redpanda.otel_traces-value` that follows the https://opentelemetry.io/docs/specs/otel/protocol/[OpenTelemetry Protocol (glossterm:otlp[OTLP])^] specification. Spans include attributes following OpenTelemetry https://opentelemetry.io/docs/specs/semconv/gen-ai/[semantic conventions for generative AI^], such as `gen_ai.operation.name` and `gen_ai.conversation.id`. The schema is automatically registered in the Schema Registry with the topic, so Kafka clients can consume and deserialize trace data correctly. Redpanda manages both the `redpanda.otel_traces` topic and its schema automatically. If you delete either the topic or the schema, they are recreated automatically. However, deleting the topic permanently deletes all trace data, and the topic comes back empty. Do not produce your own data to this topic. It is reserved for OpenTelemetry traces. 
diff --git a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc index c2561d1b5..1b50098a2 100644 --- a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc +++ b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc @@ -575,7 +575,7 @@ If traces from your custom agent aren't appearing in the Transcripts view, use t If your custom agent cannot reach the ingestion endpoint: . Verify the endpoint URL format: - * HTTP: `+https://.pipelines..clusters.rdpa.co/v1/traces` + * HTTP: `\https://.pipelines..clusters.rdpa.co/v1/traces` * gRPC: `.pipelines..clusters.rdpa.co:443` (no `https://` prefix for gRPC clients) . Check network connectivity and firewall rules. . Ensure authentication tokens are valid and properly formatted in the `Authorization: Bearer ` header (HTTP) or `authorization` metadata field (gRPC). diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index f8356a861..a8ada4838 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -19,7 +19,7 @@ After reading this page, you will be able to: == Prerequisites * xref:ai-agents:agents/create-agent.adoc[Running agent] or xref:ai-agents:mcp/remote/quickstart.adoc[MCP server] with at least one execution -* Access to the Transcripts view (requires appropriate permissions to read the `redpanda.otel_traces` topic) +* Access to the Transcripts view (requires appropriate permissions to read the `redpanda.otel_traces` glossterm:topic[]) == Navigate the Transcripts interface @@ -68,7 +68,7 @@ Refer to the timeline header to check the exact range and count of visible trans The transcript table shows: -* **Time**: When the span started (sortable) +* **Time**: When the glossterm:span[span] started (sortable) * **Span**: Span type and name with hierarchical tree structure * 
**Duration**: Total time or relative duration shown as visual bars @@ -80,14 +80,14 @@ Click any span to view details in the right panel: * **Summary tab**: High-level overview with token usage, operation counts, and conversation history. * **Attributes tab**: Structured metadata for debugging (see xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[standard attributes by layer]). -* **Raw data tab**: Complete OpenTelemetry span in JSON format. You can also view raw transcript data in the `redpanda.otel_traces` topic. +* **Raw data tab**: Complete glossterm:opentelemetry[OpenTelemetry] span in JSON format. You can also view raw transcript data in the `redpanda.otel_traces` topic. [NOTE] ==== -Rows labeled "awaiting root — waiting for parent span" indicate incomplete traces. This occurs when child spans arrive before parent spans due to network latency or service failures. Consistent "awaiting root" entries suggest instrumentation issues. +Rows labeled "awaiting root — waiting for parent span" indicate incomplete glossterm:trace[traces]. This occurs when child spans arrive before parent spans due to network latency or service failures. Consistent "awaiting root" entries suggest instrumentation issues. ==== -== Investigate and analyze operations +== Common investigation tasks The following patterns demonstrate how to use the Transcripts view for understanding and troubleshooting your agentic systems. 
From 9a3cd03d5dd7fb324b399271d8538a764480b352 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Thu, 5 Feb 2026 18:00:14 -0500 Subject: [PATCH 17/19] Fix gloss terms --- modules/ai-agents/pages/observability/concepts.adoc | 8 ++++---- modules/ai-agents/pages/observability/transcripts.adoc | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc index e250db64d..5d4eb6102 100644 --- a/modules/ai-agents/pages/observability/concepts.adoc +++ b/modules/ai-agents/pages/observability/concepts.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Interpret transcript structure for debugging and monitoring :learning-objective-3: Distinguish between transcripts and audit logs -Redpanda automatically captures glossterm:transcript[transcripts] (also referred to as execution logs or traces) for both AI agents and MCP servers, providing complete glossterm:observability[observability] into how your agentic systems operate. +Redpanda automatically captures glossterm:transcript[,transcripts] (also referred to as execution logs or traces) for both AI agents and MCP servers, providing complete glossterm:observability[] into how your agentic systems operate. After reading this page, you will be able to: @@ -16,7 +16,7 @@ After reading this page, you will be able to: == What are transcripts -The AI Gateway and every glossterm:ai-agent[agent] and glossterm:mcp-server[MCP server] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts. +The AI Gateway and every glossterm:ai-agent[,agent] and glossterm:mcp-server[,MCP server] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. 
These traces provide detailed observability into operations, creating complete transcripts. Transcripts capture: @@ -31,7 +31,7 @@ With 100% sampling, every operation is captured, enabling comprehensive debuggin == Traces and spans -glossterm:opentelemetry[OpenTelemetry] traces provide a complete picture of how a request flows through your system: +glossterm:opentelemetry[,OpenTelemetry] traces provide a complete picture of how a request flows through your system: * A _trace_ represents the entire lifecycle of a request (for example, a tool invocation from start to finish). * A _span_ represents a single unit of work within that trace (such as a data processing operation or an external API call). @@ -308,7 +308,7 @@ The `events` array captures what happened and when. Use `timeUnixNano` to see ex [[opentelemetry-traces-topic]] == How Redpanda stores trace data -The `redpanda.otel_traces` topic stores OpenTelemetry spans using Redpanda's glossterm:schema-registry[Schema Registry] wire format, with a custom Protobuf schema named `redpanda.otel_traces-value` that follows the https://opentelemetry.io/docs/specs/otel/protocol/[OpenTelemetry Protocol (glossterm:otlp[OTLP])^] specification. Spans include attributes following OpenTelemetry https://opentelemetry.io/docs/specs/semconv/gen-ai/[semantic conventions for generative AI^], such as `gen_ai.operation.name` and `gen_ai.conversation.id`. The schema is automatically registered in the Schema Registry with the topic, so Kafka clients can consume and deserialize trace data correctly. +The `redpanda.otel_traces` topic stores OpenTelemetry spans using Redpanda's glossterm:schema-registry[] wire format, with a custom Protobuf schema named `redpanda.otel_traces-value` that follows the https://opentelemetry.io/docs/specs/otel/protocol/[OpenTelemetry Protocol (glossterm:otlp[])^] specification. 
Spans include attributes following OpenTelemetry https://opentelemetry.io/docs/specs/semconv/gen-ai/[semantic conventions for generative AI^], such as `gen_ai.operation.name` and `gen_ai.conversation.id`. The schema is automatically registered in the Schema Registry with the topic, so Kafka clients can consume and deserialize trace data correctly. Redpanda manages both the `redpanda.otel_traces` topic and its schema automatically. If you delete either the topic or the schema, they are recreated automatically. However, deleting the topic permanently deletes all trace data, and the topic comes back empty. Do not produce your own data to this topic. It is reserved for OpenTelemetry traces. diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index a8ada4838..02c33f64b 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -68,7 +68,7 @@ Refer to the timeline header to check the exact range and count of visible trans The transcript table shows: -* **Time**: When the glossterm:span[span] started (sortable) +* **Time**: When the glossterm:span[] started (sortable) * **Span**: Span type and name with hierarchical tree structure * **Duration**: Total time or relative duration shown as visual bars @@ -80,11 +80,11 @@ Click any span to view details in the right panel: * **Summary tab**: High-level overview with token usage, operation counts, and conversation history. * **Attributes tab**: Structured metadata for debugging (see xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[standard attributes by layer]). -* **Raw data tab**: Complete glossterm:opentelemetry[OpenTelemetry] span in JSON format. You can also view raw transcript data in the `redpanda.otel_traces` topic. +* **Raw data tab**: Complete glossterm:opentelemetry[] span in JSON format. 
You can also view raw transcript data in the `redpanda.otel_traces` topic. [NOTE] ==== -Rows labeled "awaiting root — waiting for parent span" indicate incomplete glossterm:trace[traces]. This occurs when child spans arrive before parent spans due to network latency or service failures. Consistent "awaiting root" entries suggest instrumentation issues. +Rows labeled "awaiting root — waiting for parent span" indicate incomplete glossterm:trace[,traces]. This occurs when child spans arrive before parent spans due to network latency or service failures. Consistent "awaiting root" entries suggest instrumentation issues. ==== == Common investigation tasks From e430e0aa3452bd4468c62ba96a817192ab0d2821 Mon Sep 17 00:00:00 2001 From: Kat Batuigas Date: Fri, 6 Feb 2026 13:01:39 -0500 Subject: [PATCH 18/19] Reorganize http/grpc section and fix more gloss terms --- .../pages/observability/concepts.adoc | 24 ++++++------- .../observability/ingest-custom-traces.adoc | 34 +++++++++++++------ .../pages/observability/transcripts.adoc | 12 +++---- 3 files changed, 41 insertions(+), 29 deletions(-) diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc index 5d4eb6102..1d49a479d 100644 --- a/modules/ai-agents/pages/observability/concepts.adoc +++ b/modules/ai-agents/pages/observability/concepts.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Interpret transcript structure for debugging and monitoring :learning-objective-3: Distinguish between transcripts and audit logs -Redpanda automatically captures glossterm:transcript[,transcripts] (also referred to as execution logs or traces) for both AI agents and MCP servers, providing complete glossterm:observability[] into how your agentic systems operate. 
+Redpanda automatically captures glossterm:transcript[,transcripts] (also referred to as execution logs or traces) for both AI agents and MCP servers, providing complete glossterm:observability (o11y)[,observability] into how your agentic systems operate. After reading this page, you will be able to: @@ -16,7 +16,7 @@ After reading this page, you will be able to: == What are transcripts -The AI Gateway and every glossterm:ai-agent[,agent] and glossterm:mcp-server[,MCP server] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts. +The AI Gateway and every glossterm:AI agent[,agent] and glossterm:MCP server[] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts. Transcripts capture: @@ -31,7 +31,7 @@ With 100% sampling, every operation is captured, enabling comprehensive debuggin == Traces and spans -glossterm:opentelemetry[,OpenTelemetry] traces provide a complete picture of how a request flows through your system: +glossterm:OpenTelemetry[] traces provide a complete picture of how a request flows through your system: * A _trace_ represents the entire lifecycle of a request (for example, a tool invocation from start to finish). * A _span_ represents a single unit of work within that trace (such as a data processing operation or an external API call). @@ -54,11 +54,11 @@ Agent transcripts contain these span types: | Measure total request duration and identify slow agent invocations. | `agent` -| Internal agent processing that represents reasoning and decision-making. Shows time spent in the LLM reasoning loop, including context processing, tool selection, and response generation. 
Multiple `agent` spans may appear when the agent iterates through its reasoning loop. +| Internal agent processing that represents reasoning and decision-making. Shows time spent in the glossterm:large language model (LLM)[,LLM] reasoning loop, including context processing, tool selection, and response generation. Multiple `agent` spans may appear when the agent iterates through its reasoning loop. | Track reasoning time and identify iteration patterns. | `invoke_agent` -| Agent and sub-agent invocation in multi-agent architectures, following the https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry agent invocation semantic conventions^]. Represents one agent calling another via the A2A protocol. +| Agent and sub-agent invocation in multi-agent architectures, following the https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry agent invocation semantic conventions^]. Represents one agent calling another via the glossterm:Agent2Agent (A2A) protocol[,A2A protocol]. | Trace calls between root agents and sub-agents, measure cross-agent latency, and identify which sub-agent was invoked. | `openai`, `anthropic`, or other LLM providers @@ -256,12 +256,12 @@ Each span captures a unit of work. 
Here's what a typical MCP tool invocation loo } ---- -* **traceId** links all spans in the same request across services -* **spanId** uniquely identifies this span -* **name** identifies the operation or tool -* **instrumentationScope.name** identifies which layer created the span (`rpcn-mcp` for MCP tools, `redpanda-connect` for internal processing) -* **attributes** contain operation-specific metadata -* **status.code** indicates success (0) or error (2) +* `traceId` links all spans in the same request across services +* `spanId` uniquely identifies this span +* `name` identifies the operation or tool +* `instrumentationScope.name` identifies which layer created the span (`rpcn-mcp` for MCP tools, `redpanda-connect` for internal processing) +* `attributes` contain operation-specific metadata +* `status.code` indicates success (0) or error (2) === Parent-child relationships @@ -308,7 +308,7 @@ The `events` array captures what happened and when. Use `timeUnixNano` to see ex [[opentelemetry-traces-topic]] == How Redpanda stores trace data -The `redpanda.otel_traces` topic stores OpenTelemetry spans using Redpanda's glossterm:schema-registry[] wire format, with a custom Protobuf schema named `redpanda.otel_traces-value` that follows the https://opentelemetry.io/docs/specs/otel/protocol/[OpenTelemetry Protocol (glossterm:otlp[])^] specification. Spans include attributes following OpenTelemetry https://opentelemetry.io/docs/specs/semconv/gen-ai/[semantic conventions for generative AI^], such as `gen_ai.operation.name` and `gen_ai.conversation.id`. The schema is automatically registered in the Schema Registry with the topic, so Kafka clients can consume and deserialize trace data correctly. 
+The `redpanda.otel_traces` topic stores OpenTelemetry spans using Redpanda's glossterm:Schema Registry[] wire format, with a custom Protobuf schema named `redpanda.otel_traces-value` that follows the https://opentelemetry.io/docs/specs/otel/protocol/[OpenTelemetry Protocol (OTLP)^] specification. Spans include attributes following OpenTelemetry https://opentelemetry.io/docs/specs/semconv/gen-ai/[semantic conventions for generative AI^], such as `gen_ai.operation.name` and `gen_ai.conversation.id`. The schema is automatically registered in the Schema Registry with the topic, so Kafka clients can consume and deserialize trace data correctly. Redpanda manages both the `redpanda.otel_traces` topic and its schema automatically. If you delete either the topic or the schema, they are recreated automatically. However, deleting the topic permanently deletes all trace data, and the topic comes back empty. Do not produce your own data to this topic. It is reserved for OpenTelemetry traces. diff --git a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc index 1b50098a2..d67d38b53 100644 --- a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc +++ b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc @@ -1,7 +1,7 @@ = Ingest OpenTelemetry Traces from Custom Agents :description: Configure a Redpanda Connect pipeline to ingest OTEL traces from custom agents into Redpanda for unified observability. 
:page-topic-type: how-to -:learning-objective-1: Configure a Redpanda Connect pipeline to receive OpenTelemetry traces from custom agents via HTTP and publish them to redpanda.otel_traces +:learning-objective-1: Configure a Redpanda Connect pipeline to receive OpenTelemetry traces from custom agents via HTTP and publish them to `redpanda.otel_traces` :learning-objective-2: Validate trace data format and compatibility with existing MCP server traces :learning-objective-3: Secure the ingestion endpoint using authentication mechanisms @@ -30,7 +30,7 @@ If you're using LangChain with OpenTelemetry tracing, you can send traces to Red . Deploy a Redpanda Connect pipeline using the `otlp_http` input to receive OTLP traces over HTTP. Create the pipeline in the **Connect** page of your cluster, or see the <> section below for a sample configuration. . Configure your OTEL exporter to send traces to your Redpanda Connect pipeline using environment variables: - ++ [,bash] ---- # Configure LangChain OTEL integration @@ -97,17 +97,22 @@ For complete trace structure details, see xref:ai-agents:observability/concepts. == Configure the ingestion pipeline -Create a Redpanda Connect pipeline that receives HTTP requests containing OTLP traces and publishes them to the `redpanda.otel_traces` topic. The pipeline uses the `otlp_http` input component, which is specifically designed to receive OpenTelemetry Protocol data. +Create a Redpanda Connect pipeline that receives OTLP traces and publishes them to the `redpanda.otel_traces` topic. Choose HTTP or gRPC transport based on your agent's requirements. === Create the pipeline configuration -Create a pipeline configuration file that defines the OTLP HTTP ingestion endpoint. +Create a pipeline configuration file that defines the OTLP ingestion endpoint. 
+[tabs] +==== +HTTP:: ++ +-- The `otlp_http` input component: * Exposes an OpenTelemetry Collector HTTP receiver * Accepts traces at the standard `/v1/traces` endpoint -* Converts incoming OTLP data into individual Redpanda OTEL v1 Protobuf messages and publishes them to the `redpanda.otel_traces` topic +* Converts incoming OTLP data into individual Redpanda OTEL v1 Protobuf messages The following example shows a minimal pipeline configuration. Redpanda Cloud automatically injects authentication handling, so you don't need to configure `auth_token` in the input. @@ -126,13 +131,18 @@ output: - mechanism: "REDPANDA_CLOUD_SERVICE_ACCOUNT" topic: "redpanda.otel_traces" ---- +-- -The `otlp_http` input automatically handles format conversion, so no processors are needed for basic trace ingestion. Each span becomes a separate message in the `redpanda.otel_traces` topic. +gRPC:: ++ +-- +The `otlp_grpc` input component: -[[use-grpc]] -==== Alternative: Use gRPC instead of HTTP +* Exposes an OpenTelemetry Collector gRPC receiver +* Accepts traces via the OTLP gRPC protocol +* Converts incoming OTLP data into individual Redpanda OTEL v1 Protobuf messages -If your custom agent requires gRPC transport, use the `otlp_grpc` input instead: +The following example shows a minimal pipeline configuration. Redpanda Cloud automatically injects authentication handling. [,yaml] ---- @@ -150,7 +160,11 @@ output: topic: "redpanda.otel_traces" ---- -Clients must include the authentication token in gRPC metadata as `authorization: Bearer `. +NOTE: Clients must include the authentication token in gRPC metadata as `authorization: Bearer `. +-- +==== + +The OTLP input automatically handles format conversion, so no processors are needed for basic trace ingestion. Each span becomes a separate message in the `redpanda.otel_traces` topic. 
=== Deploy the pipeline in Redpanda Cloud diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc index 02c33f64b..9f10fe256 100644 --- a/modules/ai-agents/pages/observability/transcripts.adoc +++ b/modules/ai-agents/pages/observability/transcripts.adoc @@ -25,7 +25,7 @@ After reading this page, you will be able to: === Filter transcripts -Use filters to narrow down transcripts and quickly locate specific executions. When you use any of the filters, the transcript list updates to show only matching results. +Use filters to narrow down transcripts and quickly locate specific executions. When you use any of the filters, the transcript table updates to show only matching results. The Transcripts view provides several quick-filter buttons: @@ -38,7 +38,7 @@ The Transcripts view provides several quick-filter buttons: You can combine multiple filters to narrow results further. For example, use *Tool Calls* and *Errors Only* together to investigate failed tool executions. -Toggle *Full traces* on to see the complete execution context, in grayed-out text, for the filtered transcripts. +Toggle *Full traces* on to see the complete execution context, in grayed-out text, for the filtered transcripts in the table. ==== Filter by attribute @@ -59,7 +59,7 @@ Click on any bar in the timeline to zoom into transcripts from that specific tim [NOTE] ==== -When viewing time ranges with many transcripts (hundreds or thousands), the table displays a subset of the data to maintain performance and usability. The timeline bar indicates the actual time range of data currently loaded into the view, which may be narrower than your selected time range. +When viewing time ranges with many transcripts (hundreds or thousands), the table displays a subset of the data to maintain performance and usability. 
+When viewing time ranges with many transcripts (hundreds or thousands), the table displays a subset of the data to maintain performance and usability. The timeline bar indicates the actual time range of data currently loaded into view, which may be narrower than your selected time range. Refer to the timeline header to check the exact range and count of visible transcripts, for example, "Showing 100 of 299 transcripts from 13:17 to 15:16".
 ====
 
@@ -74,13 +74,11 @@ The transcript table shows:
 
 To view nested operations, expand any parent span. To learn more about span hierarchies and cross-service traces, see xref:ai-agents:observability/concepts.adoc[].
 
-// TODO: Confirm how gateway traces appear
-
-Click any span to view details in the right panel:
+Click any span to view details in the panel:
 
 * **Summary tab**: High-level overview with token usage, operation counts, and conversation history.
 * **Attributes tab**: Structured metadata for debugging (see xref:ai-agents:observability/concepts.adoc#key-attributes-by-layer[standard attributes by layer]).
-* **Raw data tab**: Complete glossterm:opentelemetry[] span in JSON format. You can also view raw transcript data in the `redpanda.otel_traces` topic.
+* **Raw data tab**: Complete glossterm:OpenTelemetry[] span in JSON format. You can also view raw transcript data in the `redpanda.otel_traces` topic.
 [NOTE]
 ====

From e365b09e7af1d2a640b416fc0fd3d93d6b4f034b Mon Sep 17 00:00:00 2001
From: micheleRP
Date: Sat, 7 Feb 2026 14:01:53 -0700
Subject: [PATCH 19/19] improved messaging from Alex

---
 .../pages/observability/concepts.adoc         | 21 +++++++-------------
 .../ai-agents/pages/observability/index.adoc  |  2 +-
 .../observability/ingest-custom-traces.adoc   |  2 +-
 .../pages/observability/transcripts.adoc      |  4 ++--
 4 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/modules/ai-agents/pages/observability/concepts.adoc b/modules/ai-agents/pages/observability/concepts.adoc
index 1d49a479d..aa777b29a 100644
--- a/modules/ai-agents/pages/observability/concepts.adoc
+++ b/modules/ai-agents/pages/observability/concepts.adoc
@@ -1,12 +1,12 @@
 = Transcripts and AI Observability
-:description: Understand how Redpanda captures execution transcripts using OpenTelemetry.
+:description: Understand how Redpanda captures end-to-end execution transcripts on an immutable distributed log for agent governance and observability.
 :page-topic-type: concepts
 :personas: agent_developer, platform_admin, data_engineer
 :learning-objective-1: Explain how transcripts and spans capture execution flow
 :learning-objective-2: Interpret transcript structure for debugging and monitoring
 :learning-objective-3: Distinguish between transcripts and audit logs
 
-Redpanda automatically captures glossterm:transcript[,transcripts] (also referred to as execution logs or traces) for both AI agents and MCP servers, providing complete glossterm:observability (o11y)[,observability] into how your agentic systems operate.
+Redpanda automatically captures glossterm:transcript[,transcripts] for AI agents, MCP servers, and AI Gateway operations. A transcript is the end-to-end execution record of an agentic behavior. It may span multiple agents, tools, and models and last from minutes to days. Redpanda's immutable distributed log stores every transcript, providing a correct record with no gaps.
+Transcripts form the keystone of Redpanda's governance for agents.
 
 After reading this page, you will be able to:
 
@@ -16,7 +16,7 @@ After reading this page, you will be able to:
 
 == What are transcripts
 
-The AI Gateway and every glossterm:AI agent[,agent] and glossterm:MCP server[] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. These traces provide detailed observability into operations, creating complete transcripts.
+A transcript records the complete execution of an agentic behavior from start to finish. It captures every step — across multiple agents, tools, models, and services — in a single, traceable record. The AI Gateway and every glossterm:AI agent[,agent] and glossterm:MCP server[] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. Redpanda's immutable distributed log stores these traces.
 
 Transcripts capture:
 
@@ -27,7 +27,7 @@ Transcripts capture:
 * Error conditions
 * Performance metrics
 
-With 100% sampling, every operation is captured, enabling comprehensive debugging, monitoring, and performance analysis.
+With 100% sampling, every operation is captured with no gaps. The underlying storage uses a distributed log built on Raft consensus (with TLA+ proven correctness), giving transcripts a trustworthy, immutable record for governance, debugging, and performance analysis.
 
 == Traces and spans
 
@@ -322,23 +322,16 @@ Transcripts may contain sensitive information from your tool inputs and outputs.
 
 == Transcripts compared to audit logs
 
-// TODO: Ask SME to review and confirm whether we want to rephrase or change
-// "not designed for audit logging or compliance"
-Transcripts are designed for observability and debugging, not audit logging or compliance.
+Transcripts and audit logs serve different but complementary purposes.
 
 Transcripts provide:
 
+* A complete, immutable record of every execution step, stored on Redpanda's distributed log with no gaps
 * Hierarchical view of request flow through your system (parent-child span relationships)
 * Detailed timing information for performance analysis
 * Ability to reconstruct execution paths and identify bottlenecks
-* Insights into how operations flow through distributed systems
 
-Transcripts are not:
-
-* Immutable audit records for compliance purposes
-* Designed for "who did what" accountability tracking
-
-For compliance and audit requirements, use the session and task topics for agents, which provide records of agent conversations and execution.
+Transcripts are optimized for execution-level observability and governance. For user-level accountability tracking ("who initiated what"), use the session and task topics for agents, which provide records of agent conversations and task execution.
 
 == Next steps
 
diff --git a/modules/ai-agents/pages/observability/index.adoc b/modules/ai-agents/pages/observability/index.adoc
index 92ba2a5a5..d54b6e359 100644
--- a/modules/ai-agents/pages/observability/index.adoc
+++ b/modules/ai-agents/pages/observability/index.adoc
@@ -1,5 +1,5 @@
 = Transcripts
 :page-layout: index
-:description: Monitor agent and MCP server execution using complete OpenTelemetry traces captured by Redpanda.
+:description: Govern agentic AI with complete execution transcripts built on Redpanda's immutable distributed log.
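The parent-child span relationships mentioned above are what make a transcript reconstructable as a tree. A minimal sketch of rebuilding that hierarchy from flat span records (the field names here are illustrative, not the exact OTLP schema):

```python
# Sketch: reconstruct a transcript's span tree from flat records.
# Field names ("span_id", "parent_span_id", "start") are illustrative.
from collections import defaultdict

def build_span_tree(spans):
    """Attach each span to its parent and return the root spans."""
    children = defaultdict(list)
    by_id = {s["span_id"]: s for s in spans}
    roots = []
    for span in spans:
        parent = span.get("parent_span_id")
        if parent and parent in by_id:
            children[parent].append(span)
        else:
            roots.append(span)  # no known parent: a top-level span
    for span in spans:
        # Order children by start time so the tree reads chronologically.
        span["children"] = sorted(children[span["span_id"]],
                                  key=lambda s: s["start"])
    return roots

spans = [
    {"span_id": "a", "parent_span_id": None, "name": "agent.request", "start": 0},
    {"span_id": "b", "parent_span_id": "a", "name": "llm.call", "start": 1},
    {"span_id": "c", "parent_span_id": "a", "name": "tool.invoke", "start": 2},
]
roots = build_span_tree(spans)
print(roots[0]["name"], [c["name"] for c in roots[0]["children"]])
# → agent.request ['llm.call', 'tool.invoke']
```

The same grouping works on spans consumed from the `redpanda.otel_traces` topic, provided you first map the OTLP fields into this shape.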
 
 {description}
diff --git a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc
index d67d38b53..c9eeef879 100644
--- a/modules/ai-agents/pages/observability/ingest-custom-traces.adoc
+++ b/modules/ai-agents/pages/observability/ingest-custom-traces.adoc
@@ -1,5 +1,5 @@
 = Ingest OpenTelemetry Traces from Custom Agents
-:description: Configure a Redpanda Connect pipeline to ingest OTEL traces from custom agents into Redpanda for unified observability.
+:description: Configure a Redpanda Connect pipeline to ingest OpenTelemetry traces from custom agents into Redpanda's immutable log for unified governance and observability.
 :page-topic-type: how-to
 :learning-objective-1: Configure a Redpanda Connect pipeline to receive OpenTelemetry traces from custom agents via HTTP and publish them to `redpanda.otel_traces`
 :learning-objective-2: Validate trace data format and compatibility with existing MCP server traces
diff --git a/modules/ai-agents/pages/observability/transcripts.adoc b/modules/ai-agents/pages/observability/transcripts.adoc
index 9f10fe256..94cf5bcb4 100644
--- a/modules/ai-agents/pages/observability/transcripts.adoc
+++ b/modules/ai-agents/pages/observability/transcripts.adoc
@@ -1,12 +1,12 @@
 = View Transcripts
-:description: Learn how to filter and navigate the Transcripts interface to investigate execution traces using multiple detail views and interactive timeline navigation.
+:description: Filter and navigate the Transcripts interface to investigate end-to-end agent execution records stored on Redpanda's immutable log.
 :page-topic-type: how-to
 :personas: agent_developer, platform_admin
 :learning-objective-1: Filter transcripts to find specific execution traces
 :learning-objective-2: Use the timeline interactively to navigate to specific time periods
 :learning-objective-3: Navigate between detail views to inspect span information at different levels
 
-Use the Transcripts view to investigate agent, MCP server, and AI Gateway execution traces. Filter by operation type, inspect span details, and trace issues across your agentic systems.
+Use the Transcripts view to investigate end-to-end execution records for agents, MCP servers, and AI Gateway. Each transcript captures the complete lifecycle of an agentic behavior on Redpanda's immutable distributed log. Filter by operation type, inspect span details, and trace issues across your agentic systems.
 
 For conceptual background on spans and trace structure, see xref:ai-agents:observability/concepts.adoc[].