livekit · toubatbrian · Jan 16, 2026 · Jan 21, 2026 · Jan 27, 2026 · Jan 27, 2026
diff --git a/.changeset/fair-beers-wave.md b/.changeset/fair-beers-wave.md
diff --git a/.gitignore b/.gitignore
@@ -199,6 +199,8 @@ examples/src/test_*.ts
 !CONTRIBUTING.md
 !.CODE_OF_CONDUCT.md
 
+!**/CLAUDE.md
+
 # OpenTelemetry trace test output
 .traces/
 *.traces.json
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,136 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+LiveKit Agents for Node.js — a TypeScript framework for building realtime, multimodal, and voice AI agents that run on servers. This is the Node.js distribution of the [LiveKit Agents framework](https://github.com/livekit/agents) (originally Python).
+
+## Monorepo Structure
+
+- **`agents/`** — Core framework (`@livekit/agents`). Contains agent orchestration, LLM/STT/TTS abstractions, voice pipeline, metrics, IPC/process pooling, and the CLI.
+- **`plugins/`** — Provider plugins (`@livekit/agents-plugin-*`). Each implements one or more of: LLM, STT, TTS, VAD, EOU (end-of-utterance), or Avatar.
+- **`examples/`** — Example agents (private, not published). Run with `pnpm dlx tsx ./examples/src/<file>.ts dev`.
+
+**Tooling:** pnpm 9.7.0 workspaces, Turborepo for builds, tsup for bundling (CJS + ESM), TypeScript 5.4+, Vitest for tests, Changesets for versioning.
+
+## Common Commands
+
+```bash
+pnpm build                  # Build all packages (turbo)
+pnpm clean:build            # Clean dist/ dirs then rebuild
+pnpm test                   # Run all tests (vitest)
+pnpm test -- agents/src/llm                    # Run tests by path (positional filter)
+pnpm test -- --testNamePattern="chat context"  # Run tests by name
+pnpm test:watch             # Watch mode
+pnpm lint                   # ESLint all packages
+pnpm lint:fix               # ESLint with auto-fix
+pnpm format:check           # Prettier check
+pnpm format:write           # Prettier format
+pnpm api:check              # API Extractor validation
+pnpm api:update             # Update API declarations
+```
+
+### Running an example agent
+
+```bash
+pnpm build && pnpm dlx tsx ./examples/src/basic_agent.ts dev --log-level=debug
+```
+
+Required env vars: `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`, plus provider keys (e.g. `OPENAI_API_KEY`).
+
+### Debugging individual plugins
+
+Create a test file prefixed with `test_` in `examples/src/`. No `defineAgent` wrapper needed — just import the plugin directly and run:
+
+```bash
+pnpm build && node ./examples/src/test_my_plugin.ts
+```
+
+## Architecture
+
+Each module under `agents/src/` has its own `CLAUDE.md` with detailed architecture notes. High-level overview:
+
+- **Voice pipeline** (`voice/`): Audio In → VAD → STT → LLM → TTS → Audio Out. `AgentSession` orchestrates the pipeline and `AgentActivity` manages the state machine. `defineAgent({ prewarm, entry })` is the entrypoint pattern.
+- **LLM** (`llm/`): `ChatContext` (chronologically ordered), `ChatMessage`, tool calling with Zod schemas, `handoff()` for multi-agent transfers. Provider format adapters for OpenAI and Google.
+- **STT** (`stt/`): `SpeechStream` with automatic retry. `StreamAdapter` converts non-streaming STT + VAD to streaming.
+- **TTS** (`tts/`): `SynthesizeStream`, `ChunkedStream`. `FallbackAdapter` for multi-provider failover. `StreamAdapter` for non-streaming providers.
+- **VAD** (`vad.ts`): Voice Activity Detection interface. Silero plugin is the primary implementation.
+- **Inference** (`inference/`): LiveKit Inference Gateway clients. Always use full `provider/model` format (e.g., `'openai/gpt-4o-mini'`).
+- **Stream** (`stream/`): Composable Web Streams API primitives (`StreamChannel`, `DeferredReadableStream`, `MultiInputStream`).
+- **IPC** (`ipc/`): Process pool for running agents in child processes. Two-way IPC: child sends inference requests back to parent.
+- **Worker** (`worker.ts`): Main process connecting to LiveKit server, receives job assignments, spawns agent processes.
+- **Plugins** (`plugins/`): Each extends the `Plugin` base class. Package naming pattern: `@livekit/agents-plugin-<provider>`. Exports typed implementations (e.g., `openai.LLM`, `deepgram.STT`).
+
+## Code Conventions
+
+- **License header** required on every new file:
+  ```
+  // SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+  //
+  // SPDX-License-Identifier: Apache-2.0
+  ```
+- **Prettier**: single quotes, trailing commas, 100 char width, sorted imports.
+- **ESLint**: `@typescript-eslint` with strict rules. Prefix unused vars with `_`. Use `type` imports (`consistent-type-imports`).
+- **TypeScript**: strict mode, `noUncheckedIndexedAccess`, `verbatimModuleSyntax`, target ES2022, module node16.
+- **Time units**: Use milliseconds for all time-based values by default. Only use seconds when the name explicitly ends with `InS`.
+- **Changesets**: All packages in `agents/` and `plugins/` release together (fixed versioning). Run `pnpm changeset` to add a changeset before PRing. The examples package is ignored.
+- **API Extractor**: Public API surface is tracked. Run `pnpm api:check` after changing exports and `pnpm api:update` to update declarations.
+
+## Testing
+
+- **Framework**: Vitest with 5s default timeout.
+- **Pattern**: `*.test.ts` files co-located with source.
+- **Snapshots**: Used in LLM chat/tool context tests (`agents/src/llm/__snapshots__/`).
+- **Inference LLM tests**: Always use full model names from `agents/src/inference/models.ts` (e.g. `'openai/gpt-4o-mini'`, not `'gpt-4o-mini'`). Initialize logger first: `initializeLogger({ pretty: true })`.
+
+## Porting from Python (`livekit-agents`)
+
+When porting features or fixes from the Python `livekit-agents` repo to this JS/TS repo, follow these rules:
+
+### 1. Python reference comments (`// Ref`)
+
+Every JS change that corresponds to a Python change must carry an inline reference comment directly above the relevant line(s):
+
+```ts
+// Ref: python <relative-file-path> - <line-range> lines
+```
+
+Examples:
+
+```ts
+// Ref: python livekit-agents/livekit/agents/voice/agent_session.py - 362-369 lines
+private _aecWarmupRemaining = 0;
+
+// Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 1236-1240 lines
+if (this.agentSession._aecWarmupRemaining > 0) { ... }
+```
+
+Use the Python file path relative to the repo root. Include the line range from the Python diff so reviewers can cross-reference directly.
+
+### 2. Time unit unification
+
+Python uses **seconds** (`float`) for all time values. JS/TS uses **milliseconds** (`number`) by default.
+
+When porting a Python time parameter:
+
+- Multiply the Python default by `1000` for the JS default (e.g. `3.0 s` → `3000 ms`)
+- Use `setTimeout` / `clearTimeout` directly with the ms value — do **not** multiply by `1000` at call sites
+- Name the field in plain form (e.g. `aecWarmupDuration`, `userAwayTimeout`) — the ms convention is implied
+- Only use seconds as the unit if the variable name explicitly ends with `InS` (e.g. `delayInS`)
+
+Example mapping:
+
+| Python                                            | JS/TS                                      |
+| ------------------------------------------------- | ------------------------------------------ |
+| `aec_warmup_duration: float = 3.0`                | `aecWarmupDuration: number \| null = 3000` |
+| `user_away_timeout: float = 15.0`                 | `userAwayTimeout: number \| null = 15000`  |
+| `loop.call_later(self._aec_warmup_remaining, cb)` | `setTimeout(cb, this._aecWarmupRemaining)` |
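The mapping above can be sketched in TypeScript as follows. The field names come from the table; `secondsToMs`, `SessionTimingOptions`, and `scheduleUserAway` are hypothetical names used only for illustration and are not claimed to exist in the framework.

```ts
// Hypothetical porting helper: convert a Python default (seconds) once, at
// definition time, so call sites never multiply by 1000.
function secondsToMs(seconds: number): number {
  return Math.round(seconds * 1000);
}

// Illustrative options shape; field names follow the table above.
type SessionTimingOptions = {
  aecWarmupDuration: number | null; // milliseconds; null disables the warmup
  userAwayTimeout: number | null; // milliseconds; null disables the timer
};

const defaultTimingOptions: SessionTimingOptions = {
  aecWarmupDuration: secondsToMs(3.0), // Python: aec_warmup_duration = 3.0
  userAwayTimeout: secondsToMs(15.0), // Python: user_away_timeout = 15.0
};

// The millisecond value goes straight into setTimeout. Returns a cancel function.
function scheduleUserAway(cb: () => void, opts: SessionTimingOptions): () => void {
  if (opts.userAwayTimeout === null) return () => {};
  const timer = setTimeout(cb, opts.userAwayTimeout);
  return () => clearTimeout(timer);
}
```

Converting at the definition keeps every downstream consumer in one unit, which is the point of the `InS` suffix rule: a seconds value is the exception and has to say so in its name.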
+
+## CI Requirements
+
+- REUSE/SPDX license compliance
+- ESLint passes
+- Prettier formatting passes
+- Full build succeeds
+- Base branch: `main`
diff --git a/agents/src/inference/CLAUDE.md b/agents/src/inference/CLAUDE.md
@@ -0,0 +1,21 @@
+<!--
+SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# CLAUDE.md
+
+LiveKit Inference Gateway clients for LLM, STT, and TTS. Provides unified interface over LiveKit's cloud inference service.
+
+## Key Classes
+
+- **LLM** — OpenAI-compatible client pointing at LiveKit Inference Gateway. Dynamic JWT token generation for auth. Supports provider format adapters (OpenAI, Google).
+- **STT** — WebSocket-based STT client. Streams audio as base64 frames in 50ms chunks. Supports live model/language switching via reconnect events.
+- **TTS** — WebSocket-based TTS client (sibling pattern to STT).
+
+## Non-Obvious Patterns
+
+- **Model strings must be `provider/model` format**: e.g., `'openai/gpt-4o-mini'`, `'deepgram/nova-3'`. Never just `'gpt-4o-mini'`.
+- **STT language parsing**: Parses `model:language` from the model string (e.g., `'deepgram/nova-3:en'`).
+- **STT fallback chains**: If primary model fails, gateway tries fallback models in order.
+- **Zod validation**: All gateway protocol messages validated with Zod schemas in `api_protos.ts`.
+- **Google thought_signature**: LLMStream preserves `thoughtSignature` across parallel tool calls in a batch, only resets at end of response.
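As a rough illustration of the model-string rules above, the sketch below splits a `provider/model[:language]` string and rejects bare model names. `parseModelString` and `ParsedModel` are hypothetical names for this example; the gateway clients' real parsing code may differ.

```ts
type ParsedModel = {
  provider: string;
  model: string;
  language?: string;
};

// Hypothetical parser: accepts 'provider/model' with an optional ':language' suffix.
function parseModelString(input: string): ParsedModel {
  const colon = input.lastIndexOf(':');
  const base = colon === -1 ? input : input.slice(0, colon);
  const language = colon === -1 ? '' : input.slice(colon + 1);

  const slash = base.indexOf('/');
  if (slash <= 0 || slash === base.length - 1) {
    // A bare name such as 'gpt-4o-mini' has no provider prefix.
    throw new Error(`expected 'provider/model', got '${input}'`);
  }

  const parsed: ParsedModel = {
    provider: base.slice(0, slash),
    model: base.slice(slash + 1),
  };
  if (language.length > 0) parsed.language = language;
  return parsed;
}
```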
+
+## Subdirectory
+
+- `interruption/` — Advanced interrupt detection logic (ML-based adaptive detector).
diff --git a/agents/src/ipc/CLAUDE.md b/agents/src/ipc/CLAUDE.md
@@ -0,0 +1,22 @@
+<!--
+SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# CLAUDE.md
+
+Inter-process communication for running agents in child Node.js processes.
+
+## Key Classes
+
+- **ProcPool** — Manages warm process pool. Pre-spawns processes and queues them for reuse. Uses `MultiMutex` to control warm process count.
+- **SupervisedProc** — Base class for child process lifecycle: health monitoring (ping/pong), memory limits (warns at threshold, kills at limit), graceful shutdown.
+- **JobProcExecutor** — Extends SupervisedProc. Forks child process for job execution. Handles inference requests from child by delegating to parent's `InferenceExecutor`.
+- **InferenceExecutor** — Interface with single `doInference(method, data)` method. Runs in parent process to share GPU/model resources.
+
+## IPC Protocol
+
+Strongly-typed message union in `message.ts`: `initializeRequest/Response`, `pingRequest/pongResponse`, `startJobRequest`, `shutdownRequest`, `inferenceRequest/Response`, `exiting`, `done`.
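A message union like the one listed above could be modeled as a discriminated union. In this sketch the message names come from the list, while the `case`/`value` layout and every payload field are assumptions made for illustration, not the contents of `message.ts`.

```ts
// Payload shapes are assumed; only the message names are taken from the list above.
type IPCMessage =
  | { case: 'initializeRequest' }
  | { case: 'initializeResponse' }
  | { case: 'pingRequest'; value: { timestamp: number } }
  | { case: 'pongResponse'; value: { lastTimestamp: number; timestamp: number } }
  | { case: 'startJobRequest'; value: { jobId: string } }
  | { case: 'shutdownRequest' }
  | { case: 'inferenceRequest'; value: { requestId: string; method: string; data: unknown } }
  | { case: 'inferenceResponse'; value: { requestId: string; data: unknown } }
  | { case: 'exiting' }
  | { case: 'done' };

// Exhaustive switch: adding a new message to the union without handling it
// here becomes a compile-time error through the `never` assignment.
function expectsResponse(msg: IPCMessage): boolean {
  switch (msg.case) {
    case 'initializeRequest':
    case 'pingRequest':
    case 'inferenceRequest':
      return true;
    case 'initializeResponse':
    case 'pongResponse':
    case 'startJobRequest':
    case 'shutdownRequest':
    case 'inferenceResponse':
    case 'exiting':
    case 'done':
      return false;
    default: {
      const unreachable: never = msg;
      return unreachable;
    }
  }
}

// Narrowing on `case` gives typed access to the matching payload.
function replyToPing(msg: IPCMessage, now: number): IPCMessage | undefined {
  if (msg.case !== 'pingRequest') return undefined;
  return { case: 'pongResponse', value: { lastTimestamp: msg.value.timestamp, timestamp: now } };
}
```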
+
+## Non-Obvious Patterns
+
+- **Two-way IPC**: Child sends inference requests → parent executes with shared models → parent sends results back. This avoids loading models in every child process.
+- **TypeScript child process**: `createProcess()` detects TS files and passes appropriate `execArgv` so the TS loader works in the child.
+- **Future-based sync**: `init` and `join` Futures prevent race conditions during process startup and shutdown.
+- **Graceful shutdown**: Sends `shutdownRequest`, waits up to `closeTimeout`, then forceful `kill()`.
+- **Only `InferenceExecutor` is publicly exported** — the rest is internal.
diff --git a/agents/src/llm/CLAUDE.md b/agents/src/llm/CLAUDE.md
@@ -0,0 +1,36 @@
+<!--
+SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# CLAUDE.md
+
+LLM integration: chat context management, tool/function calling, provider format adapters, and realtime model abstractions.
+
+## Key Classes
+
+- **ChatContext** — Ordered container of `ChatItem` (ChatMessage | FunctionCall | FunctionCallOutput | AgentHandoffItem). Items sorted by `createdAt` timestamp, enabling out-of-order insertion.
+- **ChatMessage** — Single message with polymorphic content (string, ImageContent, AudioContent). Role: 'developer' | 'system' | 'user' | 'assistant'.
+- **FunctionCall / FunctionCallOutput** — Tool invocation and result, matched by `callId`. FunctionCall has `groupId` for parallel calls and `thoughtSignature` for Gemini thinking mode.
+- **ReadonlyChatContext** — Immutable wrapper that throws on mutation. Used in callbacks.
+- **LLM / LLMStream** — Abstract base classes for all LLM plugins. LLMStream handles retry with exponential backoff and metrics (TTFT, token counts).
+- **RealtimeModel / RealtimeSession** — Abstractions for streaming/realtime APIs (e.g., OpenAI Realtime).
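The `createdAt` ordering described for `ChatContext` can be illustrated with a binary-search insert. This is a minimal sketch under the assumption that items carry a numeric `createdAt`; `Item`, `findInsertionIndex`, and `insertSorted` are hypothetical names, not the class's actual methods.

```ts
type Item = { id: string; createdAt: number };

// Index of the first item whose createdAt is strictly greater than the target,
// so items with equal timestamps keep their arrival order.
function findInsertionIndex(items: readonly Item[], createdAt: number): number {
  let lo = 0;
  let hi = items.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (items[mid]!.createdAt <= createdAt) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Inserts in place while preserving chronological order.
function insertSorted(items: Item[], item: Item): void {
  items.splice(findInsertionIndex(items, item.createdAt), 0, item);
}
```

Inserting after equal timestamps rather than before them keeps the operation stable, which matters when several items are stamped within the same tick.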
+
+## Tool System (`tool_context.ts`)
+
+- `tool({ description, parameters, execute })` — Factory function. Parameters accept Zod v3, Zod v4, or raw JSON Schema.
+- `handoff({ agent, returns })` — Return from tool to transfer to another agent.
+- **Symbol-based type markers**: Tools use private symbols (`TOOL_SYMBOL`, `FUNCTION_TOOL_SYMBOL`, etc.) for runtime discrimination — prevents spoofing.
+- **ToolOptions**: Tools receive `{ ctx: RunContext<UserData>, toolCallId, abortSignal }`.
+
+## Provider Format Adapters (`provider_format/`)
+
+Three formats: `'openai'`, `'openai.responses'`, `'google'`.
+
+- **`groupToolCalls()`** — Core algorithm shared by all adapters. Groups assistant messages with their tool calls and outputs by ID/groupId.
+- **OpenAI**: Standard chat completions format with `tool_calls` array and `tool` role responses.
+- **Google**: Turn-based with parts array. System messages extracted separately. Injects dummy user message (`.`) if last turn isn't user (Gemini requirement). Preserves `thoughtSignature` for thinking-mode models.
+- **Image caching**: `ImageContent._cache` stores serialized versions to avoid re-encoding across provider conversions.
+
+## Non-Obvious Patterns
+
+- **Chronological insertion**: `ChatContext` maintains sorted order by `createdAt`. Late-arriving items (e.g., streamed chunks with timestamps) are inserted in correct position.
+- **LCS-based diff**: `computeChatCtxDiff()` uses longest common subsequence for minimal create/remove operations — used by `RemoteChatContext` for IPC sync.
+- **RemoteChatContext**: Linked-list based context for incremental updates. Insert by previous item ID, convert back via `toChatCtx()`.
+- **Zod dual-version**: `zod-utils.ts` auto-detects Zod v3 (`_def.typeName`) vs v4 (`_zod` property) and routes schema conversion accordingly.
+- **FallbackAdapter**: Multi-LLM failover with availability tracking, recovery tasks, and `availability_changed` events.
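To make the LCS-based diff idea concrete, here is a simplified sketch that works on item IDs only. It assumes IDs are unique within each list; `computeLcs`, `computeDiff`, and the `DiffOps` shape are illustrative stand-ins and are not claimed to match `computeChatCtxDiff()`.

```ts
type DiffOps = {
  toRemove: string[];
  toCreate: Array<[string | null, string]>; // [previousItemId, itemId]
};

// Classic dynamic-programming longest common subsequence.
function computeLcs(a: readonly string[], b: readonly string[]): string[] {
  const n = a.length;
  const m = b.length;
  const dp: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      dp[i]![j] =
        a[i - 1] === b[j - 1] ? dp[i - 1]![j - 1]! + 1 : Math.max(dp[i - 1]![j]!, dp[i]![j - 1]!);
    }
  }
  // Walk back through the table to recover one LCS.
  const out: string[] = [];
  let i = n;
  let j = m;
  while (i > 0 && j > 0) {
    if (a[i - 1] === b[j - 1]) {
      out.push(a[i - 1]!);
      i--;
      j--;
    } else if (dp[i - 1]![j]! >= dp[i]![j - 1]!) {
      i--;
    } else {
      j--;
    }
  }
  return out.reverse();
}

// Items in the LCS are kept; everything else is removed from the old list or
// created after its predecessor in the new list.
function computeDiff(oldIds: readonly string[], newIds: readonly string[]): DiffOps {
  const keep = new Set(computeLcs(oldIds, newIds));
  const toRemove = oldIds.filter((id) => !keep.has(id));
  const toCreate: Array<[string | null, string]> = [];
  let prev: string | null = null;
  for (const id of newIds) {
    if (!keep.has(id)) toCreate.push([prev, id]);
    prev = id;
  }
  return { toRemove, toCreate };
}
```

Recording the previous item ID with each creation is what lets a linked-list style remote context apply the operations incrementally.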
diff --git a/agents/src/metrics/CLAUDE.md b/agents/src/metrics/CLAUDE.md
@@ -0,0 +1,14 @@
+<!--
+SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# CLAUDE.md
+
+Per-model/provider usage tracking and aggregation for billing and analytics.
+
+## Key Components
+
+- **ModelUsageCollector** — Aggregates metrics by `provider:model` key. Handles both standard LLM metrics and RealtimeModel metrics (with token detail breakdowns: text, image, audio, cached).
+- **Usage types**: `LLMModelUsage`, `TTSModelUsage`, `STTModelUsage`, `InterruptionModelUsage` — each with provider-specific fields.
+- **`filterZeroValues()`** — Strips zero-valued fields from usage objects for clean JSON output.
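The aggregation and zero-stripping described above might look roughly like this. The usage shape is deliberately reduced to three counters, and `UsageAggregator` is a hypothetical stand-in for `ModelUsageCollector`; the `filterZeroValues` signature here is an assumption as well.

```ts
// Reduced, illustrative usage record; the real usage types carry more fields.
type LLMUsage = {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cachedTokens: number;
};

class UsageAggregator {
  private byKey = new Map<string, LLMUsage>();

  // Accumulates counters under a `provider:model` key.
  collect(m: LLMUsage): void {
    const key = `${m.provider}:${m.model}`;
    const current = this.byKey.get(key);
    if (!current) {
      this.byKey.set(key, { ...m });
      return;
    }
    current.inputTokens += m.inputTokens;
    current.outputTokens += m.outputTokens;
    current.cachedTokens += m.cachedTokens;
  }

  flatten(): LLMUsage[] {
    return Array.from(this.byKey.values());
  }
}

// Drops fields whose value is exactly 0 so serialized output stays compact.
function filterZeroValues(usage: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(usage)) {
    if (value !== 0) out[key] = value;
  }
  return out;
}
```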
+
+## Non-Obvious Patterns
+
+- **Session duration tracking**: Some models (xAI) bill by session duration rather than tokens — tracked in `sessionDurationMs`.
+- **UsageCollector is deprecated** — use `ModelUsageCollector` instead.
diff --git a/agents/src/stream/CLAUDE.md b/agents/src/stream/CLAUDE.md
@@ -0,0 +1,17 @@
+<!--
+SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# CLAUDE.md
+
+Low-level async stream composition primitives built on the Web Streams API (`ReadableStream`, `WritableStream`, `TransformStream`).
+
+## Key Classes
+
+- **StreamChannel<T, E>** — Bidirectional stream: write to it, read from it. `addStreamInput()` launches async reader loops to pipe external streams in.
+- **DeferredReadableStream<T>** — Readable stream where the actual source is set later via `setSource()`. Supports detach/reattach.
+- **MultiInputStream<T>** — Fan-in multiplexer: N dynamic inputs → 1 output. Inputs can be added/removed at runtime. Output stays open after all inputs end (waits for new inputs).
+- **IdentityTransform<T>** — Pass-through `TransformStream` with `highWaterMark` set to `MAX_SAFE_INTEGER` to prevent backpressure.
+- **mergeReadableStreams()** — Functional merge of N streams (adapted from Deno). If one errors, merged output closes.
+
+## Non-Obvious Patterns
+
+- **IdentityTransform high water mark**: Intentionally disables backpressure on both sides. This follows the Python agents `channel.py` pattern — needed for concurrent sources.
+- **Reader lock cleanup**: TypeErrors from releasing already-released locks are caught and ignored throughout. This is intentional.
+- **MultiInputStream resilience**: Errors in one input don't kill the output stream. Failed inputs are removed silently.
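The high-water-mark note can be illustrated with the standard `TransformStream` constructor, which takes separate queuing strategies for the writable and readable sides. This is a sketch of how such a class could be written, not a copy of the repository's implementation; `writeThenRead` is a hypothetical demo function.

```ts
// Sketch only: a pass-through TransformStream whose queues never report backpressure.
class IdentityTransform<T> extends TransformStream<T, T> {
  constructor() {
    super(
      { transform: (chunk, controller) => controller.enqueue(chunk) },
      { highWaterMark: Number.MAX_SAFE_INTEGER }, // writable side
      { highWaterMark: Number.MAX_SAFE_INTEGER }, // readable side
    );
  }
}

// Writes complete before any reader is attached, because chunks move straight
// into the readable side's queue instead of waiting for demand.
async function writeThenRead(values: readonly number[]): Promise<number[]> {
  const stream = new IdentityTransform<number>();
  const writer = stream.writable.getWriter();
  for (const value of values) await writer.write(value);
  await writer.close();

  const reader = stream.readable.getReader();
  const out: number[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    out.push(value);
  }
  return out;
}
```

With default strategies, the awaited writes would stall until a reader pulled, so the trade-off is unbounded buffering in exchange for writers that never block.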
diff --git a/agents/src/stt/CLAUDE.md b/agents/src/stt/CLAUDE.md
@@ -0,0 +1,24 @@
+<!--
+SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# CLAUDE.md
+
+Speech-to-text abstractions with streaming, VAD-based adapters, and automatic retry.
+
+## Key Classes
+
+- **STT** — Abstract base. Subclasses implement `_recognize()` (one-shot) and `stream()` (streaming). Emits `metrics_collected` and `error` events.
+- **SpeechStream** — Async iterable consuming audio frames via `pushFrame()`, yielding `SpeechEvent` objects. Handles audio resampling internally if sample rates don't match.
+- **StreamAdapter** — Wraps a non-streaming STT + VAD to create a streaming interface. Buffers audio during speech, calls `recognize()` on end-of-speech.
+
+## Architecture
+
+```
+pushFrame() → AudioResampler (if needed) → AsyncIterableQueue → run() (provider impl) → output queue → consumer
+```
+
+## Non-Obvious Patterns
+
+- **Three-queue architecture**: An input queue, an intermediate queue (for metrics monitoring), and an output queue run concurrently.
+- **FLUSH_SENTINEL**: Private static symbol signals flush operations internally without creating actual events.
+- **startSoon() in constructor**: Defers `mainTask()` until after constructor completes to avoid accessing uninitialized fields.
+- **Resampler created on-demand**: Only instantiated when first frame with different sample rate arrives.
+- **Retry with exponential backoff**: `mainTask()` retries on `APIError`/`APIConnectionError`; other errors are immediately fatal.
+- **startTimeOffset**: Can offset transcription timestamps for stream resumption.
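The retry rule above (retry API errors with exponential backoff, fail fast on anything else) can be sketched like this. The error classes are local stand-ins declared for the example, and `RetryOptions`, `nextRetryDelayMs`, and `runWithRetry` are hypothetical names with assumed parameters.

```ts
// Local stand-ins for the framework's error types (assumed hierarchy).
class APIError extends Error {}
class APIConnectionError extends APIError {}

type RetryOptions = { maxRetry: number; baseDelayMs: number; maxDelayMs: number };

// Returns the delay before the next attempt, or null when the error is fatal
// or the retry budget is exhausted. `attempt` counts failures so far, from 0.
function nextRetryDelayMs(error: unknown, attempt: number, opts: RetryOptions): number | null {
  if (!(error instanceof APIError)) return null;
  if (attempt >= opts.maxRetry) return null;
  return Math.min(opts.baseDelayMs * 2 ** attempt, opts.maxDelayMs);
}

// Runs `run` until it succeeds or nextRetryDelayMs says to give up.
async function runWithRetry<T>(
  run: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (error) {
      const delay = nextRetryDelayMs(error, attempt, opts);
      if (delay === null) throw error;
      await sleep(delay);
    }
  }
}
```

Keeping the delay calculation in a pure function makes the policy testable without timers, and the injectable `sleep` does the same for the loop.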
diff --git a/agents/src/telemetry/CLAUDE.md b/agents/src/telemetry/CLAUDE.md
@@ -0,0 +1,16 @@
+<!--
+SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# CLAUDE.md
+
+OpenTelemetry integration for distributed tracing, logging, and session report uploads.
+
+## Key Components
+
+- **DynamicTracer** — Runtime-switchable tracer provider wrapper. Global instance exported as `tracer`.
+- **setupCloudTracer()** — Complete cloud observability setup: OTLP exporter, metadata span processor, Pino cloud log exporter. Uses JWT for auth.
+- **uploadSessionReport()** — Uploads chat history (JSON), metrics (protobuf header), and audio (OGG) to LiveKit Cloud via multipart FormData.
+- **MetadataLogProcessor / ExtraDetailsProcessor** — Inject room_id, job_id, logger names into all log records.
+
+## Non-Obvious Patterns
+
+- **Monotonic timestamp ordering**: Session report adds 1μs offsets to colliding timestamps to ensure correct dashboard display ordering.
+- **Dynamic tracer provider**: Can change tracer provider mid-session (used when cloud connection establishes after startup).
+- **Metadata injection**: All spans automatically tagged with room_id and job_id via `MetadataSpanProcessor`.
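One way to picture the monotonic timestamp rule: walk the sorted timestamps and push any collision forward by 1 µs. The sketch assumes timestamps are integer microseconds in ascending order; `makeStrictlyIncreasing` is a hypothetical name, and the real session report code may resolve collisions differently.

```ts
// Input: non-decreasing integer timestamps in microseconds.
// Output: strictly increasing timestamps, with collisions bumped by 1 µs each.
function makeStrictlyIncreasing(timestampsUs: readonly number[]): number[] {
  const out: number[] = [];
  for (const ts of timestampsUs) {
    const prev = out.length > 0 ? out[out.length - 1]! : Number.NEGATIVE_INFINITY;
    out.push(ts <= prev ? prev + 1 : ts);
  }
  return out;
}
```

Epoch timestamps in microseconds are on the order of 10^15, which is still below `Number.MAX_SAFE_INTEGER`, so plain numbers are sufficient for this sketch.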
diff --git a/agents/src/tokenize/CLAUDE.md b/agents/src/tokenize/CLAUDE.md
@@ -0,0 +1,15 @@
+<!--
+SPDX-FileCopyrightText: 2026 LiveKit, Inc.
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# CLAUDE.md
+
+Streaming text tokenization for real-time TTS. Incrementally splits text into sentences or words with configurable buffering.
+
+## Key Classes
+
+- **SentenceTokenizer / WordTokenizer** — Abstract bases with `tokenize()` (batch) and `stream()` (streaming) methods.
+- **BufferedTokenStream** — Core streaming implementation. Buffers input until `minContextLength`, then tokenizes and holds output until `minTokenLength` before emitting. Each `flush()` generates a new `segmentId`.
+- **Basic implementations** (`basic/`) — Default English tokenizers using rule-based sentence/word splitting. Includes hyphenation support.
+
+## Non-Obvious Patterns
+
+- **Designed for TTS pipeline**: Text arrives incrementally from LLM streaming. Tokenizer buffers enough context for accurate sentence boundaries before emitting.
+- **Tuple tokens**: Some tokenizers return `[text, startPos, endPos]` tuples for position tracking, not just strings.
+- **Segment tracking**: `flush()` creates new segment IDs, allowing consumers to distinguish continuous speech from intentional breaks.
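The two-stage buffering described for `BufferedTokenStream` can be sketched synchronously. Everything here is a simplification under stated assumptions: `BufferedTokenSketch` is a hypothetical class, the segment ID is a plain counter, and the sentence splitter is a naive regex rather than the rule-based tokenizer in `basic/`.

```ts
type TokenData = { segmentId: number; token: string };
type Tokenize = (text: string) => string[];

// Naive splitter for the demo: break after ., ! or ? followed by whitespace.
const splitSentences: Tokenize = (text) => text.split(/(?<=[.!?])\s+/);

class BufferedTokenSketch {
  private inBuf = '';
  private outBuf = '';
  private segmentId = 0;
  private readonly tokenize: Tokenize;
  private readonly minContextLength: number;
  private readonly minTokenLength: number;

  constructor(tokenize: Tokenize, minContextLength: number, minTokenLength: number) {
    this.tokenize = tokenize;
    this.minContextLength = minContextLength;
    this.minTokenLength = minTokenLength;
  }

  pushText(text: string): TokenData[] {
    this.inBuf += text;
    const emitted: TokenData[] = [];
    // Stage 1: wait for enough context before trusting token boundaries.
    if (this.inBuf.length < this.minContextLength) return emitted;

    const tokens = this.tokenize(this.inBuf);
    // The last token may still be growing, so it stays in the input buffer.
    for (const tok of tokens.slice(0, -1)) {
      if (tok.length === 0) continue;
      this.outBuf += (this.outBuf.length > 0 ? ' ' : '') + tok;
      // Stage 2: hold output until it is long enough to emit.
      if (this.outBuf.length >= this.minTokenLength) {
        emitted.push({ segmentId: this.segmentId, token: this.outBuf });
        this.outBuf = '';
      }
    }
    this.inBuf = tokens.length > 0 ? tokens[tokens.length - 1]! : '';
    return emitted;
  }

  // Emits whatever is buffered and starts a new segment.
  flush(): TokenData[] {
    const parts = [this.outBuf, ...this.tokenize(this.inBuf)].filter((s) => s.length > 0);
    const emitted = parts.length > 0 ? [{ segmentId: this.segmentId, token: parts.join(' ') }] : [];
    this.inBuf = '';
    this.outBuf = '';
    this.segmentId += 1;
    return emitted;
  }
}
```

Holding back the final token is what prevents a sentence from being cut at a chunk boundary when the LLM is still streaming its remainder.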