`server/services/s2s/openai.mdx` (118 changes: 90 additions, 28 deletions)
<ParamField path="model" type="str" default="gpt-realtime-1.5">
OpenAI Realtime model name. This is a connection-level parameter set via the
WebSocket URL and cannot be changed during the session.

**Deprecated**: Pass via `settings=OpenAIRealtimeLLMSettings(model="...")` instead.
</ParamField>

<ParamField path="session_properties" type="SessionProperties" default="None">
Configuration properties for the realtime session. These are session-level
settings that can be updated during the session (except for voice and model).
See [SessionProperties](#sessionproperties) below.

**Deprecated**: Use `settings=OpenAIRealtimeLLMSettings(session_properties=...)` instead.
</ParamField>

<ParamField path="settings" type="OpenAIRealtimeLLMSettings" default="None">
Runtime-updatable settings for this service. Preferred method for configuring
the service. See [OpenAIRealtimeLLMSettings](#openairealtimellmsettings) below.
</ParamField>

<ParamField path="start_audio_paused" type="bool" default="False">
Whether to start with audio input paused.
</ParamField>

### OpenAIRealtimeLLMSettings

Runtime-updatable settings for OpenAI Realtime. Any `SessionProperties` field can also be passed here directly and is automatically routed to the session configuration.

<ParamField path="model" type="str" default="None">
Model to use. Syncs bidirectionally with `session_properties.model`.
</ParamField>

<ParamField path="system_instruction" type="str" default="None">
System instructions for the assistant. Syncs bidirectionally with `session_properties.instructions`.
</ParamField>

<ParamField path="session_properties" type="SessionProperties" default="None">
OpenAI Realtime session properties (modalities, audio config, tools, etc.).
`model` and `instructions` fields are synced with the top-level `model` and
`system_instruction` fields. Top-level values take precedence when both are set.
</ParamField>
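The precedence rule can be sketched in plain Python. This is an illustrative helper, not part of Pipecat's API, and the field names in the docstring are only examples:

```python
def resolve_field(top_level_value, session_value):
    """Illustrative precedence rule: when both a top-level field (e.g.
    system_instruction) and its session_properties counterpart (e.g.
    instructions) are set, the top-level value wins; otherwise the
    session-level value is used."""
    return top_level_value if top_level_value is not None else session_value


print(resolve_field("Be brief.", "Be verbose."))  # -> Be brief.
print(resolve_field(None, "Be verbose."))  # -> Be verbose.
```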

### SessionProperties

Session-level configuration passed via the `session_properties` constructor argument. These settings can be updated during the session using `LLMUpdateSettingsFrame`.
### Basic Usage

```python
import os
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService

llm = OpenAIRealtimeLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-realtime-1.5",
)
```

### With Session Configuration

```python
import os

from pipecat.services.openai.realtime.llm import (
OpenAIRealtimeLLMService,
OpenAIRealtimeLLMSettings,
)
from pipecat.services.openai.realtime.events import (
SessionProperties,
AudioConfiguration,
    AudioInput,
    AudioOutput,
    InputAudioTranscription,
SemanticTurnDetection,
)

llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-realtime-1.5",
    settings=OpenAIRealtimeLLMSettings(
        system_instruction="You are a helpful assistant.",
        session_properties=SessionProperties(
            audio=AudioConfiguration(
                input=AudioInput(
                    transcription=InputAudioTranscription(model="gpt-4o-transcribe"),
                    turn_detection=SemanticTurnDetection(eagerness="medium"),
                ),
                output=AudioOutput(
                    voice="alloy",
                    speed=1.0,
                ),
            ),
            max_output_tokens=4096,
        ),
    ),
)
```

### With Disabled Turn Detection (Manual Control)

```python
import os

from pipecat.services.openai.realtime.llm import (
    OpenAIRealtimeLLMService,
    OpenAIRealtimeLLMSettings,
)
from pipecat.services.openai.realtime.events import (
    SessionProperties,
    AudioConfiguration,
    AudioInput,
)

llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-realtime-1.5",
    settings=OpenAIRealtimeLLMSettings(
        session_properties=SessionProperties(
            audio=AudioConfiguration(
                input=AudioInput(
                    turn_detection=False,
                ),
            ),
        ),
    ),
)
```

### Updating Settings at Runtime

Settings can be updated during a session using `LLMUpdateSettingsFrame`. You can pass `SessionProperties` fields directly in the settings dict, and they will be automatically routed to `session_properties`:

```python
from pipecat.frames.frames import LLMUpdateSettingsFrame

# Update session properties - keys are automatically routed to session_properties
await task.queue_frame(
    LLMUpdateSettingsFrame(
        settings={
            "instructions": "Now speak in Spanish.",
            "max_output_tokens": 2048,
        }
    )
)
```

Alternatively, you can use `LLMUpdateSettingsFrame` with a `delta` parameter for type-safe updates:

```python
from pipecat.frames.frames import LLMUpdateSettingsFrame
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMSettings
from pipecat.services.openai.realtime.events import SessionProperties

await task.queue_frame(
LLMUpdateSettingsFrame(
delta=OpenAIRealtimeLLMSettings(
system_instruction="Now speak in Spanish.",
session_properties=SessionProperties(
max_output_tokens=2048,
),
)
)
)
```

## Notes

- **New settings pattern**: Use `settings=OpenAIRealtimeLLMSettings(...)` for configuration. The legacy `session_properties` and `model` constructor parameters are deprecated but still supported.
- **Bidirectional sync**: `model` and `system_instruction` at the top level are synced with `session_properties.model` and `session_properties.instructions`. Top-level values take precedence when both are set.
- **Automatic routing**: When updating settings with `LLMUpdateSettingsFrame`, `SessionProperties` fields (like `instructions`, `output_modalities`, etc.) are automatically routed to `session_properties`.
- **Model is connection-level**: The `model` parameter is set via the WebSocket URL at connection time and cannot be changed during a session.
- **Output modalities are single-mode**: The API supports either `["text"]` or `["audio"]` output, not both simultaneously.
- **Turn detection options**: Use `TurnDetection` for traditional VAD, `SemanticTurnDetection` for AI-based turn detection, or `False` to disable server-side detection and manage turns manually.
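The automatic routing described in the notes above can be sketched in plain Python. The helper name and the field set below are hypothetical; the actual service derives the routable fields from the `SessionProperties` model:

```python
# Sample of fields that belong to SessionProperties rather than the
# top-level settings (illustrative, not the full set).
SESSION_PROPERTY_FIELDS = {"instructions", "output_modalities", "max_output_tokens"}


def route_settings(settings: dict) -> tuple[dict, dict]:
    """Split a flat settings dict into top-level fields and fields that
    should be routed into session_properties."""
    top_level, session = {}, {}
    for key, value in settings.items():
        target = session if key in SESSION_PROPERTY_FIELDS else top_level
        target[key] = value
    return top_level, session


top_level, session = route_settings(
    {"instructions": "Now speak in Spanish.", "model": "gpt-realtime-1.5"}
)
print(session)    # {'instructions': 'Now speak in Spanish.'}
print(top_level)  # {'model': 'gpt-realtime-1.5'}
```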