feat(gladia, soniox): add translation support with input_language/input_text on SpeechData #5111
Open
MSameerAbbas wants to merge 4 commits into livekit:main
Conversation
Author

Hey @tinalenguyen, I saw this was assigned to you - hope it's helpful! Would love your review.
Rewrite the Soniox STT plugin to support all WebSocket API features and fix structural issues in the streaming implementation.

New features:
- Real-time translation (one-way and two-way) via TranslationConfig
- Configurable max_endpoint_delay_ms (500-3000ms)
- Typed Literal autocomplete for models, languages, and translation type
- Flush sentinel mapped to FINALIZE_MSG for clean session shutdown

Structural fixes:
- Remove dead reconnect machinery (_reconnect_event was never set)
- Eliminate unnecessary intermediate audio queue (2 tasks -> 1)
- Pass ws as parameter to subtasks instead of mutable self._ws
- Single connection lifecycle in _run(); base class handles retries
- Proper error semantics (5xx -> APIConnectionError, 4xx -> APIStatusError)
- Raise on unexpected WS closure instead of silent hang
- Handle _FlushSentinel (was silently dropped)
- Remove unreachable except clause

Translation design:
- alternatives[0] = original text (always present)
- alternatives[1] = translated text (when translation is enabled)
- Fully backward-compatible: all consumers read alternatives[0]
- Dict-keyed accumulators with no special cases

Refs: livekit#4943
… translation support

Add input_language and input_text fields to the core SpeechData dataclass so STT plugins can expose the original spoken text alongside translations. Update both Soniox and Gladia plugins to populate these fields.

- SpeechData.input_language: the detected language spoken by the user
- SpeechData.input_text: the original transcription before translation
- Soniox: use dict-keyed accumulators with _pick_primary selection
- Gladia: extract original utterance from translation message data
- Replaces the previous alternatives[1] approach with first-class fields
…dpoint

Emit END_OF_SPEECH based on speaking state, not final text presence. Previously both were inside the same conditional, so if an error or finished message arrived while speaking but before final tokens accumulated, END_OF_SPEECH was skipped. This left downstream consumers in speaking state with no turn detection triggered. Only affects agents using turn_detection=stt (no VAD). Pre-existing bug also present on main and livekit#5148.
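The fix described in this commit can be sketched as a before/after of one conditional. All names here (`on_endpoint`, `is_speaking`, `final_text`, `emit_event`) are illustrative stand-ins, not the plugin's actual identifiers.

```python
from typing import Callable

def on_endpoint(
    is_speaking: bool,
    final_text: str,
    emit_event: Callable[[str, object], None],
) -> None:
    # Before the fix, both events were gated on final text, so an
    # endpoint reached with no accumulated final tokens skipped
    # END_OF_SPEECH too:
    #
    #   if is_speaking and final_text:
    #       emit_event("FINAL_TRANSCRIPT", final_text)
    #       emit_event("END_OF_SPEECH", None)
    #
    # After: FINAL_TRANSCRIPT depends only on text presence,
    # END_OF_SPEECH only on speaking state.
    if final_text:
        emit_event("FINAL_TRANSCRIPT", final_text)
    if is_speaking:
        emit_event("END_OF_SPEECH", None)
```

With the decoupled checks, a speaker that stops without finalized text still gets its turn closed out.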
Summary
Adds real-time translation support to the Soniox and Gladia STT plugins by introducing `input_language` and `input_text` fields on the core `SpeechData` dataclass. Also includes a cleanup of the Soniox `SpeechStream` to align with patterns used by other plugins like Deepgram.

Closes #4943, closes #4402. Supersedes #5148.
Core type change
Two new optional fields on `SpeechData` in `livekit-agents/livekit/agents/stt/stt.py`:

- `input_language: LanguageCode | None` -- the detected/input language spoken by the user. Populated by STT services that support translation, where `language` holds the target language and `input_language` holds the original spoken language.
- `input_text: str | None` -- the original transcription in the input language, when translation is active.

Both default to `None`, so existing behavior is completely unchanged.

Soniox: new features + SpeechStream cleanup
New features:
- `TranslationConfig` dataclass with `__post_init__` validation
- `max_endpoint_delay_ms` (500-3000ms) for tuning endpoint detection latency
- `models.py` with `Literal` type aliases (`SonioxRTModels`, `SonioxLanguages`) for IDE autocomplete -- follows the same pattern as the Google STT plugin
- Flush sentinel mapped to `FINALIZE_MSG` for clean session shutdown. Previously not handled.

SpeechStream cleanup (while adding translation, simplified the streaming implementation to match other plugins):
- Single `_run()` that connects, runs tasks, and cleans up. The base class `_main_task()` already handles retry logic, so the plugin doesn't need its own retry loop.
- The unnecessary intermediate `audio_queue` between `_prepare_audio_task` and `_send_audio_task` was consolidated into a single `_send_task` that reads `_input_ch` directly.
- `ws` is passed as a parameter to subtasks instead of mutable `self._ws`, similar to how the Deepgram plugin passes connection state.
- Errors are mapped to `APIConnectionError` (5xx) or `APIStatusError` (4xx) so the base class can decide whether to retry. Unexpected WebSocket closure raises instead of silently returning.
- `END_OF_SPEECH` decoupled from `FINAL_TRANSCRIPT` in `flush_endpoint` -- previously both were gated on final text presence, so an error arriving mid-speech (after interim tokens but before finalization) would skip `END_OF_SPEECH`, leaving downstream consumers stuck in speaking state. Pre-existing bug also present on `main`. Only affects `turn_detection="stt"` (no VAD).

Translation design: Dict-keyed accumulators (`"original"` / `"translation"`) route tokens by `translation_status`. At output time, `_pick_primary` selects the translation accumulator if it has content, otherwise falls back to original. `_build_speech_data` attaches `input_text`/`input_language` from the original accumulator when the primary is a translation. When translation is off, all tokens route to `"original"` and the identity check (`primary is not original`) skips the input fields -- one code path, no flags, no branching.
Gladia: translation fields

Surgical addition of `input_language` and `input_text` to the existing translation handler in `_process_gladia_message`. Extracts the original utterance language and text from the translation message data and attaches them to the `SpeechData`. No structural changes.
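A hedged sketch of that extraction: the message keys used here (`utterance`, `translated_utterance`) are assumptions about Gladia's translation payload, not a confirmed schema, and the handler name is illustrative.

```python
def handle_translation_message(msg: dict) -> dict:
    """Illustrative extraction of original + translated utterance
    from a Gladia translation message (schema assumed)."""
    data = msg.get("data", {})
    original = data.get("utterance", {})               # assumed key
    translated = data.get("translated_utterance", {})  # assumed key
    return {
        "text": translated.get("text", ""),
        "language": translated.get("language"),
        # New fields: carry the pre-translation utterance alongside.
        "input_text": original.get("text") or None,
        "input_language": original.get("language"),
    }
```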
What was NOT changed

- `_TokenAccumulator` class -- kept as-is, added a `merge` classmethod
- `STT` class -- kept as-is
- `STTOptions` defaults preserved (model, sample_rate, num_channels, etc.)
- Context types (`ContextObject`, `ContextGeneralItem`, `ContextTranslationTerm`) -- unchanged

Files changed
- `livekit-agents/livekit/agents/stt/stt.py` -- `input_language`, `input_text` to `SpeechData`
- `livekit-plugins/.../soniox/stt.py`
- `livekit-plugins/.../soniox/__init__.py` -- `TranslationConfig`, `SonioxLanguages`, `SonioxRTModels`
- `livekit-plugins/.../soniox/models.py` -- `Literal` type aliases
- `livekit-plugins/.../gladia/stt.py` -- `input_language`/`input_text` to translation handler

Test plan
- Translation enabled -- verified `input_text` and `input_language` are populated
- Translation disabled -- `input_text` and `input_language` are `None`, identical to previous behavior
- `max_endpoint_delay_ms` -- verified API accepts the parameter
- `TranslationConfig` validation -- verified `__post_init__` catches missing required fields
- `END_OF_SPEECH` lifecycle -- verified `flush_endpoint` emits `END_OF_SPEECH` independently of final text presence

Refs: #4943, #4402