Refactor error handling to maintain exception context.
"fetch latest changes from upstream"
        "voice_id": self._resolved_voice,
        "language": self._resolved_lang_name,  # display name, e.g. "Hindi"
    }
    await ws.send_str(json.dumps(payload))
🔴 Missing _mark_started() call prevents TTS metrics from being emitted
The fonadalabs SynthesizeStream._run() never calls self._mark_started(), so self._started_time remains 0. In the base class _metrics_monitor_task at livekit-agents/livekit/agents/tts/tts.py:539, the check if not self._started_time causes _emit_metrics() to return early, meaning no TTS metrics (TTFB, duration, audio duration, etc.) are ever emitted for this plugin. Every other streaming TTS plugin in the repository (Sarvam, Cartesia, Deepgram, ElevenLabs, Google, etc.) calls self._mark_started() before or when sending input to the TTS service.
Suggested change:
    - await ws.send_str(json.dumps(payload))
    + self._mark_started()
    + await ws.send_str(json.dumps(payload))
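To illustrate why the missing call suppresses metrics, here is a minimal sketch using a toy stand-in for the base stream. `ToyStream` and its `emitted` list are hypothetical; they only mirror the `_started_time` check described in the comment and are not the actual livekit-agents implementation.

```python
import time


class ToyStream:
    """Hypothetical stand-in that mimics the started-time gating."""

    def __init__(self):
        self._started_time = 0.0
        self.emitted = []

    def _mark_started(self):
        # Record the start time once, when input is first sent to the service.
        if not self._started_time:
            self._started_time = time.time()

    def _emit_metrics(self):
        # Same shape as the early return described in the review comment.
        if not self._started_time:
            return
        self.emitted.append({"duration": time.time() - self._started_time})


without_mark = ToyStream()
without_mark._emit_metrics()  # returns early, nothing recorded

with_mark = ToyStream()
with_mark._mark_started()
with_mark._emit_metrics()     # records one metrics entry
```

In this toy model the stream that never calls `_mark_started()` ends with an empty `emitted` list, which matches the behavior the comment reports for the plugin.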
    if not segments:
        raise ValueError("No text received from input channel.")

    text = " ".join(segments)
🔴 " ".join(segments) introduces spurious spaces between LLM tokens
LLM tokens pushed via push_text() already contain their own whitespace (e.g. "Hello", " world", "!"). The base class itself concatenates them without spaces at livekit-agents/livekit/agents/tts/tts.py:593 (self._pushed_text += token). Using " ".join(segments) on line 288 inserts an extra space between every token, producing text like "Hello  world !" (note the doubled space) instead of "Hello world!". This corrupts the text sent to the TTS API, degrading speech quality. The fix is to use "".join(segments) instead.
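The difference can be checked directly with the three example tokens from the comment. This is a small standalone illustration, not code from the plugin.

```python
# Tokens as an LLM might stream them: each carries its own leading whitespace.
tokens = ["Hello", " world", "!"]

spaced = " ".join(tokens)  # adds a separator at every token boundary
joined = "".join(tokens)   # preserves the text exactly as generated

print(repr(spaced))  # 'Hello  world !'
print(repr(joined))  # 'Hello world!'
```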
            f"[FonadaLabs] Could not load catalog from {FONADALABS_SUPPORTED_VOICES_URL}: {exc}. "
            "Language/voice validation will be skipped — server will validate instead."
        )
        _catalog_cache = _Catalog(voices={}, code_to_name={}, name_to_code={})
🟡 Failed catalog fetch is permanently cached, preventing recovery after transient errors
When _load_catalog fails (e.g., due to a transient network error), it caches an empty _Catalog at line 123. Because subsequent calls check if _catalog_cache is not None at line 77 and return the cached empty catalog immediately, the plugin never retries fetching the catalog. The _invalidate_catalog() function exists but is only called for specific TTS server error types (unsupported_voice, invalid_language). If the user's input happens to be valid (e.g., language="Hindi", voice="Vaanee"), the server may accept the request and _invalidate_catalog is never triggered, so the empty catalog persists for the entire process lifetime. This means client-side language/voice validation is permanently disabled after a single transient failure at startup.
Prompt for agents
In livekit-plugins/livekit-plugins-fonadalabs/livekit/plugins/fonadalabs/tts.py, the _load_catalog function at lines 118-123 caches an empty _Catalog on failure. Instead, it should NOT cache the result on failure, so that subsequent calls can retry. Change line 123 from caching an empty catalog to returning a temporary empty catalog without setting _catalog_cache. For example, remove the assignment to _catalog_cache in the except block and instead return a fresh empty _Catalog(voices={}, code_to_name={}, name_to_code={}) directly. This way, the next call to _load_catalog will retry the API fetch. To prevent retry storms, consider adding an asyncio.Lock and/or a backoff timer.
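One possible shape for that change is sketched below, under stated assumptions. `CatalogLoader`, the injected `fetch` callable, the 30-second default backoff, and the example voice entry are all hypothetical. The plugin itself uses a module-level `_catalog_cache`, so this class only illustrates the idea of caching successes, never failures, behind an `asyncio.Lock` with a simple backoff.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class Catalog:
    voices: dict = field(default_factory=dict)
    code_to_name: dict = field(default_factory=dict)
    name_to_code: dict = field(default_factory=dict)


class CatalogLoader:
    """Hypothetical loader: caches successes only, backs off after failures."""

    def __init__(self, fetch, backoff_s=30.0):
        self._fetch = fetch          # async callable returning a Catalog
        self._backoff_s = backoff_s  # assumed value, tune as needed
        self._cache = None           # set only after a successful fetch
        self._next_retry_at = 0.0    # monotonic time before which fetching is skipped
        self._lock = None            # created lazily inside the running loop

    async def load(self):
        if self._cache is not None:
            return self._cache
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._cache is not None:  # filled while waiting for the lock
                return self._cache
            if time.monotonic() < self._next_retry_at:
                return Catalog()         # still backing off, validation skipped
            try:
                self._cache = await self._fetch()
                return self._cache
            except Exception:
                # Do not cache the failure; only delay the next attempt.
                self._next_retry_at = time.monotonic() + self._backoff_s
                return Catalog()


async def _demo():
    calls = {"n": 0}

    async def flaky_fetch():
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("transient failure")
        return Catalog(voices={"Vaanee": "Hindi"})  # made-up catalog shape

    loader = CatalogLoader(flaky_fetch, backoff_s=0.0)
    first = await loader.load()   # fetch fails, empty catalog returned
    second = await loader.load()  # retried, real catalog cached
    third = await loader.load()   # served from cache, no further fetch
    return first, second, third, calls["n"]


first, second, third, fetch_calls = asyncio.run(_demo())
```

The lock keeps concurrent callers from issuing duplicate fetches, and the backoff deadline bounds how often a failing endpoint is retried. A zero backoff is used in the demo only so that the retry happens immediately.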
Summary
Adds a new TTS plugin, livekit-plugins-fonadalabs, for the FonadaLabs API, a high-quality text-to-speech service specializing in Indian languages.
Features
Environment Variable
FONADALABS_API_KEY: FonadaLabs API key