Hi, thank you for your work on the WebRTC Real-Time Video Interaction Demo; it's very helpful.
I have successfully set up and run the demo, but I'm running into issues when trying to customize the pipeline components, specifically:
1. Changing STT, LLM, and TTS Components
I want to replace the default components:
- Speech-to-Text (STT)
- Language Model (LLM)
- Text-to-Speech (TTS)
However, it's unclear:
- Where these components are initialized in the codebase
- What configuration or interfaces need to be modified to swap them
- Whether there are recommended extension points or documentation for customizing these modules
Could you please provide guidance on how to properly replace or reconfigure STT, LLM, and TTS in the demo?
2. TTS Breaking / Missing During Initialization
When initializing the demo:
- The TTS component sometimes fails to load
- Audio output breaks or does not play properly
- In some cases, TTS appears to be missing or is never triggered
I would appreciate help with:
- Possible reasons for TTS failing during initialization
- Required dependencies or environment setup for TTS
- Debugging steps or logs to check
- Recommended fixes or configuration changes
Environment
- OS: Ubuntu 22.04
- Python version: 3.12
- GPU/CPU setup: NVIDIA L4 GPU (24 GB of GPU memory)
- Browser: Brave
Any guidance or documentation references would be greatly appreciated.