Summary
In demo/web_demo/WebRTC_Demo, duplex mode can show normal subtitles and state transitions but produce either no audible output or very choppy output (a sentence is split into discontinuous fragments).
Environment
- macOS (Apple Silicon), M1 Max, 64GB RAM
- WebRTC_Demo started via: bash oneclick.sh start
- model: openbmb/MiniCPM-o-4_5-gguf
- LiveKit + backend + cpp server all healthy
Symptoms
- Frontend receives <state><audio_start> and <state><generate_end>; subtitles are correct
- Browser side shows remote audio track attached, but sound is missing or highly discontinuous
- C++ logs show generation succeeded and WAV chunks sent
Example (from logs):
- total generation time ~4.7s
- total audio duration ~2.0s
- RTF ~2.35x
- only a few chunks sent
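The figures above can be checked with a short calculation. This is only a sketch: it assumes audio is produced at a uniform rate during generation, which the logs do not confirm, and prebuffer_s is a name introduced here for illustration. Under that assumption, an RTF above 1 means playback drains the queue faster than generation fills it, so gaps occur unless playback starts late enough.

```python
# Numbers from the log excerpt above.
generation_s = 4.7   # total generation time
audio_s = 2.0        # total audio duration

rtf = generation_s / audio_s  # real-time factor; > 1 means slower than playback

# Assumption: audio is produced at a uniform rate during generation.
# Playback started after `prebuffer_s` stays gap-free only if
# prebuffer_s >= (rtf - 1) * audio_s.
prebuffer_s = max(0.0, (rtf - 1.0) * audio_s)

print(f"RTF = {rtf:.2f}, minimum prebuffer = {prebuffer_s:.1f} s")
```

With these numbers the script prints RTF = 2.35 and a minimum prebuffer of 2.7 s, which suggests that chunking changes alone may not fully remove gaps on this hardware.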
Root Cause (confirmed locally)
In the backend audio path, int16 PCM is fed directly into scipy.signal.resample_poly(...) in voice_chat/omni_stream.py. On this path the resampled output can become near-zero or all-zero for some inputs, so LiveKit receives effectively silent frames.
Additionally, coarse chunking, queue underflow, and an aggressive generate_end/play_end transition can make output choppy.
Minimal Fix
Before resampling:
- Convert source audio to normalized float32 in [-1, 1]
- Run resample_poly on float32
- Convert back to int16 after resample
This fixed the silent-audio issue immediately in local verification.
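The three steps above can be sketched as follows. This is a minimal illustration, not the actual code in omni_stream.py: resample_int16 is a hypothetical helper name, and the 24 kHz to 48 kHz rates in the usage lines are assumptions chosen for the example. A possible explanation for the silence (not verified here against the SciPy version the demo uses) is that the resampling filter is handled in the integer input dtype, which the float32 intermediate avoids.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_int16(pcm: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample int16 PCM via a float32 intermediate (hypothetical helper)."""
    if src_rate == dst_rate:
        return pcm.astype(np.int16, copy=True)
    # Step 1: normalize to [-1, 1) so the filter runs in floating point.
    x = pcm.astype(np.float32) / 32768.0
    # Step 2: polyphase resampling with the reduced up/down ratio.
    g = gcd(src_rate, dst_rate)
    y = resample_poly(x, dst_rate // g, src_rate // g)
    # Step 3: clip filter overshoot, then convert back to int16.
    y = np.clip(y, -1.0, 1.0)
    return (y * 32767.0).astype(np.int16)


# Usage with assumed rates: 1 s of a 440 Hz tone at 24 kHz, resampled to 48 kHz.
t = np.arange(24000) / 24000.0
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
resampled = resample_int16(tone, 24000, 48000)
```

A quick sanity check after applying the fix is to log the peak absolute sample value of the resampled buffer; a value near zero for non-silent input indicates the problem is still present.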
Suggested Code Locations
WebRTC_Demo/WebRTC_Demo/omini_backend_code/code/voice_chat/omni_stream.py
- model wav -> WebRTC path
- local TTS wav -> WebRTC path
- any helper resampling functions
Optional Improvements for Choppy Playback
- reduce output chunk size (finer granularity)
- avoid queue starvation in output_audio
- delay play_end decision to avoid premature turn-end during short gaps
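Two of these ideas, finer chunking and a delayed play_end decision, can be sketched as below. Everything here is hypothetical: split_frames, PlayEndGate, the 20 ms frame size, and the 0.3 s grace period are illustrative choices, not names or values from the demo backend, and the queue itself is left out.

```python
import time
from typing import Callable, Iterator, Optional


def split_frames(pcm: bytes, sample_rate: int, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-size mono int16 frames; the last frame is zero-padded."""
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per int16 sample
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        yield frame.ljust(frame_bytes, b"\x00")


class PlayEndGate:
    """Signal play_end only after generation ended AND the queue stayed empty for grace_s."""

    def __init__(self, grace_s: float = 0.3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_s = grace_s
        self.clock = clock
        self.generate_ended = False
        self.empty_since: Optional[float] = None

    def on_generate_end(self) -> None:
        self.generate_ended = True

    def should_end(self, queue_len: int) -> bool:
        now = self.clock()
        if queue_len > 0:
            self.empty_since = None  # audio still pending; reset the timer
            return False
        if self.empty_since is None:
            self.empty_since = now  # queue just became empty; start the timer
        return self.generate_ended and (now - self.empty_since) >= self.grace_s
```

The gate is meant to be polled from the playback loop with the current queue length, so a short gap between chunks resets nothing permanently and does not end the turn on its own.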
Repro Steps
cd demo/web_demo/WebRTC_Demo
bash oneclick.sh start
- open frontend URL, start duplex voice conversation
- observe subtitle/state normal but audio silent/choppy