WebRTC Real-Time Video Interaction Demo — Custom Pipeline & TTS Issues #105

@ajayjakkampudi

Hi, thank you for your work on the WebRTC Real-Time Video Interaction Demo — it's very helpful.

I have the demo set up and running, but I'm running into issues when customizing the pipeline components, specifically:

1. Changing STT, LLM, and TTS Components

I want to replace the default:

  • Speech-to-Text (STT)

  • Language Model (LLM)

  • Text-to-Speech (TTS)

However, it’s unclear:

  • Where these components are initialized in the codebase

  • What configuration or interfaces need to be modified to swap them

  • Whether there are recommended extension points or documentation for customizing these modules

Could you please provide guidance on how to properly replace or reconfigure STT, LLM, and TTS in the demo?

2. TTS Breaking / Missing During Initialization

When initializing the demo:

  • The TTS component sometimes fails to load

  • Audio output breaks or does not play properly

  • In some cases, TTS appears to be missing or not triggered

I would appreciate help with:

  • Possible reasons for TTS failing during initialization

  • Required dependencies or environment setup for TTS

  • Debugging steps or logs to check

  • Recommended fixes or configuration changes

Environment

OS: Ubuntu 22.04

Python Version: 3.12

GPU/CPU Setup: NVIDIA L4, 24 GB of GPU memory

Browser: Brave
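
If more environment detail would help with debugging, it could be collected with something like the sketch below. It uses only the Python standard library; the function names `collect_environment` and `format_report` and the choice of fields are illustrative assumptions, not anything the demo defines or requires.

```python
import platform
import sys


def collect_environment() -> dict:
    """Return basic OS and interpreter details for a bug report."""
    return {
        "os": platform.platform(),            # OS name, release, and architecture
        "python": platform.python_version(),  # interpreter version string
        "executable": sys.executable,         # path of the running interpreter
        "machine": platform.machine(),        # CPU architecture
    }


def format_report(info: dict) -> str:
    """Render the collected details as one 'key: value' line per field."""
    return "\n".join(f"{key}: {value}" for key, value in info.items())


if __name__ == "__main__":
    print(format_report(collect_environment()))
```

Running it inside the same virtual environment as the demo would confirm which interpreter is actually in use.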

Any guidance or documentation references would be greatly appreciated.
