Add JARVIS-style TTS status reports with neural voice and HTTP API#9

Open
DannyNs wants to merge 1 commit into dyoburon:main from DannyNs:feature/tts-status-reports
Conversation

@DannyNs DannyNs commented Mar 18, 2026

  • Neural TTS via edge-tts (en-GB-RyanNeural) with chunked playback for low-latency long text. The first chunk plays immediately while the rest generates in the background. Falls back to platform TTS (SAPI/say/espeak-ng) when offline.

  • HTTP API server on port 7865: POST /api/speak lets any external tool (Claude Code, scripts, browser) trigger spoken feedback.

  • Feedback hotkey mode (Cmd+Shift+F) pastes transcription with endpoint instructions so the receiving LLM can speak back via the API.

  • Cross-platform: Python CLI, Windows native (C#), macOS native (Swift). Native apps shell out to edge-tts CLI with ffplay/afplay for headless playback, falling back to built-in speech synthesizers.

  • Configurable via ~/.vibetotext/config.json: tts_enabled, tts_voice, tts_edge_rate, tts_edge_pitch, tts_rate, tts_volume.
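The HTTP endpoint above can be exercised from any language. A minimal Python sketch follows; note the `{"text": ...}` JSON payload shape is an assumption for illustration, since the PR only documents the route and port:

```python
import json
import urllib.request

def build_speak_request(text: str) -> urllib.request.Request:
    """Build a POST to the local speak API.

    The endpoint (127.0.0.1:7865, POST /api/speak) comes from the PR
    description; the JSON body field "text" is a hypothetical schema.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        "http://127.0.0.1:7865/api/speak",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speak_request("Build finished, all tests passing.")
# Sending it would be: urllib.request.urlopen(req)  # requires the server running
```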

```mermaid
flowchart TD
    A[🎤 User speaks\nhold hotkey + speak] --> B[Whisper transcribes\nlocal, offline]
    B --> C{Which hotkey mode?}

    C -->|Ctrl+Shift| D[Transcribe]
    C -->|Alt+Shift| E[Cleanup\nGemini refines]
    C -->|Cmd+Alt| F[Plan\nGemini generates]
    C -->|Greppy| G[Greppy\nsemantic search]
    C -->|Cmd+Shift+F| H[Feedback mode]

    D --> I[paste_at_cursor\ntext lands in editor]
    E --> I
    F --> I
    G --> I

    I --> J[speak_status via edge-tts]
    J --> K[Chunked playback pipeline]

    K --> K1{text > 100 chars?}
    K1 -->|No| K2[Single mp3 → 🔊]
    K1 -->|Yes| K3[Rolling chunks of 2-3 sentences]
    K3 --> K4[Chunk 1: generate + play 🔊]
    K3 --> K5[Chunk 2: generate in parallel...]
    K5 --> K6[Play when ready 🔊]

    H --> L[Paste transcription\n+ endpoint info]
    L --> M[LLM reads paste\ne.g. Claude Code]
    M --> N[LLM does work, then calls\nPOST /api/speak]

    N --> O[HTTP API Server\n127.0.0.1:7865]
    O --> P[edge-tts → mp3\nffplay/afplay → 🔊]

    style A fill:#4CAF50,color:#fff
    style H fill:#FF9800,color:#fff
    style O fill:#2196F3,color:#fff
    style K2 fill:#8BC34A,color:#fff
    style K6 fill:#8BC34A,color:#fff
    style P fill:#8BC34A,color:#fff
```
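The chunking decision in the diagram (short text stays whole; longer text becomes rolling chunks of 2-3 sentences) can be sketched as follows. The sentence-splitting regex and the exact thresholds are simplifying assumptions, not the PR's implementation:

```python
import re

def chunk_sentences(text: str, per_chunk: int = 3, threshold: int = 100) -> list[str]:
    """Split long text into rolling chunks of a few sentences each.

    Mirrors the diagram's branch: text at or under `threshold` chars is
    returned as a single chunk; longer text is cut so the first chunk can
    start playing while later chunks are still being generated.
    """
    if len(text) <= threshold:
        return [text]
    # Naive sentence boundary: whitespace after ., !, or ? (assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + per_chunk])
            for i in range(0, len(sentences), per_chunk)]
```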

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DannyNs added a commit to DannyNsITServices/devglide that referenced this pull request on Mar 18, 2026:
Split long text into rolling chunks of 2-3 sentences and pipeline
generation + playback so the first chunk plays almost immediately
while subsequent chunks generate in the background.

Reference: dyoburon/vibetotext#9
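The "pipeline generation + playback" idea in that commit can be illustrated with a small producer/consumer sketch. `generate` and `play` are hypothetical callables standing in for edge-tts synthesis and ffplay/afplay playback; the one-deep buffer is an assumed design choice:

```python
import queue
import threading

def pipeline(chunks, generate, play):
    """Overlap TTS generation with playback.

    A worker thread generates each chunk's audio while the main thread
    plays finished chunks in order, so chunk 1 starts playing while
    chunk 2 is still rendering.
    """
    q: queue.Queue = queue.Queue(maxsize=1)  # stay ~1 chunk ahead of playback

    def worker():
        for chunk in chunks:
            q.put(generate(chunk))  # blocks while the buffer is full
        q.put(None)                 # sentinel: no more audio

    threading.Thread(target=worker, daemon=True).start()
    while (audio := q.get()) is not None:
        play(audio)
```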