Add JARVIS-style TTS status reports with neural voice and HTTP API#9
Open
DannyNs wants to merge 1 commit into dyoburon:main
Conversation
- Neural TTS via edge-tts (en-GB-RyanNeural) with chunked playback for low-latency long text. The first chunk plays immediately while the rest generates in the background. Falls back to platform TTS (SAPI/say/espeak-ng) when offline.
- HTTP API server on port 7865: POST /api/speak lets any external tool (Claude Code, scripts, browser) trigger spoken feedback.
- Feedback hotkey mode (Cmd+Shift+F) pastes the transcription with endpoint instructions so the receiving LLM can speak back via the API.
- Cross-platform: Python CLI, Windows native (C#), macOS native (Swift). Native apps shell out to the edge-tts CLI with ffplay/afplay for headless playback, falling back to built-in speech synthesizers.
- Configurable via ~/.vibetotext/config.json: tts_enabled, tts_voice, tts_edge_rate, tts_edge_pitch, tts_rate, tts_volume.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
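The PR names only the route (POST /api/speak on port 7865), not the request schema. A minimal stdlib-only Python client, assuming a JSON body with a `text` field (an assumption, not documented in the PR), could look like:

```python
import json
from urllib import request

# Port 7865 comes from the PR description; the payload shape is assumed.
API_URL = "http://127.0.0.1:7865/api/speak"

def build_speak_request(text: str) -> request.Request:
    # Assumed schema: {"text": "..."} as a JSON POST body.
    body = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def speak(text: str, timeout: float = 5.0) -> None:
    """Fire-and-forget spoken feedback; requires the TTS server to be running."""
    with request.urlopen(build_speak_request(text), timeout=timeout) as resp:
        resp.read()
```

Any tool that can issue an HTTP POST (curl, a Claude Code hook, a browser `fetch`) can trigger speech the same way; nothing here is specific to Python.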
DannyNs added a commit to DannyNsITServices/devglide that referenced this pull request on Mar 18, 2026:
Split long text into rolling chunks of 2-3 sentences and pipeline generation + playback so the first chunk plays almost immediately while subsequent chunks generate in the background. Reference: dyoburon/vibetotext#9
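The rolling-chunk pipeline described above can be sketched as follows; `generate` and `play` are stand-in callables for edge-tts synthesis and ffplay/afplay playback, not the PR's actual function names:

```python
import re
from queue import Queue
from threading import Thread

def chunk_sentences(text: str, per_chunk: int = 3) -> list[str]:
    """Split text into rolling chunks of up to `per_chunk` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + per_chunk])
            for i in range(0, len(sentences), per_chunk)]

def pipelined_playback(text, generate, play, per_chunk=3):
    """Synthesize chunks in a background thread while the main thread
    plays finished chunks in order, so playback starts after chunk 1."""
    chunks = chunk_sentences(text, per_chunk)
    q: Queue = Queue(maxsize=1)  # small buffer keeps synthesis one step ahead

    def producer():
        for chunk in chunks:
            q.put(generate(chunk))  # e.g. edge-tts -> temp mp3 path
        q.put(None)                 # sentinel: no more chunks

    Thread(target=producer, daemon=True).start()
    while (audio := q.get()) is not None:
        play(audio)                 # e.g. ffplay/afplay; blocks per chunk
```

The bounded queue is the design point: chunk N+1 is being generated while chunk N plays, so perceived latency is roughly the synthesis time of the first 2-3 sentences rather than the whole text.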
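An illustrative ~/.vibetotext/config.json using the keys listed above; the values are placeholders, not defaults taken from the PR:

```json
{
  "tts_enabled": true,
  "tts_voice": "en-GB-RyanNeural",
  "tts_edge_rate": "+10%",
  "tts_edge_pitch": "+0Hz",
  "tts_rate": 180,
  "tts_volume": 1.0
}
```

The `tts_edge_*` keys presumably map to edge-tts synthesis parameters, while `tts_rate`/`tts_volume` would apply to the platform fallback synthesizers.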
```mermaid
flowchart TD
    A[🎤 User speaks\nhold hotkey + speak] --> B[Whisper transcribes\nlocal, offline]
    B --> C{Which hotkey mode?}
    C -->|Ctrl+Shift| D[Transcribe]
    C -->|Alt+Shift| E[Cleanup\nGemini refines]
    C -->|Cmd+Alt| F[Plan\nGemini generates]
    C -->|Greppy| G[Greppy\nsemantic search]
    C -->|Cmd+Shift+F| H[Feedback mode]
    D --> I[paste_at_cursor\ntext lands in editor]
    E --> I
    F --> I
    G --> I
    I --> J[speak_status via edge-tts]
    J --> K[Chunked playback pipeline]
    K --> K1{text > 100 chars?}
    K1 -->|No| K2[Single mp3 → 🔊]
    K1 -->|Yes| K3[Rolling chunks of 2-3 sentences]
    K3 --> K4[Chunk 1: generate + play 🔊]
    K3 --> K5[Chunk 2: generate in parallel...]
    K5 --> K6[Play when ready 🔊]
    H --> L[Paste transcription\n+ endpoint info]
    L --> M[LLM reads paste\ne.g. Claude Code]
    M --> N[LLM does work, then calls\nPOST /api/speak]
    N --> O[HTTP API Server\n127.0.0.1:7865]
    O --> P[edge-tts → mp3\nffplay/afplay → 🔊]
    style A fill:#4CAF50,color:#fff
    style H fill:#FF9800,color:#fff
    style O fill:#2196F3,color:#fff
    style K2 fill:#8BC34A,color:#fff
    style K6 fill:#8BC34A,color:#fff
    style P fill:#8BC34A,color:#fff
```