
Allow passing pre-computed phonemes to Kokoro TTS #946

Merged
IgorSwat merged 3 commits into software-mansion:main from yocontra:feat/custom-phonemizer-escape-hatch on Mar 9, 2026

@yocontra (Contributor) commented Mar 8, 2026

Right now, if you want to use Kokoro TTS, you have to go through the built-in phonemis G2P (grapheme-to-phoneme) pipeline; there's no way around it. This PR adds generateFromPhonemes / streamFromPhonemes methods that let you skip phonemis and pass your own IPA phoneme strings directly to the synthesis engine.

Why would you want this? A few reasons we've run into:

  • phonemis doesn't handle every word well. Libraries like phonemizer (espeak-ng backend) do better on edge cases, foreign words, etc.
  • Custom lexicons. If you have domain-specific pronunciation (game character names, medical terms), you probably want control over the G2P step.
  • Server-side G2P. Pre-compute phonemes on a server with a proper NLP pipeline and send them to the device.
  • Languages phonemis doesn't cover yet.
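To illustrate the custom-lexicon case, here is a minimal sketch that builds a phoneme string from a per-word override table and falls back to some other G2P for every other word. The names `Lexicon`, `G2P`, and `toPhonemes` are hypothetical, and the IPA strings are placeholders rather than verified Kokoro pronunciations; the resulting string is what you would then hand to `forwardFromPhonemes`.

```typescript
// Hypothetical per-word pronunciation overrides (lowercased word -> IPA).
type Lexicon = ReadonlyMap<string, string>;

// Hypothetical signature for any external G2P (phonemizer, a server call, ...).
type G2P = (word: string) => string;

// Leading/trailing ASCII punctuation stripped before lexicon lookup (deliberately naive).
const EDGE_PUNCT = /^[.,!?;:"'()]+|[.,!?;:"'()]+$/g;

// Converts text to a space-separated phoneme string, preferring lexicon
// entries and delegating unknown words to the fallback G2P.
function toPhonemes(text: string, lexicon: Lexicon, fallbackG2P: G2P): string {
  return text
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => {
      const word = token.toLowerCase().replace(EDGE_PUNCT, "");
      if (word.length === 0) return "";
      return lexicon.get(word) ?? fallbackG2P(word);
    })
    .filter((phonemes) => phonemes.length > 0)
    .join(" ");
}
```

For example, `toPhonemes("Hello, Zelda!", new Map([["zelda", "zɛldə"]]), (w) => `<${w}>`)` yields `"<hello> zɛldə"`: the override wins for "zelda" and the fallback handles "hello". A real setup would also need to carry punctuation and stress marks through in whatever form Kokoro expects.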

What changed

The existing generate() / stream() methods now delegate to shared internal helpers (generateFromPhonemesImpl / streamFromPhonemesImpl). The new public methods call the same helpers but skip the phonemizer_.process() step. No behavior change for existing callers.
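The refactor described above is a "shared implementation" pattern: both public entry points funnel into one private helper, and the only difference is whether the G2P step runs first. Below is a minimal TypeScript analogue of that shape, not the actual C++ code. Only the method names `generate`, `generateFromPhonemes`, and `generateFromPhonemesImpl` mirror the PR; the class `PhonemeSynth`, the injected `Phonemizer`, and the toy "synthesis" are assumptions made for illustration.

```typescript
// Hypothetical stand-in for the phonemis step (phonemizer_.process() in the C++ code).
type Phonemizer = (text: string) => string;

class PhonemeSynth {
  private readonly phonemize: Phonemizer;

  constructor(phonemize: Phonemizer) {
    this.phonemize = phonemize;
  }

  // Existing text entry point: run G2P, then reuse the shared helper.
  generate(text: string, speed = 1.0): number[] {
    return this.generateFromPhonemesImpl(this.phonemize(text), speed);
  }

  // New entry point: the caller supplies phonemes, so G2P is skipped.
  generateFromPhonemes(phonemes: string, speed = 1.0): number[] {
    return this.generateFromPhonemesImpl(phonemes, speed);
  }

  // Single shared path, so both entry points produce identical output
  // whenever they end up with identical phonemes.
  private generateFromPhonemesImpl(phonemes: string, speed: number): number[] {
    // Toy "synthesis": one sample per code point, scaled by speed.
    return Array.from(phonemes, (ch) => (ch.codePointAt(0) ?? 0) * speed);
  }
}
```

Because the text path only adds a pre-processing step, existing callers keep exactly the same behavior, which is the property the PR relies on.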

Changes across layers:

  • C++ Kokoro: generateFromPhonemes, streamFromPhonemes + input validation (empty string, invalid UTF-8)
  • JSI ModelHostObject: exposes new methods
  • TextToSpeechModule: forwardFromPhonemes(), streamFromPhonemes() (shared streamImpl helper, no copy-paste)
  • useTextToSpeech hook: the same two methods, with a shared guard + streaming orchestration
  • Types: TextToSpeechPhonemeInput, TextToSpeechStreamingPhonemeInput, TextToSpeechStreamingCallbacks
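The C++ layer rejects empty and invalid-UTF-8 input and converts UTF-8 to UTF-32, so multi-byte IPA symbols such as ʊ or ŋ become single code points. A TypeScript sketch of equivalent checks on raw bytes might look like the following; `decodePhonemes` is a hypothetical name and the error messages are not the library's.

```typescript
// Strict decoder: fatal=true makes decode() throw a TypeError on malformed
// UTF-8 instead of silently substituting U+FFFD.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

// Validates a UTF-8 phoneme buffer and returns its code points (a UTF-32 view).
function decodePhonemes(bytes: Uint8Array): number[] {
  if (bytes.length === 0) {
    throw new Error("phonemes must not be empty");
  }
  let text: string;
  try {
    text = strictUtf8.decode(bytes);
  } catch {
    throw new Error("phonemes are not valid UTF-8");
  }
  // Array.from iterates a string by code point, not by UTF-16 unit,
  // so each IPA symbol maps to exactly one entry.
  return Array.from(text, (ch) => ch.codePointAt(0)!);
}
```

For instance, the 7-byte UTF-8 encoding of "həloʊ" decodes to 5 code points, with ə as U+0259 and ʊ as U+028A.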

Usage

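import { TextToSpeechModule } from "react-native-executorch";

const tts = new TextToSpeechModule();
await tts.load(config); // config: the same Kokoro model/voice config used with the text API

// text path (unchanged: goes through phonemis)
const textAudio = await tts.forward("Hello world");

// phoneme path (bypasses phonemis; input must be IPA that Kokoro understands)
const phonemeAudio = await tts.forwardFromPhonemes("həloʊ wɝːld");

// streaming from phonemes; playAudio is your own playback function
for await (const chunk of tts.streamFromPhonemes({ phonemes: "həloʊ wɝːld", speed: 1.0 })) {
  playAudio(chunk);
}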

Test plan

  • Existing generate() and stream() still work (refactor is internal)
  • generateFromPhonemes() with known Kokoro IPA strings
  • streamFromPhonemes() produces the same audio as stream() when both receive identical phonemes
  • Multi-byte UTF-8 phoneme characters (ʊ, ɪ, ŋ, etc.)
  • Empty string and invalid UTF-8 rejected with proper error
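For the item comparing streamFromPhonemes() with stream(), one approach is to concatenate each stream's chunks and measure the largest per-sample difference, which is insensitive to where chunk boundaries fall. This sketch assumes chunks arrive as `Float32Array`; `concatChunks`, `collectStream`, and `maxAbsDiff` are hypothetical helpers, not part of the library.

```typescript
// Concatenate audio chunks into one contiguous buffer.
function concatChunks(chunks: readonly Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Drain an async stream (e.g. the iterator returned by streamFromPhonemes) into one buffer.
async function collectStream(stream: AsyncIterable<Float32Array>): Promise<Float32Array> {
  const chunks: Float32Array[] = [];
  for await (const chunk of stream) chunks.push(chunk);
  return concatChunks(chunks);
}

// Largest absolute per-sample difference; Infinity if the lengths differ.
function maxAbsDiff(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) return Infinity;
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}
```

A result of 0, or one below a small tolerance, indicates that both paths produced the same audio.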

yocontra added 2 commits March 7, 2026 21:07
Add `generateFromPhonemes` / `streamFromPhonemes` methods that accept
pre-computed IPA phoneme strings, bypassing the built-in phonemis
pipeline. This enables users to plug in any external G2P system
(e.g. the Python `phonemizer` library, espeak-ng, or custom
phonemizers) while still using the Kokoro synthesis engine.

Changes across all layers:
- C++ Kokoro: new public methods + shared impl helpers + UTF-8→UTF-32
- JSI ModelHostObject: expose new methods via promiseHostFunction
- TextToSpeechModule: `forwardFromPhonemes()` and `streamFromPhonemes()`
- useTextToSpeech hook: corresponding hook methods
- Types: `TextToSpeechPhonemeInput`, `TextToSpeechStreamingPhonemeInput`
@yocontra changed the title "feat: custom phonemizer escape hatch (BYO G2P)" to "Allow passing pre-computed phonemes to Kokoro TTS" on Mar 8, 2026
@benITo47 requested a review from IgorSwat on Mar 9, 2026 06:09
@IgorSwat (Contributor) commented Mar 9, 2026

Hey @yocontra !

Thank you very much for your contribution. I reviewed the code and it looks pretty good, and the new API methods indeed make a lot of sense.
I just added some changes to the docs to match the new API.

I can also reveal that we have started working on a big update to both the Phonemis phonemizer and the Kokoro-based core, which will add support for new languages and significantly improve the model's performance. So if you are interested in this module, stay tuned :)

@IgorSwat merged commit 310465d into software-mansion:main on Mar 9, 2026
5 of 7 checks passed
