Allow passing pre-computed phonemes to Kokoro TTS #946
Merged
IgorSwat merged 3 commits into software-mansion:main on Mar 9, 2026
Add `generateFromPhonemes` / `streamFromPhonemes` methods that accept pre-computed IPA phoneme strings, bypassing the built-in phonemis pipeline. This enables users to plug in any external G2P system (e.g. the Python `phonemizer` library, espeak-ng, or custom phonemizers) while still using the Kokoro synthesis engine.

Changes across all layers:
- C++ Kokoro: new public methods + shared impl helpers + UTF-8→UTF-32 conversion
- JSI ModelHostObject: expose new methods via `promiseHostFunction`
- TextToSpeechModule: `forwardFromPhonemes()` and `streamFromPhonemes()`
- useTextToSpeech hook: corresponding hook methods
- Types: `TextToSpeechPhonemeInput`, `TextToSpeechStreamingPhonemeInput`
…xtract shared helpers, add input validation
Hey @yocontra! Thank you very much for your contribution. I reviewed the code and it looks pretty good, and the new API methods indeed make a lot of sense. I can also reveal that we have started working on a big update to both the Phonemis phonemizer and the Kokoro-based core, which will add support for new languages and significantly improve the model's performance. So if you are interested in this module, stay tuned for that :)
IgorSwat approved these changes on Mar 9, 2026
Right now, if you want to use Kokoro TTS, you have to go through the built-in phonemis G2P pipeline; there's no way around it. This PR adds `generateFromPhonemes` / `streamFromPhonemes` methods that let you skip phonemis and pass your own IPA phoneme strings directly to the synthesis engine.

Why would you want this? A few reasons we've run into:
What changed
The existing `generate()` / `stream()` methods now delegate to shared internal helpers (`generateFromPhonemesImpl` / `streamFromPhonemesImpl`). The new public methods call the same helpers but skip the `phonemizer_.process()` step, so there is no behavior change for existing callers.

Changes across layers:
- `Kokoro`: `generateFromPhonemes`, `streamFromPhonemes` + input validation (empty string, invalid UTF-8)
- `ModelHostObject`: exposes the new methods
- `TextToSpeechModule`: `forwardFromPhonemes()`, `streamFromPhonemes()` (shared `streamImpl` helper, no copy-paste)
- `useTextToSpeech` hook: same, with a shared guard + streaming orchestration
- Types: `TextToSpeechPhonemeInput`, `TextToSpeechStreamingPhonemeInput`, `TextToSpeechStreamingCallbacks`

Usage
Test plan
- `generate()` and `stream()` still work (refactor is internal)
- `generateFromPhonemes()` with known Kokoro IPA strings
- `streamFromPhonemes()` produces the same audio as `stream()` for identical phonemes
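To illustrate why the refactor should leave existing callers unaffected, here is a toy C++ model of the delegation: `generate()` / `stream()` phonemize and then call the shared `*Impl` helpers, while the new `*FromPhonemes` methods call those helpers directly. `phonemize`, `synthesizeChunk`, the fixed-size chunking, and all signatures are hypothetical stand-ins, not Kokoro's real code. Because both paths share one implementation, the last two test-plan items reduce to checking that identical phoneme input yields identical output.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy sketch of the delegation pattern. `phonemize` and `synthesizeChunk`
// are hypothetical stand-ins for phonemizer_.process() and the Kokoro model.
using Audio = std::vector<float>;
using ChunkCallback = std::function<void(const Audio&)>;

// Toy G2P: widens each ASCII character to a code point (a real G2P emits IPA).
std::u32string phonemize(const std::string& text) {
  return std::u32string(text.begin(), text.end());
}

// Toy "synthesis": one deterministic sample per phoneme.
Audio synthesizeChunk(const std::u32string& phonemes) {
  Audio out;
  for (char32_t p : phonemes) out.push_back(static_cast<float>(p) / 1000.0f);
  return out;
}

// Shared helpers: everything that happens after phonemization.
Audio generateFromPhonemesImpl(const std::u32string& phonemes) {
  return synthesizeChunk(phonemes);
}

void streamFromPhonemesImpl(const std::u32string& phonemes, std::size_t chunkSize,
                            const ChunkCallback& onChunk) {
  if (chunkSize == 0) return;  // avoid an infinite loop on a zero chunk size
  for (std::size_t i = 0; i < phonemes.size(); i += chunkSize) {
    onChunk(synthesizeChunk(phonemes.substr(i, chunkSize)));
  }
}

// Existing text-based entry points: phonemize, then delegate.
Audio generate(const std::string& text) {
  return generateFromPhonemesImpl(phonemize(text));
}

void stream(const std::string& text, std::size_t chunkSize, const ChunkCallback& onChunk) {
  streamFromPhonemesImpl(phonemize(text), chunkSize, onChunk);
}

// New phoneme-based entry points: skip phonemization entirely.
Audio generateFromPhonemes(const std::u32string& phonemes) {
  return generateFromPhonemesImpl(phonemes);
}

void streamFromPhonemes(const std::u32string& phonemes, std::size_t chunkSize,
                        const ChunkCallback& onChunk) {
  streamFromPhonemesImpl(phonemes, chunkSize, onChunk);
}
```

The real engine's chunking and model output are far more involved; the sketch only shows that the phoneme-based paths add no synthesis logic of their own, which is what makes "same audio for identical phonemes" a meaningful regression check.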