Conversation
.../react-native-executorch/common/rnexecutorch/models/speech_to_text/common/schema/OnlineASR.h
Outdated
Show resolved
Hide resolved
packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/ASR.h
Outdated
Show resolved
Hide resolved
3ca7f15 to
ea943e4
Compare
packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.h
Outdated
Show resolved
Hide resolved
7b1e6ff to
2ee6d1d
Compare
msluszniak
left a comment
There was a problem hiding this comment.
Some comments are not needed imo
packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.h
Outdated
Show resolved
Hide resolved
packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/Utils.h
Outdated
Show resolved
Hide resolved
...ages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/OnlineASR.cpp
Outdated
Show resolved
Hide resolved
...ages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/OnlineASR.cpp
Show resolved
Hide resolved
...act-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/HypothesisBuffer.cpp
Outdated
Show resolved
Hide resolved
chmjkb
left a comment
There was a problem hiding this comment.
Overall solid work, thanks 👏🏻
Left a couple of nits
packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/common/schema/ASR.h
Outdated
Show resolved
Hide resolved
...act-native-executorch/common/rnexecutorch/models/speech_to_text/common/types/ProcessResult.h
Show resolved
Hide resolved
...-native-executorch/common/rnexecutorch/models/speech_to_text/common/types/GenerationResult.h
Show resolved
Hide resolved
packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/Constants.h
Outdated
Show resolved
Hide resolved
packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts
Outdated
Show resolved
Hide resolved
| this->decoder->unload(); | ||
| : callInvoker_(std::move(callInvoker)) { | ||
| // Switch between the ASR implementations based on model name | ||
| if (modelName == "whisper") { |
There was a problem hiding this comment.
food for thought: as we discussed a few days back, think about how we can make it work so that the native side doesn't need the model name, but accepts a bunch of configurable pipeline steps. no need to do this now IMO, but just a note.
Maybe we can have different ASR implementations based on whether the model does support timestamps or not?
| std::shared_ptr<OwningArrayBuffer> | ||
| SpeechToText::encode(std::span<float> waveform) const { | ||
| std::vector<float> encoderOutput = this->asr->encode(waveform); | ||
| std::vector<float> encoderOutput = transcriber_->encode(waveform); |
There was a problem hiding this comment.
I'm thinking whether we need to return std::vector from the encoder? Maybe we would just return a span. We wrap this in OwningArrayBuffer, which copies the data.
chmjkb
left a comment
There was a problem hiding this comment.
Two more things:
- I wasn't able to compile the app for Android (due to Norbert bumping minSdkVersion in RNET). You have to bump the minSdkVersion in the example app.
- Once compiled, it doesn't ask for mic permissions (im using a Pixel 10) and silently fails.
ef854bc to
d253381
Compare
Description
Various improvements & adjustments in Speech-to-Text module. The list of changes includes:
Introduces a breaking change?
Type of change
Tested on
Testing instructions
You can run the tests defined for Speech-to-Text module, as well as test it manually with the 'speech' demo app (SpeechToText screen).
Screenshots
Related issues
Checklist
Additional notes
IMPORTANT:
This PR is not yet ready to be merged - I am still working on some concrete aspects of the streaming algorithm. However, you are welcome to evaluate and review the architectural design of the code - especially the proposed solution to handle multiple different implementations of STT module.