fix: speech to text live transcription by IgorSwat · Pull Request #816 · software-mansion/react-native-executorch

IgorSwat · 2026-02-17T14:26:43Z

Description

Various improvements and adjustments to the Speech-to-Text module. The changes include:

Adjusting the native implementation to the new format of Whisper models (a single file with bundled encode and decode methods)
Refactoring the native implementation to support multiple STT models in the future
(IN PROGRESS): Fixing incorrect behavior of Whisper streaming

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

You can run the tests defined for the Speech-to-Text module, as well as test it manually with the 'speech' demo app (SpeechToText screen).

Screenshots

Related issues

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

IMPORTANT:
This PR is not yet ready to be merged; I am still working on some specific aspects of the streaming algorithm. However, you are welcome to evaluate and review the architectural design of the code, especially the proposed approach to handling multiple implementations of the STT module.

.../react-native-executorch/common/rnexecutorch/models/speech_to_text/common/schema/OnlineASR.h

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/ASR.h

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.h

packages/react-native-executorch/src/constants/modelUrls.ts

msluszniak

Some comments are not needed imo

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.h

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/Utils.h

...ages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/OnlineASR.cpp

...act-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/HypothesisBuffer.cpp

packages/react-native-executorch/common/rnexecutorch/models/BaseModel.h

chmjkb

Overall solid work, thanks 👏🏻
Left a couple of nits

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/common/schema/ASR.h

...act-native-executorch/common/rnexecutorch/models/speech_to_text/common/types/ProcessResult.h

...-native-executorch/common/rnexecutorch/models/speech_to_text/common/types/GenerationResult.h

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/Constants.h

packages/react-native-executorch/src/constants/modelUrls.ts

packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts

chmjkb · 2026-03-04T17:52:12Z

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.cpp

-  this->decoder->unload();
+    : callInvoker_(std::move(callInvoker)) {
+  // Switch between the ASR implementations based on model name
+  if (modelName == "whisper") {


food for thought: as we discussed a few days back, think about how we can make it work so that the native side doesn't need the model name, but accepts a bunch of configurable pipeline steps. no need to do this now IMO, but just a note.

Maybe we can have different ASR implementations based on whether the model supports timestamps?
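To make the suggestion above concrete, here is a minimal sketch of selecting an implementation from a capability flag instead of a model name. Every name in it (`AsrConfig`, `Transcriber`, `TimestampedTranscriber`, `PlainTranscriber`, `makeTranscriber`) is hypothetical and does not come from this PR; it only illustrates the idea, not how the native side is or should be structured.

```cpp
#include <memory>
#include <string>

// Hypothetical capability description that the JS side could pass instead of
// a model name. Not part of the PR.
struct AsrConfig {
  bool supportsTimestamps = false;
};

// Hypothetical minimal transcriber interface.
class Transcriber {
public:
  virtual ~Transcriber() = default;
  virtual std::string describe() const = 0;
};

// Variant for models that emit timestamp tokens.
class TimestampedTranscriber : public Transcriber {
public:
  std::string describe() const override { return "timestamped"; }
};

// Variant for models that emit text only.
class PlainTranscriber : public Transcriber {
public:
  std::string describe() const override { return "plain"; }
};

// Picks an implementation from the declared capabilities alone, so the
// native side never has to compare model names.
std::unique_ptr<Transcriber> makeTranscriber(const AsrConfig &config) {
  if (config.supportsTimestamps) {
    return std::make_unique<TimestampedTranscriber>();
  }
  return std::make_unique<PlainTranscriber>();
}
```

With this shape, supporting a new model would mean describing its capabilities in the config rather than adding another branch on its name.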

chmjkb · 2026-03-04T17:54:51Z

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.cpp

 std::shared_ptr<OwningArrayBuffer>
 SpeechToText::encode(std::span<float> waveform) const {
-  std::vector<float> encoderOutput = this->asr->encode(waveform);
+  std::vector<float> encoderOutput = transcriber_->encode(waveform);


I'm wondering whether we need to return std::vector from the encoder. Maybe we could just return a span. We wrap this in OwningArrayBuffer, which copies the data.
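A rough sketch of what this could look like, under the assumption that the encoder keeps its output in memory it owns. `FloatView` is a hand-rolled stand-in for `std::span<const float>` (used only so the sketch does not require C++20), `OwningBuffer` is a hypothetical stand-in for `OwningArrayBuffer` whose real constructor may differ, and `Encoder` and `encodeToBuffer` are illustrative names, not code from this PR.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Non-owning view over floats; stands in for std::span<const float>.
struct FloatView {
  const float *data;
  std::size_t size;
};

// Hypothetical stand-in for OwningArrayBuffer: copies the bytes it is given.
class OwningBuffer {
public:
  OwningBuffer(const void *src, std::size_t byteCount) : bytes_(byteCount) {
    if (byteCount > 0) {
      std::memcpy(bytes_.data(), src, byteCount);
    }
  }
  std::size_t size() const { return bytes_.size(); }
  const unsigned char *data() const { return bytes_.data(); }

private:
  std::vector<unsigned char> bytes_;
};

// Hypothetical encoder that keeps its output in a member buffer.
class Encoder {
public:
  // Returns a view into storage owned by the encoder. The view is valid
  // until the next encode() call or until the encoder is destroyed.
  FloatView encode(FloatView waveform) {
    output_.assign(waveform.data, waveform.data + waveform.size);
    for (float &value : output_) {
      value *= 0.5f; // placeholder for real inference
    }
    return {output_.data(), output_.size()};
  }

private:
  std::vector<float> output_;
};

// The copy into the owning buffer happens here, straight from the view,
// without materialising an intermediate std::vector for the caller.
std::shared_ptr<OwningBuffer> encodeToBuffer(Encoder &encoder,
                                             FloatView waveform) {
  FloatView out = encoder.encode(waveform);
  return std::make_shared<OwningBuffer>(out.data, out.size * sizeof(float));
}
```

The trade-off is lifetime: a view is only safe if it is consumed before the encoder overwrites or releases the underlying storage, which holds here because the copy is made immediately.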

chmjkb

Two more things:

I wasn't able to compile the app for Android (due to Norbert bumping minSdkVersion in RNET). You have to bump the minSdkVersion in the example app.
Once compiled, it doesn't ask for mic permissions (I'm using a Pixel 10) and silently fails.

Add whisper kv-cache & fix demo app permissions

bab2ffb

IgorSwat requested review from chmjkb and msluszniak February 17, 2026 14:26

Refactor STT native implementation

ea943e4

msluszniak reviewed Feb 17, 2026

.../react-native-executorch/common/rnexecutorch/models/speech_to_text/common/schema/OnlineASR.h (outdated)

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/ASR.h (outdated)

IgorSwat force-pushed the @is/speech-to-text branch from 3ca7f15 to ea943e4 Compare February 17, 2026 14:56

Fix infinite streaming in demo app

b54e469

msluszniak reviewed Feb 18, 2026

packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.h (outdated)

packages/react-native-executorch/src/constants/modelUrls.ts (outdated)

IgorSwat added 2 commits February 19, 2026 12:05

Various STT streaming fixes

ce5a39a

Add timestamp fix algorithm & other

278985c

msluszniak assigned IgorSwat Feb 20, 2026

msluszniak added the bug fix label Feb 20, 2026

msluszniak linked an issue Feb 20, 2026 that may be closed by this pull request

Fix Speech to Text streaming mode #741

Open

msluszniak changed the title ~~@is/speech to text~~ fix: speech to text live transcription Feb 20, 2026

IgorSwat added 3 commits February 20, 2026 18:26

Fix punctation comparision issue

2a37867

Final timestamp fix: silence estimation

f42351b

Remove special tokens

2ee6d1d

IgorSwat force-pushed the @is/speech-to-text branch from 7b1e6ff to 2ee6d1d Compare March 2, 2026 09:21

IgorSwat added 2 commits March 2, 2026 12:34

Add pause to streaming mode

915c8e7

Apply review suggestions

7029184

msluszniak reviewed Mar 2, 2026

Set up url's

7aac36d

msluszniak reviewed Mar 3, 2026

packages/react-native-executorch/common/rnexecutorch/models/BaseModel.h

Final fixes

6e84c3d

chmjkb requested changes Mar 4, 2026

chmjkb requested changes Mar 5, 2026

Apply review suggestions

d253381

IgorSwat force-pushed the @is/speech-to-text branch from ef854bc to d253381 Compare March 5, 2026 11:25

Enable multilingual models

9041e0a

3 participants