Hey guys, I tested the /stream endpoint with WebSockets and got noticeably worse results than with regular transcription. My guess is that the endpoint is splitting the audio at arbitrary points, which breaks the model's context for the word or phrase at each boundary.
It seems to me that one way to make this work could be to detect silence, either on the client or the server, and slice the audio there, so each chunk is bounded by the user's pauses.
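For what it's worth, here's a rough sketch of what I mean on the client side: a naive energy-based silence detector that closes a chunk after a run of quiet frames. All the names, frame sizes, and thresholds are made up for illustration, and a real client would probably use a proper VAD instead:

```python
def rms(frame):
    """Root-mean-square energy of a frame of float samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def split_on_silence(samples, frame_len=160, threshold=0.01, min_silent_frames=3):
    """Split `samples` into chunks at runs of low-energy (silent) frames.

    Each returned chunk ends at a user pause, so no word should be cut
    mid-utterance. Thresholds here are illustrative, not tuned values.
    """
    chunks, current, silent_run = [], [], 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        silent_run = silent_run + 1 if rms(frame) < threshold else 0
        current.extend(frame)
        # Close the chunk once enough consecutive silent frames have passed,
        # but only if it actually contains some speech.
        if silent_run >= min_silent_frames and any(abs(s) >= threshold for s in current):
            chunks.append(current)
            current, silent_run = [], 0
    # Flush a trailing chunk only if it contains speech, not pure silence.
    if current and any(abs(s) >= threshold for s in current):
        chunks.append(current)
    return chunks
```

Each chunk would then be sent over the WebSocket as its own transcription unit, so the model always sees a complete phrase.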
Just wanted to know what you think, have a good day :)