Multilingual Voice Understanding Model
Updated Dec 30, 2025 - Python
A state-of-the-art, industrial-grade, all-in-one ASR system with ASR, VAD, LID, and punctuation (Punc) modules. FireRedASR2 supports Chinese (Mandarin and 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech, singing, and music in 100+ languages. FireRedLID supports 100+ languages and 20+ Chinese dialects. FireRedPunc supports Chinese and English.
A state-of-the-art, industrial-grade voice activity detection (VAD) and audio event detection system supporting 100+ languages, reported to outperform Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD.
TensorFlow 2 implementation of Data-driven Harmonic Filters for Audio Representation Learning
Relation Networks, a few-shot learning technique, applied to the classification of audio events
A machine learning task to classify audio events
Code for the paper "Deep Learning Solutions for Audio Event Detection in a Swine Barn Using Environmental Audio and Weak Labels".
Provide accurate voice activity and audio event detection in 100+ languages with high-performance streaming and non-streaming capabilities.
Coach structured answers in real time during mock interviews with question detection, feedback, filler tracking, and live transcription.