This project captures all desktop audio routed to asr_sink (read from its monitor source, asr_sink.monitor), splits it into short chunks, transcribes each chunk with whisper-cli, and appends the text to transcript_live.txt.
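The README does not say how the chunk writer invokes ffmpeg. The sketch below shows one way to do it with ffmpeg's segment muxer. It records the monitor source directly into 16 kHz mono WAV files under a per-run folder. The 5-second chunk length, the chunk_%05d.wav naming and the chunk_writer_cmd helper are all hypothetical. This is not the script's actual code.

```python
import time
from pathlib import Path


def chunk_writer_cmd(source="asr_sink.monitor", chunk_seconds=5, root="chunks", epoch=None):
    """Build an ffmpeg command that records a PulseAudio source and splits it
    into fixed-length 16 kHz mono WAV chunks (hypothetical helper)."""
    epoch = int(time.time()) if epoch is None else epoch
    run_dir = Path(root) / f"run_{epoch}"  # per-run folder: chunks/run_<epoch>/
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-f", "pulse", "-i", source,         # read the null sink's monitor source
        "-ac", "1", "-ar", "16000",          # whisper.cpp works on 16 kHz audio
        "-c:a", "pcm_s16le",                 # 16-bit PCM WAV
        "-f", "segment", "-segment_time", str(chunk_seconds),
        str(run_dir / "chunk_%05d.wav"),     # chunk_00000.wav, chunk_00001.wav, ...
    ]
```

ffmpeg does not create missing output folders. Create the folder first with Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True), then start the writer with subprocess.Popen(cmd).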
Features:
- Support for Ubuntu.
- Single-language transcription (en, pt-br).
- Automatic handling of mixed-language input, plus a multi-pass mode that costs more computation.
- Automatic translation of the transcribed text to English.
- Configurable transcription accuracy mode.
- The audio input can be set manually to a microphone or any other source.
Install system packages:
sudo apt update
sudo apt install -y ffmpeg git build-essential cmake
Build whisper.cpp in this project folder:
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -S . -B build
cmake --build build -j
Download a model (for example, base.en), starting from the project folder:
cd whisper.cpp
./models/download-ggml-model.sh base.en
The script automatically copies the model into ./models/ggml-base.en.bin on first run.
If you want Portuguese or mixed EN/PT, also download a multilingual model:
cd whisper.cpp
./models/download-ggml-model.sh base
This provides ggml-base.bin (multilingual).
Without a multilingual model, Portuguese detection/translation will not work.
When a multilingual mode is requested, the script tries to download base automatically. Set ASR_AUTO_DOWNLOAD_MODEL=0 to disable this.
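The model choice above can be sketched as a small helper. It relies on whisper.cpp's naming convention, in which English-only models end in .en.bin. It also honors the WHISPER_MODEL and ASR_AUTO_DOWNLOAD_MODEL environment variables that the script accepts. The default paths assume whisper.cpp was cloned into the project folder. The helper names are hypothetical, and this is not how live_transcript.py is necessarily written.

```python
import os
from pathlib import Path

# Assumed default locations, based on cloning whisper.cpp into the project folder.
DEFAULT_EN_MODEL = "whisper.cpp/models/ggml-base.en.bin"
DEFAULT_MULTI_MODEL = "whisper.cpp/models/ggml-base.bin"


def is_multilingual(model_path):
    """English-only whisper.cpp models are named *.en.bin (e.g. ggml-base.en.bin)."""
    return not Path(model_path).name.endswith(".en.bin")


def pick_model(language_mode, language="en", env=None):
    """Return the model path to use (hypothetical helper)."""
    env = os.environ if env is None else env
    if env.get("WHISPER_MODEL"):  # explicit override wins
        return env["WHISPER_MODEL"]
    english_only = language_mode == "single" and language == "en"
    return DEFAULT_EN_MODEL if english_only else DEFAULT_MULTI_MODEL


def auto_download_enabled(env=None):
    """Auto-download is on unless ASR_AUTO_DOWNLOAD_MODEL=0."""
    env = os.environ if env is None else env
    return env.get("ASR_AUTO_DOWNLOAD_MODEL", "1") != "0"
```

Calling is_multilingual on the selected path is a quick way to catch the "always English" symptom. A *.en.bin model can only produce English.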
Find sinks:
pactl list short sinks
Create the ASR capture sink:
pactl load-module module-null-sink sink_name=asr_sink
pactl set-default-sink asr_sink
Forward the ASR monitor to your headset (replace the sink name with yours from pactl list short sinks):
pactl load-module module-loopback source=asr_sink.monitor sink=bluez_sink.AC_80_0A_42_11_8A.a2dp_sink latency_msec=30
Quick recording test:
parec -d asr_sink.monitor --file-format=wav desktop_audio.wav
Stop the test recording with Ctrl+C.
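You can pick the loopback target from the pactl list short sinks output instead of copying the headset sink name by hand. The sketch below assumes the usual tab-separated short format: index, name, driver, sample spec, state. It treats any sink whose name starts with bluez_ as the headset. The helper names are hypothetical and not part of live_transcript.py.

```python
def parse_short_sinks(output):
    """Parse `pactl list short sinks` text into (index, name, state) tuples."""
    sinks = []
    for line in output.strip().splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            state = fields[4] if len(fields) > 4 else ""
            sinks.append((int(fields[0]), fields[1], state))
    return sinks


def pick_headset(sinks):
    """Prefer a Bluetooth sink; otherwise the first sink that is not asr_sink."""
    names = [name for _, name, _ in sinks if name != "asr_sink"]
    bluez = [n for n in names if n.startswith("bluez_")]
    return (bluez or names or [None])[0]


def loopback_cmd(sink_name, latency_ms=30):
    """Same pactl call as above, with the sink name filled in."""
    return ["pactl", "load-module", "module-loopback",
            "source=asr_sink.monitor", f"sink={sink_name}",
            f"latency_msec={latency_ms}"]
```

In practice, the input text comes from subprocess.run(["pactl", "list", "short", "sinks"], capture_output=True, text=True, check=True).stdout.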
Run the script from the project folder:
python3 live_transcript.py
Language options:
# Attempt mixed EN/PT automatically (recommended for bilingual audio)
python3 live_transcript.py --language-mode auto
# Force one language
python3 live_transcript.py --language-mode single --language en
python3 live_transcript.py --language-mode single --language pt
python3 live_transcript.py --language-mode single --language pt-br
# Try both languages and keep the best transcript per chunk (slower)
python3 live_transcript.py --language-mode multi-pass --languages en,pt
# Keep original language text (default)
python3 live_transcript.py --translation-mode original
# Translate output to English
python3 live_transcript.py --translation-mode english
# Improve quality (slower)
python3 live_transcript.py --accuracy-mode high
Notes:
- auto is the best "press play and forget" option for mixed English/Portuguese audio.
- multi-pass is slower because it runs one transcription pass per language.
- --translation-mode english translates recognized speech to English.
- pt-br is accepted and normalized to pt for whisper-cli compatibility.
- For better PT-BR quality, use --accuracy-mode high.
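The README does not show how these options map onto whisper-cli. One plausible mapping uses -l auto in auto mode, -l <lang> otherwise, and -tr for English translation. It also uses -nt to drop whisper's own timestamps, since the script writes its own. All of these are real whisper-cli flags. whisper_args and multi_pass_args are hypothetical helpers. How multi-pass scores competing transcripts is not documented, so the sketch leaves that choice to the caller.

```python
def normalize_language(lang):
    """Accept pt-br / pt_BR and map it to 'pt', which whisper-cli understands."""
    lang = lang.strip().lower().replace("_", "-")
    return "pt" if lang == "pt-br" else lang


def whisper_args(model, wav, language_mode="auto", language="en",
                 translation_mode="original", binary="whisper-cli"):
    """Build one whisper-cli invocation for a chunk (hypothetical helper)."""
    args = [binary, "-m", model, "-f", wav, "-nt"]  # -nt: no timestamps in the text
    if language_mode == "auto":
        args += ["-l", "auto"]                       # detect the language per chunk
    else:
        args += ["-l", normalize_language(language)]
    if translation_mode == "english":
        args.append("-tr")                           # translate the speech to English
    return args


def multi_pass_args(model, wav, languages="en,pt", translation_mode="original"):
    """One forced-language pass per entry; choosing the best result is up to the caller."""
    return [whisper_args(model, wav, "single", lang, translation_mode)
            for lang in languages.split(",") if lang.strip()]
```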
Optionally, follow the transcript and the log live in another terminal:
tail -f transcript_live.txt
tail -f asr_consumer.log
Each line of transcript_live.txt starts with an absolute local wall-clock timestamp with millisecond precision:
[2026-02-07 18:25:10.123] recognized text...
Only the start timestamp is written; lines carry no end timestamp.
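The line format above can be reproduced with the standard library alone. The README does not document how the script derives each chunk's start time. In this minimal sketch the caller passes it in, and it defaults to the current time. format_line and append_line are hypothetical names.

```python
from datetime import datetime


def format_line(text, started_at=None):
    """Render '[YYYY-MM-DD HH:MM:SS.mmm] text' using local wall-clock time."""
    started_at = datetime.now() if started_at is None else started_at
    stamp = started_at.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # microseconds -> ms
    return f"[{stamp}] {text.strip()}"


def append_line(path, text, started_at=None):
    """Append one line; closing the file flushes it, so tail -f sees it at once."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_line(text, started_at) + "\n")
```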
Stop transcription with Ctrl+C in the script terminal.
Restore normal desktop audio:
pactl unload-module module-loopback
pactl unload-module module-null-sink
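pactl set-default-sink bluez_sink.AC_80_0A_42_11_8A.a2dp_sink
Replace the sink name with your usual output device from pactl list short sinks. @DEFAULT_SINK@ refers to whichever sink is currently the default, so setting it does not restore anything.
- The script automatically kills stale chunk-writer ffmpeg processes from this project on startup.
- The script uses a separate chunk folder for each run, chunks/run_<epoch>/.
- The script prunes chunk files automatically and keeps only the latest 20 processed chunks.
- If transcription falls behind, the script skips old backlog chunks so the transcript stays close to real time.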
- If your whisper binary or model is in a custom location:
WHISPER_BIN=/path/to/whisper-cli python3 live_transcript.py
WHISPER_MODEL=/path/to/ggml-base.en.bin python3 live_transcript.py
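The chunk housekeeping described above (pruning and backlog skipping) can be sketched as three small functions. This is an illustration, not the script's code, and it makes several assumptions:

- Chunk names are zero-padded (chunk_*.wav), so lexical order is chronological.
- The newest file is still being written by ffmpeg and is left alone.
- At most 2 pending chunks are kept (max_backlog), a hypothetical value.

Only the keep-20 figure comes from this README.

```python
from pathlib import Path

KEEP_PROCESSED = 20  # "keeps only the latest 20 processed chunks"


def list_complete_chunks(run_dir):
    """All chunk files except the newest, which ffmpeg may still be writing.
    Assumes zero-padded names such as chunk_00042.wav."""
    return sorted(Path(run_dir).glob("chunk_*.wav"))[:-1]


def select_pending(pending, max_backlog=2):
    """Keep only the newest max_backlog chunks; return (to_process, skipped)."""
    max_backlog = max(1, max_backlog)  # always process at least one chunk
    pending = sorted(pending)
    if len(pending) <= max_backlog:
        return pending, []
    return pending[-max_backlog:], pending[:-max_backlog]


def prune_processed(processed, keep=KEEP_PROCESSED):
    """Delete all but the newest `keep` processed chunks (list is oldest first)."""
    stale = processed[:-keep] if keep > 0 else list(processed)
    for path in stale:
        path.unlink(missing_ok=True)
    return processed[len(stale):]
```

A transcription loop would run these steps on each iteration:

1. Call list_complete_chunks and drop chunks that were already processed.
2. Pass the rest through select_pending.
3. Transcribe to_process and log the skipped chunks.
4. Hand the processed list to prune_processed.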
- No text in transcript_live.txt:
  - Check asr_consumer.log for whisper errors.
  - Verify that audio is flowing:
    parec -d asr_sink.monitor --file-format=wav desktop_audio.wav
- No headset sound:
  - Make sure the loopback sink name matches a sink listed by pactl list short sinks.
- Wrong app routed:
  - In pavucontrol, open the Playback tab and make sure the apps output to asr_sink.
- Portuguese not detected / always English:
  - Check that the model shown in the logs is multilingual (ggml-base.bin), not ggml-base.en.bin.
  - Download the multilingual model:
    cd whisper.cpp && ./models/download-ggml-model.sh base
  - Restart the script.
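An empty transcript often just means silence. A quick level check on the desktop_audio.wav test recording shows whether audio is actually reaching asr_sink. Below is a minimal standard-library sketch. It assumes 16-bit PCM, which is parec's default sample format. The 0.001 threshold is an arbitrary assumption, not a value the script uses.

```python
import array
import math
import sys
import wave


def wav_rms(path):
    """Return the RMS level of a 16-bit PCM WAV file, scaled to 0.0-1.0."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def looks_silent(path, threshold=0.001):
    """True if the recording is effectively silent (threshold is a guess)."""
    return wav_rms(path) < threshold
```

For example, print(looks_silent("desktop_audio.wav")) prints True if nothing is playing through asr_sink.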