
Live Desktop Audio Transcription

This project captures all desktop audio from asr_sink.monitor, splits it into segments, transcribes each segment with whisper-cli, and appends the text to transcript_live.txt.

Features:

  • Support for Ubuntu.
  • Single-language transcription (en, pt-br).
  • Automatic multi-language transcription, at a higher computational cost.
  • Automatic translation of transcribed text to English.
  • Configurable transcript accuracy mode.
  • Manual selection of the audio input (microphone or any other source).
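The capture -> segment -> transcribe -> append loop can be sketched as follows. This is an illustrative outline, not live_transcript.py's actual internals: the paths, the polling interval, and the helper names (whisper_command, transcribe_chunk) are assumptions, and the whisper-cli flags shown (-m, -l, -f, --no-timestamps) are common ones that may differ in your build.

```python
# Hypothetical sketch of the pipeline described above; adjust paths
# and flags to your whisper.cpp build before using.
import subprocess
import time
from pathlib import Path

WHISPER_BIN = "whisper.cpp/build/bin/whisper-cli"
WHISPER_MODEL = "whisper.cpp/models/ggml-base.en.bin"

def whisper_command(wav_path, language="en"):
    # Build the whisper-cli invocation for one audio chunk.
    return [WHISPER_BIN, "-m", WHISPER_MODEL, "-l", language,
            "--no-timestamps", "-f", str(wav_path)]

def transcribe_chunk(wav_path):
    # Run whisper-cli on a single chunk and return the recognized text.
    out = subprocess.run(whisper_command(wav_path),
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def main():
    chunk_dir = Path("chunks")
    seen = set()
    with open("transcript_live.txt", "a") as log:
        while True:
            # Pick up new chunk files in order and append their text.
            for wav in sorted(chunk_dir.glob("*.wav")):
                if wav in seen:
                    continue
                seen.add(wav)
                text = transcribe_chunk(wav)
                if text:
                    log.write(text + "\n")
                    log.flush()
            time.sleep(1)

if __name__ == "__main__":
    main()
```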

1) Prerequisites

Install system packages:

sudo apt update
sudo apt install -y ffmpeg git build-essential cmake

Build whisper.cpp in this project folder:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -S . -B build
cmake --build build -j

Download a model (example base.en):

cd whisper.cpp
./models/download-ggml-model.sh base.en

The script auto-copies the model into ./models/ggml-base.en.bin on first run.

If you want Portuguese or mixed EN/PT, also download a multilingual model:

cd whisper.cpp
./models/download-ggml-model.sh base

This provides ggml-base.bin (multilingual). Without a multilingual model, Portuguese detection/translation will not work. The script now tries to auto-download base when multilingual mode is requested (disable with ASR_AUTO_DOWNLOAD_MODEL=0).
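The ASR_AUTO_DOWNLOAD_MODEL toggle could gate the download roughly like this. A sketch only: the helper names (should_auto_download, ensure_multilingual_model) are illustrative, not live_transcript.py's actual API.

```python
# Sketch of gating the multilingual-model download on
# ASR_AUTO_DOWNLOAD_MODEL; helper names are hypothetical.
import os
import subprocess
from pathlib import Path

def should_auto_download(env=None):
    """Auto-download is on unless ASR_AUTO_DOWNLOAD_MODEL=0."""
    env = os.environ if env is None else env
    return env.get("ASR_AUTO_DOWNLOAD_MODEL", "1") != "0"

def ensure_multilingual_model(model_dir="whisper.cpp/models", env=None):
    """Return the multilingual model path, downloading it if allowed."""
    model = Path(model_dir) / "ggml-base.bin"
    if model.exists():
        return model
    if not should_auto_download(env):
        raise FileNotFoundError(f"{model} missing and auto-download disabled")
    # Shell out to whisper.cpp's own download script.
    subprocess.run([str(Path(model_dir) / "download-ggml-model.sh"), "base"],
                   check=True)
    return model
```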

2) Audio Routing Setup (Desktop Audio -> ASR + Headset)

Find sinks:

pactl list short sinks

Create ASR capture sink:

pactl load-module module-null-sink sink_name=asr_sink
pactl set-default-sink asr_sink

Forward the ASR monitor to your headset (replace the sink name with your own device's, as listed by pactl list short sinks):

pactl load-module module-loopback source=asr_sink.monitor sink=bluez_sink.AC_80_0A_42_11_8A.a2dp_sink latency_msec=30

Quick recording test:

parec -d asr_sink.monitor --file-format=wav desktop_audio.wav

Stop test recording with Ctrl+C.
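As a sanity check, the sink list can also be inspected from Python. The sketch below parses the tab-separated output of pactl list short sinks (index, name, driver, sample spec, state) to confirm asr_sink exists; has_asr_sink is a hypothetical helper, not part of the project.

```python
# Verify that asr_sink was created, by parsing pactl's
# tab-separated "list short sinks" output.
import subprocess

def sink_names(pactl_output):
    """Extract the sink name (second tab-separated field) per line."""
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            names.append(fields[1])
    return names

def has_asr_sink():
    out = subprocess.run(["pactl", "list", "short", "sinks"],
                         capture_output=True, text=True, check=True).stdout
    return "asr_sink" in sink_names(out)
```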

3) Run Live Transcription

From project folder:

python3 live_transcript.py

Language options:

# Attempt mixed EN/PT automatically (recommended for bilingual audio)
python3 live_transcript.py --language-mode auto

# Force one language
python3 live_transcript.py --language-mode single --language en
python3 live_transcript.py --language-mode single --language pt
python3 live_transcript.py --language-mode single --language pt-br

# Try both languages and pick best transcript per chunk (slower)
python3 live_transcript.py --language-mode multi-pass --languages en,pt

# Keep original language text (default)
python3 live_transcript.py --translation-mode original

# Translate output to English
python3 live_transcript.py --translation-mode english

# Improve quality (slower)
python3 live_transcript.py --accuracy-mode high

Notes:

  • auto is the best "press play and forget" option for mixed English/Portuguese.
  • multi-pass is slower because it runs one transcription pass per language.
  • translation-mode english translates recognized speech to English.
  • pt-br is accepted and normalized to pt for whisper-cli compatibility.
  • For better PT-BR quality, use --accuracy-mode high.
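The pt-br normalization mentioned in the notes amounts to collapsing regional variants to the bare ISO 639-1 code that whisper-cli expects. A minimal sketch (the function name is illustrative):

```python
# Normalize user-facing language codes to what whisper-cli accepts,
# e.g. pt-br / pt_br -> pt.
def normalize_language(lang):
    lang = lang.strip().lower()
    aliases = {"pt-br": "pt", "pt_br": "pt"}
    return aliases.get(lang, lang)
```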

In another terminal, see the transcription output live (optional):

tail -f transcript_live.txt
tail -f asr_consumer.log

4) Absolute Timestamps

transcript_live.txt now uses absolute local wall-clock timestamps:

[2026-02-07 18:25:10.123] recognized text...

Only the start timestamp is written (end timestamp removed).
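One way to produce that millisecond-precision wall-clock prefix is shown below; this is a sketch of the format, not the script's exact code, and transcript_line is a hypothetical helper.

```python
# Format one transcript line as "[YYYY-MM-DD HH:MM:SS.mmm] text"
# using local wall-clock time.
from datetime import datetime

def transcript_line(text, now=None):
    now = now or datetime.now()
    # strftime has no millisecond code, so truncate microseconds by hand.
    stamp = now.strftime("%Y-%m-%d %H:%M:%S.") + f"{now.microsecond // 1000:03d}"
    return f"[{stamp}] {text}"
```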

5) Stop and Cleanup

Stop transcription with Ctrl+C in the script terminal.

Restore normal desktop audio (set the default sink back to your hardware sink, as listed by pactl list short sinks):

pactl unload-module module-loopback
pactl unload-module module-null-sink
pactl set-default-sink <your_hardware_sink>

6) Notes

  • The script automatically kills stale chunk-writer ffmpeg processes from this project on startup.
  • The script uses per-run chunk folders under chunks/run_<epoch>/.
  • The script auto-prunes chunk files, keeping only the latest 20 processed chunks.
  • The script auto-skips old backlog chunks if transcription falls behind, so the transcript stays near real time.
  • If your whisper binary/model are custom:
    • WHISPER_BIN=/path/to/whisper-cli python3 live_transcript.py
    • WHISPER_MODEL=/path/to/ggml-base.en.bin python3 live_transcript.py

7) Troubleshooting

  • No text in transcript_live.txt:
    • Check asr_consumer.log for whisper errors.
    • Verify audio is flowing with:
      • parec -d asr_sink.monitor --file-format=wav desktop_audio.wav
  • No headset sound:
    • Ensure loopback sink is correct from pactl list short sinks.
  • Wrong app routed:
    • Use pavucontrol -> Playback tab and ensure apps output to asr_sink.
  • Portuguese not detected / always English:
    • Verify model in logs is multilingual (ggml-base.bin), not ggml-base.en.bin.
    • Download multilingual model:
      • cd whisper.cpp && ./models/download-ggml-model.sh base
    • Restart script.
