
Live Desktop Audio Transcription

This project captures all desktop audio from asr_sink.monitor, splits it into segments, transcribes each segment with whisper-cli, and appends the text to transcript_live.txt.

Features:

  • Support for Ubuntu.
  • Single-language transcription (en, pt-br).
  • Automatic multi-language transcription, at a higher computational cost.
  • Automatic translation of transcribed text to English.
  • Configurable transcript accuracy mode.
  • Manual selection of the audio input (microphone or any other source).
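The capture -> segment -> transcribe -> append loop can be sketched as follows. This is an illustrative outline, not live_transcript.py's actual internals: the paths, the polling interval, and the helper names (whisper_command, transcribe_chunk) are assumptions, and the whisper-cli flags shown (-m, -l, -f, --no-timestamps) are common ones that may differ in your build.

```python
# Hypothetical sketch of the pipeline described above; adjust paths
# and flags to your whisper.cpp build before using.
import subprocess
import time
from pathlib import Path

WHISPER_BIN = "whisper.cpp/build/bin/whisper-cli"
WHISPER_MODEL = "whisper.cpp/models/ggml-base.en.bin"

def whisper_command(wav_path, language="en"):
    # Build the whisper-cli invocation for one audio chunk.
    return [WHISPER_BIN, "-m", WHISPER_MODEL, "-l", language,
            "--no-timestamps", "-f", str(wav_path)]

def transcribe_chunk(wav_path):
    # Run whisper-cli on a single chunk and return the recognized text.
    out = subprocess.run(whisper_command(wav_path),
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def main():
    chunk_dir = Path("chunks")
    seen = set()
    with open("transcript_live.txt", "a") as log:
        while True:
            # Pick up new chunk files in order and append their text.
            for wav in sorted(chunk_dir.glob("*.wav")):
                if wav in seen:
                    continue
                seen.add(wav)
                text = transcribe_chunk(wav)
                if text:
                    log.write(text + "\n")
                    log.flush()
            time.sleep(1)

if __name__ == "__main__":
    main()
```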

1) Prerequisites

Install system packages:

sudo apt update
sudo apt install -y ffmpeg git build-essential cmake

Build whisper.cpp in this project folder:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -S . -B build
cmake --build build -j

Download a model (example base.en):

cd whisper.cpp
./models/download-ggml-model.sh base.en

The script auto-copies the model into ./models/ggml-base.en.bin on first run.

If you want Portuguese or mixed EN/PT, also download a multilingual model:

cd whisper.cpp
./models/download-ggml-model.sh base

This provides ggml-base.bin (multilingual). Without a multilingual model, Portuguese detection/translation will not work. The script now tries to auto-download base when multilingual mode is requested (disable with ASR_AUTO_DOWNLOAD_MODEL=0).
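The ASR_AUTO_DOWNLOAD_MODEL toggle could gate the download roughly like this. A sketch only: the helper names (should_auto_download, ensure_multilingual_model) are illustrative, not live_transcript.py's actual API.

```python
# Sketch of gating the multilingual-model download on
# ASR_AUTO_DOWNLOAD_MODEL; helper names are hypothetical.
import os
import subprocess
from pathlib import Path

def should_auto_download(env=None):
    """Auto-download is on unless ASR_AUTO_DOWNLOAD_MODEL=0."""
    env = os.environ if env is None else env
    return env.get("ASR_AUTO_DOWNLOAD_MODEL", "1") != "0"

def ensure_multilingual_model(model_dir="whisper.cpp/models", env=None):
    """Return the multilingual model path, downloading it if allowed."""
    model = Path(model_dir) / "ggml-base.bin"
    if model.exists():
        return model
    if not should_auto_download(env):
        raise FileNotFoundError(f"{model} missing and auto-download disabled")
    # Shell out to whisper.cpp's own download script.
    subprocess.run([str(Path(model_dir) / "download-ggml-model.sh"), "base"],
                   check=True)
    return model
```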

2) Audio Routing Setup (Desktop Audio -> ASR + Headset)

Find sinks:

pactl list short sinks

Create ASR capture sink:

pactl load-module module-null-sink sink_name=asr_sink
pactl set-default-sink asr_sink

Forward the ASR monitor to your headset (replace the sink name with your own device's, as listed by pactl list short sinks):

pactl load-module module-loopback source=asr_sink.monitor sink=bluez_sink.AC_80_0A_42_11_8A.a2dp_sink latency_msec=30

Quick recording test:

parec -d asr_sink.monitor --file-format=wav desktop_audio.wav

Stop test recording with Ctrl+C.
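As a sanity check, the sink list can also be inspected from Python. The sketch below parses the tab-separated output of pactl list short sinks (index, name, driver, sample spec, state) to confirm asr_sink exists; has_asr_sink is a hypothetical helper, not part of the project.

```python
# Verify that asr_sink was created, by parsing pactl's
# tab-separated "list short sinks" output.
import subprocess

def sink_names(pactl_output):
    """Extract the sink name (second tab-separated field) per line."""
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            names.append(fields[1])
    return names

def has_asr_sink():
    out = subprocess.run(["pactl", "list", "short", "sinks"],
                         capture_output=True, text=True, check=True).stdout
    return "asr_sink" in sink_names(out)
```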

3) Run Live Transcription

From project folder:

python3 live_transcript.py

Language options:

# Attempt mixed EN/PT automatically (recommended for bilingual audio)
python3 live_transcript.py --language-mode auto

# Force one language
python3 live_transcript.py --language-mode single --language en
python3 live_transcript.py --language-mode single --language pt
python3 live_transcript.py --language-mode single --language pt-br

# Try both languages and pick best transcript per chunk (slower)
python3 live_transcript.py --language-mode multi-pass --languages en,pt

# Keep original language text (default)
python3 live_transcript.py --translation-mode original

# Translate output to English
python3 live_transcript.py --translation-mode english

# Improve quality (slower)
python3 live_transcript.py --accuracy-mode high

Notes:

  • auto is the best "press play and forget" option for mixed English/Portuguese.
  • multi-pass is slower because it runs one transcription pass per language.
  • translation-mode english translates recognized speech to English.
  • pt-br is accepted and normalized to pt for whisper-cli compatibility.
  • For better PT-BR quality, use --accuracy-mode high.
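The pt-br normalization mentioned in the notes amounts to collapsing regional variants to the bare ISO 639-1 code that whisper-cli expects. A minimal sketch (the function name is illustrative):

```python
# Normalize user-facing language codes to what whisper-cli accepts,
# e.g. pt-br / pt_br -> pt.
def normalize_language(lang):
    lang = lang.strip().lower()
    aliases = {"pt-br": "pt", "pt_br": "pt"}
    return aliases.get(lang, lang)
```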

In another terminal, see the transcription output live (optional):

tail -f transcript_live.txt
tail -f asr_consumer.log

4) Absolute Timestamps

transcript_live.txt now uses absolute local wall-clock timestamps:

[2026-02-07 18:25:10.123] recognized text...

Only the start timestamp is written (end timestamp removed).
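One way to produce that millisecond-precision wall-clock prefix is shown below; this is a sketch of the format, not the script's exact code, and transcript_line is a hypothetical helper.

```python
# Format one transcript line as "[YYYY-MM-DD HH:MM:SS.mmm] text"
# using local wall-clock time.
from datetime import datetime

def transcript_line(text, now=None):
    now = now or datetime.now()
    # strftime has no millisecond code, so truncate microseconds by hand.
    stamp = now.strftime("%Y-%m-%d %H:%M:%S.") + f"{now.microsecond // 1000:03d}"
    return f"[{stamp}] {text}"
```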

5) Stop and Cleanup

Stop transcription with Ctrl+C in the script terminal.

Restore normal desktop audio (set the default sink back to your hardware sink, as listed by pactl list short sinks):

pactl unload-module module-loopback
pactl unload-module module-null-sink
pactl set-default-sink <your_hardware_sink>

6) Notes

  • The script automatically kills stale chunk-writer ffmpeg processes from this project on startup.
  • The script uses per-run chunk folders under chunks/run_<epoch>/.
  • The script auto-prunes chunk files, keeping only the latest 20 processed chunks.
  • The script auto-skips old backlog chunks if transcription falls behind, so the transcript stays near real time.
  • If your whisper binary/model are custom:
    • WHISPER_BIN=/path/to/whisper-cli python3 live_transcript.py
    • WHISPER_MODEL=/path/to/ggml-base.en.bin python3 live_transcript.py

7) Troubleshooting

  • No text in transcript_live.txt:
    • Check asr_consumer.log for whisper errors.
    • Verify audio is flowing with:
      • parec -d asr_sink.monitor --file-format=wav desktop_audio.wav
  • No headset sound:
    • Ensure loopback sink is correct from pactl list short sinks.
  • Wrong app routed:
    • Use pavucontrol -> Playback tab and ensure apps output to asr_sink.
  • Portuguese not detected / always English:
    • Verify model in logs is multilingual (ggml-base.bin), not ggml-base.en.bin.
    • Download multilingual model:
      • cd whisper.cpp && ./models/download-ggml-model.sh base
    • Restart script.
