typeout

CLI utility to transcribe audio or video to text using ASR (Automatic Speech Recognition).

A single self-contained script that auto-detects GPU availability and uses the appropriate backend.

| Backend | Hardware | Models |
|---|---|---|
| OpenAI Whisper | CPU/GPU | tiny, base, small, medium, large |
| Distil-Whisper | CPU/GPU | distil-large-v3, distil-medium.en |
| Cohere Transcribe 2B | CPU/GPU | cohere-transcribe (14 languages) |
| NVIDIA NeMo ASR | GPU | Canary-1B-v2, Canary-Qwen-2.5B, Parakeet-0.6B |

Input can be a local file (any format ffmpeg supports — mp3, wav, flac, mp4, mkv, webm, ...), a URL, or a YouTube video ID.

Features

- Single command: auto-detects the GPU and picks the right backend
- Accepts audio or video in any format ffmpeg can read
- Transcribes YouTube videos by URL or video ID
- Multiple models: Whisper, Distil-Whisper, Cohere Transcribe, NeMo ASR
- Multilingual transcription (14 languages with Cohere, 25 with NeMo Canary)
- Speech translation (NeMo Canary models only)
- Caches downloaded audio and transcripts for instant repeat lookups
- Transcript on stdout, diagnostics on stderr (pipe-friendly)

Installation

The only prerequisite for the script itself is uv; ffmpeg must also be available on the PATH (verify with typeout --check). The script is self-contained and manages all Python dependencies automatically on first run. A shortcut to the script is available at https://tinyurl.com/typeout.

# Install uv
cargo install uv  # or: pip install uv

# Download typeout
curl -O https://raw.githubusercontent.com/miku/typeout/refs/heads/main/typeout

# Make executable and put it somewhere on your PATH
chmod +x typeout
mkdir -p ~/.local/bin
mv typeout ~/.local/bin/

That's it. No virtualenvs, no pip install, no setup.py.

Usage

# Transcribe a local file
typeout recording.mp3
typeout lecture.mp4

# Write to file
typeout podcast.flac -o transcript.txt

# Use a different Whisper model (CPU)
typeout recording.mp3 --model small

# Use Cohere Transcribe (CPU/GPU, requires Hugging Face login)
typeout recording.mp3 --model cohere-transcribe --lang en
typeout lecture.mp4 --model cohere-transcribe --lang ja

# GPU models (auto-detected on systems with an NVIDIA GPU)
typeout recording.mp3 --model canary-qwen-2.5b
typeout lecture.mp4 --model parakeet-0.6b

# Use Whisper on GPU (fp16 acceleration)
typeout recording.mp3 --model large

# Multilingual: set source language
typeout interview.wav --lang de

# Translation: German audio to English text (NeMo Canary only)
typeout interview.wav --lang de --target-lang en

# From a URL or YouTube ID
typeout "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
typeout dQw4w9WgXcQ

# Check external tools
typeout --check

# Clear cache
typeout --clear-cache

# List models
typeout --list-models
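
Because the transcript goes to stdout and diagnostics go to stderr, typeout is easy to call from another program. The following is a minimal sketch of such a wrapper, not part of typeout itself. The function names `build_command` and `transcribe` are hypothetical; only the flags shown above (`--model`, `--lang`, `--target-lang`, `-o`) are used. It assumes typeout is on the PATH and exits with a non-zero status on failure.

```python
import subprocess


def build_command(source, model=None, lang=None, target_lang=None, output=None):
    """Assemble a typeout command line from the flags shown in Usage."""
    cmd = ["typeout", source]
    if model:
        cmd += ["--model", model]
    if lang:
        cmd += ["--lang", lang]
    if target_lang:
        cmd += ["--target-lang", target_lang]
    if output:
        cmd += ["-o", output]
    return cmd


def transcribe(source, **options):
    """Run typeout and return the transcript text read from stdout."""
    result = subprocess.run(
        build_command(source, **options),
        stdout=subprocess.PIPE,  # capture the transcript only
        stderr=None,             # diagnostics pass through to the terminal
        text=True,
        check=True,              # assumption: failures yield a non-zero exit code
    )
    return result.stdout
```

With this in place, `transcribe("interview.wav", lang="de", target_lang="en")` would run the translation example from above and return the English text as a string.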

Models

| Model | Size | Languages | Notes |
|---|---|---|---|
| `base` · CPU default | ~140MB | multilingual | Whisper, good balance |
| `tiny` | ~40MB | multilingual | Whisper, fastest |
| `small` | ~460MB | multilingual | Whisper |
| `medium` | ~1.5GB | multilingual | Whisper |
| `large` | ~2.9GB | multilingual | Whisper, highest accuracy |
| `distil-large-v3` | ~750MB | multilingual | Distil-Whisper, 6x faster than large |
| `distil-medium.en` | ~400MB | English only | Distil-Whisper, fast |
| `cohere-transcribe` | ~4.1GB | 14 languages | Cohere, high accuracy, requires HF login |
| `canary-1b-v2` · GPU default | ~6.4GB | 25 languages | NVIDIA only, NeMo, multilingual, translation |
| `canary-qwen-2.5b` | ~5.1GB | multilingual | NVIDIA only, NeMo, highest quality, SLM |
| `parakeet-0.6b` | ~2.5GB | English only | NVIDIA only, NeMo, fast and lightweight |

Cohere Transcribe setup (gated model):

# 1. Accept terms at: https://huggingface.co/CohereLabs/cohere-transcribe-03-2026
# 2. Login to Hugging Face
huggingface-cli login

How it works

The typeout script is an amalgamation — it contains both CPU and GPU Python scripts embedded within it. On first run:

1. Detects whether an NVIDIA GPU is available (via nvidia-smi)
2. Extracts the matching Python script to ~/.cache/typeout/
3. Runs it with uv run, which installs dependencies automatically

Subsequent runs reuse the extracted script (unless you upgrade typeout).
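
The dispatch steps above can be sketched roughly as follows. This is an illustration under assumptions, not the actual launcher code: the helper names `has_nvidia_gpu` and `run_command` are hypothetical, and the extracted file names (`typeout-cpu.py`, `typeout-gpu.py`) are assumed to match the script names in the repository. GPU detection here simply checks that `nvidia-smi` exists and exits successfully.

```python
import shutil
import subprocess
from pathlib import Path


def has_nvidia_gpu():
    """Return True if nvidia-smi is on the PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        proc = subprocess.run(
            [exe],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0


def run_command(gpu, args, cache_dir=None):
    """Build the `uv run` invocation for the extracted CPU or GPU script."""
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "typeout"
    # Assumed file names; the real launcher may name the extracted scripts differently.
    name = "typeout-gpu.py" if gpu else "typeout-cpu.py"
    return ["uv", "run", str(Path(cache_dir) / name), *args]
```

A launcher along these lines would then hand `run_command(has_nvidia_gpu(), sys.argv[1:])` to `subprocess.run`, leaving dependency installation to uv.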

Caching

Downloaded audio (for URLs) and transcripts are cached in ~/.cache/typeout/ (respects $XDG_CACHE_HOME).

- URLs: keyed by URL, so a repeated URL is served from the cache
- Local files: keyed by path, modification time, and size, so editing a file invalidates its entry
- Transcripts: keyed by source, model, and language, so each model/language combination gets its own entry

Use --no-cache to bypass the cache and --clear-cache to remove all cached data.
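
To make the keying scheme concrete, here is one way such cache keys could be derived. It is a sketch under assumptions: the function names, the use of SHA-256, and the exact fields hashed are illustrative choices, and the real script may compute its keys differently.

```python
import hashlib
import os
from pathlib import Path


def cache_root():
    """Resolve the cache directory, honoring $XDG_CACHE_HOME when set."""
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "typeout"


def _digest(*parts):
    """Hash the given parts into a stable hex string (SHA-256 is an assumption)."""
    joined = "\x00".join(str(p) for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def source_key(source):
    """Key a local file by path, mtime, and size; key anything else by its string."""
    if "://" not in source and Path(source).is_file():
        path = Path(source)
        st = path.stat()
        return _digest(path.resolve(), st.st_mtime_ns, st.st_size)
    return _digest(source)


def transcript_key(source, model, lang=None):
    """Key a transcript by source, model, and language."""
    return _digest(source_key(source), model, lang or "")
```

Because modification time and size are part of the key for local files, an edited file produces a new key and the stale entry is never looked up again.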

Dependencies

- uv: runs the script and manages Python dependencies
- ffmpeg: audio extraction and normalization (check with --check)
- nvidia-smi: used to detect an NVIDIA GPU; without one, the CPU backend is used
- huggingface-cli login: required for Cohere Transcribe (gated model)

Models are cached in ~/.local/share/typeout/ (respects $XDG_DATA_HOME).

Examples

$ typeout --lang de \
    https://swr-pd.ard-mcdn.de/swr/swrkultur/hoerspiel/ard-hoerspiel-speicher/2303264.mp3

Transcribing a 7h+ audiobook [UBIK] takes about 14 minutes on a 70 W RTX 4000 SFF with canary-1b-v2:

$ typeout https://www.youtube.com/watch?v=P1qMKFMrpro # UBIK audiobook

Image

From: https://etc.usf.edu/clipart/
