Push-to-talk speech-to-text for Linux. Press a hotkey to start recording, press it again to transcribe and type the text wherever your cursor is. No GUI, no app to keep running — just a keyboard shortcut.
- Pluggable backends — swap transcription models without changing anything else
- Works everywhere — GNOME, Sway, Hyprland, i3, X11
- ~160 lines of bash — easy to read, easy to hack on
Ships with faster-whisper by default, plus an optional Moonshine backend for CPU. Or bring your own — anything that reads a WAV and prints text works.
Note: This project is in early development — expect rough edges. If you run into issues, please open a bug.
- Linux (Wayland or X11)
- Audio recorder: ffmpeg (preferred) or PipeWire (pw-record)
- Typing tool (auto-detected, best available is used):
  - wtype (Wayland)
  - ydotool (with the ydotoold daemon running)
  - xdotool (X11)
- socat (for server-backed transcription)
For the default backend (faster-whisper):
- NVIDIA GPU with CUDA (or use CPU mode — see Whisper backend options)
git clone https://github.com/csheaff/talktype.git
cd talktype
make install

This will:
- Install system packages (wtype, ydotool, etc.)
- Create a Python venv with faster-whisper
- Symlink talktype into ~/.local/bin/
Note: The uinput setup below is only needed if you use ydotool. If you use wtype (Wayland) or xdotool (X11), skip it.
ydotool needs access to /dev/uinput. Add yourself to the input group and install a udev rule that gives that group read/write access to the device:
sudo usermod -aG input $USER
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/80-uinput.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Then reboot for the group change to take effect.
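To check the result after rebooting, a small helper like the one below can be run in a terminal. It is a hypothetical sketch (check_uinput is not part of talktype) and assumes the group is named input, as in the commands above. Because id -nG lists the groups of the current login session, a fresh usermod only shows up after you log in again, which is why the reboot is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with talktype): verify the ydotool/uinput setup.
check_uinput() {
  local status=0

  # Groups of the *current session*; a new usermod only appears after re-login.
  if id -nG | tr ' ' '\n' | grep -qx input; then
    echo "ok: $(id -un) is in the input group"
  else
    echo "missing: input group not active in this session (reboot after usermod)"
    status=1
  fi

  # The udev rule should make the device writable for members of the input group.
  if [[ -w /dev/uinput ]]; then
    echo "ok: /dev/uinput is writable"
  else
    echo "missing: /dev/uinput is not writable (check the udev rule)"
    status=1
  fi

  return "$status"
}
```

For example: check_uinput && echo "uinput is ready". ydotool also needs its daemon, ydotoold, to be running before it can type.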
make model

talktype reads ~/.config/talktype/config on startup (or $XDG_CONFIG_HOME/talktype/config if XDG_CONFIG_HOME is set).
Because talktype reads this file itself, the settings apply however it is launched (GNOME shortcuts, terminals, Sway, cron), so you don't need to set environment variables in each context.
mkdir -p ~/.config/talktype
cat > ~/.config/talktype/config << 'EOF'
TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
EOF

Any TALKTYPE_* variable can go in this file. Environment variables still work and are applied after the config file, so they override it.
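The precedence rule (config file first, environment second) can be sketched in bash as follows. This is a hypothetical illustration, not talktype's actual loader: load_config is an assumed name, and the sketch assumes the config file is plain bash that can be sourced, which its VAR="value" syntax suggests.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not talktype's source): config file first, environment wins.
load_config() {
  local cfg="${XDG_CONFIG_HOME:-$HOME/.config}/talktype/config"
  local -A from_env=()
  local name

  # Remember every exported TALKTYPE_* variable before the file can change it.
  for name in $(compgen -e); do
    if [[ $name == TALKTYPE_* ]]; then
      from_env[$name]=${!name}
    fi
  done

  # Apply the config file (plain bash assignments).
  if [[ -f $cfg ]]; then
    # shellcheck source=/dev/null
    source "$cfg"
  fi

  # Re-apply the environment values so they override the file.
  for name in "${!from_env[@]}"; do
    printf -v "$name" '%s' "${from_env[$name]}"
  done
}
```

With this ordering, a one-off run such as TALKTYPE_CMD=/path/to/my-transcriber talktype overrides the file for that invocation only, while the file stays the default everywhere else.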
Set TALKTYPE_TYPE_CMD to choose the typing tool: auto, wtype, ydotool, xdotool, or any custom command. The default, auto, picks the best available tool in this order: wtype (Wayland) → ydotool (with the ydotoold daemon running) → xdotool (X11).
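The auto order can be pictured with the sketch below. It is an assumption about how such detection might work, not talktype's actual code: detect_type_cmd is a hypothetical name, a Wayland session is detected via WAYLAND_DISPLAY, an X11 display via DISPLAY, and a running daemon via pgrep.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an "auto" typing-tool choice; prints the chosen tool.
detect_type_cmd() {
  # 1. wtype, but only inside a Wayland session
  if [[ -n ${WAYLAND_DISPLAY:-} ]] && command -v wtype >/dev/null; then
    echo wtype
  # 2. ydotool works on Wayland and X11, but needs ydotoold running
  elif command -v ydotool >/dev/null && pgrep -x ydotoold >/dev/null 2>&1; then
    echo ydotool
  # 3. xdotool needs an X11 (or XWayland) display
  elif [[ -n ${DISPLAY:-} ]] && command -v xdotool >/dev/null; then
    echo xdotool
  else
    echo "no typing tool found" >&2
    return 1
  fi
}
```

A real implementation may need extra checks, for example whether the compositor supports the virtual-keyboard protocol that wtype relies on.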
Bind talktype to a keyboard shortcut:
GNOME: Settings → Keyboard → Keyboard Shortcuts → Custom Shortcuts
- Name: TalkType
- Command: talktype (or the full path, ~/.local/bin/talktype)
- Shortcut: your choice (e.g. Super+D, F11)
Sway / i3: Add to your config:
bindsym $mod+d exec talktype
Hyprland: Add to ~/.config/hypr/hyprland.conf:
bind = SUPER, D, exec, talktype
- Press your shortcut → notification says "Listening..."
- Speak
- Press the shortcut again → transcribes and types the text at your cursor
Server backends auto-start on first use — the model loads once and stays in memory for fast subsequent transcriptions.
Whisper backend (default): faster-whisper. Best with a GPU.
Works out of the box after make install with no config needed.
For faster repeated use, switch to server mode in your config:
# ~/.config/talktype/config
TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"

Whisper backend options:

| Variable | Default | Description |
|---|---|---|
| WHISPER_MODEL | base | Model name: tiny, base, small, medium, large-v3-turbo |
| WHISPER_LANG | en | Language code |
| WHISPER_DEVICE | cuda | Device: cuda or cpu |
| WHISPER_COMPUTE | float16 | Compute type: float16 (GPU); int8 or float32 (CPU) |
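For example, to run the Whisper backend without a GPU you might start the server with CPU settings from the table. This sketch assumes transcribe-server passes its environment through to the faster-whisper process; the model choice is only an illustration, and if a server is already running you would stop it first, since the model is loaded once at startup.

```shell
# Illustrative values: CPU inference with int8 quantization and a mid-size model.
cd /path/to/talktype
WHISPER_DEVICE=cpu WHISPER_COMPUTE=int8 WHISPER_MODEL=small ./transcribe-server start
```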
Moonshine by Useful Sensors. 61.5M params, purpose-built for CPU/edge inference.
make moonshine

# ~/.config/talktype/config
TALKTYPE_CMD="/path/to/talktype/backends/moonshine-server transcribe"

Set MOONSHINE_MODEL=UsefulSensors/moonshine-tiny for an even smaller 27M-parameter model.
The server starts automatically on first transcription. You can also manage it directly:
./transcribe-server start # start manually
./transcribe-server stop   # stop the server

Set TALKTYPE_CMD to any command that takes a WAV file path as its last argument and prints text to stdout:
# ~/.config/talktype/config
TALKTYPE_CMD="/path/to/my-transcriber"

Your command will be called as: $TALKTYPE_CMD /path/to/recording.wav

It should print the transcribed text to stdout and exit. That's the only contract; use whatever model, language, or runtime you want.
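As a minimal illustration of that contract, here is a hypothetical stub backend that does no speech recognition at all: it checks that the last argument is a RIFF/WAV file and prints a placeholder line. Something like this can be handy for testing your hotkey and typing setup before wiring in a real model. The function name stub_transcribe and the output text are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical stub backend: honours the talktype contract (WAV path as the
# last argument, text on stdout) without doing any real transcription.
stub_transcribe() {
  if (( $# == 0 )); then
    echo "usage: stub_transcribe [args...] recording.wav" >&2
    return 2
  fi
  local wav=${!#}   # the WAV path is the last argument

  if [[ ! -f $wav ]]; then
    echo "no such file: $wav" >&2
    return 1
  fi
  # A WAV file starts with the 4-byte "RIFF" magic
  if [[ $(head -c 4 "$wav") != RIFF ]]; then
    echo "not a WAV file: $wav" >&2
    return 1
  fi

  # A real backend would run a model here and print the transcript
  printf 'stub transcript for %s (%d bytes)\n' "${wav##*/}" "$(wc -c <"$wav")"
}

# Run only when given arguments, so the file can also be sourced for testing.
if (( $# > 0 )); then
  stub_transcribe "$@"
fi
```

Saved as an executable script, it is configured like any other backend, e.g. TALKTYPE_CMD="/path/to/stub-transcriber".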
[hotkey] → recording starts → [hotkey] → recording stops
↓
$TALKTYPE_CMD audio.wav
↓
type_text → text appears at cursor
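The flow in the diagram can be sketched as a single toggle function. This is a simplified, hypothetical version rather than the real talktype script: the state paths, the use of ffmpeg's PulseAudio input (-f pulse, which PipeWire also serves through pipewire-pulse), and the hard-coded wtype call are all assumptions, and error handling is minimal.

```shell
#!/usr/bin/env bash
# Hypothetical push-to-talk toggle following the diagram (not talktype's source).
STATE_DIR="${XDG_RUNTIME_DIR:-/tmp}/talktype-sketch"
PID_FILE="$STATE_DIR/recorder.pid"
WAV_FILE="$STATE_DIR/recording.wav"

# Assumed recorder: ffmpeg reading the default PulseAudio/PipeWire source as
# 16 kHz mono. The WAV path is appended when it runs.
RECORD_CMD=(ffmpeg -nostdin -loglevel error -y -f pulse -i default -ac 1 -ar 16000)
# Assumed typing tool: wtype on Wayland. The transcript is appended.
TYPE_CMD=(wtype --)

toggle() {
  mkdir -p "$STATE_DIR"

  if [[ ! -f $PID_FILE ]]; then
    # First press: record in the background and remember the recorder's PID.
    "${RECORD_CMD[@]}" "$WAV_FILE" </dev/null >/dev/null 2>&1 &
    echo "$!" >"$PID_FILE"
    notify-send "TalkType" "Listening..." 2>/dev/null || true
    return 0
  fi

  # Second press: stop the recorder; ffmpeg finalizes the WAV on SIGTERM.
  local pid text
  pid=$(<"$PID_FILE")
  rm -f "$PID_FILE"
  kill -TERM "$pid" 2>/dev/null || true
  # Reap it if it is our child; otherwise poll until the process is gone.
  wait "$pid" 2>/dev/null || true
  while kill -0 "$pid" 2>/dev/null; do sleep 0.05; done

  # Unquoted on purpose: TALKTYPE_CMD may carry arguments
  # (e.g. "transcribe-server transcribe"); the WAV path goes last.
  text=$($TALKTYPE_CMD "$WAV_FILE") || return 1
  if [[ -n $text ]]; then
    "${TYPE_CMD[@]}" "$text"
  fi
}
```

Keeping the recorder PID in a file is what lets two separate hotkey invocations act as start and stop: the second run finds the file, signals the first run's recorder, and only then hands the finished WAV to $TALKTYPE_CMD.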
The talktype script is ~160 lines of bash. Transcription backends are
swappable. Server mode uses Unix sockets to keep models in memory.
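Auto-start on first use can be sketched with socat, which is listed as a requirement for server-backed transcription. Everything specific here is an assumption rather than talktype's actual protocol: the socket path, the SERVER_CMD placeholder, and the one-line request format (send the WAV path, read the transcript until the server closes the connection) are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical client for a model server on a Unix socket (not talktype's protocol).
SOCK="${XDG_RUNTIME_DIR:-/tmp}/talktype-sketch.sock"
SERVER_CMD=(/path/to/my-model-server)   # placeholder: loads the model, listens on $SOCK

# Start the server unless its socket already exists; wait up to tries x 0.1 s.
ensure_server() {
  local tries=${1:-100}
  [[ -S $SOCK ]] && return 0
  "${SERVER_CMD[@]}" </dev/null >/dev/null 2>&1 &
  while (( tries-- > 0 )); do
    [[ -S $SOCK ]] && return 0
    sleep 0.1
  done
  echo "server did not create $SOCK" >&2
  return 1
}

# Send one WAV path and print whatever the server answers.
transcribe_via_server() {
  ensure_server || return 1
  # -t 60: after stdin reaches EOF, keep reading the reply for up to 60 s;
  # socat's default of 0.5 s could cut off a slow transcription.
  printf '%s\n' "$1" | socat -t 60 - "UNIX-CONNECT:$SOCK"
}
```

Only the first call pays the model-loading cost; later calls find the socket and connect immediately. A stale socket file left by a crashed server would pass the -S check but fail to connect, so a real client would also want to handle that case.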
License: MIT