talktype

Push-to-talk speech-to-text for Linux. Press a hotkey to start recording, press it again to transcribe and type the text wherever your cursor is. No GUI, no app to keep running — just a keyboard shortcut.

Pluggable backends — swap transcription models without changing anything else
Works everywhere — GNOME, Sway, Hyprland, i3, X11
~160 lines of bash — easy to read, easy to hack on

Ships with faster-whisper by default, plus an optional Moonshine backend for CPU. Or bring your own — anything that reads a WAV and prints text works.

Note: This project is in early development, so expect rough edges. If you run into problems, please open an issue.

Requirements

Linux (Wayland or X11)
Audio recorder: ffmpeg (preferred) or PipeWire (pw-record)
Typing tool (auto-detected, best available is used):
- wtype — Wayland (Sway, Hyprland; not GNOME)
- ydotool + ydotoold — Wayland & X11 (preferred with daemon)
- xdotool — X11 only (not Wayland)
- ydotool without daemon — last resort, with warning
socat (for server-backed transcription)

For the default backend (faster-whisper):

NVIDIA GPU with CUDA (or use CPU mode — see Whisper backend options)

Install

git clone https://github.com/csheaff/talktype.git
cd talktype
make install

This will:

Install system packages (wtype, ydotool, etc.)
Create a Python venv with faster-whisper
Symlink talktype into ~/.local/bin/

ydotool permissions

Note: Only needed if you use ydotool. If you use wtype (Wayland) or xdotool (X11), skip this.

ydotool needs access to /dev/uinput. Add yourself to the input group:

sudo usermod -aG input $USER
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/80-uinput.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Then reboot for the group change to take effect.
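Once you are logged in again, it can help to confirm that the change took effect. The following is a minimal sketch and not part of talktype: `in_group` and `check_uinput_access` are hypothetical helper names, and the check assumes the `input` group and the `/dev/uinput` path used in the commands above.

```shell
# Hypothetical helper (not part of talktype): succeeds if the current
# process belongs to the named group.
in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# Report the two things ydotool needs: group membership and a writable device.
check_uinput_access() {
  if in_group input; then
    echo "group: ok"
  else
    echo "group: not in 'input' (has the session been restarted?)"
  fi
  if [ -w /dev/uinput ]; then
    echo "device: ok"
  else
    echo "device: /dev/uinput is not writable"
  fi
}
```

Paste both functions into a shell and run `check_uinput_access`. If either line does not say `ok`, ydotool will probably fail to type.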

Pre-download model (optional)

make model

Configuration

talktype reads ~/.config/talktype/config on startup (follows $XDG_CONFIG_HOME). This works everywhere — GNOME shortcuts, terminals, Sway, cron — no need to set environment variables in each context.

mkdir -p ~/.config/talktype
cat > ~/.config/talktype/config << 'EOF'
TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
EOF

Any TALKTYPE_* variable can go in this file. Environment variables still work and are applied after the config file, so they override it.
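To make that precedence concrete, here is a sketch of one way a script could load the file while still letting the environment win. It is an illustration under assumptions, not talktype's actual loader: `load_talktype_cmd` is a hypothetical name, and it handles only `TALKTYPE_CMD`.

```shell
# Sketch only: source the config file, then restore any value that was
# already present in the environment so it takes precedence.
load_talktype_cmd() {
  cfg="${XDG_CONFIG_HOME:-$HOME/.config}/talktype/config"

  # Remember whether TALKTYPE_CMD was set before the file is read.
  had_env="${TALKTYPE_CMD+set}"
  env_value="${TALKTYPE_CMD-}"

  if [ -f "$cfg" ]; then
    . "$cfg"
  fi

  # A pre-existing environment value overrides the file.
  if [ -n "$had_env" ]; then
    TALKTYPE_CMD="$env_value"
  fi
}
```

With this pattern, `TALKTYPE_CMD=/some/other/backend talktype` would use the one-off backend without editing the config file.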

Set TALKTYPE_TYPE_CMD to control which typing tool is used (auto, wtype, ydotool, xdotool, or any custom command). Default is auto, which picks the best available tool: wtype (Wayland) → ydotool+daemon → xdotool (X11).
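The `auto` order can be pictured roughly as follows. This is a sketch of the documented priority, not the code talktype runs: `have` and `pick_typing_tool` are hypothetical names, and using `$WAYLAND_DISPLAY`, `$DISPLAY`, and `pgrep -x ydotoold` to detect the session type and the daemon is an assumption.

```shell
# Hypothetical helper: true if the named tool is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Print the first usable typing tool, following the documented order.
pick_typing_tool() {
  if [ -n "${WAYLAND_DISPLAY:-}" ] && have wtype; then
    echo wtype
  elif have ydotool && pgrep -x ydotoold >/dev/null 2>&1; then
    echo ydotool
  elif [ -n "${DISPLAY:-}" ] && have xdotool; then
    echo xdotool
  elif have ydotool; then
    # Last resort: ydotool without its daemon, with a warning on stderr.
    echo "warning: ydotoold is not running" >&2
    echo ydotool
  else
    return 1
  fi
}
```

Running `pick_typing_tool` in the same environment as your hotkey is a quick way to see which tool `auto` is likely to land on.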

Setup

Bind talktype to a keyboard shortcut:

GNOME: Settings → Keyboard → Keyboard Shortcuts → Custom Shortcuts

Name: TalkType
Command: talktype (or the full path to ~/.local/bin/talktype if that directory is not on your PATH)
Shortcut: your choice (e.g. Super+D, F11, etc.)

Sway: Add to your config:

bindsym $mod+d exec talktype

Hyprland: Add to your config (Hyprland uses bind rather than bindsym):

bind = SUPER, D, exec, talktype

Usage

Press your shortcut → notification says "Listening..."
Speak
Press the shortcut again → transcribes and types the text at your cursor

Backends

Server backends auto-start on first use — the model loads once and stays in memory for fast subsequent transcriptions.

Whisper (default)

faster-whisper. Best with a GPU. Works out of the box after make install with no config needed.

For faster repeated use, switch to server mode in your config:

# ~/.config/talktype/config
TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"

| Variable | Default | Description |
|---|---|---|
| `WHISPER_MODEL` | `base` | `tiny`, `base`, `small`, `medium`, `large-v3-turbo` |
| `WHISPER_LANG` | `en` | Language code |
| `WHISPER_DEVICE` | `cuda` | `cuda` or `cpu` |
| `WHISPER_COMPUTE` | `float16` | `float16` (GPU), `int8` or `float32` (CPU) |
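On a machine without an NVIDIA GPU, the table suggests a combination like the one below. This sketch sets the variables in the environment of the shell that launches talktype. Whether they can also be placed in `~/.config/talktype/config` is not stated here, so treat that as an assumption to verify. The model choice is arbitrary.

```shell
# CPU-only settings, using values from the table above.
export WHISPER_DEVICE="cpu"
export WHISPER_COMPUTE="int8"   # int8 and float32 are the CPU options listed
export WHISPER_MODEL="small"    # arbitrary pick from the listed model names
```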

Moonshine (CPU, lightweight)

Moonshine by Useful Sensors. 61.5M params, purpose-built for CPU/edge inference.

make moonshine

# ~/.config/talktype/config
TALKTYPE_CMD="/path/to/talktype/backends/moonshine-server transcribe"

Set MOONSHINE_MODEL=UsefulSensors/moonshine-tiny for an even smaller 27M param model.

Server management

The server starts automatically on first transcription. You can also manage it directly:

./transcribe-server start   # start manually
./transcribe-server stop    # stop the server

Custom backends

Set TALKTYPE_CMD to any command that takes a WAV file path as its last argument and prints text to stdout:

# ~/.config/talktype/config
TALKTYPE_CMD="/path/to/my-transcriber"

Your command will be called as: $TALKTYPE_CMD /path/to/recording.wav

It should print the transcribed text to stdout and exit. That's the only contract — use whatever model, language, or runtime you want.
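To make the contract concrete, here is a placeholder backend that honours it without running any model. It is a hypothetical example for testing the rest of the pipeline: `placeholder_transcribe` is not part of talktype, and the text it prints is made up.

```shell
# Hypothetical placeholder backend: takes the WAV path as its last
# argument and prints text to stdout, as the contract requires.
placeholder_transcribe() {
  wav=""
  for arg in "$@"; do
    wav="$arg"            # after the loop, wav holds the last argument
  done

  if [ -z "$wav" ] || [ ! -r "$wav" ]; then
    echo "usage: placeholder_transcribe [options] FILE.wav" >&2
    return 1
  fi

  # A real backend would run a speech model here; this one only
  # reports the size of the recording.
  bytes="$(wc -c < "$wav" | tr -d ' ')"
  printf 'placeholder transcript (%s bytes of audio)\n' "$bytes"
}
```

To use it as a backend, save the function in an executable script that ends with `placeholder_transcribe "$@"` and point `TALKTYPE_CMD` at that script. Swapping in a real model then only means replacing the body.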

How it works

[hotkey] → recording starts → [hotkey] → recording stops
                                            ↓
                                     $TALKTYPE_CMD audio.wav
                                            ↓
                                     type_text → text appears at cursor

The talktype script is ~160 lines of bash. Transcription backends are swappable. Server mode uses Unix sockets to keep models in memory.
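Because nothing stays running between key presses, the two-press flow needs some state that survives between invocations. One common way to do that is a PID file. The sketch below assumes that approach and is not taken from the talktype source: `toggle_recording`, the PID file path, and the recorder command passed to it are all hypothetical.

```shell
# Sketch of a start/stop toggle driven by a PID file.
# Usage: toggle_recording PIDFILE RECORDER_COMMAND [ARGS...]
toggle_recording() {
  pidfile="$1"
  shift

  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    # Second press: a recorder is running, so stop it.
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
    echo stopped
  else
    # First press: start the recorder in the background and remember its PID.
    "$@" >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
    echo started
  fi
}
```

In this sketch, the first call starts whatever recorder command it is given and the second call stops it. A full script would run the transcription command and the typing tool after the `stopped` branch.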

License

MIT
