Add speech-to-text, text-to-speech, and ElevenLabs provider by patrickdet · Pull Request #472 · agentjido/req_llm

patrickdet · 2026-03-01T21:06:20Z

Summary

Adds TTS and STT to req_llm, plus an ElevenLabs provider.

Speech (`ReqLLM.Speech`)

Text-to-speech through the standard prepare_request(:speech, ...) pipeline. Any provider that implements the operation works.

```elixir
{:ok, result} = ReqLLM.speak("openai:tts-1", "Hello world", voice: "alloy")
File.write!("hello.mp3", result.audio)
```

Options: voice selection, speed, output format (mp3/wav/opus/flac/aac/pcm), language hints, and provider-specific options such as OpenAI's `instructions` for gpt-4o-mini-tts.

Transcription (`ReqLLM.Transcription`)

Speech-to-text. Takes file paths, raw binary, or base64 audio. Returns text with optional segment timing.

```elixir
{:ok, result} = ReqLLM.transcribe("groq:whisper-large-v3-turbo", "recording.mp3")
result.text #=> "Hello world"
result.segments #=> [%{text: "Hello world", start_second: 0.0, end_second: 1.2}]
```
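For readers who want to do something with the segment timing, here is a minimal sketch that turns segments of the shape shown above into timestamped lines. `SegmentFormatter` is a hypothetical helper written for this example and is not part of req_llm; it assumes only the `text`, `start_second`, and `end_second` keys from the example result.

```elixir
defmodule SegmentFormatter do
  # Hypothetical helper, not part of req_llm. It assumes each segment is a map
  # with :text, :start_second and :end_second, as in the example result above.
  def format(segments) do
    Enum.map(segments, fn %{text: text, start_second: from, end_second: to} ->
      "[#{timestamp(from)} - #{timestamp(to)}] #{text}"
    end)
  end

  # Renders a number of seconds as MM:SS.mmm.
  defp timestamp(seconds) do
    total_ms = round(seconds * 1000)
    minutes = div(total_ms, 60_000)
    secs = div(rem(total_ms, 60_000), 1000)
    ms = rem(total_ms, 1000)
    "#{pad(minutes, 2)}:#{pad(secs, 2)}.#{pad(ms, 3)}"
  end

  defp pad(int, width) do
    int |> Integer.to_string() |> String.pad_leading(width, "0")
  end
end

segments = [%{text: "Hello world", start_second: 0.0, end_second: 1.2}]
Enum.each(SegmentFormatter.format(segments), &IO.puts/1)
```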

ElevenLabs provider

Speech-only. Their API is pretty different from OpenAI's /audio/speech:

- Voice ID in the URL path (`/v1/text-to-speech/{voiceId}`)
- `xi-api-key` header instead of Bearer auth
- Output format as a query param, not a body field
- `text`/`model_id` instead of `input`/`model`

`voice_settings` (stability, similarity_boost, style, speed) go through `provider_options`. The provider is auto-discovered at startup.

```elixir
{:ok, result} = ReqLLM.speak(
  %{id: "eleven_multilingual_v2", provider: :elevenlabs},
  "Hello!",
  provider_options: [stability: 0.5, similarity_boost: 0.8]
)
```
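To make the API differences listed above concrete, here is a rough sketch of the request shape they imply. It is not the provider code from this PR: `ElevenLabsRequestSketch`, the base URL, the default `mp3_44100_128` format, and the nesting of the settings under a `voice_settings` body key are all assumptions made for illustration.

```elixir
defmodule ElevenLabsRequestSketch do
  # Illustration only, not the provider module from this PR.
  # Assumed values: the base URL and the default output format.
  @base_url "https://api.elevenlabs.io"
  @default_format "mp3_44100_128"
  @voice_setting_keys [:stability, :similarity_boost, :style, :speed]

  def build(api_key, voice_id, model_id, text, opts \\ []) do
    output_format = Keyword.get(opts, :output_format, @default_format)
    voice_settings = opts |> Keyword.take(@voice_setting_keys) |> Map.new()

    # The body uses text/model_id rather than OpenAI's input/model.
    body = %{text: text, model_id: model_id}

    # Assumption: settings are nested under a "voice_settings" body key.
    body =
      if map_size(voice_settings) > 0 do
        Map.put(body, :voice_settings, voice_settings)
      else
        body
      end

    %{
      method: :post,
      # Voice ID goes in the path, output format in the query string.
      url:
        "#{@base_url}/v1/text-to-speech/#{URI.encode(voice_id)}?" <>
          URI.encode_query(output_format: output_format),
      # xi-api-key header instead of "Authorization: Bearer ...".
      headers: [{"xi-api-key", api_key}, {"content-type", "application/json"}],
      body: body
    }
  end
end

request =
  ElevenLabsRequestSketch.build("example-key", "voice123", "eleven_multilingual_v2", "Hello!",
    stability: 0.5,
    similarity_boost: 0.8
  )

IO.inspect(request)
```

The sketch returns a plain map rather than sending anything, so the shape can be inspected without an API key.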

Integration tests

Tagged :integration, excluded by default. Tested against real APIs:

- ElevenLabs TTS: default voice, voice_settings, language codes
- OpenAI TTS: tts-1, wav output
- Groq STT: generate-then-transcribe pattern (OpenAI TTS makes audio, Groq whisper transcribes it) so we don't commit binary fixtures

```shell
ELEVENLABS_API_KEY=... OPENAI_API_KEY=... GROQ_API_KEY=... \
  mix test --include integration test/req_llm/integration/
```
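For readers unfamiliar with ExUnit tag filtering, this is roughly how a suite ends up "excluded by default" and opt-in via `--include integration`. This is a generic ExUnit sketch, not the test files from this PR: the module name and test body are placeholders, and in a Mix project the `ExUnit.start/1` call would normally live in `test/test_helper.exs`.

```elixir
# Exclude anything tagged :integration unless the caller opts in
# (for example with `mix test --include integration`).
ExUnit.start(exclude: [:integration])

defmodule SpeechIntegrationSketchTest do
  # Placeholder module name; not a test from this PR.
  use ExUnit.Case, async: false

  # Tags every test in this module, so the exclude filter above skips them.
  @moduletag :integration

  test "live API round trip needs credentials" do
    # A real integration test would call the provider here.
    assert System.get_env("OPENAI_API_KEY") != nil
  end
end
```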

Test plan

- ElevenLabs unit tests pass (18 tests)
- Full suite passes (2370 tests, 0 failures)
- Integration tests pass against real ElevenLabs, OpenAI, and Groq APIs (8/8)

Add speech-to-text transcription (ReqLLM.Transcription) and text-to-speech generation (ReqLLM.Speech) with provider-agnostic pipelines that work via prepare_request(:transcription/:speech). Add ElevenLabs as a speech-only provider with its unique API format (voice ID in URL path, xi-api-key header, format as query param). Integration tests verify TTS (ElevenLabs + OpenAI) and STT (Groq whisper via generate-then-transcribe pattern) against real APIs. Tagged :integration and excluded by default.

mikehostetler · 2026-03-09T22:32:46Z

Blocking on agentjido/llm_db#152 for first-class ElevenLabs support. The provider module can work with explicit map specs, but string model specs like elevenlabs:eleven_multilingual_v2 still depend on llm_db provider/model metadata. Until that lands, ElevenLabs remains a custom-provider path rather than a normal built-in provider.

I'm actively working on this issue, so it won't be a blocker for long.

mikehostetler · 2026-03-10T00:58:21Z

Status update as of March 10, 2026:

- Updated the local PR branch to use llm_db from GitHub main so we can validate against the merged ElevenLabs support before the next Hex release.
- Added the ElevenLabs provider to this repo's local llm_db allow-filter so `elevenlabs:*` models resolve correctly in test/runtime config.
- Extended the PR branch's ElevenLabs provider to support speech-to-text via `POST /v1/speech-to-text` in addition to the existing TTS path.
- Updated transcription parsing to handle ElevenLabs response fields like `language_code`, `words[].text`, and multichannel transcript payloads.
- Added unit/integration coverage for the ElevenLabs STT path.
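As an illustration of the parsing change described above, here is a minimal sketch of normalizing a decoded ElevenLabs speech-to-text payload. It is not the parser from this PR: the module name, the output shape, the top-level `"text"` key, and the idea that concatenating `words[].text` reproduces the transcript are assumptions, and multichannel payloads are left out.

```elixir
defmodule ElevenLabsTranscriptSketch do
  # Illustration only, not the parsing code from this PR.
  # Only "language_code" and "words"[].text come from the comment above;
  # the top-level "text" key and the returned map shape are assumptions.
  def normalize(%{} = payload) do
    words = Map.get(payload, "words", [])
    word_texts = Enum.map(words, &Map.get(&1, "text", ""))

    text =
      case Map.get(payload, "text") do
        t when is_binary(t) and t != "" -> t
        # Assumption: word entries include their own spacing, so plain
        # concatenation is enough to rebuild the transcript.
        _ -> Enum.join(word_texts)
      end

    %{text: text, language: Map.get(payload, "language_code"), words: word_texts}
  end
end

payload = %{
  "language_code" => "en",
  "words" => [%{"text" => "Hello"}, %{"text" => " "}, %{"text" => "world"}]
}

IO.inspect(ElevenLabsTranscriptSketch.normalize(payload))
```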

Verification:

- `mix test test/req_llm/speech_test.exs test/req_llm_test.exs test/req_llm/providers/elevenlabs_test.exs test/req_llm/transcription_test.exs` passes locally.
- `ReqLLM.model("elevenlabs:eleven_multilingual_v2")` now resolves correctly against the GitHub llm_db branch.
- Live ElevenLabs verification is still blocked in this environment by an upstream HTTP 401 / invalid_api_key response from ElevenLabs, not by the request/response wiring in req_llm.

We should keep this PR in draft until llm_db is released, since landing this against an unreleased Hex dependency would create a merge timing problem for main.

mikehostetler · 2026-03-10T01:43:51Z

Status update as of March 10, 2026:

- Updated this PR to use the released Hex llm_db 2026.3.1 instead of the temporary GitHub main override.
- Rebased cleanly onto current main.
- Kept the ElevenLabs speech-to-text, transcription parsing, model resolution, and coverage changes on the branch.

Verification:

- `mix test` passes locally.
- `mix format --check-formatted` passes locally.
- `mix test test/req_llm/speech_test.exs test/req_llm_test.exs test/req_llm/providers/elevenlabs_test.exs test/req_llm/transcription_test.exs` passes locally.

Remaining limitation in this environment:

Live ElevenLabs STT integration was not rerun here because ELEVENLABS_API_KEY is not available locally.

This should now be ready for final review.

patrickdet force-pushed the feat/speech-transcription-elevenlabs branch from 8e6ea10 to 3d7830c on March 1, 2026 21:11

patrickdet force-pushed the feat/speech-transcription-elevenlabs branch from 3d7830c to 9ad9d93 on March 1, 2026 21:15

mikehostetler marked this pull request as draft March 10, 2026 00:58

mikehostetler added 2 commits March 9, 2026 20:36

- Merge branch 'main' into feat/speech-transcription-elevenlabs (f8664f6)
- feat: finalize elevenlabs transcription support (1518bb9)

mikehostetler marked this pull request as ready for review March 10, 2026 01:43

mikehostetler merged commit 7ac5580 into agentjido:main Mar 10, 2026

mikehostetler merged 3 commits into agentjido:main from patrickdet:feat/speech-transcription-elevenlabs