Add speech-to-text, text-to-speech, and ElevenLabs provider #472

Merged: mikehostetler merged 3 commits into agentjido:main from patrickdet:feat/speech-transcription-elevenlabs on Mar 10, 2026

@patrickdet
Contributor

Summary

Adds TTS and STT to req_llm, plus an ElevenLabs provider.

Speech (ReqLLM.Speech)

Text-to-speech through the standard prepare_request(:speech, ...) pipeline. Any provider that implements the operation works.

{:ok, result} = ReqLLM.speak("openai:tts-1", "Hello world", voice: "alloy")
File.write!("hello.mp3", result.audio)

Options: voice selection, speed, output format (mp3/wav/opus/flac/aac/pcm), language hints, and provider-specific options such as OpenAI's instructions parameter for gpt-4o-mini-tts.

Transcription (ReqLLM.Transcription)

Speech-to-text. Takes file paths, raw binary, or base64 audio. Returns text with optional segment timing.

{:ok, result} = ReqLLM.transcribe("groq:whisper-large-v3-turbo", "recording.mp3")
result.text #=> "Hello world"
result.segments #=> [%{text: "Hello world", start_second: 0.0, end_second: 1.2}]
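
To illustrate working with the segment timing shown above, here is a minimal sketch that turns a list of segments into timestamped lines. It assumes segments are maps with the `text`, `start_second`, and `end_second` keys from the example; the module name `SegmentFormatterSketch` is hypothetical and not part of req_llm.

```elixir
defmodule SegmentFormatterSketch do
  # Hypothetical helper, not part of req_llm.
  # Assumes each segment is a map shaped like the example above:
  # %{text: String.t(), start_second: number(), end_second: number()}
  def format(segments) when is_list(segments) do
    Enum.map(segments, fn %{text: text, start_second: s, end_second: e} ->
      "[#{secs(s)}s - #{secs(e)}s] #{text}"
    end)
  end

  # `n / 1` always yields a float, so integer timestamps are accepted too.
  defp secs(n), do: :erlang.float_to_binary(n / 1, decimals: 1)
end
```

Since segments are optional, a caller would need to check that `result.segments` is a list before passing it in.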

ElevenLabs provider

Speech-only. Their API is pretty different from OpenAI's /audio/speech:

  • Voice ID in the URL path (/v1/text-to-speech/{voiceId})
  • xi-api-key header instead of Bearer auth
  • Output format as a query param, not body field
  • text/model_id instead of input/model

voice_settings (stability, similarity_boost, style, speed) go through provider_options. The provider is auto-discovered at startup.

{:ok, result} = ReqLLM.speak(
  %{id: "eleven_multilingual_v2", provider: :elevenlabs},
  "Hello!",
  provider_options: [stability: 0.5, similarity_boost: 0.8]
)
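
For reviewers unfamiliar with the ElevenLabs request shape, the four differences listed above can be sketched roughly as follows. This is not the provider module from this PR: the module name `ElevenLabsRequestSketch`, the base URL, the default `output_format` value, and the option names are assumptions for illustration. It only builds a plain map and sends nothing.

```elixir
defmodule ElevenLabsRequestSketch do
  # Illustrative only; not the provider implementation in this PR.
  # Assumed base URL for the public ElevenLabs API.
  @base_url "https://api.elevenlabs.io"

  def build(api_key, voice_id, text, opts \\ []) do
    model_id = Keyword.get(opts, :model_id, "eleven_multilingual_v2")
    # Assumed default; the format travels as a query param, not a body field.
    output_format = Keyword.get(opts, :output_format, "mp3_44100_128")
    voice_settings = opts |> Keyword.get(:voice_settings, []) |> Map.new()

    # The voice ID is part of the URL path.
    url =
      @base_url <>
        "/v1/text-to-speech/" <>
        URI.encode(voice_id, &URI.char_unreserved?/1) <>
        "?" <> URI.encode_query(output_format: output_format)

    # The body uses text/model_id rather than input/model.
    body = %{text: text, model_id: model_id}

    body =
      if map_size(voice_settings) > 0,
        do: Map.put(body, :voice_settings, voice_settings),
        else: body

    # Auth uses the xi-api-key header rather than a Bearer token.
    %{method: :post, url: url, headers: [{"xi-api-key", api_key}], body: body}
  end
end
```

Calling `ElevenLabsRequestSketch.build("key", "VOICE_ID", "Hello!", voice_settings: [stability: 0.5])` returns a map whose `url` contains the voice ID and whose `body` nests the settings under `voice_settings`.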

Integration tests

Tagged :integration, excluded by default. Tested against real APIs:

  • ElevenLabs TTS: default voice, voice_settings, language codes
  • OpenAI TTS: tts-1, wav output
  • Groq STT: generate-then-transcribe pattern (OpenAI TTS makes audio, Groq whisper transcribes it) so we don't commit binary fixtures

ELEVENLABS_API_KEY=... OPENAI_API_KEY=... GROQ_API_KEY=... \
  mix test --include integration test/req_llm/integration/

Test plan

  • ElevenLabs unit tests pass (18 tests)
  • Full suite passes (2370 tests, 0 failures)
  • Integration tests pass against real ElevenLabs, OpenAI, and Groq APIs (8/8)

@patrickdet force-pushed the feat/speech-transcription-elevenlabs branch from 8e6ea10 to 3d7830c on March 1, 2026 21:11

Add speech-to-text transcription (ReqLLM.Transcription) and
text-to-speech generation (ReqLLM.Speech) with provider-agnostic
pipelines that work via prepare_request(:transcription/:speech).

Add ElevenLabs as a speech-only provider with its unique API format
(voice ID in URL path, xi-api-key header, format as query param).

Integration tests verify TTS (ElevenLabs + OpenAI) and STT (Groq
whisper via generate-then-transcribe pattern) against real APIs.
Tagged :integration and excluded by default.

@patrickdet force-pushed the feat/speech-transcription-elevenlabs branch from 3d7830c to 9ad9d93 on March 1, 2026 21:15

@mikehostetler
Contributor

mikehostetler commented Mar 9, 2026

Blocking on agentjido/llm_db#152 for first-class ElevenLabs support. The provider module can work with explicit map specs, but string model specs like elevenlabs:eleven_multilingual_v2 still depend on llm_db provider/model metadata. Until that lands, ElevenLabs remains a custom-provider path rather than a normal built-in provider.

I'm actively working on this issue, so it won't be a blocker for long.
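
To make the distinction between the two spec forms concrete, here is a rough sketch of why a string spec needs provider/model metadata while an explicit map spec does not. It is not ReqLLM's or llm_db's actual resolution logic: `ModelSpecSketch` and the `catalog` argument (a map of provider atoms to model ID lists) are hypothetical stand-ins for that metadata.

```elixir
defmodule ModelSpecSketch do
  # Hypothetical illustration; not ReqLLM's or llm_db's resolver.
  # `catalog` stands in for provider/model metadata,
  # e.g. %{elevenlabs: ["eleven_multilingual_v2"]}.

  # An explicit map spec already names provider and model, so no lookup is needed.
  def resolve(%{provider: provider, id: id}, _catalog)
      when is_atom(provider) and is_binary(id) do
    {:ok, %{provider: provider, id: id}}
  end

  # A "provider:model" string only resolves if the catalog knows both parts.
  def resolve(spec, catalog) when is_binary(spec) do
    with [provider, id] <- String.split(spec, ":", parts: 2),
         {key, models} <-
           Enum.find(catalog, fn {key, _models} -> Atom.to_string(key) == provider end),
         true <- id in models do
      {:ok, %{provider: key, id: id}}
    else
      _ -> {:error, :unknown_model}
    end
  end
end
```

With an empty catalog the string form fails and the map form still resolves, which mirrors the custom-provider path described above.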

@mikehostetler
Contributor

Status update as of March 10, 2026:

  • Updated the local PR branch to use llm_db from GitHub main so we can validate against the merged ElevenLabs support before the next Hex release.
  • Added the ElevenLabs provider to this repo's local llm_db allow-filter so elevenlabs:* models resolve correctly in test/runtime config.
  • Extended the PR branch's ElevenLabs provider to support speech-to-text via POST /v1/speech-to-text in addition to the existing TTS path.
  • Updated transcription parsing to handle ElevenLabs response fields like language_code, words[].text, and multichannel transcript payloads.
  • Added unit/integration coverage for the ElevenLabs STT path.
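
As a rough illustration of the transcription parsing change, the sketch below maps an already-decoded ElevenLabs STT payload onto the text/segments shape from the PR description. The module name `ElevenLabsTranscriptSketch` is hypothetical, and the payload keys beyond `language_code` and `words[].text` (the top-level `text`, the per-word `start`/`end`, and a `transcripts` list for multichannel responses) are assumptions rather than the parser on the branch.

```elixir
defmodule ElevenLabsTranscriptSketch do
  # Hypothetical sketch; not the parser on this branch.

  # Assumed multichannel shape: a "transcripts" list of per-channel payloads.
  def normalize(%{"transcripts" => transcripts}) when is_list(transcripts) do
    parts = Enum.map(transcripts, &normalize/1)

    %{
      text: Enum.map_join(parts, "\n", & &1.text),
      # Use the first channel that reports a language, if any.
      language: parts |> Enum.map(& &1.language) |> Enum.find(& &1),
      segments: Enum.flat_map(parts, & &1.segments)
    }
  end

  # Assumed single-channel shape: "text", "language_code", and "words".
  def normalize(%{} = payload) do
    words = Map.get(payload, "words", [])

    %{
      text: Map.get(payload, "text", ""),
      language: Map.get(payload, "language_code"),
      segments:
        Enum.map(words, fn word ->
          %{text: word["text"], start_second: word["start"], end_second: word["end"]}
        end)
    }
  end
end
```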

Verification:

  • mix test test/req_llm/speech_test.exs test/req_llm_test.exs test/req_llm/providers/elevenlabs_test.exs test/req_llm/transcription_test.exs passes locally.
  • ReqLLM.model("elevenlabs:eleven_multilingual_v2") now resolves correctly against the GitHub llm_db branch.
  • Live ElevenLabs verification is still blocked in this environment by an upstream HTTP 401 / invalid_api_key response from ElevenLabs, not by the request/response wiring in req_llm.

We should keep this PR in draft until llm_db is released, since landing this against an unreleased Hex dependency would create a merge timing problem for main.

@mikehostetler marked this pull request as draft on March 10, 2026 00:58

@mikehostetler
Contributor

Status update as of March 10, 2026:

  • Updated this PR to use released Hex llm_db 2026.3.1 instead of the temporary GitHub main override.
  • Rebased cleanly onto current main.
  • Kept the ElevenLabs speech-to-text, transcription parsing, model resolution, and coverage changes on the branch.

Verification:

  • mix test passes locally.
  • mix format --check-formatted passes locally.
  • mix test test/req_llm/speech_test.exs test/req_llm_test.exs test/req_llm/providers/elevenlabs_test.exs test/req_llm/transcription_test.exs passes locally.

Remaining limitation in this environment:

  • Live ElevenLabs STT integration was not rerun here because ELEVENLABS_API_KEY is not available locally.

This should now be ready for final review.

@mikehostetler marked this pull request as ready for review on March 10, 2026 01:43
@mikehostetler merged commit 7ac5580 into agentjido:main on Mar 10, 2026