# Feature Request

## Description

Add audio, speech, and music generation adapters to the `@tanstack/ai-fal` package. The fal adapter currently supports image and video generation, but fal's platform offers 600+ models, including these audio modalities:
- Text-to-Speech (e.g., `fal-ai/kokoro` - multi-language TTS)
- Text-to-Music (e.g., `fal-ai/diffrhythm` - music generation from prompts/lyrics)
- Text-to-Sound Effects (e.g., sound effect generation from descriptions)
- Speech-to-Text (e.g., `fal-ai/whisper`, `fal-ai/wizper` - transcription)
## Motivation

TanStack AI's fal adapter (`@tanstack/ai-fal`) currently implements the `falImage` and `falVideo` adapters following the tree-shakeable adapter pattern. Adding audio/speech/music adapters would complete fal's media-generation coverage and align with TanStack AI's goal of being a comprehensive, provider-agnostic AI SDK.
fal's audio models support:
- Multi-language text-to-speech with multiple voices
- Music generation from text prompts, lyrics, and reference audio
- Sound effect generation from descriptions
- Speech transcription and translation
## Proposed API

Following the existing adapter pattern:

```ts
import { falAudio, falSpeech, falMusic } from '@tanstack/ai-fal/adapters'

// Text-to-Speech
const speechAdapter = falSpeech('fal-ai/kokoro')

// Text-to-Music
const musicAdapter = falMusic('fal-ai/diffrhythm')

// Text-to-Sound Effects
const soundAdapter = falAudio('fal-ai/sound-effects')
```
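To make the proposal concrete, here is a minimal sketch of what a `falSpeech` factory might look like internally. This is an illustration only: the `FalClient` interface mirrors the `subscribe()` shape of `@fal-ai/client`, and `SpeechAdapter`, its `generate` method, and the result shape are hypothetical names, not the actual TanStack AI adapter contract.

```typescript
// Hypothetical sketch of a tree-shakeable falSpeech adapter factory.
// FalClient mimics @fal-ai/client's subscribe() shape; SpeechAdapter and
// the audioUrl result field are illustrative assumptions.

interface FalClient {
  subscribe: (
    modelId: string,
    opts: { input: Record<string, unknown> },
  ) => Promise<{ data: { audio: { url: string } } }>
}

interface SpeechAdapter {
  modelId: string
  generate: (input: { text: string; voice?: string }) => Promise<{ audioUrl: string }>
}

function falSpeech(client: FalClient, modelId: string): SpeechAdapter {
  return {
    modelId,
    // subscribe() submits the request and resolves once output is ready,
    // which suits low-latency TTS models
    async generate(input: { text: string; voice?: string }) {
      const result = await client.subscribe(modelId, { input })
      return { audioUrl: result.data.audio.url }
    },
  }
}

// Demo with a stub client, so no network call is made
const stub: FalClient = {
  subscribe: async () => ({ data: { audio: { url: 'https://example.com/out.wav' } } }),
}
const adapter = falSpeech(stub, 'fal-ai/kokoro')
adapter.generate({ text: 'Hello' }).then((r) => console.log(r.audioUrl))
```

Injecting the client keeps the factory easy to unit-test; the real adapter would presumably bind the shared `@fal-ai/client` instance instead.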
## Additional Context

- Existing adapters use the `fal.subscribe()` (image) and `fal.queue` (video) patterns
- Audio generation may use either pattern depending on model latency
- The fal SDK (`@fal-ai/client`) already supports audio responses with File/Audio output types
- Model metadata types in `model-meta.ts` would need to be extended for audio models
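The metadata extension mentioned above could be sketched as follows. The field names (`modality`, `invocation`) and the shape of the entries are assumptions for illustration, not the actual `model-meta.ts` schema; the point is that each audio model would record both its modality and whether it should use the `subscribe` or `queue` invocation pattern.

```typescript
// Hedged sketch of extending model metadata for audio models.
// Field names and entry shapes are assumptions, not the real schema.

type Modality =
  | 'image'
  | 'video'
  | 'text-to-speech'
  | 'text-to-music'
  | 'speech-to-text'

interface AudioModelMeta {
  id: string
  modality: Modality
  // Long-running music generation likely fits the queue pattern,
  // while low-latency TTS/transcription can use subscribe
  invocation: 'subscribe' | 'queue'
}

const audioModels: AudioModelMeta[] = [
  { id: 'fal-ai/kokoro', modality: 'text-to-speech', invocation: 'subscribe' },
  { id: 'fal-ai/diffrhythm', modality: 'text-to-music', invocation: 'queue' },
  { id: 'fal-ai/whisper', modality: 'speech-to-text', invocation: 'subscribe' },
]

// An adapter factory could then look up the invocation pattern by model id
const meta = audioModels.find((m) => m.id === 'fal-ai/diffrhythm')
console.log(meta?.invocation) // 'queue'
```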