How to handle large audio files with Whisper API in openai-python without hitting memory limits? #2547
Hi,
Replies: 3 comments
Think I need anything from the center
If you're running into memory or timeout issues with the Whisper API when transcribing long recordings (1+ hours), avoid sending the entire file in a single request. Large uploads increase both memory usage and the chance of hitting request timeouts.
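As a quick pre-check, you can test whether a file even needs splitting before uploading. This is a minimal sketch; the helper name is illustrative, and the 25 MB figure is the Whisper API's documented per-request upload limit:

```python
import os

# The Whisper API rejects uploads larger than 25 MB per request
MAX_WHISPER_BYTES = 25 * 1024 * 1024

def needs_chunking(file_path: str) -> bool:
    """Return True if the file exceeds the Whisper API upload limit."""
    return os.path.getsize(file_path) > MAX_WHISPER_BYTES
```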
The Whisper API has a 25 MB file size limit per request. For long recordings (1+ hour), you need to split the audio into chunks before sending. Here's a reliable approach using pydub:

```python
from pydub import AudioSegment
from openai import OpenAI
import io

client = OpenAI()

def transcribe_large_file(file_path: str, chunk_length_ms: int = 10 * 60 * 1000) -> str:
    """Transcribe a large audio file by splitting it into chunks.

    Args:
        file_path: Path to the audio file
        chunk_length_ms: Chunk size in milliseconds (default 10 minutes)
    """
    audio = AudioSegment.from_file(file_path)
    full_transcript = []
    for i in range(0, len(audio), chunk_length_ms):
        chunk = audio[i:i + chunk_length_ms]
        # Export chunk to an in-memory buffer (avoids writing temp files to disk)
        buf = io.BytesIO()
        chunk.export(buf, format="mp3", bitrate="64k")  # compress to stay under 25MB
        buf.seek(0)
        buf.name = "chunk.mp3"  # the SDK uses the name to infer the file type
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=buf,
            response_format="text",
        )
        full_transcript.append(transcript)
        print(f"Transcribed chunk {i // chunk_length_ms + 1}")
    return " ".join(full_transcript)

result = transcribe_large_file("long_recording.wav")
```

One thing to keep in mind: `AudioSegment.from_file` still decodes the whole recording into memory. If even that is too much, split the file on disk with ffmpeg instead:

```shell
# Split a file into 10-minute chunks without loading it into memory
ffmpeg -i long_recording.wav -f segment -segment_time 600 -c:a libmp3lame -b:a 64k chunk_%03d.mp3
```

Then iterate over the chunk files and transcribe each one.
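That final iteration step can be sketched like this. The helper names are illustrative, and the glob pattern assumes the `chunk_%03d.mp3` naming produced by the ffmpeg command above:

```python
from pathlib import Path

def sorted_chunks(directory: str) -> list[Path]:
    # ffmpeg's zero-padded %03d numbering makes lexicographic order match playback order
    return sorted(Path(directory).glob("chunk_*.mp3"))

def transcribe_chunks(directory: str = ".") -> str:
    from openai import OpenAI  # requires the openai package and OPENAI_API_KEY set
    client = OpenAI()
    parts = []
    for chunk_path in sorted_chunks(directory):
        with chunk_path.open("rb") as f:
            parts.append(client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="text",
            ))
    return " ".join(parts)
```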