Add live audio transcription streaming support to Foundry Local C# SDK #485

Open
rui-ren wants to merge 9 commits into main from ruiren/audio-streaming-support-sdk

rui-ren commented Mar 5, 2026

Here's the updated PR description based on the latest changes (renamed types, CoreInterop routing fix, mermaid updates):


Title: Add live audio transcription streaming support to Foundry Local C# SDK

Description:

Adds real-time audio streaming support to the Foundry Local C# SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).

The existing OpenAIAudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.

What's included

New files

  • src/OpenAI/LiveAudioTranscriptionClient.cs — Streaming session with StartAsync(), AppendAsync(), GetTranscriptionStream(), StopAsync()
  • src/OpenAI/LiveAudioTranscriptionTypes.cs — LiveAudioTranscriptionResult and CoreErrorResponse types

Modified files

  • src/OpenAI/AudioClient.cs — Added CreateLiveTranscriptionSession() factory method
  • src/Detail/ICoreInterop.cs — Added StreamingRequestBuffer struct, StartAudioStream, PushAudioData, StopAudioStream interface methods
  • src/Detail/CoreInterop.cs — Routes audio commands through existing execute_command / execute_command_with_binary native entry points (no separate audio exports needed)
  • src/Detail/JsonSerializationContext.cs — Registered LiveAudioTranscriptionResult for AOT compatibility
  • test/FoundryLocal.Tests/Utils.cs — Updated to use CreateLiveTranscriptionSession()

Documentation

[image omitted]

API surface

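var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();

// Configure the audio format before StartAsync(); settings are frozen once the session starts
session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();

// Push audio from the microphone callback (safe to call from any thread)
await session.AppendAsync(pcmBytes);

// Read partial/final results as an async stream
// (typically on a separate task while audio is still being appended)
await foreach (var result in session.GetTranscriptionStream())
{
    Console.Write(result.Text);
}

// Always stop the session so the native session is released
await session.StopAsync();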
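
When feeding a pre-recorded buffer instead of a microphone, the audio usually has to be cut into small chunks before each AppendAsync() call. The sketch below shows the arithmetic for the settings used above (16000 Hz, mono) and a simple splitter. PcmChunker is a hypothetical helper, not part of the SDK, and it assumes 16-bit PCM samples, which the PR does not state explicitly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the Foundry Local SDK.
public static class PcmChunker
{
    // Number of bytes covering `milliseconds` of audio.
    // Assumption: 16-bit (2-byte) samples, interleaved channels.
    public static int BytesForDuration(int sampleRate, int channels, int milliseconds)
    {
        const int bytesPerSample = 2;
        // Compute whole frames first so the result is always frame-aligned.
        long frames = (long)sampleRate * milliseconds / 1000;
        return (int)(frames * channels * bytesPerSample);
    }

    // Splits a PCM buffer into consecutive chunks of at most chunkBytes bytes.
    // The last chunk may be shorter.
    public static IEnumerable<ReadOnlyMemory<byte>> Split(ReadOnlyMemory<byte> pcm, int chunkBytes)
    {
        if (chunkBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkBytes));

        for (int offset = 0; offset < pcm.Length; offset += chunkBytes)
        {
            int length = Math.Min(chunkBytes, pcm.Length - offset);
            yield return pcm.Slice(offset, length);
        }
    }
}
```

With these assumptions, 100 ms of 16 kHz mono audio is 1600 frames, or 3200 bytes. Each chunk returned by Split could then be passed to AppendAsync() after converting it to whatever parameter type that method actually accepts.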

Design highlights

  • Internal push queue — Bounded Channel<T> serializes audio pushes from any thread (safe for mic callbacks) with backpressure
  • Retry policy — Transient native errors retried with exponential backoff (3 attempts); permanent errors terminate the session
  • Settings freeze — Audio format settings are snapshot-copied at StartAsync() and immutable during the session
  • Cancellation-safe stop — StopAsync always calls native stop even if cancelled, preventing native session leaks
  • Dedicated session CTS — Push loop uses its own CancellationTokenSource, decoupled from the caller's token
  • Routes through existing exports — StartAudioStream and StopAudioStream route through execute_command; PushAudioData routes through execute_command_with_binary, so no new native entry points are required
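
The push queue, retry policy, and dedicated session CTS described above can be sketched together as follows. This is an illustration of the pattern only, not the SDK's implementation: the class name PushQueueSketch, the queue capacity, the initial delay, and the choice of TimeoutException as the "transient" error are all assumptions, and the native push call is represented by a delegate.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Illustrative sketch; names and values are hypothetical, not the SDK's code.
public sealed class PushQueueSketch
{
    private readonly Channel<byte[]> _queue;
    private readonly Func<byte[], Task> _pushToNative; // stand-in for the native push call
    private readonly CancellationTokenSource _sessionCts = new CancellationTokenSource();
    private readonly Task _pushLoop;

    public PushQueueSketch(Func<byte[], Task> pushToNative, int capacity = 32)
    {
        _pushToNative = pushToNative;
        _queue = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,                    // a single push loop drains the queue
            FullMode = BoundedChannelFullMode.Wait  // writers wait when full (backpressure)
        });
        _pushLoop = Task.Run(PushLoopAsync);
    }

    // Safe to call from any thread; completes once the chunk is queued.
    public ValueTask AppendAsync(byte[] chunk, CancellationToken ct = default)
        => _queue.Writer.WriteAsync(chunk, ct);

    // Stops accepting audio and waits until queued chunks have been pushed.
    public async Task CompleteAsync()
    {
        _queue.Writer.TryComplete();
        await _pushLoop;
    }

    // Uses the session's own token rather than a caller-supplied one.
    private async Task PushLoopAsync()
    {
        await foreach (var chunk in _queue.Reader.ReadAllAsync(_sessionCts.Token))
        {
            await PushWithRetryAsync(chunk);
        }
    }

    private async Task PushWithRetryAsync(byte[] chunk)
    {
        const int maxAttempts = 3;
        var delay = TimeSpan.FromMilliseconds(50); // hypothetical initial delay

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await _pushToNative(chunk);
                return;
            }
            // Assumption: TimeoutException models a transient error. Any other
            // exception, or a third failure, propagates and ends the push loop.
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                await Task.Delay(delay, _sessionCts.Token);
                delay = TimeSpan.FromTicks(delay.Ticks * 2); // exponential backoff
            }
        }
    }
}
```

Because the channel is bounded with FullMode set to Wait, a fast producer is slowed down when the native side falls behind instead of growing memory without limit.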

Core integration (neutron-server)

The Core side (AudioStreamingSession.cs) uses StreamingProcessor + Generator + Tokenizer + TokenizerStream from onnxruntime-genai to perform real-time RNNT decoding. The native commands (audio_stream_start/push/stop) are handled as cases in NativeInterop.ExecuteCommandManaged / ExecuteCommandWithBinaryManaged.

Verified working

  • ✅ SDK build succeeds (0 errors)
  • ✅ GenAI StreamingProcessor pipeline verified with WAV file (correct transcript)
  • ✅ Core TranscribeChunk byte[] PCM path matches reference float[] path exactly
  • ✅ Full E2E simulation: SDK Channel + JSON serialization + session management (32 partial + 1 final result)
  • ✅ Live microphone test: 67s real-time transcription through SDK → Core → GenAI
  • ✅ Full SDK → Core → GenAI E2E with locally built Core DLL and GenAI NuGet 0.13.0-dev
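
For readers comparing the byte[] PCM path with the float[] reference path mentioned above, the conversion between the two is typically a per-sample rescale. The sketch below assumes 16-bit signed little-endian PCM and normalization by 32768; neither detail is confirmed by this PR, and PcmConvert is a hypothetical helper rather than Core code.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration; not taken from Core or the SDK.
public static class PcmConvert
{
    // Converts 16-bit signed little-endian PCM bytes to floats in [-1, 1).
    // Assumption: this sample format and scale factor match the model's input.
    public static float[] Pcm16ToFloat(ReadOnlySpan<byte> pcm)
    {
        if (pcm.Length % 2 != 0)
            throw new ArgumentException("PCM16 data must have an even byte count.", nameof(pcm));

        var samples = new float[pcm.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            short value = BinaryPrimitives.ReadInt16LittleEndian(pcm.Slice(i * 2, 2));
            samples[i] = value / 32768f;
        }
        return samples;
    }
}
```

A check like the one in the list above could then compare the output of this conversion with the float[] samples fed to the reference path.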


ruiren_microsoft added 2 commits March 10, 2026 18:09
rui-ren changed the title from "Add real-time audio streaming support (Microphone ASR) - c#" to "Add live audio transcription streaming support to Foundry Local C# SDK" on Mar 13, 2026