NPC Conversation extension by UoA eResearch adds Speech-to-Text (STT), Large Language Model (LLM) inference, and Text-to-Speech (TTS) so players can hold real-time voice conversations with MetaHuman NPCs.
For details on NVIDIA ACE, please visit the main ACE documentation website.
The original Kairos Sample Project documentation is available in the Kairos section.
- Feature Overview
- Architecture
- Plugin: NPCConversation
- Quick-Start Setup
- Project Settings Reference
- Blueprint Usage
- Provider Reference
- Full Conversation Pipeline Example
- Platform Notes
- Troubleshooting
| Capability | Primary provider | Fallback |
|---|---|---|
| Speech-to-Text (STT) | Whisper-compatible REST API (OpenAI, local Faster-Whisper, …) | — |
| LLM inference | OpenAI-compatible chat completions (Cerebras, OpenAI, Ollama, …) | — |
| Text-to-Speech (TTS) | ElevenLabs REST API | Platform system TTS (Windows SAPI, macOS say, Linux espeak-ng) |
All three features are exposed as async Blueprint nodes that fire success/failure delegate pins, making it straightforward to wire them together inside a Blueprint graph without writing any C++.
Player speaks
│
▼
[STT] UNPCSTTAsync::AsyncRecordAndTranscribe
│ records microphone → WAV → Whisper API
│ OnSuccess → transcribed text
▼
[LLM] UNPCLLMAsync::AsyncSendToLLM
│ chat completions (Cerebras / OpenAI / Ollama / …)
│ OnSuccess → response text
▼
[TTS] UNPCTTSAsync::AsyncSpeakText
│ ElevenLabs API (or system TTS fallback)
│ OnSuccess → WavFilePath
▼
[ACE] Animate Character From Wav File Async (NV_ACE_Reference plugin)
│ Audio2Face-3D drives MetaHuman lip-sync + facial animation
▼
NPC speaks with synced facial animation
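The same pipeline can also be driven from C++. The sketch below is hypothetical: it assumes the three async actions follow UE's standard `UBlueprintAsyncActionBase` pattern (a static factory per node, `OnSuccess`/`OnFailure` dynamic multicast delegates matching the Blueprint pins, and an explicit `Activate()` call) and that `AMyNPC` is your own actor class. Check the headers under `Plugins/NPCConversation/Source` for the real signatures before copying this.

```cpp
// Hypothetical C++ chaining of the three async nodes. Signatures are assumed
// from the Blueprint pins (TranscribedText / ResponseText / WavFilePath + bSuccess);
// handler functions must be UFUNCTIONs on AMyNPC to bind with AddDynamic.
void AMyNPC::BeginConversationTurn()
{
    UNPCSTTAsync* STT = UNPCSTTAsync::AsyncRecordAndTranscribe(this, /*RecordingDurationSeconds=*/5.0f);
    STT->OnSuccess.AddDynamic(this, &AMyNPC::HandleTranscribed);
    STT->Activate();
}

void AMyNPC::HandleTranscribed(const FString& TranscribedText, bool bSuccess)
{
    UNPCLLMAsync* LLM = UNPCLLMAsync::AsyncSendToLLM(this, TranscribedText, /*SystemPromptOverride=*/TEXT(""));
    LLM->OnSuccess.AddDynamic(this, &AMyNPC::HandleLLMResponse);
    LLM->Activate();
}

void AMyNPC::HandleLLMResponse(const FString& ResponseText, bool bSuccess)
{
    UNPCTTSAsync* TTS = UNPCTTSAsync::AsyncSpeakText(this, ResponseText);
    TTS->OnSuccess.AddDynamic(this, &AMyNPC::HandleWavReady);
    TTS->Activate();
}

void AMyNPC::HandleWavReady(const FString& WavFilePath, bool bSuccess)
{
    // Hand WavFilePath to the ACE "Animate Character From Wav File Async" node,
    // then delete the temporary file once the animation has been sent.
}
```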
Located at Plugins/NPCConversation/. The plugin is a single Runtime module that adds three async
Blueprint nodes and a Developer Settings panel. It has no dependency on NV_ACE_Reference so
it can be used in other Unreal projects as-is.
| Node | Category | Description |
|---|---|---|
| Record and Transcribe (STT Async) | NPC Conversation \| STT | Capture microphone for N seconds → Whisper API → text |
| Send Message to LLM (Async) | NPC Conversation \| LLM | text → chat completions → response text |
| Speak Text (TTS Async) | NPC Conversation \| TTS | text → WAV file (ElevenLabs or system fallback) |
The plugin is already enabled in `KairosSample.uproject`. If you copy it to another project, add `{ "Name": "NPCConversation", "Enabled": true }` to the Plugins array in your `.uproject` file.
| Provider | URL | Notes |
|---|---|---|
| Cerebras (LLM) | https://cloud.cerebras.ai | Free tier available. Copy the API key from your dashboard. |
| OpenAI (LLM / STT) | https://platform.openai.com | Covers both gpt-* models and whisper-1. |
| ElevenLabs (TTS) | https://elevenlabs.io | Free tier includes 10 000 characters/month. |
Open Edit → Project Settings → Plugins → NPC Conversation and fill in:
- LLM API Key — Cerebras or OpenAI key
- ElevenLabs API Key — ElevenLabs key
- Whisper API Key — OpenAI key (or leave empty for local servers)
Security note — API keys are stored in `Config/DefaultEngine.ini`. Do not commit that file to a public repository when it contains real keys. Consider using environment variable substitution or a secrets-management solution for production builds.
On Windows, ensure the app has Microphone permission under Settings → Privacy → Microphone.
On macOS add the NSMicrophoneUsageDescription key to Info.plist.
See Blueprint Usage below.
All settings live under Edit → Project Settings → Plugins → NPC Conversation.
| Setting | Default | Description |
|---|---|---|
| LLM Base URL | `https://api.cerebras.ai/v1` | Base URL of any OpenAI-compatible chat completions API. |
| LLM API Key | (empty) | Bearer token sent in the Authorization header. Omit for local servers that don't require auth. |
| LLM Model | `llama-3.3-70b` | Model name forwarded in the request body. |
| Default System Prompt | "You are a helpful NPC …" | Injected as the system message before each turn. Override per call in Blueprints. |
| Max Tokens | 200 | Maximum tokens the model may generate per reply. |
| LLM Timeout (s) | 30 | HTTP timeout for LLM calls. |
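For reference, these settings map onto a standard OpenAI-style chat completions request. The sketch below shows an equivalent request built with Unreal's HTTP and Json modules; the function name and hard-coded values are illustrative (they mirror the defaults above), and the plugin's own implementation may differ in detail.

```cpp
// Minimal sketch of an OpenAI-compatible /chat/completions call built from the
// settings above. Requires the "HTTP" and "Json" modules in your Build.cs.
#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"
#include "Interfaces/IHttpResponse.h"
#include "Dom/JsonObject.h"
#include "Dom/JsonValue.h"
#include "Serialization/JsonWriter.h"
#include "Serialization/JsonSerializer.h"

void SendChatCompletion(const FString& UserMessage)
{
    // Values normally read from the NPC Conversation project settings.
    const FString BaseUrl = TEXT("https://api.cerebras.ai/v1");
    const FString ApiKey  = TEXT("csk-...");          // your key
    const FString Model   = TEXT("llama-3.3-70b");

    // Build the JSON body: model, max_tokens, and system + user messages.
    TSharedPtr<FJsonObject> Body = MakeShared<FJsonObject>();
    Body->SetStringField(TEXT("model"), Model);
    Body->SetNumberField(TEXT("max_tokens"), 200);

    TArray<TSharedPtr<FJsonValue>> Messages;
    TSharedPtr<FJsonObject> SystemMsg = MakeShared<FJsonObject>();
    SystemMsg->SetStringField(TEXT("role"), TEXT("system"));
    SystemMsg->SetStringField(TEXT("content"), TEXT("You are a helpful NPC."));
    Messages.Add(MakeShared<FJsonValueObject>(SystemMsg));

    TSharedPtr<FJsonObject> UserMsg = MakeShared<FJsonObject>();
    UserMsg->SetStringField(TEXT("role"), TEXT("user"));
    UserMsg->SetStringField(TEXT("content"), UserMessage);
    Messages.Add(MakeShared<FJsonValueObject>(UserMsg));
    Body->SetArrayField(TEXT("messages"), Messages);

    FString BodyString;
    TSharedRef<TJsonWriter<>> Writer = TJsonWriterFactory<>::Create(&BodyString);
    FJsonSerializer::Serialize(Body.ToSharedRef(), Writer);

    // Fire the request with a bearer token and a 30 s timeout.
    TSharedRef<IHttpRequest, ESPMode::ThreadSafe> Request = FHttpModule::Get().CreateRequest();
    Request->SetURL(BaseUrl + TEXT("/chat/completions"));
    Request->SetVerb(TEXT("POST"));
    Request->SetHeader(TEXT("Content-Type"), TEXT("application/json"));
    Request->SetHeader(TEXT("Authorization"), TEXT("Bearer ") + ApiKey);
    Request->SetTimeout(30.0f);
    Request->SetContentAsString(BodyString);
    Request->OnProcessRequestComplete().BindLambda(
        [](FHttpRequestPtr, FHttpResponsePtr Response, bool bOk)
        {
            if (bOk && Response.IsValid())
            {
                // The reply text is at choices[0].message.content.
                UE_LOG(LogTemp, Log, TEXT("%s"), *Response->GetContentAsString());
            }
        });
    Request->ProcessRequest();
}
```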
| Setting | Default | Description |
|---|---|---|
| TTS Provider | ElevenLabs (with System fallback) | Use System TTS only to bypass ElevenLabs entirely. |
| ElevenLabs API Key | (empty) | If empty, the plugin skips ElevenLabs and uses system TTS. |
| ElevenLabs Voice ID | `21m00Tcm4TlvDq8ikWAM` (Rachel) | Voice ID from the ElevenLabs voice library. |
| ElevenLabs Model ID | `eleven_turbo_v2_5` | ElevenLabs model. `eleven_turbo_v2_5` is fast; `eleven_multilingual_v2` supports more languages. |
| TTS Timeout (s) | 30 | HTTP timeout for ElevenLabs calls. |
| Setting | Default | Description |
|---|---|---|
| STT Provider | Whisper API (with System fallback) | Whisper API: tries the Whisper endpoint first, falls back to system STT on failure. System STT only: skips the API and goes straight to system STT. |
| Whisper API Base URL | `https://api.openai.com/v1` | Base URL for a Whisper-compatible server. Leave empty to skip directly to system STT. |
| Whisper API Key | (empty) | Bearer token. Leave empty for local servers that don't require auth. |
| Whisper Model | `whisper-1` | Model forwarded in the multipart request. |
| Default Recording Duration (s) | 5.0 | How long to record when no duration is passed to the Blueprint node. |
| STT Timeout (s) | 60 | HTTP timeout for transcription calls (allow extra time for large audio). |
[Record and Transcribe (STT Async)]
RecordingDurationSeconds: 5.0
├── OnSuccess → TranscribedText (FString), bSuccess
└── OnFailure → TranscribedText (""), bSuccess (false)
RecordingDurationSeconds = 0 uses the project-default from settings.
[Send Message to LLM (Async)]
UserMessage: <TranscribedText from STT>
SystemPromptOverride: "" ← leave empty to use the project default
├── OnSuccess → ResponseText (FString), bSuccess
└── OnFailure → ResponseText (""), bSuccess (false)
[Speak Text (TTS Async)]
Text: <ResponseText from LLM>
├── OnSuccess → WavFilePath (FString), bSuccess
└── OnFailure → WavFilePath (""), bSuccess (false)
WavFilePath is a temporary file on disk. Pass it directly to the ACE plugin's
Animate Character From Wav File Async node. You are responsible for deleting the file when
done (use the File Manager utility Blueprint or Delete File from a helper library).
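If you prefer to do the cleanup from C++, the standard engine file manager call (also mentioned under Troubleshooting) is enough; the helper name below is illustrative:

```cpp
#include "HAL/FileManager.h"

// Delete the temporary WAV once the ACE animation node has consumed it.
void CleanupTempWav(const FString& WavFilePath)
{
    IFileManager::Get().Delete(*WavFilePath, /*RequireExists=*/false, /*EvenReadOnly=*/true);
}
```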
[Animate Character From Wav File Async] ← NV_ACE_Reference plugin
WorldContextObject: self
Character: <your MetaHuman actor reference>
PathToWav: <WavFilePath from TTS>
├── AudioSendCompleted → bSuccess
Below is the minimal Blueprint graph for a single conversation turn.
[Input Action "Talk"]
│
▼
[Record and Transcribe (STT Async)] Duration=5s
│ OnSuccess
▼
[Send Message to LLM (Async)] UserMessage=TranscribedText
│ OnSuccess
▼
[Speak Text (TTS Async)] Text=ResponseText
│ OnSuccess
▼
[Animate Character From Wav File Async] PathToWav=WavFilePath Character=MetaHumanRef
│ AudioSendCompleted
▼
[Delete File] FilePath=WavFilePath ← clean up temp file
For a continuous conversation (the NPC remembers context) you need to maintain a conversation
history array and pass it as additional system/assistant messages. The SystemPromptOverride
parameter can carry a JSON-serialised history string, or you can extend the plugin with a stateful
conversation manager component.
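A lightweight way to do this without modifying the plugin is to fold recent turns into the system prompt. The sketch below is a hypothetical helper (`FConversationTurn` and `BuildSystemPromptWithHistory` are not part of the plugin) that serialises a history array to JSON so it can be passed through SystemPromptOverride.

```cpp
// Hypothetical helper: pack the previous turns into the system prompt so the
// stateless LLM node still "remembers" the conversation. Not part of the plugin.
#include "Dom/JsonObject.h"
#include "Dom/JsonValue.h"
#include "Serialization/JsonWriter.h"
#include "Serialization/JsonSerializer.h"

struct FConversationTurn
{
    FString Role;     // "user" or "assistant"
    FString Content;
};

FString BuildSystemPromptWithHistory(const FString& BasePrompt, const TArray<FConversationTurn>& History)
{
    // Convert each turn into a {role, content} JSON object.
    TArray<TSharedPtr<FJsonValue>> Turns;
    for (const FConversationTurn& Turn : History)
    {
        TSharedPtr<FJsonObject> Obj = MakeShared<FJsonObject>();
        Obj->SetStringField(TEXT("role"), Turn.Role);
        Obj->SetStringField(TEXT("content"), Turn.Content);
        Turns.Add(MakeShared<FJsonValueObject>(Obj));
    }

    FString HistoryJson;
    TSharedRef<TJsonWriter<>> Writer = TJsonWriterFactory<>::Create(&HistoryJson);
    FJsonSerializer::Serialize(Turns, Writer);

    // The LLM sees the base prompt plus a JSON dump of the previous turns.
    return BasePrompt + TEXT("\n\nConversation so far (JSON): ") + HistoryJson;
}
```

Call it before each Send Message to LLM (Async) invocation, and append the new user and assistant turns to the array when OnSuccess fires.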
Any OpenAI-compatible /chat/completions endpoint works. Set LLM Base URL accordingly.
| Provider | Base URL | Recommended model |
|---|---|---|
| Cerebras | `https://api.cerebras.ai/v1` | `llama-3.3-70b` (fast, free tier) |
| OpenAI | `https://api.openai.com/v1` | `gpt-4o-mini` |
| Ollama (local) | `http://localhost:11434/v1` | any pulled model, e.g. `llama3` |
| LM Studio (local) | `http://localhost:1234/v1` | any loaded model |
| Groq | `https://api.groq.com/openai/v1` | `llama-3.3-70b-versatile` |
- Endpoint: `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?output_format=pcm_22050`
- Auth: `xi-api-key` header
- Output: raw int16 PCM at 22 050 Hz, mono, wrapped in a WAV container by the plugin
- Voice library: https://elevenlabs.io/voice-library
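For reference, a raw request equivalent to what the plugin sends can be sketched with Unreal's HTTP module. The voice ID and model ID below are the Project Settings defaults, the JSON body fields (`text`, `model_id`) follow ElevenLabs' public API, and the function name is illustrative; the plugin's internals may differ.

```cpp
// Sketch of the ElevenLabs TTS call described above: POST the text, receive raw
// int16 mono PCM at 22 050 Hz, which the plugin then wraps in a WAV header.
#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"
#include "Interfaces/IHttpResponse.h"

void RequestElevenLabsTTS(const FString& Text)
{
    const FString VoiceId = TEXT("21m00Tcm4TlvDq8ikWAM");   // Rachel (default)
    const FString Url = FString::Printf(
        TEXT("https://api.elevenlabs.io/v1/text-to-speech/%s?output_format=pcm_22050"), *VoiceId);

    TSharedRef<IHttpRequest, ESPMode::ThreadSafe> Request = FHttpModule::Get().CreateRequest();
    Request->SetURL(Url);
    Request->SetVerb(TEXT("POST"));
    Request->SetHeader(TEXT("xi-api-key"), TEXT("<your ElevenLabs key>"));
    Request->SetHeader(TEXT("Content-Type"), TEXT("application/json"));
    // NOTE: for real use, JSON-escape Text before embedding it in the body.
    Request->SetContentAsString(
        FString::Printf(TEXT("{\"text\":\"%s\",\"model_id\":\"eleven_turbo_v2_5\"}"), *Text));
    Request->OnProcessRequestComplete().BindLambda(
        [](FHttpRequestPtr, FHttpResponsePtr Response, bool bOk)
        {
            if (bOk && Response.IsValid() && Response->GetResponseCode() == 200)
            {
                // Response->GetContent() is the raw PCM payload; prepend a WAV
                // header before handing it to playback or to the ACE pipeline.
                const TArray<uint8>& Pcm = Response->GetContent();
                UE_LOG(LogTemp, Log, TEXT("Received %d bytes of PCM"), Pcm.Num());
            }
        });
    Request->ProcessRequest();
}
```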
When ElevenLabs is unavailable (no key, HTTP error), the plugin falls back to the platform's built-in speech synthesizer.
| Platform | Tool used | Notes |
|---|---|---|
| Windows | PowerShell + `System.Speech.Synthesis` | No extra install needed. ~2–3 s startup latency. |
| macOS | `/usr/bin/say` + `afconvert` | Built-in. Supports many voices via System Preferences → Accessibility → Spoken Content. |
| Linux | `espeak-ng` | Install with `sudo apt install espeak-ng`. Uses `-f` (file input) and `-w` (WAV output) flags. |
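On Linux the fallback boils down to shelling out to espeak-ng with those flags. The snippet below is a sketch of a comparable invocation from engine code; it mirrors, rather than reproduces, the plugin's internal call, and assumes espeak-ng is on PATH.

```cpp
// Sketch: synthesise Text to a WAV file with espeak-ng using the -f/-w flags
// mentioned above. Assumes espeak-ng is installed and on PATH.
#include "HAL/PlatformProcess.h"
#include "Misc/FileHelper.h"
#include "Misc/Paths.h"

FString SpeakWithEspeak(const FString& Text)
{
    const FString TempDir  = FPlatformProcess::UserTempDir();
    const FString TextFile = FPaths::CreateTempFilename(*TempDir, TEXT("npc_tts_"), TEXT(".txt"));
    const FString WavFile  = FPaths::CreateTempFilename(*TempDir, TEXT("npc_tts_"), TEXT(".wav"));

    // espeak-ng reads the text from a file (-f) and writes a WAV (-w).
    FFileHelper::SaveStringToFile(Text, *TextFile);

    int32 ReturnCode = -1;
    FString StdOut, StdErr;
    const FString Args = FString::Printf(TEXT("-f \"%s\" -w \"%s\""), *TextFile, *WavFile);
    FPlatformProcess::ExecProcess(TEXT("espeak-ng"), *Args, &ReturnCode, &StdOut, &StdErr);

    return ReturnCode == 0 ? WavFile : FString();
}
```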
- Endpoint: `POST {base_url}/audio/transcriptions` (multipart/form-data)
- Field `file`: WAV audio, 16-bit PCM, mono — auto-generated by the plugin from the microphone at the device's native sample rate
- Field `model`: as configured in Project Settings
- Response: `{ "text": "transcribed text" }`
Compatible servers:
| Option | URL | Notes |
|---|---|---|
| OpenAI | `https://api.openai.com/v1` | Hosted, `whisper-1` model |
| Faster-Whisper server | `http://localhost:9000` | Local GPU inference, very fast |
| Whisper.cpp server | `http://localhost:8080` | CPU-friendly, OpenAI-compatible API |
| Groq | `https://api.groq.com/openai/v1` | Free tier, `whisper-large-v3-turbo` |
When the Whisper API is unavailable, times out, or returns an error, the plugin automatically falls back to platform speech recognition. The same path is used directly when STT Provider is set to System STT only.
| Platform | Tool used | Notes |
|---|---|---|
| Windows | PowerShell + `System.Speech.Recognition` | Built-in (requires .NET Framework ≥ 3.5). Accuracy is lower than Whisper for open-domain speech. |
| macOS | — | Not available. OnFailure fires with a log message suggesting a local Whisper server. |
| Linux | — | Not available. OnFailure fires with a log message suggesting a local Whisper server. |
- Windows (Win64) — Fully tested path. Microphone capture uses the Windows Audio Session API (WASAPI) via UE's `AudioCapture` module.
- macOS — Microphone capture works via CoreAudio. System TTS uses `say` / `afconvert` (built-in). Ensure `NSMicrophoneUsageDescription` is set in `Info.plist`.
- Linux — Microphone capture works via PulseAudio/ALSA. System TTS requires `espeak-ng`.
- Other platforms (Android, iOS, Console) — LLM and TTS (ElevenLabs) work on any platform that has internet access. Microphone capture (STT) and system TTS fallback depend on platform support.
- Check microphone permissions (Windows Privacy settings / macOS System Settings).
- Make sure no other application has an exclusive lock on the microphone.
- Try a different microphone device.
- Double-check the LLM API Key in Project Settings.
- For Cerebras, ensure the key starts with `csk-`. For OpenAI, it starts with `sk-`.
- Verify the ElevenLabs API Key in Project Settings.
- Check your ElevenLabs quota at https://elevenlabs.io/subscription.
- Reduce Default Recording Duration in Project Settings (OpenAI limit is 25 MB / ~16 min).
- Verify PowerShell is available: run `powershell -Command "Write-Host OK"` in a terminal.
- Check that the `System.Speech` assembly is present (it ships with .NET Framework ≥ 3.5).
- The plugin writes temp files to the OS temp directory (`%TEMP%` / `/tmp`).
- Call Delete File (or use `IFileManager::Get().Delete()`) on `WavFilePath` after the ACE animation node completes.