NVIDIA ACE — Kairos Sample Project


The NPC Conversation extension by UoA eResearch adds Speech-to-Text (STT), Large Language Model (LLM) inference, and Text-to-Speech (TTS) so players can hold real-time voice conversations with MetaHuman NPCs.

For details on NVIDIA ACE, please visit the main ACE documentation website.

The original Kairos Sample Project documentation is available in the Kairos section of the NVIDIA ACE documentation.


Table of Contents

  1. Feature Overview
  2. Architecture
  3. Plugin: NPCConversation
  4. Quick-Start Setup
  5. Project Settings Reference
  6. Blueprint Usage
  7. Provider Reference
  8. Full Conversation Pipeline Example
  9. Platform Notes
  10. Troubleshooting

Feature Overview

Capability | Primary provider | Fallback
Speech-to-Text (STT) | Whisper-compatible REST API (OpenAI, local Faster-Whisper, …) | Platform system STT (Windows only)
LLM inference | OpenAI-compatible chat completions (Cerebras, OpenAI, Ollama, …) | (none)
Text-to-Speech (TTS) | ElevenLabs REST API | Platform system TTS (Windows SAPI, macOS say, Linux espeak-ng)

All three features are exposed as async Blueprint nodes that fire success/failure delegate pins, making it straightforward to wire them together inside a Blueprint graph without writing any C++.


Architecture

Player speaks
     │
     ▼
[STT] UNPCSTTAsync::AsyncRecordAndTranscribe
     │  records microphone → WAV → Whisper API
     │  OnSuccess → transcribed text
     ▼
[LLM] UNPCLLMAsync::AsyncSendToLLM
     │  chat completions (Cerebras / OpenAI / Ollama / …)
     │  OnSuccess → response text
     ▼
[TTS] UNPCTTSAsync::AsyncSpeakText
     │  ElevenLabs API (or system TTS fallback)
     │  OnSuccess → WavFilePath
     ▼
[ACE] Animate Character From Wav File Async  (NV_ACE_Reference plugin)
     │  Audio2Face-3D drives MetaHuman lip-sync + facial animation
     ▼
NPC speaks with synced facial animation

Plugin: NPCConversation

Located at Plugins/NPCConversation/. The plugin is a single Runtime module that adds three async Blueprint nodes and a Developer Settings panel. It has no dependency on NV_ACE_Reference so it can be used in other Unreal projects as-is.

Blueprint nodes added

  • Record and Transcribe (STT Async), category NPC Conversation | STT: capture microphone for N seconds → Whisper API → text
  • Send Message to LLM (Async), category NPC Conversation | LLM: text → chat completions → response text
  • Speak Text (TTS Async), category NPC Conversation | TTS: text → WAV file (ElevenLabs or system fallback)

Quick-Start Setup

1 — Enable the plugin

The plugin is already enabled in KairosSample.uproject. If you copy it to another project, add:

{ "Name": "NPCConversation", "Enabled": true }

to the Plugins array in your .uproject file.

2 — Obtain API keys

Provider | URL | Notes
Cerebras (LLM) | https://cloud.cerebras.ai | Free tier available. Copy the API key from your dashboard.
OpenAI (LLM / STT) | https://platform.openai.com | Covers both gpt-* models and whisper-1.
ElevenLabs (TTS) | https://elevenlabs.io | Free tier includes 10 000 characters/month.

3 — Enter API keys in Project Settings

Open Edit → Project Settings → Plugins → NPC Conversation and fill in:

  • LLM API Key — Cerebras or OpenAI key
  • ElevenLabs API Key — ElevenLabs key
  • Whisper API Key — OpenAI key (or leave empty for local servers)

Security note — API keys are stored in Config/DefaultEngine.ini. Do not commit that file to a public repository when it contains real keys. Consider using environment variable substitution or a secrets-management solution for production builds.
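
For example, the key can be read from an environment variable at startup instead of living in the .ini file. The sketch below is illustrative only: NPC_LLM_API_KEY is a hypothetical variable name, not something the plugin reads by itself, and the result still has to be assigned to the LLM API Key setting (or passed to your own request code).

#include "CoreMinimal.h"
#include "HAL/PlatformMisc.h"

// Read the LLM key from the environment so it never has to be committed in
// Config/DefaultEngine.ini. Returns an empty string when the variable is unset.
FString GetLLMApiKeyFromEnvironment()
{
    return FPlatformMisc::GetEnvironmentVariable(TEXT("NPC_LLM_API_KEY"));
}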

4 — Allow microphone access

On Windows, ensure the app has Microphone permission under Settings → Privacy → Microphone.

On macOS, add the NSMicrophoneUsageDescription key to Info.plist.

5 — Connect the Blueprint nodes

See Blueprint Usage below.


Project Settings Reference

All settings live under Edit → Project Settings → Plugins → NPC Conversation.

LLM

Setting | Default | Description
LLM Base URL | https://api.cerebras.ai/v1 | Base URL of any OpenAI-compatible chat completions API.
LLM API Key | (empty) | Bearer token sent in the Authorization header. Omit for local servers that don't require auth.
LLM Model | llama-3.3-70b | Model name forwarded in the request body.
Default System Prompt | "You are a helpful NPC …" | Injected as the system message before each turn. Override per-call in Blueprints.
Max Tokens | 200 | Maximum tokens the model may generate per reply.
LLM Timeout (s) | 30 | HTTP timeout for LLM calls.

TTS

Setting | Default | Description
TTS Provider | ElevenLabs (with System fallback) | Use System TTS only to bypass ElevenLabs entirely.
ElevenLabs API Key | (empty) | If empty, the plugin skips ElevenLabs and uses system TTS.
ElevenLabs Voice ID | 21m00Tcm4TlvDq8ikWAM (Rachel) | Voice ID from the ElevenLabs voice library.
ElevenLabs Model ID | eleven_turbo_v2_5 | ElevenLabs model. eleven_turbo_v2_5 is fast; eleven_multilingual_v2 supports more languages.
TTS Timeout (s) | 30 | HTTP timeout for ElevenLabs calls.

STT

Setting | Default | Description
STT Provider | Whisper API (with System fallback) | Whisper API: tries the Whisper endpoint first, falls back to system STT on failure. System STT only: skips the API and goes straight to system STT.
Whisper API Base URL | https://api.openai.com/v1 | Base URL for a Whisper-compatible server. Leave empty to skip directly to system STT.
Whisper API Key | (empty) | Bearer token. Leave empty for local servers that don't require auth.
Whisper Model | whisper-1 | Model forwarded in the multipart request.
Default Recording Duration (s) | 5.0 | How long to record when no duration is passed to the Blueprint node.
STT Timeout (s) | 60 | HTTP timeout for transcription calls (allow extra time for large audio).

Blueprint Usage

STT node — Record and Transcribe

[Record and Transcribe (STT Async)]
  RecordingDurationSeconds: 5.0
  ├── OnSuccess → TranscribedText (FString), bSuccess
  └── OnFailure → TranscribedText (""), bSuccess (false)

RecordingDurationSeconds = 0 uses the project-default from settings.

LLM node — Send Message to LLM

[Send Message to LLM (Async)]
  UserMessage: <TranscribedText from STT>
  SystemPromptOverride: ""          ← leave empty to use the project default
  ├── OnSuccess → ResponseText (FString), bSuccess
  └── OnFailure → ResponseText (""), bSuccess (false)

TTS node — Speak Text

[Speak Text (TTS Async)]
  Text: <ResponseText from LLM>
  ├── OnSuccess → WavFilePath (FString), bSuccess
  └── OnFailure → WavFilePath (""), bSuccess (false)

WavFilePath is a temporary file on disk. Pass it directly to the ACE plugin's Animate Character From Wav File Async node. You are responsible for deleting the file when done (use the File Manager utility Blueprint or Delete File from a helper library).
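
If you prefer to clean up from C++ rather than a Blueprint helper, a minimal sketch (assuming WavFilePath has been passed back into C++ as an FString) is:

#include "HAL/FileManager.h"

void DeleteTempWav(const FString& WavFilePath)
{
    // Delete the temporary WAV once the ACE animation node has finished with it.
    // RequireExists = false so a missing file is not treated as an error.
    IFileManager::Get().Delete(*WavFilePath, /*RequireExists*/ false, /*EvenReadOnly*/ false, /*Quiet*/ true);
}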

ACE integration — drive MetaHuman lip-sync

[Animate Character From Wav File Async]  ← NV_ACE_Reference plugin
  WorldContextObject: self
  Character: <your MetaHuman actor reference>
  PathToWav: <WavFilePath from TTS>
  ├── AudioSendCompleted → bSuccess

Full Conversation Pipeline Example

Below is the minimal Blueprint graph for a single conversation turn.

[Input Action "Talk"]
    │
    ▼
[Record and Transcribe (STT Async)]  Duration=5s
    │ OnSuccess
    ▼
[Send Message to LLM (Async)]  UserMessage=TranscribedText
    │ OnSuccess
    ▼
[Speak Text (TTS Async)]  Text=ResponseText
    │ OnSuccess
    ▼
[Animate Character From Wav File Async]  PathToWav=WavFilePath  Character=MetaHumanRef
    │ AudioSendCompleted
    ▼
[Delete File]  FilePath=WavFilePath   ← clean up temp file

For a continuous conversation (the NPC remembers context) you need to maintain a conversation history array and pass it as additional system/assistant messages. The SystemPromptOverride parameter can carry a JSON-serialised history string, or you can extend the plugin with a stateful conversation manager component.
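
One possible shape for such a history, sketched in C++ (the struct, field, and function names below are illustrative, not part of the plugin):

#include "CoreMinimal.h"

// Illustrative only: a minimal per-NPC message history in the OpenAI chat format.
struct FNPCChatMessage
{
    FString Role;     // "system", "user" or "assistant"
    FString Content;
};

// Kept somewhere persistent, e.g. on a conversation manager component. After each
// turn, append the player's transcribed text and the LLM's reply, then serialise
// the array to JSON and pass it along with (or inside) SystemPromptOverride on the
// next Send Message to LLM call.
void AppendTurn(TArray<FNPCChatMessage>& History, const FString& UserText, const FString& AssistantText)
{
    History.Add({ TEXT("user"), UserText });
    History.Add({ TEXT("assistant"), AssistantText });
}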


Provider Reference

LLM Providers

Any OpenAI-compatible /chat/completions endpoint works. Set LLM Base URL accordingly.

Provider | Base URL | Recommended model
Cerebras | https://api.cerebras.ai/v1 | llama-3.3-70b (fast, free tier)
OpenAI | https://api.openai.com/v1 | gpt-4o-mini
Ollama (local) | http://localhost:11434/v1 | any pulled model, e.g. llama3
LM Studio (local) | http://localhost:1234/v1 | any loaded model
Groq | https://api.groq.com/openai/v1 | llama-3.3-70b-versatile
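
For reference, a call against such an endpoint from Unreal C++ looks roughly like the sketch below. It uses the standard OpenAI chat-completions request shape that the LLM settings map onto; it is not the plugin's own code, and a real implementation should JSON-escape the prompt and user text properly.

#include "CoreMinimal.h"
#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"
#include "Interfaces/IHttpResponse.h"

void SendChatCompletion(const FString& BaseUrl, const FString& ApiKey,
                        const FString& SystemPrompt, const FString& UserMessage)
{
    // Body in the standard chat-completions shape; "model" and "max_tokens"
    // correspond to the LLM Model and Max Tokens settings. For brevity the
    // strings are interpolated directly; escape them in production code.
    const FString Body = FString::Printf(TEXT(
        "{\"model\":\"llama-3.3-70b\",\"max_tokens\":200,"
        "\"messages\":[{\"role\":\"system\",\"content\":\"%s\"},"
        "{\"role\":\"user\",\"content\":\"%s\"}]}"),
        *SystemPrompt, *UserMessage);

    TSharedRef<IHttpRequest, ESPMode::ThreadSafe> Request = FHttpModule::Get().CreateRequest();
    Request->SetURL(BaseUrl + TEXT("/chat/completions"));
    Request->SetVerb(TEXT("POST"));
    Request->SetHeader(TEXT("Content-Type"), TEXT("application/json"));
    if (!ApiKey.IsEmpty())
    {
        Request->SetHeader(TEXT("Authorization"), TEXT("Bearer ") + ApiKey);
    }
    Request->SetContentAsString(Body);
    Request->OnProcessRequestComplete().BindLambda(
        [](FHttpRequestPtr, FHttpResponsePtr Response, bool bOk)
        {
            if (bOk && Response.IsValid())
            {
                // The assistant reply is at choices[0].message.content in the JSON.
                UE_LOG(LogTemp, Log, TEXT("%s"), *Response->GetContentAsString());
            }
        });
    Request->ProcessRequest();
}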

TTS Providers

ElevenLabs (primary)

  • Endpoint: https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?output_format=pcm_22050
  • Auth: xi-api-key header
  • Output: raw int16 PCM at 22 050 Hz, mono, wrapped in a WAV container by the plugin
  • Voice library: https://elevenlabs.io/voice-library
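
For anyone reimplementing the PCM-to-WAV step (for example with a different output format), a minimal sketch of wrapping raw 16-bit mono PCM in the canonical 44-byte RIFF/WAVE header looks like this. It shows the general WAV layout, not the plugin's own code, and assumes a little-endian host (true for all UE desktop targets).

#include "CoreMinimal.h"

// Wrap raw little-endian int16 mono PCM samples in a canonical 44-byte WAV header.
TArray<uint8> WrapPcmInWav(const TArray<uint8>& PcmBytes, uint32 SampleRate = 22050)
{
    const uint16 NumChannels   = 1;
    const uint16 BitsPerSample = 16;
    const uint32 ByteRate      = SampleRate * NumChannels * BitsPerSample / 8;
    const uint16 BlockAlign    = NumChannels * BitsPerSample / 8;
    const uint32 DataSize      = PcmBytes.Num();

    TArray<uint8> Wav;
    Wav.Reserve(44 + DataSize);

    auto AppendBytes = [&Wav](const void* Src, int32 Count)
    {
        Wav.Append(static_cast<const uint8*>(Src), Count);
    };
    auto Append32 = [&](uint32 Value) { AppendBytes(&Value, 4); };
    auto Append16 = [&](uint16 Value) { AppendBytes(&Value, 2); };

    AppendBytes("RIFF", 4);  Append32(36 + DataSize);  AppendBytes("WAVE", 4);
    AppendBytes("fmt ", 4);  Append32(16);             Append16(1 /* PCM */);
    Append16(NumChannels);   Append32(SampleRate);     Append32(ByteRate);
    Append16(BlockAlign);    Append16(BitsPerSample);
    AppendBytes("data", 4);  Append32(DataSize);
    Wav.Append(PcmBytes);
    return Wav;
}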

System TTS (fallback)

When ElevenLabs is unavailable (no key, HTTP error), the plugin falls back to the platform's built-in speech synthesizer.

Platform | Tool used | Notes
Windows | PowerShell + System.Speech.Synthesis | No extra install needed. ~2–3 s startup latency.
macOS | /usr/bin/say + afconvert | Built-in. Supports many voices via System Preferences → Accessibility → Spoken Content.
Linux | espeak-ng | Install with sudo apt install espeak-ng. Uses -f (file input) and -w (WAV output) flags.
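
As an illustration of the Linux path, invoking espeak-ng with those flags from Unreal C++ could look like the sketch below (the surrounding function is illustrative, not the plugin's implementation):

#include "CoreMinimal.h"
#include "HAL/PlatformProcess.h"

// Run espeak-ng synchronously: read text from TextFilePath (-f) and write a WAV
// to OutWavPath (-w). Returns true when the process launches and exits with code 0.
bool SynthesiseWithEspeakNg(const FString& TextFilePath, const FString& OutWavPath)
{
    const FString Params = FString::Printf(TEXT("-f \"%s\" -w \"%s\""), *TextFilePath, *OutWavPath);
    int32 ReturnCode = -1;
    FString StdOut, StdErr;
    const bool bLaunched = FPlatformProcess::ExecProcess(
        TEXT("espeak-ng"), *Params, &ReturnCode, &StdOut, &StdErr);
    return bLaunched && ReturnCode == 0;
}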

STT Providers

Whisper API (primary, OpenAI-compatible)

  • Endpoint: POST {base_url}/audio/transcriptions (multipart/form-data)
  • Field file: WAV audio, 16-bit PCM, mono — auto-generated by the plugin from the microphone at the device's native sample rate
  • Field model: as configured in Project Settings
  • Response: { "text": "transcribed text" }
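
Building that multipart request by hand from Unreal C++ looks roughly like the sketch below. It shows standard multipart/form-data encoding with the two fields listed above; it is not the plugin's own code, and the boundary string is arbitrary.

#include "CoreMinimal.h"
#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"

void SendWhisperTranscription(const FString& BaseUrl, const FString& ApiKey,
                              const TArray<uint8>& WavBytes)
{
    const FString Boundary = TEXT("----NPCConversationBoundary");

    TArray<uint8> Body;
    auto AppendString = [&Body](const FString& Str)
    {
        FTCHARToUTF8 Utf8(*Str);
        Body.Append(reinterpret_cast<const uint8*>(Utf8.Get()), Utf8.Length());
    };

    // Field "model" (as configured in Project Settings).
    AppendString(FString::Printf(TEXT("--%s\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\nwhisper-1\r\n"), *Boundary));
    // Field "file": the recorded 16-bit PCM mono WAV.
    AppendString(FString::Printf(TEXT("--%s\r\nContent-Disposition: form-data; name=\"file\"; filename=\"speech.wav\"\r\nContent-Type: audio/wav\r\n\r\n"), *Boundary));
    Body.Append(WavBytes);
    AppendString(FString::Printf(TEXT("\r\n--%s--\r\n"), *Boundary));

    TSharedRef<IHttpRequest, ESPMode::ThreadSafe> Request = FHttpModule::Get().CreateRequest();
    Request->SetURL(BaseUrl + TEXT("/audio/transcriptions"));
    Request->SetVerb(TEXT("POST"));
    Request->SetHeader(TEXT("Content-Type"), FString::Printf(TEXT("multipart/form-data; boundary=%s"), *Boundary));
    if (!ApiKey.IsEmpty())
    {
        Request->SetHeader(TEXT("Authorization"), TEXT("Bearer ") + ApiKey);
    }
    Request->SetContent(Body);
    Request->ProcessRequest();   // parse {"text": "..."} from the JSON response on completion
}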

Compatible servers:

Option | URL | Notes
OpenAI | https://api.openai.com/v1 | Hosted, whisper-1 model
Faster-Whisper server | http://localhost:9000 | Local GPU inference, very fast
Whisper.cpp server | http://localhost:8080 | CPU-friendly, OpenAI-compatible API
Groq | https://api.groq.com/openai/v1 | Free tier, whisper-large-v3-turbo

System STT (fallback)

When the Whisper API is unavailable, times out, or returns an error, the plugin automatically falls back to platform speech recognition. System STT is also used directly when STT Provider is set to System STT only.

Platform | Tool used | Notes
Windows | PowerShell + System.Speech.Recognition | Built-in (requires .NET Framework ≥ 3.5). Accuracy is lower than Whisper for open-domain speech.
macOS | Not available | OnFailure fires with a log message suggesting a local Whisper server.
Linux | Not available | OnFailure fires with a log message suggesting a local Whisper server.

Platform Notes

  • Windows (Win64) — Fully tested path. Microphone capture uses the Windows Audio Session API (WASAPI) via UE's AudioCapture module.

  • macOS — Microphone capture works via CoreAudio. System TTS uses say/afconvert (built-in). Ensure NSMicrophoneUsageDescription is set in Info.plist.

  • Linux — Microphone capture works via PulseAudio/ALSA. System TTS requires espeak-ng.

  • Other platforms (Android, iOS, Console) — LLM and TTS (ElevenLabs) work on any platform that has internet access. Microphone capture (STT) and system TTS fallback depend on platform support.


Troubleshooting

"Failed to open default capture stream"

  • Check microphone permissions (Windows Privacy settings / macOS System Settings).
  • Make sure no other application has an exclusive lock on the microphone.
  • Try a different microphone device.

LLM returns HTTP 401 / 403

  • Double-check the LLM API Key in Project Settings.
  • For Cerebras, ensure the key starts with csk-. For OpenAI, it starts with sk-.

ElevenLabs returns HTTP 401

  • Verify the ElevenLabs API Key in Project Settings (it is sent in the xi-api-key header).
  • An empty key never reaches ElevenLabs (the plugin falls back to system TTS), so a 401 means the key that is set is invalid or expired.

Whisper returns HTTP 413 (payload too large)

  • Reduce Default Recording Duration in Project Settings (OpenAI limit is 25 MB / ~16 min).

System TTS produces no audio on Windows

  • Verify PowerShell is available: run powershell -Command "Write-Host OK" in a terminal.
  • Check that System.Speech assembly is present (it ships with .NET Framework ≥ 3.5).

Temp WAV files accumulate on disk

  • The plugin writes temp files to the OS temp directory (%TEMP% / /tmp).
  • Call Delete File (or use IFileManager::Get().Delete()) on WavFilePath after the ACE animation node completes.
