Skip to content

RDKEMW-14906 : VAD detector in Control Manager/Voice SDK#186

Open
dwolaver wants to merge 6 commits intodevelopfrom
feature/RDKEMW-14906_stream_voice_activity_detection
Open

RDKEMW-14906 : VAD detector in Control Manager/Voice SDK#186
dwolaver wants to merge 6 commits intodevelopfrom
feature/RDKEMW-14906_stream_voice_activity_detection

Conversation

@dwolaver
Copy link
Copy Markdown
Contributor

No description provided.

@dwolaver dwolaver marked this pull request as ready for review April 3, 2026 19:09
@dwolaver dwolaver requested a review from a team as a code owner April 3, 2026 19:09
Copilot AI review requested due to automatic review settings April 3, 2026 19:09
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds voice activity detection (VAD) support and reporting across Control Manager / Voice SDK, including new session-end signaling for silent utterances and extended voice-session telemetry.

Changes:

  • Extend voice session telemetry marker schema (version bump) and emit VAD-related stats (voice detected / peak confidence / peak RMS).
  • Add a new IPC/session-end result for silent utterances and broadcast it via both legacy IARM and Thunder JSON payloads.
  • Introduce a configurable voice_activity_detection_mode preference and propagate it into XRSR routing (PTT path).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/voice/telemetry/ctrlm_voice_telemetry_events.h Extends telemetry session struct and stream-end API to carry VAD stats.
src/voice/telemetry/ctrlm_voice_telemetry_events.cpp Emits additional telemetry fields and records VAD stats on stream end.
src/voice/ipc/ctrlm_voice_ipc.h Adds SESSION_END_SILENT_UTTERANCE result type.
src/voice/ipc/ctrlm_voice_ipc_iarm_thunder.cpp Adds Thunder JSON payload support for silent-utterance session end.
src/voice/ipc/ctrlm_voice_ipc_iarm_legacy.cpp Broadcasts a new legacy IARM event for silent sessions.
src/voice/ctrlm_voice_obj.h Adds VAD mode preference and a string-to-XRSR mapping helper.
src/voice/ctrlm_voice_obj.cpp Reads VAD mode from defaults/config/RFC, logs it, and emits silent-utterance IPC; forwards VAD stats to telemetry.
src/voice/ctrlm_voice_obj_generic.cpp Passes configured VAD mode into XRSR route configuration (PTT).
src/telemetry/ctrlm_telemetry_markers.h Updates voice session stats marker documentation and bumps schema version to 3.
src/ctrlm_config_default.json Adds default voice_activity_detection_mode configuration value.
include/ctrlm_ipc.h Adds CTRLM_VOICE_IARM_EVENT_SESSION_SILENT to the main IARM event enum.
include/ctrlm_ipc_voice.h Defines the new silent-session IARM event payload and documents the event.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

int32_t m_end_reason_stream; // reason for ending the stream as reported by speech router

int32_t m_voice_detected; // flag to indicate if voice was detected (-1 = not reported, 0 = no voice detected, 1 = voice detected)
int32_t m_peak_confidence; // peak confidence level when voice detection is reported
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_peak_confidence is declared as int32_t, but update_on_stream_end() accepts peak_confidence as uint32_t and the value is treated as non-negative (0..N). This signed/unsigned mismatch can lead to compiler warnings and potential wrap if an out-of-range value is ever passed; please align the member type (and any related fields) with the parameter type (e.g., uint32_t) or add explicit bounds-checked conversion.

Suggested change
int32_t m_peak_confidence; // peak confidence level when voice detection is reported
uint32_t m_peak_confidence; // peak confidence level when voice detection is reported

Copilot uses AI. Check for mistakes.
Comment on lines +282 to +285
if(voice_detected >= 0) {
m_voice_detected = voice_detected;
m_peak_confidence = peak_confidence;
m_peak_rms_level = peak_rms_level;
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here peak_confidence (uint32_t) is assigned into m_peak_confidence (int32_t). This is a lossy signedness conversion and may trigger warnings; consider making m_peak_confidence a uint32_t (to match the API) or performing an explicit, clamped cast.

Copilot uses AI. Check for mistakes.
Comment on lines 114 to +121
// <stream_ret_code> - audio stream success/error code.
// <protocol_ret_code> - protocol success/error code.
// <server_ret_code> - server success/error code.
// <server_message> - server message.
// <result> - flag to indicate if session was successful.
// <end_reason_stream> - reason why the stream ended (if available).
// <ret_code_protocol> - protocol return code.
// <voice_detected> - flag to indicate if voice was detected.
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The marker format documentation now includes both <protocol_ret_code> and <ret_code_protocol>, and both are described as a protocol success/error/return code. This is ambiguous for telemetry consumers; please rename one of the fields in the documentation and/or clarify the distinction (e.g., session end reason vs protocol/library return code) so the schema is unambiguous for version 3.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants