RDKEMW-14906 : VAD detector in Control Manager/Voice SDK by dwolaver · Pull Request #186 · rdkcentral/control

dwolaver · 2026-03-23T19:46:24Z

No description provided.

Copilot

Pull request overview

Adds voice activity detection (VAD) support and reporting across Control Manager / Voice SDK, including new session-end signaling for silent utterances and extended voice-session telemetry.

Changes:

Extend voice session telemetry marker schema (version bump) and emit VAD-related stats (voice detected / peak confidence / peak RMS).
Add a new IPC/session-end result for silent utterances and broadcast it via both legacy IARM and Thunder JSON payloads.
Introduce a configurable voice_activity_detection_mode preference and propagate it into XRSR routing (PTT path).
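
The third change implies a helper that turns the `voice_activity_detection_mode` preference string into a routing mode. A minimal sketch of one way such a mapping could look, under stated assumptions: the `vad_mode_t` enum, the function name, and the `"off"` / `"on"` strings are all hypothetical placeholders, since the real XRSR type and accepted values are not shown in this overview. It requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the XRSR VAD mode type (the real enum is not shown in this PR).
enum class vad_mode_t { off, on };

// Map a preference string to a mode. Unrecognized strings yield std::nullopt
// so the caller can keep its default rather than silently picking a mode.
std::optional<vad_mode_t> vad_mode_from_string(const std::string &value) {
    if(value == "off") return vad_mode_t::off; // hypothetical mode string
    if(value == "on")  return vad_mode_t::on;  // hypothetical mode string
    return std::nullopt;
}
```

Returning an empty optional for unknown input keeps a typo in config or RFC data from changing routing behavior; the caller decides whether to log it and fall back.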

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.


File	Description
src/voice/telemetry/ctrlm_voice_telemetry_events.h	Extends telemetry session struct and stream-end API to carry VAD stats.
src/voice/telemetry/ctrlm_voice_telemetry_events.cpp	Emits additional telemetry fields and records VAD stats on stream end.
src/voice/ipc/ctrlm_voice_ipc.h	Adds `SESSION_END_SILENT_UTTERANCE` result type.
src/voice/ipc/ctrlm_voice_ipc_iarm_thunder.cpp	Adds Thunder JSON payload support for silent-utterance session end.
src/voice/ipc/ctrlm_voice_ipc_iarm_legacy.cpp	Broadcasts a new legacy IARM event for silent sessions.
src/voice/ctrlm_voice_obj.h	Adds VAD mode preference and a string-to-XRSR mapping helper.
src/voice/ctrlm_voice_obj.cpp	Reads VAD mode from defaults/config/RFC, logs it, and emits silent-utterance IPC; forwards VAD stats to telemetry.
src/voice/ctrlm_voice_obj_generic.cpp	Passes configured VAD mode into XRSR route configuration (PTT).
src/telemetry/ctrlm_telemetry_markers.h	Updates voice session stats marker documentation and bumps schema version to 3.
src/ctrlm_config_default.json	Adds default `voice_activity_detection_mode` configuration value.
include/ctrlm_ipc.h	Adds `CTRLM_VOICE_IARM_EVENT_SESSION_SILENT` to the main IARM event enum.
include/ctrlm_ipc_voice.h	Defines the new silent-session IARM event payload and documents the event.


Copilot · 2026-04-03T19:17:00Z

src/voice/telemetry/ctrlm_voice_telemetry_events.h

    int32_t     m_end_reason_stream;    // reason for ending the stream as reported by speech router

+    int32_t     m_voice_detected;       // flag to indicate if voice was detected (-1 = not reported, 0 = no voice detected, 1 = voice detected)
+    int32_t     m_peak_confidence;      // peak confidence level when voice detection is reported


m_peak_confidence is declared as int32_t, but update_on_stream_end() accepts peak_confidence as uint32_t and the value is treated as non-negative (0..N). This signed/unsigned mismatch can lead to compiler warnings and potential wrap if an out-of-range value is ever passed; please align the member type (and any related fields) with the parameter type (e.g., uint32_t) or add explicit bounds-checked conversion.

Suggested change

-    int32_t     m_peak_confidence;      // peak confidence level when voice detection is reported
+    uint32_t    m_peak_confidence;      // peak confidence level when voice detection is reported

Copilot · 2026-04-03T19:17:01Z

src/voice/telemetry/ctrlm_voice_telemetry_events.cpp

+    if(voice_detected >= 0) {
+        m_voice_detected        = voice_detected;
+        m_peak_confidence       = peak_confidence;
+        m_peak_rms_level        = peak_rms_level;


Here peak_confidence (uint32_t) is assigned into m_peak_confidence (int32_t). This is a lossy signedness conversion and may trigger warnings; consider making m_peak_confidence a uint32_t (to match the API) or performing an explicit, clamped cast.
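
If the member has to stay `int32_t`, the explicit, clamped cast could look roughly like the sketch below. The `clamp_to_int32` helper and the reduced `vad_stats_t` struct are hypothetical illustrations, not code from this PR, and the default of 0 for `m_peak_confidence` is an assumption; only the member names, the `voice_detected >= 0` guard, and the -1/0/1 meaning of `m_voice_detected` come from the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: clamp an unsigned value into the non-negative range of int32_t.
inline int32_t clamp_to_int32(uint32_t value) {
    constexpr uint32_t max_i32 = static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
    return static_cast<int32_t>(std::min(value, max_i32));
}

// Hypothetical, reduced version of the telemetry fields quoted in the diff.
struct vad_stats_t {
    int32_t m_voice_detected  = -1; // -1 = not reported, 0 = no voice detected, 1 = voice detected
    int32_t m_peak_confidence = 0;  // default value is an assumption

    void update(int32_t voice_detected, uint32_t peak_confidence) {
        if(voice_detected >= 0) { // record stats only when detection was reported
            m_voice_detected  = voice_detected;
            m_peak_confidence = clamp_to_int32(peak_confidence);
        }
    }
};
```

With this shape, an out-of-range `peak_confidence` saturates at `INT32_MAX` instead of wrapping to a negative number. Switching the member to `uint32_t`, as suggested above, avoids the conversion altogether.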

Copilot · 2026-04-03T19:17:01Z

src/telemetry/ctrlm_telemetry_markers.h

 // <stream_ret_code>      - audio stream success/error code.
 // <protocol_ret_code>    - protocol success/error code.
 // <server_ret_code>      - server success/error code.
 // <server_message>       - server message.
 // <result>               - flag to indicate if session was successful.
+// <end_reason_stream>    - reason why the stream ended (if available).
+// <ret_code_protocol>    - protocol return code.
+// <voice_detected>       - flag to indicate if voice was detected.


The marker format documentation now includes both <protocol_ret_code> and <ret_code_protocol>, and both are described as a protocol success/error/return code. This is ambiguous for telemetry consumers; please rename one of the fields in the documentation and/or clarify the distinction (e.g., session end reason vs protocol/library return code) so the schema is unambiguous for version 3.

dwolaver added 6 commits March 23, 2026 13:51

Adding config params

3da65c7

Remove vad timeout

9fa0704

Only use VAD mode setting for PTT

51895ce

Add silent utterance reason

15a964e

adding telemetry data

b37e68f

change the telemetry stats default back

8e27504

dwolaver marked this pull request as ready for review April 3, 2026 19:09

dwolaver requested a review from a team as a code owner April 3, 2026 19:09

Copilot AI review requested due to automatic review settings April 3, 2026 19:09

Copilot started reviewing on behalf of dwolaver April 3, 2026 19:10

dwolaver wants to merge 6 commits into develop from feature/RDKEMW-14906_stream_voice_activity_detection
