RDKEMW-14906 : VAD detector in Control Manager/Voice SDK#186
RDKEMW-14906 : VAD detector in Control Manager/Voice SDK#186
Conversation
There was a problem hiding this comment.
Pull request overview
Adds voice activity detection (VAD) support and reporting across Control Manager / Voice SDK, including new session-end signaling for silent utterances and extended voice-session telemetry.
Changes:
- Extend voice session telemetry marker schema (version bump) and emit VAD-related stats (voice detected / peak confidence / peak RMS).
- Add a new IPC/session-end result for silent utterances and broadcast it via both legacy IARM and Thunder JSON payloads.
- Introduce a configurable
voice_activity_detection_modepreference and propagate it into XRSR routing (PTT path).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/voice/telemetry/ctrlm_voice_telemetry_events.h | Extends telemetry session struct and stream-end API to carry VAD stats. |
| src/voice/telemetry/ctrlm_voice_telemetry_events.cpp | Emits additional telemetry fields and records VAD stats on stream end. |
| src/voice/ipc/ctrlm_voice_ipc.h | Adds SESSION_END_SILENT_UTTERANCE result type. |
| src/voice/ipc/ctrlm_voice_ipc_iarm_thunder.cpp | Adds Thunder JSON payload support for silent-utterance session end. |
| src/voice/ipc/ctrlm_voice_ipc_iarm_legacy.cpp | Broadcasts a new legacy IARM event for silent sessions. |
| src/voice/ctrlm_voice_obj.h | Adds VAD mode preference and a string-to-XRSR mapping helper. |
| src/voice/ctrlm_voice_obj.cpp | Reads VAD mode from defaults/config/RFC, logs it, and emits silent-utterance IPC; forwards VAD stats to telemetry. |
| src/voice/ctrlm_voice_obj_generic.cpp | Passes configured VAD mode into XRSR route configuration (PTT). |
| src/telemetry/ctrlm_telemetry_markers.h | Updates voice session stats marker documentation and bumps schema version to 3. |
| src/ctrlm_config_default.json | Adds default voice_activity_detection_mode configuration value. |
| include/ctrlm_ipc.h | Adds CTRLM_VOICE_IARM_EVENT_SESSION_SILENT to the main IARM event enum. |
| include/ctrlm_ipc_voice.h | Defines the new silent-session IARM event payload and documents the event. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int32_t m_end_reason_stream; // reason for ending the stream as reported by speech router | ||
|
|
||
| int32_t m_voice_detected; // flag to indicate if voice was detected (-1 = not reported, 0 = no voice detected, 1 = voice detected) | ||
| int32_t m_peak_confidence; // peak confidence level when voice detection is reported |
There was a problem hiding this comment.
m_peak_confidence is declared as int32_t, but update_on_stream_end() accepts peak_confidence as uint32_t and the value is treated as non-negative (0..N). This signed/unsigned mismatch can lead to compiler warnings and potential wrap if an out-of-range value is ever passed; please align the member type (and any related fields) with the parameter type (e.g., uint32_t) or add explicit bounds-checked conversion.
| int32_t m_peak_confidence; // peak confidence level when voice detection is reported | |
| uint32_t m_peak_confidence; // peak confidence level when voice detection is reported |
| if(voice_detected >= 0) { | ||
| m_voice_detected = voice_detected; | ||
| m_peak_confidence = peak_confidence; | ||
| m_peak_rms_level = peak_rms_level; |
There was a problem hiding this comment.
Here peak_confidence (uint32_t) is assigned into m_peak_confidence (int32_t). This is a lossy signedness conversion and may trigger warnings; consider making m_peak_confidence a uint32_t (to match the API) or performing an explicit, clamped cast.
| // <stream_ret_code> - audio stream success/error code. | ||
| // <protocol_ret_code> - protocol success/error code. | ||
| // <server_ret_code> - server success/error code. | ||
| // <server_message> - server message. | ||
| // <result> - flag to indicate if session was successful. | ||
| // <end_reason_stream> - reason why the stream ended (if available). | ||
| // <ret_code_protocol> - protocol return code. | ||
| // <voice_detected> - flag to indicate if voice was detected. |
There was a problem hiding this comment.
The marker format documentation now includes both <protocol_ret_code> and <ret_code_protocol>, and both are described as a protocol success/error/return code. This is ambiguous for telemetry consumers; please rename one of the fields in the documentation and/or clarify the distinction (e.g., session end reason vs protocol/library return code) so the schema is unambiguous for version 3.
No description provided.