Skip to content

Fix incorrect configurationId when multiple DCRs share the same counters for ME metrics#2158

Open
simathih wants to merge 2 commits intomasterfrom
simathih/metricsfix
Open

Fix incorrect configurationId when multiple DCRs share the same counters for ME metrics#2158
simathih wants to merge 2 commits intomasterfrom
simathih/metricsfix

Conversation

@simathih
Copy link
Copy Markdown
Collaborator

@simathih simathih commented Mar 31, 2026

When multiple DCRs configured the same performance counters, all metrics were sent to ME tagged with only the first DCR's configurationId (due to an early break). ME uses this tag to route data per-DCR, so all subsequent DCRs received incorrectly attributed metrics.

Changes

  • parse_config in telegraf_config_handler.py
    • Collect all unique configurationIds across all fields in a plugin before emitting any config
    • For each configId, emit a separate telegraf config file named <omiclass>-<configId>.conf (previously one file per plugin, e.g. filesystem.conf)
    • Each generated file carries only its own configurationId in the [inputs.<plugin>.tags] block

Before (one file per plugin, wrong configId for DCR 2+):

filesystem.conf          # configurationId="dcr-fa9e2b03..." for all DCRs

After (one file per plugin per DCR, correct configId in each):

filesystem-dcr-c93b5596....conf   # configurationId="dcr-c93b5596..."
filesystem-dcr-fa9e2b03....conf   # configurationId="dcr-fa9e2b03..."

after
before

@simathih simathih requested a review from a team as a code owner March 31, 2026 21:56
@simathih
Copy link
Copy Markdown
Collaborator Author

simathih commented Apr 8, 2026

it will look like this now:
image

from:
image

@simathih
Copy link
Copy Markdown
Collaborator Author

simathih commented Apr 8, 2026

@copilot please review this

Copy link
Copy Markdown

Copilot AI commented Apr 8, 2026

@copilot please review this

The fix looks correct for the AMA (non-LAD) multiple DCR case. Here's my review:

What the fix does:

  • Old behavior: only the first configId from counterConfigIdMap was used per plugin (the old code had a break after getting the first configId), so all metrics for a plugin across multiple DCRs were tagged with the same (wrong) configurationId.
  • New behavior: collects all unique configIds across all fields in a plugin (lines 174–179), then generates a separate config file per configId (e.g., filesystem-dcr-abc123.conf, filesystem-dcr-def456.conf), each with the correct configurationId tag. This matches the before/after screenshots above.

One observation:
Line 187 initializes rate_specific_aggregator_str but it's never used (was the same before this PR). Minor nit.

One concern:
At line 176, counterConfigIdMap is looked up using telegraf_json[omiclass][plugin][field]["displayName"]. For non-LAD (AMA), the displayName stored in telegraf_json is the full counter name (line 126), which matches the key used when building counterConfigIdMap (line 100–101). This is correct ✓.

For LAD (is_lad=True), the displayName stored in telegraf_json is counter.split("->")[1] (line 124 — just the short name), while counterConfigIdMap uses the full counter as key. This could cause a KeyError for LAD. However, since LAD is deprecated and the LAD data format from parse_lad_perf_settings doesn't include configurationId (which would fail earlier at line 105 anyway), this is a pre-existing concern rather than a new regression.

Summary: The fix is correct for the intended AMA use case and the file-per-DCR approach matches what's shown in the screenshots.

Copilot AI changed the title fix for Incorrect configurationId in the case of multiple DCRs when s… Fix incorrect configurationId when multiple DCRs share the same counters for ME metrics Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants