
[Feature]: Add --label flag to filter alerts by label values #115

@Inderdeep01

Describe the feature request

Summary

Add a --label flag to the alert subcommand that filters alerts by label key-value pairs. This enables per-node alert distribution in monitoring systems like Icinga, where each host should only see alerts relevant to itself.

Or, more specifically for our case: add a --node flag that limits alerts to those matching a given node name, so that each monitored Icinga host only sees its own alerts.

Problem Statement

When using check_prometheus with Prometheus/vmalert, alerts often include labels identifying the affected host (e.g., node, hostname, instance). However, there is currently no way to filter alerts by these labels.

Current behavior:

$ check_prometheus alert --name "ConsulServiceCritical"                                                                                                                                
CRITICAL - 50 Alerts: 50 Firing - 0 Pending - 0 Inactive                                                                                                                                   
 \_[CRITICAL] [ConsulServiceCritical] node=server1 ...                                                                                                                                 
 \_[CRITICAL] [ConsulServiceCritical] node=server2 ...                                                                                                                                 
 \_[CRITICAL] [ConsulServiceCritical] node=server3 ...                                                                                                                                 
 ... (all 50 nodes)                                                                                                                                                                        

Problem:

  • Every Icinga host running this check sees ALL alerts, not just its own
  • Alert counts are inflated on each host
  • No way to map alerts to their corresponding monitored hosts
  • Operators can't quickly identify which specific node has issues

Proposed Solution

Add a --label flag that accepts key=value pairs to filter alerts:

  $ check_prometheus alert --name "ConsulServiceCritical" --label "node=server1"                                                                                                         
  CRITICAL - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive                                                                                                                                     
   \_[CRITICAL] [ConsulServiceCritical] node=server1 is firing - value: 1.00
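
To make the proposed matching rule concrete, here is a minimal sketch in Go of how a key=value filter could be parsed and applied. It is not taken from the check_prometheus code base: the Alert type and the parseLabelFilter and filterByLabels functions are hypothetical names used only for illustration, and the sketch assumes exact string matching where every given pair must match.

```go
package main

import (
	"fmt"
	"strings"
)

// Alert is a simplified, hypothetical stand-in for an alert
// returned by the Prometheus/vmalert API.
type Alert struct {
	Name   string
	State  string
	Labels map[string]string
}

// parseLabelFilter splits a "key=value" argument into key and value.
// Only the first "=" is treated as the separator.
func parseLabelFilter(arg string) (string, string, error) {
	key, value, ok := strings.Cut(arg, "=")
	if !ok || key == "" {
		return "", "", fmt.Errorf("invalid label filter %q, expected key=value", arg)
	}
	return key, value, nil
}

// filterByLabels keeps only alerts whose labels contain every
// key=value pair in filters (exact match, assumed semantics).
func filterByLabels(alerts []Alert, filters map[string]string) []Alert {
	matched := make([]Alert, 0, len(alerts))
	for _, a := range alerts {
		keep := true
		for k, want := range filters {
			if got, ok := a.Labels[k]; !ok || got != want {
				keep = false
				break
			}
		}
		if keep {
			matched = append(matched, a)
		}
	}
	return matched
}

func main() {
	alerts := []Alert{
		{Name: "ConsulServiceCritical", State: "firing", Labels: map[string]string{"node": "server1"}},
		{Name: "ConsulServiceCritical", State: "firing", Labels: map[string]string{"node": "server2"}},
		{Name: "ConsulServiceCritical", State: "firing", Labels: map[string]string{"node": "server3"}},
	}

	key, value, err := parseLabelFilter("node=server1")
	if err != nil {
		fmt.Println(err)
		return
	}

	for _, a := range filterByLabels(alerts, map[string]string{key: value}) {
		fmt.Printf("[%s] node=%s is %s\n", a.Name, a.Labels["node"], a.State)
	}
}
```

Under these assumptions, a --node flag could be treated as shorthand for --label "node=<value>", so both variants of the request would share the same filtering path.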

A more specific implementation (for Icinga in our case)

We run check_prometheus against vmalert to surface Consul service alerts in Icinga. Each alert includes a node label identifying the affected host:

{                                                                                                                                                                                          
  "alertname": "ConsulServiceNodeCritical",                                                                                                                                                
  "node": "app01.example.com",                                                                                                                                                             
  "service_name": "api-gateway",                                                                                                                                                           
  "severity": "critical"                                                                                                                                                                   
}                                                                                                                                                                                          

Current behavior: Every Icinga host sees ALL alerts for ALL nodes:

  $ check_prometheus alert --name "ConsulServiceCritical"                                                                                                                                
  CRITICAL - 47 Alerts: 47 Firing - 0 Pending - 0 Inactive                                                                                                                                   
   \_[CRITICAL] [ConsulServiceCritical] node=app01.example.com ...                                                                                                                       
   \_[CRITICAL] [ConsulServiceCritical] node=app02.example.com ...                                                                                                                       
   \_[CRITICAL] [ConsulServiceCritical] node=app03.example.com ...                                                                                                                       
   ... (44 more nodes)                                                                                                                                                                       

Expected behavior: Each host should only see its own alerts:

  $ check_prometheus alert --name "ConsulServiceCritical" --node "app01.example.com"                                                                                                     
  CRITICAL - 1 Alerts: 1 Firing - 0 Pending - 0 Inactive                                                                                                                                     
   \_[CRITICAL] [ConsulServiceCritical] node=app01.example.com is firing                                                                                                                 

Use Case

In our Icinga configuration, we want to assign node-specific alerts to their corresponding hosts:

apply Service "ConsulServiceCritical" {
    import "generic-service"
    check_command = "check_prometheus_alert"
    vars.alertname = "ConsulServiceCritical"
    vars.node = host.name    # <-- filter by this host
    vars.no_alerts_state = "OK"
    // match() takes the pattern first, then the value to test
    assign where match("*", host.name)
}

Without node filtering:

  • app01 shows 47 critical alerts (all nodes)
  • app02 shows 47 critical alerts (all nodes)
  • Operators can't tell which host actually has the problem

With node filtering:

  • app01 shows 1 alert (only its own)
  • app02 shows 0 alerts (OK state)
  • Clear visibility into which specific host is affected
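
The per-host outcome described above can be sketched as follows. This is an illustration only, not the plugin's actual logic: firingByNode and hostState are hypothetical helpers, the exit codes follow the usual monitoring-plugin convention (0 = OK, 2 = CRITICAL), and the sketch assumes that any firing alert maps to CRITICAL and that a host with no matching alerts gets the configured no-alerts state.

```go
package main

import "fmt"

// Exit codes as commonly used by monitoring plugins.
const (
	stateOK       = 0
	stateCritical = 2
)

// firingByNode counts firing alerts per value of the "node" label.
// The input is the list of node label values of all firing alerts.
func firingByNode(nodeLabels []string) map[string]int {
	counts := make(map[string]int)
	for _, n := range nodeLabels {
		counts[n]++
	}
	return counts
}

// hostState maps a host's own firing count to an exit code.
// With zero matching alerts, the configured no-alerts state applies.
func hostState(firing, noAlertsState int) int {
	if firing == 0 {
		return noAlertsState
	}
	return stateCritical
}

func main() {
	// Two firing alerts, neither of them for app02.
	firing := firingByNode([]string{"app01.example.com", "app03.example.com"})

	for _, host := range []string{"app01.example.com", "app02.example.com"} {
		n := firing[host]
		fmt.Printf("%s: %d alert(s), exit code %d\n", host, n, hostState(n, stateOK))
	}
}
```

With this input, app01 is evaluated against its single alert, while app02 has no matching alerts and falls back to the no-alerts state, which mirrors the vars.no_alerts_state = "OK" setting in the Icinga service definition.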

Let me know your thoughts on this.
Thanks and Regards


Labels: duplicate, feature
