Prometheus Cheatsheet

1. Introduction:

  • Prometheus is an open-source systems monitoring and alerting toolkit, particularly well-suited for monitoring dynamic, cloud-native environments such as Kubernetes. It uses a pull-based model to scrape metrics from configured endpoints.

2. Key Concepts:

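  • Metrics: Numeric measurements recorded over time as time series; each series is identified by a metric name plus a set of key-value labels.
  • PromQL: The Prometheus Query Language, used to select and aggregate the collected time series.
  • Exporters: Components that expose metrics in a format that Prometheus can scrape.
  • Alertmanager: Receives alerts fired by Prometheus, then deduplicates, groups, and routes them to receivers such as email.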
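Each line of a scrape response is one sample of one time series, and that series is identified by its metric name together with its full label set. The sketch below splits a sample line of the text exposition format into those parts. parse_sample is a hypothetical helper: it handles only simple lines and does not unescape label values (the official client libraries ship real parsers).

```python
import re

# Metric name, optional {label="value",...} block, value, optional timestamp (ms).
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
# One label pair; escape sequences inside the value are kept as-is in this sketch.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format sample line into (name, labels, value)."""
    m = _SAMPLE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(_LABEL.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))
```

For example, parse_sample('http_requests_total{method="get",code="200"} 1027') yields the name http_requests_total, the labels method and code, and the value 1027.0; changing any label value would identify a different series.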

3. Installation:

  • Running Prometheus:

    wget https://github.com/prometheus/prometheus/releases/download/v2.30.0/prometheus-2.30.0.linux-amd64.tar.gz
    tar xvfz prometheus-*.tar.gz
    cd prometheus-2.30.0.linux-amd64
    ./prometheus --config.file=prometheus.yml
  • Docker:

    docker run -p 9090:9090 prom/prometheus
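After starting the server, a script can wait for it to come up by polling the /-/ready endpoint, which returns HTTP 200 once Prometheus is ready to serve traffic. This is a minimal standard-library sketch: wait_until_ready and its parameters are hypothetical, and the opener argument exists only so the function can be exercised without a running server.

```python
import time
import urllib.request


def wait_until_ready(base_url="http://localhost:9090", timeout=30.0, interval=1.0,
                     opener=urllib.request.urlopen):
    """Poll <base_url>/-/ready until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/-/ready"
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError are OSError subclasses: not listening or not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```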

4. Prometheus Configuration:

  • Basic prometheus.yml Configuration:

    global:
      scrape_interval: 15s
    
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
  • Adding Targets (append another job under scrape_configs):

    - job_name: 'node_exporter'
      static_configs:
        - targets: ['localhost:9100']
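The pull model behind scrape_configs can be sketched in a few lines: every scrape_interval, Prometheus requests each target's metrics path (/metrics unless metrics_path says otherwise) and attaches job and instance labels to what it gets back. This is illustrative only; SCRAPE_CONFIGS is a hypothetical Python mirror of the YAML above, and fetch stands in for a real HTTP client.

```python
import time

# Hypothetical in-memory mirror of the two scrape jobs configured above.
SCRAPE_CONFIGS = [
    {"job_name": "prometheus", "targets": ["localhost:9090"]},
    {"job_name": "node_exporter", "targets": ["localhost:9100"]},
]


def scrape_urls(configs, metrics_path="/metrics", scheme="http"):
    """Yield (target labels, URL) for every static target of every job."""
    for cfg in configs:
        for target in cfg["targets"]:
            # Prometheus adds job and instance labels to every series it scrapes.
            yield {"job": cfg["job_name"], "instance": target}, f"{scheme}://{target}{metrics_path}"


def scrape_rounds(configs, fetch, rounds=1, interval=15.0, sleep=time.sleep):
    """Pull every target once per interval; `fetch` is any callable url -> body text."""
    results = []
    for i in range(rounds):
        for labels, url in scrape_urls(configs):
            results.append((labels, fetch(url)))
        if i < rounds - 1:
            sleep(interval)  # mirrors scrape_interval: 15s
    return results
```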

5. Prometheus Query Language (PromQL):

  • Basic Queries:

    up
    rate(http_requests_total[5m])
    
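  • Aggregations:

    sum(rate(http_requests_total[5m]))
    sum by (job) (rate(http_requests_total[5m]))

  • Functions over time (avg_over_time and similar suit gauges; use rate() for counters):

    avg_over_time(http_inprogress_requests[5m])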
    
  • Recording Rules (in a rule file listed under rule_files in prometheus.yml):

    groups:
    - name: example
      rules:
      - record: job:http_inprogress_requests:sum
        expr: sum(http_inprogress_requests) by (job)
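What rate() and sum by (...) compute can be approximated in plain Python. This sketch assumes samples arrive as (timestamp_seconds, value) pairs and series as (labels, value) pairs. simple_rate and sum_by are hypothetical names, and simple_rate skips the window-edge extrapolation that the real rate() performs, so its numbers will differ slightly.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    Simplified: divides by the span between the first and last sample instead of
    extrapolating to the edges of the range window as PromQL's rate() does.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. a restart) and started again from 0.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def sum_by(series, label):
    """Like `sum by (label) (...)`: add values per label value; other labels are dropped."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)
```

For example, a counter sampled as 100, 130, 10, 40 at 15-second intervals reset once; the sketch counts an increase of 70 over 45 seconds.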

6. Exporters:

  • Node Exporter: Collects system-level metrics.

    wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
    tar xvfz node_exporter-*.tar.gz
    cd node_exporter-1.2.2.linux-amd64
    ./node_exporter
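  • Custom Exporter: A minimal exporter written with the official Python client library (prometheus_client).

    from prometheus_client import start_http_server, Gauge
    import random
    import time
    
    # A gauge can go up and down; it is exposed as the metric "random_number".
    g = Gauge('random_number', 'A random number')
    
    def generate_random_number():
        # Update the gauge every 5 seconds; Prometheus reads the latest value on each scrape.
        while True:
            g.set(random.random())
            time.sleep(5)
    
    if __name__ == '__main__':
        # Expose metrics over HTTP on port 8000 (e.g. http://localhost:8000/metrics).
        start_http_server(8000)
        generate_random_number()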
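If installing prometheus_client is not an option, the same gauge can be served with the standard library alone, because the text exposition format is plain text. This is a hand-rolled alternative, not how the client library works internally; render_metrics, MetricsHandler and serve are hypothetical names.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(value):
    """Render one gauge in the Prometheus text exposition format."""
    return (
        "# HELP random_number A random number\n"
        "# TYPE random_number gauge\n"
        f"random_number {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # A fresh random value on every scrape.
        body = render_metrics(random.random()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering GET /metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Call serve() to listen on port 8000, then add localhost:8000 as a target in prometheus.yml.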

7. Alerts and Alertmanager:

  • Alerting Rules:

    groups:
    - name: example
      rules:
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage detected on {{ $labels.instance }}"
          description: "Memory usage is above 90% for more than 5 minutes."
  • Alertmanager Configuration:

    global:
      resolve_timeout: 5m
    
    route:
      group_by: ['alertname']
      receiver: 'email'
    
    receivers:
    - name: 'email'
      email_configs:
      - to: 'your-email@example.com'
        from: 'prometheus@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'username'
        auth_password: 'password'
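Two behaviours configured above can be sketched in Python: the for: clause keeps an alert pending until its condition has held continuously for the whole duration, and group_by makes Alertmanager bundle alerts that share those label values into one notification. alert_states and group_alerts are hypothetical helpers, timestamps are assumed to be in seconds, and real Alertmanager grouping also involves group_wait and group_interval timing that this ignores.

```python
from collections import defaultdict


def alert_states(evaluations, threshold=90.0, for_seconds=300):
    """Replay (timestamp_seconds, value) evaluations of `value > threshold`.

    'pending' while the condition has held for less than for_seconds,
    'firing' once it has held that long, 'inactive' as soon as it stops holding.
    """
    states, active_since = [], None
    for ts, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = ts  # first evaluation where the condition is true
            states.append("firing" if ts - active_since >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the clock
            states.append("inactive")
    return states


def group_alerts(alerts, group_by=("alertname",)):
    """Bucket alert label sets by the values of the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```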

8. Prometheus Federation:

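  • Setting Up Federation (on the higher-level Prometheus, which pulls selected series from each server's /federate endpoint):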

    scrape_configs:
    - job_name: 'federate'
      honor_labels: true
      metrics_path: '/federate'
      params:
        match[]:
          - '{job="prometheus"}'
      static_configs:
        - targets:
          - 'prometheus-server-1:9090'
          - 'prometheus-server-2:9090'
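Federation relies on two rules that are easy to sketch: match[] selects which series the /federate endpoint returns, and honor_labels decides who wins when a federated series already carries a job or instance label. With honor_labels: false, Prometheus keeps the target's value and renames the scraped one to exported_<name>. The helpers below are hypothetical and support only equality matchers.

```python
def matches(selector, labels):
    """Equality-only version of a match[] selector such as {job="prometheus"}."""
    return all(labels.get(k) == v for k, v in selector.items())


def attach_target_labels(scraped, target, honor_labels):
    """Merge target labels (job, instance, ...) into one scraped series' labels.

    honor_labels=True keeps the scraped value on a conflict (what federation wants);
    honor_labels=False keeps the target value and moves the scraped one to exported_<name>.
    """
    merged = dict(scraped)
    for name, value in target.items():
        if name in scraped and honor_labels:
            continue
        if name in scraped:
            merged["exported_" + name] = scraped[name]
        merged[name] = value
    return merged
```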

9. Monitoring Kubernetes with Prometheus:

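  • Deploying Prometheus on Kubernetes (requires the Prometheus Operator, which defines the Prometheus and ServiceMonitor custom resources):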

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
    spec:
      replicas: 1
      serviceAccountName: prometheus
      serviceMonitorSelector:
        matchLabels:
          team: frontend
      resources:
        requests:
          memory: 400Mi
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: standard
            resources:
              requests:
                storage: 50Gi
  • ServiceMonitor Example:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-monitor
      labels:
        team: frontend   # must match the Prometheus serviceMonitorSelector
    spec:
      selector:
        matchLabels:
          app: example
      endpoints:
        - port: web
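The selectors in these two resources use Kubernetes matchLabels semantics: an object is selected when it carries every listed key with the same value, and an empty selector matches everything. The sketch assumes labels are plain dicts; selects is a hypothetical helper and the monitor labels are illustrative.

```python
def selects(match_labels, labels):
    """matchLabels semantics: every selector key/value pair must be present on the object."""
    # dict item views behave like sets, so <= is a subset test.
    return match_labels.items() <= labels.items()


prometheus_selector = {"team": "frontend"}       # spec.serviceMonitorSelector.matchLabels
service_monitors = {                             # name -> metadata.labels (illustrative)
    "example-monitor": {"team": "frontend"},
    "other-monitor": {"team": "backend"},
}
picked = [name for name, lbls in service_monitors.items() if selects(prometheus_selector, lbls)]
```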

10. Advanced Prometheus Concepts:

  • Thanos: Extends Prometheus with long-term storage, global querying, and downsampling.
  • Cortex: Multi-tenant, horizontally scalable Prometheus as a service.
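Downsampling, one of the features listed for Thanos, trades resolution for cheaper long-range queries: raw samples are collapsed into coarser windows that keep a few aggregates. Thanos's compactor produces 5-minute and 1-hour resolutions; the sketch below shows only the bucketing idea, and downsample is a hypothetical helper, not Thanos code.

```python
from collections import defaultdict


def downsample(samples, resolution=300):
    """Collapse raw (timestamp_seconds, value) samples into fixed windows.

    Each window keeps count, sum, min and max, so averages and extremes can
    still be answered at the coarser resolution.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % resolution].append(value)  # window start time
    return {
        start: {"count": len(vals), "sum": sum(vals), "min": min(vals), "max": max(vals)}
        for start, vals in sorted(buckets.items())
    }
```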

11. Prometheus Security:

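  • Scrape-side credentials: basic_auth and tls_config are set inside a scrape_configs job and control how Prometheus authenticates to the targets it scrapes. Protecting Prometheus's own UI and API is configured separately in a web configuration file passed with --web.config.file.
  • Basic Authentication (prefer password_file over an inline password outside local testing):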

    basic_auth:
      username: admin
      password: admin
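  • TLS/SSL Configuration (ca_file verifies the target's certificate; cert_file and key_file present a client certificate for mutual TLS):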

    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/prometheus.crt
      key_file: /etc/prometheus/certs/prometheus.key
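The basic_auth block makes Prometheus send an HTTP Authorization header containing base64(username:password), and tls_config corresponds to an ordinary client TLS context. This is a standard-library sketch for testing a protected target by hand; both function names are hypothetical, and the certificate paths are whatever your deployment uses.

```python
import base64
import ssl


def basic_auth_header(username, password):
    """Value of the Authorization header produced by a basic_auth block."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def client_tls_context(ca_file, cert_file, key_file):
    """Client TLS roughly equivalent to tls_config: verify the server, present a client cert."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```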

12. Troubleshooting Prometheus:

  • Common Issues:

    • High Cardinality Metrics: Too many unique time series can overwhelm Prometheus.
    • Slow Queries: Narrow label selectors, shorten range windows, and precompute expensive expressions with recording rules.
  • Debugging:

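    • Validate configuration and rule files with promtool, e.g. promtool check config prometheus.yml and promtool check rules rules.yml.
    • The Prometheus web UI (http://localhost:9090 by default) lets you run ad hoc queries, and its Status > Targets page shows each target's scrape health and last error.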
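Cardinality grows multiplicatively: the number of series a metric can produce is bounded by the product of the distinct values of each of its labels. The sketch below makes that arithmetic concrete; max_series is a hypothetical helper and the label counts are made up for illustration.

```python
from math import prod


def max_series(label_value_counts):
    """Upper bound on series for one metric: product of distinct values per label."""
    # A metric with no labels is exactly one series (prod of an empty iterable is 1).
    return prod(label_value_counts.values())


before = max_series({"method": 5, "status": 10, "instance": 100})
after = max_series({"method": 5, "status": 10, "instance": 100, "user_id": 10_000})
```

Adding a user_id label with 10,000 values to a metric bounded at 5,000 series raises the bound to 50,000,000, which is why unbounded identifiers do not belong in labels.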