1. Introduction:
- Prometheus is an open-source systems monitoring and alerting toolkit, particularly well-suited for monitoring dynamic, cloud-native environments such as Kubernetes. It uses a pull-based model to scrape metrics from configured endpoints.
2. Key Concepts:
- Metrics: Numeric measurements stored as time series, each identified by a metric name and a set of key-value labels.
- PromQL: Prometheus Query Language used to query the collected metrics.
- Exporters: Components that expose metrics in a format that Prometheus can scrape.
- Alertmanager: Deduplicates, groups, and routes alerts sent by Prometheus to receivers such as email.
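To make "a format that Prometheus can scrape" concrete, here is a minimal sketch of rendering one metric in the text exposition format. The helper name `render_metric` and the sample values are illustrative assumptions, and the sketch does not escape label values or help text.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Hypothetical helper:
    label values are not escaped, so avoid quotes, backslashes and newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative sample only; the numbers are made up.
    print(render_metric(
        "http_requests_total",
        "Total HTTP requests.",
        "counter",
        [({"method": "get", "code": "200"}, 1027)],
    ))
```

An exporter serves text like this over HTTP, and Prometheus turns each sample line into a point in a time series.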
3. Installation:
- Running Prometheus:
    wget https://github.com/prometheus/prometheus/releases/download/v2.30.0/prometheus-2.30.0.linux-amd64.tar.gz
    tar xvfz prometheus-*.tar.gz
    # The trailing slash matches only the extracted directory, not the tarball
    cd prometheus-*/
    ./prometheus --config.file=prometheus.yml
- Docker:
    docker run -p 9090:9090 prom/prometheus
4. Prometheus Configuration:
- Basic prometheus.yml Configuration:
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
- Adding Targets:
    # Add under the existing scrape_configs list in prometheus.yml
      - job_name: 'node_exporter'
        static_configs:
          - targets: ['localhost:9100']
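Values such as `scrape_interval: 15s` use Prometheus's duration syntax. As a rough illustration, the sketch below converts such strings to seconds. `parse_duration` is a hypothetical helper, not part of any Prometheus library, and unlike Prometheus it does not check that units appear from largest to smallest.

```python
import re

# Seconds per unit, following the unit suffixes Prometheus accepts in durations.
_UNIT_SECONDS = {
    "ms": 0.001, "s": 1, "m": 60, "h": 3600,
    "d": 86400, "w": 7 * 86400, "y": 365 * 86400,
}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text):
    """Convert a duration such as '15s' or '1h30m' to seconds."""
    pos, total = 0, 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


if __name__ == "__main__":
    print(parse_duration("15s"))    # 15.0
    print(parse_duration("1h30m"))  # 5400.0
```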
5. Prometheus Query Language (PromQL):
- Basic Queries:
    up
    rate(http_requests_total[5m])
- Aggregations:
    sum(rate(http_requests_total[5m]))
    avg_over_time(http_requests_total[5m])
- Recording Rules:
    groups:
      - name: example
        rules:
          - record: job:http_inprogress_requests:sum
            expr: sum(http_inprogress_requests) by (job)
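To show roughly what `rate(http_requests_total[5m])` computes, here is a simplified sketch over (timestamp, value) samples from one counter. It is an approximation under stated assumptions: the real PromQL `rate()` also extrapolates towards the edges of the range window, which this hypothetical `simple_rate` helper leaves out.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter from (timestamp, value) pairs.

    Simplification: handles counter resets but skips the window-edge
    extrapolation that PromQL's rate() performs.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value is treated as a counter reset (restart from zero).
        increase += cur if cur < prev else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # Three scrapes 15 seconds apart, counter growing by 30 each time.
    print(simple_rate([(0, 100), (15, 130), (30, 160)]))  # 2.0
```

Because resets are absorbed this way, `rate()` is meant for counters; gauges are usually queried directly or with functions such as `avg_over_time`.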
6. Exporters:
- Node Exporter: Collects system-level metrics.
    wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
    tar xvfz node_exporter-*.tar.gz
    cd node_exporter-*/
    ./node_exporter
- Custom Exporter: Writing a custom exporter using Python.
    from prometheus_client import start_http_server, Gauge
    import random
    import time

    # Gauge holding the most recently generated value
    g = Gauge('random_number', 'A random number')

    def generate_random_number():
        # Refresh the gauge every 5 seconds, forever
        while True:
            g.set(random.random())
            time.sleep(5)

    if __name__ == '__main__':
        start_http_server(8000)  # serve metrics over HTTP on port 8000
        generate_random_number()
7. Alerts and Alertmanager:
- Alerting Rules:
    groups:
      - name: example
        rules:
          - alert: HighMemoryUsage
            expr: node_memory_Active_bytes / node_memory_MemTotal_bytes * 100 > 90
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "High memory usage detected on {{ $labels.instance }}"
              description: "Memory usage is above 90% for more than 5 minutes."
- Alertmanager Configuration:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      receiver: 'email'
    receivers:
      - name: 'email'
        email_configs:
          - to: 'your-email@example.com'
            from: 'prometheus@example.com'
            smarthost: 'smtp.example.com:587'
            auth_username: 'username'
            auth_password: 'password'
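The `for: 5m` clause in the HighMemoryUsage rule means the expression must hold across consecutive evaluations for five minutes before the alert is sent to Alertmanager. The sketch below models that pending-to-firing transition. The function name `alert_states` and the 60-second evaluation interval are assumptions for illustration, not Prometheus code.

```python
def alert_states(condition_by_eval, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_by_eval: one boolean per evaluation, True when the alert
    expression returned a result. States: inactive, pending, firing.
    """
    states, active_since = [], None
    for i, is_true in enumerate(condition_by_eval):
        now = i * eval_interval_s
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held = now - active_since
        states.append("firing" if held >= for_s else "pending")
    return states


if __name__ == "__main__":
    # Condition true on seven evaluations, one per minute, with for: 5m.
    print(alert_states([True] * 7, eval_interval_s=60, for_s=300))
    # ['pending', 'pending', 'pending', 'pending', 'pending', 'firing', 'firing']
```

A brief dip below the threshold resets the timer, which is why `for` helps filter out short spikes.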
8. Prometheus Federation:
- Setting Up Federation:
    scrape_configs:
      - job_name: 'federate'
        honor_labels: true
        metrics_path: '/federate'
        params:
          'match[]':
            - '{job="prometheus"}'
        static_configs:
          - targets:
              - 'prometheus-server-1:9090'
              - 'prometheus-server-2:9090'
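With the configuration above, the federating server requests `/federate` on each target and passes every selector as a `match[]` query parameter. The sketch below builds such a URL so it can be tried by hand, for example with curl. `federate_url` is a hypothetical helper, and the `http` scheme is an assumption.

```python
from urllib.parse import urlencode


def federate_url(base_url, matchers):
    """Build a /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    print(federate_url("http://prometheus-server-1:9090", ['{job="prometheus"}']))
```

The selector is percent-encoded in the result, which is expected: braces and quotes are not safe in a raw query string.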
9. Monitoring Kubernetes with Prometheus:
- Deploying Prometheus on Kubernetes:
    # The Prometheus kind is a custom resource provided by the Prometheus Operator
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
    spec:
      replicas: 1
      serviceAccountName: prometheus
      serviceMonitorSelector:
        matchLabels:
          team: frontend
      resources:
        requests:
          memory: 400Mi
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: standard
            resources:
              requests:
                storage: 50Gi
- ServiceMonitor Example:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-monitor
      labels:
        team: frontend   # lets a serviceMonitorSelector on team: frontend pick this up
    spec:
      selector:
        matchLabels:
          app: example
      endpoints:
        - port: web
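Both `serviceMonitorSelector` and the ServiceMonitor's own `selector` rely on `matchLabels`, which selects objects carrying every listed key-value pair. A minimal sketch of that rule follows. The Service names and labels are hypothetical, and `matchExpressions` is not covered.

```python
def matches(selector_labels, object_labels):
    """True when every matchLabels pair is present on the object."""
    return all(object_labels.get(k) == v for k, v in selector_labels.items())


# Hypothetical Services and their labels.
services = {
    "example-svc": {"app": "example", "tier": "web"},
    "other-svc": {"app": "other"},
}

# Which Services would a selector of matchLabels {app: example} pick?
selected = [name for name, labels in services.items()
            if matches({"app": "example"}, labels)]

if __name__ == "__main__":
    print(selected)  # ['example-svc']
```

Extra labels on the object do not prevent a match, and an empty selector matches everything.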
10. Advanced Prometheus Concepts:
- Thanos: Extends Prometheus with long-term storage, global querying, and downsampling.
- Cortex: Multi-tenant, horizontally scalable Prometheus as a service.
11. Prometheus Security:
- Basic Authentication:
    basic_auth:
      username: admin
      password: admin
- TLS/SSL Configuration:
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      cert_file: /etc/prometheus/certs/prometheus.crt
      key_file: /etc/prometheus/certs/prometheus.key
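For the `basic_auth` settings above, the credentials travel as a standard HTTP `Authorization` header. The sketch below shows how that header value is formed. `basic_auth_header` is an illustrative helper, and `admin`/`admin` is only the placeholder pair from the example.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    print(basic_auth_header("admin", "admin"))  # Basic YWRtaW46YWRtaW4=
```

Base64 is an encoding, not encryption, so Basic authentication should be combined with TLS as in the `tls_config` block.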
12. Troubleshooting Prometheus:
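- Common Issues:
  - High Cardinality Metrics: Each unique label combination creates its own time series, so labels with unbounded values (user IDs, request paths) can exhaust Prometheus's memory.
  - Slow Queries: Narrow label matchers and range windows, and precompute expensive expressions with recording rules.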
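A quick way to reason about cardinality is to multiply the number of distinct values per label, which gives an upper bound on the series one metric can produce. The sketch below does that arithmetic. The label names and counts are made-up figures for illustration.

```python
from math import prod


def worst_case_series(label_cardinalities):
    """Upper bound on series for one metric: product of distinct label values."""
    return prod(label_cardinalities.values())


if __name__ == "__main__":
    labels = {"method": 5, "status": 10, "instance": 20}
    print(worst_case_series(labels))  # 1000

    # Adding one unbounded label multiplies the bound.
    labels["user_id"] = 10_000
    print(worst_case_series(labels))  # 10000000
```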
- Debugging:
  - Use the promtool command-line tool to check configuration files, for example: promtool check config prometheus.yml
  - The Prometheus UI provides an interface to debug queries and examine time series data.
