Performance Profiling Implementation

Overview

This document describes the performance profiling instrumentation added to FloatWM. The profiling system monitors window manager performance: frame timing, event processing latency, memory usage, and CPU utilization, with optional flamegraph generation.

Features

1. Frame Timing

  • Tracks each event loop iteration as a "frame"
  • Calculates FPS (frames per second) in real-time
  • Records min/max/average frame times
  • Provides p50, p95, p99 percentiles for frame time distribution
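
As an illustration of how the statistics above could be derived from recorded frame durations, here is a minimal sketch. The `FrameStats` struct, the `summarize` function, and the nearest-rank percentile method are assumptions made for this example; they are not taken from `profiling.rs`.

```rust
use std::time::Duration;

/// Summary of recorded frame times, in microseconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct FrameStats {
    pub min_us: f64,
    pub max_us: f64,
    pub avg_us: f64,
    pub p50_us: f64,
    pub p95_us: f64,
    pub p99_us: f64,
    pub avg_fps: f64,
}

/// Nearest-rank percentile over a sorted, non-empty slice.
/// The rank is ceil(p / 100 * n), computed with integer arithmetic.
fn percentile(sorted_us: &[f64], p: usize) -> f64 {
    let n = sorted_us.len();
    let rank = (p * n + 99) / 100;
    sorted_us[rank.max(1).min(n) - 1]
}

/// Computes min/max/average and p50/p95/p99; returns None for no frames.
pub fn summarize(frames: &[Duration]) -> Option<FrameStats> {
    if frames.is_empty() {
        return None;
    }
    let mut us: Vec<f64> = frames.iter().map(|d| d.as_micros() as f64).collect();
    us.sort_by(|a, b| a.total_cmp(b));
    let avg_us = us.iter().sum::<f64>() / us.len() as f64;
    Some(FrameStats {
        min_us: us[0],
        max_us: us[us.len() - 1],
        avg_us,
        p50_us: percentile(&us, 50),
        p95_us: percentile(&us, 95),
        p99_us: percentile(&us, 99),
        // Average FPS is the reciprocal of the average frame time.
        avg_fps: if avg_us > 0.0 { 1e6 / avg_us } else { 0.0 },
    })
}
```

Sorting a copy of the samples keeps the sketch simple; a real profiler that summarizes often might prefer a histogram or a streaming estimator instead.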

2. Event Processing Latency

  • Measures time from event receipt to dispatch completion
  • Tracks latency per event type (KeyPress, MotionNotify, etc.)
  • Provides statistics on total events processed
  • Identifies slow event handlers
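
Per-event-type tracking of this kind can be sketched with a map keyed by event name. `LatencyTracker`, `LatencyStats`, and `slowest` are hypothetical names for this example, and "slow handler" is interpreted here as the event type with the highest average latency, which is an assumption.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Running latency statistics for one event type (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct LatencyStats {
    pub count: u64,
    pub total: Duration,
    pub min: Duration,
    pub max: Duration,
}

impl LatencyStats {
    /// Average latency in microseconds.
    pub fn avg_us(&self) -> f64 {
        self.total.as_micros() as f64 / self.count as f64
    }
}

/// Accumulates latency per event type, e.g. "KeyPress" or "MotionNotify".
#[derive(Default)]
pub struct LatencyTracker {
    pub by_type: HashMap<String, LatencyStats>,
}

impl LatencyTracker {
    /// Records one measurement from event receipt to dispatch completion.
    pub fn record(&mut self, event_type: &str, latency: Duration) {
        let s = self
            .by_type
            .entry(event_type.to_string())
            .or_insert(LatencyStats {
                count: 0,
                total: Duration::ZERO,
                min: Duration::MAX,
                max: Duration::ZERO,
            });
        s.count += 1;
        s.total += latency;
        s.min = s.min.min(latency);
        s.max = s.max.max(latency);
    }

    /// Returns the event type with the highest average latency, if any.
    pub fn slowest(&self) -> Option<(&str, f64)> {
        self.by_type
            .iter()
            .map(|(name, s)| (name.as_str(), s.avg_us()))
            .max_by(|a, b| a.1.total_cmp(&b.1))
    }
}
```

Keeping only running totals means memory use stays constant per event type, at the cost of not being able to compute percentiles for latency.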

3. Memory Usage Tracking

  • Monitors RSS (Resident Set Size) memory usage
  • Tracks VMS (Virtual Memory Size)
  • Records peak memory usage
  • Calculates memory usage trends
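
On Linux, RSS, VMS, and peak RSS are exposed as the `VmRSS`, `VmSize`, and `VmHWM` fields of `/proc/self/status`, reported in kB. The sketch below parses that text; whether FloatWM reads this particular file is an assumption, and `MemorySample`, `parse_status`, and `sample_self` are names invented for the example.

```rust
/// Memory figures in bytes (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct MemorySample {
    pub rss_bytes: u64,
    pub vms_bytes: u64,
    pub peak_rss_bytes: u64,
}

/// Parses the value part of a line such as "VmRSS:     51200 kB" into bytes.
fn parse_kb(rest: &str) -> Option<u64> {
    rest.split_whitespace()
        .next()?
        .parse::<u64>()
        .ok()
        .map(|kb| kb * 1024)
}

/// Extracts memory fields from the text of /proc/<pid>/status.
/// Fields that are missing or malformed are left at zero.
pub fn parse_status(text: &str) -> MemorySample {
    let mut sample = MemorySample::default();
    for line in text.lines() {
        if let Some((key, rest)) = line.split_once(':') {
            match key {
                "VmRSS" => sample.rss_bytes = parse_kb(rest).unwrap_or(0),
                "VmSize" => sample.vms_bytes = parse_kb(rest).unwrap_or(0),
                "VmHWM" => sample.peak_rss_bytes = parse_kb(rest).unwrap_or(0),
                _ => {}
            }
        }
    }
    sample
}

/// Samples the current process (Linux only).
pub fn sample_self() -> std::io::Result<MemorySample> {
    Ok(parse_status(&std::fs::read_to_string("/proc/self/status")?))
}
```

Separating parsing from file access keeps the parser testable with fixed input on any platform.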

4. CPU Utilization

  • Measures current CPU percentage
  • Tracks average and peak CPU usage
  • Monitors context switches (voluntary/involuntary)
  • Tracks total CPU time consumed
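
A CPU percentage is typically the CPU time consumed between two samples divided by the wall-clock time between them. The sketch below shows that calculation along with running average and peak values. `CpuTracker` is a hypothetical type, and where the cumulative CPU time comes from (for example `utime + stime` in `/proc/self/stat`) is left to the caller.

```rust
use std::time::Duration;

/// Derives CPU utilization from cumulative CPU time samples (hypothetical type).
#[derive(Debug)]
pub struct CpuTracker {
    last_cpu: Duration,
    last_wall: Duration,
    samples: u64,
    sum_percent: f64,
    pub current_percent: f64,
    pub peak_percent: f64,
}

impl CpuTracker {
    /// Starts tracking from an initial (cpu_time, wall_time) reading.
    pub fn new(cpu_time: Duration, wall: Duration) -> Self {
        CpuTracker {
            last_cpu: cpu_time,
            last_wall: wall,
            samples: 0,
            sum_percent: 0.0,
            current_percent: 0.0,
            peak_percent: 0.0,
        }
    }

    /// Feeds a new cumulative reading and returns the CPU percentage
    /// for the interval since the previous reading.
    pub fn update(&mut self, cpu_time: Duration, wall: Duration) -> f64 {
        let d_cpu = cpu_time.saturating_sub(self.last_cpu);
        let d_wall = wall.saturating_sub(self.last_wall);
        let percent = if d_wall.is_zero() {
            0.0
        } else {
            100.0 * d_cpu.as_secs_f64() / d_wall.as_secs_f64()
        };
        self.last_cpu = cpu_time;
        self.last_wall = wall;
        self.samples += 1;
        self.sum_percent += percent;
        self.current_percent = percent;
        self.peak_percent = self.peak_percent.max(percent);
        percent
    }

    /// Mean of all interval percentages seen so far.
    pub fn avg_percent(&self) -> f64 {
        if self.samples == 0 {
            0.0
        } else {
            self.sum_percent / self.samples as f64
        }
    }
}
```

With this definition a process that uses several threads can report more than 100 percent, since CPU time is summed across cores.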

5. Flamegraph Generation (Optional)

  • Generates hierarchical call traces
  • Creates JSON output for visualization
  • Tracks scope entry/exit times
  • Shows call stack depth
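
One way to build such a hierarchy is a stack of open scopes: entering pushes a node, exiting pops it and attaches it to its parent. The sketch below assumes this approach. `Node` and `ScopeRecorder` are illustrative names (the real structure is `FlameGraphNode`, whose fields may differ), and timestamps are passed in explicitly so the behaviour is deterministic.

```rust
/// One recorded scope (field names chosen to mirror the JSON output).
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub name: String,
    pub start_us: u64,
    pub duration_us: u64,
    pub depth: usize,
    pub children: Vec<Node>,
}

/// Builds a call tree from paired enter/exit calls (hypothetical type).
#[derive(Default)]
pub struct ScopeRecorder {
    stack: Vec<Node>,
    roots: Vec<Node>,
}

impl ScopeRecorder {
    /// Opens a scope; its depth is the number of scopes currently open.
    pub fn enter(&mut self, name: &str, now_us: u64) {
        let depth = self.stack.len();
        self.stack.push(Node {
            name: name.to_string(),
            start_us: now_us,
            duration_us: 0,
            depth,
            children: Vec::new(),
        });
    }

    /// Closes the innermost open scope and attaches it to its parent,
    /// or to the list of roots if no scope remains open.
    /// An unmatched exit is ignored.
    pub fn exit(&mut self, now_us: u64) {
        if let Some(mut node) = self.stack.pop() {
            node.duration_us = now_us.saturating_sub(node.start_us);
            match self.stack.last_mut() {
                Some(parent) => parent.children.push(node),
                None => self.roots.push(node),
            }
        }
    }

    /// Completed top-level scopes.
    pub fn roots(&self) -> &[Node] {
        &self.roots
    }
}
```

Calling `enter("event_loop_iteration", 0)`, `enter("event_KeyPress", 1000)`, `exit(1150)`, `exit(16667)` yields one root with a single child of duration 150 at depth 1.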

Implementation Details

Architecture

The profiling system consists of:

  1. profiling.rs - Core profiling module

    • PerformanceProfiler: Main profiler struct
    • ProfilingConfig: Configuration options
    • PerformanceMetrics: Collected metrics data structures
    • FlameGraphNode: Flame graph data structure
  2. event.rs - Event loop integration

    • Frame timing measurement in event loop
    • Event latency tracking in dispatch
    • Periodic memory/CPU sampling
    • Automatic export on shutdown

Configuration Options

pub struct ProfilingConfig {
    /// Enable profiling
    pub enabled: bool,

    /// Sample interval for metrics collection (in milliseconds)
    pub sample_interval_ms: u64,

    /// Maximum number of samples to keep in memory
    pub max_samples: usize,

    /// Enable flamegraph generation
    pub enable_flamegraph: bool,

    /// Output directory for profiling data
    pub output_dir: String,

    /// Enable frame timing tracking
    pub track_frame_timing: bool,

    /// Enable event latency tracking
    pub track_event_latency: bool,

    /// Enable memory usage tracking
    pub track_memory_usage: bool,

    /// Enable CPU utilization tracking
    pub track_cpu_usage: bool,
}
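
To illustrate what a `max_samples` limit could mean in practice, the sketch below keeps a bounded buffer that discards the oldest sample once the limit is reached. `SampleBuffer` is a hypothetical type, and drop-oldest eviction is an assumption; the actual profiler may bound its samples differently.

```rust
use std::collections::VecDeque;

/// Fixed-capacity sample store that evicts the oldest entry when full.
pub struct SampleBuffer<T> {
    samples: VecDeque<T>,
    max_samples: usize,
}

impl<T> SampleBuffer<T> {
    /// Creates an empty buffer holding at most `max_samples` entries.
    pub fn new(max_samples: usize) -> Self {
        SampleBuffer {
            samples: VecDeque::new(),
            max_samples,
        }
    }

    /// Adds a sample, dropping the oldest one if the buffer is full.
    /// A limit of zero stores nothing.
    pub fn push(&mut self, sample: T) {
        if self.max_samples == 0 {
            return;
        }
        if self.samples.len() == self.max_samples {
            self.samples.pop_front();
        }
        self.samples.push_back(sample);
    }

    /// Number of samples currently stored.
    pub fn len(&self) -> usize {
        self.samples.len()
    }

    /// Iterates from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.samples.iter()
    }
}
```

A bound like this keeps memory use flat during long sessions, at the cost of statistics that only reflect the most recent window.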

Usage Example

use floatwm::profiling::{PerformanceProfiler, ProfilingConfig};

// Create profiler with custom configuration
let config = ProfilingConfig {
    enabled: true,
    enable_flamegraph: true,
    output_dir: "/tmp/floatwm_profiling".to_string(),
    ..Default::default()
};

let profiler = PerformanceProfiler::new(config);

// Set profiler on event loop
event_loop.set_profiler(profiler);

// The profiler automatically collects metrics during event loop execution
// On shutdown, metrics are exported to JSON files

Programmatic Profiling

For custom profiling scopes:

// Enter a profiling scope
profiler.enter_scope("my_function");

// ... do work ...

// Exit the scope
profiler.exit_scope();
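
Manual `enter_scope`/`exit_scope` pairs are easy to unbalance when a function returns early. A guard that exits the scope on drop is one common way around that. The sketch below is not part of FloatWM: `ScopeProfiler` is a stand-in trait for the two methods shown above, and `ScopeGuard`, `RecordingProfiler`, and `handle` are hypothetical.

```rust
/// Stand-in for the scope methods of the profiler (hypothetical trait).
pub trait ScopeProfiler {
    fn enter_scope(&mut self, name: &str);
    fn exit_scope(&mut self);
}

/// Enters a scope on creation and exits it when dropped.
pub struct ScopeGuard<'a, P: ScopeProfiler> {
    profiler: &'a mut P,
}

impl<'a, P: ScopeProfiler> ScopeGuard<'a, P> {
    pub fn new(profiler: &'a mut P, name: &str) -> Self {
        profiler.enter_scope(name);
        ScopeGuard { profiler }
    }
}

impl<'a, P: ScopeProfiler> Drop for ScopeGuard<'a, P> {
    fn drop(&mut self) {
        self.profiler.exit_scope();
    }
}

/// Minimal profiler used only to demonstrate the guard.
#[derive(Default)]
pub struct RecordingProfiler {
    pub log: Vec<String>,
}

impl ScopeProfiler for RecordingProfiler {
    fn enter_scope(&mut self, name: &str) {
        self.log.push(format!("enter {}", name));
    }
    fn exit_scope(&mut self) {
        self.log.push("exit".to_string());
    }
}

/// The scope is closed on both the early-return and the normal path.
pub fn handle(profiler: &mut RecordingProfiler, ok: bool) -> Result<(), String> {
    let _guard = ScopeGuard::new(profiler, "my_function");
    if !ok {
        return Err("failed early".to_string());
    }
    Ok(())
}
```

The trade-off is that the guard holds a mutable borrow of the profiler for the whole scope, so nested scopes would need to go through the guard or use interior mutability.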

Record custom metrics:

use std::time::Duration;

// Record frame time
profiler.record_frame_time(Duration::from_millis(16));

// Record event latency
profiler.record_event_latency("CustomEvent", Duration::from_micros(100));

// Sample memory usage (`?` requires an enclosing function that returns a Result)
profiler.sample_memory_usage()?;

// Sample CPU usage
profiler.sample_cpu_usage()?;

Export data:

// Export metrics to JSON
profiler.export_metrics("/tmp/metrics.json")?;

// Generate flamegraph
profiler.generate_flamegraph("/tmp/flamegraph.json")?;

Output Format

Metrics JSON

{
  "frame_timing": {
    "frame_count": 1000,
    "avg_frame_time_us": 16666.67,
    "min_frame_time_us": 15000.0,
    "max_frame_time_us": 25000.0,
    "current_fps": 60.0,
    "avg_fps": 59.8,
    "percentiles": {
      "p50_us": 16500.0,
      "p95_us": 18000.0,
      "p99_us": 22000.0
    }
  },
  "event_latency": {
    "total_events": 5000,
    "avg_latency_us": 150.5,
    "min_latency_us": 50.0,
    "max_latency_us": 5000.0,
    "by_event_type": {
      "KeyPress": {
        "count": 100,
        "avg_us": 120.0,
        "min_us": 80.0,
        "max_us": 200.0
      }
    }
  },
  "memory_usage": {
    "rss_bytes": 52428800,
    "peak_rss_bytes": 67108864,
    "vms_bytes": 104857600,
    "page_faults": 0,
    "memory_trend": 1024.0
  },
  "cpu_usage": {
    "current_percent": 2.5,
    "avg_percent": 2.1,
    "peak_percent": 8.3,
    "total_cpu_time": 12.5,
    "voluntary_context_switches": 1000,
    "involuntary_context_switches": 10
  }
}

Flamegraph JSON

[
  {
    "name": "event_loop_iteration",
    "start_us": 0,
    "duration_us": 16667,
    "depth": 0,
    "children": [
      {
        "name": "event_KeyPress",
        "start_us": 1000,
        "duration_us": 150,
        "depth": 1,
        "children": []
      }
    ]
  }
]

Performance Considerations

Overhead

The profiling system is designed to have minimal overhead:

  • When disabled: no performance impact (profiling is toggled through the enabled option)
  • When enabled:
    • Frame timing: ~1-2 microseconds per frame
    • Event latency: ~0.5 microseconds per event
    • Memory sampling: ~1 millisecond per sample (via /proc filesystem)
    • CPU sampling: ~1 millisecond per sample (via /proc filesystem)

Sampling Strategy

  • Frame timing: Every frame
  • Event latency: Every event
  • Memory/CPU: Every 100 frames (configurable)
  • Summary printing: Every 1000 frames (configurable)
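
The frame-count based schedule above can be sketched as a counter checked once per frame. `SamplingSchedule` and `FrameActions` are hypothetical names; the intervals of 100 and 1000 frames come from the list above.

```rust
/// What the event loop should do after the current frame (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct FrameActions {
    pub sample_resources: bool,
    pub print_summary: bool,
}

/// Decides, per frame, whether to sample memory/CPU or print a summary.
pub struct SamplingSchedule {
    sample_every: u64,
    summary_every: u64,
    frame: u64,
}

impl SamplingSchedule {
    /// An interval of zero disables the corresponding action.
    pub fn new(sample_every: u64, summary_every: u64) -> Self {
        SamplingSchedule {
            sample_every,
            summary_every,
            frame: 0,
        }
    }

    /// Advances the frame counter and reports which actions are due.
    pub fn tick(&mut self) -> FrameActions {
        self.frame += 1;
        FrameActions {
            sample_resources: self.sample_every != 0 && self.frame % self.sample_every == 0,
            print_summary: self.summary_every != 0 && self.frame % self.summary_every == 0,
        }
    }
}
```

Because the roughly millisecond-scale /proc reads happen only on every 100th frame, their cost averages out to about 10 microseconds per frame under the figures quoted in the Overhead section.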

Files Modified

  1. src/profiling.rs (NEW)

    • Core profiling implementation
    • ~1000 lines of code
  2. src/event.rs (MODIFIED)

    • Integrated profiler into event loop
    • Added frame timing and event latency tracking
    • Added profiler accessor methods
  3. src/lib.rs (MODIFIED)

    • Added profiling module exports

Testing

The profiling system includes unit tests for:

  • Profiler creation and configuration
  • Frame timing recording
  • Event latency tracking
  • Flamegraph scope management
  • Metrics reset functionality

Future Enhancements

Possible future improvements:

  1. Real-time Visualization

    • Built-in web server for live metrics
    • WebSocket streaming of metrics
    • Interactive flamegraph viewer
  2. Advanced Analysis

    • Bottleneck detection
    • Anomaly detection
    • Performance regression alerts
  3. Integration

    • Integration with external profilers (perf, flamegraph.rs)
    • Support for Chrome tracing format
    • Export to Prometheus metrics
  4. Configurable Triggers

    • Automatic profiling on high load
    • Trigger profiling from IPC commands
    • Time-based profiling windows

Configuration via Environment Variables

Environment variable configuration is not yet implemented; the planned variables are:

  • FLOATWM_PROFILING_ENABLED: Enable/disable profiling
  • FLOATWM_PROFILING_OUTPUT_DIR: Output directory for profiling data
  • FLOATWM_PROFILING_FLAMEGRAPH: Enable flamegraph generation
  • FLOATWM_PROFILING_SAMPLE_INTERVAL: Sampling interval in milliseconds
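
Since this feature is not implemented yet, the following is only a sketch of how these variables might be read. `EnvOverrides`, `overrides_from`, and the accepted boolean spellings are assumptions; only the variable names come from the list above. The lookup is passed in as a closure so the parsing can be exercised without touching the real process environment.

```rust
/// Values read from the environment; None means "not set or not parseable".
#[derive(Debug, Default, PartialEq)]
pub struct EnvOverrides {
    pub enabled: Option<bool>,
    pub output_dir: Option<String>,
    pub enable_flamegraph: Option<bool>,
    pub sample_interval_ms: Option<u64>,
}

/// Accepts a few common boolean spellings (an assumption of this sketch).
fn parse_bool(value: &str) -> Option<bool> {
    match value.trim().to_ascii_lowercase().as_str() {
        "1" | "true" | "yes" | "on" => Some(true),
        "0" | "false" | "no" | "off" => Some(false),
        _ => None,
    }
}

/// Builds overrides from any key-to-value lookup function.
pub fn overrides_from<F: Fn(&str) -> Option<String>>(lookup: F) -> EnvOverrides {
    EnvOverrides {
        enabled: lookup("FLOATWM_PROFILING_ENABLED").and_then(|v| parse_bool(&v)),
        output_dir: lookup("FLOATWM_PROFILING_OUTPUT_DIR"),
        enable_flamegraph: lookup("FLOATWM_PROFILING_FLAMEGRAPH").and_then(|v| parse_bool(&v)),
        sample_interval_ms: lookup("FLOATWM_PROFILING_SAMPLE_INTERVAL")
            .and_then(|v| v.trim().parse().ok()),
    }
}

/// Reads the overrides from the real process environment.
pub fn overrides_from_env() -> EnvOverrides {
    overrides_from(|key| std::env::var(key).ok())
}
```

Each `Some` value would then replace the matching `ProfilingConfig` field, leaving unset variables to fall back to the configured defaults.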

License

This profiling system is part of FloatWM and follows the same license (MIT OR Apache-2.0).