Problem or Use Case
The ConfigCache already tracks hits and misses internally via CacheStats, but these metrics are only available programmatically through get_cache_stats(). There is no integration with CloudWatch, so operators cannot:
Monitor cache hit rates in production dashboards
Measure cold-cache vs warm-cache latency (p50/p95/p99)
Set alarms on cache degradation (e.g., hit rate drops below threshold)
Validate the effectiveness of the BatchGetItem optimization (#298) in production
Split from #298 to keep the BatchGetItem optimization focused on DynamoDB access patterns while this issue addresses observability.
Proposed Solution
Custom CloudWatch Metrics
Publish metrics from ConfigCache to CloudWatch using the existing boto3/aioboto3 clients:
| Metric Name | Unit | Description |
| --- | --- | --- |
| ConfigCache/HitRate | Percent | hits / (hits + misses) over the reporting period |
| ConfigCache/Hits | Count | Cache hits since last publish |
| ConfigCache/Misses | Count | Cache misses since last publish |
| ConfigCache/ColdCacheLatency | Milliseconds | Latency on cache miss (includes DynamoDB round trip) |
| ConfigCache/WarmCacheLatency | Milliseconds | Latency on cache hit (in-memory lookup) |
| ConfigCache/Size | Count | Number of cached entries |
Dimensions: StackName (required), Resource (optional, for per-resource breakdown)
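As a rough illustration of how the metrics and dimensions above could map onto a CloudWatch payload, here is a minimal sketch. The function name build_metric_data, its parameters, and the max_values_per_datum default are assumptions made for this example and are not part of the existing code base. The namespace is the one named in the acceptance criteria, and the short metric names (without the ConfigCache/ prefix) assume that the prefix lives in the namespace.

```python
from typing import Any, Optional

# Namespace taken from the acceptance criteria of this issue.
NAMESPACE = "ZaeLimiter/ConfigCache"


def build_metric_data(
    stack_name: str,
    hits: int,
    misses: int,
    size: int,
    cold_latencies_ms: list[float],
    warm_latencies_ms: list[float],
    resource: Optional[str] = None,
    max_values_per_datum: int = 150,  # assumed per-datum cap; adjust to the service limit
) -> list[dict[str, Any]]:
    """Hypothetical helper: build a MetricData list for put_metric_data."""
    base_dims = [{"Name": "StackName", "Value": stack_name}]
    # Only latency metrics carry the optional Resource dimension.
    latency_dims = list(base_dims)
    if resource is not None:
        latency_dims.append({"Name": "Resource", "Value": resource})

    data: list[dict[str, Any]] = [
        {"MetricName": "Hits", "Dimensions": base_dims, "Value": float(hits), "Unit": "Count"},
        {"MetricName": "Misses", "Dimensions": base_dims, "Value": float(misses), "Unit": "Count"},
        {"MetricName": "Size", "Dimensions": base_dims, "Value": float(size), "Unit": "Count"},
    ]

    total = hits + misses
    if total > 0:
        # Skip HitRate when there was no traffic, rather than reporting 0%.
        data.append(
            {
                "MetricName": "HitRate",
                "Dimensions": base_dims,
                "Value": 100.0 * hits / total,
                "Unit": "Percent",
            }
        )

    for name, samples in (
        ("ColdCacheLatency", cold_latencies_ms),
        ("WarmCacheLatency", warm_latencies_ms),
    ):
        # Publish raw samples in chunks via the Values array.
        for i in range(0, len(samples), max_values_per_datum):
            data.append(
                {
                    "MetricName": name,
                    "Dimensions": latency_dims,
                    "Values": samples[i : i + max_values_per_datum],
                    "Unit": "Milliseconds",
                }
            )
    return data
```

The returned list could then be passed to the CloudWatch client, for example boto3.client("cloudwatch").put_metric_data(Namespace=NAMESPACE, MetricData=data). Omitting HitRate for an idle period is a design choice of this sketch, so that an idle cache does not trip a low-hit-rate alarm.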
Integration Points
CacheStats already tracks hits/misses - extend with latency tracking (start/stop timers around fetch_fn calls)
Periodic publishing - batch metrics and publish at configurable intervals (e.g., every 60s) to avoid per-request CloudWatch API calls
Opt-in - disabled by default, enabled via enable_metrics=True on RateLimiter or ConfigCache
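The periodic, opt-in publishing described above could look roughly like the sketch below. PeriodicMetrics, Snapshot, and the publish callback are hypothetical names used only for illustration; the real integration would hang off CacheStats and the existing CloudWatch client. The idea is to count in memory, then snapshot and reset on a timer so that no request path ever calls CloudWatch directly.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Snapshot:
    hits: int
    misses: int


class PeriodicMetrics:
    """Hypothetical helper: counts cache events and flushes them on a timer."""

    def __init__(
        self,
        publish: Callable[[Snapshot], Awaitable[None]],
        interval_s: float = 60.0,
        enable_metrics: bool = False,  # opt-in, disabled by default
    ) -> None:
        self._publish = publish
        self._interval_s = interval_s
        self._enabled = enable_metrics
        self._hits = 0
        self._misses = 0
        self._task: Optional[asyncio.Task] = None

    def record(self, hit: bool) -> None:
        # No-op when metrics are disabled, so callers need no branching.
        if not self._enabled:
            return
        if hit:
            self._hits += 1
        else:
            self._misses += 1

    async def flush(self) -> Optional[Snapshot]:
        """Snapshot and reset the counters, then publish the snapshot."""
        if not self._enabled:
            return None
        snap = Snapshot(self._hits, self._misses)
        self._hits = 0
        self._misses = 0
        await self._publish(snap)
        return snap

    async def _run(self) -> None:
        while True:
            await asyncio.sleep(self._interval_s)
            await self.flush()

    def start(self) -> None:
        # Must be called from within a running event loop.
        if self._enabled and self._task is None:
            self._task = asyncio.get_running_loop().create_task(self._run())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None
        # Final flush so counts since the last interval are not lost.
        await self.flush()
```

Resetting the counters before awaiting the publish call means that events recorded while the publish is in flight land in the next reporting period instead of being dropped.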
Latency Tracking
# On cache miss: measure fetch_fn latency
start = time.monotonic()
value = await fetch_fn()
elapsed_ms = (time.monotonic() - start) * 1000
self._cold_latencies.append(elapsed_ms)
# On cache hit: measure lookup latency
start = time.monotonic()
value = entry.value  # in-memory
elapsed_ms = (time.monotonic() - start) * 1000
self._warm_latencies.append(elapsed_ms)
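Once latency samples are collected in lists like the ones above, p50/p95/p99 can be derived locally, for example in unit tests or when exposing them through get_cache_stats(). The following is a minimal sketch using the nearest-rank method; percentile and summarize are hypothetical helper names, and the choice of nearest-rank (rather than interpolation) is an assumption of this example.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples recorded")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Rank is 1-based; ceil keeps it within [1, len(ordered)] for valid p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Return the p50/p95/p99 values for a list of latency samples (ms)."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

Nearest-rank always returns a value that was actually observed, which keeps test assertions exact. For long-running processes the sample lists would likely need a bound (for instance, clearing them on each publish) so they do not grow without limit.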
Acceptance Criteria
CacheStats extended with cold_latency_ms and warm_latency_ms lists for percentile calculation
ConfigCacheMetrics class (or equivalent) publishes custom metrics to CloudWatch namespace ZaeLimiter/ConfigCache
[...] HitRate, Hits, Misses, ColdCacheLatency, WarmCacheLatency, and Size
[...] StackName dimension; latency metrics support optional Resource dimension
[...] RateLimiter constructor parameter
[...] tests/unit/ verify metric values match CacheStats counters
[...] generate_sync.py if new async source files are added
Alternatives Considered
EMF (Embedded Metrics Format) via Lambda Powertools: Only works inside Lambda. ConfigCache runs in the application process, not the aggregator Lambda.
Expose Prometheus endpoint: Adds a dependency and requires a metrics scraper. CloudWatch is already available in the AWS environment.
Log-based metrics (CloudWatch Logs Insights): Requires structured logging and post-hoc queries. Custom metrics provide real-time dashboards and alarms.
Dependencies