
✨ CloudWatch metrics for ConfigCache (hit rate, cold/warm latency) #322

@sodre

Description


Problem or Use Case

The ConfigCache already tracks hits and misses internally via CacheStats, but these metrics are only available programmatically through get_cache_stats(). There is no integration with CloudWatch, so operators cannot:

  • Monitor cache hit rates in production dashboards
  • Measure cold-cache vs warm-cache latency (p50/p95/p99)
  • Set alarms on cache degradation (e.g., hit rate drops below threshold)
  • Validate the effectiveness of the BatchGetItem optimization (#298) in production

Split from #298 to keep the BatchGetItem optimization focused on DynamoDB access patterns while this issue addresses observability.

Proposed Solution

Custom CloudWatch Metrics

Publish metrics from ConfigCache to CloudWatch using the existing boto3/aioboto3 clients:

| Metric Name | Unit | Description |
|---|---|---|
| ConfigCache/HitRate | Percent | hits / (hits + misses) over the reporting period |
| ConfigCache/Hits | Count | Cache hits since last publish |
| ConfigCache/Misses | Count | Cache misses since last publish |
| ConfigCache/ColdCacheLatency | Milliseconds | Latency on cache miss (includes DynamoDB round trip) |
| ConfigCache/WarmCacheLatency | Milliseconds | Latency on cache hit (in-memory lookup) |
| ConfigCache/Size | Count | Number of cached entries |

Dimensions: StackName (required), Resource (optional, for per-resource breakdown)
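As a sketch of how the table above could map onto CloudWatch's `put_metric_data` payload shape, the helper below builds `MetricData` entries with the `StackName` and optional `Resource` dimensions. The `build_metric_data` name and exact structure are illustrative, not the final API:

```python
from typing import Optional


def build_metric_data(hits: int, misses: int, stack_name: str,
                      resource: Optional[str] = None) -> list[dict]:
    """Turn raw hit/miss counters into CloudWatch MetricData entries."""
    dimensions = [{"Name": "StackName", "Value": stack_name}]
    if resource is not None:
        # Optional per-resource breakdown
        dimensions.append({"Name": "Resource", "Value": resource})
    total = hits + misses
    hit_rate = (hits / total * 100.0) if total else 0.0
    return [
        {"MetricName": "Hits", "Unit": "Count", "Value": hits,
         "Dimensions": dimensions},
        {"MetricName": "Misses", "Unit": "Count", "Value": misses,
         "Dimensions": dimensions},
        {"MetricName": "HitRate", "Unit": "Percent", "Value": hit_rate,
         "Dimensions": dimensions},
    ]
```

The resulting list would be passed to `cloudwatch.put_metric_data(Namespace="ZaeLimiter/ConfigCache", MetricData=...)`.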

Integration Points

  1. CacheStats already tracks hits and misses; extend it with latency tracking (start/stop timers around fetch_fn calls)
  2. Periodic publishing: batch metrics and publish at a configurable interval (e.g., every 60s) to avoid per-request CloudWatch API calls
  3. Opt-in: disabled by default, enabled via enable_metrics=True on RateLimiter or ConfigCache

Latency Tracking

```python
import time

# On cache miss: measure fetch_fn latency (includes the DynamoDB round trip)
start = time.monotonic()
value = await fetch_fn()
elapsed_ms = (time.monotonic() - start) * 1000
self._cold_latencies.append(elapsed_ms)

# On cache hit: measure the in-memory lookup latency
start = time.monotonic()
value = entry.value
elapsed_ms = (time.monotonic() - start) * 1000
self._warm_latencies.append(elapsed_ms)
```
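The recorded latency lists can then be reduced to p50/p95/p99 client-side with the standard library, e.g. (a sketch; the `latency_percentiles` helper is illustrative, and CloudWatch can alternatively compute percentiles server-side if raw values are published):

```python
import statistics


def latency_percentiles(samples: list[float]) -> dict[str, float]:
    """Compute p50/p95/p99 from recorded latency samples (ms)."""
    if not samples:
        return {}
    # quantiles with n=100 yields the 1st..99th percentile cut points
    q = statistics.quantiles(samples, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}
```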

Acceptance Criteria

  • CacheStats extended with cold_latency_ms and warm_latency_ms lists for percentile calculation
  • New ConfigCacheMetrics class (or equivalent) publishes custom metrics to CloudWatch namespace ZaeLimiter/ConfigCache
  • Metrics include HitRate, Hits, Misses, ColdCacheLatency, WarmCacheLatency, and Size
  • All metrics tagged with StackName dimension; latency metrics support optional Resource dimension
  • Metrics publishing is opt-in (disabled by default), enabled via a RateLimiter constructor parameter
  • Metrics are batched and published periodically (not per-request) to minimize CloudWatch API costs
  • Unit tests in tests/unit/ verify metric values match CacheStats counters
  • Unit tests verify latency is recorded on cache hit and cache miss code paths
  • Sync variant generated via generate_sync.py if new async source files are added
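To illustrate the opt-in criterion, a hypothetical ConfigCacheMetrics could no-op unless explicitly enabled; here `published` stands in for real `put_metric_data` calls so the behavior is testable offline:

```python
class ConfigCacheMetrics:
    """Illustrative opt-in wrapper: publishing is a no-op unless enabled."""

    NAMESPACE = "ZaeLimiter/ConfigCache"

    def __init__(self, enabled: bool = False, stack_name: str = ""):
        self.enabled = enabled
        self.stack_name = stack_name
        self.published: list[dict] = []  # stand-in for put_metric_data calls

    def publish(self, name: str, value: float, unit: str) -> None:
        if not self.enabled:
            return  # disabled by default: zero CloudWatch traffic
        self.published.append({
            "Namespace": self.NAMESPACE,
            "MetricName": name,
            "Value": value,
            "Unit": unit,
            "Dimensions": [{"Name": "StackName", "Value": self.stack_name}],
        })
```

RateLimiter would construct this with `enabled=enable_metrics`, so existing deployments see no behavior change.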

Alternatives Considered

  1. EMF (Embedded Metrics Format) via Lambda Powertools: Only works inside Lambda. ConfigCache runs in the application process, not the aggregator Lambda.

  2. Expose Prometheus endpoint: Adds a dependency and requires a metrics scraper. CloudWatch is already available in the AWS environment.

  3. Log-based metrics (CloudWatch Logs Insights): Requires structured logging and post-hoc queries. Custom metrics provide real-time dashboards and alarms.

Dependencies
