
event-exporter-v2: shared-nothing HA re-architecture #1141

Closed

erain wants to merge 1 commit into GoogleCloudPlatform:master from erain:event-exporter-v2-shared-nothing-ha

Conversation


@erain erain commented Mar 26, 2026

Summary

This PR re-architects the event exporter as a new standalone module (event-exporter-v2/) for horizontal scalability and high availability. The original event-exporter/ is untouched.

  • Shared-nothing consistent hashing — each pod watches all events, only sends events it "owns" via xxhash(namespace/name) % N
  • Peer discovery via headless Service Endpoints informer — no leader election, no gRPC, no external dependencies
  • Deterministic InsertId for Cloud Logging deduplication during scaling transitions
  • Tuned sink defaults — buffer 500 (was 100), concurrency 25 (was 10), flush 2s (was 5s)
  • Exponential backoff with jitter replacing fixed 10s retry in Cloud Logging writer
  • HPA-compatible Deployment with readiness probe, PDB, and custom metrics
  • Fully backward compatible — same binary runs in single-pod mode when --headless-service-name is empty

Design

Architecture

  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
  │   Pod 0     │  │   Pod 1     │  │   Pod 2     │
  │             │  │             │  │             │
  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │
  │ │ Watcher │ │  │ │ Watcher │ │  │ │ Watcher │ │
  │ │(all ev.)│ │  │ │(all ev.)│ │  │ │(all ev.)│ │
  │ └────┬────┘ │  │ └────┬────┘ │  │ └────┬────┘ │
  │      │      │  │      │      │  │      │      │
  │ hash%3==0?  │  │ hash%3==1?  │  │ hash%3==2?  │
  │   ┌──┴──┐   │  │   ┌──┴──┐   │  │   ┌──┴──┐   │
  │   │Sink │   │  │   │Sink │   │  │   │Sink │   │
  │   │~1/3 │   │  │   │~1/3 │   │  │   │~1/3 │   │
  │   └──┬──┘   │  │   └──┬──┘   │  │   └──┬──┘   │
  └──────┼──────┘  └──────┼──────┘  └──────┼──────┘
         │                │                │
         └────────────────┼────────────────┘
                          ▼
                 GCP Cloud Logging

How consistent hashing works

  1. Each pod discovers peers via a headless Service's Endpoints informer
  2. Peer IPs are sorted lexicographically — all pods produce the same ordering
  3. Each pod identifies itself by POD_IP (Downward API) and finds its index in the sorted list
  4. For each event: owner = xxhash(namespace/name) % len(peers) — if owner == myIndex, process it
  5. Grace period: if a pod isn't in Endpoints yet, it processes ALL events (prevents gaps)
  6. Deduplication: deterministic InsertId = namespace/name/resourceVersion handles brief overlaps during scaling
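
The steps above can be sketched in a few lines of Go. This is an illustrative stand-in, not the PR's actual code: the `PeerSet` type is hypothetical, and FNV-1a from the standard library replaces xxhash so the sketch is self-contained (the partitioning property is the same).

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// PeerSet is a hypothetical stand-in for the sorted peer list that the
// Endpoints informer would maintain.
type PeerSet struct {
	peers []string // pod IPs, sorted lexicographically
	self  string   // this pod's POD_IP from the Downward API
}

func NewPeerSet(self string, ips []string) *PeerSet {
	sorted := append([]string(nil), ips...)
	sort.Strings(sorted) // every pod derives the same ordering
	return &PeerSet{peers: sorted, self: self}
}

// IsOwner reports whether this pod owns the event keyed namespace/name.
// Grace period: if this pod is not yet in the peer list, it owns
// everything so no events are dropped while Endpoints converges.
func (p *PeerSet) IsOwner(namespace, name string) bool {
	myIndex := sort.SearchStrings(p.peers, p.self)
	if myIndex == len(p.peers) || p.peers[myIndex] != p.self {
		return true // not in Endpoints yet: process all events
	}
	h := fnv.New64a() // stand-in for xxhash
	fmt.Fprintf(h, "%s/%s", namespace, name)
	return h.Sum64()%uint64(len(p.peers)) == uint64(myIndex)
}

func main() {
	ips := []string{"10.0.0.2", "10.0.0.1", "10.0.0.3"}
	owners := 0
	for _, self := range ips {
		if NewPeerSet(self, ips).IsOwner("kube-system", "dns-12345") {
			owners++
		}
	}
	fmt.Println(owners) // exactly one pod owns each event key
}
```

Because every pod sorts the same IPs and hashes the same key, exactly one pod claims each event without any coordination.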

Why shared-nothing over leader-worker

| Dimension | Shared-Nothing (chosen) | Leader-Worker |
| --- | --- | --- |
| Complexity | Low — no gRPC, no leader election | High — gRPC service, election, role transitions |
| SPOF | None — all pods are equal | Leader is a bottleneck |
| HPA | Natural — add more pods | Requires separate scaling for leader vs workers |
| API server impact | N watchers (negligible for N<10) | 1 watcher (minimal) |
| Memory per pod | Full event cache per pod | Only leader needs caches |

Scaling behavior

  • Scale up (2→3): new pod appears in Endpoints, all pods recompute N=3. Brief overlap window (<1s) produces duplicates deduplicated by InsertId
  • Scale down (3→2): departing pod drains buffers on SIGTERM. Remaining pods absorb orphaned partition
  • Pod crash: remaining pods see updated Endpoints, absorb crashed pod's partition. Resync (1min) recovers any in-flight events
  • Throughput: ~200-500 events/sec per pod. N pods = N × throughput
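
The deduplication that makes these overlap windows safe relies on every pod computing the same InsertId for the same event. A minimal sketch, assuming the namespace/name/resourceVersion scheme from the description (the function name is illustrative):

```go
package main

import "fmt"

// insertID builds a deterministic Cloud Logging InsertId from the
// event's identity, so duplicate writes from briefly overlapping pods
// during a rebalance collapse into a single log entry.
func insertID(namespace, name, resourceVersion string) string {
	return fmt.Sprintf("%s/%s/%s", namespace, name, resourceVersion)
}

func main() {
	// Two pods processing the same event version emit the same InsertId...
	fmt.Println(insertID("default", "backoff-pod", "42189"))
	// ...while a new resourceVersion (a genuine update) gets a fresh one.
	fmt.Println(insertID("default", "backoff-pod", "42190"))
}
```

Cloud Logging treats entries sharing an InsertId (within the dedup window) as one entry, which is what absorbs the duplicate sends during scaling.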

New files (vs original event-exporter)

| File | Purpose |
| --- | --- |
| peerdiscovery/discovery.go | Endpoints informer, sorted peer list, IsOwner() hash check |
| peerdiscovery/discovery_test.go | Tests: determinism, distribution, grace period, rebalancing |
| peerdiscovery/metrics.go | event_exporter_peer_count, event_exporter_peer_updates_total |
| example/event-exporter-ha.yaml | HA deployment: headless Service, HPA, PDB, readiness probe |

Modified files (vs original event-exporter)

| File | Change |
| --- | --- |
| main.go | --headless-service-name flag, PeerDiscovery init, /readyz endpoint |
| event_exporter.go | PeerDiscovery as third concurrent goroutine |
| sinks/stackdriver/sink.go | Hash filtering in OnAdd/OnUpdate (before serialization), queue depth reporting |
| sinks/stackdriver/sink_factory.go | Exported type, CreateNewWithOwnerChecker method |
| sinks/stackdriver/sink_config.go | Tuned defaults: buffer 500, concurrency 25, flush 2s |
| sinks/stackdriver/writer.go | Exponential backoff (1s→60s, 2x, 20% jitter) |
| sinks/stackdriver/log_entry_factory.go | Deterministic InsertId |
| sinks/stackdriver/metrics.go | events_dropped_by_hash_total, events_owned_total, queue_depth |

New Prometheus metrics

| Metric | Type | Purpose |
| --- | --- | --- |
| event_exporter_peer_count | Gauge | Current number of peer pods |
| event_exporter_peer_updates_total | Counter | Peer list update events |
| event_exporter_events_dropped_by_hash_total | Counter | Events skipped (owned by another pod) |
| event_exporter_events_owned_total | Counter | Events this pod processes |
| event_exporter_queue_depth | Gauge | Log entry channel depth (HPA metric) |

Status

This is an initial working implementation. Known TODOs:

  • End-to-end tests with multi-pod deployment
  • Load testing at scale (>1000 events/sec)
  • Validate Cloud Logging InsertId deduplication behavior under rebalancing
  • Tune HPA thresholds based on real-world metrics
  • Update Makefile/Dockerfile for v2 build paths
  • Consider EndpointSlice API as future replacement for Endpoints

Test plan

  • Unit tests pass: go test ./... (peerdiscovery, sinks, watchers, podlabels)
  • go vet ./... clean
  • go build ./... succeeds
  • Original event-exporter/ untouched and builds/tests independently
  • Deploy in test GKE cluster with 3 replicas
  • Verify all events appear in Cloud Logging with no gaps
  • Scale up/down and verify no event loss
  • Kill a pod and verify partition recovery

🤖 Generated with Claude Code

…hing for HA

Add event-exporter-v2 as a new standalone module alongside the existing
event-exporter. This is a re-architecture of the event exporter for
higher performance, scalability, and high availability while maintaining
the same core functionality (exporting all cluster events to GCP Cloud
Logging) and low cluster-wide footprint.

Key changes from event-exporter:
- Shared-nothing consistent hashing for multi-pod horizontal scaling
- Peer discovery via headless Service Endpoints informer
- Hash-based event partitioning (xxhash modulo) across pods
- Deterministic InsertId for Cloud Logging deduplication during rebalancing
- Exponential backoff with jitter replacing fixed 10s retry
- Tuned defaults: buffer 500 (was 100), concurrency 25 (was 10), flush 2s (was 5s)
- New Prometheus metrics: queue_depth, peer_count, events_owned/dropped_by_hash
- Readiness probe (/readyz) gated on peer discovery sync
- HA deployment manifest with headless Service, HPA, and PDB
- Fully backward compatible: single-pod mode when --headless-service-name is empty

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

erain commented Mar 26, 2026

Opened against wrong repo, re-creating against fork.

@erain erain closed this Mar 26, 2026

google-cla bot commented Mar 26, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
