event-exporter-v2: shared-nothing HA re-architecture#1141
Closed
erain wants to merge 1 commit into GoogleCloudPlatform:master from
…hing for HA

Add event-exporter-v2 as a new standalone module alongside the existing event-exporter. This is a re-architecture of the event exporter for higher performance, scalability, and high availability while maintaining the same core functionality (exporting all cluster events to GCP Cloud Logging) and low cluster-wide footprint.

Key changes from event-exporter:
- Shared-nothing consistent hashing for multi-pod horizontal scaling
- Peer discovery via headless Service Endpoints informer
- Hash-based event partitioning (xxhash modulo) across pods
- Deterministic InsertId for Cloud Logging deduplication during rebalancing
- Exponential backoff with jitter replacing fixed 10s retry
- Tuned defaults: buffer 500 (was 100), concurrency 25 (was 10), flush 2s (was 5s)
- New Prometheus metrics: queue_depth, peer_count, events_owned/dropped_by_hash
- Readiness probe (/readyz) gated on peer discovery sync
- HA deployment manifest with headless Service, HPA, and PDB
- Fully backward compatible: single-pod mode when --headless-service-name is empty

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Opened against wrong repo, re-creating against fork.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary
Re-architecture of the event-exporter as a new standalone module (`event-exporter-v2/`) for horizontal scalability and high availability. The original `event-exporter/` is untouched.

- Hash-based event partitioning: `xxhash(namespace/name) % N`
- Deterministic `InsertId` for Cloud Logging deduplication during scaling transitions
- Single-pod mode when `--headless-service-name` is empty

Design
Architecture
How consistent hashing works
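The ownership rule this section describes (sort the peers, find your own index, hash `namespace/name` modulo the peer count) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the function names `hashKey`, `selfIndex`, and `isOwner` are hypothetical, the peer IPs are made up, and FNV-1a from the Go standard library stands in for xxhash so the sketch runs without third-party dependencies. Treating an empty peer list as single-pod mode is also an assumption.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hashKey hashes "namespace/name" to 64 bits.
// Assumption: FNV-1a is a stdlib stand-in for the xxhash used in the PR.
func hashKey(namespace, name string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(namespace + "/" + name))
	return h.Sum64()
}

// selfIndex returns the position of podIP in the sorted peer list,
// or -1 if this pod is not (yet) in the list.
func selfIndex(peers []string, podIP string) int {
	sorted := append([]string(nil), peers...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	i := sort.SearchStrings(sorted, podIP)
	if i < len(sorted) && sorted[i] == podIP {
		return i
	}
	return -1
}

// isOwner reports whether the pod with podIP should process the event:
// owner = hash(namespace/name) % len(peers), compared with the pod's own index.
func isOwner(peers []string, podIP, namespace, name string) bool {
	if len(peers) == 0 {
		return true // assumption: no peers means single-pod mode, process everything
	}
	idx := selfIndex(peers, podIP)
	if idx < 0 {
		return false
	}
	owner := int(hashKey(namespace, name) % uint64(len(peers)))
	return owner == idx
}

func main() {
	// Hypothetical peer IPs, as they might arrive unsorted from discovery.
	peers := []string{"10.0.0.7", "10.0.0.3", "10.0.0.5"}
	for _, ip := range peers {
		fmt.Println(ip, "owns default/nginx.17a1:", isOwner(peers, ip, "default", "nginx.17a1"))
	}
}
```

Because every pod sorts the same peer list and applies the same hash, exactly one pod in a stable peer set reports ownership of a given event without any coordination between pods.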
- Each pod knows its own `POD_IP` (Downward API) and finds its index in the sorted list
- `owner = xxhash(namespace/name) % len(peers)`; if `owner == myIndex`, process it
- `InsertId = namespace/name/resourceVersion` handles brief overlaps during scaling

Why shared-nothing over leader-worker
Scaling behavior
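To make the scaling behavior concrete, the sketch below counts how many event keys change owner when the peer count changes under a modulo rule, and shows why a deterministic `InsertId` of the form `namespace/name/resourceVersion` lets two pods that briefly both export the same event produce identical IDs. The helper names (`owner`, `movedKeys`, `insertID`) and the sample keys are hypothetical, and FNV-1a again stands in for xxhash; the actual PR code may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// owner maps a key to a peer index with hash % n.
// Assumption: FNV-1a is a stdlib stand-in for xxhash.
func owner(key string, n int) int {
	h := fnv.New64a()
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(n))
}

// movedKeys counts keys whose owner index differs between oldN and newN peers.
func movedKeys(keys []string, oldN, newN int) int {
	moved := 0
	for _, k := range keys {
		if owner(k, oldN) != owner(k, newN) {
			moved++
		}
	}
	return moved
}

// insertID builds the deterministic ID namespace/name/resourceVersion.
func insertID(namespace, name, resourceVersion string) string {
	return namespace + "/" + name + "/" + resourceVersion
}

func main() {
	keys := make([]string, 0, 1000)
	for i := 0; i < 1000; i++ {
		keys = append(keys, fmt.Sprintf("default/event-%d", i))
	}
	fmt.Printf("%d of %d keys change owner when scaling 2 -> 3 pods\n",
		movedKeys(keys, 2, 3), len(keys))

	// During a transition, an old and a new owner may both export the same
	// event; both derive the same ID from the event itself.
	fromOldOwner := insertID("default", "nginx.17a1", "4242")
	fromNewOwner := insertID("default", "nginx.17a1", "4242")
	fmt.Println(fromOldOwner == fromNewOwner, fromOldOwner)
}
```

With a plain modulo rule a large share of keys typically moves whenever the peer count changes, which is why the design leans on the `InsertId` rather than on minimizing reassignment: duplicates written during the overlap carry the same ID and can be collapsed on the Cloud Logging side.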
`InsertId`

New files (vs original event-exporter)
- `peerdiscovery/discovery.go`: `IsOwner()` hash check
- `peerdiscovery/discovery_test.go`
- `peerdiscovery/metrics.go`: `event_exporter_peer_count`, `event_exporter_peer_updates_total`
- `example/event-exporter-ha.yaml`

Modified files (vs original event-exporter)
- `main.go`: `--headless-service-name` flag, PeerDiscovery init, `/readyz` endpoint
- `event_exporter.go`
- `sinks/stackdriver/sink.go`: `OnAdd`/`OnUpdate` (before serialization), queue depth reporting
- `sinks/stackdriver/sink_factory.go`: `CreateNewWithOwnerChecker` method
- `sinks/stackdriver/sink_config.go`
- `sinks/stackdriver/writer.go`
- `sinks/stackdriver/log_entry_factory.go`: `InsertId`
- `sinks/stackdriver/metrics.go`: `events_dropped_by_hash_total`, `events_owned_total`, `queue_depth`

New Prometheus metrics
- `event_exporter_peer_count`
- `event_exporter_peer_updates_total`
- `event_exporter_events_dropped_by_hash_total`
- `event_exporter_events_owned_total`
- `event_exporter_queue_depth`

Status
This is an initial working implementation. Known TODOs:
- `InsertId` deduplication behavior under rebalancing

Test plan
- `go test ./...` (peerdiscovery, sinks, watchers, podlabels)
- `go vet ./...` clean
- `go build ./...` succeeds
- `event-exporter/` untouched and builds/tests independently

🤖 Generated with Claude Code