Skip to content

Latest commit

 

History

History
84 lines (64 loc) · 5.94 KB

File metadata and controls

84 lines (64 loc) · 5.94 KB

Change Data Capture (CDC) Module

Change Data Capture and changefeed implementation for ThemisDB.

Module Purpose

Implements Change Data Capture for ThemisDB, providing real-time change notifications via SSE streaming, filtered subscriptions, change log management, historical change replay, and CDC-driven incremental materialized view maintenance.

Subsystem Scope

In scope: Changefeed engine, SSE event streaming, per-collection/per-key filtering, change log persistence, historical replay, subscription lifecycle management, cross-collection aggregated streams, CDC-based materialized view maintenance, WebSocket transport (WsTransport, cdc_ws_handler.cpp), Kafka producer transport (KafkaCDCProducer), ICDCTransport abstract interface, consumer group semantics (ConsumerGroupManager), at-least-once delivery (DeliveryTracker), dead-letter queue (DeadLetterQueue), transactional outbox (OutboxWriter, OutboxRelay), change stream compression, Debezium format support, schema registry integration, GDPR-aware change log redaction.

Relevant Interfaces

  • changefeed.cpp — core change capture engine
  • changefeed_buffer.cpp — per-tenant in-memory ring buffer for pending events
  • tenant_buffer_manager.cpp — per-tenant buffer lifecycle and quota enforcement
  • ws_transport.cpp — WebSocket transport (WsTransport, implements ICDCTransport)
  • cdc_ws_handler.cpp — WebSocket HTTP handler wiring for CDC streams
  • kafka_cdc_producer.cpp — Kafka transport backend (KafkaCDCProducer, opt-in via THEMIS_ENABLE_KAFKA)
  • consumer_group.cpp — consumer group semantics with durable offset tracking (ConsumerGroupManager)
  • delivery_tracker.cpp — at-least-once delivery with redelivery and acknowledgement (DeliveryTracker)
  • dead_letter_queue.cpp — persistence of events that exhaust delivery retries (DeadLetterQueue)
  • outbox.cpp — transactional outbox pattern for atomic CDC + application data publishing (OutboxWriter, OutboxRelay)
  • cross_collection_stream.cpp — cross-collection change aggregation (CrossCollectionStream)
  • cdc_materialized_view.cpp — CDC-driven incremental materialized view maintenance (CDCMaterializedViewMaintainer)
  • cdc_admin.cpp — admin API for subscription and buffer management

Current Delivery Status

Maturity: 🟢 Production — SSE-based changefeeds, filtered subscriptions, WebSocket transport, consumer groups, and Kafka producer integration operational.

Components

  • Changefeed engine (changefeed.cpp)
  • Per-tenant in-memory ring buffer for pending events (changefeed_buffer.cpp)
  • Per-tenant buffer lifecycle and quota enforcement with backpressure (tenant_buffer_manager.cpp)
  • Server-Sent Events (SSE) streaming
  • WebSocket-based change streaming (ws_transport.cpp, cdc_ws_handler.cpp)
  • Kafka producer transport for enterprise CDC pipelines (kafka_cdc_producer.cpp)
  • Consumer group semantics with durable offset tracking (consumer_group.cpp)
  • At-least-once delivery tracker with acknowledgement and redelivery (delivery_tracker.cpp)
  • Dead-letter queue for failed event deliveries (dead_letter_queue.cpp)
  • Transactional outbox pattern for atomic CDC + application data publishing (outbox.cpp)
  • Cross-collection change aggregation (cross_collection_stream.cpp)
  • CDC-based incremental materialized view maintenance (cdc_materialized_view.cpp)
  • Admin API for subscription and buffer management (cdc_admin.cpp)

Features

  • Real-time change notifications
  • SSE-based and WebSocket-based event streaming
  • Filtered change subscriptions (collection, key prefix, operation type)
  • Historical change replay from stored change log
  • Consumer group semantics with durable offset tracking and partition assignment
  • At-least-once delivery guarantees with consumer acknowledgement and redelivery — available for SSE connections (GET /changefeed/stream?consumer_id=...) and Consumer Groups (/v2/cdc/stream)
  • Dead-letter queue for events that exhaust delivery retries
  • Transactional outbox pattern for atomic CDC + application data publishing
  • Cross-collection merged event streams with per-collection resume cursors
  • CDC-driven incremental materialized view maintenance (GROUP BY aggregations updated in O(1) per change)
  • Kafka-compatible producer interface for enterprise integration (Debezium envelope format supported)
  • GDPR-aware change log redaction (PII field scrubbing)
  • Change stream compression for high-volume feeds

Documentation

For CDC documentation, see:

Scientific References

  1. Stonebraker, M., Rowe, L. A., & Hirohama, M. (1990). The Implementation of Postgres. IEEE Transactions on Knowledge and Data Engineering, 2(1), 125–142. https://doi.org/10.1109/69.43410

  2. Kleppmann, M. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media. ISBN: 978-1-449-37332-0

  3. Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., & Schwarz, P. (1992). ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems, 17(1), 94–162. https://doi.org/10.1145/128765.128770

  4. Flink Community. (2015). Apache Flink: Stream and Batch Processing in a Single Engine. IEEE Data Engineering Bulletin, 38(4), 28–38. http://sites.computer.org/debull/A15dec/p28.pdf