Change Data Capture and changefeed implementation for ThemisDB.
Implements Change Data Capture for ThemisDB, providing real-time change notifications via SSE streaming, filtered subscriptions, change log management, historical change replay, and CDC-driven incremental materialized view maintenance.
In scope: Changefeed engine, SSE event streaming, per-collection/per-key filtering, change log persistence, historical replay, subscription lifecycle management, cross-collection aggregated streams, CDC-based materialized view maintenance, WebSocket transport (WsTransport, cdc_ws_handler.cpp), Kafka producer transport (KafkaCDCProducer), ICDCTransport abstract interface, consumer group semantics (ConsumerGroupManager), at-least-once delivery (DeliveryTracker), dead-letter queue (DeadLetterQueue), transactional outbox (OutboxWriter, OutboxRelay), change stream compression, Debezium format support, schema registry integration, GDPR-aware change log redaction.
changefeed.cpp— core change capture enginechangefeed_buffer.cpp— per-tenant in-memory ring buffer for pending eventstenant_buffer_manager.cpp— per-tenant buffer lifecycle and quota enforcementws_transport.cpp— WebSocket transport (WsTransport, implementsICDCTransport)cdc_ws_handler.cpp— WebSocket HTTP handler wiring for CDC streamskafka_cdc_producer.cpp— Kafka transport backend (KafkaCDCProducer, opt-in viaTHEMIS_ENABLE_KAFKA)consumer_group.cpp— consumer group semantics with durable offset tracking (ConsumerGroupManager)delivery_tracker.cpp— at-least-once delivery with redelivery and acknowledgement (DeliveryTracker)dead_letter_queue.cpp— persistence of events that exhaust delivery retries (DeadLetterQueue)outbox.cpp— transactional outbox pattern for atomic CDC + application data publishing (OutboxWriter,OutboxRelay)cross_collection_stream.cpp— cross-collection change aggregation (CrossCollectionStream)cdc_materialized_view.cpp— CDC-driven incremental materialized view maintenance (CDCMaterializedViewMaintainer)cdc_admin.cpp— admin API for subscription and buffer management
Maturity: 🟢 Production — SSE-based changefeeds, filtered subscriptions, WebSocket transport, consumer groups, and Kafka producer integration operational.
- Changefeed engine (
changefeed.cpp) - Per-tenant in-memory ring buffer for pending events (
changefeed_buffer.cpp) - Per-tenant buffer lifecycle and quota enforcement with backpressure (
tenant_buffer_manager.cpp) - Server-Sent Events (SSE) streaming
- WebSocket-based change streaming (
ws_transport.cpp,cdc_ws_handler.cpp) - Kafka producer transport for enterprise CDC pipelines (
kafka_cdc_producer.cpp) - Consumer group semantics with durable offset tracking (
consumer_group.cpp) - At-least-once delivery tracker with acknowledgement and redelivery (
delivery_tracker.cpp) - Dead-letter queue for failed event deliveries (
dead_letter_queue.cpp) - Transactional outbox pattern for atomic CDC + application data publishing (
outbox.cpp) - Cross-collection change aggregation (
cross_collection_stream.cpp) - CDC-based incremental materialized view maintenance (
cdc_materialized_view.cpp) - Admin API for subscription and buffer management (
cdc_admin.cpp)
- Real-time change notifications
- SSE-based and WebSocket-based event streaming
- Filtered change subscriptions (collection, key prefix, operation type)
- Historical change replay from stored change log
- Consumer group semantics with durable offset tracking and partition assignment
- At-least-once delivery guarantees with consumer acknowledgement and redelivery — available for SSE connections (
GET /changefeed/stream?consumer_id=...) and Consumer Groups (/v2/cdc/stream) - Dead-letter queue for events that exhaust delivery retries
- Transactional outbox pattern for atomic CDC + application data publishing
- Cross-collection merged event streams with per-collection resume cursors
- CDC-driven incremental materialized view maintenance (GROUP BY aggregations updated in O(1) per change)
- Kafka-compatible producer interface for enterprise integration (Debezium envelope format supported)
- GDPR-aware change log redaction (PII field scrubbing)
- Change stream compression for high-volume feeds
For CDC documentation, see:
- Architecture Guide — component diagram, data flow, threading model
- Roadmap — feature status and planned work
- CDC Operations Runbook — production operations
- CDC Implementation Summary — implementation history
- Change Data Capture (DE) — end-user guide (German)
-
Stonebraker, M., Rowe, L. A., & Hirohama, M. (1990). The Implementation of Postgres. IEEE Transactions on Knowledge and Data Engineering, 2(1), 125–142. https://doi.org/10.1109/69.43410
-
Kleppmann, M. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media. ISBN: 978-1-449-37332-0
-
Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., & Schwarz, P. (1992). ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems, 17(1), 94–162. https://doi.org/10.1145/128765.128770
-
Flink Community. (2015). Apache Flink: Stream and Batch Processing in a Single Engine. IEEE Data Engineering Bulletin, 38(4), 28–38. http://sites.computer.org/debull/A15dec/p28.pdf