Welcome to the ThemisDB replication documentation. This directory serves as the central hub for all replication and high-availability (HA) documentation.
- replication-ha-guide.md - START HERE ⭐
  - Complete guide to High Availability replication
  - Deployment topologies (Active-Passive, Active-Active, Multi-DC)
  - Configuration examples and best practices
  - Monitoring, alerting, and operational procedures
  - Performance tuning and troubleshooting
- REPLICATION_IMPLEMENTATION_STATUS.md (German)
  - Detailed implementation status (~85% complete)
  - Component breakdown and file locations
  - Write-path flow diagrams
  - Build & test status
  - Prometheus metrics reference
- replication_raid_plan.md
  - RAID 1/10 replication readiness plan
  - Current findings and implementation steps
  - Acceptance criteria and next actions
  - Integration status
ThemisDB's replication system is organized across two main modules:
replication/ module:
- Location: include/replication/, src/replication/
- Components:
  - ReplicationManager - Lifecycle and configuration management
  - MultiMasterReplicationManager - Multi-master coordination (see include/replication/multi_master_replication.h)
- Responsibility: High-level replication strategies and orchestration
sharding/ module:
- Location: include/sharding/, src/sharding/
- Components:
  - WALManager - Write-Ahead Log with LSN tracking
  - WALShipper - Batch-based WAL shipping to replicas
  - WALApplier - Idempotent WAL application
  - ReplicationCoordinator - Write concern enforcement (ONE/MAJORITY/ALL)
  - ReplicaTopology - Shard-to-replica mapping (RAID 1/10/5/6)
  - Consensus modules (Raft, Gossip, Paxos)
  - HealthMonitor - Replica health tracking
- Responsibility: WAL-based replication mechanics, distributed consensus, topology management
Design Rationale: This separation allows the replication/ module to focus on business logic while sharding/ handles the complex distributed systems infrastructure needed for both replication and horizontal scaling.
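To make the ONE/MAJORITY/ALL write concern levels concrete, the following is a minimal sketch of how the number of required acknowledgements could be derived. It is not the ReplicationCoordinator API: the `WriteConcern` enum, `required_acks`, and `is_satisfied` are hypothetical names, and the sketch assumes `replica_count` counts every node in the replica set, including the primary.

```cpp
#include <cstddef>

// Hypothetical enum mirroring the ONE/MAJORITY/ALL levels named above;
// the actual ThemisDB type may look different.
enum class WriteConcern { One, Majority, All };

// Acknowledgements required before a write is reported as committed.
// Assumption: replica_count includes the primary.
constexpr std::size_t required_acks(WriteConcern wc, std::size_t replica_count) {
    switch (wc) {
        case WriteConcern::One:      return 1;
        case WriteConcern::Majority: return replica_count / 2 + 1;  // strict majority
        case WriteConcern::All:      return replica_count;
    }
    return replica_count;  // not reached for valid enum values
}

// True once enough acknowledgements have arrived for the given level.
constexpr bool is_satisfied(WriteConcern wc, std::size_t acks,
                            std::size_t replica_count) {
    return acks >= required_acks(wc, replica_count);
}

// Compile-time checks for a three-node replica set.
static_assert(required_acks(WriteConcern::Majority, 3) == 2, "2 of 3 is a majority");
static_assert(required_acks(WriteConcern::All, 3) == 3, "ALL waits for every node");
```

With three nodes, MAJORITY tolerates one slow or failed replica while ALL does not, which is the usual trade-off between write latency and durability across these levels.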
✅ Completed:
- WAL-based replication with LSN tracking
- Write concern support (ONE/MAJORITY/ALL)
- RAID 1/10 topology support
- Idempotent WAL application
- Prometheus metrics integration
- HTTP and gRPC replication endpoints
- Leader election (Raft/Gossip/Paxos)
- Health monitoring and failure detection
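Of the completed items above, idempotent WAL application with LSN tracking is the one most easily shown in code. The sketch below illustrates the general idea only and is not the WALApplier implementation: `WalEntry` and `IdempotentApplier` are hypothetical types, LSNs are assumed to start at 1, and entries are assumed to arrive in order (gaps are not detected).

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical WAL record: a log sequence number plus a key/value write.
struct WalEntry {
    std::uint64_t lsn;
    std::string key;
    std::string value;
};

// Minimal replica-side applier. Entries at or below the last applied LSN
// are skipped, so re-delivering a batch leaves the state unchanged.
class IdempotentApplier {
public:
    // Returns true if the entry was applied, false if it was a duplicate.
    bool apply(const WalEntry& e) {
        if (e.lsn <= last_applied_lsn_) {
            return false;  // already applied; safe to ignore
        }
        state_[e.key] = e.value;
        last_applied_lsn_ = e.lsn;
        return true;
    }

    std::uint64_t last_applied_lsn() const { return last_applied_lsn_; }
    const std::map<std::string, std::string>& state() const { return state_; }

private:
    std::uint64_t last_applied_lsn_ = 0;  // assumption: real LSNs start at 1
    std::map<std::string, std::string> state_;
};
```

Because duplicates are rejected by LSN comparison alone, a shipper can retry a whole batch after a timeout without the replica needing to track which individual entries it already received.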
🚧 In Progress:
- Multi-node endurance testing
- RAID 5/6 implementation
- Automatic failover enhancement
- ARCHITECTURE.md - System overview
- SECURITY.md - Security configuration
- MONITORING.md - Monitoring setup
- Disaster Recovery - DR procedures
- Start here: Read replication-ha-guide.md for a comprehensive overview
- Configuration: Check the configuration examples in the HA guide
- Implementation details: See REPLICATION_IMPLEMENTATION_STATUS.md for component-level details
- Deployment planning: Review replication_raid_plan.md for RAID configuration
Overall Progress: ~85% Complete
✅ Completed:
- WAL Manager, Shipper, Applier
- Replication Coordinator
- Replica Topology
- HTTP/gRPC endpoints
- Integration tests (8/8 passing)
- Prometheus metrics
🚧 In Progress:
- Multi-node endurance testing
- RAID 5/6 parity implementation
- Automatic failover enhancements
When updating replication documentation:
- Keep all three core documents in sync (HA guide, implementation status, RAID plan)
- Update cross-references when adding new documentation
- Follow the established structure for consistency
- Update this index when adding new replication docs
See LICENSE for licensing information.