feat(storage): Write-Optimized Merge (WOM) Tree — v1.8.0#4204
…sue #260) Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Pull request overview
Adds a new in-memory storage primitive, WomTree (Write-Optimized Merge Tree), intended as a Bε-tree-style write-optimized alternative to LSM-based structures, along with focused unit tests and a dedicated CI workflow.
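To make the Bε-tree buffering idea concrete, here is a minimal sketch assuming a single root write buffer above a single sorted leaf. The class `ToyBufferedTree` and its byte accounting are hypothetical, not the `WomTree` API: writes are appended to the buffer and pushed down only once the buffered bytes exceed a threshold, and reads check the buffer newest-first before falling back to the leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical two-level sketch of Bε-tree-style write buffering:
// one root buffer of pending writes above one sorted leaf. Not WomTree.
class ToyBufferedTree {
public:
    explicit ToyBufferedTree(std::size_t buffer_limit_bytes)
        : buffer_limit_(buffer_limit_bytes) {
        assert(buffer_limit_bytes > 0);
    }

    void put(std::string key, std::string value) {
        buffer_bytes_ += key.size() + value.size();
        buffer_.emplace_back(std::move(key), std::move(value));
        // Lazy propagation: push down only once the buffer is over budget.
        if (buffer_bytes_ > buffer_limit_) flush();
    }

    std::optional<std::string> get(const std::string& key) const {
        // The newest buffered write wins, so search the buffer back to front.
        for (auto it = buffer_.rbegin(); it != buffer_.rend(); ++it) {
            if (it->first == key) return it->second;
        }
        auto found = leaf_.find(key);
        if (found == leaf_.end()) return std::nullopt;
        return found->second;
    }

    // Apply buffered writes to the leaf in arrival order, then empty the buffer.
    void flush() {
        for (auto& [k, v] : buffer_) leaf_[k] = std::move(v);
        buffer_.clear();
        buffer_bytes_ = 0;
    }

    std::size_t bufferedCount() const { return buffer_.size(); }
    std::size_t leafCount() const { return leaf_.size(); }

private:
    std::size_t buffer_limit_;
    std::size_t buffer_bytes_ = 0;
    std::vector<std::pair<std::string, std::string>> buffer_;
    std::map<std::string, std::string> leaf_;
};
```

In a real Bε-tree the same rule applies at every internal level, so one flush moves a whole batch of messages one level down instead of rewriting a leaf for every individual write.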
Changes:
- Introduces `WomTree` public API + implementation (put/get/remove/contains/scan/scanRange/compact/flushOnce, stats, config validation).
- Adds focused GTest suite and CMake target for WOM tree.
- Adds path-scoped GitHub Actions workflow and updates storage roadmap/docs to mark WOM tree implemented.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `include/storage/wom_tree.h` | Public API, config + stats definitions, and module documentation. |
| `src/storage/wom_tree.cpp` | WOM tree core logic: buffering, flushing, leaf splits, scans, stats wiring. |
| `tests/test_wom_tree.cpp` | Focused unit tests covering core operations, stats, and concurrency scenarios. |
| `tests/CMakeLists.txt` | Adds `test_wom_tree_focused` executable and registers `WomTreeFocusedTests`. |
| `cmake/CMakeLists.txt` | Adds `src/storage/wom_tree.cpp` to core build sources. |
| `.github/workflows/wom-tree-ci.yml` | New WOM-tree-specific CI job to build/run focused tests across compilers/OSes. |
| `src/storage/ROADMAP.md` | Marks WOM Tree phase as completed in v1.8.0. |
| `src/storage/FUTURE_ENHANCEMENTS.md` | Updates WOM tree entry to “Implemented” and points to implementation files. |
`include/storage/wom_tree.h` (outdated)
```cpp
 *
 * ## Thread safety
 *
 * All public methods are thread-safe via an internal reader-writer mutex.
```
`include/storage/wom_tree.h` (outdated)
```cpp
 * If true, deleted keys are tombstoned in-memory and purged
 * asynchronously during the next flush/compact pass.
 * If false, the delete takes effect immediately (higher read path
 * cost for large buffers).
```
```cpp
/**
 * @brief Estimated write-amplification factor.
 *
 * Ratio of bytes written internally (buffer propagation + leaf
 * writes) to bytes written by the user. A value close to 1.0
 * means minimal write overhead; the WOM tree targets 2–5×
 * compared to 10–30× for typical LSM trees.
 *
 * Returns 0.0 if no user bytes have been written yet.
 */
double writeAmplification() const noexcept {
    if (user_bytes_written == 0) return 0.0;
    return static_cast<double>(internal_bytes_written) /
           static_cast<double>(user_bytes_written);
}
```
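A worked illustration of the ratio above, under the assumption that every copy of a message (each buffer it passes through plus the final leaf write) is counted as internal bytes: a 100-byte put that passes through two buffers and then lands in a leaf yields 300 internal bytes and a write amplification of 3.0, while a put on the single-leaf fast path yields exactly 1.0 once its leaf write is counted. The `ToyWomStats` struct and `simulatePut` helper are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stats holder using the same ratio as writeAmplification() above.
struct ToyWomStats {
    std::atomic<std::uint64_t> user_bytes{0};
    std::atomic<std::uint64_t> internal_bytes{0};

    double writeAmplification() const noexcept {
        const std::uint64_t user = user_bytes.load(std::memory_order_relaxed);
        if (user == 0) return 0.0;
        return static_cast<double>(internal_bytes.load(std::memory_order_relaxed)) /
               static_cast<double>(user);
    }
};

// Records one put of `bytes` that is copied into `buffer_levels` buffers
// and finally written to a leaf. Assumption: every copy counts as internal.
inline void simulatePut(ToyWomStats& stats, std::uint64_t bytes, int buffer_levels) {
    assert(buffer_levels >= 0);
    stats.user_bytes.fetch_add(bytes, std::memory_order_relaxed);
    for (int level = 0; level < buffer_levels; ++level) {
        stats.internal_bytes.fetch_add(bytes, std::memory_order_relaxed);  // buffer copy
    }
    stats.internal_bytes.fetch_add(bytes, std::memory_order_relaxed);      // leaf write
}
```

If the leaf write on the fast path were not counted, the same single put would report 0.0 despite one user write, which is why counting it matters for keeping the ratio at or above 1.0.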
```cpp
void doInsertOp(Op op) {
    stat_user_bytes.fetch_add(op.byteSize(), std::memory_order_relaxed);

    if (root->is_leaf) {
        // Single-leaf fast path.
        if (op.type == OpType::PUT) {
            bool was_present = (root->leafFind(op.key) != root->data.end());
            root->leafApply(op);
            if (!was_present) {
                stat_live_entries.fetch_add(1, std::memory_order_relaxed);
            }
        } else {
            bool was_present = (root->leafFind(op.key) != root->data.end());
            root->leafApply(op);
            if (was_present) {
                stat_live_entries.fetch_sub(1, std::memory_order_relaxed);
            }
        }
        maybeSplitRootLeaf();
        return;
    }

    // Append to root buffer.
    root->buffer_bytes += op.byteSize();
    root->buffer.push_back(std::move(op));

    // Flush root buffer if it exceeds the threshold.
    if (root->buffer_bytes > config.buffer_size_bytes) {
        flushNode(*root, 1);
    }

    // Global buffer pressure check.
    if (config.max_buffered_entries > 0) {
        size_t total_buf = countBufferedEntries(*root);
        if (total_buf > config.max_buffered_entries) {
            flushAll(*root, 1);
        }
    }
}
```
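The snippet above adjusts `stat_live_entries` only on the single-leaf fast path. One way to keep a live-entry count exact while ops are buffered, sketched under the assumption that lookups see both buffers and leaves, is to decide "new key" versus "existing key" once, at insert time, and never touch the counter during flush. `ToyCountingTree` is hypothetical and flattens the tree to one buffer above one leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: exact live-entry counting while ops sit in a buffer.
// The counter changes once per user op, at insert time, based on a lookup
// that sees buffer and leaf; flush() never touches it.
class ToyCountingTree {
public:
    void put(const std::string& key, const std::string& value) {
        if (!get(key)) ++live_;                   // brand-new key: count it once
        buffer_.push_back({Kind::Put, key, value});
    }

    void remove(const std::string& key) {
        if (!get(key)) return;                    // existence verified first
        assert(live_ > 0);
        --live_;
        buffer_.push_back({Kind::Remove, key, std::string()});
    }

    std::optional<std::string> get(const std::string& key) const {
        for (auto it = buffer_.rbegin(); it != buffer_.rend(); ++it) {
            if (it->key != key) continue;
            if (it->kind == Kind::Remove) return std::nullopt;  // buffered tombstone
            return it->value;
        }
        auto found = leaf_.find(key);
        if (found == leaf_.end()) return std::nullopt;
        return found->second;
    }

    // Propagates ops to the leaf without adjusting live_ (no double counting).
    void flush() {
        for (const auto& op : buffer_) {
            if (op.kind == Kind::Put) leaf_[op.key] = op.value;
            else leaf_.erase(op.key);
        }
        buffer_.clear();
    }

    std::size_t size() const { return live_; }

private:
    enum class Kind { Put, Remove };
    struct Op {
        Kind kind;
        std::string key;
        std::string value;
    };

    std::vector<Op> buffer_;
    std::map<std::string, std::string> leaf_;
    std::size_t live_ = 0;
};
```

The cost of this design is one extra lookup per mutation; the benefit is that `size()` stays exact no matter how many flushes happen before it is called.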
`src/storage/wom_tree.cpp` (outdated)
```cpp
parent.children.insert(parent.children.begin() + static_cast<ptrdiff_t>(ci + 1),
                       std::move(right));

// If parent is now overfull, it will be handled by the parent's caller.
```
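When a flush distributes per-child batches and a split inserts a right sibling into `parent.children`, any array indexed by the original child position falls out of step with `children`. A minimal sketch of the offset approach, assuming integer keys, sorted leaves, and a hypothetical `flushIntoChildren` function (not the PR's `flushNode`): the current position of original child `i` is `i + splits_so_far`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Leaf = std::vector<int>;  // hypothetical leaf: sorted integer keys

// Hypothetical sketch: append each child's batch and split any leaf that
// overflows. child_ops is indexed by the ORIGINAL child position, but every
// split inserts a right sibling into `children`, shifting later children by
// one; splits_so_far converts the original index into the current one.
void flushIntoChildren(std::vector<Leaf>& children,
                       const std::vector<std::vector<int>>& child_ops,
                       std::size_t leaf_capacity) {
    assert(children.size() == child_ops.size());
    std::size_t splits_so_far = 0;
    for (std::size_t i = 0; i < child_ops.size(); ++i) {
        const std::size_t ci = i + splits_so_far;  // current position of child i
        Leaf& leaf = children[ci];
        leaf.insert(leaf.end(), child_ops[i].begin(), child_ops[i].end());
        std::sort(leaf.begin(), leaf.end());
        if (leaf.size() > leaf_capacity) {
            const auto mid = static_cast<std::ptrdiff_t>(leaf.size() / 2);
            Leaf right(leaf.begin() + mid, leaf.end());
            leaf.erase(leaf.begin() + mid, leaf.end());
            // `leaf` must not be used after this insert: it may reallocate.
            children.insert(children.begin() + static_cast<std::ptrdiff_t>(ci + 1),
                            std::move(right));
            ++splits_so_far;
        }
    }
}
```

Indexing `children` by `i` alone would, after child 0 splits, append child 1's batch to child 0's new right sibling instead of to child 1.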
```cpp
void WomTree::scan(
    const std::function<bool(std::string_view, std::string_view)>& callback) const {
    std::lock_guard<std::mutex> lk(impl_->mu);
    std::map<std::string, std::string> materialized;
    impl_->collectAllEntries(materialized);
    for (const auto& [k, v] : materialized) {
        if (!callback(k, v)) break;
    }
}
```
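The snippet above holds `impl_->mu` while the user callback runs, so if writers lock the same mutex, a callback that writes to the tree would lock a non-recursive `std::mutex` again from the same thread, which is undefined behavior and in practice deadlocks. A minimal sketch of the snapshot-then-callback pattern with `std::shared_mutex` (C++17), where `ToyLockedMap` is hypothetical and stands in for the tree with a flat `std::map`: readers take a shared lock, writers an exclusive one, and `scan()` copies a snapshot under the shared lock before invoking the callback with no lock held.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch of reader/writer locking with an unlocked callback.
class ToyLockedMap {
public:
    void put(std::string key, std::string value) {
        std::unique_lock lock(mu_);                  // exclusive: mutation
        data_[std::move(key)] = std::move(value);
    }

    bool contains(const std::string& key) const {
        std::shared_lock lock(mu_);                  // shared: readers run concurrently
        return data_.count(key) != 0;
    }

    void scan(const std::function<bool(std::string_view, std::string_view)>& callback) const {
        std::vector<std::pair<std::string, std::string>> snapshot;
        {
            std::shared_lock lock(mu_);
            snapshot.assign(data_.begin(), data_.end());
        }                                            // lock released here
        for (const auto& [k, v] : snapshot) {
            if (!callback(k, v)) break;              // callback may call put() safely
        }
    }

private:
    mutable std::shared_mutex mu_;
    std::map<std::string, std::string> data_;
};
```

The trade-off is a full copy per scan; that copy is what allows the callback to call `put` on the same object without deadlocking.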
@copilot apply changes based on the comments in this thread
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
…ble-counting; add scan assertion Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
All 6 review comments are addressed in commits
Also fixed a heap-buffer-overflow in
Implements `WomTree`, a Bε-tree-style write-optimized structure targeting 2–5× write amplification (vs. 10–30× for LSM), with full review-comment resolution and an ASAN-clean test suite.

Description
Core (`include/storage/wom_tree.h` + `src/storage/wom_tree.cpp`)

Bε-tree design: internal nodes hold lazy write buffers; mutations flush downward only when `buffer_size_bytes` is exceeded. Key config: `buffer_size_bytes` (64 KiB default), `fanout` (16), `leaf_capacity` (256), `max_buffered_entries`, `lazy_deletes`.

Review fixes applied:
- `std::shared_mutex`: shared lock for `get`/`contains`/`stats`/`scan`/`scanRange`, exclusive lock for mutations, so concurrent readers no longer block each other
- `lazy_deletes` implemented: `false` → `directRemove()` descends to the leaf and calls `clearBufferedOpsForKey()` on the path to prevent buffered ops from resurrecting the key after a flush; `true` (default) → tombstone buffered lazily
- `writeAmplification()` always ≥ 1.0: the single-leaf fast path now counts bytes in `stat_internal_bytes` via `applyOpToLeaf`; stat doc updated
- `size()` exact: `doGet()` is checked before buffering each PUT; REMOVE always decrements (existence pre-verified); `applyOpToLeaf` intentionally does not touch `stat_live_entries` to avoid double-counting during flush propagation
- `fanout` enforced: `doOneInternalSplit` + `fixAllInternalOverflows` recursively split overfull internal nodes after every `doInsertOp`, including root growth
- `scan()`/`scanRange()` lock scope: snapshot under the shared lock, invoke the callback outside it, so scans are deadlock-free when writers are called from within the callback
- `flushNode` OOB fix: `child_ops` (sized to the original child count) was indexed past bounds after `maybeSplitChild` inserted right-leaf siblings; fixed with a `splits_so_far` offset

Tests (`tests/test_wom_tree.cpp`, 40 tests, `WomTreeFocusedTests`)

All 5 acceptance criteria covered plus dedicated tests for each review fix:
- `AC1_WriteAmplification_BelowLSMBaseline`, `WriteAmplification_AtLeastOneAfterSinglePut`
- `UpdateHeavy_ManyKeys_CorrectAfterOverwrite`
- `AC3_LazyFlush_*`, `AC3_Compact_ForcesAllBuffersToLeaves`
- `AC4_StatsExpose_InternalBufferBytes`
- `AC5_PointRead_MultiLevelTraversal_StillCorrect`
- `lazy_deletes=false`: `LazyDeletesFalse_*` (3 tests)
- `size()` accuracy: `Size_AccurateWithBufferedOps`, `Size_AccurateAfterBufferedRemove`
- `FanoutEnforced_InternalNodesSplitCorrectly`, `FanoutEnforced_TreeHeightBounded`
- `Scan_CallbackInvokedOutsideLock_NoDeadlock`
- `ConcurrentReaders_NoBlockingEachOther`

All 40 pass under AddressSanitizer + UBSanitizer, zero warnings.
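A minimal sketch of the `lazy_deletes=false` path that the `LazyDeletesFalse_*` tests target, assuming a two-level path (one root buffer above one older child buffer) and a flat leaf, with only eager deletes modeled. `ToyEagerDeleteTree` is hypothetical, and `clearBufferedOpsForKey` borrows the PR's helper name for an assumed body. The point is that erasing the key from the leaf alone is not enough: an older PUT still sitting in a buffer would reappear on the next flush.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of an eager delete: drop every buffered op for the key
// on the root-to-leaf path, then erase it from the leaf.
class ToyEagerDeleteTree {
public:
    void put(const std::string& key, const std::string& value) {
        root_buf_.push_back({key, value});
    }

    // Moves root messages below the (older) child-buffer contents.
    void pushDown() {
        child_buf_.insert(child_buf_.end(), root_buf_.begin(), root_buf_.end());
        root_buf_.clear();
    }

    // Applies everything still buffered to the leaf, oldest first.
    void flushAll() {
        pushDown();
        for (const auto& op : child_buf_) leaf_[op.key] = op.value;
        child_buf_.clear();
    }

    void directRemove(const std::string& key) {
        clearBufferedOpsForKey(root_buf_, key);
        clearBufferedOpsForKey(child_buf_, key);
        leaf_.erase(key);
    }

    std::optional<std::string> get(const std::string& key) const {
        for (const std::vector<Op>* buf : {&root_buf_, &child_buf_}) {  // newest level first
            for (auto it = buf->rbegin(); it != buf->rend(); ++it) {
                if (it->key == key) return it->value;
            }
        }
        auto found = leaf_.find(key);
        if (found == leaf_.end()) return std::nullopt;
        return found->second;
    }

private:
    struct Op {
        std::string key;
        std::string value;
    };

    static void clearBufferedOpsForKey(std::vector<Op>& buf, const std::string& key) {
        buf.erase(std::remove_if(buf.begin(), buf.end(),
                                 [&](const Op& op) { return op.key == key; }),
                  buf.end());
    }

    std::vector<Op> root_buf_;
    std::vector<Op> child_buf_;
    std::map<std::string, std::string> leaf_;
};
```

Compared with buffering a tombstone, the eager path pays for a walk over every buffer on the path at delete time, but afterwards no read or flush has to reconcile the deleted key.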
Build & CI
- `cmake/CMakeLists.txt`: `wom_tree.cpp` → `THEMIS_CORE_SOURCES`
- `tests/CMakeLists.txt`: `test_wom_tree_focused` target
- `.github/workflows/wom-tree-ci.yml`: path-scoped CI on GCC-12/Clang-15/GCC-13 using standard composite actions

Docs
- `src/storage/ROADMAP.md`: WOM Tree marked `[x]`
- `src/storage/FUTURE_ENHANCEMENTS.md`: status → ✅ Implemented (v1.8.0)

Type of Change
Testing
📚 Research & Knowledge (if applicable)
- `/docs/research/` created?
- `/docs/research/implementation_influence/` entry added?

Relevant sources:
Checklist