feat: support rangebitmap read and write #185
fafacao86 wants to merge 13 commits into alibaba:main
Pull request overview
Adds RangeBitmap file index support (read/write) and validates it with new unit/integration tests plus embedded test datasets to close #146.
Changes:
- Implement RangeBitmap file index reader/writer and factory registration.
- Add comprehensive UTs for RangeBitmap behavior across types and edge cases, plus IT coverage using Paimon-generated datasets.
- Add ORC/Parquet test datasets (single-chunk and multi-chunk) with range-bitmap index metadata.
Reviewed changes
Copilot reviewed 31 out of 51 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/snapshot-1 | Adds Parquet multi-chunk snapshot metadata for ITs. |
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/LATEST | Adds Parquet multi-chunk latest snapshot pointer. |
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/EARLIEST | Adds Parquet multi-chunk earliest snapshot pointer. |
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/schema/schema-0 | Adds schema/options enabling range-bitmap with small chunk size for multi-chunk behavior. |
| test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/README | Documents Parquet multi-chunk dataset rows and index config. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/snapshot-1 | Adds Parquet single-chunk snapshot metadata for ITs. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/LATEST | Adds Parquet single-chunk latest snapshot pointer. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/EARLIEST | Adds Parquet single-chunk earliest snapshot pointer. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/schema/schema-0 | Adds schema/options enabling range-bitmap for single-chunk case. |
| test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/README | Documents Parquet single-chunk dataset rows and index config. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/snapshot-1 | Adds ORC multi-chunk snapshot metadata for ITs. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/LATEST | Adds ORC multi-chunk latest snapshot pointer. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/EARLIEST | Adds ORC multi-chunk earliest snapshot pointer. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/schema/schema-0 | Adds schema/options enabling range-bitmap + ORC format + small chunk size for multi-chunk behavior. |
| test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/README | Documents ORC multi-chunk dataset rows and index config. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/snapshot-1 | Adds ORC single-chunk snapshot metadata for ITs. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/LATEST | Adds ORC single-chunk latest snapshot pointer. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/EARLIEST | Adds ORC single-chunk earliest snapshot pointer. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/schema/schema-0 | Adds schema/options enabling range-bitmap + ORC format for single-chunk case. |
| test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/README | Documents ORC single-chunk dataset rows and index config. |
| test/inte/read_inte_with_index_test.cpp | Adds IT assertions for RangeBitmap index across predicates and data patterns (single/multi-chunk). |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_test.cpp | Adds UTs that roundtrip writer/reader and validate predicate behavior and edge cases. |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_factory.h | Declares factory for RangeBitmap file index. |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_factory.cpp | Implements factory creation and registration. |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index.h | Declares RangeBitmap file index, reader, and writer APIs. |
| src/paimon/common/file_index/rangebitmap/range_bitmap_file_index.cpp | Implements index reader/writer based on RangeBitmap serialization. |
| src/paimon/common/file_index/rangebitmap/range_bitmap.h | Declares RangeBitmap query API and append/serialize builder. |
| src/paimon/common/file_index/rangebitmap/range_bitmap.cpp | Implements RangeBitmap read path and serialization format. |
| src/paimon/common/file_index/rangebitmap/dictionary/key_factory.h | Renames default chunk size constant to match naming conventions. |
| src/paimon/common/file_index/CMakeLists.txt | Adds RangeBitmap sources to build. |
| src/paimon/CMakeLists.txt | Registers new RangeBitmap unit test in test build. |
    PAIMON_ASSIGN_OR_RAISE(auto dictionary,
                           ChunkedDictionary::Appender::Create(
                               factory_, static_cast<int32_t>(chunk_size_bytes_limit_), pool_));
This narrows chunk_size_bytes_limit_ from int64_t to int32_t without validation. If a caller configures a chunk size > INT32_MAX, this can overflow and break dictionary chunking. Consider validating the value and returning Status::Invalid when it exceeds std::numeric_limits<int32_t>::max() (or update the downstream API to accept int64_t).
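The validation suggested above could look roughly like the following. This is a minimal sketch, not code from the PR: `CheckedNarrowToInt32` is a hypothetical helper name, and it reports failure through `std::optional` so the example stays self-contained instead of depending on the project's `Status` type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper (not part of the PR): returns the value as int32_t when
// it fits, or std::nullopt when narrowing would change the value.
inline std::optional<int32_t> CheckedNarrowToInt32(int64_t value) {
    if (value < std::numeric_limits<int32_t>::min() ||
        value > std::numeric_limits<int32_t>::max()) {
        return std::nullopt;  // caller decides how to report the error
    }
    return static_cast<int32_t>(value);
}
```

In the writer, an empty result would presumably be translated into `Status::Invalid` before `ChunkedDictionary::Appender::Create` is called, so an oversized chunk size is rejected instead of silently wrapping around.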
@fafacao86 Thanks for the contribution! You can refer to issue #188 and take a look at this example test as a pattern. The goal is to ensure the range index gracefully handles I/O failures (e.g., read errors, file not found, etc.), similar to how [...] Let me know if you need any help — happy to assist! Thanks!
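For readers unfamiliar with this kind of test, the idea of checking I/O failure handling can be sketched with a stream that starts failing after a configurable number of reads. Everything below is an illustrative assumption: `IoResult`, `FailingStream`, `LoadToyIndex`, and `CountFailedAttempts` are hypothetical names, and this is not the example test referenced in the comment nor the project's actual test utilities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; not the paimon-cpp Status class.
struct IoResult {
    bool ok;
    std::string message;
};

// In-memory stream that fails once `reads_before_failure` reads have succeeded.
class FailingStream {
 public:
    FailingStream(std::vector<uint8_t> data, int reads_before_failure)
        : data_(std::move(data)), reads_left_(reads_before_failure) {}

    IoResult Read(uint8_t* out, size_t length) {
        if (reads_left_ <= 0) return {false, "injected read failure"};
        --reads_left_;
        if (pos_ + length > data_.size()) return {false, "read past end"};
        std::memcpy(out, data_.data() + pos_, length);
        pos_ += length;
        return {true, ""};
    }

 private:
    std::vector<uint8_t> data_;
    size_t pos_ = 0;
    int reads_left_;
};

// Toy "index reader": reads a 4-byte header, then a 4-byte body, and
// propagates the first error instead of crashing or ignoring it.
inline IoResult LoadToyIndex(FailingStream& in) {
    uint8_t buf[4];
    IoResult header = in.Read(buf, sizeof(buf));
    if (!header.ok) return header;
    return in.Read(buf, sizeof(buf));
}

// Injects a failure at every read position from 0 to max_reads and counts
// how many attempts reported an error.
inline int CountFailedAttempts(const std::vector<uint8_t>& data, int max_reads) {
    int failed = 0;
    for (int n = 0; n <= max_reads; ++n) {
        FailingStream in(data, n);
        if (!LoadToyIndex(in).ok) ++failed;
    }
    return failed;
}
```

The loop over failure positions is the essential part: each run must either succeed or return an error cleanly, and only the run that allows all reads should succeed.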
Sure, let me take a look.
Is my understanding of the test purpose of
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
        ASSERT_OK(in->Close());
    }
Rangebitmap IO exception test
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Purpose
Linked issue: close #146
Tests
- UT in range_bitmap_file_index_test.cpp
- IT in paimon::test::ReadInteWithIndexTest::CheckResultForRangeBitmap
- Data is generated using paimon-java v1.3.1.
- The same data and the same queries are run against single-chunk and multi-chunk datasets; the results should be the same.
- Tests are mainly written by AI and reviewed by a human.

Test coverage:
Coverage of range_bitmap_file_index.cpp is a little low (82.7%) because there is no write integration test covering the CreateWriter method.
API and Format
Documentation
Generative AI tooling
Generated-by: Kimi K2.5