
feat: support rangebitmap read and write #185

Open
fafacao86 wants to merge 13 commits into alibaba:main from fafacao86:rangebitmap-inte


@fafacao86
Contributor

Purpose

Linked issue: close #146

Tests

UT in range_bitmap_file_index_test.cpp

  1. Functional tests: RangeBitmap read and write for single-chunk, multi-chunk, and different data types.
  2. Different query patterns: EQ, GT, LT, GTE, LTE, and ISNULL, querying both existing and non-existing data.
  3. Different data patterns: normal numbers mixed with NULLs, floats with -0.0, +0.0, NaN, etc.
  4. Edge cases: an empty RangeBitmap and a RangeBitmap containing only NULLs.
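
The float cases in item 3 are worth testing because of how IEEE 754 comparisons behave in C++. The sketch below only illustrates that language-level behavior; `BitsOf`, `NaiveEquals`, and `NaiveGreaterThan` are hypothetical helpers written for this note and are not taken from the RangeBitmap implementation, whose handling of these values is not shown here.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: copy a double's bytes into an integer to inspect its encoding.
inline uint64_t BitsOf(double v) {
    uint64_t bits = 0;
    std::memcpy(&bits, &v, sizeof(bits));
    return bits;
}

// Hypothetical helper: EQ as a plain scan would evaluate it.
inline bool NaiveEquals(double stored, double literal) {
    return stored == literal;
}

// Hypothetical helper: GT as a plain scan would evaluate it.
inline bool NaiveGreaterThan(double stored, double literal) {
    return stored > literal;
}
```

With these helpers, `NaiveEquals(-0.0, 0.0)` is true even though `BitsOf(-0.0)` and `BitsOf(0.0)` differ, and every comparison involving NaN is false. An index that encodes values by bit pattern or by sort order therefore has to decide explicitly how to treat both cases, which is presumably why the tests cover them.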

IT in paimon::test::ReadInteWithIndexTest::CheckResultForRangeBitmap
The test data is generated using paimon-java v1.3.1.
The same data and the same queries are run against single-chunk and multi-chunk indexes; the results should be identical.

Tests were mainly written by AI and reviewed by a human.
Test coverage:
Coverage of range_bitmap_file_index.cpp is a little low (82.7%) because there is no write integration test covering the CreateWriter method.

| File | Line coverage | Lines hit / total | Function coverage | Functions hit / total |
| --- | --- | --- | --- | --- |
| [range_bitmap.cpp](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap.cpp.gcov.html) | 96.2% | 203 / 211 | 100.0% | 19 / 19 |
| [range_bitmap.h](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap.h.gcov.html) | 100.0% | 5 / 5 | 100.0% | 2 / 2 |
| [range_bitmap_file_index.cpp](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index.cpp.gcov.html) | 82.7% | 110 / 133 | 96.6% | 28 / 29 |
| [range_bitmap_file_index.h](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index.h.gcov.html) | 100.0% | 1 / 1 | 100.0% | 2 / 2 |
| [range_bitmap_file_index_factory.cpp](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index_factory.cpp.gcov.html) | 100.0% | 4 / 4 | 100.0% | 4 / 4 |
| [range_bitmap_file_index_factory.h](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index_factory.h.gcov.html) | 100.0% | 2 / 2 | 100.0% | 1 / 1 |
| [range_bitmap_file_index_test.cpp](http://172.16.25.167:8000/coverage/common/file_index/rangebitmap/range_bitmap_file_index_test.cpp.gcov.html) | 99.7% | 317 / 318 | 100.0% | 61 / 61 |

API and Format

Documentation

Generative AI tooling

Generated-by: Kimi K2.5

@zjw1111 zjw1111 requested a review from Copilot March 20, 2026 10:02

Copilot AI left a comment


Pull request overview

Adds RangeBitmap file index support (read/write) and validates it with new unit/integration tests plus embedded test datasets to close #146.

Changes:

  • Implement RangeBitmap file index reader/writer and factory registration.
  • Add comprehensive UTs for RangeBitmap behavior across types and edge cases, plus IT coverage using Paimon-generated datasets.
  • Add ORC/Parquet test datasets (single-chunk and multi-chunk) with range-bitmap index metadata.

Reviewed changes

Copilot reviewed 31 out of 51 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/snapshot-1` | Adds Parquet multi-chunk snapshot metadata for ITs. |
| `test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/LATEST` | Adds Parquet multi-chunk latest snapshot pointer. |
| `test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/EARLIEST` | Adds Parquet multi-chunk earliest snapshot pointer. |
| `test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/schema/schema-0` | Adds schema/options enabling range-bitmap with small chunk size for multi-chunk behavior. |
| `test/test_data/parquet/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/README` | Documents Parquet multi-chunk dataset rows and index config. |
| `test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/snapshot-1` | Adds Parquet single-chunk snapshot metadata for ITs. |
| `test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/LATEST` | Adds Parquet single-chunk latest snapshot pointer. |
| `test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/EARLIEST` | Adds Parquet single-chunk earliest snapshot pointer. |
| `test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/schema/schema-0` | Adds schema/options enabling range-bitmap for single-chunk case. |
| `test/test_data/parquet/append_with_rangebitmap.db/append_with_rangebitmap/README` | Documents Parquet single-chunk dataset rows and index config. |
| `test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/snapshot-1` | Adds ORC multi-chunk snapshot metadata for ITs. |
| `test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/LATEST` | Adds ORC multi-chunk latest snapshot pointer. |
| `test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/snapshot/EARLIEST` | Adds ORC multi-chunk earliest snapshot pointer. |
| `test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/schema/schema-0` | Adds schema/options enabling range-bitmap + ORC format + small chunk size for multi-chunk behavior. |
| `test/test_data/orc/append_with_rangebitmap_multi_chunk.db/append_with_rangebitmap_multi_chunk/README` | Documents ORC multi-chunk dataset rows and index config. |
| `test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/snapshot-1` | Adds ORC single-chunk snapshot metadata for ITs. |
| `test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/LATEST` | Adds ORC single-chunk latest snapshot pointer. |
| `test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/snapshot/EARLIEST` | Adds ORC single-chunk earliest snapshot pointer. |
| `test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/schema/schema-0` | Adds schema/options enabling range-bitmap + ORC format for single-chunk case. |
| `test/test_data/orc/append_with_rangebitmap.db/append_with_rangebitmap/README` | Documents ORC single-chunk dataset rows and index config. |
| `test/inte/read_inte_with_index_test.cpp` | Adds IT assertions for RangeBitmap index across predicates and data patterns (single/multi-chunk). |
| `src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_test.cpp` | Adds UTs that roundtrip writer/reader and validate predicate behavior and edge cases. |
| `src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_factory.h` | Declares factory for RangeBitmap file index. |
| `src/paimon/common/file_index/rangebitmap/range_bitmap_file_index_factory.cpp` | Implements factory creation and registration. |
| `src/paimon/common/file_index/rangebitmap/range_bitmap_file_index.h` | Declares RangeBitmap file index, reader, and writer APIs. |
| `src/paimon/common/file_index/rangebitmap/range_bitmap_file_index.cpp` | Implements index reader/writer based on RangeBitmap serialization. |
| `src/paimon/common/file_index/rangebitmap/range_bitmap.h` | Declares RangeBitmap query API and append/serialize builder. |
| `src/paimon/common/file_index/rangebitmap/range_bitmap.cpp` | Implements RangeBitmap read path and serialization format. |
| `src/paimon/common/file_index/rangebitmap/dictionary/key_factory.h` | Renames default chunk size constant to match naming conventions. |
| `src/paimon/common/file_index/CMakeLists.txt` | Adds RangeBitmap sources to build. |
| `src/paimon/CMakeLists.txt` | Registers new RangeBitmap unit test in test build. |


Comment on lines +260 to +262
    PAIMON_ASSIGN_OR_RAISE(auto dictionary,
                           ChunkedDictionary::Appender::Create(
                               factory_, static_cast<int32_t>(chunk_size_bytes_limit_), pool_));

Copilot AI Mar 20, 2026


This narrows chunk_size_bytes_limit_ from int64_t to int32_t without validation. If a caller configures a chunk size > INT32_MAX, this can overflow and break dictionary chunking. Consider validating the value and returning Status::Invalid when it exceeds std::numeric_limits<int32_t>::max() (or update the downstream API to accept int64_t).
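
One way to act on this suggestion is a range check before the cast. The following is a minimal sketch under stated assumptions: `CheckedNarrowToInt32` is a hypothetical helper, not a function from this PR or from Paimon, and it reports failure through `std::optional` only to stay self-contained.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper, not part of the PR: returns the value as int32_t only
// when it fits in the int32_t range; returns std::nullopt otherwise so the
// caller can report an invalid configuration instead of silently truncating.
inline std::optional<int32_t> CheckedNarrowToInt32(int64_t value) {
    if (value < std::numeric_limits<int32_t>::min() ||
        value > std::numeric_limits<int32_t>::max()) {
        return std::nullopt;
    }
    return static_cast<int32_t>(value);
}
```

In the writer, an empty result would presumably be mapped to the `Status::Invalid` error the comment mentions; the exact call is left out here because the project's Status API is not shown in this conversation.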

@lxy-9602
Collaborator

@fafacao86 Thanks for the contribution!
We're currently improving our code quality standards, and we'd appreciate it if you could add an I/O exception test for the range index before the code is merged.

You can refer to issue #188 and take a look at this example test as a pattern.

The goal is to ensure the range index gracefully handles I/O failures (e.g., read errors, file not found, etc.), similar to how SstFileIoTest does.

Let me know if you need any help — happy to assist! Thanks!

@fafacao86
Contributor Author

> @fafacao86 Thanks for the contribution! We're currently improving our code quality standards, and we'd appreciate it if you could add an I/O exception test for the range index before the code is merged.

Sure, let me take a look.

@fafacao86
Contributor Author

Is my understanding of the purpose of TEST_F(SstFileIOTest, TestIOException) correct?
It triggers I/O failures at every possible point in the code (the 1st, 2nd, 3rd... I/O operation) to ensure that, when a disk read fails or a network connection drops, the code returns an error gracefully instead of crashing? And the expectation is that the test does not crash and the error message contains I/O information?
@lxy-9602
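
The pattern described in this question can be sketched roughly as follows. This illustrates the idea only and is not the project's `SstFileIOTest`: `SimpleStatus`, `FaultyInput`, `LoadIndex`, and `SweepFailurePoints` are all hypothetical names, and the "index load" is a toy that performs three fixed-size reads.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for a status type; the project's real Status class is not reproduced.
struct SimpleStatus {
    bool ok;
    std::string message;
};

// Hypothetical input that fails on the N-th read (1-based); 0 disables injection.
class FaultyInput {
 public:
    FaultyInput(std::vector<char> data, int fail_on_call)
        : data_(std::move(data)), fail_on_call_(fail_on_call) {}

    SimpleStatus Read(size_t offset, size_t length, std::string* out) {
        ++calls_;
        if (fail_on_call_ != 0 && calls_ == fail_on_call_) {
            return {false, "IO error: injected failure on read #" + std::to_string(calls_)};
        }
        if (offset + length > data_.size()) {
            return {false, "IO error: read past end of file"};
        }
        out->assign(data_.data() + offset, length);
        return {true, ""};
    }

 private:
    std::vector<char> data_;
    int fail_on_call_;
    int calls_ = 0;
};

// Toy "index load" issuing three 4-byte reads; errors are propagated, never ignored.
inline SimpleStatus LoadIndex(FaultyInput* in, std::string* result) {
    const size_t offsets[] = {0, 4, 8};
    std::string part;
    for (size_t offset : offsets) {
        SimpleStatus s = in->Read(offset, 4, &part);
        if (!s.ok) return s;
        result->append(part);
    }
    return {true, ""};
}

// Move the failure point forward until the load succeeds. Returns the number of
// failure points exercised, or -1 if an error lacked I/O context or the load
// never succeeded within max_points attempts.
inline int SweepFailurePoints(const std::vector<char>& data, int max_points = 1000) {
    for (int n = 1; n <= max_points; ++n) {
        FaultyInput in(data, n);
        std::string result;
        SimpleStatus s = LoadIndex(&in, &result);
        if (s.ok) return n - 1;
        if (s.message.find("IO error") == std::string::npos) return -1;
    }
    return -1;
}
```

A test built this way would call `SweepFailurePoints` on valid data and assert that every injected failure surfaced as an error carrying I/O context before the load finally succeeded.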

fafacao86 and others added 2 commits March 23, 2026 17:11
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

ASSERT_OK(in->Close());
}

Contributor Author


Rangebitmap IO exception test

Development

Successfully merging this pull request may close these issues.

[Feature] Support RangeBitmap File Index

3 participants