Implement comparisons for RunArray. by brunal · Pull Request #9448 · apache/arrow-rs

brunal · 2026-02-20T10:25:13Z

This MR implements efficient eq, neq, distinct, not distinct, gt, lt, ... for 2 RunArrays with the same DataTypes & length.

The idea is to:

1. Compute all indices into the values arrays where the comparison must be performed.

This is the union of the run-ends.

For example, given 2 RunArrays with run-end values:
[3, 4, 10]
and [2, 5, 10]

The union of their run-ends is
[2, 3, 4, 5, 10]

The corresponding indices into the values array of each RunArray are:
[0, 0, 1, 2, 2]
and [0, 1, 1, 1, 2]

2. Use apply_op_vectored() to perform the operation on the values arrays at those indices.

3. Finally take nulls into account.

4. Build a BooleanArray from the result + the null mask.
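
To make the first step concrete, here is a minimal sketch of merging two run-end lists with a two-pointer walk. It is not the code in this PR: the function name merge_run_ends is hypothetical, and it works on plain usize slices rather than on RunArray buffers. It assumes both inputs are strictly increasing and end at the same logical length.

```rust
/// Merge two sorted run-end lists. For every merged run end, also record
/// which physical value of each input covers that segment.
///
/// Hypothetical helper for illustration only; assumes both slices are
/// strictly increasing and share the same last element.
fn merge_run_ends(left: &[usize], right: &[usize]) -> (Vec<usize>, Vec<usize>, Vec<usize>) {
    let mut ends = Vec::new();
    let mut left_idx = Vec::new();
    let mut right_idx = Vec::new();
    let (mut i, mut j) = (0, 0);

    while i < left.len() && j < right.len() {
        // The next segment stops at whichever run finishes first.
        let end = left[i].min(right[j]);
        ends.push(end);
        left_idx.push(i);
        right_idx.push(j);

        // Advance every input whose current run stops exactly here.
        if left[i] == end {
            i += 1;
        }
        if right[j] == end {
            j += 1;
        }
    }
    (ends, left_idx, right_idx)
}
```

Calling merge_run_ends(&[3, 4, 10], &[2, 5, 10]) yields the merged run-ends and the two index lists from the example above. The comparison then only needs one evaluation per merged segment instead of one per logical row.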

Implementation thoughts:

A. Returning a RunArray instead of a BooleanArray would be interesting. This can be more efficient: a RunArray (with values being a BooleanBuffer) would have a length in [1; len(input RunArray) * 2] and can be efficiently constructed. This would require introducing new pub functions: distinct_run_array, eq_run_array, etc.

B. The operation is performed on all indices before looking at the nulls. With sparse (null-heavy) arrays this is wasteful. It might be worth skipping the computation when either side is null and then splicing results from non-null and null indices.

C. There's a bit of copy-paste for downcast_primitive_array!() usage. I could only skip that by introducing a new macro, which didn't seem desirable.

D. I find the lack of a value type for a fully typed run array annoying. Array and RunArray<I> are value types, but TypedRunArray<'_, I, V> is a reference type. This is frustrating. Some type contracts are only comments, and not enforced by the type system.
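
Thoughts A and B can be illustrated with a rough sketch on plain slices. Everything here is an assumption for illustration: compare_valid_pairs and coalesce_runs are hypothetical names, values are modelled as Option<i64>, and null handling is the simple propagating kind (as for eq), not the distinct / not distinct semantics.

```rust
/// Thought B: evaluate `op` only where both sides are non-null.
/// `left_idx` / `right_idx` are the per-segment indices into the values.
fn compare_valid_pairs(
    left: &[Option<i64>],
    right: &[Option<i64>],
    left_idx: &[usize],
    right_idx: &[usize],
    op: impl Fn(i64, i64) -> bool,
) -> Vec<Option<bool>> {
    left_idx
        .iter()
        .zip(right_idx)
        .map(|(&l, &r)| match (left[l], right[r]) {
            (Some(a), Some(b)) => Some(op(a, b)),
            // Either side is null: skip the comparison entirely.
            _ => None,
        })
        .collect()
}

/// Thought A: turn per-segment results into a run-encoded output by
/// folding adjacent segments that carry the same result into one run.
fn coalesce_runs(ends: &[usize], results: &[Option<bool>]) -> (Vec<usize>, Vec<Option<bool>>) {
    assert_eq!(ends.len(), results.len());
    let mut out_ends: Vec<usize> = Vec::new();
    let mut out_values: Vec<Option<bool>> = Vec::new();

    for (&end, &value) in ends.iter().zip(results) {
        if out_values.last() == Some(&value) {
            // Same result as the previous segment: extend the current run.
            *out_ends.last_mut().unwrap() = end;
        } else {
            out_ends.push(end);
            out_values.push(value);
        }
    }
    (out_ends, out_values)
}
```

In this sketch the output never has more runs than there are merged segments, which is consistent with the bound mentioned in A, and the number of comparisons depends on the number of runs rather than on the logical length.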

This feature is tracked in #3520.


github-actions bot added the arrow (Changes to the arrow crate) label Feb 20, 2026

brunal added 4 commits February 20, 2026 14:53

clippy + rename tests

20f4ee7

fix var names

6db8de4

Simplify matches

bc3904f

improve comment

8bb9e86

brunal marked this pull request as ready for review February 21, 2026 08:43

brunal mentioned this pull request Feb 21, 2026

Implements Sum,sum_checked,min,max,is Distict,inverse for REE. #7933

Open

Fix max_size

ca818af

brunal wants to merge 6 commits into apache:main from brunal:ree-cmp
