
fix: avoid FixedSizeList overflow in hash join build materialization#21023

Open
kumarUjjawal wants to merge 1 commit into apache:main from kumarUjjawal:fix/#20673


@kumarUjjawal
Contributor

Which issue does this PR close?

Closes #20673.

Rationale for this change

Issue #20673 describes a panic in FULL OUTER JOIN when the build side includes a FixedSizeList column and the total element count exceeds u32::MAX.

The problem was caused by HashJoinExec concatenating the entire build side into a single batch. For FixedSizeList, this leads to take() hitting Arrow’s internal u32 indexing limit and panicking.
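
To make the overflow condition concrete, here is a minimal arithmetic sketch. It assumes, as the issue describes, that the limit applies to the total number of child elements (rows times list size) in the concatenated FixedSizeList column. The function name `total_child_elements` is hypothetical and is not part of Arrow or DataFusion.

```rust
/// Hypothetical helper: returns the total number of child elements in a
/// FixedSizeList column with `num_rows` rows of `list_size` elements each,
/// or `None` when that count no longer fits in a `u32` index.
fn total_child_elements(num_rows: u64, list_size: u64) -> Option<u32> {
    num_rows
        // Guard against u64 overflow in the multiplication itself.
        .checked_mul(list_size)
        // Reject any total that cannot be represented as a u32.
        .and_then(|total| u32::try_from(total).ok())
}
```

For example, 2^22 rows with a list size of 2^10 give 2^32 child elements, one more than `u32::MAX`, so the helper returns `None`. Neither the row count nor the list size is large on its own; only the product exceeds the limit.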

The fix removes the single-batch assumption. Instead, the build side remains batched, and logical row indices are mapped back to their original batches. This avoids creating oversized FixedSizeListArray instances and prevents the panic.
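
One way to map a logical row index back to its original batch is a prefix sum over batch lengths plus a binary search. The sketch below is illustrative only: it uses plain `usize` lengths instead of Arrow `RecordBatch` values, and the names `batch_offsets` and `locate_row` are hypothetical, not the helpers added in this PR.

```rust
/// Hypothetical helper: cumulative row offsets for a sequence of batches.
/// For lengths [3, 0, 2] this returns [0, 3, 3, 5].
fn batch_offsets(batch_lens: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(batch_lens.len() + 1);
    let mut total = 0usize;
    offsets.push(total);
    for &len in batch_lens {
        total += len;
        offsets.push(total);
    }
    offsets
}

/// Hypothetical helper: maps a logical build-side row index to
/// `(batch index, row index within that batch)`.
/// Returns `None` when `row` is past the last build-side row.
fn locate_row(offsets: &[usize], row: usize) -> Option<(usize, usize)> {
    let total = *offsets.last()?;
    if row >= total {
        return None;
    }
    // `partition_point` counts the offsets that are <= row. The last such
    // offset is the start of the batch containing `row`, which also skips
    // over empty batches that share the same start offset.
    let batch = offsets.partition_point(|&start| start <= row) - 1;
    Some((batch, row - offsets[batch]))
}
```

With batch lengths `[3, 0, 2]`, logical row 3 resolves to batch 2, local row 0, because the empty middle batch contributes no rows. Each lookup costs O(log n) in the number of batches and never needs a concatenated array.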

What changes are included in this PR?

Fix this by keeping the build side in logical batch order instead of materializing one giant RecordBatch. Add batch-aware helpers for build-side row materialization, join-key equality checks, join filters, and final unmatched-row output, while preserving existing join semantics.
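
As a rough illustration of batch-aware unmatched-row output, the sketch below tracks one "visited" flag per build-side row, grouped by batch, and collects the rows that no probe row matched. This is an assumption about the general shape of the approach, not the PR's code: DataFusion's actual visited tracking and output path may differ, and `unmatched_rows` is a hypothetical name.

```rust
/// Hypothetical helper: given per-batch visited flags, returns the
/// `(batch index, row index within batch)` pairs of build-side rows that
/// were never matched, in logical batch order.
fn unmatched_rows(visited: &[Vec<bool>]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (batch, flags) in visited.iter().enumerate() {
        for (row, &seen) in flags.iter().enumerate() {
            if !seen {
                // Emit a batch-local position, so no single index ever has
                // to address the whole build side at once.
                out.push((batch, row));
            }
        }
    }
    out
}
```

Because the result is grouped by batch, unmatched rows can be emitted one source batch at a time, which keeps each output array within the size of its source batch.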

Also add regression coverage for large FixedSizeList build-side indices and unmatched-row output, plus unit tests for the new batch helpers.

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Mar 18, 2026

Labels

physical-plan Changes to the physical-plan crate

Development

Successfully merging this pull request may close these issues.

Panic in FULL OUTER JOIN with schema of (u32, FSL) when number of rows and size of FSL exceed u32::MAX and target_partition=1