Skip to content

Conversation

@lyne7-sc
Copy link
Contributor

@lyne7-sc lyne7-sc commented Feb 1, 2026

Which issue does this PR close?

Rationale for this change

The previous implementation of array_repeat relied on Arrow defaults when handling null and negative count values. As a result, null counts were implicitly treated as zero and returned empty arrays, which is a correctness issue.

This PR makes the handling of these edge cases explicit and aligns the function with SQL null semantics.

What changes are included in this PR?

  • Explicit handling of null and negative count values
  • Planner-time coercion of the count argument to Int64

Are these changes tested?

Yes, SLTs added and pass.

Are there any user-facing changes?

Yes. When the count value is null, array_repeat now returns a null array instead of an empty array.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 1, 2026
OffsetBuffer::new(offsets.into()),
repeated_values,
None,
Some(NullBuffer::new(nulls.finish())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can copy the nulls buffer from the count array instead of using a builder

Suggested change
Some(NullBuffer::new(nulls.finish())),
count_array.nulls().cloned(),

let total_repeated_values: usize = count_array
.values()
.iter()
.map(|&c| if c > 0 { c as usize } else { 0 })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: technically the spec allows null slots to have non-0 values, so this could overestimate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible fix:

let total_repeated_values: usize = (0..count_array.len())
    .map(|i| get_count_with_validity(count_array, i).0)
    .sum();

),
Arc::new(inner_list),
None,
Some(NullBuffer::new(outer_nulls.finish())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note here about reusing input null buffer

let total_repeated_values: usize = count_array
.values()
.iter()
.map(|&c| if c > 0 { c as usize } else { 0 })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible fix:

let total_repeated_values: usize = (0..count_array.len())
    .map(|i| get_count_with_validity(count_array, i).0)
    .sum();

count_array: &UInt64Array,
count_array: &Int64Array,
) -> Result<ArrayRef> {
let counts = count_array.values();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also does not check for NULLs in the count_array and may lead to overestimates. You need to use get_count_with_validity() at line 245 too

let (count, is_valid) = get_count_with_validity(count_array, idx);

running_offset += count;
offsets.push(O::from_usize(running_offset).unwrap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extreme case:

  1. if the input values is ListArray, then its offset type will be i32
  2. and if the count value is bigger than i32::MAX
  3. then i32::from_usize(i32::MAX + 1) will return None and it will panic

lyne7-sc and others added 4 commits February 2, 2026 22:50
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
@lyne7-sc
Copy link
Contributor Author

lyne7-sc commented Feb 2, 2026

Thanks for the review.
The code has been updated to handle overflow cases and and fix the capacity pre-calculation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

array_repeat function does not handle null values in count array

3 participants