Skip to content

Improve memory accounting for ArrowBytesViewMap#20077

Merged
alamb merged 3 commits intoapache:mainfrom
vigneshsiva11:improve-arrowbytesviewmap-memory
Feb 2, 2026
Merged

Improve memory accounting for ArrowBytesViewMap#20077
alamb merged 3 commits intoapache:mainfrom
vigneshsiva11:improve-arrowbytesviewmap-memory

Conversation

@vigneshsiva11
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

ArrowBytesViewMap was previously accounting for the logical number of null bits when reporting memory usage. This under-reported memory consumption is because NullBufferBuilder may allocate more memory than is currently used.

Memory accounting in DataFusion is expected to reflect allocated memory rather than logical usage to ensure accurate memory tracking.

What changes are included in this PR?

  • Update ArrowBytesViewMap::size to use NullBufferBuilder::allocated_size instead of calculating size from the number of used null bits.

Are these changes tested?

  • Yes. Existing tests were run:
    • cargo test -p datafusion-physical-expr-common

Are there any user-facing changes?

  • No. This change only affects internal memory accounting and does not alter query behavior or public APIs.

Copilot AI review requested due to automatic review settings January 30, 2026 15:51
@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Jan 30, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to improve memory accounting for ArrowBytesViewMap by changing from calculating null buffer size based on logical length (self.nulls.len() / 8) to using the allocated size (self.nulls.allocated_size()). The intent is to account for allocated memory rather than just the used memory, which aligns with DataFusion's memory accounting goals.

Changes:

  • Modified ArrowBytesViewMap::size() method to use NullBufferBuilder::allocated_size() for calculating null buffer memory usage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread datafusion/physical-expr-common/src/binary_view_map.rs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @vigneshsiva11 🙏

let in_progress_size = self.in_progress.capacity();
let completed_size: usize = self.completed.iter().map(|b| b.len()).sum();
let nulls_size = self.nulls.len() / 8;
let nulls_size = self.nulls.allocated_size() / 8;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocated size is in bytes (not bits) https://docs.rs/arrow/latest/arrow/array/struct.NullBufferBuilder.html#method.allocated_size

Return the allocated size of this builder, in bytes, useful for memory accounting.

So I don't think we should still be dividing by 8

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification and the link! I’ve updated the code to use allocated_size() directly without dividing by 8.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code still appears to divide by 8 🤔

Copy link
Copy Markdown
Contributor

@Dandandan Dandandan Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the comment in datafusion/physical-plan/src/aggregates/group_values/null_builder.rs:66, "NullBufferBuilder::allocated_size returns capacity in bits". Other places in the codebase that use NullBufferBuilder directly (such as MaybeNullBufferBuilder::allocated_size() at datafusion/physical-plan/src/aggregates/group_values/null_builder.rs:67

Copilot is confused by this comment elsewhere in the source code (it's there).

Adjust nulls_size calculation to use allocated_size directly.
@vigneshsiva11
Copy link
Copy Markdown
Contributor Author

You’re right. I’ve updated the PR to remove the division by 8 and now use allocated_size() directly, as it already returns bytes.

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Feb 2, 2026

You’re right. I’ve updated the PR to remove the division by 8 and now use allocated_size() directly, as it already returns bytes.

Thanks @vigneshsiva11

@alamb alamb added this pull request to the merge queue Feb 2, 2026
Merged via the queue into apache:main with commit dab903e Feb 2, 2026
32 checks passed
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
## Which issue does this PR close?

- Closes apache#20074

## Rationale for this change

ArrowBytesViewMap was previously accounting for the logical number of
null bits when reporting memory usage. This under-reported memory
consumption is because NullBufferBuilder may allocate more memory than
is currently used.

Memory accounting in DataFusion is expected to reflect allocated memory
rather than logical usage to ensure accurate memory tracking.

## What changes are included in this PR?

- Update ArrowBytesViewMap::size to use
NullBufferBuilder::allocated_size instead of calculating size from the
number of used null bits.

## Are these changes tested?

- Yes. Existing tests were run:
  - cargo test -p datafusion-physical-expr-common

## Are there any user-facing changes?

- No. This change only affects internal memory accounting and does not
alter query behavior or public APIs.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve memory accounting for ArrowBytesViewMap

4 participants