
feat(common): inject _block_num statistics into physical plan#2031

Merged
Theodus merged 1 commit into main from theodus/stats on Mar 26, 2026

Conversation

Member

@Theodus commented Mar 26, 2026

Implement TableProvider::statistics() on QueryableSnapshot to report the exact min/max of the _block_num column, taken from the synced range. DataFusion's AggregateStatistics optimizer replaces MIN/MAX aggregates with constants when exact statistics are available, which avoids full Parquet scans.

Contributor

@fordN left a comment

nice! 🚀

@Theodus marked this pull request as draft March 26, 2026 18:36
Override _block_num column min/max on the Statistics passed to
FileScanConfig in scan(), using synced_range. This is where
DataFusion's AggregateStatistics optimizer reads from, enabling it
to resolve MIN/MAX(_block_num) as constants without scanning parquet
files.
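
The mechanism described in this commit message can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the code from this PR: `Precision`, `ColumnStats`, `override_block_num_stats`, `fold_min`, and `fold_max` are hypothetical local stand-ins, modeled loosely on the exact/inexact/absent distinction that DataFusion statistics draw. The idea is that a scan overwrites one column's min/max with exact values, and an optimizer later folds MIN/MAX to a constant only when the value is marked exact.

```rust
/// Hypothetical stand-in for a statistics precision marker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    /// The value is known to be correct.
    Exact(u64),
    /// The value is only an estimate or a bound.
    Inexact(u64),
    /// Nothing is known.
    Absent,
}

/// Hypothetical per-column statistics: just min and max.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ColumnStats {
    pub min: Precision,
    pub max: Precision,
}

impl ColumnStats {
    /// Statistics for a column about which nothing is known.
    pub const UNKNOWN: ColumnStats = ColumnStats {
        min: Precision::Absent,
        max: Precision::Absent,
    };
}

/// Overwrite the statistics of the column at `block_num_idx` with the
/// inclusive bounds of `synced_range`, marking them as exact.
/// Out-of-range indexes are ignored.
pub fn override_block_num_stats(
    columns: &mut [ColumnStats],
    block_num_idx: usize,
    synced_range: (u64, u64),
) {
    if let Some(col) = columns.get_mut(block_num_idx) {
        col.min = Precision::Exact(synced_range.0);
        col.max = Precision::Exact(synced_range.1);
    }
}

/// Resolve MIN(column) to a constant, but only from an exact statistic.
pub fn fold_min(col: &ColumnStats) -> Option<u64> {
    match col.min {
        Precision::Exact(v) => Some(v),
        Precision::Inexact(_) | Precision::Absent => None,
    }
}

/// Resolve MAX(column) to a constant, but only from an exact statistic.
pub fn fold_max(col: &ColumnStats) -> Option<u64> {
    match col.max {
        Precision::Exact(v) => Some(v),
        Precision::Inexact(_) | Precision::Absent => None,
    }
}
```

In this sketch, a column whose statistics were never overridden keeps `Absent` bounds, so `fold_min` and `fold_max` return `None` and the query would still need a scan.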
@Theodus changed the title from "feat(common): resolve MIN/MAX(_block_num) from metadata statistics" to "feat(common): inject _block_num statistics into physical plan" Mar 26, 2026
@Theodus marked this pull request as ready for review March 26, 2026 18:47
@Theodus merged commit eb6e264 into main Mar 26, 2026
8 checks passed
@Theodus deleted the theodus/stats branch March 26, 2026 18:50
.await?;

// Override _block_num column statistics with exact min/max from synced_range.
// This enables the AggregateStatistics optimizer to resolve MIN/MAX(_block_num)
Collaborator

This seems wrong: the start and end of the synced segment's range are not necessarily present as column values. E.g., the segment could have zero rows.
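
To make the concern concrete, here is a minimal sketch of one conservative way to derive the bounds. It is an assumption for illustration only and not necessarily what the follow-up fix does. `Precision` and `block_num_bounds` are hypothetical names, and `observed` stands for a min/max actually read from stored rows (for example from file metadata), which is `None` when the segment has no rows or the values are unknown.

```rust
/// Hypothetical stand-in for a statistics precision marker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    /// The value is known to be correct.
    Exact(u64),
    /// The value is only an estimate or a bound.
    Inexact(u64),
}

/// Derive (min, max) statistics for a block-number column.
///
/// `synced_range` is the inclusive range the segment covers.
/// `observed` is the min/max of block numbers that actually occur in rows,
/// if known. Only observed values are reported as exact; the range
/// endpoints alone are reported as inexact, because no row is guaranteed
/// to carry them.
pub fn block_num_bounds(
    synced_range: (u64, u64),
    observed: Option<(u64, u64)>,
) -> (Precision, Precision) {
    match observed {
        Some((lo, hi)) => (Precision::Exact(lo), Precision::Exact(hi)),
        None => (
            Precision::Inexact(synced_range.0),
            Precision::Inexact(synced_range.1),
        ),
    }
}
```

With this shape, an optimizer that only folds exact statistics would not turn MIN/MAX into constants for a zero-row segment, since its bounds come back as `Inexact`.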

Member Author

Fixed in #2047

3 participants