Avoid null-restrict evaluation for predicates that reference non-join columns in PushDownFilter by kosiew · Pull Request #20961 · apache/datafusion

kosiew · 2026-03-16T14:07:22Z

Which issue does this PR close?

Part of #20002 (perf: push_down_filter is pathologically slow for some plans)

Rationale for this change

PushDownFilter can spend a disproportionate amount of planning time inferring predicates across joins. One expensive path is is_restrict_null_predicate, which falls back to compiling and evaluating the predicate against a null-filled schema to decide whether a predicate is null-rejecting.

For predicates that reference columns outside the join-key set, that evaluation cannot succeed with the synthetic null schema built for join columns only. In practice, callers already treat evaluation failures as non-restricting, but we still pay the full cost of the physical-expression compilation and evaluation path first.
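To make the failure mode concrete, here is a minimal sketch of null-evaluation over a toy expression type. Everything in it (`Expr`, `Value`, `eval_with_null_columns`, `is_null_restricting`) is hypothetical and stands in for DataFusion's much richer expression and physical-evaluation machinery. It assumes a predicate counts as null-restricting when it evaluates to NULL or false once the join columns are bound to NULL.

```rust
use std::collections::HashSet;

/// Toy expression type (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    IsNull(Box<Expr>),
}

/// Toy nullable value produced by evaluation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Bool(bool),
}

/// Evaluate `expr` with every join column bound to NULL.
/// A column outside `join_cols` has no binding, so evaluation fails.
fn eval_with_null_columns(expr: &Expr, join_cols: &HashSet<String>) -> Result<Value, String> {
    match expr {
        Expr::Column(name) if join_cols.contains(name) => Ok(Value::Null),
        Expr::Column(name) => Err(format!("column `{name}` is not in the null schema")),
        Expr::Literal(n) => Ok(Value::Int(*n)),
        Expr::Gt(l, r) => {
            let lv = eval_with_null_columns(l, join_cols)?;
            let rv = eval_with_null_columns(r, join_cols)?;
            match (lv, rv) {
                (Value::Int(a), Value::Int(b)) => Ok(Value::Bool(a > b)),
                // SQL-style three-valued logic: comparing with NULL yields NULL.
                (Value::Null, _) | (_, Value::Null) => Ok(Value::Null),
                _ => Err("type mismatch in comparison".to_string()),
            }
        }
        Expr::IsNull(inner) => {
            Ok(Value::Bool(eval_with_null_columns(inner, join_cols)? == Value::Null))
        }
    }
}

/// Assumed rule: the predicate is null-restricting when it evaluates to
/// NULL or false under the all-NULL binding.
fn is_null_restricting(expr: &Expr, join_cols: &HashSet<String>) -> Result<bool, String> {
    let value = eval_with_null_columns(expr, join_cols)?;
    Ok(matches!(value, Value::Null | Value::Bool(false)))
}
```

With join columns `{a}`, this sketch reports `a > 1` as restricting and `a IS NULL` as non-restricting, while `a > b` fails with an error because `b` has no binding. That error case is the one where the real code pays for compilation and evaluation before the caller discards the result.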

This change adds a cheap guard to detect predicates that reference columns outside the allowed join columns and returns false early. That preserves the existing behavior while avoiding unnecessary work in a hot optimizer path.

What changes are included in this PR?

This PR makes two focused changes:

In is_restrict_null_predicate, collect the join columns into a HashSet and add a fast-path check that verifies whether the predicate only references those columns.
If the predicate references any non-join column, return Ok(false) immediately instead of attempting null-evaluation.
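The shape of the fast path can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `Expr` is a toy type, the signatures are invented, and the expensive evaluation is passed in as a closure (`slow_path`, hypothetical) so the example can show that it is skipped. Only the helper name `predicate_uses_only_columns` comes from the commit messages.

```rust
use std::collections::HashSet;

/// Toy expression type (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// True when every column referenced by `expr` is in `allowed`.
/// The name matches the helper mentioned in the commits; the signature is assumed.
fn predicate_uses_only_columns(expr: &Expr, allowed: &HashSet<&str>) -> bool {
    match expr {
        Expr::Column(name) => allowed.contains(name.as_str()),
        Expr::Literal(_) => true,
        Expr::Gt(l, r) => {
            predicate_uses_only_columns(l, allowed) && predicate_uses_only_columns(r, allowed)
        }
    }
}

/// Guarded check: build the join-column set once, return `Ok(false)` early
/// for predicates that touch other columns, and otherwise hand the same set
/// to the expensive evaluation (`slow_path`, a stand-in).
fn is_restrict_null_predicate(
    expr: &Expr,
    join_cols: &[&str],
    slow_path: impl FnOnce(&Expr, &HashSet<&str>) -> Result<bool, String>,
) -> Result<bool, String> {
    let allowed: HashSet<&str> = join_cols.iter().copied().collect();
    if !predicate_uses_only_columns(expr, &allowed) {
        // Null-evaluation could not succeed here, so skip it entirely.
        return Ok(false);
    }
    slow_path(expr, &allowed)
}
```

For `a > b` with join columns `[a]`, the guard returns `Ok(false)` without invoking `slow_path`; for `a > 1` the closure runs and its result is returned unchanged.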

Additionally:

The join-column set built for the guard is reused for the fallback evaluate_expr_with_null_column path.
InferredPredicates::insert_inferred_predicate is simplified to use .unwrap_or(false) when consuming is_restrict_null_predicate, which matches the prior effective behavior of treating errors as non-restricting.
A regression test is added for a predicate like a > b, where b is outside the join-key set, to verify the fast path returns false.
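The `.unwrap_or(false)` simplification relies on a three-way contract: only `Ok(true)` counts as null-restricting, while `Ok(false)` and `Err(_)` are both treated as non-restricting. A small sketch of that equivalence, using hypothetical function names and a `String` error type in place of DataFusion's error type:

```rust
/// Compact form: any error collapses to "not restricting".
/// (Hypothetical helper; the real call site lives in push_down_filter.rs.)
fn treat_as_restricting(check: Result<bool, String>) -> bool {
    check.unwrap_or(false)
}

/// Spelled-out form with the same behavior, shown for comparison.
fn treat_as_restricting_explicit(check: Result<bool, String>) -> bool {
    match check {
        Ok(true) => true,
        Ok(false) | Err(_) => false,
    }
}
```

Both functions agree on all three inputs, which is why the change is behavior-preserving as long as callers never needed to distinguish an error from a negative answer.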

Are these changes tested?

Yes.

A test case was added to cover the scenario where a predicate references a column outside the join key set:

The test asserts that is_restrict_null_predicate returns false for a > b when only a is a join key.

This exercises the new early-return path and protects against regressions in predicate analysis behavior.

Are there any user-facing changes?

No.

This change is an internal optimizer performance improvement and does not change public APIs or intended query results.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

…ting conditions

Add a test case asserting that the predicate a > b is non-restricting when the join keys include only a. This directly exercises the new early-return branch of is_restrict_null_predicate in utils.rs.

Extract the column-membership check into a new helper function called `predicate_uses_only_columns` in utils.rs. Update the current implementation at utils.rs:91 to use this new helper, improving code readability and maintainability.

Add call-site contract comment in push_down_filter.rs to specify that only Ok(true) is treated as null-restricting. State that both Ok(false) and Err(_) are considered non-restricting and will be skipped during processing.

Inline iterator predicate in utils.rs and streamline the null-restrict handling in push_down_filter.rs. This reduces indirections and lines of code while maintaining the same logic and behavior. No public interface or behavior changes intended.

…te_uses_only_columns function

kosiew · 2026-03-16T14:08:56Z

run benchmark sql_planner_extended

adriangbot · 2026-03-16T14:11:46Z

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4067930471-314-dhsls 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing push-down-02-20002 (3d3945c) to ab28234 (merge-base) diff
BENCH_NAME=sql_planner_extended
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner_extended
BENCH_FILTER=
Results will be posted here when complete

kosiew · 2026-03-18T07:51:37Z

show benchmark queue

adriangbot · 2026-03-18T07:51:38Z

Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).

No pending jobs.

kosiew · 2026-03-18T07:52:02Z

run benchmark sql_planner_extended

kosiew · 2026-03-18T07:52:55Z

show benchmark queue

adriangbot · 2026-03-18T07:52:57Z

Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).

Comment	Repo	PR	User	Benchmarks	Status
#4080428628	apache/datafusion	#20961	kosiew	["sql_planner_extended"]	running

adriangbot · 2026-03-18T07:54:33Z

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4080428628-401-tbflf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing push-down-02-20002 (3d3945c) to ab28234 (merge-base) diff
BENCH_NAME=sql_planner_extended
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner_extended
BENCH_FILTER=
Results will be posted here when complete

kosiew · 2026-03-18T08:48:50Z

show benchmark queue

adriangbot · 2026-03-18T08:48:52Z

Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).

Comment	Repo	PR	User	Benchmarks	Status
#4080428628	apache/datafusion	#20961	kosiew	["sql_planner_extended"]	running

kosiew · 2026-03-18T12:30:39Z

show benchmark queue

adriangbot · 2026-03-18T12:30:40Z

Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).

Comment	Repo	PR	User	Benchmarks	Status
#4082114714	apache/datafusion	#21026	Dandandan	["clickbench_partitioned"]	running
#4082114714	apache/datafusion	#21026	Dandandan	["tpcds"]	running
#4082114714	apache/datafusion	#21026	Dandandan	["tpch"]	running

kosiew · 2026-03-18T15:20:19Z

I think the benchmark never completes or gets killed because it's too heavy.
Amending benchmark in #21029

kosiew · 2026-03-19T02:24:24Z

run benchmark sql_planner_extended --sample-size 10

kosiew · 2026-03-19T02:25:48Z

show benchmark queue

adriangbot · 2026-03-19T02:25:50Z

Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).

Comment	Repo	PR	User	Benchmarks	Status
#4087159258	apache/datafusion	#20961	kosiew	["sql_planner_extended"]	running
#4087159258	apache/datafusion	#20961	kosiew	["--sample-size"]	running
#4087159258	apache/datafusion	#20961	kosiew	["10"]	running

adriangbot · 2026-03-19T02:27:15Z

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4087159258-443-hxscx 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing push-down-02-20002 (3d3945c) to ab28234 (merge-base) diff
BENCH_NAME=sql_planner_extended
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner_extended
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-19T02:27:16Z

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4087159258-444-4d9n5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing push-down-02-20002 (3d3945c) to ab28234 (merge-base) diff
BENCH_NAME=--sample-size
BENCH_COMMAND=cargo bench --features=parquet --bench --sample-size
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-19T02:27:16Z

Benchmark for this request failed.

Last 20 lines of output:

HEAD is now at ab28234 Support `columns_sorted` in row_filters (#20497)
rustc 1.94.0 (4a4ef493e 2026-03-02)
3d3945ce07b7015c11b0a4f89f3b456d785b7bdf
ab2823475d0c79a749120ae354572ab85c043b78
error: unexpected argument '--sample-size' found

  tip: a similar argument exists: '--examples'
  tip: to pass '--sample-size' as a value, use '-- --sample-size'

Usage: cargo bench --features <FEATURES> --bench [<NAME>] --examples [BENCHNAME] [-- [ARGS]...]

For more information, try '--help'.
error: unexpected argument '--sample-size' found

  tip: a similar argument exists: '--examples'
  tip: to pass '--sample-size' as a value, use '-- --sample-size'

Usage: cargo bench --features <FEATURES> --bench [<NAME>] --examples [BENCHNAME] [-- [ARGS]...]

For more information, try '--help'.

adriangbot · 2026-03-19T02:27:49Z

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4087159258-445-b4z9j 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing push-down-02-20002 (3d3945c) to ab28234 (merge-base) diff
BENCH_NAME=10
BENCH_COMMAND=cargo bench --features=parquet --bench 10
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-19T02:27:54Z

Benchmark for this request failed.

Last 20 lines of output:

Cloning into '/workspace/datafusion-branch'...
push-down-02-20002
From https://github.com/apache/datafusion
 * [new ref]         refs/pull/20961/head -> push-down-02-20002
 * branch            main                 -> FETCH_HEAD
Switched to branch 'push-down-02-20002'
ab2823475d0c79a749120ae354572ab85c043b78
Cloning into '/workspace/datafusion-base'...
HEAD is now at ab28234 Support `columns_sorted` in row_filters (#20497)
rustc 1.94.0 (4a4ef493e 2026-03-02)
3d3945ce07b7015c11b0a4f89f3b456d785b7bdf
ab2823475d0c79a749120ae354572ab85c043b78
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
error: no bench target named `10` in default-run packages

help: a target with a similar name exists: `chr`

kosiew added 6 commits March 16, 2026 21:38

fix: optimize null predicate evaluation by early exit for non-restric…

b9828ca

…ting conditions

Clarify null-restricting behavior in filter check

144cab3

Add call-site contract comment in push_down_filter.rs to specify that only Ok(true) is treated as null-restricting. State that both Ok(false) and Err(_) are considered non-restricting and will be skipped during processing.

refactor: streamline null predicate evaluation by introducing predica…

3d3945c

…te_uses_only_columns function

github-actions bot added the optimizer Optimizer rules label Mar 16, 2026
