Avoid null-restrict evaluation for predicates that reference non-join columns in PushDownFilter#20961
Avoid null-restrict evaluation for predicates that reference non-join columns in PushDownFilter#20961kosiew wants to merge 6 commits intoapache:mainfrom
Conversation
Introduce a test case to assert non-restricting behavior when evaluating the predicate a > b, focusing on join keys that only include a. This directly tests the new early-return branch in the is_restrict_null_predicate function in utils.rs, enhancing overall code coverage.
Extract the column-membership check into a new helper function called `predicate_uses_only_columns` in utils.rs. Update the current implementation at utils.rs:91 to use this new helper, improving code readability and maintainability.
Add call-site contract comment in push_down_filter.rs to specify that only Ok(true) is treated as null-restricting. State that both Ok(false) and Err(_) are considered non-restricting and will be skipped during processing.
Inline iterator predicate in utils.rs and streamline the null-restrict handling in push_down_filter.rs. This reduces indirections and lines of code while maintaining the same logic and behavior. No public interface or behavior changes intended.
…te_uses_only_columns function
|
run benchmark sql_planner_extended |
|
🤖 Criterion benchmark running (GKE) | trigger |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)). No pending jobs. |
|
run benchmark sql_planner_extended |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).
|
|
🤖 Criterion benchmark running (GKE) | trigger |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).
|
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).
|
|
I think the benchmark never completes or gets killed because it's too heavy. |
|
run benchmark sql_planner_extended --sample-size 10 |
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#20961 (comment)).
|
|
🤖 Criterion benchmark running (GKE) | trigger |
|
🤖 Criterion benchmark running (GKE) | trigger |
|
Benchmark for this request failed. Last 20 lines of output: Click to expand |
|
🤖 Criterion benchmark running (GKE) | trigger |
|
Benchmark for this request failed. Last 20 lines of output: Click to expand |
Which issue does this PR close?
Rationale for this change
PushDownFiltercan spend a disproportionate amount of planning time inferring predicates across joins. One expensive path isis_restrict_null_predicate, which falls back to compiling and evaluating the predicate against a null-filled schema to decide whether a predicate is null-rejecting.For predicates that reference columns outside the join-key set, that evaluation cannot succeed with the synthetic null schema built for join columns only. In practice, callers already treat evaluation failures as non-restricting, but we still pay the full cost of the physical-expression compilation and evaluation path first.
This change adds a cheap guard to detect predicates that reference columns outside the allowed join columns and returns
falseearly. That preserves the existing behavior while avoiding unnecessary work in a hot optimizer path.What changes are included in this PR?
This PR makes two focused changes:
is_restrict_null_predicate, collect the join columns into aHashSetand add a fast-path check that verifies whether the predicate only references those columns.Ok(false)immediately instead of attempting null-evaluation.Additionally:
evaluate_expr_with_null_columnpath.InferredPredicates::insert_inferred_predicateis simplified to use.unwrap_or(false)when consumingis_restrict_null_predicate, which matches the prior effective behavior of treating errors as non-restricting.a > b, wherebis outside the join-key set, to verify the fast path returnsfalse.Are these changes tested?
Yes.
A test case was added to cover the scenario where a predicate references a column outside the join key set:
a > bnow explicitly verifies thatis_restrict_null_predicatereturnsfalse.This exercises the new early-return path and protects against regressions in predicate analysis behavior.
Are there any user-facing changes?
No.
This change is an internal optimizer performance improvement and does not change public APIs or intended query results.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.