Enable partial filter pushdown for conjunctive predicates in Iceberg datasets#45
Conversation
…xist; extract pushable AND operands
There was a problem hiding this comment.
Pull request overview
Improves filter pushdown for columnar (including Iceberg-backed) datasets by attempting partial pushdown of conjunctive (AND) predicates even when the overall filter would violate the “multiple array paths” constraint.
Changes:
- Adds a conservative fallback to select pushdown-eligible top-level
ANDconjuncts that don’t collectively introduce multiple array paths. - Reconstructs a reduced
ANDexpression (or a single predicate) from selected conjuncts for pushdown. - Introduces helper logic to collect per-subexpression path metadata used to enforce array-path constraints.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // single argument | ||
| ILogicalExpression single = selectedArgs.get(0).getValue(); | ||
| putFilterInformation(scanDefineDescriptor, single); | ||
| return; |
There was a problem hiding this comment.
When selectedArgs.size() == 1, this calls putFilterInformation(scanDefineDescriptor, single) recursively while leaving paths unchanged. Since paths.values() still contains the full set of paths (including the ones that triggered containsMultipleArrayPaths), the recursive call will typically hit the same containsMultipleArrayPaths(...) branch and then return early (because single is not an AND), resulting in no predicate being pushed down. Instead, avoid recursion here: set inlinedExpr to the single selected predicate and also restrict paths to only the entries needed by that selected predicate (and re-check/reset the visitor state if needed).
| // single argument | |
| ILogicalExpression single = selectedArgs.get(0).getValue(); | |
| putFilterInformation(scanDefineDescriptor, single); | |
| return; | |
| // Single argument: avoid recursion. Use it directly as the inlined expression | |
| // and restrict 'paths' to only those needed by this predicate. | |
| ILogicalExpression single = selectedArgs.get(0).getValue(); | |
| inlinedExpr = single; | |
| // Keep only entries whose ARecordType is among the selected paths | |
| paths.entrySet().removeIf(e -> !selectedPaths.contains(e.getValue())); |
| List<Mutable<ILogicalExpression>> selectedArgs = new ArrayList<>(); | ||
| List<ARecordType> selectedPaths = new ArrayList<>(); | ||
|
|
||
| for (Mutable<ILogicalExpression> argRef : args) { | ||
| ILogicalExpression arg = argRef.getValue(); | ||
| List<ARecordType> argPaths = collectPathsForExpression(arg); | ||
| // Test if adding this arg's paths would cause multiple array paths | ||
| checkerVisitor.beforeVisit(); | ||
| List<ARecordType> merged = new ArrayList<>(selectedPaths); | ||
| merged.addAll(argPaths); | ||
| if (!checkerVisitor.containsMultipleArrayPaths(merged)) { | ||
| selectedArgs.add(new MutableObject<>(arg)); | ||
| selectedPaths.addAll(argPaths); | ||
| } |
There was a problem hiding this comment.
The subset-selection logic checks containsMultipleArrayPaths(merged) using only selectedPaths + argPaths, but it doesn't include any paths that may already have been pushed down earlier for this scan (i.e., scanDefineDescriptor.getFilterPaths() from prior putFilterInformation calls). Since paths is accumulated across multiple candidate filters during pushdownFilter(...), this can still select a conjunct that introduces a second array path when combined with previously accepted predicates, defeating the constraint you’re trying to preserve. Consider seeding selectedPaths (or the merged list) with the already-pushed paths from the descriptor before evaluating each new conjunct.
This change improves filter pushdown behavior for Iceberg datasets when handling conjunctive (AND) predicates.
Previously, if a WHERE clause contained multiple predicates and at least one of them was not eligible for pushdown (e.g., due to multiple array paths or unsupported expressions), the entire filter would be skipped. This resulted in missed opportunities for pushdown and unnecessary data scanning.
With this update:
For example:
f1 = 'a' AND f2 LIKE 'a%'
Previously:
Now:
Implementation details:
This results in improved query performance by enabling partial pushdown while maintaining correctness.
Tested with mixed pushdown/non-pushdown predicates to verify: