-
Notifications
You must be signed in to change notification settings - Fork 160
Enable partial filter pushdown for conjunctive predicates in Iceberg datasets #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -159,10 +159,51 @@ protected boolean pushdownFilter(ScanDefineDescriptor scanDescriptor) throws Alg | |||||||||||||||||||||
| protected void putFilterInformation(ScanDefineDescriptor scanDefineDescriptor, ILogicalExpression inlinedExpr) | ||||||||||||||||||||||
| throws AlgebricksException { | ||||||||||||||||||||||
| if (checkerVisitor.containsMultipleArrayPaths(paths.values())) { | ||||||||||||||||||||||
| // Cannot pushdown a filter with multiple unnest | ||||||||||||||||||||||
| // TODO allow rewindable column readers for filters | ||||||||||||||||||||||
| // TODO this is a bit conservative (maybe too conservative) as we can push part of expression down | ||||||||||||||||||||||
| return; | ||||||||||||||||||||||
| // Conservative fallback: attempt to push a subset of top-level AND operands | ||||||||||||||||||||||
| // that do not collectively introduce multiple array paths. | ||||||||||||||||||||||
| if (inlinedExpr instanceof AbstractFunctionCallExpression | ||||||||||||||||||||||
| && BuiltinFunctions.AND.equals(((AbstractFunctionCallExpression) inlinedExpr).getFunctionIdentifier())) { | ||||||||||||||||||||||
| List<Mutable<ILogicalExpression>> args = ((AbstractFunctionCallExpression) inlinedExpr) | ||||||||||||||||||||||
| .getArguments(); | ||||||||||||||||||||||
| List<Mutable<ILogicalExpression>> selectedArgs = new ArrayList<>(); | ||||||||||||||||||||||
| List<ARecordType> selectedPaths = new ArrayList<>(); | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| for (Mutable<ILogicalExpression> argRef : args) { | ||||||||||||||||||||||
| ILogicalExpression arg = argRef.getValue(); | ||||||||||||||||||||||
| List<ARecordType> argPaths = collectPathsForExpression(arg); | ||||||||||||||||||||||
| // Test if adding this arg's paths would cause multiple array paths | ||||||||||||||||||||||
| checkerVisitor.beforeVisit(); | ||||||||||||||||||||||
| List<ARecordType> merged = new ArrayList<>(selectedPaths); | ||||||||||||||||||||||
| merged.addAll(argPaths); | ||||||||||||||||||||||
| if (!checkerVisitor.containsMultipleArrayPaths(merged)) { | ||||||||||||||||||||||
| selectedArgs.add(new MutableObject<>(arg)); | ||||||||||||||||||||||
| selectedPaths.addAll(argPaths); | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| if (selectedArgs.isEmpty()) { | ||||||||||||||||||||||
| // nothing pushable | ||||||||||||||||||||||
| return; | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| // Build new inlined expression from selectedArgs | ||||||||||||||||||||||
| AbstractFunctionCallExpression newExpr; | ||||||||||||||||||||||
| if (selectedArgs.size() == 1) { | ||||||||||||||||||||||
| // single argument | ||||||||||||||||||||||
| ILogicalExpression single = selectedArgs.get(0).getValue(); | ||||||||||||||||||||||
| putFilterInformation(scanDefineDescriptor, single); | ||||||||||||||||||||||
| return; | ||||||||||||||||||||||
|
Comment on lines
+192
to
+195
|
||||||||||||||||||||||
| // single argument | |
| ILogicalExpression single = selectedArgs.get(0).getValue(); | |
| putFilterInformation(scanDefineDescriptor, single); | |
| return; | |
| // Single argument: avoid recursion. Use it directly as the inlined expression | |
| // and restrict 'paths' to only those needed by this predicate. | |
| ILogicalExpression single = selectedArgs.get(0).getValue(); | |
| inlinedExpr = single; | |
| // Keep only entries whose ARecordType is among the selected paths | |
| paths.entrySet().removeIf(e -> !selectedPaths.contains(e.getValue())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subset-selection logic checks
containsMultipleArrayPaths(merged)using onlyselectedPaths+argPaths, but it doesn't include any paths that may already have been pushed down earlier for this scan (i.e.,scanDefineDescriptor.getFilterPaths()from priorputFilterInformationcalls). Sincepathsis accumulated across multiple candidate filters duringpushdownFilter(...), this can still select a conjunct that introduces a second array path when combined with previously accepted predicates, defeating the constraint you’re trying to preserve. Consider seedingselectedPaths(or themergedlist) with the already-pushed paths from the descriptor before evaluating each new conjunct.