Enable partial filter pushdown for conjunctive predicates in Iceberg datasets #45

Open
shishir-kuet wants to merge 1 commit into apache:master from shishir-kuet:fix/ASTERIXDB-3743-iceberg-partition-pushdown

@shishir-kuet

This change improves filter pushdown behavior for Iceberg datasets when handling conjunctive (AND) predicates.

Previously, if a WHERE clause contained multiple predicates and at least one of them was not eligible for pushdown (e.g., due to multiple array paths or unsupported expressions), the entire filter would be skipped. This resulted in missed opportunities for pushdown and unnecessary data scanning.

With this update:

  • Each conjunct (AND argument) is evaluated independently for pushdown eligibility
  • Pushdown-compatible predicates are selected and pushed down
  • Non-compatible predicates are safely ignored for pushdown and remain in the query plan
  • Existing constraints such as avoiding multiple array paths are preserved

For example:
f1 = 'a' AND f2 LIKE 'a%'

Previously:

  • No predicates were pushed down

Now:

  • "f1 = 'a'" is pushed down
  • "f2 LIKE 'a%'" remains in the plan

Implementation details:

  • Decomposes AND expressions into individual arguments
  • Tracks selected paths and ensures no violation of array path constraints
  • Reconstructs a new AND expression from eligible predicates when multiple are valid
  • Falls back to a single predicate when only one is pushdown-compatible
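
The steps above can be sketched in isolation. This is a minimal illustration, not the PR's code: predicates are modeled as plain strings, the class `ConjunctPushdownSketch` and its methods are hypothetical names, and eligibility is supplied by the caller (here, "anything that is not a `LIKE`", matching the example in this description).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: predicates are strings, not ILogicalExpression instances.
final class ConjunctPushdownSketch {

    // Evaluate each conjunct independently and keep only the eligible ones.
    static List<String> selectEligible(List<String> conjuncts, Predicate<String> isEligible) {
        List<String> selected = new ArrayList<>();
        for (String conjunct : conjuncts) {
            if (isEligible.test(conjunct)) {
                selected.add(conjunct);
            }
        }
        return selected;
    }

    // Rebuild the pushed-down filter: nothing, a single predicate, or a new AND.
    static String rebuild(List<String> selected) {
        if (selected.isEmpty()) {
            return null; // nothing can be pushed down
        }
        if (selected.size() == 1) {
            return selected.get(0); // single-predicate fallback
        }
        return String.join(" AND ", selected);
    }

    public static void main(String[] args) {
        List<String> conjuncts = List.of("f1 = 'a'", "f2 LIKE 'a%'");
        // Assumption for illustration only: LIKE is the ineligible predicate.
        Predicate<String> isEligible = c -> !c.contains(" LIKE ");
        System.out.println(rebuild(selectEligible(conjuncts, isEligible))); // prints f1 = 'a'
    }
}
```

In this sketch the ineligible conjunct is simply left out of the pushed-down filter; it is still expected to be evaluated by the plan, as described above.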

Partial pushdown lets the scan skip data that the eligible conjuncts rule out, while predicates that are not pushed down are still evaluated in the plan, so query results are unchanged.

Tested with mixed pushdown/non-pushdown predicates to verify:

  • Correct partial pushdown behavior
  • No change in query results

Copilot AI review requested due to automatic review settings March 26, 2026 22:23

Copilot AI left a comment

Pull request overview

Improves filter pushdown for columnar (including Iceberg-backed) datasets by attempting partial pushdown of conjunctive (AND) predicates even when the overall filter would violate the “multiple array paths” constraint.

Changes:

  • Adds a conservative fallback to select pushdown-eligible top-level AND conjuncts that don’t collectively introduce multiple array paths.
  • Reconstructs a reduced AND expression (or a single predicate) from selected conjuncts for pushdown.
  • Introduces helper logic to collect per-subexpression path metadata used to enforce array-path constraints.

Comment on lines +192 to +195
// single argument
ILogicalExpression single = selectedArgs.get(0).getValue();
putFilterInformation(scanDefineDescriptor, single);
return;
Copilot AI Mar 26, 2026

When selectedArgs.size() == 1, this calls putFilterInformation(scanDefineDescriptor, single) recursively while leaving paths unchanged. Since paths.values() still contains the full set of paths (including the ones that triggered containsMultipleArrayPaths), the recursive call will typically hit the same containsMultipleArrayPaths(...) branch and then return early (because single is not an AND), resulting in no predicate being pushed down. Instead, avoid recursion here: set inlinedExpr to the single selected predicate and also restrict paths to only the entries needed by that selected predicate (and re-check/reset the visitor state if needed).

Suggested change
- // single argument
- ILogicalExpression single = selectedArgs.get(0).getValue();
- putFilterInformation(scanDefineDescriptor, single);
- return;
+ // Single argument: avoid recursion. Use it directly as the inlined expression
+ // and restrict 'paths' to only those needed by this predicate.
+ ILogicalExpression single = selectedArgs.get(0).getValue();
+ inlinedExpr = single;
+ // Keep only entries whose ARecordType is among the selected paths
+ paths.entrySet().removeIf(e -> !selectedPaths.contains(e.getValue()));
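
The `removeIf` restriction in this suggestion can be tried on its own. The sketch below is an illustration under assumptions: `paths` is modeled as a `Map<String, String>` from predicate text to a path name rather than the real expression and `ARecordType` types, and `RestrictPathsSketch` and `retainSelected` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: strings stand in for expressions and ARecordType paths.
final class RestrictPathsSketch {

    // Drop every entry whose path is not needed by the selected predicate(s).
    // The map must be mutable; removal happens in place through the entry set.
    static void retainSelected(Map<String, String> paths, List<String> selectedPaths) {
        paths.entrySet().removeIf(e -> !selectedPaths.contains(e.getValue()));
    }

    public static void main(String[] args) {
        Map<String, String> paths = new HashMap<>();
        paths.put("f1 = 'a'", "path.f1");
        paths.put("f2 LIKE 'a%'", "path.f2");

        retainSelected(paths, List.of("path.f1"));
        System.out.println(paths.size()); // prints 1
    }
}
```

Note that `List.contains` compares with `equals`, so this only behaves as intended if the path type used in the real code has suitable equality semantics; that is an open question here, not something the snippet confirms.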

Comment on lines +168 to +181
List<Mutable<ILogicalExpression>> selectedArgs = new ArrayList<>();
List<ARecordType> selectedPaths = new ArrayList<>();

for (Mutable<ILogicalExpression> argRef : args) {
    ILogicalExpression arg = argRef.getValue();
    List<ARecordType> argPaths = collectPathsForExpression(arg);
    // Test if adding this arg's paths would cause multiple array paths
    checkerVisitor.beforeVisit();
    List<ARecordType> merged = new ArrayList<>(selectedPaths);
    merged.addAll(argPaths);
    if (!checkerVisitor.containsMultipleArrayPaths(merged)) {
        selectedArgs.add(new MutableObject<>(arg));
        selectedPaths.addAll(argPaths);
    }
Copilot AI Mar 26, 2026

The subset-selection logic checks containsMultipleArrayPaths(merged) using only selectedPaths + argPaths, but it doesn't include any paths that may already have been pushed down earlier for this scan (i.e., scanDefineDescriptor.getFilterPaths() from prior putFilterInformation calls). Since paths is accumulated across multiple candidate filters during pushdownFilter(...), this can still select a conjunct that introduces a second array path when combined with previously accepted predicates, defeating the constraint you’re trying to preserve. Consider seeding selectedPaths (or the merged list) with the already-pushed paths from the descriptor before evaluating each new conjunct.
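
A sketch of the seeding idea, under loud assumptions: paths are modeled as strings, an array path is any string containing `[*]`, and `containsMultipleArrayPaths` here is a simplified stand-in (more than one distinct array path) rather than the real visitor check. `SeededSelectionSketch` and `select` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of greedy conjunct selection seeded with already-pushed paths.
final class SeededSelectionSketch {

    // Simplified stand-in for the real check: more than one distinct array path.
    static boolean containsMultipleArrayPaths(List<String> paths) {
        Set<String> arrayPaths = new HashSet<>();
        for (String path : paths) {
            if (path.contains("[*]")) {
                arrayPaths.add(path);
            }
        }
        return arrayPaths.size() > 1;
    }

    // Returns the indexes of the conjuncts that can be accepted without
    // violating the constraint, given paths accepted by earlier filters.
    static List<Integer> select(List<String> alreadyPushed, List<List<String>> conjunctPaths) {
        List<String> selectedPaths = new ArrayList<>(alreadyPushed); // the seed
        List<Integer> accepted = new ArrayList<>();
        for (int i = 0; i < conjunctPaths.size(); i++) {
            List<String> merged = new ArrayList<>(selectedPaths);
            merged.addAll(conjunctPaths.get(i));
            if (!containsMultipleArrayPaths(merged)) {
                accepted.add(i);
                selectedPaths.addAll(conjunctPaths.get(i));
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<List<String>> conjunctPaths = List.of(List.of("b[*].y"), List.of("c"));
        System.out.println(select(List.of(), conjunctPaths));         // prints [0, 1]
        System.out.println(select(List.of("a[*].x"), conjunctPaths)); // prints [1]
    }
}
```

Without the seed, the conjunct on `b[*].y` is accepted even though an earlier filter already uses `a[*].x`; with the seed it is rejected and only the non-array conjunct is selected.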
