Fix MATCH after CREATE returning 0 rows (issue #2308) #2340

Merged
jrgemignani merged 1 commit into apache:master from gregfelice:fix_2308_match_after_create
Feb 27, 2026

@gregfelice (Contributor) commented Feb 26, 2026

Problem

CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b) SET a.something = 'something' RETURN a returns 0 rows instead of the created vertex.

Reported in #2308.

Root Cause

When MATCH follows CREATE + WITH and re-uses all bound variables, the MATCH generates filter quals (age_start_id(e) = age_id(a) AND age_end_id(e) = age_id(b)) that reference only columns from the predecessor subquery. Since the MATCH adds no new table scans (all entities are bound), PostgreSQL's optimizer treats the subquery as transparent and pushes these filter quals down through the subquery layers into the CREATE's child plan.

Before fix — EXPLAIN plan showing the bug:

Custom Scan (Cypher Set)
  ->  Subquery Scan on _age_default_alias_previous_cypher_clause
        ->  Result
              ->  Custom Scan (Cypher Create)
                    ->  Subquery Scan on _age_default_alias_previous_cypher_clause_1
                          Filter: (age_start_id(e) = age_id(a) AND age_end_id(e) = age_id(b))
                          ->  Result

The filter is placed below the CREATE custom scan, on its child subquery. At execution time:

  1. The innermost Result produces a dummy row with NULL values
  2. The pushed-down filter evaluates age_start_id(NULL) = age_id(NULL), which does not evaluate to true, so the row is rejected
  3. The filter produces 0 rows → CREATE never receives input → nothing is created
  4. The entire query returns 0 rows

The filter should evaluate after CREATE produces its output (where a, e, b have actual values), not before.
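
The ordering problem can be illustrated with a small standalone model. This is only a sketch, not AGE or PostgreSQL code: the Row struct, edge_filter(), create_node(), and the two run_* functions are hypothetical names invented here, and the ids 1 and 2 are arbitrary. It assumes the dummy input row can be modeled as a row with no values, for which the filter is never true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row carrying the bound variables a, e, b.
 * has_values is false for the dummy all-NULL row below CREATE. */
typedef struct {
    bool has_values;
    long a_id, b_id;      /* stand-ins for age_id(a), age_id(b) */
    long e_start, e_end;  /* stand-ins for age_start_id(e), age_end_id(e) */
} Row;

/* Models age_start_id(e) = age_id(a) AND age_end_id(e) = age_id(b).
 * A comparison on NULL inputs is not true, so the row is rejected. */
static bool edge_filter(const Row *r)
{
    assert(r != NULL);
    if (!r->has_values)
        return false;
    return r->e_start == r->a_id && r->e_end == r->b_id;
}

/* Models the Cypher Create node: assigns ids and counts the insert. */
static void create_node(Row *r, int *created)
{
    r->has_values = true;
    r->a_id = 1;
    r->b_id = 2;
    r->e_start = 1;
    r->e_end = 2;
    (*created)++;
}

/* Plan shape before the fix: the filter sits below CREATE. */
static int run_filter_below_create(int *created)
{
    Row r = { false, 0, 0, 0, 0 };
    if (!edge_filter(&r))
        return 0;             /* CREATE never receives input */
    create_node(&r, created);
    return 1;
}

/* Plan shape after the fix: CREATE runs first, then the filter. */
static int run_filter_above_create(int *created)
{
    Row r = { false, 0, 0, 0, 0 };
    create_node(&r, created);
    return edge_filter(&r) ? 1 : 0;
}
```

In this model the first function returns 0 rows and creates nothing, while the second creates the entities and returns 1 row, which mirrors the two EXPLAIN plans in this description.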

Fix

In transform_cypher_match_pattern(), after transforming the predecessor clause chain into a subquery RTE, check whether the chain contains any data-modifying operation (CREATE, SET, DELETE, or MERGE). If it does, set rte->security_barrier = true on the subquery RTE. This is PostgreSQL's standard mechanism for restricting qual pushdown through subqueries: the planner does not pull up a security-barrier subquery, and it pushes a filter condition into one only if the condition is leakproof.

A helper function clause_chain_has_dml() walks the clause chain to detect DML operations.
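
A minimal sketch of what such a walk could look like, written as standalone C rather than against the AGE sources. The Clause and SubqueryRte types, the ClauseKind enum, and the function names are hypothetical stand-ins; it also assumes each clause links to its predecessor through a prev pointer. The actual helper in cypher_clause.c may differ in all of these details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clause kinds; the real parser uses its own node tags. */
typedef enum {
    CLAUSE_MATCH, CLAUSE_WITH, CLAUSE_RETURN, CLAUSE_UNWIND,
    CLAUSE_CREATE, CLAUSE_SET, CLAUSE_DELETE, CLAUSE_MERGE
} ClauseKind;

/* Hypothetical clause node: each clause points at its predecessor. */
typedef struct Clause {
    ClauseKind kind;
    struct Clause *prev;
} Clause;

/* Hypothetical stand-in for the subquery range table entry. */
typedef struct {
    bool security_barrier;
} SubqueryRte;

static bool is_dml(ClauseKind kind)
{
    switch (kind) {
    case CLAUSE_CREATE:
    case CLAUSE_SET:
    case CLAUSE_DELETE:
    case CLAUSE_MERGE:
        return true;
    default:
        return false;
    }
}

/* Walk backwards through the chain; stop at the first DML clause. */
static bool clause_chain_has_dml_sketch(const Clause *clause)
{
    for (; clause != NULL; clause = clause->prev) {
        if (is_dml(clause->kind))
            return true;
    }
    return false;
}

/* Mark the predecessor subquery as a barrier only when DML is present,
 * so read-only chains keep the normal pushdown behavior. */
static void mark_barrier_if_dml(SubqueryRte *rte, const Clause *prev)
{
    assert(rte != NULL);
    if (clause_chain_has_dml_sketch(prev))
        rte->security_barrier = true;
}
```

Setting the flag conditionally is the point of the design described above: chains without DML are left untouched, so the planner can still optimize them as before.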

After fix — EXPLAIN plan showing correct structure:

Custom Scan (Cypher Set)
  ->  Subquery Scan on _age_default_alias_previous_cypher_clause
        ->  Subquery Scan on _age_default_alias_previous_cypher_clause_1
              Filter: (age_start_id(e) = age_id(a) AND age_end_id(e) = age_id(b))
              ->  Custom Scan (Cypher Create)
                    ->  Subquery Scan on _age_default_alias_previous_cypher_clause_2
                          ->  Result

Now the filter is placed above the CREATE custom scan. CREATE runs first (inserting entities), then the filter evaluates on the output values and correctly passes.

Files Changed

  • src/backend/parser/cypher_clause.c — Added clause_chain_has_dml() helper function and security_barrier logic in transform_cypher_match_pattern()
  • regress/sql/cypher_match.sql — Regression tests for issue #2308 (MATCH after CREATE does not return the newly created row)
  • regress/expected/cypher_match.out — Expected test output

Regression Tests

Added tests covering:

  • Reporter's exact case: CREATE + WITH + MATCH + SET + RETURN (1 row expected)
  • Bound variables without SET (1 row expected)
  • Reversed direction filter — confirms the filter still works correctly (0 rows expected)
  • Node-only MATCH with bound variable (1 row expected)
  • MATCH after SET (DML chain detection) (1 row expected)

All 31 existing regression tests pass with this change.

Closes #2308.

AI Disclosure

AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.

When a MATCH clause follows CREATE + WITH and re-uses bound variables
(e.g. CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b)), the MATCH
generates filter quals (age_start_id(e) = age_id(a), etc.) that
reference only columns from the predecessor subquery. PostgreSQL's
optimizer pushes these quals through the transparent subquery layers
into the CREATE's child plan, where they evaluate on NULL values before
CREATE has executed — always yielding 0 rows.

Fix: mark the predecessor subquery RTE as security_barrier when the
clause chain contains a data-modifying operation (CREATE, SET, DELETE,
or MERGE). This prevents PostgreSQL from pushing filter quals into the
subquery, ensuring they evaluate after the DML produces output values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gregfelice added a commit to gregfelice/age that referenced this pull request Feb 26, 2026
…che#2193)

When CREATE introduces a new label and a subsequent MATCH references it
(e.g., CREATE (:Person) WITH ... MATCH (p:Person)), the query returns
0 rows on first execution but works on the second.

Root cause: match_check_valid_label() in transform_cypher_match() runs
before transform_prev_cypher_clause() processes the predecessor chain.
Since CREATE has not yet executed its transform (which creates the label
table as a side effect), the label is not in the cache and the check
generates a One-Time Filter: false plan that returns no rows.

Fix: Skip the early label validity check when the predecessor clause
chain contains a data-modifying operation (CREATE, SET, DELETE, MERGE).
After transform_prev_cypher_clause() completes and any new labels exist
in the cache, run a deferred label check. If the labels are still
invalid at that point, generate an empty result via makeBoolConst(false).

This preserves the existing behavior for MATCH without DML predecessors
(e.g., MATCH-MATCH chains still get the early check and proper error
messages for invalid labels).
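
The ordering described in this commit message can be modeled with a toy label cache. This is a sketch under stated assumptions, not the AGE implementation: LabelCache, label_exists(), add_label(), and match_can_return_rows() are hypothetical names, and the model reduces the predecessor chain to at most one CREATE that registers a single label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 8

/* Hypothetical stand-in for the label cache. */
typedef struct {
    const char *names[MAX_LABELS];
    int count;
} LabelCache;

static bool label_exists(const LabelCache *cache, const char *name)
{
    for (int i = 0; i < cache->count; i++) {
        if (strcmp(cache->names[i], name) == 0)
            return true;
    }
    return false;
}

static void add_label(LabelCache *cache, const char *name)
{
    if (label_exists(cache, name))
        return;
    assert(cache->count < MAX_LABELS);
    cache->names[cache->count++] = name;
}

/* created_label is NULL when the predecessor chain has no CREATE.
 * defer_when_dml selects the new behavior (true) or the old one (false).
 * Returns false when the MATCH would be planned as constant-false. */
static bool match_can_return_rows(LabelCache *cache,
                                  const char *created_label,
                                  const char *match_label,
                                  bool defer_when_dml)
{
    bool prev_has_dml = (created_label != NULL);
    bool early_ok = true;

    /* Early check: skipped only when deferring past a DML predecessor. */
    if (!(defer_when_dml && prev_has_dml))
        early_ok = label_exists(cache, match_label);

    /* Transforming the predecessor registers the new label. */
    if (created_label != NULL)
        add_label(cache, created_label);

    if (!early_ok)
        return false;

    /* Deferred check: runs after the predecessor has been transformed. */
    return label_exists(cache, match_label);
}
```

With defer_when_dml set to false the model fails on the first call and succeeds on the second, matching the reported symptom. With it set to true the first call succeeds, and a MATCH on an unknown label with no DML predecessor is still rejected by the early check.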

Depends on: PR apache#2340 (clause_chain_has_dml helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jrgemignani requested a review from Copilot February 27, 2026 17:17

Copilot AI left a comment

Pull request overview

Fixes a planner/optimizer interaction in Apache AGE where a MATCH following CREATE ... WITH ... (reusing bound variables) could have its generated filter quals pushed below the DML plan, preventing the DML from executing and causing the query to return 0 rows.

Changes:

  • Mark the predecessor subquery RTE as a PostgreSQL security_barrier when the predecessor clause chain includes DML (CREATE/SET/DELETE/MERGE), preventing qual pushdown into the DML’s child plan.
  • Add a helper (clause_chain_has_dml()) to detect DML operations in the clause chain.
  • Add regression coverage for issue #2308 and corresponding expected output.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/backend/parser/cypher_clause.c | Adds DML detection and sets security_barrier on the predecessor subquery RTE to prevent incorrect qual pushdown past DML. |
| regress/sql/cypher_match.sql | Adds regression queries covering CREATE+WITH+MATCH (and related variations) for issue #2308. |
| regress/expected/cypher_match.out | Captures expected outputs for the new regression cases. |

@jrgemignani (Contributor)

@gregfelice Please see Copilot's comment above. Thoughts?

@gregfelice (Contributor, Author)

Hi jrgemignani,

Copilot raises whether clause_chain_has_dml() should descend into cypher_sub_query nodes to look for nested DML.

This isn't a concern in practice — Cypher subquery expressions (EXISTS{}, COUNT{}) are restricted by the grammar to reading clauses only (MATCH, UNWIND, CALL). They cannot contain CREATE, SET, DELETE, or MERGE, so a cypher_sub_query node in the predecessor chain will never contain hidden DML that the helper would miss.

Additionally, subqueries are transformed into standalone Query nodes by transform_cypher_sub_query(), so their internals are already isolated from the outer plan — PostgreSQL treats the subquery as a black box for optimization purposes.

If a future grammar extension ever allows DML inside subqueries, the helper would need updating, but that would be a much larger change with its own design considerations.

Thanks, Greg

@jrgemignani merged commit 217467a into apache:master Feb 27, 2026
10 checks passed
gregfelice added a commit to gregfelice/age that referenced this pull request Feb 28, 2026
…che#2193)
