Fix MATCH after CREATE returning 0 rows (issue #2308) #2340
jrgemignani merged 1 commit into apache:master
When a MATCH clause follows CREATE + WITH and re-uses bound variables (e.g. CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b)), the MATCH generates filter quals (age_start_id(e) = age_id(a), etc.) that reference only columns from the predecessor subquery. PostgreSQL's optimizer pushes these quals through the transparent subquery layers into the CREATE's child plan, where they evaluate on NULL values before CREATE has executed, always yielding 0 rows.

Fix: mark the predecessor subquery RTE as security_barrier when the clause chain contains a data-modifying operation (CREATE, SET, DELETE, or MERGE). This prevents PostgreSQL from pushing filter quals into the subquery, ensuring they evaluate after the DML produces output values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…che#2193)

When CREATE introduces a new label and a subsequent MATCH references it (e.g., CREATE (:Person) WITH ... MATCH (p:Person)), the query returns 0 rows on first execution but works on the second.

Root cause: match_check_valid_label() in transform_cypher_match() runs before transform_prev_cypher_clause() processes the predecessor chain. Since CREATE has not yet executed its transform (which creates the label table as a side effect), the label is not in the cache and the check generates a One-Time Filter: false plan that returns no rows.

Fix: Skip the early label validity check when the predecessor clause chain contains a data-modifying operation (CREATE, SET, DELETE, MERGE). After transform_prev_cypher_clause() completes and any new labels exist in the cache, run a deferred label check. If the labels are still invalid at that point, generate an empty result via makeBoolConst(false).

This preserves the existing behavior for MATCH without DML predecessors (e.g., MATCH-MATCH chains still get the early check and proper error messages for invalid labels).

Depends on: PR apache#2340 (clause_chain_has_dml helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
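The ordering problem in this commit message can be illustrated with a small toy model. This is only a sketch: `label_cache`, `transform_prev`, and `plan_match` are hypothetical stand-ins, not AGE functions, and the `defer` flag merely switches between the old (early check) and new (deferred check) behavior described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 8

/* Hypothetical stand-in for the label cache; not AGE's real structure. */
typedef struct {
    const char *names[MAX_LABELS];
    int count;
} label_cache;

static bool label_exists(const label_cache *cache, const char *name)
{
    for (int i = 0; i < cache->count; i++) {
        if (strcmp(cache->names[i], name) == 0)
            return true;
    }
    return false;
}

/*
 * Stand-in for transforming the predecessor chain: a CREATE registers its
 * label as a side effect. created_label == NULL means "no DML predecessor".
 */
static void transform_prev(label_cache *cache, const char *created_label)
{
    if (created_label != NULL && !label_exists(cache, created_label) &&
        cache->count < MAX_LABELS)
        cache->names[cache->count++] = created_label;
}

/*
 * Returns true when the MATCH keeps its normal plan and false when it
 * collapses to a constant-false filter (0 rows).
 */
static bool plan_match(label_cache *cache, const char *created_label,
                       const char *match_label, bool defer)
{
    bool has_dml = (created_label != NULL);
    bool deferred = defer && has_dml;
    bool valid = true;

    if (!deferred)
        valid = label_exists(cache, match_label); /* early check */

    transform_prev(cache, created_label);

    if (deferred)
        valid = label_exists(cache, match_label); /* deferred check */

    return valid;
}
```

With the early check, the first call on an empty cache fails and the second succeeds, which mirrors the "0 rows on first execution but works on the second" symptom; with the deferred check, the first call already succeeds.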
Pull request overview
Fixes a planner/optimizer interaction in Apache AGE where a MATCH following CREATE ... WITH ... (reusing bound variables) could have its generated filter quals pushed below the DML plan, preventing the DML from executing and causing the query to return 0 rows.
Changes:
- Mark the predecessor subquery RTE as a PostgreSQL `security_barrier` when the predecessor clause chain includes DML (CREATE/SET/DELETE/MERGE), preventing qual pushdown into the DML's child plan.
- Add a helper (`clause_chain_has_dml()`) to detect DML operations in the clause chain.
- Add regression coverage for issue #2308 and corresponding expected output.
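
As a rough illustration of the helper described in this list, the sketch below walks a clause chain backwards and reports whether any clause is data-modifying. The `clause_kind` enum, the `clause` struct, and the name `chain_has_dml` are hypothetical simplifications for this example; the actual `clause_chain_has_dml()` in `cypher_clause.c` operates on AGE's own parse-node types and may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for AGE's clause node types. */
typedef enum {
    CLAUSE_MATCH,
    CLAUSE_WITH,
    CLAUSE_RETURN,
    CLAUSE_CREATE,
    CLAUSE_SET,
    CLAUSE_DELETE,
    CLAUSE_MERGE
} clause_kind;

typedef struct clause {
    clause_kind kind;
    struct clause *prev; /* predecessor in the chain, NULL at the start */
} clause;

/* Return true if this clause or any predecessor is a data-modifying clause. */
static bool chain_has_dml(const clause *c)
{
    for (; c != NULL; c = c->prev) {
        switch (c->kind) {
        case CLAUSE_CREATE:
        case CLAUSE_SET:
        case CLAUSE_DELETE:
        case CLAUSE_MERGE:
            return true;
        default:
            break;
        }
    }
    return false;
}
```

A caller would pass the clause immediately preceding the MATCH and set the barrier on the subquery RTE only when the function returns true, so MATCH-MATCH chains keep their existing plans.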
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/backend/parser/cypher_clause.c | Adds DML detection and sets security_barrier on the predecessor subquery RTE to prevent incorrect qual pushdown past DML. |
| regress/sql/cypher_match.sql | Adds regression queries covering CREATE+WITH+MATCH (and related variations) for issue #2308. |
| regress/expected/cypher_match.out | Captures expected outputs for the new regression cases. |
@gregfelice Please see Copilot's comment above. Thoughts?
Hi jrgemignani,

Copilot raises whether …

This isn't a concern in practice: Cypher subquery expressions ( …

Additionally, subqueries are transformed into standalone …

If a future grammar extension ever allows DML inside subqueries, the helper would need updating, but that would be a much larger change with its own design considerations.

Thanks,
Greg
Problem
`CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b) SET a.something = 'something' RETURN a` returns 0 rows instead of the created vertex.

Reported in #2308.
Root Cause
When MATCH follows CREATE + WITH and re-uses all bound variables, the MATCH generates filter quals (`age_start_id(e) = age_id(a) AND age_end_id(e) = age_id(b)`) that reference only columns from the predecessor subquery. Since the MATCH adds no new table scans (all entities are bound), PostgreSQL's optimizer treats the subquery as transparent and pushes these filter quals down through the subquery layers into the CREATE's child plan.

Before fix, EXPLAIN plan showing the bug:

The filter is placed below the CREATE custom scan, on its child subquery. At execution time:
- `Result` produces a dummy row with NULL values
- `age_start_id(NULL) = age_id(NULL)` → fails

The filter should evaluate after CREATE produces its output (where `a`, `e`, `b` have actual values), not before.

Fix
In `transform_cypher_match_pattern()`, after transforming the predecessor clause chain into a subquery RTE, check if the chain contains any data-modifying operation (CREATE, SET, DELETE, or MERGE). If it does, set `rte->security_barrier = true` on the subquery RTE. This is PostgreSQL's standard mechanism to prevent qual pushdown through subqueries: the optimizer will not flatten a security-barrier subquery or push filter conditions into it (quals built only from leakproof functions are the exception).

A helper function `clause_chain_has_dml()` walks the clause chain to detect DML operations.

After fix, EXPLAIN plan showing correct structure:
Now the filter is placed above the CREATE custom scan. CREATE runs first (inserting entities), then the filter evaluates on the output values and correctly passes.
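
The difference between the two filter placements can be reproduced in a toy model. The sketch below is not AGE or PostgreSQL code: `nullable_id`, `row`, `toy_create`, and the two `rows_*` functions are hypothetical names, and the only assumption carried over is that an equality comparison involving NULL does not evaluate to true, so a filter rejects the row.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a nullable graph id. */
typedef struct {
    bool is_null;
    int64_t value;
} nullable_id;

/* Columns the MATCH quals look at: ids of a and b, and the endpoints of e. */
typedef struct {
    nullable_id a_id, b_id, e_start, e_end;
} row;

/* Equality that is "not true" whenever either side is NULL. */
static bool id_eq(nullable_id x, nullable_id y)
{
    if (x.is_null || y.is_null)
        return false;
    return x.value == y.value;
}

/* start_id(e) = id(a) AND end_id(e) = id(b) */
static bool qual_passes(const row *r)
{
    return id_eq(r->e_start, r->a_id) && id_eq(r->e_end, r->b_id);
}

/* Stand-in for CREATE: assigns ids to the new vertices and edge. */
static void toy_create(row *r)
{
    r->a_id = (nullable_id){ false, 1 };
    r->b_id = (nullable_id){ false, 2 };
    r->e_start = (nullable_id){ false, 1 };
    r->e_end = (nullable_id){ false, 2 };
}

static row dummy_row(void)
{
    nullable_id null_id = { true, 0 };
    row r = { null_id, null_id, null_id, null_id };
    return r;
}

/* Filter pushed below CREATE: evaluated on the all-NULL input row. */
static int rows_filter_below_create(void)
{
    row r = dummy_row();
    if (!qual_passes(&r))
        return 0; /* row rejected, CREATE never sees it */
    toy_create(&r);
    return 1;
}

/* Filter kept above CREATE: evaluated on CREATE's output. */
static int rows_filter_above_create(void)
{
    row r = dummy_row();
    toy_create(&r);
    return qual_passes(&r) ? 1 : 0;
}
```

In this model `rows_filter_below_create()` yields 0 rows and `rows_filter_above_create()` yields 1, matching the before and after behavior described for the two plans.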
Files Changed
- `src/backend/parser/cypher_clause.c`: Added `clause_chain_has_dml()` helper function and `security_barrier` logic in `transform_cypher_match_pattern()`
- `regress/sql/cypher_match.sql`: Regression tests for issue #2308
- `regress/expected/cypher_match.out`: Expected test output

Regression Tests

Added tests covering:
- `CREATE + WITH + MATCH + SET + RETURN` (1 row expected)

All 31 existing regression tests pass with this change.
Closes #2308.
AI Disclosure
AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.