Fix chained MERGE not seeing sibling MERGE's changes (#1446)#2344
Fix chained MERGE not seeing sibling MERGE's changes (#1446)#2344jrgemignani merged 2 commits intoapache:masterfrom
Conversation
|
Wrong branch specified |
When multiple MERGEs are chained (e.g. MATCH ... MERGE ... MERGE ...), the non-terminal (first) MERGE returned rows one at a time to the parent plan node. The parent MERGE's lateral join would materialize its hash table on the first row, before the child MERGE had finished all its iterations. This caused the second MERGE to not see entities created by the first MERGE, leading to duplicate nodes. Fix by making non-terminal MERGE eager: it processes ALL input rows and buffers the projected results before returning any to the parent. This ensures all entity creations are committed before any parent plan node scans the tables. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
52c21d4 to
01e8240
Compare
|
Rebased onto master and retarget the PR base branch. Apologies for the wrong branch. |
There was a problem hiding this comment.
Pull request overview
This PR fixes issue #1446 where chained MERGE clauses (e.g., MATCH (x) MERGE ... MERGE ...) failed to see each other's changes, causing duplicate node creation. The root cause was that non-terminal MERGE nodes returned rows one at a time, allowing the parent MERGE's lateral join to materialize its hash table before the child MERGE completed all iterations.
Changes:
- Modified non-terminal MERGE execution to eagerly process all input rows and buffer results before returning any tuples
- Added three new fields to track eager buffering state:
eager_tuples,eager_tuples_index, andeager_buffer_filled - Added comprehensive regression tests reproducing the reported issue and verifying the fix
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/include/executor/cypher_utils.h | Added three fields to cypher_merge_custom_scan_state for eager tuple buffering |
| src/backend/executor/cypher_merge.c | Restructured Case 1 execution logic to eagerly buffer non-terminal MERGE results; added cleanup for eager buffer in end_cypher_merge |
| regress/sql/cypher_merge.sql | Added regression tests for issue #1446 covering 2-node and 3-node scenarios |
| regress/expected/cypher_merge.out | Updated expected output with correct results showing nodes are reused properly |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@gregfelice See Copilot review above, plz. Otherwise, looks good. |
When a non-terminal MERGE receives no input rows from its predecessor (e.g., MATCH returns 0 rows), the eager buffer is filled but empty. The condition at line 688 checked `css->eager_tuples != NIL`, which evaluated to false for an empty buffer, causing execution to fall through to the terminal MERGE code path. This could incorrectly create entities when none should be created. Fix by checking `css->eager_buffer_filled` instead, which correctly distinguishes "buffer not yet filled" from "buffer filled but empty". Add regression test for chained MERGE with empty MATCH result. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes #1446.
When multiple MERGEs are chained (e.g.
MATCH (x) MERGE (x)-[:r]->(:t) MERGE (:C)-[:r]->(:t)), the non-terminal (first) MERGE returned rows one at a time to the parent plan node. The parent MERGE's lateral join materialized its hash table on the first row, before the child MERGE had finished all its iterations. This caused the second MERGE to not see entities created by the first, leading to duplicate nodes.Root Cause
The execution plan for chained MERGEs nests as:
The Hash Join for MERGE2's pattern check is materialized once on the first call. When MERGE1 creates new entities during subsequent iterations, those entities are invisible to the already-materialized hash table.
Fix
Make non-terminal MERGE eager: it processes ALL input rows (creating/matching paths as needed) and buffers the projected results in a
ListofHeapTuples. Subsequent calls return rows from the buffer one at a time. This ensures all entity creations from the first MERGE are committed before any parent plan node scans the tables.Changes:
src/backend/executor/cypher_merge.c: Restructureexec_cypher_mergeCase 1 to eagerly buffer results for non-terminal MERGE; terminal MERGE retains existing behaviorsrc/include/executor/cypher_utils.h: Addeager_tuples,eager_tuples_index,eager_buffer_filledtocypher_merge_custom_scan_stateregress/sql/cypher_merge.sql: Add regression tests for First MERGE does not see the second MERGE's changes #1446regress/expected/cypher_merge.out: Updated expected outputReproduction (from issue)
Before fix: Returns 4 rows with duplicate
:Cnodes created.After fix: Returns 2 rows (A, C);
:Cis found and reused by the second MERGE.Test Plan
AI Disclosure
AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.