Skip to content

Fix chained MERGE not seeing sibling MERGE's changes (#1446)#2344

Merged
jrgemignani merged 2 commits intoapache:masterfrom
gregfelice:fix_1446_chained_merge_visibility
Feb 27, 2026
Merged

Fix chained MERGE not seeing sibling MERGE's changes (#1446)#2344
jrgemignani merged 2 commits intoapache:masterfrom
gregfelice:fix_1446_chained_merge_visibility

Conversation

@gregfelice
Copy link
Contributor

@gregfelice gregfelice commented Feb 26, 2026

Summary

Fixes #1446.

When multiple MERGEs are chained (e.g. MATCH (x) MERGE (x)-[:r]->(:t) MERGE (:C)-[:r]->(:t)), the non-terminal (first) MERGE returned rows one at a time to the parent plan node. The parent MERGE's lateral join materialized its hash table on the first row, before the child MERGE had finished all its iterations. This caused the second MERGE to not see entities created by the first, leading to duplicate nodes.

Root Cause

The execution plan for chained MERGEs nests as:

MERGE2 (terminal)
  └── NestLoop Left Join
        ├── MERGE1 (non-terminal)
        │     └── NestLoop Left Join
        │           ├── child scan (MATCH)
        │           └── lateral check for MERGE1 pattern
        └── lateral check for MERGE2 pattern (Hash Join)

The Hash Join for MERGE2's pattern check is materialized once on the first call. When MERGE1 creates new entities during subsequent iterations, those entities are invisible to the already-materialized hash table.

Fix

Make non-terminal MERGE eager: it processes ALL input rows (creating/matching paths as needed) and buffers the projected results in a List of HeapTuples. Subsequent calls return rows from the buffer one at a time. This ensures all entity creations from the first MERGE are committed before any parent plan node scans the tables.

Changes:

  • src/backend/executor/cypher_merge.c: Restructure exec_cypher_merge Case 1 to eagerly buffer results for non-terminal MERGE; terminal MERGE retains existing behavior
  • src/include/executor/cypher_utils.h: Add eager_tuples, eager_tuples_index, eager_buffer_filled to cypher_merge_custom_scan_state
  • regress/sql/cypher_merge.sql: Add regression tests for First MERGE does not see the second MERGE's changes #1446
  • regress/expected/cypher_merge.out: Updated expected output

Reproduction (from issue)

SELECT * FROM create_graph('g');
SELECT * FROM cypher('g', $$ CREATE (:A), (:C) $$) AS (a agtype);
SELECT * FROM cypher('g', $$
    MATCH (x)
    MERGE (x)-[:r]->(:t)
    MERGE (:C)-[:r]->(:t)
    RETURN x
$$) AS (a agtype);

Before fix: Returns 4 rows with duplicate :C nodes created.
After fix: Returns 2 rows (A, C); :C is found and reused by the second MERGE.

Test Plan

  • Reporter's exact reproduction case returns correct results
  • Verified graph state: A(1), C(1), t(2), 2 edges
  • 3-node variant: hub created once and reused across iterations
  • All 31 existing regression tests pass

AI Disclosure

AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.

@jrgemignani
Copy link
Contributor

Wrong branch specified

Copy link
Contributor

@jrgemignani jrgemignani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong branch

When multiple MERGEs are chained (e.g. MATCH ... MERGE ... MERGE ...),
the non-terminal (first) MERGE returned rows one at a time to the parent
plan node. The parent MERGE's lateral join would materialize its hash
table on the first row, before the child MERGE had finished all its
iterations. This caused the second MERGE to not see entities created by
the first MERGE, leading to duplicate nodes.

Fix by making non-terminal MERGE eager: it processes ALL input rows and
buffers the projected results before returning any to the parent. This
ensures all entity creations are committed before any parent plan node
scans the tables.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gregfelice gregfelice force-pushed the fix_1446_chained_merge_visibility branch from 52c21d4 to 01e8240 Compare February 27, 2026 00:09
@gregfelice gregfelice changed the base branch from release/PG18/1.7.0 to master February 27, 2026 00:09
@gregfelice
Copy link
Contributor Author

Rebased onto master and retarget the PR base branch. Apologies for the wrong branch.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes issue #1446 where chained MERGE clauses (e.g., MATCH (x) MERGE ... MERGE ...) failed to see each other's changes, causing duplicate node creation. The root cause was that non-terminal MERGE nodes returned rows one at a time, allowing the parent MERGE's lateral join to materialize its hash table before the child MERGE completed all iterations.

Changes:

  • Modified non-terminal MERGE execution to eagerly process all input rows and buffer results before returning any tuples
  • Added three new fields to track eager buffering state: eager_tuples, eager_tuples_index, and eager_buffer_filled
  • Added comprehensive regression tests reproducing the reported issue and verifying the fix

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/include/executor/cypher_utils.h Added three fields to cypher_merge_custom_scan_state for eager tuple buffering
src/backend/executor/cypher_merge.c Restructured Case 1 execution logic to eagerly buffer non-terminal MERGE results; added cleanup for eager buffer in end_cypher_merge
regress/sql/cypher_merge.sql Added regression tests for issue #1446 covering 2-node and 3-node scenarios
regress/expected/cypher_merge.out Updated expected output with correct results showing nodes are reused properly

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jrgemignani
Copy link
Contributor

@gregfelice See Copilot review above, plz. Otherwise, looks good.

When a non-terminal MERGE receives no input rows from its predecessor
(e.g., MATCH returns 0 rows), the eager buffer is filled but empty.
The condition at line 688 checked `css->eager_tuples != NIL`, which
evaluated to false for an empty buffer, causing execution to fall
through to the terminal MERGE code path. This could incorrectly
create entities when none should be created.

Fix by checking `css->eager_buffer_filled` instead, which correctly
distinguishes "buffer not yet filled" from "buffer filled but empty".

Add regression test for chained MERGE with empty MATCH result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jrgemignani jrgemignani merged commit 20ada84 into apache:master Feb 27, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

First MERGE does not see the second MERGE's changes

3 participants