Skip to content

fix: add foreign record locking to ManyMany, OneMany, and OneOne link operations#2679

Open
dkindlund wants to merge 1 commit intoteableio:developfrom
dkindlund:fix/many-many-junction-locking
Open

fix: add foreign record locking to ManyMany, OneMany, and OneOne link operations#2679
dkindlund wants to merge 1 commit intoteableio:developfrom
dkindlund:fix/many-many-junction-locking

Conversation

@dkindlund
Copy link
Contributor

Bug Description

saveForeignKeyForManyMany has NO record locking before modifying junction table entries. In contrast, saveForeignKeyForManyOne correctly calls lockForeignRecords before modifications (added in commit 77a8291d2).

Without locking, concurrent ManyMany link updates can:

  • Create duplicate junction rows when two requests both add the same record pair
  • Lose deletions when one request deletes while another reads stale state
  • Cause progressive desync between JSONB cell values and junction table data

Similarly, saveForeignKeyForOneMany (non-one-way path) and saveForeignKeyForOneOne (non-ManyOne branch) modify foreign table records without locking.

Steps to Reproduce

  1. Create two tables with a ManyMany link field
  2. From multiple concurrent API clients, rapidly update the link field on the same record (adding/removing links)
  3. Over time, the junction table accumulates rows that don't match the JSONB cell value
  4. The record becomes stuck: updates crash with TypeError (foreignRecordMap mismatch), deletes fail with FK constraint violations

Expected Behavior

All link field save operations should lock affected foreign records before modifications, preventing concurrent corruption.

What This Fix Does

Adds lockForeignRecords calls to three methods that were missing them:

  1. saveForeignKeyForManyMany: Locks foreign records from toDelete, toAdd, and toDeleteAndReinsert before any junction table modifications
  2. saveForeignKeyForOneMany (non-one-way): Locks all affected foreign records before updating FK columns on the foreign table
  3. saveForeignKeyForOneOne (non-ManyOne branch): Locks affected foreign records before modifications

Also fixes a missing await on the saveForeignKeyForManyMany delegation in saveForeignKeyForOneMany when isOneWay=true. The async call was previously fire-and-forget, meaning junction table modifications could complete after the caller assumed they were done.

Client Information

  • OS: Linux (Docker container on Google Cloud Run)
  • Database: PostgreSQL 17 (Google Cloud SQL)

Platform

Docker standalone (Google Cloud Run)

Additional Context

This was discovered during a production incident where a "Microsoft Windows" technology record accumulated 602 junction table rows while its JSONB cell value showed 0. The desync made the record permanently stuck — API updates crashed with TypeError and deletes failed with FK constraint violations, requiring direct database intervention to resolve.

The root cause was concurrent n8n workflow executions updating the same ManyMany link field simultaneously. The upstream fix in commit 77a8291d2 added locking to saveForeignKeyForManyOne but not to saveForeignKeyForManyMany, saveForeignKeyForOneMany, or saveForeignKeyForOneOne.

… operations

saveForeignKeyForManyMany had NO record locking, unlike
saveForeignKeyForManyOne which calls lockForeignRecords. This meant
concurrent ManyMany link updates could create duplicate junction rows
or lose deletions.

Similarly, saveForeignKeyForOneMany (non-one-way) and
saveForeignKeyForOneOne (non-ManyOne branch) modified foreign table
records without locking.

This fix adds lockForeignRecords calls to all three methods, matching
the pattern already used by saveForeignKeyForManyOne:

- saveForeignKeyForManyMany: locks foreign records from toDelete,
  toAdd, and toDeleteAndReinsert before any junction table modifications
- saveForeignKeyForOneMany: locks all affected foreign records before
  updating FK columns on the foreign table
- saveForeignKeyForOneOne: locks affected foreign records in the else
  branch (selfKeyName !== '__id')

Also fixes a missing await on the saveForeignKeyForManyMany delegation
in saveForeignKeyForOneMany when isOneWay=true. The async call was
previously fire-and-forget, meaning junction table modifications could
complete after the caller assumed they were done.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dkindlund added a commit to dkindlund/teable that referenced this pull request Mar 9, 2026
## Problem

The /link-fix endpoint's InvalidLinkReference repair has a race condition
under concurrent write load. The current two-step approach:

1. checkLinks() — SELECT to detect desynced record IDs
2. fixLinks(recordIds) — UPDATE only those specific records

Between steps 1 and 2, concurrent writes can:
- Create NEW desyncs not in the recordIds list (missed until next run)
- Worsen existing desyncs (JSONB changes between detection and fix)
- Overwrite the fix immediately after it's applied

In production under sustained concurrent API load (200-800 writes/hr),
we've observed full JSONB array wipes (e.g., 9,781 entries reduced to 0)
caused by the read-modify-write race in Teable's link update path. The
integrity fix endpoint, meant to repair these desyncs, is vulnerable to
the same concurrent writes because of this detection-fix gap.

## Solution

Add atomicFixLinks() to IntegrityQueryPostgres that combines detection
and fix into a single UPDATE ... WHERE __id IN (SELECT ...) statement.
Detection and fix execute under the same MVCC snapshot, eliminating the
application-level gap entirely.

The service (LinkFieldIntegrityService.checkAndFix) tries the atomic
method first. If the database engine doesn't support it (e.g., SQLite),
it falls back to the existing two-step approach. This is a safe,
backwards-compatible change — SQLite behavior is completely unchanged.

## What the /link-fix endpoint covers

For context, the endpoint handles 10 integrity issue types. This PR
improves only InvalidLinkReference. Here is the full list:

| Issue Type | What it detects | Fix applied |
|---|---|---|
| InvalidLinkReference | JSONB link column diverged from junction/FK source of truth | Rebuilds JSONB from junction/FK data (THIS PR) |
| MissingRecordReference | Junction table has rows pointing to deleted records | Deletes orphaned junction rows |
| ForeignTableNotFound | Link field references a deleted table | No auto-fix (requires manual intervention) |
| ForeignKeyHostTableNotFound | Junction table is missing | No auto-fix |
| ForeignKeyNotFound | Missing FK columns in junction table | Recreates columns, backfills from JSONB |
| SelfKeyNotFound | Missing self-reference key in junction | No auto-fix |
| SymmetricFieldNotFound | Bidirectional link missing its counterpart | Converts to one-way link |
| ReferenceFieldNotFound | Referenced record was deleted | Deletes orphaned reference |
| UniqueIndexNotFound | Missing unique constraint for OneOne links | Creates the index |
| EmptyString | Text fields have empty strings instead of NULL | Converts to NULL |

## Relationship types handled

The atomic fix handles all four relationship types:
- ManyMany (isMultiValue=true): Rebuilds JSONB array from junction table
- OneMany (isMultiValue=true): Same as ManyMany
- ManyOne (isMultiValue=false, FK in same table): Rebuilds single JSONB object from FK column
- OneOne (isMultiValue=false, FK in host table): Rebuilds via cross-table join

## Files changed

- abstract.ts: Added atomicFixLinks() with default null return
- integrity-query.postgres.ts: PostgreSQL implementation of atomicFixLinks()
- link-field.service.ts: checkAndFix() tries atomic first, falls back to two-step

## Related issues

- teableio#2680 (DataLoader cache invalidation under concurrency)
- teableio#2676 (Sort record IDs in lockForeignRecords)
- teableio#2677 (Wrap simpleUpdateRecords with transaction/timeout/retry)
- teableio#2679 (Add foreign record locking to ManyMany, OneMany, OneOne)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant