Add data directory locking to NodeFS to prevent multi-process corruption #892
reisepass wants to merge 3 commits into electric-sql:main from
Tests that FAIL without the fix, reproducing real corruption:
- Overlapping instances: triple, staggered, DDL writer, rapid cycling
- HMR double-instance: lock-based blocking and rapid swap corruption
- WAL bloat burst mode: rapid kills without checkpoint cause corruption

Co-Authored-By: Matthaus Wolff <8714327+WolffM@users.noreply.github.com>
- Add PID-based file lock to NodeFS to prevent overlapping instance corruption
- Detect partially-initialized data dirs and move to .corrupt-<timestamp> backup
- Add tests verifying partial init backup behavior
This makes a lot of sense. Accidental multi-process access during local dev is very real, especially when using pnpm dev in multiple terminals. The lock file approach feels like a clean and pragmatic safeguard, especially since Postgres assumes exclusive control over the data directory. Curious: is there any plan to expose a clearer error message when the second process is rejected, so it’s obvious what happened?
This is a really practical improvement. Locking the data dir should prevent a whole class of annoying dev-time corruption issues, and the partial initdb handling makes the setup much more robust. Great work!
@reisepass Thank you for this!
I would prefer if you address only the lock file issue for the moment. I just skimmed through the changes (sorry, really busy with other things), seems like there are a lot of other files (tests?) added. For the sake of simplicity, if the only thing addressed is the locking of folder access, please consolidate in a single test file or add to existing test files the minimum that tests the new functionality.
Sounds reasonable.
…-safety tests
- Replace verbose crash-safety test suite with one focused nodefs-lock test
- Update error message to clearly state data dir is "already in use" with actionable guidance (close other instance, use different dir, or delete lock)
- Remove partial initdb detection (out of scope for this PR)
- Clean up .gitignore and changeset description
I've been running into issues with pglite getting corrupted in persist-to-disk mode. The most human scenario is that you
have one pnpm dev running in one window and then start another for a quick test, not remembering that you already had one open.
In my dev work this honestly happens daily, so it was hard to use pglite practically as an SQLite replacement.
The fix is simple: just add a lock file.
This is nothing fancy. Since Postgres assumes it has full control of the data dir, we just reject the second process
that tries to use it.
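
For readers who want to see the general idea, here is a minimal sketch of a PID-based lock file in Node.js. It is not the code from this PR: the file name `pglite.lock`, the `acquireLock` and `isProcessAlive` helpers, and the exact error wording are all assumptions made for illustration. It relies only on standard Node.js calls (`fs.openSync` with the `wx` flag for atomic creation, and `process.kill(pid, 0)` as a liveness probe).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical lock file name; the name used in the PR may differ.
const LOCK_FILE = "pglite.lock";

// Signal 0 sends nothing; it only reports whether the PID exists.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Takes the lock or throws; returns a function that releases it.
function acquireLock(dataDir: string): () => void {
  const lockPath = path.join(dataDir, LOCK_FILE);
  // Two attempts: the second runs only after a stale lock was removed.
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" fails with EEXIST if the file exists, so creation is atomic.
      const fd = fs.openSync(lockPath, "wx");
      fs.writeSync(fd, String(process.pid));
      fs.closeSync(fd);
      return () => fs.rmSync(lockPath, { force: true });
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
      const owner = Number.parseInt(fs.readFileSync(lockPath, "utf8"), 10);
      if (Number.isInteger(owner) && isProcessAlive(owner)) {
        throw new Error(
          `Data directory "${dataDir}" is already in use by PID ${owner}. ` +
            `Close the other instance, use a different data directory, ` +
            `or delete ${lockPath} if that process is gone.`,
        );
      }
      // Owner is dead or the file is unreadable: treat the lock as stale.
      fs.rmSync(lockPath, { force: true });
    }
  }
  throw new Error(`Could not acquire lock on "${dataDir}"`);
}

// Usage: lock a throwaway directory, then release it.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "pglite-lock-demo-"));
const release = acquireLock(demoDir);
release();
fs.rmSync(demoDir, { recursive: true, force: true });
```

Storing the PID lets a later process tell a live owner from a lock left behind by a crash. The sketch treats an unparsable lock file as stale, which is a simplification: a real implementation would need to handle the window between creating the file and writing the PID, as well as PID reuse by unrelated processes.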