Description
A race condition exists in dbsync that can block the synchronization process, especially with large projects that take a long time to download. When dbsync initiates a pull operation, and another client pushes a new version to the Mergin Maps server before the pull is complete, dbsync ends up with an outdated local version of the project.
This leads to a failure in the subsequent push operation, because of a strict version check that ensures the local version matches the server version. The push function raises an error: "There are pending changes on server - need to pull them first.". This creates a loop where dbsync is stuck trying to pull, but each pull is slow and susceptible to the same race condition, requiring manual intervention like --force-init, which can lead to data loss.
Why --force-init is not a solution
Using --force-init is a heavy-handed approach that wipes the local state and re-initializes the synchronization from scratch. This is not a viable solution in a production environment for several reasons:
- Data Loss: If there are changes in the PostgreSQL database that have not been pushed to the Mergin Maps server, a --force-init will wipe the base and modified schemas and re-create them from the GeoPackage file. This will cause any changes made in the database to be lost.
- Manual Intervention: The need for manual intervention defeats the purpose of an automated synchronization daemon.
- Downtime: The re-initialization process can be time-consuming for large projects, leading to extended downtime for the synchronization service.
The problematic version check is located in the push function in dbsync.py:
# dbsync.py in push()
# ...
# check there are no pending changes on server
if server_version != local_version:
    raise DbSyncError("There are pending changes on server - need to pull them first.")
Real-world Scenario
- T0: dbsync starts a pull operation for a large project with many photos. The server is at version v100. The download is expected to take over a minute.
- T0 + 30s: A surveyor in the field finishes their work and syncs their mobile client. This creates version v101 on the Mergin Maps server.
- T0 + 90s: dbsync completes its download of v100 and applies the changes to the PostgreSQL database. The local project version for dbsync is now v100.
- T0 + 95s: The dbsync daemon proceeds to the push step to sync changes from the database back to Mergin Maps.
- Failure: The push operation detects that the server is at v101 while the local version is v100. It aborts the push, and dbsync is effectively blocked.
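The timeline above can be reproduced in a few lines. This is only a sketch for illustration: FakeServer, slow_pull and strict_push are hypothetical stand-ins, not dbsync or Mergin Maps client APIs, and the assumption that a successful push bumps the server version by one is made just to keep the model simple.

```python
class DbSyncError(Exception):
    """Stand-in for the error type raised by dbsync."""


class FakeServer:
    """Hypothetical in-memory stand-in for a Mergin Maps server."""

    def __init__(self, version):
        self.version = version


def slow_pull(server, concurrent_pushes=0):
    """Return the version that a long-running download ends up with.

    `concurrent_pushes` models other clients pushing while the download runs.
    """
    target = server.version              # T0: download starts against this version
    server.version += concurrent_pushes  # T0 + 30s: a mobile client syncs
    return target                        # T0 + 90s: local copy is at `target`


def strict_push(server, local_version):
    """Mirror of the strict version check quoted from push()."""
    if server.version != local_version:
        raise DbSyncError(
            "There are pending changes on server - need to pull them first."
        )
    server.version += 1  # assumption: a successful push creates one new version
    return server.version


server = FakeServer(version=100)
local_version = slow_pull(server, concurrent_pushes=1)

try:
    strict_push(server, local_version)
    outcome = "pushed"
except DbSyncError as err:
    outcome = f"blocked: {err}"
```

With one concurrent push during the download, the local version stays at 100 while the server moves to 101, so the strict check rejects the push.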
Proposed Solution
To resolve this, the push function should be made more resilient. Instead of immediately failing upon a version mismatch, it should attempt to resolve the situation automatically by pulling the latest changes.
The proposed solution is to modify the push function in dbsync.py. When a version mismatch is detected, dbsync should:
- Automatically trigger the pull function. The existing pull function is capable of handling a rebase of local database changes on top of the incoming server changes.
- After the pull is complete, re-check the version.
- If the versions now match, proceed with the push operation.
- If the versions still do not match after the automatic pull, then raise an error, as this would indicate a more serious problem that requires manual intervention.
This "pull-and-retry" mechanism would make the synchronization process more robust for projects with long download times and active collaboration, avoiding the need for manual resets.
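A minimal sketch of the pull-and-retry flow, assuming the version lookups, the pull and the actual upload can be passed in as callables. The function name push_with_retry and its four parameters are hypothetical and do not reflect the real signatures in dbsync.py; the sketch only illustrates the control flow proposed here.

```python
class DbSyncError(Exception):
    """Stand-in for the error type raised by dbsync."""


def push_with_retry(get_server_version, get_local_version, pull, do_push):
    """Push, pulling once first if the server has moved ahead.

    All four arguments are hypothetical callables standing in for dbsync internals.
    """
    if get_server_version() != get_local_version():
        # Version mismatch: pull the latest changes instead of failing right away.
        pull()
        # Re-check after the automatic pull.
        if get_server_version() != get_local_version():
            # Still out of sync: give up and leave it to manual intervention.
            raise DbSyncError(
                "There are pending changes on server - need to pull them first."
            )
    do_push()


# Usage with an in-memory state that mimics the scenario (server v101, local v100).
state = {"server": 101, "local": 100, "pushed": False}


def fake_pull():
    state["local"] = state["server"]


def fake_push():
    state["server"] += 1  # assumption: a push creates one new server version
    state["local"] = state["server"]
    state["pushed"] = True


push_with_retry(
    lambda: state["server"],
    lambda: state["local"],
    fake_pull,
    fake_push,
)
```

The sketch retries only once, matching the steps listed above. If another client pushes again during the automatic pull, the error is still raised, so a real implementation might want a bounded number of retries.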