ref(service): Consistent atomic tombstone writes #396
Replace `create_tombstone` with a single `cas_put` method on
`HighVolumeBackend`. The method takes a `CasMutation` (WriteTombstone,
WriteInline, or Delete) and an optional expected redirect target, and
applies the mutation only if the current row state matches.
Large-object writes now store the payload at a unique revision key
(`{key}/{uuid_v7}`) before CAS-committing the tombstone, so concurrent
writers don't silently overwrite each other. Small objects that land on
an existing tombstone are CAS-swapped inline (fixing the expiry-mismatch
TODO). Deletes CAS-remove the tombstone first, then clean up GCS
best-effort — reversing the previous ordering.
Co-Authored-By: Claude <noreply@anthropic.com>
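The mutation-plus-predicate shape described in this commit can be sketched with an in-memory stand-in. Everything here is illustrative: the real `HighVolumeBackend` runs against Bigtable rows, not a `HashMap`, and the matching rules are simplified.

```rust
use std::collections::HashMap;

/// Illustrative stand-ins for the mutation variants and row states.
enum CasMutation {
    WriteTombstone { redirect_target: String },
    WriteInline { data: Vec<u8> },
    Delete,
}

enum RowState {
    Tombstone { redirect_target: String },
    Inline { data: Vec<u8> },
}

#[derive(Default)]
struct InMemoryBackend {
    rows: HashMap<String, RowState>,
}

impl InMemoryBackend {
    /// Applies `mutation` only if the current row matches the expectation:
    /// `Some(t)` requires a tombstone redirecting to `t`; `None` requires
    /// that no tombstone is present. Returns whether the predicate matched.
    fn cas_put(
        &mut self,
        key: &str,
        expected_redirect: Option<&str>,
        mutation: CasMutation,
    ) -> bool {
        let matched = match (self.rows.get(key), expected_redirect) {
            (Some(RowState::Tombstone { redirect_target }), Some(t)) => {
                redirect_target.as_str() == t
            }
            (Some(RowState::Tombstone { .. }), None) => false,
            (_, Some(_)) => false,
            (_, None) => true,
        };
        if matched {
            match mutation {
                CasMutation::WriteTombstone { redirect_target } => {
                    self.rows
                        .insert(key.into(), RowState::Tombstone { redirect_target });
                }
                CasMutation::WriteInline { data } => {
                    self.rows.insert(key.into(), RowState::Inline { data });
                }
                CasMutation::Delete => {
                    self.rows.remove(key);
                }
            }
        }
        matched
    }
}
```

The single entry point means every caller states its expectation explicitly, so a stale writer fails the predicate instead of silently clobbering a tombstone written by a concurrent writer.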
Rename `CasMutation` to `TieredWrite` and `cas_put` to `compare_and_write` for clarity. The names now match the trait method, and the write variants (`WriteInline` → `Object`, `WriteTombstone` → `Tombstone`) align with the existing `TieredGet`/`TieredMetadata` vocabulary. Extract `put_high_volume`, `put_long_term`, and `counting_stream` from the `put_object` body in `tiered.rs`, and rename `revision_id` to `new_long_term_revision` to make its purpose explicit. No behavior change.

Co-Authored-By: Claude <noreply@anthropic.com>
…ents

Factor the shared legacy-metadata branch out of both `tombstone_predicate` and `redirect_target_filter` into a `legacy_tombstone_filter` helper. Improve the `redirect_target_filter` doc comment to show the Chain/Interleave structure as bullet points.

Co-Authored-By: Claude <noreply@anthropic.com>
Update doc comment to reference `compare_and_write` instead of the old `tiered_write` method name. Co-Authored-By: Claude <noreply@anthropic.com>
…odel

The tiered storage documentation still described the pre-atomic write model. Update all three documentation layers to reflect the current compare-and-swap algorithm:

- architecture.md: motivates why HighVolumeBackend needs atomic operations and points to the tiered module for details
- tiered module doc: describes revision keys and per-operation sequences (large write, small write, delete) with conflict handling
- TieredStorage struct doc: concise consistency summary with a pointer to the module-level diagrams

Also fixes the delete ordering (it was backwards) and corrects field name references (`high_volume_backend` -> `high_volume`).
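As a toy illustration of the revision-key scheme documented here: the real code appends a UUIDv7 to the key path; this hypothetical sketch approximates its "distinct per write, roughly time-ordered" property with a millisecond timestamp plus a caller-supplied counter, using only the standard library.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Builds a unique long-term storage path for one write. Stand-in only:
/// the actual implementation uses `{key}/{uuid_v7}`.
fn revision_key(key: &str, counter: u64) -> String {
    let millis = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_millis();
    format!("{key}/{millis:012x}-{counter:04x}")
}
```

Because every write lands at a fresh path, two concurrent large-object writers can both upload to GCS without touching each other's blobs; only the subsequent CAS commit decides whose revision becomes visible.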
```rust
    .compare_and_write(id, Some(&target), write)
    .await?;
```

```rust
// TODO: Schedule cleanups into background to ensure eventual cleanup
```
These TODOs are intentional and should be merged in. We'll address them when the oplog is merged and we can safely start async background jobs with exponential backoff that persist across restarts and crashes.
This is also why we swallow the errors below (`let _ = ...`). Internally, all these operations are retried 3 times, which makes them much less likely to fail.
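The retry-then-swallow pattern described in this reply can be sketched as follows. The retry count of 3 comes from the comment above; the helper itself and its name are hypothetical.

```rust
/// Runs a cleanup closure up to `attempts` times, returning the last result.
/// Callers that treat cleanup as best-effort can discard it with
/// `let _ = ...`, since the commit point has already been reached.
fn with_retries<E>(mut op: impl FnMut() -> Result<(), E>, attempts: u32) -> Result<(), E> {
    let mut last = op();
    for _ in 1..attempts {
        if last.is_ok() {
            break;
        }
        last = op();
    }
    last
}
```

A caller would write something like `let _ = with_retries(|| delete_blob(&key), 3);`, where `delete_blob` stands in for the GCS cleanup call; the planned oplog-backed background jobs would replace this with durable retries.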
```rust
//! motivation, and the [`TieredStorage`] struct docs for routing and tombstone
//! semantics.
//!
//! # Cross-Tier Consistency
```
This is where the algorithm is described
Fold redundant tests, fill gaps in CAS conflict paths, and reorganize into named groups matching the operation they exercise.
Force-pushed from aaf81e7 to 670a283.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```rust
        .await
    })
    .await?
    .predicate_matched;
```
Retrying non-idempotent CAS risks false negatives and data loss
Medium Severity
The `compare_and_write` method wraps `check_and_mutate_row` in a retry block, but `CheckAndMutateRow` is not idempotent. If the server applies the mutation but the response is lost to a transient error, the retry re-evaluates against the already-modified row and returns an incorrect `predicate_matched` value. Unlike `put_non_tombstone`/`delete_non_tombstone`, which have outer re-read-and-retry loops to self-heal from false CAS results, `compare_and_write` returns the result directly. In `put_long_term`, a false negative causes the caller to delete the blob it just successfully committed, leaving a dangling tombstone.
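The failure mode Bugbot describes can be reproduced with a toy CAS store (all names here are hypothetical, not the real backend): the first CAS commits, its response is "lost" to a transient error, and a blind retry reports a predicate mismatch even though the write landed.

```rust
use std::collections::HashMap;

/// Minimal CAS row store: a write succeeds only if the key is absent.
#[derive(Default)]
struct Store {
    rows: HashMap<String, String>,
}

impl Store {
    fn check_and_mutate(&mut self, key: &str, value: &str) -> bool {
        if self.rows.contains_key(key) {
            return false;
        }
        self.rows.insert(key.into(), value.into());
        true
    }
}

/// Models a retry wrapper around a non-idempotent CAS: when the first
/// response is dropped, the retry re-evaluates the predicate against the
/// already-mutated row and returns `false`, a false negative that the
/// caller may then "repair" by deleting the blob it just committed.
fn cas_with_lossy_retry(
    store: &mut Store,
    key: &str,
    value: &str,
    lose_first_response: bool,
) -> bool {
    let first = store.check_and_mutate(key, value);
    if lose_first_response {
        // Transient error: response never reached us, so we retry blindly.
        return store.check_and_mutate(key, value);
    }
    first
}
```

The fix discussed below, treating "row already points at my target revision" as a match, makes the retry idempotent because the second evaluation can recognize its own earlier write.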
This is true. The final plan also includes comparing against the target revision and treating a match there as success; this got dropped in the implementation since we're not doing crash recovery yet. Maybe it's better to add it right away.
The general problem is much harder to solve: what if the CAS succeeds but we get a non-retryable failure back?
I'll be adding this in a follow-up that sets up proper idempotency for the CAS predicates.


Before this PR, the tiered storage implementation used unconditional writes and stored all objects under the same key as their tombstone. This could lead to lost updates and orphans on concurrent writes and deletes. This PR adds a unique revision to the key path of each large object and uses a check-and-set that commits only if the stored tombstone's revision still matches the last known state. Together, these provide atomic commit points for all three operations:
**Replaces** `create_tombstone` with a single `compare_and_write` method on `HighVolumeBackend`. The method takes a `TieredWrite` (`Tombstone`, `Object`, or `Delete`) and an optional expected redirect target, and applies the mutation only if the current row state matches.

**Large-object writes** now store the payload at a unique revision key (`{key}/{uuid_v7}`) so each write gets a distinct storage path. A `get_tiered_metadata` read establishes the CAS precondition before writing to GCS, and a subsequent `compare_and_write` atomically commits the tombstone. CAS conflicts clean up the new GCS blob; CAS errors do the same, then propagate.

**Small-object writes** that encounter an existing tombstone now CAS-swap it for inline data rather than routing to LT. This fixes the expiry-mismatch TODO and keeps small objects in HV.
**Deletes** remove the tombstone first (commit point), then clean up GCS best-effort. This is the inverse of the previous ordering: if GCS cleanup fails, an orphan blob remains (accepted), but the tombstone is gone and the object is unreachable.
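A condensed sketch of the large-object sequence under these rules, with in-memory stand-ins for HV and GCS. All names are illustrative; in particular the conflict branch only fires when a concurrent writer changes the row between the precondition read and the commit.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Fixture {
    gcs: HashMap<String, Vec<u8>>, // long-term blob store stand-in
    hv: HashMap<String, String>,   // key -> tombstone redirect target
}

impl Fixture {
    /// 1) read current state (CAS precondition), 2) upload under a unique
    /// revision key, 3) CAS-commit the tombstone, 4) on conflict delete the
    /// blob we just wrote instead of overwriting anyone else's.
    fn put_large(
        &mut self,
        key: &str,
        revision: &str,
        payload: Vec<u8>,
    ) -> Result<(), &'static str> {
        let expected = self.hv.get(key).cloned();
        let blob_key = format!("{key}/{revision}");
        self.gcs.insert(blob_key.clone(), payload);
        if self.hv.get(key).cloned() == expected {
            self.hv.insert(key.into(), blob_key);
            Ok(())
        } else {
            let _ = self.gcs.remove(&blob_key); // best-effort cleanup
            Err("cas conflict")
        }
    }
}
```

Note the asymmetry with deletes: writes commit by installing the tombstone last, deletes commit by removing it first, so in both cases the tombstone row is the single source of truth for visibility.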
Tests have been reorganized for readability. Races and extreme edge cases are not yet covered; those tests will be added in #363.
Ref FS-309