
Comprehensive consistency test suite for TieredStorage #363

Draft
jan-auer wants to merge 3 commits into main from test/tiered-storage-consistency

jan-auer commented Mar 9, 2026

Formalizes three consistency invariants for TieredStorage and adds a structured test suite that proves they hold under normal operation and documents where they break.

Invariants

  • No OrphanLT: if LT has data, HV must have a tombstone pointing to it
  • No DualData: HV and LT must not both contain non-tombstone data
  • OrphanTombstone is safe: tombstone in HV with nothing in LT must return None on read
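The three invariants can be read as a per-key check over what each tier holds. The following is a minimal sketch under assumptions, not the PR's actual `check_invariants`: the `HvEntry` and `Violation` types, the function signatures, and the choice to report inline data plus LT data as `DualData` are all hypothetical.

```rust
/// What a single key holds in the high-volume (HV) tier.
/// Hypothetical model type, not taken from the PR.
#[derive(Debug, Clone, PartialEq)]
pub enum HvEntry {
    /// Small object stored inline in HV.
    Inline(Vec<u8>),
    /// Redirect marker: the payload is expected in the long-term (LT) tier.
    Tombstone,
}

/// The two ways the tiers can disagree for one key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Violation {
    OrphanLt,
    DualData,
}

/// Checks one key. `hv` is the HV row (if any), `lt` is the LT payload (if any).
pub fn check_invariants(hv: Option<&HvEntry>, lt: Option<&[u8]>) -> Result<(), Violation> {
    match (hv, lt) {
        // Both tiers hold real data (assumed here to count as DualData).
        (Some(HvEntry::Inline(_)), Some(_)) => Err(Violation::DualData),
        // LT holds data that nothing in HV points to.
        (None, Some(_)) => Err(Violation::OrphanLt),
        // Everything else is consistent, including a tombstone with empty LT.
        _ => Ok(()),
    }
}

/// Read path for the same model: an orphan tombstone resolves to `None`.
pub fn read(hv: Option<&HvEntry>, lt: Option<&[u8]>) -> Option<Vec<u8>> {
    match hv {
        Some(HvEntry::Inline(data)) => Some(data.clone()),
        Some(HvEntry::Tombstone) => lt.map(|d| d.to_vec()),
        None => None,
    }
}
```

In this model an orphan tombstone passes the check and reads back as `None`, which is what makes it the safe failure state.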

Testing strategy

Tests are organized into five categories, all using mocked backends (no real GCS/BigTable):

  1. Happy path (16 tests) — all state transitions including overwrites, boundary cases (0 bytes, exactly 1 MiB), and tombstone redirect behavior
  2. Backend outages (12 tests) — error injection at every step of every operation, verifying check_invariants passes after each failure (state unchanged or safely degraded)
  3. Pod termination (2 tests) — drop futures mid-operation via SyncBackend + timeout, proving OrphanLT occurs on insert kill and OrphanTombstone (safe) on delete kill
  4. Concurrent races (3 tests) — deterministic interleaving via Notify-based sync hooks, proving insert+insert and insert+delete produce invariant violations
  5. Property-based fuzzing (1 proptest, 100 random sequences) — random operation sequences on 3 keys with assert_consistent after every operation
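The pod termination category can be illustrated without async machinery by modelling each operation as a sequence of steps and stopping early. This is a simplified sketch, not the PR's SyncBackend: the `Tiers` type and the `steps` parameter are hypothetical, and the step order (LT first, tombstone second, for both insert and delete) is taken from the PR's description of the pre-fix behavior.

```rust
use std::collections::{HashMap, HashSet};

/// Toy two-tier store: LT holds payloads, HV is reduced to a set of tombstones.
#[derive(Default)]
pub struct Tiers {
    pub lt: HashMap<String, Vec<u8>>,
    pub tombstones: HashSet<String>,
}

impl Tiers {
    /// Large insert: step 1 writes LT, step 2 writes the tombstone.
    /// `steps` is how many steps complete before the simulated kill.
    pub fn insert_large(&mut self, key: &str, data: &[u8], steps: usize) {
        if steps >= 1 {
            self.lt.insert(key.to_string(), data.to_vec());
        }
        if steps >= 2 {
            self.tombstones.insert(key.to_string());
        }
    }

    /// Delete: step 1 removes the LT payload, step 2 removes the tombstone.
    pub fn delete(&mut self, key: &str, steps: usize) {
        if steps >= 1 {
            self.lt.remove(key);
        }
        if steps >= 2 {
            self.tombstones.remove(key);
        }
    }

    /// LT data with no tombstone pointing at it (invariant violation).
    pub fn has_orphan_lt(&self, key: &str) -> bool {
        self.lt.contains_key(key) && !self.tombstones.contains(key)
    }

    /// Tombstone with nothing behind it (safe state).
    pub fn has_orphan_tombstone(&self, key: &str) -> bool {
        self.tombstones.contains(key) && !self.lt.contains_key(key)
    }
}
```

Calling `insert_large("k", data, 1)` on an empty store leaves an orphan in LT, while `delete("k", 1)` on a fully inserted key leaves only an orphan tombstone.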

Known violations

Each of the four known violations has a test that runs check_invariants and asserts it returns Err:

  • Pod kill between LT write and tombstone write → OrphanLT
  • Concurrent insert+insert on same key → DualData
  • Concurrent insert+delete on same key → OrphanLT
  • Insert tombstone write fails AND cleanup fails → OrphanLT

When a fix lands, check_invariants will return Ok, the unwrap_err() will panic, and the test must be updated to assert_consistent — making fixes self-enforcing.
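The self-enforcing pattern relies on `Result::unwrap_err`, which panics when called on `Ok`. A rough sketch of the idea follows; the two-boolean `check_invariants` and the `known_violation_pod_kill` helper are stand-ins invented for illustration, not the suite's real code.

```rust
#[derive(Debug, PartialEq)]
pub enum Violation {
    OrphanLt,
}

/// Stand-in checker: LT data without a tombstone in HV is an orphan.
pub fn check_invariants(lt_has_data: bool, hv_has_tombstone: bool) -> Result<(), Violation> {
    if lt_has_data && !hv_has_tombstone {
        Err(Violation::OrphanLt)
    } else {
        Ok(())
    }
}

/// Body of a known-violation test: it documents the gap by demanding an Err.
pub fn known_violation_pod_kill() -> Violation {
    // State after a simulated kill between the LT write and the tombstone write.
    let (lt_has_data, hv_has_tombstone) = (true, false);
    // If a future fix makes this state unreachable and the check returns Ok,
    // unwrap_err() panics and the test fails until it is rewritten.
    check_invariants(lt_has_data, hv_has_tombstone).unwrap_err()
}
```

A test written this way fails loudly as soon as the violation disappears, so a fix cannot land without someone converting the test to a positive consistency assertion.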

Ref FS-236

Formalizes the TieredStorage consistency invariants and adds structured
tests proving they hold under normal operation and documenting where
they break under failure, pod termination, and concurrency.

jan-auer added 2 commits March 9, 2026 17:40
Extract insert_small/insert_large, make_failing_storage, payload
constants, check_invariants_core, and SyncBackend builder to eliminate
repeated boilerplate across 34 tests. No coverage changes.
Add three chaos fuzz tests that run concurrent operations against
TieredStorage with a ChaosBackend (yield-based interleaving + error
injection) and assert that the known invariant violations occur:

- concurrent_insert_large_insert_small: DualData from racing inserts
- concurrent_insert_delete_from_large_state: OrphanLT + DualData
- concurrent_inserts_with_tombstone_write_errors: DualData from
  tombstone write failure + cleanup failure

Each test documents a gap in the current algorithm. When the algorithm
is hardened, flip the assertions to verify the violations are gone.
jan-auer added a commit that referenced this pull request Mar 20, 2026
Before this PR, the tiered storage implementation used unconditional
writes and stored all objects under the same key as their tombstone.
This could lead to lost updates and orphans on concurrent writes and
deletes. This PR adds a unique revision to the key path of each large
object and uses a check-and-set (CAS) write that commits only if the stored
tombstone's revision still matches the last known state. Together, these
provide atomic commit points for all three operations:

- Replaces `create_tombstone` with a single `compare_and_write` method
on `HighVolumeBackend`. The method takes a `TieredWrite` (Tombstone,
Object, or Delete) and an optional expected redirect target, and applies
the mutation only if the current row state matches.

- **Large-object writes** now store the payload at a unique revision key
(`{key}/{uuid_v7}`) so each write gets a distinct storage path. A
`get_tiered_metadata` read establishes the CAS precondition before
writing to GCS, and a subsequent `compare_and_write` atomically commits
the tombstone. On a CAS conflict, the new GCS blob is cleaned up; on a CAS
error, the blob is cleaned up and the error is then propagated.

- **Small-object writes** that encounter an existing tombstone now
CAS-swap it for inline data rather than routing to LT. This fixes the
expiry-mismatch TODO and keeps small objects in HV.

- **Deletes** remove the tombstone first (commit point), then clean up
GCS best-effort. This is the inverse of the previous ordering: if GCS
cleanup fails an orphan blob remains (accepted), but the tombstone is
gone and the object is unreachable.
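To make the commit point concrete, here is an in-memory sketch of a `compare_and_write`-style operation. Only the method name and the `TieredWrite` variant names come from the commit message; the `Row` and `InMemoryHv` types, the payload shapes, the `bool` return value, and the rule that an expected target of `None` matches any row that is not a tombstone are assumptions. The revision strings in the usage note are placeholders rather than real UUIDv7 values.

```rust
use std::collections::HashMap;

/// Mutation to apply to an HV row (variant payloads are assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum TieredWrite {
    Tombstone { target: String },
    Object(Vec<u8>),
    Delete,
}

/// What an HV row can hold in this model.
#[derive(Debug, Clone, PartialEq)]
pub enum Row {
    Tombstone { target: String },
    Object(Vec<u8>),
}

#[derive(Default)]
pub struct InMemoryHv {
    rows: HashMap<String, Row>,
}

impl InMemoryHv {
    /// Redirect target currently stored for `key`, if the row is a tombstone.
    pub fn redirect_target(&self, key: &str) -> Option<String> {
        match self.rows.get(key) {
            Some(Row::Tombstone { target }) => Some(target.clone()),
            _ => None,
        }
    }

    /// Applies `write` only if the stored redirect target equals `expected`.
    /// Returns true when the write was committed, false on a CAS conflict.
    pub fn compare_and_write(&mut self, key: &str, expected: Option<&str>, write: TieredWrite) -> bool {
        let current = self.redirect_target(key);
        if current.as_deref() != expected {
            return false; // another writer changed the row since it was read
        }
        match write {
            TieredWrite::Tombstone { target } => {
                self.rows.insert(key.to_string(), Row::Tombstone { target });
            }
            TieredWrite::Object(data) => {
                self.rows.insert(key.to_string(), Row::Object(data));
            }
            TieredWrite::Delete => {
                self.rows.remove(key);
            }
        }
        true
    }
}
```

If two writers both observe no tombstone and then try to commit `k/rev-a` and `k/rev-b`, the first call returns `true` and the second returns `false`, so the loser knows to clean up its own blob instead of overwriting the winner.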

Tests have been reorganized for readability. There are no tests yet covering
races and extreme edge cases. These will be added in #363.

---------

Co-authored-by: Claude <noreply@anthropic.com>