node/components/storage/snapshot: add file inventory, trust model, range arithmetic#20527
Merged
AskAlexSharov approved these changes Apr 14, 2026
Pull request overview
Introduces foundational data structures for decentralized/sparse snapshot distribution: a storage-side snapshot inventory with trust semantics and step-range arithmetic, plus a V2 chain.toml manifest format and additional ENR metadata to advertise domain coverage/merge depth.
Changes:
- Extend the `chain-toml` ENR entry to include domain step coverage and merge-depth metadata.
- Add new `node/components/storage/snapshot` package implementing `TrustLevel`, step-range set operations, canonical layout helpers, and an `Inventory`/`LiveInventory`.
- Add `db/downloader/chaintoml_v2.go` + tests to generate/parse a structured V2 `chain.toml` from the snapshot inventory.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| p2p/enr/chain_toml.go | Adds DomainSteps/MergeDepth fields to the chain-toml ENR entry RLP payload. |
| node/components/storage/snapshot/trust.go | Introduces the snapshot trust ladder (none → consensus → verified) and helpers. |
| node/components/storage/snapshot/ranges.go | Implements half-open step ranges and range-set operations (normalize/coverage/gaps/union). |
| node/components/storage/snapshot/ranges_test.go | Unit tests for step-range arithmetic and set operations. |
| node/components/storage/snapshot/inventory.go | Adds an in-memory snapshot file inventory with coverage queries, merges, and trust promotion. |
| node/components/storage/snapshot/inventory_test.go | Unit tests for inventory behavior (coverage, gaps, trust promotion, merge replacement). |
| node/components/storage/snapshot/populate.go | Adds LiveInventory to refresh inventory state from a pinned "visible files" provider and to set torrent hashes. |
| node/components/storage/snapshot/canonical.go | Adds canonical layout computation and helpers to compare/validate layouts. |
| node/components/storage/snapshot/canonical_test.go | Unit tests for canonical layout/validation logic. |
| db/downloader/chaintoml_v2.go | Adds V2 structured chain.toml model + generator/marshal/parser + trust filtering helpers. |
| db/downloader/chaintoml_v2_test.go | Tests for V2 version detection, generation, parsing, trust filtering, and canonicity checks. |
Comment on lines 20 to 33

    type ChainToml struct {
    	AuthoritativeTx uint64   // max tx for entries from local disk + preverified.toml
    	KnownTx         uint64   // max tx for all entries (≥ AuthoritativeTx)
    	InfoHash        [20]byte // BitTorrent V1 info-hash (SHA1) of the chain.toml torrent
    	DomainSteps     uint64   // total domain steps covered (0 = V1-only, no domain data)
    	MergeDepth      uint64   // largest canonical file size in steps (0 = unknown)
    }

    func (v ChainToml) ENRKey() string { return "chain-toml" }

    // EncodeRLP implements rlp.Encoder.
    func (v ChainToml) EncodeRLP(w io.Writer) error {
    	// Line replaced by this PR:
    	//   return rlp.Encode(w, &chainTomlRLP{AuthoritativeTx: v.AuthoritativeTx, KnownTx: v.KnownTx, InfoHash: v.InfoHash})
    	return rlp.Encode(w, &chainTomlRLP{AuthoritativeTx: v.AuthoritativeTx, KnownTx: v.KnownTx, InfoHash: v.InfoHash, DomainSteps: v.DomainSteps, MergeDepth: v.MergeDepth})
    }
Comment on lines +93 to +109

    // Build the layout from local files and check canonicity.
    var layout snapshotinv.StepRanges
    for _, f := range files {
    	layout = append(layout, f.Range())
    }
    layout = layout.Normalize()

    // Compute total coverage.
    var coverageFrom, coverageTo uint64
    if len(layout) > 0 {
    	coverageFrom = layout[0].From
    	coverageTo = layout[len(layout)-1].To
    }

    dm := &DomainManifest{
    	Coverage: [2]uint64{coverageFrom, coverageTo},
    }
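A later commit in this PR reports that coverage is now computed from the published file list (canonical + hashed only) rather than from all local files. The sketch below illustrates that idea under simplifying assumptions: the `file` type, its `Canonical` and `Hash` fields, and the `published` helper are hypothetical names, not the PR's API, and the input is assumed to be sorted by `From`.

```go
package main

import "fmt"

// file is a hypothetical, simplified stand-in for an inventory entry.
type file struct {
	From, To  uint64 // half-open step range [From, To)
	Canonical bool   // power-of-2 sized and aligned
	Hash      string // torrent hash; empty means not hashed yet
}

// published keeps only files that are canonical and hashed, and derives
// the advertised coverage from that filtered list alone.
// Assumption: files is sorted by From.
func published(files []file) (out []file, coverage [2]uint64) {
	for _, f := range files {
		if f.Canonical && f.Hash != "" {
			out = append(out, f)
		}
	}
	if len(out) > 0 {
		coverage = [2]uint64{out[0].From, out[len(out)-1].To}
	}
	return out, coverage
}

func main() {
	files := []file{
		{From: 0, To: 8, Canonical: true, Hash: "aa"},
		{From: 8, To: 12, Canonical: true, Hash: "bb"},
		{From: 12, To: 13, Canonical: true, Hash: ""},    // not hashed yet
		{From: 13, To: 15, Canonical: false, Hash: "cc"}, // merge backlog
	}
	pub, cov := published(files)
	fmt.Println(len(pub), cov) // 2 [0 12]
}
```

With the unfiltered computation the advertised coverage would have been [0, 15), even though only [0, 12) is backed by entries in the manifest.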
Comment on lines +158 to +165

    // MarshalV2 serializes a V2 manifest to deterministic TOML bytes.
    func MarshalV2(manifest *ChainTomlV2) ([]byte, error) {
    	var buf bytes.Buffer
    	enc := toml.NewEncoder(&buf)
    	if err := enc.Encode(manifest); err != nil {
    		return nil, fmt.Errorf("encoding chain.toml V2: %w", err)
    	}
    	return buf.Bytes(), nil
Comment on lines +123 to +129

    // Snapshot returns the current inventory. The returned value is safe to read
    // concurrently — it won't be modified (a new Inventory is created on each Refresh).
    func (li *LiveInventory) Snapshot() *Inventory {
    	li.mu.RLock()
    	defer li.mu.RUnlock()
    	return li.current
    }
Comment on lines +204 to +216

    // LocalFiles returns all files that exist on disk for a domain.
    func (inv *Inventory) LocalFiles(domain Domain) []*FileEntry {
    	inv.mu.RLock()
    	defer inv.mu.RUnlock()

    	var result []*FileEntry
    	for _, e := range inv.domains[domain] {
    		if e.Local {
    			result = append(result, e)
    		}
    	}
    	return result
    }
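The commit that addresses this review documents that `LocalFiles` returns shared pointers and that callers must not mutate them. One alternative, shown here only as a sketch and not as what the PR does, is to return value copies so that caller-side mutation cannot reach the inventory. The `LocalFilesCopy` name, the trimmed-down `FileEntry` fields, and the `"accounts"` domain are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

type Domain string

// FileEntry is a reduced, hypothetical version of the PR's entry type.
type FileEntry struct {
	Name    string
	Local   bool
	Seeding bool
}

type Inventory struct {
	mu      sync.RWMutex
	domains map[Domain][]*FileEntry
}

// LocalFilesCopy returns copies of the local entries for a domain.
// Callers can modify the result without affecting the inventory.
func (inv *Inventory) LocalFilesCopy(domain Domain) []FileEntry {
	inv.mu.RLock()
	defer inv.mu.RUnlock()

	var result []FileEntry
	for _, e := range inv.domains[domain] {
		if e.Local {
			result = append(result, *e) // copy the struct, not the pointer
		}
	}
	return result
}

func main() {
	inv := &Inventory{domains: map[Domain][]*FileEntry{
		"accounts": {{Name: "f1", Local: true}, {Name: "f2", Local: false}},
	}}
	files := inv.LocalFilesCopy("accounts")
	files[0].Seeding = true // only the copy changes
	fmt.Println(len(files), inv.domains["accounts"][0].Seeding) // 1 false
}
```

The trade-off is an allocation and copy per call, which may be why documenting the constraint was preferred in the PR.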
Comment on lines +121 to +123

    if r.From > cursor {
    	gaps = append(gaps, StepRange{cursor, min(r.From, to)})
    }
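For context, the cursor-based gap scan that this excerpt belongs to can be sketched as follows. This is a reconstruction under assumptions, not the PR's code: `StepRange`/`StepRanges` follow the names used in the PR, but the bodies of `Normalize` and `Gaps` here are illustrative, and the clamp is written with a plain `if` instead of the built-in `min`.

```go
package main

import (
	"fmt"
	"sort"
)

// StepRange is a half-open interval [From, To) of aggregator steps.
type StepRange struct{ From, To uint64 }

type StepRanges []StepRange

// Normalize drops empty ranges, sorts by From, and merges overlapping or
// adjacent ranges, so the result is sorted and non-overlapping.
func (rs StepRanges) Normalize() StepRanges {
	sorted := make(StepRanges, 0, len(rs))
	for _, r := range rs {
		if r.From < r.To {
			sorted = append(sorted, r)
		}
	}
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].From < sorted[j].From })
	var out StepRanges
	for _, r := range sorted {
		if n := len(out); n > 0 && r.From <= out[n-1].To {
			if r.To > out[n-1].To {
				out[n-1].To = r.To
			}
			continue
		}
		out = append(out, r)
	}
	return out
}

// Gaps returns the parts of the window [from, to) that the set does not cover.
func (rs StepRanges) Gaps(from, to uint64) StepRanges {
	var gaps StepRanges
	cursor := from
	for _, r := range rs.Normalize() {
		if cursor >= to {
			break
		}
		if r.To <= cursor {
			continue // range lies entirely before the cursor
		}
		if r.From > cursor {
			end := r.From
			if end > to {
				end = to // clamp the gap to the window
			}
			gaps = append(gaps, StepRange{cursor, end})
		}
		cursor = r.To
	}
	if cursor < to {
		gaps = append(gaps, StepRange{cursor, to})
	}
	return gaps
}

func main() {
	local := StepRanges{{0, 4}, {8, 12}, {4, 6}}
	fmt.Println(local.Normalize()) // [{0 6} {8 12}]
	fmt.Println(local.Gaps(0, 16)) // [{6 8} {12 16}]
}
```

Because the intervals are half-open, adjacent ranges such as [0, 4) and [4, 6) merge without any off-by-one adjustment.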
Comment on lines +158 to +159

    intFrom := max(gap.From, r.From)
    intTo := min(gap.To, r.To)
Comment on lines +50 to +59

    // Largest power-of-2 that fits in remaining.
    maxFit := uint64(1)
    for maxFit*2 <= remaining {
    	maxFit *= 2
    }

    size := min(maxAligned, maxFit)
    if maxMergeSize > 0 && size > maxMergeSize {
    	size = maxMergeSize
    }
mh0lt added a commit that referenced this pull request Apr 16, 2026

Eight comments addressed:
1. ENR backward compatibility (chain_toml.go): merged renamed AuthoritativeBlocks/KnownBlocks fields from base branch with the new DomainSteps/MergeDepth V2 extensions. Added comment explaining that RLP tolerates trailing elements: old 3-field decoders skip the extras, new decoders see zero defaults for missing fields.
2. Coverage/Files mismatch (chaintoml_v2.go): coverage is now computed from the published file list (canonical + hashed only), not from all local files. Prevents advertising coverage for uncanonical/unhashed files.
3. Map determinism (chaintoml_v2.go): false positive. pelletier/go-toml/v2 sorts map keys via slices.SortFunc (verified in marshaler.go:683). Output is deterministic.
4. Snapshot() mutability (populate.go): comment updated to document that the returned Inventory is shared and entries MUST NOT be mutated directly. SetTorrentHash is the sanctioned mutation path.
5. Pointer leakage (inventory.go): comments added to LocalFiles and AllDomainFiles documenting the shared-pointer constraint. Callers must not mutate returned entries.
6-8. Missing min/max helpers (ranges.go, canonical.go): false positives. Go 1.25 has built-in min/max for ordered types. Verified compiles clean.
1a1b77a to 2fd63d2
…nge arithmetic

Foundation for decentralized snapshot distribution and sparse snapshot loading.

New package node/components/storage/snapshot/ provides:
- TrustLevel (none/consensus/verified): incremental trust model from issues #19657, #19658, #19659. Files are tagged with how their integrity was established and can be promoted (none → consensus → verified) but never demoted.
- StepRanges: sorted non-overlapping step range arithmetic: normalize, gaps, coverage, union, GapsAgainst. Used to compare local vs peer coverage and decide what to download.
- FileEntry: tracks a snapshot file (block or domain) with step range, torrent hash, trust level, local/remote status, seeding status.
- Inventory: thread-safe registry of all known files per domain. Supports:
  - Coverage queries (local-only, full, or filtered by trust level)
  - Gap analysis against peer manifests
  - Atomic file rotation for merge-safe swaps
  - Trust promotion
  - Local vs remote file distinction

This is the state model that storage uses to decide what to download, what to seed, and what to advertise in chain.toml. Also serves as the registry for sparse snapshot loading (which files are available for which step ranges).

See erigon-documents/cocoon/pocs-and-proposals/decentralized-snapshots/design.md
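The trust ladder described in this commit message (promotion only, never demotion, plus a `Satisfies(required)` check for consumers) can be sketched as an ordered enum. The constant names and the `Promote` helper below are hypothetical; only `TrustLevel`, the three levels, and `Satisfies` are named in the PR.

```go
package main

import "fmt"

// TrustLevel orders how a file's integrity was established.
type TrustLevel uint8

// Constant names are assumptions for this sketch.
const (
	TrustNone TrustLevel = iota
	TrustConsensus
	TrustVerified
)

func (t TrustLevel) String() string {
	switch t {
	case TrustNone:
		return "none"
	case TrustConsensus:
		return "consensus"
	case TrustVerified:
		return "verified"
	default:
		return "unknown"
	}
}

// Satisfies reports whether t is at least as strong as required.
func (t TrustLevel) Satisfies(required TrustLevel) bool {
	return t >= required
}

// Promote returns the stronger of t and to, so a level is never demoted.
func (t TrustLevel) Promote(to TrustLevel) TrustLevel {
	if to > t {
		return to
	}
	return t
}

func main() {
	level := TrustNone
	level = level.Promote(TrustConsensus)
	level = level.Promote(TrustNone) // ignored: would be a demotion
	fmt.Println(level, level.Satisfies(TrustConsensus), level.Satisfies(TrustVerified))
	// consensus true false
}
```

Encoding the ladder as increasing integers keeps both the filter and the no-demotion rule down to a single comparison.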
… merge

CanonicalLayout(steps, maxMergeSize) computes the deterministic file layout for any step count. Uses power-of-2 aligned files: at each position, the largest aligned power-of-2 that fits is used. The maxMergeSize parameter allows nodes behind on merges to publish at a shallower level.

Key property: every merge level is a valid canonical layout. Deeper merges REPLACE files but never invalidate them. [0-4096) replaces [0-2048) + [2048-4096), but both are correct at their respective merge depths.

MissingMerges(current, target) computes what merges are needed to converge from one layout to another. Ordered deepest-first for correct execution.

IsCanonical(layout, steps) validates that a file layout is canonical: all files are power-of-2 sized, aligned, and cover [0, steps) without gaps.

This is the foundation for:
- Deterministic chain.toml: all nodes at same step + merge depth produce identical file lists and torrent hashes
- Convergent merge loop: compute target, diff, execute missing merges
- chain.toml comparison: consumer validates peer layouts are canonical

See #20531 for the merge determinism issue this addresses.
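The layout rule in this commit message can be illustrated with a small sketch. The function names `CanonicalLayout` and `IsCanonical` come from the PR, but the bodies below are an independent reconstruction from the description, not the merged code. One deliberate difference, stated as an assumption: the cap is applied by halving, so a `maxMergeSize` that is not a power of two still yields power-of-2 files. `IsCanonical` here expects the layout sorted by `From`.

```go
package main

import "fmt"

// StepRange is a half-open interval [From, To) of aggregator steps.
type StepRange struct{ From, To uint64 }

// CanonicalLayout covers [0, steps) with power-of-2 sized files, each aligned
// to its own size. maxMergeSize == 0 means no cap.
func CanonicalLayout(steps, maxMergeSize uint64) []StepRange {
	var layout []StepRange
	for pos := uint64(0); pos < steps; {
		remaining := steps - pos
		// Largest power of two that fits in the remaining steps.
		size := uint64(1)
		for size <= remaining/2 {
			size *= 2
		}
		// Shrink until the file start is aligned to the file size.
		for pos%size != 0 {
			size /= 2
		}
		// Apply the merge-depth cap; halving preserves alignment.
		for maxMergeSize > 0 && size > maxMergeSize {
			size /= 2
		}
		layout = append(layout, StepRange{pos, pos + size})
		pos += size
	}
	return layout
}

// IsCanonical checks that every file is power-of-2 sized and aligned, and
// that the files cover [0, steps) with no gaps or overlaps.
func IsCanonical(layout []StepRange, steps uint64) bool {
	cursor := uint64(0)
	for _, r := range layout {
		if r.From != cursor || r.To <= r.From {
			return false
		}
		size := r.To - r.From
		if size&(size-1) != 0 || r.From%size != 0 {
			return false
		}
		cursor = r.To
	}
	return cursor == steps
}

func main() {
	deep := CanonicalLayout(13, 0)
	shallow := CanonicalLayout(13, 4)
	fmt.Println(deep)    // [{0 8} {8 12} {12 13}]
	fmt.Println(shallow) // [{0 4} {4 8} {8 12} {12 13}]
	fmt.Println(IsCanonical(deep, 13), IsCanonical(shallow, 13)) // true true
}
```

Both outputs pass `IsCanonical`, which matches the stated property that every merge level is itself a valid canonical layout.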
chain.toml V2 is a versioned manifest with structured sections:
- [blocks]: deterministic block snapshot files (same as V1)
- [meta]: metadata files (erigondb.toml, salt files)
- [domains.<name>]: per-domain state snapshot coverage with individual file entries carrying step ranges, torrent hashes, and trust levels

Only canonical files (power-of-2 sized, aligned) are included in domain sections. Non-canonical files from merge backlog are excluded until merges catch up. This ensures chain.toml content is deterministic for a given (step, merge depth): all nodes at the same level produce identical manifests.

Key types:
- ChainTomlV2: structured manifest with version, blocks, meta, domains
- DomainManifest: coverage + file list for a single domain
- DomainFileEntry: name, step range, torrent hash, trust level
- GenerateV2(inventory): builds V2 from snapshot inventory
- ParseV2/MarshalV2: round-trip serialization
- DetectVersion: distinguish V1 from V2 without full parse

ENR entry extended with:
- DomainSteps: total domain steps covered (0 = V1-only peer)
- MergeDepth: largest canonical file size (for merge-depth preference)

See erigon-documents/cocoon/pocs-and-proposals/decentralized-snapshots/design.md
…le files

LiveInventory holds a pinned view of the aggregator's visible files via a VisibleFileProvider (analogous to a read transaction). On file-change events, it closes the old view, opens a new one, and rebuilds from the new visible file set. Torrent hashes are preserved across refreshes.

Key design:
- Inventory only reflects visible files (not dirty/in-progress)
- Files are pinned by the provider lifetime, so they can't change during reads
- Refresh is atomic: new Inventory built before old one is replaced
- Snapshot() returns the current Inventory for concurrent read access
- No directory scanning: uses the aggregator's existing file tracking

The VisibleFileProvider/VisibleFileOpener interfaces decouple from the aggregator's concrete types, avoiding circular imports.
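The refresh-and-swap behaviour described in this commit message can be sketched as below. This is a simplified variant under assumptions, not the PR's implementation: the visible file set is passed in as a plain list of names instead of coming from a `VisibleFileProvider`, entries are stored by value, and `SetTorrentHash` also swaps in a new `Inventory` so that previously returned snapshots are never modified. The `Get` helper and the file names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// FileEntry is a reduced, hypothetical stand-in for the PR's entry type.
type FileEntry struct {
	Name        string
	TorrentHash string
}

// Inventory is never modified once it has been published.
type Inventory struct {
	files map[string]FileEntry
}

// Get returns a copy of the entry for name, if present.
func (inv *Inventory) Get(name string) (FileEntry, bool) {
	e, ok := inv.files[name]
	return e, ok
}

type LiveInventory struct {
	mu      sync.RWMutex
	current *Inventory
}

func NewLiveInventory() *LiveInventory {
	return &LiveInventory{current: &Inventory{files: map[string]FileEntry{}}}
}

// Snapshot returns the current inventory for concurrent read access.
func (li *LiveInventory) Snapshot() *Inventory {
	li.mu.RLock()
	defer li.mu.RUnlock()
	return li.current
}

// Refresh builds a new Inventory from the visible names, carries over known
// torrent hashes, and only then replaces the old one.
func (li *LiveInventory) Refresh(visible []string) {
	li.mu.Lock()
	defer li.mu.Unlock()
	next := &Inventory{files: make(map[string]FileEntry, len(visible))}
	for _, name := range visible {
		next.files[name] = FileEntry{Name: name, TorrentHash: li.current.files[name].TorrentHash}
	}
	li.current = next
}

// SetTorrentHash records a hash by copy-on-write; it reports whether the
// file is known.
func (li *LiveInventory) SetTorrentHash(name, hash string) bool {
	li.mu.Lock()
	defer li.mu.Unlock()
	e, ok := li.current.files[name]
	if !ok {
		return false
	}
	next := &Inventory{files: make(map[string]FileEntry, len(li.current.files))}
	for k, v := range li.current.files {
		next.files[k] = v
	}
	e.TorrentHash = hash
	next.files[name] = e
	li.current = next
	return true
}

func main() {
	li := NewLiveInventory()
	li.Refresh([]string{"a", "b"})
	li.SetTorrentHash("a", "hash-a")
	before := li.Snapshot()
	li.Refresh([]string{"a", "c"}) // "b" is no longer visible
	after := li.Snapshot()
	_, oldHasB := before.Get("b")
	_, newHasB := after.Get("b")
	a, _ := after.Get("a")
	fmt.Println(oldHasB, newHasB, a.TorrentHash) // true false hash-a
}
```

A reader holding `before` keeps a consistent view across the refresh, while new readers see the rebuilt inventory with the hash for "a" preserved.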
2fd63d2 to 3352143
Summary
Foundation for decentralized snapshot distribution (#19660) and sparse snapshot loading. Adds the state model that storage uses to decide what to download, what to seed, what to advertise, and what trust level to require.
New package: node/components/storage/snapshot/

Trust model (trust.go):
- TrustLevel: none → consensus → verified (incremental, from #19657 "Decentralized snapshot distribution: POC: ENR + BitTorrent flow", #19658 "Decentralized snapshot distribution: Statistical trust via threshold consensus", #19659 "Decentralized snapshot distribution: Identity trust via UCAN delegation")
- Satisfies(required) for consumer-side filtering

Range arithmetic (ranges.go):
- StepRange: half-open interval [From, To) of aggregator steps
- StepRanges: sorted, non-overlapping range set with normalize, gaps, coverage, union, and GapsAgainst operations

File inventory (inventory.go):
- FileEntry: snapshot file with domain, step range, torrent hash, trust level, local/remote/seeding flags
- Inventory: thread-safe per-domain file registry
- ReplaceWithMerge for merge-safe file rotation
- PromoteTrust for trust ladder progression

How it connects
Test plan
- make lint passes
- make erigon builds

Depends on: #20526 (decentralized snapshots rebase)