feat(storage): multi-copy upload with store->pull->commit flow #593
rvagg wants to merge 2 commits into rvagg/sp-sp-fetch
Conversation
Deploying with

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful | synapse-dev | a3a248f | Feb 16 2026, 11:15 AM |
Docs lint is failing; this still needs a big docs addition, but that can come a little later as we get through review here. Here's some notes I built up about failure modes and handling:

## Multi-Copy Upload: Failure Handling

### Philosophy
#### Partial Success Over Atomicity

When a user requests N copies and we can only achieve fewer, we commit what we have rather than throwing everything away:
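The commit-what-we-have policy can be sketched with `Promise.allSettled`. This is a minimal illustration only: `commitAll` and the simplified result shapes are stand-ins, not the SDK's internals.

```typescript
// Sketch only: hypothetical commit step and simplified result shapes.
type CopyResult = { providerId: number; dataSetId: number }
type FailedCopy = { providerId: number; role: 'primary' | 'secondary'; error: string }

async function commitAll(
  commits: Array<{ providerId: number; role: 'primary' | 'secondary'; run: () => Promise<number> }>
): Promise<{ copies: CopyResult[]; failures: FailedCopy[] }> {
  const settled = await Promise.allSettled(commits.map((c) => c.run()))
  const copies: CopyResult[] = []
  const failures: FailedCopy[] = []
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      copies.push({ providerId: commits[i].providerId, dataSetId: outcome.value })
    } else {
      failures.push({
        providerId: commits[i].providerId,
        role: commits[i].role,
        error: String(outcome.reason),
      })
    }
  })
  // Partial success: only throw when *every* commit failed
  if (copies.length === 0) throw new Error('CommitError: all commits failed')
  return { copies, failures }
}

// Demo: one commit succeeds, one fails; we keep the successful copy
const result = await commitAll([
  { providerId: 1, role: 'primary', run: async () => 10 },
  { providerId: 2, role: 'secondary', run: async () => { throw new Error('tx reverted') } },
])
console.log(result.copies.length, result.failures.length) // 1 1
```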
### Failure Modes by Stage

The multi-copy upload has a sequential pipeline: select → store → pull → commit.

#### Stage 0: Provider Selection (before any upload)

Provider selection uses a tiered approach with ping validation at each step:
Ping validation: before selecting any provider, we ping their PDP endpoint. If the ping fails, we try the next provider in the current tier before falling back to the next tier.
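The tier walk described above might look like the following. This is a minimal sketch: the `ping` helper, the `Provider` shape, and the tier names are assumptions for illustration, not the SDK's API.

```typescript
// Sketch only: tiered selection with ping validation; names are hypothetical.
type Provider = { id: number; tier: 'endorsed' | 'approved'; endpoint: string }

// Stand-in for an HTTP ping of the provider's PDP endpoint
async function ping(p: Provider): Promise<boolean> {
  return !p.endpoint.includes('down') // pretend unreachable hosts fail
}

// Walk tiers in order; within a tier, try the next provider when ping fails,
// and only fall back to the next tier once the current one is exhausted.
async function selectFirstReachable(providers: Provider[]): Promise<Provider | undefined> {
  for (const tier of ['endorsed', 'approved'] as const) {
    for (const p of providers.filter((x) => x.tier === tier)) {
      if (await ping(p)) return p
    }
  }
  return undefined
}

const pool: Provider[] = [
  { id: 1, tier: 'endorsed', endpoint: 'https://down.example' },
  { id: 2, tier: 'endorsed', endpoint: 'https://ok.example' },
  { id: 3, tier: 'approved', endpoint: 'https://ok2.example' },
]
const chosen = await selectFirstReachable(pool)
console.log(chosen?.id) // 2: the second endorsed provider, tried after the first fails ping
```

Note how the second endorsed provider is attempted before any fall-through to the approved tier.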
Key distinction:
#### Stage 1: Store (upload data to primary SP)

Store has two sub-stages:
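The two sub-stages (1a: HTTP upload, 1b: wait for parking confirmation) could be sketched as below. The `storeHttp` and `checkParked` helpers are hypothetical stand-ins, not the real API.

```typescript
// Sketch only: store = upload (1a) then poll for parking confirmation (1b).
async function storeHttp(_data: Uint8Array): Promise<string> {
  return 'bafkpiece...' // 1a: upload to the SP returns the piece CID
}
async function checkParked(_pieceCid: string): Promise<boolean> {
  return true // 1b: ask the SP whether the piece is confirmed parked
}

async function store(data: Uint8Array, timeoutMs = 5_000): Promise<string> {
  const pieceCid = await storeHttp(data) // sub-stage 1a
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    // sub-stage 1b: poll until the SP confirms parking
    if (await checkParked(pieceCid)) return pieceCid
    await new Promise((r) => setTimeout(r, 200))
  }
  // Timed out: data *may* exist on the SP but is unconfirmed;
  // the SP will eventually GC parked pieces that are never committed.
  throw new Error(`StoreError: parking not confirmed for ${pieceCid}`)
}

const cid = await store(new Uint8Array([1, 2, 3]))
console.log(cid)
```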
Store failure is unambiguous from the SDK's perspective: either we have confirmed parked data, or we don't. The user can safely retry. Note: if 1b times out, data might exist on the SP, but we can't confirm it; the SP will eventually GC parked pieces that aren't committed.

#### Stage 2: Pull (SP-to-SP fetch to secondaries)
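Because a failed pull leaves the data untouched on the primary, retrying is just a matter of re-requesting the SP-to-SP fetch. A minimal sketch, with a hypothetical `requestPull` helper standing in for the context's `pull()`:

```typescript
// Sketch only: retrying a pull is cheap because the data stays on the primary
// and nothing is on-chain yet.
let simulatedFailures = 2
async function requestPull(_secondaryUrl: string, _pieceCid: string): Promise<void> {
  if (simulatedFailures-- > 0) throw new Error('transfer interrupted') // pretend flaky transfers
}

async function pullWithRetry(secondaryUrl: string, pieceCid: string, attempts = 3): Promise<number> {
  for (let i = 1; i <= attempts; i++) {
    try {
      await requestPull(secondaryUrl, pieceCid)
      return i // the attempt number that succeeded
    } catch {
      // No client bandwidth spent: the SP-to-SP fetch is simply requested again.
    }
  }
  throw new Error('pull failed after retries')
}

const attempt = await pullWithRetry('https://secondary.example', 'bafkpiece')
console.log(attempt) // 3
```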
Pull failure is recoverable: the data is still on the primary and no on-chain state exists yet. Retrying a pull is cheap (SP-to-SP, no client bandwidth).

#### Stage 3: Commit (addPieces on-chain transaction)
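Since the data is already parked on the provider(s) when commit runs, retrying the commit transaction is safe. A generic backoff wrapper is enough to sketch the idea; the wrapper and its names are illustrative, not part of the SDK:

```typescript
// Sketch only: a generic retry-with-backoff wrapper around a commit attempt.
async function retry<T>(fn: () => Promise<T>, attempts: number, baseDelayMs = 100): Promise<T> {
  let lastErr: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (err) {
      lastErr = err
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i)) // exponential backoff
    }
  }
  throw lastErr
}

// Demo: the commit fails twice (e.g. an RPC hiccup), then succeeds
let calls = 0
const txHash = await retry(async () => {
  calls++
  if (calls < 3) throw new Error('rpc timeout')
  return '0xabc'
}, 5, 1)
console.log(txHash, calls) // 0xabc 3
```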
### Behaviour Matrix
### Error Types

```ts
/** Primary store failed - no data stored anywhere, safe to retry */
class StoreError extends Error {
  name = 'StoreError'
}

/** All commits failed - data stored on SP(s) but nothing on-chain, safe to retry */
class CommitError extends Error {
  name = 'CommitError'
}

// Partial commit failures appear in result.failures[] with role: 'primary' or 'secondary'.
// upload() only throws CommitError when ALL providers fail to commit.
```

### What Users Must Check

Users should always inspect `result.failures`:

```ts
// If ALL commits fail, upload() throws CommitError.
// If at least one succeeds, we get a result:
const result = await synapse.storage.upload(data, { count: 3 })

// Check if endorsed provider (primary) failed
const primaryFailed = result.failures.find(f => f.role === 'primary')
if (primaryFailed) {
  console.warn(`Endorsed provider ${primaryFailed.providerId} failed: ${primaryFailed.error}`)
  // Data is only on non-endorsed secondaries
}

// Check if we got all requested copies
if (result.copies.length < 3) {
  console.warn(`Only ${result.copies.length}/3 copies succeeded`)
  for (const failure of result.failures) {
    console.warn(`  Provider ${failure.providerId} (${failure.role}): ${failure.error}`)
  }
}

// Every copy in copies[] is committed on-chain
for (const copy of result.copies) {
  console.log(`Provider ${copy.providerId}, dataset ${copy.dataSetId}, piece ${copy.pieceId}`)
}
```

### Auto-Retry Logic

When the user calls upload() without pinning providers, failed secondaries are retried automatically with provider exclusion (up to 5 attempts).
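The exclusion-based retry can be sketched as follows. The names (`pullToSecondary`, `replicateWithExclusion`) are made up for illustration; the SDK performs this substitution internally.

```typescript
// Sketch only: retry failed secondaries, excluding providers that already failed,
// up to a bounded number of attempts.
type Provider = { id: number }

async function pullToSecondary(p: Provider): Promise<void> {
  if (p.id === 1) throw new Error('pull failed') // pretend provider 1 is broken
}

async function replicateWithExclusion(pool: Provider[], maxAttempts = 5): Promise<Provider> {
  const excluded = new Set<number>()
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const candidate = pool.find((p) => !excluded.has(p.id))
    if (!candidate) break // pool exhausted before attempts ran out
    try {
      await pullToSecondary(candidate)
      return candidate // this secondary now holds a copy
    } catch {
      excluded.add(candidate.id) // never retry the same failed provider
    }
  }
  throw new Error('no secondary succeeded')
}

const winner = await replicateWithExclusion([{ id: 1 }, { id: 2 }, { id: 3 }])
console.log(winner.id) // 2
```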
When the user specifies providerIds or dataSetIds, only those providers are used and no substitutes are tried.

### Design Decision: Primary Commit Failure Handling

The current implementation commits on all providers in parallel. Endorsed providers are selected as primary because they're curated for reliability. If the primary (endorsed) commit fails but a secondary (non-endorsed) succeeds, the user ends up with data only on non-endorsed providers. This may not meet the product requirement of having one copy on an endorsed provider.

```ts
// Check if endorsed provider failed
const primaryFailed = result.failures.some(f => f.role === 'primary')
if (primaryFailed) {
  // Handle: retry, alert, or treat as error depending on requirements
}
```
I noticed this:
What is the test for the availability of an Endorsed Provider in the case where we have more than one? If the first store fails, is there a retry? Under retry:
If we have 2 Endorsed providers and the store operation on the primary fails, do we retry the other endorsed one?
@timfong888 I've clarified the post above with more detail:
Docs updated to pass lint; additional tests added to address some gaps.
I am not clear on this:
My understanding is that if no Endorsed SP succeeds, it's a failed operation, because if there is no Endorsed and we only have Approved, that has a lower durability guarantee.
> For primary selection (first context), exhaustion = error (can't proceed)

The above seems right. If the primary tier exhausts, it's an error, not a fall-through to the next tier, right?
Question: if the endorsed provider passed ping during selection but then fails during store() (HTTP upload or parking …
What happens if GC runs before the retry?
Force-pushed from eb878ac to 29ac8ad.
On the tier question: yes, the current code does fall back to approved-only if no endorsed provider passes the health check. A …
Not right now. A couple of reasons:
Curio GCs unreferenced pieces after 24 hours, so there's a comfortable retry window for the commit phase.
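For illustration, a client could track when the piece was parked and refuse commit retries that fall too close to that window. The 24-hour figure is the Curio GC window mentioned above; the helper name and safety margin are made up.

```typescript
// Sketch only: guard commit retries against GC of unreferenced parked pieces (~24h in Curio).
const GC_WINDOW_MS = 24 * 60 * 60 * 1000

function canRetryCommit(parkedAtMs: number, nowMs: number, safetyMarginMs = 60 * 60 * 1000): boolean {
  // Leave a safety margin so we never commit right as the SP garbage-collects the piece.
  return nowMs - parkedAtMs < GC_WINDOW_MS - safetyMarginMs
}

const parkedAt = Date.parse('2026-02-16T00:00:00Z')
console.log(canRetryCommit(parkedAt, parkedAt + 2 * 60 * 60 * 1000)) // true: 2h after parking
console.log(canRetryCommit(parkedAt, parkedAt + 23.5 * 60 * 60 * 1000)) // false: inside the margin
```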
Okay. So it randomizes across the Endorsed SPs for ping if there's no existing context. As long as they are good and an endorsed provider stores and commits successfully, we are good. That's a fair assumption.
Force-pushed from 59e576b to 63c6170.
Force-pushed from 2d43c4f to 70fa757.
Two design changes landed based on product discussion with @timfong888:
Force-pushed from 29ac8ad to 98792d4.
Synthesises with changes that are in #600, but demonstrates the multi upload() flow and the multi-piece variant.
Force-pushed from 814e22e to 619499d.
Updated on top of #544. Minor updates to the original post here (which is the commit message) to reflect the latest form, with the newest product requirements implemented.
Implement store->pull->commit flow for efficient multi-copy storage replication.

Split operations API on StorageContext:
- store(): upload data to SP, wait for parking confirmation
- presignForCommit(): pre-sign EIP-712 extraData for pull + commit reuse
- pull(): request SP-to-SP transfer from another provider
- commit(): add pieces on-chain with optional pre-signed extraData
- getPieceUrl(): get retrieval URL for SP-to-SP pulls

StorageManager.upload() orchestration:
- Default 2 copies (endorsed primary + any approved secondary)
- Single-provider: store->commit flow
- Multi-copy: store on primary, presign, pull to secondaries, commit all
- Auto-retry failed secondaries with provider exclusion (up to 5 attempts)

Provider selection:
- Primary requires endorsed provider (throws if none reachable)
- Secondaries use any approved provider from the pool
- 2-tier selection per role: existing dataset, then new dataset

Callback refinements:
- Remove redundant onUploadComplete (use onStored instead)
- onStored(providerId, pieceCid) - after data parked on provider
- onPieceAdded(providerId, pieceCid) - after on-chain submission
- onPieceConfirmed(providerId, pieceCid, pieceId) - after confirmation

Type clarity:
- Rename UploadOptions.metadata -> pieceMetadata (piece-level)
- Rename CommitOptions.pieces[].metadata -> pieceMetadata
- StoreError/CommitError carry providerId and endpoint for optional telemetry
- New: CopyResult, FailedCopy for multi-copy transparency

Implements #494
Force-pushed from 619499d to f63e566.
…docs for multi-copy

Move provider selection logic (selectProviders, fetchProviderSelectionInput, findMatchingDataSets) from SDK internals to synapse-core as public API for DIY users. Simplify selection from 4-tier fallback to 2-tier preference (existing dataset -> new dataset) since endorsedIds already controls the eligible pool. Clean up createContexts() to three explicit paths (dataSetIds, providerIds, smartSelect) with count validation and duplicate-provider guard. Update storage docs to reflect multi-copy as the default upload path.
Force-pushed from 4770bab to a3a248f.
@hugomrdias (and @rjan90) I'm bailing on my 3rd PR and just putting it in here as a second commit. I discovered while doing this that I'd lost something during my rebase onto post-0.37 master (when you give providerIds and dataSetIds it should only use them and not do the cascade thing). I put that back in the latest commit and it's now more complete (🤞). But, as you might see if you look at that commit, it's the one that pulls a bunch more stuff back into synapse-core; the previous commit didn't touch core (that was all left for #544), and this new one adds a big docs modification. The docs have 3 levels:
example-storage-e2e.js works, confirmed for single and multiple files, small and large, in devnet and on calibnet 🥳.
Commits included:

- test: mocked JSON RPC
- Update packages/synapse-core/test/foc-devnet-info.test.ts (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
- Update packages/synapse-core/src/foc-devnet-info/src/index.ts (Co-authored-by: Rod Vagg <rod@vagg.org>)
- fix: make example script work again, refactor for maximum example utility (#604). Borrowed a lot of this from #593, and merged with foc-devnet-info support.
- Update packages/synapse-core/src/foc-devnet-info/src/index.ts (Co-authored-by: Rod Vagg <rod@vagg.org>)
- fixes: PR review
- fix: remove redundant loadDevnetInfo() function
## Multi-Copy Durability in Synapse (What's New)

Store data across multiple storage providers with a single upload. The SDK handles replication server-side: data is uploaded once and providers copy it between themselves.

### What's New

#### Multi-Copy Uploads

```ts
const result = await synapse.storage.upload(data)
// result.copies: each successful copy with provider, dataset, and retrieval URL
// result.failures: any providers that failed
```

If the primary copy fails (store or commit), …

#### Target Specific Providers

Control where your copies go:

```ts
// Specific providers
await synapse.storage.upload(data, { providerIds: [1n, 2n, 3n] })

// Specific existing datasets
await synapse.storage.upload(data, { dataSetIds: [10n, 20n] })

// Or let the SDK choose (default: 2 copies, endorsed primary)
await synapse.storage.upload(data, { count: 3 })
```

#### Split Operations for Batching & Greater Control

Break the upload pipeline into independent phases:

```ts
const [primary, secondary] = await synapse.storage.createContexts({
  count: 2,
  metadata: { source: "my-service" },
})

// Store multiple pieces on the primary
const stored = await Promise.all(files.map(file => primary.store(file)))
const pieceCids = stored.map(s => s.pieceCid)

// Pre-sign once for all pieces (avoids multiple wallet prompts)
const extraData = await secondary.presignForCommit(
  pieceCids.map(cid => ({ pieceCid: cid }))
)

// Secondary pulls all pieces from primary (server-to-server, no client bandwidth)
await secondary.pull({ pieces: pieceCids, from: primary, extraData })

// Commit all pieces on-chain in one transaction per provider
await primary.commit({ pieces: pieceCids.map(cid => ({ pieceCid: cid })) })
await secondary.commit({ pieces: pieceCids.map(cid => ({ pieceCid: cid })), extraData })
```

Each phase is independently retryable. If the on-chain commit fails, the data is already stored on the provider; retry …

#### Upload Progress Visibility

Track what's happening across providers:

```ts
await synapse.storage.upload(data, {
  onStored: (providerId, pieceCid) => { /* data uploaded to provider */ },
  onPullProgress: (providerId, pieceCid, status) => { /* SP-to-SP transfer progress */ },
  onCopyComplete: (providerId, pieceCid) => { /* secondary copy confirmed */ },
  onCopyFailed: (providerId, pieceCid, error) => { /* secondary copy failed */ },
  onPiecesAdded: (txHash, providerId, pieces) => { /* on-chain tx submitted */ },
  onPiecesConfirmed: (dataSetId, providerId, pieces) => { /* on-chain tx confirmed */ },
})
```

#### Structured Errors

Errors now tell you exactly what failed and where: StoreError (primary store failed) and CommitError (all commits failed). Both carry the provider ID and endpoint for optional telemetry.

#### Provider Selection for Core Users

For applications that need direct control without the SDK wrapper, provider selection is now available as stateless functions in `@filoz/synapse-core/warm-storage`:

```ts
import { fetchProviderSelectionInput, selectProviders } from "@filoz/synapse-core/warm-storage"

// Single multicall gathers providers, endorsements, and existing datasets
const input = await fetchProviderSelectionInput(client, {
  address: walletAddress,
  metadata: { source: "my-service" },
})

// Pure function, no network calls, deterministic
const [primary] = selectProviders(
  { ...input, endorsedIds: input.endorsedIds }, // endorsed only
  { count: 1 }
)
const [secondary] = selectProviders(
  { ...input, endorsedIds: new Set() }, // any approved provider
  { count: 1, excludeProviderIds: new Set([primary.provider.id]) }
)
```

#### SP-to-SP Pull for Core Users

Initiate and monitor server-side replication directly:

```ts
import { pullPieces, waitForPullStatus } from "@filoz/synapse-core/sp"

const result = await waitForPullStatus(client, {
  serviceURL: secondaryProvider.pdp.serviceURL,
  pieces: [{
    pieceCid,
    sourceUrl: `${primaryProvider.pdp.serviceURL}/pdp/piece/${pieceCid}`,
  }],
  payee: secondaryProvider.serviceProvider,
  payer: client.account.address,
  cdn: false,
  metadata: { source: "my-service" },
  onStatus: (response) => console.log(response.status),
})
```

The pull endpoint is idempotent: the same signed request can be safely retried and doubles as a status check.

#### Breaking Changes
Sits on top of #544 which has the synapse-core side of this.
Implement store->pull->commit flow for efficient multi-copy storage replication.

Split operations API on StorageContext:
- store(): upload data to SP, wait for parking confirmation
- presignForCommit(): pre-sign EIP-712 extraData for pull + commit reuse
- pull(): request SP-to-SP transfer from another provider
- commit(): add pieces on-chain with optional pre-signed extraData
- getPieceUrl(): get retrieval URL for SP-to-SP pulls

StorageManager.upload() orchestration:
- Default 2 copies (endorsed primary + any approved secondary)
- Single-provider: store->commit flow
- Multi-copy: store on primary, presign, pull to secondaries, commit all
- Auto-retry failed secondaries with provider exclusion (up to 5 attempts)

Provider selection:
- Primary requires endorsed provider (throws if none reachable)
- Secondaries use any approved provider from the pool
- 2-tier selection per role: existing dataset, then new dataset

Callback refinements:
- Remove redundant onUploadComplete (use onStored instead)
- onStored(providerId, pieceCid) - after data parked on provider
- onPieceAdded(providerId, pieceCid) - after on-chain submission
- onPieceConfirmed(providerId, pieceCid, pieceId) - after confirmation

Type clarity:
- Rename UploadOptions.metadata -> pieceMetadata (piece-level)
- Rename CommitOptions.pieces[].metadata -> pieceMetadata
- StoreError/CommitError carry providerId and endpoint for optional telemetry
- New: CopyResult, FailedCopy for multi-copy transparency

Implements #494