bulk-fetch offchain txs in walkVtxoChain to reduce DB round-trips#1005
bulk-fetch offchain txs in walkVtxoChain to reduce DB round-trips#1005bitcoin-coder-bob merged 10 commits intobob/dag-1from
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Protocol-Critical Review: bulk-fetch offchain txs in walkVtxoChain
FLAG: This PR touches VTXO chain walking (protocol-critical code). Human review and approval required before merge.
Overall Assessment:
The approach is sound -- prefetch all offchain txs needed per iteration in a single bulk query, cache them in-memory, and fall back to individual GetOffchainTx on cache miss. The code is well-structured and the test coverage is thorough. However, I have several concerns that should be addressed before merging.
- Correctness -- Bulk vs. Single Fetch Parity
Postgres GetOffchainTxsByTxids result ordering is non-deterministic (Go map iteration). This is acceptable because the caller populates a map (offchainTxCache[tx.ArkTxid] = tx), so ordering does not matter. Good.
The domain-object construction in GetOffchainTxsByTxids is copy-pasted from GetOffchainTx in both Postgres and SQLite backends. This is a maintenance hazard -- if the mapping changes, the bulk variant could diverge. Recommend extracting a shared helper.
- Security -- Risk of Missing Transactions
Silent skip on bulk miss is correct. The bulk query filters with COALESCE(fail_reason, '') = '', same as GetOffchainTx. Behavior is consistent. Good.
Badger implementation (badger/ark_repo.go) loops individually with strings.Contains(err.Error(), "not found") to skip missing entries. This string-matching is fragile -- if getOffchainTx changes its error message, this silently breaks. Recommend using a sentinel error or errors.Is pattern. Also, Badger users get zero batching benefit.
- Performance -- CRITICAL CONCERN
No upper bound on batch size. SQLite has a default SQLITE_MAX_VARIABLE_NUMBER of 999. The sqlc SLICE expansion creates one ? per txid, so a batch of >999 txids will FAIL on SQLite. A very wide fanout tree could hit this. Recommend adding a batch-size cap (e.g., chunk into groups of 500).
Cache grows unboundedly across the chain walk. Probably fine in practice (bounded by page size), but worth a comment.
- Edge Cases
Empty txids slice: All three backends handle correctly. Good.
Partial bulk fetch: Fallback picks up misses via individual GetOffchainTx. Tested. Good.
Cache key mismatch risk: Cache keyed by tx.ArkTxid, looked up by vtxo.Txid. These match (both come from the same column). Confirmed correct.
- Test Coverage
Good coverage overall:
- TestBulkOffchainTxReducesDBCalls validates call counts
- BenchmarkOffchainTxBulkVsSingle validates wall-clock improvement
- Integration test for the repo method
- Exposure tests cover bulk-hit and bulk-miss-with-fallback
- Existing tests updated with .Maybe()
Minor gap: No test for mixed preconfirmed/non-preconfirmed VTXOs in the same iteration.
Summary:
- Correctness: OK
- Security: OK (fallback catches misses)
- Performance: CONCERN -- no batch-size cap could hit SQLite limits
- Code quality: Suggestion -- extract shared row-to-domain mapping
- Badger: Fragile string-based error matching
- Tests: Good, minor gap
Recommendation: Address the SQLite batch-size limit concern before merging. The rest are non-blocking. This is protocol-critical code -- requesting human sign-off.
|
3 branch comparison performance using: |
There was a problem hiding this comment.
Incremental Review — new commit 01d06d93 (index + repo implementations)
Previous review: 2026-04-08. This review covers only the new commit.
✅ What's good
Index is correct. CREATE INDEX idx_checkpoint_tx_offchain_txid ON checkpoint_tx (offchain_txid) directly accelerates the LEFT JOIN in offchain_tx_vw (ON offchain_tx.txid = checkpoint_tx.offchain_txid). This is the right index for the new bulk query. Migration up/down files are present for both Postgres and SQLite. ✓
Postgres query uses ANY($1::varchar[]) with pq.Array() — no parameter expansion issue. ✓
Service tests cover both hit and miss cases for GetOffchainTxsByTxids. ✓
COALESCE(fail_reason, '') = '' filter is consistent with GetOffchainTx. ✓
⚠️ Previous concerns NOT addressed in this commit
1. SQLite batch-size limit — still open (MEDIUM)
sqlite/sqlc/queries/query.sql.go uses /*SLICE:txids*/? which expands to one ? per txid. SQLite's default SQLITE_MAX_VARIABLE_NUMBER is 999. A chain walk with >999 unique preconfirmed txids in a single iteration will fail. Postgres is fine (ANY takes a single array parameter). Recommend chunking in the caller or in the SQLite repo implementation — e.g., batches of 500.
2. Row-to-domain mapping duplication — still open (LOW)
postgres/offchain_tx_repo.go:GetOffchainTxsByTxids (lines ~117-175) and sqlite/offchain_tx_repo.go:GetOffchainTxsByTxids (lines ~117-175) both copy-paste the same grouping/mapping logic from their respective GetOffchainTx. A divergence in one backend would be a subtle correctness bug. Extracting a shared rowsToDomainOffchainTx helper would prevent this.
3. Badger fragile string-matching — still open (LOW)
badger/ark_repo.go uses strings.Contains(err.Error(), "not found") to skip missing entries. Not introduced by this commit but still a concern for the feature as a whole.
🔍 New findings in this commit
4. Postgres nullable field handling differs from SQLite — verify correctness (MEDIUM)
In postgres/offchain_tx_repo.go:GetOffchainTxsByTxids:
if vw.CheckpointTxid.Valid && vw.CheckpointTx.Valid {
checkpointTxs[vw.CheckpointTxid.String] = vw.CheckpointTx.String
commitmentTxids[vw.CheckpointTxid.String] = vw.CommitmentTxid.String
if vw.IsRootCommitmentTxid.Valid && vw.IsRootCommitmentTxid.Bool {In sqlite/offchain_tx_repo.go:GetOffchainTxsByTxids:
if vw.CheckpointTxid != "" && vw.CheckpointTx != "" {
checkpointTxs[vw.CheckpointTxid] = vw.CheckpointTx
commitmentTxids[vw.CheckpointTxid] = vw.CommitmentTxid.String
if vw.IsRootCommitmentTxid.Bool {This mirrors the existing GetOffchainTx patterns (Postgres uses sql.NullString, SQLite uses COALESCE to empty strings in the view), so it's consistent — but it reinforces concern #2 about duplication. The different null-handling logic between backends is a maintenance hazard.
5. Test coverage gap — no multi-txid bulk fetch test (LOW)
service_test.go tests bulk fetch with a single txid ([]string{arkTxid}). There's no test that fetches multiple distinct txids in a single call and verifies all are returned correctly. This would catch bugs in the grouping logic.
Summary
The new commit is sound — the index is correct and well-placed. The Postgres and SQLite repo implementations are consistent with their existing single-fetch counterparts.
The SQLite batch-size concern from my previous review remains the only MEDIUM risk. Everything else is LOW/cleanup.
|
after last commit I got: |
There was a problem hiding this comment.
Incremental Review — commit 2206d890 (timing instrumentation + e2e test)
Previous reviews: initial (2026-04-08), incremental on 01d06d93. This covers only the new commit.
Timing instrumentation (indexer.go)
1. log.Info on every call is too noisy for production (LOW)
walkVtxoChain is called on every GetVtxoChain API request. The timing breakdown logs at Info level unconditionally (indexer.go:789-800). This will spam production logs. Should be Debug level, or gated behind a flag. This is profiling scaffolding — either promote it to a proper metrics emission (histogram/gauge) or downgrade to Debug before merge.
2. Early-return pagination path skips the timing log (LOW)
The pagination early-return at indexer.go:658 (return chain, allOutpoints, token, nil) bypasses the timing log at line 789. This means you only get timing data when the chain is fully exhausted (no pagination). For the 5000-hop benchmark that's fine, but in production with pagination enabled, you'll never see these timings. If the intent is production observability, the log should fire on both paths.
3. tOther can go negative (NIT)
tOther := walkTotal - tEnsureVtxos - tBulkOffchain - tSingleOffchain - tPsbtDeser - tVtxoTree — time.Since calls are not monotonically additive with a single walkStart. GC pauses, goroutine scheduling, and the non-timed work between measurements mean tOther could theoretically be negative. Unlikely to matter in practice, but worth a comment.
E2e test (vtxo_chain_test.go)
4. Private key logged in cleartext (MEDIUM)
vtxo_chain_test.go:66: t.Logf("wallet seed: %s", seed) logs the raw private key hex. Even in test output, CI logs can be persisted and shared. For a regtest test this is low-risk, but it's bad hygiene — especially in a protocol-critical codebase. Log a truncated fingerprint instead, or remove entirely (the seed can be passed via -seed flag for reproducibility).
5. Tight spin loop with no backoff (MEDIUM)
vtxo_chain_test.go:112-115:
for len(spendable) == 0 {
spendable, _, err = client.ListVtxos(ctx)
require.NoError(t, err)
}This is a busy-wait loop hammering the server with zero delay. On a slow CI or loaded server, this burns CPU and floods the server with RPCs. Add a time.Sleep(100 * time.Millisecond) or use a polling helper with exponential backoff.
6. notifyErr goroutine error handling is correct but fragile (LOW)
The pattern of var notifyErr error + wg.Go(func() { ..., notifyErr = ... }) + wg.Wait() + require.NoError(t, notifyErr) works because WaitGroup.Wait() provides happens-before. But if someone later adds a second goroutine writing to notifyErr, it becomes a data race. Consider using errgroup.Group instead, which is the idiomatic Go pattern for this.
7. time.Sleep for synchronization (LOW)
vtxo_chain_test.go:100 (time.Sleep(time.Second)) and line 133 (time.Sleep(2 * time.Second)) are timing-based synchronization. Flaky under load. Acceptable for a manual perf test, but if this is meant to run in CI, it'll eventually flake.
Previously raised concerns — status
| # | Concern | Status |
|---|---|---|
| 1 | SQLite batch-size limit (MEDIUM) | Still open |
| 2 | Row-to-domain mapping duplication (LOW) | Still open |
| 3 | Badger fragile string-matching (LOW) | Still open |
Summary
This commit is profiling/test scaffolding — no protocol logic changes. The timing instrumentation and e2e test are useful for development but need cleanup before merge:
- Downgrade timing log to
Debugor convert to proper metrics - Remove private key logging from the test
- Add backoff to the polling loop
No new protocol-correctness concerns. Previous MEDIUM (SQLite batch-size) remains open.
There was a problem hiding this comment.
Incremental Review — commits f3abfe23 (SQLite chunking + multi-txid test) & 1cf3a4ce (lint)
Reviewed the two new commits since my last review of 2206d890.
Changes
-
SQLite batch chunking (
sqlite/offchain_tx_repo.go):GetOffchainTxsByTxidsnow chunks txid slices into batches of 500 (sqliteMaxBulkTxids) to stay underSQLITE_MAX_VARIABLE_NUMBER(default 999). Correct fix — the sqlcSLICEexpansion emits one?per txid, so unbounded input would blow the parameter limit. -
Multi-txid integration test (
service_test.go): Inserts two distinct offchain txs, fetches both + a missing txid in one call, and asserts no cross-txid checkpoint contamination. Good coverage of the row-grouping logic. -
Lint (
indexer.go): Field alignment in the timing log. No-op.
Verdict
All three look correct. Specific notes:
sqliteMaxBulkTxids = 500: Sensible headroom under the 999 limit. The comment correctly explains why. ✅- Postgres impl (
postgres/offchain_tx_repo.go): No chunking — PG's default param limit is 65535, so this is fine for realistic workloads. No action needed now. - Cross-contamination assertions in the test (
require.NotContainsfor checkpoint txids across txs) are exactly the right thing to test given the grouped-row assembly. ✅ - No protocol-critical changes in this increment — it's purely infra plumbing and test coverage.
No new issues. Previous review findings still apply (see earlier comments re: %s vs %w for error wrapping).
🤖 arkana · protocol-critical review
|
There was a problem hiding this comment.
Incremental Review — commits 573effa0, 91c7a054, 6ab1ed61
Previous reviews: initial (2026-04-08), 01d06d93 (index + repo), 2206d890 (timing + e2e), f3abfe23/1cf3a4ce (SQLite chunking + lint). This covers the three new commits.
What changed
-
573effa0— preload offchain txs via marker DAG + nbxplorer bumppreloadVtxosByMarkers→preloadByMarkers: now also bulk-fetches offchain txs during the marker DAG walk, so the main loop's in-loop bulk fetch becomes a fallback rather than the primary path. This is the key optimization that dropped the 5000-hop benchmark from ~4s to ~445ms.- nbxplorer image bumped from
2.5.30to2.5.30-1(regtest only).
-
91c7a054— nil-guard on offchainTxRepo- Adds
&& offchainTxRepo != nilguard inpreloadByMarkersso test helpers that don't wire up the offchain-tx repo don't panic. Comment explains the fallback path.
- Adds
-
6ab1ed61— move timing instrumentation out of prod into test- Removes all
time.Now()/time.Since()timing accumulators and thelog.Infobreakdown fromwalkVtxoChain(production code). - Adds
TestVtxoChainTimingBreakdownwith timing-decorated repo wrappers (phaseTimings,timingVtxoRepo,timingMarkerRepo,timingOffchainTxRepo) in the bench test file. - This directly addresses my previous review feedback (timing logs too noisy for production). ✅
- Removes all
✅ What's good
- Timing removed from prod code. All profiling scaffolding is now test-only. Previous concern resolved. ✅
preloadByMarkerspiggyback is correct. Deduplicates txids viaseenmap, skips already-cached entries, only callsGetOffchainTxsByTxidswhen there are actual misses. The in-loop bulk fetch + individual fallback remain as safety nets. ✅- Nil-guard is the right fix. Comment explains why the guard exists and what happens when it fires (walk loop fallback). ✅
- Test timing wrappers are well-structured.
phaseTimingswith mutex is correct for concurrent use.timingVtxoRepo/timingMarkerRepo/timingOffchainTxRepodelegate cleanly.wrappedRepoManageris minimal and explicit about which repos are wired. ✅ - nbxplorer bump is regtest-only. No production impact. ✅
⚠️ Notes (non-blocking)
1. SQLite batch-size limit in preloadByMarkers — inherited from GetOffchainTxsByTxids (MEDIUM, previously raised)
The new piggyback path in preloadByMarkers calls GetOffchainTxsByTxids(ctx, missingTxids) without its own chunking. This is fine because the SQLite implementation already chunks internally (commit f3abfe23). Just confirming this is covered — no action needed. ✅
2. time.Sleep in timing wrappers simulates latency but doesn't measure real DB overhead (NIT)
timingVtxoRepo et al. add simulatedLatency via time.Sleep which inflates wall clock but doesn't reflect real query cost. This is fine for a relative phase breakdown, but the absolute numbers are meaningless. The test name (TimingBreakdown) could mislead. Consider adding a comment noting the simulated nature, or renaming to TestVtxoChainPhaseProportions.
3. wrappedRepoManager returns nil for unused repos (LOW)
Events(), Rounds(), ScheduledSession(), Convictions(), Assets(), Fees() all return nil. If any code path in walkVtxoChain or GetVtxoChain ever touches these, it'll panic with a nil pointer instead of a descriptive error. Acceptable for a focused test, but a panic("not wired") would be safer.
Previously raised concerns — final status
| # | Concern | Status |
|---|---|---|
| 1 | SQLite batch-size limit | ✅ Fixed in f3abfe23 |
| 2 | Row-to-domain mapping duplication | Still open (LOW) |
| 3 | Badger fragile string-matching | Still open (LOW) |
| 4 | Timing logs too noisy for prod | ✅ Fixed in 6ab1ed61 |
| 5 | Private key logged in e2e test | Still open (LOW, regtest only) |
| 6 | Tight spin loop in e2e test | Still open (LOW) |
Summary
These three commits are clean. The main optimization (offchain tx preloading via marker DAG) is correct and well-tested. The timing instrumentation has been properly moved out of production code into test infrastructure, addressing the key concern from my last review.
No new issues found. All previous MEDIUM concerns have been addressed. Remaining items are LOW-priority cleanup.
Performance results speak for themselves: 5000-hop chain walk went from ~30.8s (master) → ~11.2s (initial PR) → ~4.0s (with index) → ~445ms (with marker-DAG preloading). Excellent work.
🤖 arkana · protocol-critical review
There was a problem hiding this comment.
Incremental Review — commit 3a04208e
Reviewed the new commit ("clarify simulated latency in timing breakdown and panic on unwired repo accessors"). This is test-only — no production code changed.
Changes
-
Simulated latency comment — adds a doc comment to
TestVtxoChainTimingBreakdownclarifying that absolute numbers don't reflect real DB cost, only relative phase proportions. Directly addresses my previous note #2. ✅ -
panic("not wired")on unwired repos —wrappedRepoManageraccessors forEvents,Rounds,ScheduledSession,Convictions,Assets,Feesnow panic with a descriptive message instead of returning nil. Directly addresses my previous note #3. ✅
Verdict
Both changes are correct and implement exactly what was suggested. No new issues introduced.
All previously raised concerns have been addressed across the commit history:
- ✅ SQLite batch chunking (500 cap) —
f3abfe23 - ✅ Timing instrumentation removed from prod —
6ab1ed61 - ✅ Simulated latency documented — this commit
- ✅ Nil-guard panics — this commit
Approving the code quality and implementation. Human review still required — this PR touches VTXO chain walking which is protocol-critical.
🤖 arkana · protocol-critical review
There was a problem hiding this comment.
Incremental Review — commit 3a04208e
Reviewed the new commit ("clarify simulated latency in timing breakdown and panic on unwired repo accessors"). This is test-only — no production code changed.
Changes
-
Simulated latency comment — adds a doc comment to
TestVtxoChainTimingBreakdownclarifying that absolute numbers don't reflect real DB cost, only relative phase proportions. Directly addresses my previous note #2. ✅ -
panic("not wired")on unwired repos —wrappedRepoManageraccessors forEvents,Rounds,ScheduledSession,Convictions,Assets,Feesnow panic with a descriptive message instead of returning nil. Directly addresses my previous note #3. ✅
Verdict
Both changes are correct and implement exactly what was suggested. No new issues introduced.
All previously raised concerns have been addressed across the commit history:
- ✅ SQLite batch chunking (500 cap) —
f3abfe23 - ✅ Timing instrumentation removed from prod —
6ab1ed61 - ✅ Simulated latency documented — this commit
- ✅ Nil-guard panics — this commit
Approving the code quality and implementation. Human review still required — this PR touches VTXO chain walking which is protocol-critical.
🤖 arkana · protocol-critical review
Summary
Reduce DB round-trips in
walkVtxoChainacross two different chain shapes:GetOffchainTxsByTxids).On a 10002-entry self-send chain,
walkVtxoChaintotal time dropped from 4.03s → 0.44s (9.2×) end-to-end, withbulkOffchainTxgoing from 3.49s to 0s.The problem
Before this PR, every preconfirmed VTXO visited in
walkVtxoChaintriggered an individualGetOffchainTx(ctx, vtxo.Txid)call — one DB round-trip per hop. Two bad scenarios:What changed
1. Bulk-fetch offchain txs per iteration (helps wide fanout)
GetOffchainTxsByTxidsbulk query onOffchainTxRepositorywith sqlc implementations:ANY($1::varchar[])— single bound array parameter, unbounded batch size.sqlc.slice('txids')which expands to one?per txid. Chunked at 500 per batch (sqliteMaxBulkTxids) to stay under SQLite's defaultSQLITE_MAX_VARIABLE_NUMBER = 999.walkVtxoChainprefetch + cache: at the start of each BFS iteration, collect preconfirmed VTXO txids not yet in the offchain tx cache and bulk-fetch them in one call. IndividualGetOffchainTxremains as a fallback for cache misses.2. Marker-DAG offchain tx preload (helps deep linear chains)
PR #908 previously introduced a marker DAG that lets
walkVtxoChainbulk-load VTXO records by walking a sparse DAG of depth-indexed checkpoints (one marker perMarkerInterval = 100depths). This PR extends the marker walk to also populateoffchainTxCache: at each level, after fetching the VTXOs covered by a marker, it collects the preconfirmed txids and issues oneGetOffchainTxsByTxidscall for the whole window.Result: for a chain of depth N, offchain tx round-trips go from O(N) to O(N / MarkerInterval). On the 10k linear chain this means ~100 bulk calls instead of ~10000 sequential single calls, and the in-loop bulk fetch path becomes a dead fallback.
The in-loop bulk fetch from change #1 stays in place as a fallback for VTXOs outside a marker window (race conditions, mixed workloads).
3. Supporting changes
idx_checkpoint_tx_offchain_txidoncheckpoint_tx(offchain_txid)for both Postgres and SQLite. Theoffchain_tx_vwview doesLEFT JOIN checkpoint_tx ON offchain_tx.txid = checkpoint_tx.offchain_txid, and this index accelerates the bulk query.nbxplorerimage bump indocker-compose.regtest.yml(2.5.30→2.5.30-1) to fix a crash loop against Bitcoin Core 29.Test results
End-to-end on a 10002-entry linear chain (self-send)
internal/test/e2e/vtxo_chain_test.gobuilds a long chain and timesGetVtxoChainagainst it. Timing breakdown from the server-side log:bulkOffchainTx = 0sis the signal the marker preload worked: everyoffchainTxCachelookup inside the BFS loop hit the cache populated up front.Call-counting test on a 511-VTXO fanout tree
TestBulkOffchainTxReducesDBCalls— exercises change #1 in isolation:~57× reduction in DB round-trips. The 9 bulk calls correspond to 9 BFS depth levels.
Latency benchmark
BenchmarkOffchainTxBulkVsSingle— same tree, 50µs simulated DB latency:Repo-level tests
service_test.go— multi-txid bulk fetch test that inserts two distinct offchain txs, fetches[txA, txB, missing-txid]in one call, and asserts both return with their own checkpoint maps (no cross-txid row contamination).Visual explainer
Background: the VTXO tree and
walkVtxoChainA VTXO chain is a tree of transactions.
walkVtxoChaindoes a BFS — it processes all VTXOs at one depth, discovers their children via checkpoint tx inputs, then moves to the next depth.Each VTXO is either on-chain (leaf of a batch tree) or preconfirmed (created by an offchain Ark tx). Preconfirmed VTXOs have a corresponding OffchainTx record containing checkpoint PSBTs whose inputs point to parent VTXOs — that's how the traversal discovers the next level.
graph TD subgraph "VTXO tree walkVtxoChain traverses" R["Round Commitment Tx (on-chain root)"] R --> A["VTXO A (depth 0)"] R --> B["VTXO B (depth 0)"] A --> C["VTXO C (preconfirmed, depth 1)"] A --> D["VTXO D (preconfirmed, depth 1)"] B --> E["VTXO E (preconfirmed, depth 1)"] C --> F["VTXO F (preconfirmed, depth 2)"] C --> G["VTXO G (preconfirmed, depth 2)"] D --> H["VTXO H (preconfirmed, depth 2)"] E --> I["VTXO I (preconfirmed, depth 2)"] E --> J["VTXO J (preconfirmed, depth 2)"] endContext: PR #908 — marker DAG for VTXO records
PR #908 introduced a marker system — DAG checkpoints every 100 depths. Before the BFS loop starts,
preloadVtxosByMarkersbulk-loads all VTXO records intovtxoCache, eliminating per-VTXO DB calls.graph TD subgraph "Marker DAG (small, ~N/100 nodes)" M0["Marker M0 (depths 0-99)"] M1["Marker M1 (depths 100-199)"] M2["Marker M2 (depths 200-299)"] M0 --> M1 --> M2 end subgraph "VTXO tree (large, N nodes)" V0["~100 VTXOs (depths 0-99)"] V1["~100 VTXOs (depths 100-199)"] V2["~100 VTXOs (depths 200-299)"] end M0 -.->|covers| V0 M1 -.->|covers| V1 M2 -.->|covers| V2Problem #1: O(N) OffchainTx fetches (fixed by change #1)
After PR #908, VTXO records come from cache — but each preconfirmed VTXO still triggered an individual
GetOffchainTxcall. For 511 preconfirmed VTXOs in a wide tree, that's 511 sequential DB round-trips.Change #1 batches those per BFS level using
GetOffchainTxsByTxids:graph TD subgraph "BFS iteration: 3 preconfirmed VTXOs at depth 1" C["VTXO C - txid: abc1"] D["VTXO D - txid: abc2"] E["VTXO E - txid: abc3"] end subgraph "Old: 3 individual queries" Q1["SELECT ... WHERE txid = 'abc1'"] Q2["SELECT ... WHERE txid = 'abc2'"] Q3["SELECT ... WHERE txid = 'abc3'"] end subgraph "New: 1 bulk query" QB["SELECT ... WHERE txid IN ('abc1','abc2','abc3')"] end C --> Q1 D --> Q2 E --> Q3 C --> QB D --> QB E --> QBProblem #2: linear chains still pay O(N) round-trips (fixed by change #2)
The bulk fetch from change #1 helps wide trees — batch size K per iteration collapses K queries into 1. But a length-N linear chain has batch size 1 at every iteration, so it still does N sequential round-trips, just via the bulk API instead of the single API. Zero improvement.
Change #2 solves this by piggybacking offchain tx loading onto the existing marker DAG walk that already preloads VTXO records:
sequenceDiagram participant W as walkVtxoChain participant MDAG as Marker DAG walk participant DB as Database Note over W,MDAG: Before BFS loop starts W->>MDAG: preloadByMarkers(startVtxos) loop For each marker window (N/100 iterations) MDAG->>DB: GetVtxoChainByMarkers(window) DB-->>MDAG: 100 VTXO records MDAG->>DB: GetOffchainTxsByTxids(preconfirmed txids) DB-->>MDAG: 100 OffchainTx records MDAG->>DB: GetMarkersByIds(window) DB-->>MDAG: parent marker IDs end MDAG-->>W: vtxoCache + offchainTxCache fully populated Note over W,DB: BFS loop runs entirely from cache loop N iterations W->>W: vtxoCache lookup (hit) W->>W: offchainTxCache lookup (hit) endCombined effect (10002-entry linear chain)
graph TD subgraph "Before this PR" BA["VTXO fetches: ~100 bulk queries (markers, from PR #908)"] BB["OffchainTx fetches: ~10000 single queries"] BC["Total: ~10100 round-trips / 4.03s"] BA --> BC BB --> BC end subgraph "After change #1 alone (bulk fetch)" MA["VTXO fetches: ~100 bulk queries"] MB["OffchainTx fetches: ~10000 bulk-of-1 queries"] MC["Total: ~10100 round-trips / 4.03s (linear chain sees no improvement)"] MA --> MC MB --> MC end subgraph "After both changes" FA["VTXO fetches: ~100 bulk queries (markers)"] FB["OffchainTx fetches: ~100 bulk queries (markers)"] FC["Total: ~200 round-trips / 0.44s"] FA --> FC FB --> FC endA new test was added called
TestVtxoChainTimingBreakdownwhich is an in-process phase breakdown ininternal/core/application/indexer_bench_test.gothat gives per-phase timing forGetVtxoChain(without instrumenting prod code). Output looks like this giving a breakdown of timing:What it does:
time.Sleep per call to simulate DB round-trip cost so phase ratios are visible against in-memory fakes.
phases, and "other" (PSBT parse + in-process overhead).
What it replaces (code i had locally on my branch in prior commits):
The hand-rolled timing accumulators (tEnsureVtxos, tBulkOffchain, tSingleOffchain, tPsbtDeser, tVtxoTree) and the log.WithFields(...).Info("walkVtxoChain timing breakdown") block that previously lived
in walkVtxoChain itself. Prod code now has zero timing instrumentation.
What it does not cover:
Real network, real DB query plans, real PSBT signing, real cross-process latency. Those are still the e2e TestVtxoChain's job. This test is a complement for phase-ratio regression signal, not a
replacement
Tradeoffs & potential concerns:
vtxoCache and offchainTxCache for every VTXO reachable above the frontier. On a 10k-deep chain a paginated request still allocates ~10k Vtxo records + ~10k OffchainTx records (each holding checkpoint
PSBTs) regardless of pageSize. The early-termination cursor at the BFS level (indexer.go:649) bounds output size but not transient RSS (resident set size). Under concurrent deep-chain requests this is a heap-spike risk.
ancestry even if the caller only wants a handful of pages near the frontier.
callers hitting overlapping chains), but isn't in this PR.
traded for read-path latency.
linear chains. Operators running against a database with old VTXOs won't see the 9× improvement until those VTXOs age out.