
bulk-fetch offchain txs in walkVtxoChain to reduce DB round-trips#1005

Merged
bitcoin-coder-bob merged 10 commits into bob/dag-1 from bob/GetOffchainTxn-enhancement
Apr 15, 2026

Conversation

@bitcoin-coder-bob
Collaborator

@bitcoin-coder-bob bitcoin-coder-bob commented Apr 7, 2026

Summary

Reduce DB round-trips in walkVtxoChain across two different chain shapes:

  1. Wide fanout trees — bulk-fetch offchain txs per BFS iteration instead of one-by-one (GetOffchainTxsByTxids).
  2. Deep linear chains — piggyback an offchain tx bulk fetch on the existing marker-DAG VTXO preload so the BFS loop never hits the DB for offchain txs at all.

On a 10002-entry self-send chain, walkVtxoChain total time dropped from 4.03s → 0.44s (9.2×) end-to-end, with bulkOffchainTx going from 3.49s to 0s.

The problem

Before this PR, every preconfirmed VTXO visited in walkVtxoChain triggered an individual GetOffchainTx(ctx, vtxo.Txid) call — one DB round-trip per hop. Two bad scenarios:

  • A wide fanout tree with N preconfirmed VTXOs does N sequential round-trips, even though all of them could be fetched in one query.
  • A deep linear chain of length N does N sequential round-trips and can't even benefit from batching (each BFS level has exactly one VTXO).

What changed

1. Bulk-fetch offchain txs per iteration (helps wide fanout)

  • New GetOffchainTxsByTxids bulk query on OffchainTxRepository with sqlc implementations:
    • Postgres uses ANY($1::varchar[]) — single bound array parameter, unbounded batch size.
    • SQLite uses sqlc.slice('txids') which expands to one ? per txid. Chunked at 500 per batch (sqliteMaxBulkTxids) to stay under SQLite's default SQLITE_MAX_VARIABLE_NUMBER = 999.
    • Badger loops individually as a compatibility shim (no bulk primitive).
  • walkVtxoChain prefetch + cache: at the start of each BFS iteration, collect preconfirmed VTXO txids not yet in the offchain tx cache and bulk-fetch them in one call. Individual GetOffchainTx remains as a fallback for cache misses.

2. Marker-DAG offchain tx preload (helps deep linear chains)

PR #908 previously introduced a marker DAG that lets walkVtxoChain bulk-load VTXO records by walking a sparse DAG of depth-indexed checkpoints (one marker per MarkerInterval = 100 depths). This PR extends the marker walk to also populate offchainTxCache: at each level, after fetching the VTXOs covered by a marker, it collects the preconfirmed txids and issues one GetOffchainTxsByTxids call for the whole window.

Result: for a chain of depth N, offchain tx round-trips go from O(N) to O(N / MarkerInterval). On the 10k linear chain this means ~100 bulk calls instead of ~10000 sequential single calls, and the in-loop bulk fetch path becomes a dead fallback.

The in-loop bulk fetch from change #1 stays in place as a fallback for VTXOs outside a marker window (race conditions, mixed workloads).

3. Supporting changes

  • New DB index idx_checkpoint_tx_offchain_txid on checkpoint_tx(offchain_txid) for both Postgres and SQLite. The offchain_tx_vw view does LEFT JOIN checkpoint_tx ON offchain_tx.txid = checkpoint_tx.offchain_txid, and this index accelerates the bulk query.
  • nbxplorer image bump in docker-compose.regtest.yml (2.5.30 → 2.5.30-1) to fix a crash loop against Bitcoin Core 29.

Test results

End-to-end on a 10002-entry linear chain (self-send)

internal/test/e2e/vtxo_chain_test.go builds a long chain and times GetVtxoChain against it. Timing breakdown from the server-side log:

| | total | bulkOffchainTx | psbtDeser | ensureVtxos | other |
|---|---|---|---|---|---|
| Before (this PR's bulk fetch alone) | 4.029s | 3.492s | 363ms | 9ms | 165ms |
| After (+ marker-DAG offchain preload) | 0.436s | 0s | 218ms | 2ms | 215ms |

bulkOffchainTx = 0s is the signal the marker preload worked: every offchainTxCache lookup inside the BFS loop hit the cache populated up front.

Call-counting test on a 511-VTXO fanout tree

TestBulkOffchainTxReducesDBCalls — exercises change #1 in isolation:

| | Bulk calls | Individual calls | Total round-trips |
|---|---|---|---|
| With bulk prefetch | 9 | 0 | 9 |
| Without (pre-optimization) | 0 | 511 | 511 |

~57× reduction in DB round-trips. The 9 bulk calls correspond to 9 BFS depth levels.

Latency benchmark

BenchmarkOffchainTxBulkVsSingle — same tree, 50µs simulated DB latency:

| | Time/op |
|---|---|
| Bulk prefetch | 13.7ms |
| No bulk (fallback) | 409ms |
| Speedup | ~30× |

Repo-level tests

  • service_test.go — multi-txid bulk fetch test that inserts two distinct offchain txs, fetches [txA, txB, missing-txid] in one call, and asserts both return with their own checkpoint maps (no cross-txid row contamination).
  • Existing repo tests updated to cover the new bulk method against both Postgres and SQLite.

Visual explainer

Background: the VTXO tree and walkVtxoChain

A VTXO chain is a tree of transactions. walkVtxoChain does a BFS — it processes all VTXOs at one depth, discovers their children via checkpoint tx inputs, then moves to the next depth.

Each VTXO is either on-chain (leaf of a batch tree) or preconfirmed (created by an offchain Ark tx). Preconfirmed VTXOs have a corresponding OffchainTx record containing checkpoint PSBTs whose inputs point to parent VTXOs — that's how the traversal discovers the next level.

```mermaid
graph TD
    subgraph "VTXO tree walkVtxoChain traverses"
        R["Round Commitment Tx (on-chain root)"]
        R --> A["VTXO A (depth 0)"]
        R --> B["VTXO B (depth 0)"]
        A --> C["VTXO C (preconfirmed, depth 1)"]
        A --> D["VTXO D (preconfirmed, depth 1)"]
        B --> E["VTXO E (preconfirmed, depth 1)"]
        C --> F["VTXO F (preconfirmed, depth 2)"]
        C --> G["VTXO G (preconfirmed, depth 2)"]
        D --> H["VTXO H (preconfirmed, depth 2)"]
        E --> I["VTXO I (preconfirmed, depth 2)"]
        E --> J["VTXO J (preconfirmed, depth 2)"]
    end
```

Context: PR #908 — marker DAG for VTXO records

PR #908 introduced a marker system — DAG checkpoints every 100 depths. Before the BFS loop starts, preloadVtxosByMarkers bulk-loads all VTXO records into vtxoCache, eliminating per-VTXO DB calls.

```mermaid
graph TD
    subgraph "Marker DAG (small, ~N/100 nodes)"
        M0["Marker M0 (depths 0-99)"]
        M1["Marker M1 (depths 100-199)"]
        M2["Marker M2 (depths 200-299)"]
        M0 --> M1 --> M2
    end

    subgraph "VTXO tree (large, N nodes)"
        V0["~100 VTXOs (depths 0-99)"]
        V1["~100 VTXOs (depths 100-199)"]
        V2["~100 VTXOs (depths 200-299)"]
    end

    M0 -.->|covers| V0
    M1 -.->|covers| V1
    M2 -.->|covers| V2
```

Problem #1: O(N) OffchainTx fetches (fixed by change #1)

After PR #908, VTXO records come from cache — but each preconfirmed VTXO still triggered an individual GetOffchainTx call. For 511 preconfirmed VTXOs in a wide tree, that's 511 sequential DB round-trips.

Change #1 batches those per BFS level using GetOffchainTxsByTxids:

```mermaid
graph TD
    subgraph "BFS iteration: 3 preconfirmed VTXOs at depth 1"
        C["VTXO C - txid: abc1"]
        D["VTXO D - txid: abc2"]
        E["VTXO E - txid: abc3"]
    end

    subgraph "Old: 3 individual queries"
        Q1["SELECT ... WHERE txid = 'abc1'"]
        Q2["SELECT ... WHERE txid = 'abc2'"]
        Q3["SELECT ... WHERE txid = 'abc3'"]
    end

    subgraph "New: 1 bulk query"
        QB["SELECT ... WHERE txid IN ('abc1','abc2','abc3')"]
    end

    C --> Q1
    D --> Q2
    E --> Q3
    C --> QB
    D --> QB
    E --> QB
```

Problem #2: linear chains still pay O(N) round-trips (fixed by change #2)

The bulk fetch from change #1 helps wide trees — batch size K per iteration collapses K queries into 1. But a length-N linear chain has batch size 1 at every iteration, so it still does N sequential round-trips, just via the bulk API instead of the single API. Zero improvement.

Change #2 solves this by piggybacking offchain tx loading onto the existing marker DAG walk that already preloads VTXO records:

```mermaid
sequenceDiagram
    participant W as walkVtxoChain
    participant MDAG as Marker DAG walk
    participant DB as Database

    Note over W,MDAG: Before BFS loop starts
    W->>MDAG: preloadByMarkers(startVtxos)

    loop For each marker window (N/100 iterations)
        MDAG->>DB: GetVtxoChainByMarkers(window)
        DB-->>MDAG: 100 VTXO records
        MDAG->>DB: GetOffchainTxsByTxids(preconfirmed txids)
        DB-->>MDAG: 100 OffchainTx records
        MDAG->>DB: GetMarkersByIds(window)
        DB-->>MDAG: parent marker IDs
    end

    MDAG-->>W: vtxoCache + offchainTxCache fully populated

    Note over W,DB: BFS loop runs entirely from cache
    loop N iterations
        W->>W: vtxoCache lookup (hit)
        W->>W: offchainTxCache lookup (hit)
    end
```

Combined effect (10002-entry linear chain)

```mermaid
graph TD
    subgraph "Before this PR"
        BA["VTXO fetches: ~100 bulk queries (markers, from PR #908)"]
        BB["OffchainTx fetches: ~10000 single queries"]
        BC["Total: ~10100 round-trips / 4.03s"]
        BA --> BC
        BB --> BC
    end

    subgraph "After change #1 alone (bulk fetch)"
        MA["VTXO fetches: ~100 bulk queries"]
        MB["OffchainTx fetches: ~10000 bulk-of-1 queries"]
        MC["Total: ~10100 round-trips / 4.03s (linear chain sees no improvement)"]
        MA --> MC
        MB --> MC
    end

    subgraph "After both changes"
        FA["VTXO fetches: ~100 bulk queries (markers)"]
        FB["OffchainTx fetches: ~100 bulk queries (markers)"]
        FC["Total: ~200 round-trips / 0.44s"]
        FA --> FC
        FB --> FC
    end
```

10100 → 10100 → 200 round-trips, 4.03s → 4.03s → 0.44s for a 10k linear chain. Change #1 covers wide trees (57× on the 511-VTXO fanout benchmark); change #2 covers deep linear chains (9× end-to-end). Together they reduce round-trips regardless of chain shape.

A new test, TestVtxoChainTimingBreakdown, provides an in-process phase breakdown in internal/core/application/indexer_bench_test.go, giving per-phase timing for GetVtxoChain without instrumenting prod code. Output looks like this:

```
=== RUN   TestVtxoChainTimingBreakdown
    indexer_bench_test.go:839: GetVtxoChain timing breakdown: linear chain n=10000, simulated repo latency=50µs
    indexer_bench_test.go:839:   wall clock (GetVtxoChain)        385.503759ms
    indexer_bench_test.go:839:   Markers.GetMarkersByIds          104.166695ms  (100 calls)
    indexer_bench_test.go:839:   Markers.GetVtxoChainByMarkers     83.184731ms  (100 calls)
    indexer_bench_test.go:839:   OffchainTxs.GetOffchainTxsByTxids  81.076346ms  (100 calls)
    indexer_bench_test.go:839:   Vtxos.GetVtxos                     1.200509ms  (1 calls)
    indexer_bench_test.go:839:   sum of repo phases               269.628281ms
    indexer_bench_test.go:839:   other (psbt parse + overhead)    115.875478ms
--- PASS: TestVtxoChainTimingBreakdown (0.42s)
```

What it does:

  1. Builds a 10,000-entry linear preconfirmed VTXO chain with markers as plain Go maps via buildLinearChain — no DB, no network, no signing.
  2. Wraps the fake repos in timing decorators (timingVtxoRepo, timingMarkerRepo, timingOffchainTxRepo) that record time.Since into a shared phaseTimings accumulator on every call and inject a 50µs time.Sleep per call to simulate DB round-trip cost, so phase ratios are visible against in-memory fakes.
  3. Calls indexerService.GetVtxoChain once end-to-end, asserts the returned chain length, then logs a per-phase breakdown: wall clock, time spent in each repo method with call counts, sum of repo phases, and "other" (PSBT parse + in-process overhead).

What it replaces (code I had locally on my branch in prior commits): the hand-rolled timing accumulators (tEnsureVtxos, tBulkOffchain, tSingleOffchain, tPsbtDeser, tVtxoTree) and the log.WithFields(...).Info("walkVtxoChain timing breakdown") block that previously lived in walkVtxoChain itself. Prod code now has zero timing instrumentation.

What it does not cover: real network, real DB query plans, real PSBT signing, real cross-process latency. Those are still the e2e TestVtxoChain's job. This test is a complement for phase-ratio regression signal, not a replacement.

Tradeoffs & potential concerns:

  • Preload ignores pageSize — peak memory scales with full chain length, not page size. preloadByMarkers walks the marker DAG from the frontier upward to the root before the BFS loop starts, populating vtxoCache and offchainTxCache for every VTXO reachable above the frontier. On a 10k-deep chain a paginated request still allocates ~10k Vtxo records + ~10k OffchainTx records (each holding checkpoint PSBTs) regardless of pageSize. The early-termination cursor at the BFS level (indexer.go:649) bounds output size but not transient RSS (resident set size). Under concurrent deep-chain requests this is a heap-spike risk.
  • No depth cap on the upward marker walk. Nothing in preloadByMarkers limits how far up the DAG it traverses — it always walks to the root. A frontier deep in a long-lived chain will preload the entire ancestry even if the caller only wants a handful of pages near the frontier.
  • No cross-request cache. Every GetVtxoChain call rebuilds vtxoCache / offchainTxCache from scratch. A process-level LRU keyed by marker ID would collapse duplicate work across pagination (and across callers hitting overlapping chains), but isn't in this PR.
  • New index idx_checkpoint_tx_offchain_txid adds write-path cost. Every checkpoint tx insert now updates an extra index on both Postgres and SQLite. Low-risk, but worth flagging as a permanent overhead traded for read-path latency.
  • Existing (pre-#908) VTXOs skip the preload path. Same caveat PR #908 ("Scale the DAG") already called out: VTXOs without marker_ids populated fall through to the in-loop bulk fetch, which helps wide trees but not deep linear chains. Operators running against a database with old VTXOs won't see the 9× improvement until those VTXOs age out.
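
The process-level cache floated in the third bullet could be sketched as an LRU keyed by marker ID; markerLRU here is hypothetical, and a real version would need a mutex for concurrent GetVtxoChain calls:

```go
package main

import (
	"container/list"
	"fmt"
)

// markerLRU caches one marker window's preloaded records per marker ID,
// evicting the least-recently-used window once capacity is exceeded.
type markerLRU struct {
	cap   int
	ll    *list.List               // front = most recently used
	items map[string]*list.Element // marker ID -> list element
}

type lruEntry struct {
	key   string
	vtxos []string // stand-in for the window's preloaded VTXO/OffchainTx records
}

func newMarkerLRU(capacity int) *markerLRU {
	return &markerLRU{cap: capacity, ll: list.New(), items: map[string]*list.Element{}}
}

func (c *markerLRU) Get(key string) ([]string, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.ll.MoveToFront(el)
	return el.Value.(*lruEntry).vtxos, true
}

func (c *markerLRU) Put(key string, vtxos []string) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		el.Value.(*lruEntry).vtxos = vtxos
		return
	}
	c.items[key] = c.ll.PushFront(&lruEntry{key: key, vtxos: vtxos})
	if c.ll.Len() > c.cap {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
}

func main() {
	cache := newMarkerLRU(2)
	cache.Put("M0", []string{"v1"})
	cache.Put("M1", []string{"v2"})
	cache.Get("M0")                 // touch M0 so M1 becomes the eviction candidate
	cache.Put("M2", []string{"v3"}) // evicts M1
	_, ok := cache.Get("M1")
	fmt.Println(ok) // false
}
```

Paginated requests walking the same chain would then reuse each marker window instead of re-fetching it.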

@coderabbitai
Contributor

coderabbitai Bot commented Apr 7, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • next-version

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1110ef16-2bf2-440d-9151-207f94ee8614

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Contributor

@arkanaai arkanaai Bot left a comment

Protocol-Critical Review: bulk-fetch offchain txs in walkVtxoChain

FLAG: This PR touches VTXO chain walking (protocol-critical code). Human review and approval required before merge.

Overall Assessment:
The approach is sound -- prefetch all offchain txs needed per iteration in a single bulk query, cache them in-memory, and fall back to individual GetOffchainTx on cache miss. The code is well-structured and the test coverage is thorough. However, I have several concerns that should be addressed before merging.

  1. Correctness -- Bulk vs. Single Fetch Parity

Postgres GetOffchainTxsByTxids result ordering is non-deterministic (Go map iteration). This is acceptable because the caller populates a map (offchainTxCache[tx.ArkTxid] = tx), so ordering does not matter. Good.

The domain-object construction in GetOffchainTxsByTxids is copy-pasted from GetOffchainTx in both Postgres and SQLite backends. This is a maintenance hazard -- if the mapping changes, the bulk variant could diverge. Recommend extracting a shared helper.

  2. Security -- Risk of Missing Transactions

Silent skip on bulk miss is correct. The bulk query filters with COALESCE(fail_reason, '') = '', same as GetOffchainTx. Behavior is consistent. Good.

Badger implementation (badger/ark_repo.go) loops individually with strings.Contains(err.Error(), "not found") to skip missing entries. This string-matching is fragile -- if getOffchainTx changes its error message, this silently breaks. Recommend using a sentinel error or errors.Is pattern. Also, Badger users get zero batching benefit.

  3. Performance -- CRITICAL CONCERN

No upper bound on batch size. SQLite has a default SQLITE_MAX_VARIABLE_NUMBER of 999. The sqlc SLICE expansion creates one ? per txid, so a batch of >999 txids will FAIL on SQLite. A very wide fanout tree could hit this. Recommend adding a batch-size cap (e.g., chunk into groups of 500).

Cache grows unboundedly across the chain walk. Probably fine in practice (bounded by page size), but worth a comment.

  4. Edge Cases

Empty txids slice: All three backends handle correctly. Good.
Partial bulk fetch: Fallback picks up misses via individual GetOffchainTx. Tested. Good.
Cache key mismatch risk: Cache keyed by tx.ArkTxid, looked up by vtxo.Txid. These match (both come from the same column). Confirmed correct.

  5. Test Coverage

Good coverage overall:

  • TestBulkOffchainTxReducesDBCalls validates call counts
  • BenchmarkOffchainTxBulkVsSingle validates wall-clock improvement
  • Integration test for the repo method
  • Exposure tests cover bulk-hit and bulk-miss-with-fallback
  • Existing tests updated with .Maybe()

Minor gap: No test for mixed preconfirmed/non-preconfirmed VTXOs in the same iteration.

Summary:

  • Correctness: OK
  • Security: OK (fallback catches misses)
  • Performance: CONCERN -- no batch-size cap could hit SQLite limits
  • Code quality: Suggestion -- extract shared row-to-domain mapping
  • Badger: Fragile string-based error matching
  • Tests: Good, minor gap

Recommendation: Address the SQLite batch-size limit concern before merging. The rest are non-blocking. This is protocol-critical code -- requesting human sign-off.

@bitcoin-coder-bob
Collaborator Author

Three-branch performance comparison using: go test -timeout=2000s -v -run TestVtxoChain ./internal/test/e2e/ -args -chain-length=5000 -initial-amount=10000

on branch: bob/GetOffchainTxn-enhancement

    vtxo_chain_test.go:130: chain built: 5000 hops in 12m53.751128083s
    vtxo_chain_test.go:149: GetVtxoChain: 10002 entries in 11.189885736s 

on branch: bob/dag-1

    vtxo_chain_test.go:130: chain built: 5000 hops in 20m16.062503786s
    vtxo_chain_test.go:149: GetVtxoChain: 10002 entries in 19.470237384s 

on branch: master

    vtxo_chain_test.go:130: chain built: 5000 hops in 27m2.487438143s
    vtxo_chain_test.go:149: GetVtxoChain: 10002 entries in 30.81779828s 

Contributor

@arkanaai arkanaai Bot left a comment

Incremental Review — new commit 01d06d93 (index + repo implementations)

Previous review: 2026-04-08. This review covers only the new commit.


✅ What's good

Index is correct. CREATE INDEX idx_checkpoint_tx_offchain_txid ON checkpoint_tx (offchain_txid) directly accelerates the LEFT JOIN in offchain_tx_vw (ON offchain_tx.txid = checkpoint_tx.offchain_txid). This is the right index for the new bulk query. Migration up/down files are present for both Postgres and SQLite. ✓

Postgres query uses ANY($1::varchar[]) with pq.Array() — no parameter expansion issue. ✓

Service tests cover both hit and miss cases for GetOffchainTxsByTxids. ✓

COALESCE(fail_reason, '') = '' filter is consistent with GetOffchainTx. ✓


⚠️ Previous concerns NOT addressed in this commit

1. SQLite batch-size limit — still open (MEDIUM)
sqlite/sqlc/queries/query.sql.go uses /*SLICE:txids*/? which expands to one ? per txid. SQLite's default SQLITE_MAX_VARIABLE_NUMBER is 999. A chain walk with >999 unique preconfirmed txids in a single iteration will fail. Postgres is fine (ANY takes a single array parameter). Recommend chunking in the caller or in the SQLite repo implementation — e.g., batches of 500.
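
The recommended chunking can be done in a few lines; this sketch assumes the 500-per-batch cap the review suggests (chunkTxids is an illustrative helper, not the repo's actual code):

```go
package main

import "fmt"

// sqliteMaxBulkTxids keeps each batch well under SQLite's default
// SQLITE_MAX_VARIABLE_NUMBER (999), since sqlc expands one ? per txid.
const sqliteMaxBulkTxids = 500

// chunkTxids splits a txid slice into batches of at most size elements.
func chunkTxids(txids []string, size int) [][]string {
	var batches [][]string
	for len(txids) > size {
		batches = append(batches, txids[:size])
		txids = txids[size:]
	}
	if len(txids) > 0 {
		batches = append(batches, txids)
	}
	return batches
}

func main() {
	txids := make([]string, 1200)
	for i := range txids {
		txids[i] = fmt.Sprintf("tx%04d", i)
	}
	batches := chunkTxids(txids, sqliteMaxBulkTxids)
	fmt.Println(len(batches), len(batches[0]), len(batches[2])) // 3 500 200
}
```

The SQLite repo would then run one SLICE-expanded query per batch and merge the results; Postgres needs no chunking because ANY($1) binds a single array parameter.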

2. Row-to-domain mapping duplication — still open (LOW)
postgres/offchain_tx_repo.go:GetOffchainTxsByTxids (lines ~117-175) and sqlite/offchain_tx_repo.go:GetOffchainTxsByTxids (lines ~117-175) both copy-paste the same grouping/mapping logic from their respective GetOffchainTx. A divergence in one backend would be a subtle correctness bug. Extracting a shared rowsToDomainOffchainTx helper would prevent this.

3. Badger fragile string-matching — still open (LOW)
badger/ark_repo.go uses strings.Contains(err.Error(), "not found") to skip missing entries. Not introduced by this commit but still a concern for the feature as a whole.


🔍 New findings in this commit

4. Postgres nullable field handling differs from SQLite — verify correctness (MEDIUM)

In postgres/offchain_tx_repo.go:GetOffchainTxsByTxids:

if vw.CheckpointTxid.Valid && vw.CheckpointTx.Valid {
    checkpointTxs[vw.CheckpointTxid.String] = vw.CheckpointTx.String
    commitmentTxids[vw.CheckpointTxid.String] = vw.CommitmentTxid.String
    if vw.IsRootCommitmentTxid.Valid && vw.IsRootCommitmentTxid.Bool {

In sqlite/offchain_tx_repo.go:GetOffchainTxsByTxids:

if vw.CheckpointTxid != "" && vw.CheckpointTx != "" {
    checkpointTxs[vw.CheckpointTxid] = vw.CheckpointTx
    commitmentTxids[vw.CheckpointTxid] = vw.CommitmentTxid.String
    if vw.IsRootCommitmentTxid.Bool {

This mirrors the existing GetOffchainTx patterns (Postgres uses sql.NullString, SQLite uses COALESCE to empty strings in the view), so it's consistent — but it reinforces concern #2 about duplication. The different null-handling logic between backends is a maintenance hazard.

5. Test coverage gap — no multi-txid bulk fetch test (LOW)
service_test.go tests bulk fetch with a single txid ([]string{arkTxid}). There's no test that fetches multiple distinct txids in a single call and verifies all are returned correctly. This would catch bugs in the grouping logic.


Summary

The new commit is sound — the index is correct and well-placed. The Postgres and SQLite repo implementations are consistent with their existing single-fetch counterparts.

The SQLite batch-size concern from my previous review remains the only MEDIUM risk. Everything else is LOW/cleanup.

⚠️ PROTOCOL-CRITICAL: This PR touches VTXO chain walking. Human review and approval required before merge, regardless of code quality.

@bitcoin-coder-bob
Collaborator Author

After the last commit I got this test output:

    vtxo_chain_test.go:130: chain built: 5000 hops in 12m43.526453458s
    vtxo_chain_test.go:149: GetVtxoChain: 10002 entries in 4.044252394s 

Contributor

@arkanaai arkanaai Bot left a comment

Incremental Review — commit 2206d890 (timing instrumentation + e2e test)

Previous reviews: initial (2026-04-08), incremental on 01d06d93. This covers only the new commit.


Timing instrumentation (indexer.go)

1. log.Info on every call is too noisy for production (LOW)
walkVtxoChain is called on every GetVtxoChain API request. The timing breakdown logs at Info level unconditionally (indexer.go:789-800). This will spam production logs. Should be Debug level, or gated behind a flag. This is profiling scaffolding — either promote it to a proper metrics emission (histogram/gauge) or downgrade to Debug before merge.

2. Early-return pagination path skips the timing log (LOW)
The pagination early-return at indexer.go:658 (return chain, allOutpoints, token, nil) bypasses the timing log at line 789. This means you only get timing data when the chain is fully exhausted (no pagination). For the 5000-hop benchmark that's fine, but in production with pagination enabled, you'll never see these timings. If the intent is production observability, the log should fire on both paths.

3. tOther can go negative (NIT)
tOther := walkTotal - tEnsureVtxos - tBulkOffchain - tSingleOffchain - tPsbtDeser - tVtxoTree. time.Since calls are not monotonically additive with a single walkStart. GC pauses, goroutine scheduling, and the non-timed work between measurements mean tOther could theoretically be negative. Unlikely to matter in practice, but worth a comment.


E2e test (vtxo_chain_test.go)

4. Private key logged in cleartext (MEDIUM)
vtxo_chain_test.go:66: t.Logf("wallet seed: %s", seed) logs the raw private key hex. Even in test output, CI logs can be persisted and shared. For a regtest test this is low-risk, but it's bad hygiene — especially in a protocol-critical codebase. Log a truncated fingerprint instead, or remove entirely (the seed can be passed via -seed flag for reproducibility).

5. Tight spin loop with no backoff (MEDIUM)
vtxo_chain_test.go:112-115:

for len(spendable) == 0 {
    spendable, _, err = client.ListVtxos(ctx)
    require.NoError(t, err)
}

This is a busy-wait loop hammering the server with zero delay. On a slow CI or loaded server, this burns CPU and floods the server with RPCs. Add a time.Sleep(100 * time.Millisecond) or use a polling helper with exponential backoff.

6. notifyErr goroutine error handling is correct but fragile (LOW)
The pattern of var notifyErr error + wg.Go(func() { ..., notifyErr = ... }) + wg.Wait() + require.NoError(t, notifyErr) works because WaitGroup.Wait() provides happens-before. But if someone later adds a second goroutine writing to notifyErr, it becomes a data race. Consider using errgroup.Group instead, which is the idiomatic Go pattern for this.

7. time.Sleep for synchronization (LOW)
vtxo_chain_test.go:100 (time.Sleep(time.Second)) and line 133 (time.Sleep(2 * time.Second)) are timing-based synchronization. Flaky under load. Acceptable for a manual perf test, but if this is meant to run in CI, it'll eventually flake.


Previously raised concerns — status

| # | Concern | Status |
|---|---|---|
| 1 | SQLite batch-size limit (MEDIUM) | Still open |
| 2 | Row-to-domain mapping duplication (LOW) | Still open |
| 3 | Badger fragile string-matching (LOW) | Still open |

Summary

This commit is profiling/test scaffolding — no protocol logic changes. The timing instrumentation and e2e test are useful for development but need cleanup before merge:

  • Downgrade timing log to Debug or convert to proper metrics
  • Remove private key logging from the test
  • Add backoff to the polling loop

No new protocol-correctness concerns. Previous MEDIUM (SQLite batch-size) remains open.

⚠️ PROTOCOL-CRITICAL: Human review and approval still required before merge.

Contributor

@arkanaai arkanaai Bot left a comment

Incremental Review — commits f3abfe23 (SQLite chunking + multi-txid test) & 1cf3a4ce (lint)

Reviewed the two new commits since my last review of 2206d890.

Changes

  1. SQLite batch chunking (sqlite/offchain_tx_repo.go): GetOffchainTxsByTxids now chunks txid slices into batches of 500 (sqliteMaxBulkTxids) to stay under SQLITE_MAX_VARIABLE_NUMBER (default 999). Correct fix — the sqlc SLICE expansion emits one ? per txid, so unbounded input would blow the parameter limit.

  2. Multi-txid integration test (service_test.go): Inserts two distinct offchain txs, fetches both + a missing txid in one call, and asserts no cross-txid checkpoint contamination. Good coverage of the row-grouping logic.

  3. Lint (indexer.go): Field alignment in the timing log. No-op.

Verdict

All three look correct. Specific notes:

  • sqliteMaxBulkTxids = 500: Sensible headroom under the 999 limit. The comment correctly explains why. ✅
  • Postgres impl (postgres/offchain_tx_repo.go): No chunking — PG's default param limit is 65535, so this is fine for realistic workloads. No action needed now.
  • Cross-contamination assertions in the test (require.NotContains for checkpoint txids across txs) are exactly the right thing to test given the grouped-row assembly. ✅
  • No protocol-critical changes in this increment — it's purely infra plumbing and test coverage.

No new issues. Previous review findings still apply (see earlier comments re: %s vs %w for error wrapping).

🤖 arkana · protocol-critical review

@bitcoin-coder-bob
Collaborator Author

 vtxo_chain_test.go:149: GetVtxoChain: 10002 entries in 445.412779ms

Contributor

@arkanaai arkanaai Bot left a comment

Incremental Review — commits 573effa0, 91c7a054, 6ab1ed61

Previous reviews: initial (2026-04-08), 01d06d93 (index + repo), 2206d890 (timing + e2e), f3abfe23/1cf3a4ce (SQLite chunking + lint). This covers the three new commits.


What changed

  1. 573effa0 — preload offchain txs via marker DAG + nbxplorer bump

    • preloadVtxosByMarkers → preloadByMarkers: now also bulk-fetches offchain txs during the marker DAG walk, so the main loop's in-loop bulk fetch becomes a fallback rather than the primary path. This is the key optimization that dropped the 5000-hop benchmark from ~4s to ~445ms.
    • nbxplorer image bumped from 2.5.30 to 2.5.30-1 (regtest only).
  2. 91c7a054 — nil-guard on offchainTxRepo

    • Adds && offchainTxRepo != nil guard in preloadByMarkers so test helpers that don't wire up the offchain-tx repo don't panic. Comment explains the fallback path.
  3. 6ab1ed61 — move timing instrumentation out of prod into test

    • Removes all time.Now()/time.Since() timing accumulators and the log.Info breakdown from walkVtxoChain (production code).
    • Adds TestVtxoChainTimingBreakdown with timing-decorated repo wrappers (phaseTimings, timingVtxoRepo, timingMarkerRepo, timingOffchainTxRepo) in the bench test file.
    • This directly addresses my previous review feedback (timing logs too noisy for production). ✅

✅ What's good

  • Timing removed from prod code. All profiling scaffolding is now test-only. Previous concern resolved. ✅
  • preloadByMarkers piggyback is correct. Deduplicates txids via seen map, skips already-cached entries, only calls GetOffchainTxsByTxids when there are actual misses. The in-loop bulk fetch + individual fallback remain as safety nets. ✅
  • Nil-guard is the right fix. Comment explains why the guard exists and what happens when it fires (walk loop fallback). ✅
  • Test timing wrappers are well-structured. phaseTimings with mutex is correct for concurrent use. timingVtxoRepo/timingMarkerRepo/timingOffchainTxRepo delegate cleanly. wrappedRepoManager is minimal and explicit about which repos are wired. ✅
  • nbxplorer bump is regtest-only. No production impact. ✅

⚠️ Notes (non-blocking)

1. SQLite batch-size limit in preloadByMarkers — inherited from GetOffchainTxsByTxids (MEDIUM, previously raised)
The new piggyback path in preloadByMarkers calls GetOffchainTxsByTxids(ctx, missingTxids) without its own chunking. This is fine because the SQLite implementation already chunks internally (commit f3abfe23). Just confirming this is covered — no action needed. ✅

2. time.Sleep in timing wrappers simulates latency but doesn't measure real DB overhead (NIT)
timingVtxoRepo et al. add simulatedLatency via time.Sleep which inflates wall clock but doesn't reflect real query cost. This is fine for a relative phase breakdown, but the absolute numbers are meaningless. The test name (TimingBreakdown) could mislead. Consider adding a comment noting the simulated nature, or renaming to TestVtxoChainPhaseProportions.

3. wrappedRepoManager returns nil for unused repos (LOW)
Events(), Rounds(), ScheduledSession(), Convictions(), Assets(), Fees() all return nil. If any code path in walkVtxoChain or GetVtxoChain ever touches these, it'll panic with a nil pointer instead of a descriptive error. Acceptable for a focused test, but a panic("not wired") would be safer.


Previously raised concerns — final status

| # | Concern | Status |
|---|---------|--------|
| 1 | SQLite batch-size limit | ✅ Fixed in f3abfe23 |
| 2 | Row-to-domain mapping duplication | Still open (LOW) |
| 3 | Badger fragile string-matching | Still open (LOW) |
| 4 | Timing logs too noisy for prod | ✅ Fixed in 6ab1ed61 |
| 5 | Private key logged in e2e test | Still open (LOW, regtest only) |
| 6 | Tight spin loop in e2e test | Still open (LOW) |

Summary

These three commits are clean. The main optimization (offchain tx preloading via marker DAG) is correct and well-tested. The timing instrumentation has been properly moved out of production code into test infrastructure, addressing the key concern from my last review.

No new issues found. All previous MEDIUM concerns have been addressed. Remaining items are LOW-priority cleanup.

Performance results speak for themselves: 5000-hop chain walk went from ~30.8s (master) → ~11.2s (initial PR) → ~4.0s (with index) → ~445ms (with marker-DAG preloading). Excellent work.

⚠️ PROTOCOL-CRITICAL: This PR modifies VTXO chain walking logic. Human review and approval required before merge, regardless of code quality.

🤖 arkana · protocol-critical review

@bitcoin-coder-bob bitcoin-coder-bob marked this pull request as ready for review April 14, 2026 19:44

@arkanaai arkanaai Bot left a comment


Incremental Review — commit 3a04208e

Reviewed the new commit ("clarify simulated latency in timing breakdown and panic on unwired repo accessors"). This is test-only — no production code changed.

Changes

  1. Simulated latency comment — adds a doc comment to TestVtxoChainTimingBreakdown clarifying that absolute numbers don't reflect real DB cost, only relative phase proportions. Directly addresses my previous note #2. ✅

  2. panic("not wired") on unwired repos — wrappedRepoManager accessors for Events, Rounds, ScheduledSession, Convictions, Assets, Fees now panic with a descriptive message instead of returning nil. Directly addresses my previous note #3. ✅

Verdict

Both changes are correct and implement exactly what was suggested. No new issues introduced.

All previously raised concerns have been addressed across the commit history:

  • ✅ SQLite batch chunking (500 cap) — f3abfe23
  • ✅ Timing instrumentation removed from prod — 6ab1ed61
  • ✅ Simulated latency documented — this commit
  • ✅ Nil-guard panics — this commit

Approving the code quality and implementation. Human review still required — this PR touches VTXO chain walking which is protocol-critical.

🤖 arkana · protocol-critical review


@bitcoin-coder-bob bitcoin-coder-bob merged commit 69879d0 into bob/dag-1 Apr 15, 2026
1 check passed
@arkanaai arkanaai Bot mentioned this pull request Apr 16, 2026