
Scale the DAG #908

Open

bitcoin-coder-bob wants to merge 61 commits into master from bob/dag-1

Conversation


@bitcoin-coder-bob bitcoin-coder-bob commented Feb 10, 2026

Closes #833

Summary

Introduces a marker DAG — sparse checkpoints placed every 100 depths in the VTXO chain — that enables bulk DB operations instead of per-VTXO queries. Combined with bulk offchain tx fetching (merged from #1005), GetVtxoChain on a 10k-deep linear chain drops from thousands of sequential DB round-trips to ~200 bulk calls.

What changed

  1. Marker system — new marker and swept_marker tables, plus depth and markers JSONB columns on vtxo, with GIN indexes. A marker is created at every MarkerInterval = 100 depth boundary, and each VTXO carries the IDs of the markers covering it (see the boundary sketch after this list).
  2. Bulk VTXO preload via the marker DAG — preloadByMarkers() walks the marker DAG upward from the frontier, bulk-loading all VTXO records into vtxoCache in ~N/100 calls instead of N individual fetches.
  3. Bulk offchain tx preload (from #1005, "bulk-fetch offchain txs in walkVtxoChain to reduce DB round-trips") — piggybacked onto the marker DAG walk, GetOffchainTxsByTxids bulk-fetches offchain txs per marker window into offchainTxCache, so the BFS loop never hits the DB for individual offchain tx lookups.
  4. In-loop bulk offchain tx fallback (also from #1005) — for VTXOs outside a marker window (race conditions, legacy VTXOs without markers), each BFS iteration collects missing txids and does one bulk fetch via GetOffchainTxsByTxids instead of one query per txid.
  5. Swept markers replace the per-VTXO swept column — the swept boolean column is removed from vtxo; swept status is now computed dynamically via EXISTS (SELECT 1 FROM swept_marker sm WHERE v.markers @> jsonb_build_array(sm.marker_id)). BulkSweepMarkers sweeps all VTXOs under a marker in one operation.
  6. Cursor-based pagination for GetVtxoChain — pageToken/nextPageToken using base64-encoded frontier outpoints.
  7. depth field on VTXOs — exposed via protobuf so clients can track chain position.
  8. New DB index idx_checkpoint_tx_offchain_txid on checkpoint_tx(offchain_txid) to accelerate the bulk offchain tx query's join.
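
Item 1's boundary rule is simple enough to sketch. A minimal Go illustration, assuming the IsAtMarkerBoundary helper exercised by the domain tests below; markersForChild and its signature are hypothetical, distilled from the description above:

package domain

// MarkerInterval is the marker spacing stated above: one marker every 100 depths.
const MarkerInterval = 100

// IsAtMarkerBoundary reports whether a VTXO at this depth starts a new
// marker window. (The PR's domain tests exercise this boundary check.)
func IsAtMarkerBoundary(depth int64) bool {
	return depth%MarkerInterval == 0
}

// markersForChild is a hypothetical helper showing the inheritance rule:
// at a boundary the child gets the freshly minted marker; inside a window
// it inherits its parents' marker IDs, deduplicated.
func markersForChild(childDepth int64, parentMarkerIDs []string, newMarkerID string) []string {
	if IsAtMarkerBoundary(childDepth) {
		return []string{newMarkerID}
	}
	seen := make(map[string]struct{}, len(parentMarkerIDs))
	inherited := make([]string, 0, len(parentMarkerIDs))
	for _, id := range parentMarkerIDs {
		if _, ok := seen[id]; !ok {
			seen[id] = struct{}{}
			inherited = append(inherited, id)
		}
	}
	return inherited
}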

Notes

  • Existing VTXOs (pre-migration) get a backfill marker per-VTXO but won't benefit from the deep-chain marker DAG optimization since they lack proper depth/parent-marker linkage. New VTXOs created after migration inherit markers from application code.
  • Preload ignores pageSize — peak memory scales with full chain length. A process-level LRU cache keyed by marker ID could collapse duplicate work across pagination calls but is out of scope.
  • idx_checkpoint_tx_offchain_txid adds a small write-path cost per checkpoint tx insert.

Visual explainer

Background: the VTXO tree and walkVtxoChain

A VTXO chain is a tree of transactions. walkVtxoChain does a BFS — it processes all VTXOs at one depth, discovers their children via checkpoint tx inputs, then moves to the next depth.

Each VTXO is either on-chain (leaf of a batch tree) or preconfirmed (created by an offchain Ark tx). Preconfirmed VTXOs have a corresponding OffchainTx record containing checkpoint PSBTs whose inputs point to parent VTXOs — that's how the traversal discovers the next level.
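
A minimal sketch of that BFS shape, with stand-in types (the real walkVtxoChain also consults caches and builds the response; the childrenOf callback abstracts the checkpoint-input lookup):

// Outpoint is a stand-in for the domain outpoint type.
type Outpoint struct {
	Txid string
	VOut uint32
}

// walkChainBFS visits the tree level by level: process one depth, discover
// children via the callback (checkpoint tx inputs in the real code), then
// move to the next depth. Visited tracking makes diamond patterns safe.
func walkChainBFS(frontier []Outpoint, childrenOf func(Outpoint) []Outpoint) []Outpoint {
	visited := make(map[Outpoint]bool)
	order := make([]Outpoint, 0)
	queue := frontier
	for len(queue) > 0 {
		var nextLevel []Outpoint
		for _, op := range queue {
			if visited[op] {
				continue // deduplicate convergent paths
			}
			visited[op] = true
			order = append(order, op)
			nextLevel = append(nextLevel, childrenOf(op)...)
		}
		queue = nextLevel
	}
	return order
}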

graph TD
    subgraph "VTXO tree walkVtxoChain traverses"
        R["Round Commitment Tx (on-chain root)"]
        R --> A["VTXO A (depth 0)"]
        R --> B["VTXO B (depth 0)"]
        A --> C["VTXO C (preconfirmed, depth 1)"]
        A --> D["VTXO D (preconfirmed, depth 1)"]
        B --> E["VTXO E (preconfirmed, depth 1)"]
        C --> F["VTXO F (preconfirmed, depth 2)"]
        C --> G["VTXO G (preconfirmed, depth 2)"]
        D --> H["VTXO H (preconfirmed, depth 2)"]
        E --> I["VTXO I (preconfirmed, depth 2)"]
        E --> J["VTXO J (preconfirmed, depth 2)"]
    end

Marker DAG: compressed traversal structure

Instead of visiting every VTXO individually, the marker DAG creates checkpoints every 100 depths. preloadByMarkers walks this small DAG to bulk-load all VTXOs and their offchain txs before the BFS loop starts.
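
A sketch of that walk, assuming loadWindow as a stand-in for the bulk repo calls named in the diagrams below (GetVtxoChainByMarkers, GetOffchainTxsByTxids, GetMarkersByIds), a minimal Vtxo stand-in, and the Outpoint type from the BFS sketch above:

// Vtxo is a stand-in carrying only what the preload needs.
type Vtxo struct {
	Outpoint Outpoint
	Txid     string
}

// window bundles one marker window's bulk results: its VTXOs and the
// parent marker IDs to walk next.
type window struct {
	Vtxos     []Vtxo
	ParentIDs []string
}

// preloadByMarkers walks the marker DAG upward from the frontier's marker
// IDs. Each iteration covers ~one marker window, so a chain of N VTXOs
// costs roughly N/MarkerInterval bulk calls instead of N single fetches.
func preloadByMarkers(start []string, loadWindow func([]string) window, vtxoCache map[Outpoint]Vtxo) {
	visited := make(map[string]bool)
	queue := append([]string(nil), start...)
	for len(queue) > 0 {
		var batch []string
		for _, id := range queue {
			if !visited[id] {
				visited[id] = true
				batch = append(batch, id)
			}
		}
		if len(batch) == 0 {
			return
		}
		w := loadWindow(batch) // bulk: VTXOs + offchain txs + parent markers
		for _, v := range w.Vtxos {
			vtxoCache[v.Outpoint] = v
		}
		queue = w.ParentIDs // continue upward through the marker DAG
	}
}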

graph TD
    subgraph "Marker DAG (small, ~N/100 nodes)"
        M0["Marker M0 (depths 0-99)"]
        M1["Marker M1 (depths 100-199)"]
        M2["Marker M2 (depths 200-299)"]
        M0 --> M1 --> M2
    end

    subgraph "VTXO tree (large, N nodes)"
        V0["~100 VTXOs (depths 0-99)"]
        V1["~100 VTXOs (depths 100-199)"]
        V2["~100 VTXOs (depths 200-299)"]
    end

    M0 -.->|covers| V0
    M1 -.->|covers| V1
    M2 -.->|covers| V2

How preloadByMarkers works

The marker DAG walk bulk-fetches both VTXO records and offchain txs per marker window, so the BFS loop runs entirely from cache:

sequenceDiagram
    participant W as walkVtxoChain
    participant MDAG as Marker DAG walk
    participant DB as Database

    Note over W,MDAG: Before BFS loop starts
    W->>MDAG: preloadByMarkers(startVtxos)

    loop For each marker window (N/100 iterations)
        MDAG->>DB: GetVtxoChainByMarkers(window)
        DB-->>MDAG: ~100 VTXO records
        MDAG->>DB: GetOffchainTxsByTxids(preconfirmed txids)
        DB-->>MDAG: ~100 OffchainTx records
        MDAG->>DB: GetMarkersByIds(window)
        DB-->>MDAG: parent marker IDs
    end

    MDAG-->>W: vtxoCache + offchainTxCache fully populated

    Note over W,DB: BFS loop runs entirely from cache
    loop N iterations
        W->>W: vtxoCache lookup (hit)
        W->>W: offchainTxCache lookup (hit)
    end

In-loop bulk fetch fallback (for VTXOs without markers)

For wide fanout trees or VTXOs without marker coverage, the BFS loop itself collects missing offchain txids per iteration and batch-fetches them:
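
A sketch of that fallback, assuming bulkFetch as a stand-in for GetOffchainTxsByTxids and a placeholder OffchainTx type:

// OffchainTx is a placeholder for the domain offchain tx record.
type OffchainTx struct{ Txid string }

// fillMissingOffchainTxs runs once per BFS level: given the preconfirmed
// txids discovered at this level, collect those the cache doesn't already
// hold and fetch them in a single bulk round-trip.
func fillMissingOffchainTxs(levelTxids []string, cache map[string]OffchainTx, bulkFetch func([]string) map[string]OffchainTx) {
	var missing []string
	for _, txid := range levelTxids {
		if _, ok := cache[txid]; !ok {
			missing = append(missing, txid)
		}
	}
	if len(missing) == 0 {
		return // marker preload already covered this level
	}
	for txid, tx := range bulkFetch(missing) { // one query instead of len(missing)
		cache[txid] = tx
	}
}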

graph TD
    subgraph "BFS iteration: 3 preconfirmed VTXOs at depth 1"
        C["VTXO C - txid: abc1"]
        D["VTXO D - txid: abc2"]
        E["VTXO E - txid: abc3"]
    end

    subgraph "Old: 3 individual queries"
        Q1["SELECT ... WHERE txid = 'abc1'"]
        Q2["SELECT ... WHERE txid = 'abc2'"]
        Q3["SELECT ... WHERE txid = 'abc3'"]
    end

    subgraph "New: 1 bulk query"
        QB["SELECT ... WHERE txid IN ('abc1','abc2','abc3')"]
    end

    C --> Q1
    D --> Q2
    E --> Q3
    C --> QB
    D --> QB
    E --> QB

Combined effect (10k-entry linear chain)

graph TD
    subgraph "Before (no markers, no bulk fetch)"
        BA["VTXO fetches: ~10000 individual queries"]
        BB["OffchainTx fetches: ~10000 individual queries"]
        BC["Total: ~20000 round-trips"]
        BA --> BC
        BB --> BC
    end

    subgraph "After markers only (no offchain tx preload)"
        MA["VTXO fetches: ~100 bulk queries (markers)"]
        MB["OffchainTx fetches: ~10000 individual queries"]
        MC["Total: ~10100 round-trips"]
        MA --> MC
        MB --> MC
    end

    subgraph "After markers + offchain tx preload"
        FA["VTXO fetches: ~100 bulk queries (markers)"]
        FB["OffchainTx fetches: ~100 bulk queries (markers)"]
        FC["Total: ~200 round-trips"]
        FA --> FC
        FB --> FC
    end

Benchmarks

Per-phase timing breakdown (10k-entry linear chain, 50µs simulated latency)

TestVtxoChainTimingBreakdown — in-process test with timing-decorated repos:

GetVtxoChain timing breakdown: linear chain n=10000, simulated repo latency=50µs
  wall clock (GetVtxoChain)        349ms
  Markers.GetMarkersByIds           92ms  (100 calls)
  Markers.GetVtxoChainByMarkers     57ms  (100 calls)
  OffchainTxs.GetOffchainTxsByTxids  53ms  (100 calls)
  Vtxos.GetVtxos                     1ms  (1 calls)
  sum of repo phases               203ms
  other (psbt parse + overhead)    145ms

All three repo phases (markers, VTXOs, offchain txs) make exactly 100 calls each — one per marker window. The remaining 145ms is PSBT deserialization and in-process overhead.

Call-counting test: 511-VTXO fanout tree

TestBulkOffchainTxReducesDBCalls — exercises the in-loop bulk fetch in isolation:

|                            | Bulk calls | Individual calls | Total round-trips |
| -------------------------- | ---------- | ---------------- | ----------------- |
| With bulk prefetch         | 9          | 0                | 9                 |
| Without (pre-optimization) | 0          | 511              | 511               |

~57x reduction in DB round-trips. The 9 bulk calls correspond to 9 BFS depth levels.

Latency benchmark: 511-VTXO fanout tree

BenchmarkOffchainTxBulkVsSingle — 50µs simulated DB latency:

|                    | Time/op |
| ------------------ | ------- |
| Bulk prefetch      | 14.1ms  |
| No bulk (fallback) | 390ms   |
| Speedup            | ~28x    |

Test coverage

  • TestVtxoChainTimingBreakdown — per-phase wall-clock breakdown for a 10k linear chain with timing-decorated repos. No prod instrumentation needed.
  • TestBulkOffchainTxReducesDBCalls — asserts bulk fetch eliminates individual GetOffchainTx calls on a fanout tree.
  • BenchmarkOffchainTxBulkVsSingle — wall-clock comparison with simulated latency.
  • BenchmarkGetVtxoChain — linear chains (1k-5k), fanout trees, and diamond patterns with/without markers.
  • Repo-level tests — multi-txid bulk fetch, marker CRUD, sweep operations, across Postgres, SQLite, and Badger.
  • Domain tests — marker boundary detection, inheritance, deep chains (20k+), multiple parents, deduplication.
  • Pagination e2e — cursor-based pagination with early termination (a sketch of the token format follows this list).
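
The PR body only states that page tokens are base64-encoded frontier outpoints, and the walkthrough below mentions HMAC-signed cursor tokens and the encodeChainCursor/decodeChainCursor names. A Go sketch under those assumptions (the JSON wire format, field names, and key handling here are hypothetical):

package indexer

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
)

type chainCursor struct {
	Frontier []string `json:"frontier"` // outpoints as "txid:vout" strings
	Sig      []byte   `json:"sig"`      // HMAC-SHA256 over the frontier
}

func sign(frontier []string, key []byte) []byte {
	mac := hmac.New(sha256.New, key)
	for _, op := range frontier {
		mac.Write([]byte(op))
	}
	return mac.Sum(nil)
}

// encodeChainCursor packs the frontier outpoints into an opaque page token.
func encodeChainCursor(frontier []string, key []byte) (string, error) {
	buf, err := json.Marshal(chainCursor{Frontier: frontier, Sig: sign(frontier, key)})
	if err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(buf), nil
}

// decodeChainCursor validates the signature and recovers the frontier.
func decodeChainCursor(token string, key []byte) ([]string, error) {
	buf, err := base64.URLEncoding.DecodeString(token)
	if err != nil {
		return nil, err
	}
	var c chainCursor
	if err := json.Unmarshal(buf, &c); err != nil {
		return nil, err
	}
	if !hmac.Equal(c.Sig, sign(c.Frontier, key)) {
		return nil, errors.New("invalid page token")
	}
	return c.Frontier, nil
}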

DB schema changes

New tables

  • marker(id TEXT PK, depth INTEGER, parent_markers JSONB) — DAG checkpoints every 100 depths
  • swept_marker(marker_id TEXT PK FK→marker, swept_at BIGINT) — append-only sweep records

Modified tables

  • vtxo — added depth INTEGER NOT NULL DEFAULT 0, markers JSONB NOT NULL DEFAULT '[]' (GIN indexed); removed swept column

New indexes

  • idx_vtxo_markers — GIN index on vtxo.markers
  • idx_marker_depth — B-tree on marker.depth
  • idx_marker_parent_markers — GIN on marker.parent_markers
  • idx_checkpoint_tx_offchain_txid — B-tree on checkpoint_tx.offchain_txid

View changes

  • vtxo_vw rebuilt to compute swept dynamically: EXISTS (SELECT 1 FROM swept_marker sm WHERE v.markers @> jsonb_build_array(sm.marker_id))

Migration backfill

  • Creates one marker per existing VTXO (using outpoint as ID)
  • Assigns that marker to each VTXO's markers column
  • Migrates swept VTXOs to swept_marker table
  • Drops swept column from vtxo

Additional Notes:
docker-compose.regtest.yml was updated to bump the nbxplorer image tag.

Summary by CodeRabbit

  • New Features

    • Token-based pagination for VTXO chain queries (pageToken / nextPageToken).
    • VTXO depth and marker metadata added to responses.
  • Improvements

    • Marker-based caching and prefetching to speed chain traversal.
    • Swept state tracking moved to marker/swept records and per-outpoint sweep support.
    • Batch offchain transaction fetch to reduce lookup overhead.
  • Database

    • Migrations to add depth, markers, and swept tracking.


coderabbitai Bot commented Feb 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds marker-based DAG scaling and cursored pagination to VTXO chain traversal: VTXOs gain depth and marker metadata; marker and swept_marker stores enable bulk sweeping; GetVtxoChain gains token-based pagination and a frontier cursor; traversal rewritten to walk marker skeletons with caching and bulk offchain fetches.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| API Schemas: api-spec/openapi/.../indexer.openapi.json, api-spec/openapi/.../service.openapi.json, api-spec/openapi/.../types.openapi.json, api-spec/protobuf/.../indexer.proto, api-spec/protobuf/.../types.proto | Added pageToken/nextPageToken for GetVtxoChain; added depth to Vtxo/IndexerVtxo schemas and protobuf messages. |
| Indexer service & pagination: internal/core/application/indexer.go, internal/core/application/types.go, internal/core/application/indexer_test.go, internal/core/application/indexer_bench_test.go | Replaced full-chain builder with cursored walkVtxoChain; added encode/decode HMAC-signed cursor tokens, per-page VTXO/offchain caches, marker prefetching, and related tests/benchmarks. |
| Domain model & auth: internal/core/domain/vtxo.go, internal/core/domain/marker.go, internal/core/domain/marker_repo.go, internal/core/domain/offchain_tx.go, internal/core/domain/offchain_tx_event.go, internal/core/domain/offchain_tx_repo.go | Added Depth and MarkerIDs to Vtxo; new Marker and SweptMarker types; MarkerRepository interface; OffchainTx carries depth/parent marker IDs; added batch offchain fetch API. |
| Repository manager & wiring: internal/core/ports/repo_manager.go, internal/infrastructure/db/service.go, internal/config/config.go | Added Markers() accessor to RepoManager; service now constructs/wires marker repositories and optionally offchainTxCache; service exposes Markers(). |
| DB implementations (Postgres/SQLite/Badger): internal/infrastructure/db/postgres/..., internal/infrastructure/db/sqlite/..., internal/infrastructure/db/badger/... | Implemented MarkerRepository for Postgres/SQLite/Badger; added VTXO depth/markers persistence; removed per-vtxo SweepVtxos; added batch swept_marker and swept_vtxo handling; implemented GetOffchainTxsByTxids across backends. |
| SQLC + queries + migrations: internal/infrastructure/db/postgres/sqlc/*, internal/infrastructure/db/sqlite/sqlc/*, internal/infrastructure/db/*/migration/* | Updated generated models and queries to include depth and markers, added UpsertMarker/InsertSweptMarker/BulkInsertSweptVtxos, descendant traversal, marker-based VTXO queries, and migrations to backfill markers and swept state. |
| Sweeper & projection changes: internal/core/application/sweeper.go, internal/core/application/service.go, internal/core/application/sweeper_test.go | Replaced per-VTXO sweeping with marker-based BulkSweepMarkers/SweepVtxoOutpoints flow; updated projection retry logic; added tests for checkpoint sweep path and swept-outpoint semantics. |
| VTXO repos & mapping: internal/infrastructure/db/postgres/vtxo_repo.go, internal/infrastructure/db/sqlite/vtxo_repo.go, internal/infrastructure/db/badger/vtxo_repo.go | UpsertVtxo now stores depth and markers; Vtxo mapping parses markers and depth; removed SweepVtxos method; added store accessors. |
| GRPC handlers & proto mapping: internal/interface/grpc/handlers/indexer.go, internal/interface/grpc/handlers/parser.go, internal/interface/grpc/handlers/parser_test.go | Forwarded pageToken to service, return NextPageToken; include Depth in proto conversions and tests. |
| Token cache: internal/core/application/token_cache.go | Added touch() and isActive() to support keep-alive semantics for pagination cursors. |
| Utilities & round ingestion: internal/core/application/utils.go, internal/core/application/utils_test.go | Initialize new VTXOs with Depth=0 and MarkerIDs containing their outpoint; tests added for PSBT-derived VTXO creation. |
| Tests & E2E: many test files under internal/... and internal/test/e2e/vtxo_chain_test.go | Extensive unit, integration, benchmark, and e2e tests covering marker creation, traversal, pagination, caching, sweeper behavior, and DB backends. |
| Misc: docker-compose.regtest.yml | Updated nbxplorer image tag. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Indexer as Indexer Service
    participant Cache as Per-Page Cache
    participant VtxoRepo as Vtxo Repo
    participant MarkerRepo as Marker Repo
    participant OffchainRepo as Offchain Tx Repo

    Client->>Indexer: GetVtxoChain(outpoint, pageToken)
    Indexer->>Indexer: decodeChainCursor(pageToken) / derive frontier
    Indexer->>Cache: ensureVtxosCached(frontier)
    Cache->>VtxoRepo: GetVtxos(missing outpoints)
    VtxoRepo-->>Cache: return vtxos (with Depth, MarkerIDs)
    Indexer->>MarkerRepo: GetMarkersByIds(markerIDs)  -- preload
    MarkerRepo-->>Indexer: return marker parents
    Indexer->>VtxoRepo: GetVtxosByMarker(markerID) -- gap-fill
    VtxoRepo-->>Indexer: return related vtxos
    alt preconfirmed txs present
        Indexer->>OffchainRepo: GetOffchainTxsByTxids(txids)
        OffchainRepo-->>Indexer: return offchain txs
    end
    Indexer->>Indexer: build response, encodeChainCursor(remaining)
    Indexer-->>Client: Chain[], NextPageToken

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • altafan
  • louisinger
  • sekulicd

@bitcoin-coder-bob (Collaborator, Author)

@coderabbitai review


coderabbitai Bot commented Feb 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@internal/core/application/indexer.go`:
- Around line 408-413: The loop that walks parent markers (using
marker.ParentMarkerIDs and i.repoManager.Markers().GetMarker) can infinite-loop
on cyclic parent chains; add a visited set (map[string]struct{}) keyed by marker
ID and check it before appending IDs or calling GetMarker to break cycles, and
also deduplicate ParentMarkerIDs when appending to markerIDs so you don't re-add
the same ID; update the loop to mark the current marker ID as visited, skip any
parent IDs already visited, and stop traversal if the next parent is seen.

In `@internal/infrastructure/db/badger/marker_repo.go`:
- Around line 501-514: GetVtxoChainByMarkers currently does a full table scan
via r.vtxoStore.Find(&dtos, &badgerhold.Query{}) and filters in-memory; change
it to query by marker IDs to avoid loading all vtxos: iterate markerIDs (or
batch them) and call r.vtxoStore.Find with badgerhold.Where("MarkerID").Eq(id)
for each id (or badgerhold.Where("MarkerID").In(batch) if supported), collect
matched vtxoDTOs, convert dto.Vtxo into the vtxos slice and return; ensure you
still respect markerIDSet and handle errors per-query.

In
`@internal/infrastructure/db/postgres/migration/20260210100000_add_vtxo_depth.up.sql`:
- Around line 8-13: The Postgres view vtxo_vw currently returns NULL for the
commitments column when no rows exist because it uses
string_agg(vc.commitment_txid, ','); change the SELECT to wrap string_agg with
COALESCE (e.g., COALESCE(string_agg(...), '')) so commitments always yields an
empty string like the SQLite view; update the SELECT that references vtxo,
vtxo_commitment_txid and the commitments alias to use
COALESCE(string_agg(vc.commitment_txid, ','), '').

In `@internal/infrastructure/db/service.go`:
- Around line 752-821: sweepVtxosWithMarkers currently marks a marker swept
before guaranteeing the marker's VTXOs were successfully removed, risking
inconsistent state; change the ordering so you attempt to sweep the marker's
VTXOs first (use markerStore.SweepVtxosByMarker and fall back to
vtxoStore.SweepVtxos for markerVtxos[markerID]) and only if that sweep returns
success call markerStore.SweepMarker(markerID, sweptAt); on any sweep error keep
the marker unmarked, log the failure, and accumulate the fallback count as now
done — update the loop in sweepVtxosWithMarkers to perform
SweepVtxosByMarker/SweepVtxos before calling SweepMarker and adjust error
handling accordingly.
🧹 Nitpick comments (12)
internal/test/e2e/utils_test.go (1)

742-744: Acknowledge the TODO placeholder.

The TODO is clear about the dependency on the SDK proto package exposing Depth. Consider tracking this with a GitHub issue so it doesn't get lost.

Would you like me to open a GitHub issue to track re-enabling setupRawIndexerClient and getVtxoDepthByOutpoint once the SDK proto exposes Depth?

internal/core/domain/marker_test.go (2)

9-35: Good coverage of boundary cases.

The table-driven test covers a solid range including edges (0, 99, 100, 101). Consider using t.Run with a subtest name for each case to get more granular test output on failure.

♻️ Optional: use subtests for better diagnostics
 	for _, tt := range tests {
+		t.Run(fmt.Sprintf("depth_%d", tt.depth), func(t *testing.T) {
 		result := IsAtMarkerBoundary(tt.depth)
 		require.Equal(t, tt.expected, result,
 			"IsAtMarkerBoundary(%d) should be %v", tt.depth, tt.expected)
+		})
 	}

(You'd need to add "fmt" to imports.)


42-55: These tests only verify struct literal construction, not behavior.

TestMarkerStruct and TestSweptMarkerStruct test that Go struct fields hold the values you assign — they don't test any domain logic. They're fine as documentation of the data model but provide no regression protection. Consider adding tests for actual marker operations (creation, parent resolution, etc.) as the marker logic matures.

internal/infrastructure/db/postgres/migration/20260211020000_add_markers.up.sql (1)

2-6: Consider adding NOT NULL DEFAULT '[]'::jsonb to parent_markers.

The column currently allows NULL, which means application code must handle both NULL and empty array. Using a NOT NULL default simplifies queries and Go code that deserializes this field.

♻️ Suggested change
 CREATE TABLE IF NOT EXISTS marker (
     id TEXT PRIMARY KEY,
     depth INTEGER NOT NULL,
-    parent_markers JSONB  -- JSON array of parent marker IDs
+    parent_markers JSONB NOT NULL DEFAULT '[]'::jsonb  -- JSON array of parent marker IDs
 );
internal/core/domain/marker_repo.go (2)

5-44: Large interface — consider whether it could be split, and watch for unbounded queries.

The interface has 16 methods mixing marker lifecycle, sweep operations, and VTXO queries. This is functional but may violate the Interface Segregation Principle as it grows.

More concretely, methods like GetVtxosByMarker, GetVtxosByDepthRange, and GetVtxoChainByMarkers (lines 30, 37, 41) return unbounded []Vtxo slices. If marker/depth ranges can span many VTXOs, callers may hit memory pressure. Consider whether pagination or a limit parameter is warranted for these, especially GetVtxosByDepthRange which could span a very wide range.


6-7: Clarify upsert semantics in the doc comment.

The comment says "creates or updates a marker," but the method signature uses error as the only signal. It may be useful to document whether an update replaces ParentMarkerIDs entirely or merges, and whether updating a marker that has already been swept is allowed.

internal/infrastructure/db/sqlite/migration/20260211000000_add_markers.up.sql (1)

2-6: Consider adding an index on parent_markers for BFS descendant lookups.

The marker table stores parent_markers as a JSON text column. The Badger implementation does BFS by querying markers whose ParentMarkerIDs contains a given ID. If a similar query pattern is used in SQLite (e.g., using json_each to find children), performance could degrade without an index strategy. This is fine for now if the query load is low, but worth keeping in mind.

internal/infrastructure/db/badger/marker_repo.go (2)

42-106: Constructor uses interface{} variadic config — consider a typed options struct.

The NewMarkerRepository(config ...interface{}) pattern with positional interface{} arguments is fragile and hard to use correctly. While this matches the existing codebase pattern (e.g., NewVtxoRepository), a typed config struct would be safer. This is fine for now if consistency with the existing pattern is preferred.


116-136: Retry loop doesn't respect context cancellation.

The retry loops (here and in similar patterns at lines 243, 390, 435) sleep unconditionally without checking ctx.Done(). If the context is cancelled, the function will still retry up to maxRetries times with 100ms sleeps. This is a minor concern given the small retry count.

internal/infrastructure/db/sqlite/marker_repo.go (1)

141-166: Make descendant sweeping atomic to avoid partial state.

If an insert fails mid-loop, some markers are swept and others aren’t. Wrapping the inserts in a single transaction avoids partial sweeps and reduces round-trips.

♻️ Suggested transaction wrapper
 func (m *markerRepository) SweepMarkerWithDescendants(
 	ctx context.Context,
 	markerID string,
 	sweptAt int64,
 ) (int64, error) {
 	// Get all descendant marker IDs (including the root marker) that are not already swept
 	descendantIDs, err := m.querier.GetDescendantMarkerIds(ctx, markerID)
 	if err != nil {
 		return 0, fmt.Errorf("failed to get descendant markers: %w", err)
 	}
 
-	// Insert each descendant into swept_marker
-	var count int64
-	for _, id := range descendantIDs {
-		err := m.querier.InsertSweptMarker(ctx, queries.InsertSweptMarkerParams{
-			MarkerID: id,
-			SweptAt:  sweptAt,
-		})
-		if err != nil {
-			return count, fmt.Errorf("failed to sweep marker %s: %w", id, err)
-		}
-		count++
-	}
-
-	return count, nil
+	tx, err := m.db.BeginTx(ctx, nil)
+	if err != nil {
+		return 0, err
+	}
+	q := queries.New(tx)
+	defer tx.Rollback()
+
+	var count int64
+	for _, id := range descendantIDs {
+		if err := q.InsertSweptMarker(ctx, queries.InsertSweptMarkerParams{
+			MarkerID: id,
+			SweptAt:  sweptAt,
+		}); err != nil {
+			return count, fmt.Errorf("failed to sweep marker %s: %w", id, err)
+		}
+		count++
+	}
+	if err := tx.Commit(); err != nil {
+		return 0, err
+	}
+	return count, nil
 }
internal/infrastructure/db/postgres/marker_repo.go (1)

144-169: Make descendant sweeping atomic to avoid partial state.

Same concern as the sqlite implementation—if the loop fails mid-way, markers can be partially swept.

♻️ Suggested transaction wrapper
 func (m *markerRepository) SweepMarkerWithDescendants(
 	ctx context.Context,
 	markerID string,
 	sweptAt int64,
 ) (int64, error) {
 	// Get all descendant marker IDs (including the root marker) that are not already swept
 	descendantIDs, err := m.querier.GetDescendantMarkerIds(ctx, markerID)
 	if err != nil {
 		return 0, fmt.Errorf("failed to get descendant markers: %w", err)
 	}
 
-	// Insert each descendant into swept_marker
-	var count int64
-	for _, id := range descendantIDs {
-		err := m.querier.InsertSweptMarker(ctx, queries.InsertSweptMarkerParams{
-			MarkerID: id,
-			SweptAt:  sweptAt,
-		})
-		if err != nil {
-			return count, fmt.Errorf("failed to sweep marker %s: %w", id, err)
-		}
-		count++
-	}
-
-	return count, nil
+	tx, err := m.db.BeginTx(ctx, nil)
+	if err != nil {
+		return 0, err
+	}
+	q := queries.New(tx)
+	defer tx.Rollback()
+
+	var count int64
+	for _, id := range descendantIDs {
+		if err := q.InsertSweptMarker(ctx, queries.InsertSweptMarkerParams{
+			MarkerID: id,
+			SweptAt:  sweptAt,
+		}); err != nil {
+			return count, fmt.Errorf("failed to sweep marker %s: %w", id, err)
+		}
+		count++
+	}
+	if err := tx.Commit(); err != nil {
+		return 0, err
+	}
+	return count, nil
 }
internal/infrastructure/db/sqlite/sqlc/query.sql (1)

467-480: Replace LIKE-based JSON matching with json_each() for robustness.

The recursive CTE uses m.parent_markers LIKE '%"' || dm.id || '"%' to check if a marker ID exists in a JSON array. While this works in practice, it's fragile: marker IDs containing SQL LIKE wildcards (%, _) would cause incorrect matches since the code doesn't escape them. The Postgres version correctly uses @> jsonb_build_array(dm.id) (line 473); SQLite should use the equivalent json_each() for consistency and correctness:

Suggested replacement
-    SELECT m.id FROM marker m
-    INNER JOIN descendant_markers dm ON (
-        m.parent_markers LIKE '%"' || dm.id || '"%'
-    )
+    SELECT m.id FROM marker m
+    INNER JOIN descendant_markers dm ON EXISTS (
+        SELECT 1 FROM json_each(m.parent_markers) je WHERE je.value = dm.id
+    )

Comment thread internal/core/application/indexer.go Outdated
Comment thread internal/infrastructure/db/badger/marker_repo.go Outdated
Comment on lines +8 to +13
CREATE VIEW vtxo_vw AS
SELECT v.*, string_agg(vc.commitment_txid, ',') AS commitments
FROM vtxo v
LEFT JOIN vtxo_commitment_txid vc
ON v.txid = vc.vtxo_txid AND v.vout = vc.vtxo_vout
GROUP BY v.txid, v.vout;

⚠️ Potential issue | 🟡 Minor

Inconsistent NULL handling between Postgres and SQLite views.

The SQLite vtxo_vw uses COALESCE(group_concat(...), '') (returns empty string when no commitments), but this Postgres view uses bare string_agg(...) (returns NULL when no commitments). This can cause behavioral differences across backends if downstream code doesn't uniformly handle both NULL and empty string.

Consider wrapping with COALESCE for consistency:

Suggested fix
 CREATE VIEW vtxo_vw AS
-SELECT v.*, string_agg(vc.commitment_txid, ',') AS commitments
+SELECT v.*, COALESCE(string_agg(vc.commitment_txid, ','), '') AS commitments
 FROM vtxo v
 LEFT JOIN vtxo_commitment_txid vc
 ON v.txid = vc.vtxo_txid AND v.vout = vc.vtxo_vout
 GROUP BY v.txid, v.vout;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CREATE VIEW vtxo_vw AS
SELECT v.*, string_agg(vc.commitment_txid, ',') AS commitments
FROM vtxo v
LEFT JOIN vtxo_commitment_txid vc
ON v.txid = vc.vtxo_txid AND v.vout = vc.vtxo_vout
GROUP BY v.txid, v.vout;
CREATE VIEW vtxo_vw AS
SELECT v.*, COALESCE(string_agg(vc.commitment_txid, ','), '') AS commitments
FROM vtxo v
LEFT JOIN vtxo_commitment_txid vc
ON v.txid = vc.vtxo_txid AND v.vout = vc.vtxo_vout
GROUP BY v.txid, v.vout;
🤖 Prompt for AI Agents
In
`@internal/infrastructure/db/postgres/migration/20260210100000_add_vtxo_depth.up.sql`
around lines 8 - 13, The Postgres view vtxo_vw currently returns NULL for the
commitments column when no rows exist because it uses
string_agg(vc.commitment_txid, ','); change the SELECT to wrap string_agg with
COALESCE (e.g., COALESCE(string_agg(...), '')) so commitments always yields an
empty string like the SQLite view; update the SELECT that references vtxo,
vtxo_commitment_txid and the commitments alias to use
COALESCE(string_agg(vc.commitment_txid, ','), '').

Comment thread internal/infrastructure/db/service.go
@bitcoin-coder-bob (Collaborator, Author)

@coderabbitai review


coderabbitai Bot commented Feb 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/infrastructure/db/badger/vtxo_repo.go (2)

101-119: ⚠️ Potential issue | 🟠 Major

GetVtxos returns nil, nil when any single outpoint is missing, discarding all previously collected vtxos.

getVtxo returns (nil, nil) for ErrNotFound (Line 476-477), so the strings.Contains(err.Error(), "not found") check on Line 108 is dead code for that path. Instead, execution falls through to Line 113 where vtxo == nil triggers return nil, nil, silently dropping all vtxos already appended. This should continue instead, matching the likely intent of skipping missing outpoints.

Proposed fix
 	for _, outpoint := range outpoints {
 		vtxo, err := r.getVtxo(ctx, outpoint)
 		if err != nil {
-			if strings.Contains(err.Error(), "not found") {
-				continue
-			}
 			return nil, err
 		}
 		if vtxo == nil {
-			return nil, nil
+			continue
 		}
 		vtxos = append(vtxos, *vtxo)
 	}

277-322: ⚠️ Potential issue | 🟡 Minor

Inconsistent filter: query uses Ge (>=) but post-filter uses > (strictly greater).

Lines 286-288 and 295-297 fetch vtxos with Amount >= amountFilter, but Lines 306 and 311 then exclude vtxos where Amount == amountFilter by checking vtxo.Amount > amountFilter. This means vtxos with amount exactly equal to the filter are fetched from the DB but silently dropped. Either the query should use Gt or the post-filter should use >=.

🤖 Fix all issues with AI agents
In `@internal/core/application/service_test.go`:
- Around line 149-154: The test case "no spent vtxos" expects depth 1 but the
service's logic leaves newDepth at 0 when spentOutpoints is empty; update the
test in service_test.go for the case named "no spent vtxos (empty)" to set
expectedDepth to 0 so it matches the actual behavior of the service (referencing
newDepth, maxDepth and spentOutpoints in the service implementation).

In `@internal/infrastructure/db/badger/marker_repo.go`:
- Around line 258-276: SweepMarker currently does a full table scan by calling
r.vtxoStore.Find(&allDtos, &badgerhold.Query{}) for every marker (and
BulkSweepMarkers calls SweepMarker in a loop), causing N full scans; change
SweepMarker to query only VTXOs that contain the marker by using
r.vtxoStore.Find(&filteredDtos,
badgerhold.Where("MarkerIDs").Contains(markerID)) (same pattern as
getDescendantMarkerIds), iterate filteredDtos (type vtxoDTO) and call
r.vtxoStore.Update(outpoint.String(), dto) to set Swept=true and UpdatedAt; this
ensures each marker triggers a targeted query instead of scanning all VTXOs and
avoids the N×full-scan behavior in BulkSweepMarkers.
- Around line 438-462: GetVtxosByMarker currently loads all VTXOs then filters
in memory; change the find to use an indexed query so Badger filters by
MarkerIDs: replace the badgerhold.Query{} call in
markerRepository.GetVtxosByMarker with
badgerhold.Where("MarkerIDs").Contains(markerID) (keeping the same
r.vtxoStore.Find(&dtos, query) pattern), then retain the existing loop to
compute vtxo.Swept via r.isAnyMarkerSwept(dto.MarkerIDs) and append matching
DTOs to the result slice.

In `@internal/infrastructure/db/postgres/marker_repo.go`:
- Around line 159-184: SweepMarkerWithDescendants does inserts in a loop without
a transaction, causing partial commits on failure; wrap the entire operation in
a DB transaction so either all descendant InsertSweptMarker calls succeed or
none do. Start a transaction (e.g., via m.db.BeginTx or your repo's transaction
helper), run GetDescendantMarkerIds and then perform each
queries.InsertSweptMarker using the transactional querier/context (or passing tx
into the querier methods), rollback on any error and commit at the end, and
return the count only after a successful commit; reference functions:
SweepMarkerWithDescendants, GetDescendantMarkerIds, and InsertSweptMarker.

In
`@internal/infrastructure/db/postgres/migration/20260210100000_add_depth_and_markers.down.sql`:
- Around line 1-14: Move the view drops to before any column/table drops: drop
views intent_with_inputs_vw and vtxo_vw first, then drop index idx_vtxo_markers,
drop columns markers and depth from table vtxo, and finally drop marker and
swept_marker tables; update the script so vtxo_vw and intent_with_inputs_vw are
removed prior to altering vtxo to avoid PostgreSQL dependency errors.

In
`@internal/infrastructure/db/postgres/migration/20260210100000_add_depth_and_markers.up.sql`:
- Around line 1-5: Change the new column definition for markers on table vtxo to
be non-nullable with a default empty JSON array by altering the ADD COLUMN
statement for markers to "ADD COLUMN IF NOT EXISTS markers JSONB NOT NULL
DEFAULT '[]'::jsonb" (keep the existing GIN index creation), and ensure any
separate backfill step that populates markers for existing rows is consistent
with this default (remove/adjust redundant backfill or ensure it uses
'[]'::jsonb for rows without markers).

In `@internal/infrastructure/db/postgres/sqlc/queries/query.sql.go`:
- Around line 1770-1776: The SQL in the constant selectVtxosByArkTxid used by
the method SelectVtxosByArkTxid filters on the wrong column (txid); update the
query string to use WHERE ark_txid = $1 (or WHERE ark_txid = `@ark_txid` in the
.sql source) so the function returns VTXOs created by the given ark transaction;
update the selectVtxosByArkTxid SQL in both the Postgres and SQLite query.sql
sources so the generated query and the Queries.SelectVtxosByArkTxid
implementation both filter on ark_txid instead of txid.
- Around line 118-131: The recursive CTE used by GetDescendantMarkerIds scans
marker.parent_markers with the jsonb containment operator (@>), causing repeated
sequential scans; add a migration that creates a GIN index on the parent_markers
column (marker.parent_markers) so the recurrence m.parent_markers @>
jsonb_build_array(dm.id) can use the index; implement the migration file that
executes CREATE INDEX IF NOT EXISTS idx_marker_parent_markers ON marker USING
GIN (parent_markers) and ensure it is applied in your migrations pipeline.

In `@internal/infrastructure/db/service.go`:
- Around line 210-219: The code appends badgerVtxoRepo.GetStore() onto
config.DataStoreConfig which can mutate the original slice's backing array;
instead create a new slice copy of config.DataStoreConfig before appending to
avoid side effects. Locate the block that builds markerConfig (using
config.DataStoreConfig, badgerVtxoRepo.GetStore() and markerStoreFactory) and
replace the direct append with creating a new slice sized to hold the elements,
copying config.DataStoreConfig into it, then append badgerVtxoRepo.GetStore() to
that new slice and pass the new slice to markerStoreFactory.
- Around line 492-496: CreateRootMarkersForVtxos failures are currently only
warned and can leave persisted VTXOs referencing missing markers; update the
block where s.markerStore.CreateRootMarkersForVtxos(ctx, newVtxos) is called to
either (a) retry the CreateRootMarkersForVtxos call with the same retry/backoff
strategy used for persisting VTXOs (mirror the loop around the VTXO
persistence), or (b) if retrying fails, return the error to fail-fast so the
caller can roll back or handle incomplete state; locate the call to
s.markerStore.CreateRootMarkersForVtxos and implement a retry loop (or propagate
the error) and ensure logs include context about the affected VTXO set when
giving up.
- Around line 538-577: The GetVtxos DB failure path leaves newDepth at 0 and
parentMarkerIDs nil which makes IsAtMarkerBoundary(0) treat chained VTXOs as
root markers; update the error path in the block that calls s.vtxoStore.GetVtxos
(inside the loop over offchainTx.CheckpointTxs and subsequent processing) to
avoid creating misleading root markers by either returning the error upward
(propagate the GetVtxos error) or setting newDepth to a sentinel (e.g., a
special unknown value) and ensuring downstream logic
(IsAtMarkerBoundary/newDepth handling) treats that sentinel as “unknown” (no
root marker creation) instead of depth 0, and document the chosen approach in
the same function where newDepth and parentMarkerIDs are computed.

In
`@internal/infrastructure/db/sqlite/migration/20260210000000_add_depth_and_markers.down.sql`:
- Around line 28-33: The down migration tries to copy the removed swept column
from vtxo into vtxo_temp causing "no such column: swept"; fix by reconstructing
swept or restoring the column before the INSERT: either add a temporary swept
column to vtxo (or vtxo_temp) prior to the INSERT (so INSERT INTO vtxo_temp
SELECT ... swept ... FROM vtxo succeeds), or change the INSERT SELECT to compute
swept from swept_marker (join vtxo with swept_marker and derive swept) so the
SELECT no longer references the missing swept column; look for symbols
vtxo_temp, vtxo, swept, swept_marker and vtxo_new when applying the change.

In `@internal/infrastructure/db/sqlite/sqlc/query.sql`:
- Around line 471-484: GetDescendantMarkerIds currently matches parent_markers
via m.parent_markers LIKE '%"' || dm.id || '"%' which is brittle (false
positives for '%'/'_' and overlapping prefixes) and forces full scans; replace
the LIKE with a JSON-aware check using SQLite's json_each (e.g., JOIN/EXISTS
over json_each(m.parent_markers) j WHERE j.value = dm.id) or, better, migrate
parent_markers to a normalized join table (parent_marker mapping) and update
descendant_markers to join that table; also add an integration test for
GetDescendantMarkerIds using marker IDs containing characters like '%'/'_' and
overlapping prefixes to ensure correctness, and document the current limitation
of the LIKE approach in the schema/query comments.
🧹 Nitpick comments (20)
internal/core/application/sweeper_test.go (2)

734-735: Non-obvious Txid values for i >= 26.

string(rune('A'+i)) for i in 0..49 produces ASCII letters A–Z for i < 26, but non-letter characters ([, \, ], …) for i >= 26. This doesn't break the test (uniqueness is preserved), but fmt.Sprintf("child-%d", i) would be clearer and consistent with TestCreateCheckpointSweepTask_LargeMarkerSet (line 1189).

Suggested fix
-		childOutpoints[i] = domain.Outpoint{Txid: "child" + string(rune('A'+i)), VOut: 0}
+		childOutpoints[i] = domain.Outpoint{Txid: fmt.Sprintf("child-%d", i), VOut: 0}

22-158: Consider generating mocks to reduce boilerplate.

~400 lines of hand-rolled mocks for WalletService, VtxoRepository, MarkerRepository, and TxBuilder. Most methods are stubs returning zero values. Using a tool like mockery or counterfeiter would auto-generate these, reduce maintenance burden as interfaces evolve, and keep the test file focused on test logic.

internal/core/domain/marker_repo.go (1)

41-47: VTXO retrieval methods on MarkerRepository blur the boundary with VtxoRepository.

GetVtxosByDepthRange, GetVtxosByArkTxid, and GetVtxoChainByMarkers return []Vtxo and are essentially VTXO queries. Placing them here is understandable since they're marker/depth-optimized, but it means callers now need to know which repository to ask for VTXOs depending on the query pattern. If the interface continues to grow, consider whether a dedicated chain-traversal service or moving these to VtxoRepository with marker-aware implementations would keep the boundaries cleaner.

internal/infrastructure/db/sqlite/sqlc/query.sql (1)

261-270: Liquidity queries now scan every vtxo row with a correlated LIKE subquery.

SelectExpiringLiquidityAmount and SelectRecoverableLiquidityAmount both use EXISTS (SELECT 1 FROM swept_marker sm WHERE v.markers LIKE '%"' || sm.marker_id || '"%'). This is essentially a cross join between vtxo and swept_marker with a LIKE predicate on every pair — O(vtxos × swept_markers) string scans per query. As the number of swept markers grows, these queries will degrade.

Consider caching swept status on the vtxo row itself (a denormalized swept flag updated during BulkSweepMarkers), or evaluating sweep status in the application layer where the marker set is already available.

Also applies to: 273-279

internal/infrastructure/db/postgres/sqlc/query.sql (1)

500-514: Inconsistent projection: SELECT * vs sqlc.embed(vtxo_vw) across vtxo queries.

SelectVtxosByDepthRange, SelectVtxosByArkTxid, and SelectVtxoChainByMarker use SELECT * FROM vtxo_vw, while all other vtxo queries (e.g., SelectAllVtxos, SelectVtxo, SelectSweepableUnrolledVtxos) use SELECT sqlc.embed(vtxo_vw) FROM vtxo_vw. This generates different Go return types — flat structs vs. nested struct { VtxoVw VtxoVw } — requiring different mapping code in the repository layer.

Consider using sqlc.embed(vtxo_vw) consistently so the generated Go types are uniform.

Suggested fix
 -- name: SelectVtxosByDepthRange :many
 -- Get all VTXOs within a depth range, useful for filling gaps between markers
-SELECT * FROM vtxo_vw
+SELECT sqlc.embed(vtxo_vw) FROM vtxo_vw
 WHERE depth >= `@min_depth` AND depth <= `@max_depth`
 ORDER BY depth DESC;
 
 -- name: SelectVtxosByArkTxid :many
 -- Get all VTXOs created by a specific ark tx (offchain tx)
-SELECT * FROM vtxo_vw WHERE txid = `@ark_txid`;
+SELECT sqlc.embed(vtxo_vw) FROM vtxo_vw WHERE txid = `@ark_txid`;
 
 -- name: SelectVtxoChainByMarker :many
 -- Get VTXOs whose markers JSONB array contains any of the given marker IDs
-SELECT * FROM vtxo_vw
+SELECT sqlc.embed(vtxo_vw) FROM vtxo_vw
 WHERE markers ?| `@marker_ids`::TEXT[]
 ORDER BY depth DESC;
internal/infrastructure/db/sqlite/vtxo_repo.go (1)

538-548: Silent error swallowing in parseMarkersJSONFromVtxo could mask data corruption.

If the JSON in the markers column is malformed, this function silently returns nil without any logging. While defensive, this could make it hard to diagnose data integrity issues.

Consider adding a log warning on parse failure, consistent with how other parse errors are handled elsewhere in the codebase.

Optional: add warning log on parse failure
 func parseMarkersJSONFromVtxo(markersJSON string) []string {
 	if markersJSON == "" {
 		return nil
 	}
 	var markerIDs []string
 	if err := json.Unmarshal([]byte(markersJSON), &markerIDs); err != nil {
+		// Log warning to help diagnose data corruption
+		log.WithError(err).Warn("failed to parse markers JSON from vtxo")
 		return nil
 	}
 	return markerIDs
 }
internal/core/application/indexer_test.go (1)

15-299: Consider using a mock generation tool to reduce boilerplate.

The manual mock implementations (~280 lines of stubs) are correct but add significant maintenance burden. Tools like mockery or moq could auto-generate these from the interfaces and keep them in sync as the repository interfaces evolve.

That said, the explicit nil-interface handling in Markers() (lines 288–294) is a valuable pattern worth keeping regardless.

internal/infrastructure/db/badger/marker_repo.go (1)

524-528: Dead error handling — assign-then-discard pattern is misleading.

Line 525 calls r.SweepMarker(...) and assigns to err, then line 528 discards it with _ = err. This is confusing — use _ = directly on the call.

Simplify the error discard
-	if err := r.SweepMarker(ctx, markerID, time.Now().Unix()); err != nil {
-		// Non-fatal - the vtxos are already marked as swept
-		_ = err
-	}
+	// Non-fatal - the vtxos are already marked as swept
+	_ = r.SweepMarker(ctx, markerID, time.Now().Unix())
internal/infrastructure/db/sqlite/marker_repo.go (2)

398-508: Four nearly identical rowToVtxoFrom* functions — consider a shared mapper.

rowToVtxoFromMarkerQuery, rowToVtxoFromDepthRangeQuery, rowToVtxoFromArkTxidQuery, and rowToVtxoFromChainQuery all perform the same mapping from VtxoVw embedded in different sqlc row types. Since the inner row.VtxoVw is the same type (queries.VtxoVw), you could extract a shared vtxoVwToDomain(vw queries.VtxoVw) domain.Vtxo and call it from each wrapper, reducing ~100 lines of duplication.

Note that vtxo_repo.go already has rowToVtxo(row queries.VtxoVw) which does essentially the same mapping — you could reuse that directly.

Consolidate using the existing rowToVtxo from vtxo_repo.go
 func rowToVtxoFromMarkerQuery(row queries.SelectVtxosByMarkerIdRow) domain.Vtxo {
-	var commitmentTxids []string
-	if commitments, ok := row.VtxoVw.Commitments.(string); ok && commitments != "" {
-		commitmentTxids = strings.Split(commitments, ",")
-	}
-	return domain.Vtxo{
-		Outpoint: domain.Outpoint{
-			Txid: row.VtxoVw.Txid,
-			VOut: uint32(row.VtxoVw.Vout),
-		},
-		// ... all fields ...
-	}
+	return rowToVtxo(row.VtxoVw)
 }

Apply the same pattern to all four functions.


510-519: Duplicate parseMarkersJSON — already exists as parseMarkersJSONFromVtxo in vtxo_repo.go.

Both functions in this package have identical logic. Consolidate into a single shared function.

internal/core/application/indexer.go (2)

416-431: Consider batching GetMarker calls to reduce DB round-trips during BFS.

Each iteration of the BFS loop issues an individual GetMarker DB call (line 420). For deep marker chains (e.g., depth 20000 with markers every 100 levels = ~200 markers), this results in ~200 sequential queries. A batch approach using GetMarkersByIds on the current queue batch would be significantly faster.

♻️ Sketch of batched BFS
 	for len(queue) > 0 {
-		currentID := queue[0]
-		queue = queue[1:]
-
-		marker, err := i.repoManager.Markers().GetMarker(ctx, currentID)
-		if err != nil || marker == nil {
+		// Fetch all markers in current queue batch at once
+		batch := queue
+		queue = nil
+		markers, err := i.repoManager.Markers().GetMarkersByIds(ctx, batch)
+		if err != nil {
 			continue
 		}
-
-		for _, parentID := range marker.ParentMarkerIDs {
-			if !visited[parentID] {
-				visited[parentID] = true
-				markerIDs = append(markerIDs, parentID)
-				queue = append(queue, parentID)
+		for _, marker := range markers {
+			for _, parentID := range marker.ParentMarkerIDs {
+				if !visited[parentID] {
+					visited[parentID] = true
+					markerIDs = append(markerIDs, parentID)
+					queue = append(queue, parentID)
+				}
 			}
 		}
 	}

465-474: Cache is mutated via the cache map parameter — document this side effect.

getVtxosFromCacheOrDB updates the caller's map in-place (line 473). This is correct for the current usage pattern, but the mutation is non-obvious. A brief doc note on the side effect would improve maintainability.

internal/infrastructure/db/postgres/migration/20260210100000_add_depth_and_markers.up.sql (2)

73-83: Correlated EXISTS subquery in the view may degrade as swept_marker grows.

The vtxo_vw view computes swept via EXISTS (SELECT 1 FROM swept_marker sm WHERE v.markers @> jsonb_build_array(sm.marker_id)). This scans swept_marker for each VTXO row. While the GIN index on markers helps the containment check, this is effectively a semi-join where the outer side (swept_marker) is iterated per-vtxo. As the number of swept markers grows, this scan may become expensive for queries that touch many VTXOs.

Consider whether a reverse lookup (joining vtxo markers against swept_marker PK) or a materialized approach would scale better for your expected data volumes.


25-30: Remove intermediate view creation — they are dropped and recreated without ever being used.

The views created at lines 25-30 and 32-40 are dropped at lines 65-66 before being recreated at lines 73-93. The backfill queries (lines 44-62) query vtxo directly, so these intermediate views are never referenced and can be removed to simplify the migration.

♻️ Simplified migration flow
-- Drop views before dropping the swept column (views depend on it via v.*)
-DROP VIEW IF EXISTS intent_with_inputs_vw;
-DROP VIEW IF EXISTS vtxo_vw;
-
-CREATE VIEW vtxo_vw AS
-SELECT v.*, string_agg(vc.commitment_txid, ',') AS commitments
-FROM vtxo v
-LEFT JOIN vtxo_commitment_txid vc
-ON v.txid = vc.vtxo_txid AND v.vout = vc.vtxo_vout
-GROUP BY v.txid, v.vout;
-
-CREATE VIEW intent_with_inputs_vw AS
-SELECT vtxo_vw.*,
-       intent.id,
-       intent.round_id,
-       intent.proof,
-       intent.message
-FROM intent
-LEFT OUTER JOIN vtxo_vw
-ON intent.id = vtxo_vw.intent_id;
-
 -- Backfill: Create a marker for every existing VTXO using its outpoint as marker ID
internal/infrastructure/db/sqlite/migration/20260210000000_add_depth_and_markers.up.sql (1)

22-41: Intermediate view recreation appears unused — backfill queries reference vtxo directly.

Same as the Postgres migration: the views created at lines 26-41 are dropped again at lines 96-97 without being referenced by the backfill statements (lines 45-62). They add migration complexity without benefit.

internal/infrastructure/db/postgres/marker_repo.go (2)

249-274: TOCTOU between count query and sweep insert in SweepVtxosByMarker.

CountUnsweptVtxosByMarkerId (line 260) and InsertSweptMarker (line 266) are not atomic. The returned count may not reflect the actual number of VTXOs affected by the sweep. Since the count is only used for logging/metrics, this isn't a correctness issue, but worth noting.


426-436: Silent error swallowing in parseMarkersJSONB — consider logging.

Unmarshal errors at line 432 are silently swallowed. If corrupted marker JSON ends up in the database, this would silently produce nil marker IDs, making affected VTXOs invisible to marker-based queries. A debug-level log would aid troubleshooting without adding noise.

internal/core/application/service_test.go (1)

562-567: outputs[0].MarkerIDs is re-sorted on every loop iteration.

The sort at line 563 mutates outputs[0].MarkerIDs in-place on each iteration. Move it before the loop.

♻️ Minor optimization
 			// All outputs must have the same marker IDs
+			sort.Strings(outputs[0].MarkerIDs)
 			for i := 1; i < len(outputs); i++ {
-				sort.Strings(outputs[0].MarkerIDs)
 				sort.Strings(outputs[i].MarkerIDs)
 				require.Equal(t, outputs[0].MarkerIDs, outputs[i].MarkerIDs,
 					"output %d has different markers than output 0", i)
 			}
internal/infrastructure/db/badger/vtxo_repo.go (2)

23-26: Duplicate accessors: GetStore() and Store() return the same value.

Both methods on Lines 23-26 and Lines 421-424 return r.store with identical signatures. Pick one and remove the other to avoid confusion about which to call.

Also applies to: 421-424


625-637: Redundant visited check in GetSweepableVtxosByCommitmentTxid.

Line 627 checks !visited[outpointKey], and Line 628 checks !seen on the same key. Since visited maps to bool, !visited[key] is true iff the key is absent (zero-value false), making the inner check always true when reached. This also means Line 633-635 (enqueue ArkTxid) is unreachable for already-visited outpoints — which is correct — but the double-check is confusing. Compare with the cleaner pattern in GetAllChildrenVtxos (Lines 668-676).

Simplify to match GetAllChildrenVtxos pattern
 		for _, vtxo := range vtxos {
 			outpointKey := vtxo.Outpoint.String()
-			if !visited[outpointKey] {
-				if _, seen := visited[outpointKey]; !seen {
-					visited[outpointKey] = true
-					outpoints = append(outpoints, vtxo.Outpoint)
-				}
-
-				if vtxo.ArkTxid != "" {
-					queue = append(queue, vtxo.ArkTxid)
-				}
+			if !visited[outpointKey] {
+				visited[outpointKey] = true
+				outpoints = append(outpoints, vtxo.Outpoint)
+				if vtxo.ArkTxid != "" {
+					queue = append(queue, vtxo.ArkTxid)
+				}
 			}
 		}

Comment thread internal/core/application/service_test.go Outdated
Comment thread internal/infrastructure/db/badger/marker_repo.go Outdated
Comment thread internal/infrastructure/db/badger/marker_repo.go
Comment thread internal/infrastructure/db/postgres/marker_repo.go
Comment thread internal/infrastructure/db/service.go
Comment thread internal/infrastructure/db/service.go
Comment thread internal/infrastructure/db/service.go Outdated
Comment thread internal/infrastructure/db/sqlite/sqlc/query.sql

arkanaai Bot commented Mar 21, 2026

👋 @bitcoin-coder-bob — this PR has been open for 39 days with outstanding change requests from @louisinger. Last push was over 30 days ago. Is this still being worked on, or is it blocked on something? Let us know if we can help unblock.


arkanaai Bot commented Mar 23, 2026

🔍 Arkana PR Review (incremental — new commits since last review)

Changes since last review: Added comprehensive benchmark suite (indexer_bench_test.go) and new unit tests for fanout, diamond, marker-boundary, and overlapping-marker scenarios.

Assessment: ✅ Excellent test additions

Benchmarks (indexer_bench_test.go):

  • Covers linear chains (with/without markers), binary fanout trees, and diamond convergence patterns
  • Lightweight fake repos avoid testify/mock overhead — good choice for benchmarks
  • benchCheckpointPSBT creates real PSBT objects, so the benchmark captures actual serialization costs
  • The BenchmarkCheckpointPSBTParse microbenchmark isolates the PSBT decode cost, which the PR body identifies as the remaining bottleneck — useful for tracking optimization progress

New unit tests (indexer_test.go):

  • TestGetVtxoChain_Fanout — verifies correct traversal of 2-branch checkpoint fan-outs. Validates both item count and type distribution.
  • TestGetVtxoChain_Diamond — tests convergence deduplication (D appears exactly once despite two paths reaching it). Critical correctness test for the DAG traversal.
  • TestGetVtxoChain_MarkerBoundaryStart — validates marker preload at depth 0 with no parent markers. Asserts only 1 GetVtxos call (frontier fetch).
  • TestGetVtxoChain_OverlappingMarkers — tests deduplication when a VTXO has multiple markers and one is a parent of another. Correctly asserts only 1 batch of marker fetches.

The matchOutpoints helper is a solid addition — order-independent matching for outpoint slices, consistent with the existing matchIDs pattern.

No security concerns in this increment — all changes are test/bench code. The core DAG traversal and marker logic were reviewed in the previous pass.

@arkanaai
Contributor

arkanaai Bot commented Mar 28, 2026

Hey @bitcoin-coder-bob — @louisinger requested changes on Mar 16. Any update on this? Has the feedback been addressed in recent commits?

@arkanaai
Contributor

arkanaai Bot commented Apr 2, 2026

Iterative review — new commit 453417ea (offchainTxCache race fix)

This commit addresses a real race condition: the DB projection updates asynchronously, so GetVtxos could return stale spendable VTXOs that are already pending-spent in an in-flight offchain tx. Clients would then build duplicate transactions.

The fix:

  • Adds offchainTxCache ports.OffChainTxStore to indexerService
  • In GetVtxos, after fetching from DB, cross-checks each unspent VTXO against the live offchain tx cache
  • If Includes() returns true, marks it Spent = true before returning

Review notes:

  1. Correctness ✅ — The check only marks VTXOs as spent, never unmarks them. This is safe: worst case is a VTXO appears spent slightly before the projection catches up, which is the conservative direction.

  2. Error handling — spent, _ := i.offchainTxCache.Includes(...) silently swallows errors. On error, spent defaults to false, meaning the VTXO remains spendable. This is the right default (fail-open for reads), but consider logging the error at debug level so cache failures are observable (see the sketch after this list).

  3. Nil guard ✅ — if i.offchainTxCache != nil correctly handles the case where the live store isn't available.

  4. Config wiring ✅ — config.go correctly gates on c.liveStore != nil before accessing OffchainTxs().

  5. Performance — The loop iterates all returned VTXOs and calls Includes() per VTXO. For typical page sizes this is fine. If GetVtxos ever returns thousands, a batch Includes method would be more efficient, but that's a future optimization.

  6. Cross-repo note — SDK clients that cache VTXO lists locally should be aware that the server now reflects pending-spent status faster. This shouldn't break anything (it's strictly more accurate), but worth mentioning in release notes.
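
A minimal sketch of the debug-level logging suggested in point 2, assuming a simplified Includes signature (the real ports.OffChainTxStore may differ):

package application

import (
	"context"

	log "github.com/sirupsen/logrus"
)

// offchainTxStore stands in for ports.OffChainTxStore; only the lookup
// used by GetVtxos is sketched.
type offchainTxStore interface {
	Includes(ctx context.Context, outpoint string) (bool, error)
}

// pendingSpent fails open on cache errors (the vtxo stays spendable)
// but logs the failure so cache problems remain observable.
func pendingSpent(ctx context.Context, cache offchainTxStore, outpoint string) bool {
	spent, err := cache.Includes(ctx, outpoint)
	if err != nil {
		log.WithError(err).WithField("outpoint", outpoint).
			Debug("offchain tx cache lookup failed")
		return false
	}
	return spent
}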

Good fix for a subtle but important race. 👍

@arkanaai
Contributor

arkanaai Bot commented Apr 9, 2026

Changes were requested 6+ days ago. @bitcoin-coder-bob need any help addressing the feedback?


* bulk-fetch offchain txs in walkVtxoChain to reduce DB round-trips

* Add benchmark and test for bulk offchain tx fetch improvement

* index idx_checkpoint_tx_offchain_txid

* testing our performance times for GetOffchainTxsByTxids

* chunk SQLite bulk offchain tx fetch + multi-txid test

* lint

* preload offchain txs via marker DAG in walkVtxoChain, nbxplorer version bump

* offchainTxRepo check, may be nil in test helpers

* move walkVtxoChain timing out of prod code into in-process test

* clarify simulated latency in timing breakdown and panic on unwired repo accessors
Contributor

@arkanaai arkanaai Bot left a comment


🔍 Arkana PR Review — arkd#908 (incremental)

New commit since last review: 69879d04 — bulk-fetch offchain txs in walkVtxoChain (#1005). Plus 40ff9f82 (merge master, no logic).

This commit eliminates the remaining O(N) bottleneck I flagged in previous reviews: per-hop GetOffchainTx calls during chain traversal. Now offchain txs are bulk-loaded alongside VTXOs during the marker preload phase, and lazily bulk-fetched per-iteration for cache misses. Good.

Findings

1. 🟡 Badger GetOffchainTxsByTxids — fragile error matching
internal/infrastructure/db/badger/ark_repo.go — the new method uses strings.Contains(err.Error(), "not found") to skip missing txids. This is brittle: if the error message in getOffchainTx ever changes (or a wrapped error incidentally contains "not found"), behavior silently changes. Use errors.Is with a sentinel error, or at minimum errors.As with a typed error. The underlying getOffchainTx already checks badgerhold.ErrNotFound — expose that check properly rather than string-matching on the formatted message. A sketch follows the findings list.

2. ✅ Postgres/SQLite bulk queries — safe

  • Postgres: txid = ANY($1::varchar[]) with pq.Array(txids) — parameterized, no injection risk.
  • SQLite: sqliteMaxBulkTxids = 500 correctly stays under SQLITE_MAX_VARIABLE_NUMBER. Batching loop is clean (see the query sketches after the findings list).
  • Both filter COALESCE(fail_reason, '') = '' consistently with the single-fetch query. Good.

3. ✅ New index migration
20260409140000_checkpoint_tx_offchain_txid_index — adds idx_checkpoint_tx_offchain_txid on checkpoint_tx(offchain_txid). This supports the JOIN inside offchain_tx_vw when bulk-fetching. Correct optimization — the view was already doing this join, just without the index to back the multi-row case.

4. ✅ Preload error semantics
preloadByMarkers hard-fails if GetOffchainTxsByTxids errors, while marker window errors log-and-continue. This asymmetry is actually correct: marker cache misses degrade to per-VTXO lookups (functional), but offchain tx failures would produce incomplete chain data (incorrect). Right call.

5. ✅ Cache key consistency
Offchain tx cache uses tx.ArkTxid as key, lookups use vtxo.Txid. These are the same field semantically (the offchain tx's ark txid is the VTXO's parent txid). Verified in domain model.

6. ✅ Fallback path
When bulk fetch returns empty or misses a txid, the walk loop falls back to individual GetOffchainTx and populates the cache (offchainTxCache[vtxo.Txid] = offchainTx). Correct — handles both pre-optimization repos and race conditions where a tx is inserted after the preload. (A cache-fill sketch follows the findings list.)

7. ✅ Test coverage

  • TestBulkOffchainTxReducesDBCalls — fanout tree, proves bulk calls = O(depth) vs individual calls = O(N). Correct assertion.
  • TestVtxoChainTimingBreakdown — 10k-chain wall-clock breakdown with simulated latency. Useful diagnostic.
  • BenchmarkOffchainTxBulkVsSingle — direct comparison with simulated 50µs latency. Good.
  • preconfirmed chain bulk-loads offchain txs + falls back to single fetch on cache miss — unit tests for both the happy and fallback paths.
  • DB integration test in service_test.go — multi-txid fetch with cross-contamination guard. Thorough.

8. Nit: docker-compose.regtest.yml bumps nbxplorer 2.5.30 → 2.5.30-1. Unrelated — should probably be a separate commit.

9. Security: No impact. This is purely a read-path optimization. No changes to exit paths, forfeit validation, transaction signing, or sweep correctness.
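
Regarding finding 1, a sketch of the sentinel-based skip, assuming getOffchainTx wraps badgerhold.ErrNotFound with %w (types simplified for illustration):

package badgerdb

import (
	"errors"
	"fmt"

	"github.com/timshannon/badgerhold/v4"
)

// OffchainTx is a simplified stand-in for the repo's DTO.
type OffchainTx struct{ ArkTxid string }

func getOffchainTx(store *badgerhold.Store, txid string) (*OffchainTx, error) {
	var tx OffchainTx
	if err := store.Get(txid, &tx); err != nil {
		// Wrap with %w so callers can test the sentinel with errors.Is.
		return nil, fmt.Errorf("get offchain tx %s: %w", txid, err)
	}
	return &tx, nil
}

func getOffchainTxsByTxids(store *badgerhold.Store, txids []string) ([]*OffchainTx, error) {
	txs := make([]*OffchainTx, 0, len(txids))
	for _, txid := range txids {
		tx, err := getOffchainTx(store, txid)
		if err != nil {
			if errors.Is(err, badgerhold.ErrNotFound) {
				continue // missing txids are expected in bulk lookups
			}
			return nil, err
		}
		txs = append(txs, tx)
	}
	return txs, nil
}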
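Regarding finding 2, hedged sketches of the two query shapes; the SELECT list and helper names are assumptions, only the parameterization and the 500-txid chunking mirror the PR:

package db

import (
	"context"
	"database/sql"

	"github.com/lib/pq"
)

// Postgres: one parameterized round-trip; callers must close rows.
// The column list here is illustrative.
func queryOffchainTxsPostgres(ctx context.Context, db *sql.DB, txids []string) (*sql.Rows, error) {
	return db.QueryContext(ctx, `
		SELECT txid, data FROM offchain_tx_vw
		WHERE txid = ANY($1::varchar[])
		AND COALESCE(fail_reason, '') = ''`,
		pq.Array(txids))
}

// SQLite: chunk to stay under SQLITE_MAX_VARIABLE_NUMBER.
const sqliteMaxBulkTxids = 500

func chunkTxids(txids []string) [][]string {
	var chunks [][]string
	for len(txids) > sqliteMaxBulkTxids {
		chunks = append(chunks, txids[:sqliteMaxBulkTxids])
		txids = txids[sqliteMaxBulkTxids:]
	}
	if len(txids) > 0 {
		chunks = append(chunks, txids)
	}
	return chunks
}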
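Regarding finding 6, a minimal sketch of the per-iteration bulk cache fill that precedes the individual-fetch fallback; types simplified, GetOffchainTxsByTxids matches the repo method named in the PR:

package application

import "context"

type offchainTx struct{ ArkTxid string }

type offchainTxRepo interface {
	GetOffchainTxsByTxids(ctx context.Context, txids []string) ([]*offchainTx, error)
}

// fillOffchainTxCache bulk-fetches every txid missing from the cache in
// one call, so the BFS loop only falls back per-txid on true misses.
func fillOffchainTxCache(
	ctx context.Context, repo offchainTxRepo,
	cache map[string]*offchainTx, txids []string,
) error {
	missing := make([]string, 0, len(txids))
	for _, txid := range txids {
		if _, ok := cache[txid]; !ok {
			missing = append(missing, txid)
		}
	}
	if len(missing) == 0 {
		return nil
	}
	txs, err := repo.GetOffchainTxsByTxids(ctx, missing)
	if err != nil {
		return err
	}
	for _, tx := range txs {
		cache[tx.ArkTxid] = tx
	}
	return nil
}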

Summary

Only finding #1 (badger string-matching) is worth addressing — it's a latent bug vector. Everything else is clean. The optimization is well-structured with proper fallback paths and strong test coverage.

@bitcoin-coder-bob
Collaborator Author

@arkanaai @coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Apr 16, 2026

@bitcoin-coder-bob I'll trigger a review of the latest changes, including the milliseconds fix added in commit 86f9190.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@arkanaai
Contributor

arkanaai Bot commented Apr 16, 2026

Arkana Review — Scale the DAG (#908)

Impressive engineering here. The marker DAG concept is sound and the benchmarks are compelling (20,000 → ~200 DB round-trips for a 10k chain). I went through the implementation carefully. A few issues need attention before merge, ranging from protocol-critical to minor.


🔴 Critical: Sweep over-reach (protocol-critical)

File: internal/core/application/sweeper.go

The sweep path changed from targeting specific VTXOs by outpoint to sweeping entire marker windows:

// OLD — precise: sweeps exactly childrenVtxos
_, err = s.repoManager.Vtxos().SweepVtxos(ctx, childrenVtxos)

// NEW — sweeps ALL VTXOs tagged with each marker, regardless of whether they're in childrenVtxos
markerStore.BulkSweepMarkers(ctx, markerIDs, sweptAt)

A marker covers a 100-depth window across the entire VTXO DAG — not just the subtree being swept. If VTXO-A and VTXO-B share a marker window but only VTXO-A's descendants are being swept in this checkpoint, sweeping the marker will incorrectly mark VTXO-B's descendants as swept too.

This could cause real fund loss: VTXOs incorrectly marked swept won't be claimable by users.

Fix needed: Either ensure markers are always subtree-scoped (never shared across independent VTXO subtrees), or track sweep at finer granularity. The migration backfill uses one marker per VTXO for legacy data — does the forward-creation path guarantee that a marker window is always contained within a single redemption subtree?


🔴 Critical: Unsigned pagination cursor enables auth bypass

File: internal/core/application/indexer.go, encodeChainCursor()

The cursor is plain base64-encoded JSON with no HMAC or signature:

func encodeChainCursor(frontier []domain.Outpoint) string {
    data, _ := json.Marshal(cur)
    return base64.RawURLEncoding.EncodeToString(data)
}

Auth validation in GetVtxoChain checks the token against vtxoKey (the original request outpoint), but BFS execution uses the frontier decoded from pageToken. In exposureWithheld/exposurePrivate modes, a client with a valid token for their own VTXO can craft a cursor containing arbitrary outpoints and walk unrelated parts of the chain without proving ownership.

Fix: HMAC the cursor with a server-side secret, or validate that each outpoint in a decoded cursor is reachable from the original vtxoKey.
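
A minimal sketch of the HMAC option, assuming a server-side key; the cursor layout and names are illustrative, not the PR's actual encodeChainCursor:

package application

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
)

type signedCursor struct {
	Frontier []string `json:"frontier"` // outpoint strings
	Sig      []byte   `json:"sig"`
}

func signFrontier(key, payload []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return mac.Sum(nil)
}

func encodeSignedCursor(key []byte, frontier []string) (string, error) {
	payload, err := json.Marshal(frontier)
	if err != nil {
		return "", err
	}
	data, err := json.Marshal(signedCursor{Frontier: frontier, Sig: signFrontier(key, payload)})
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(data), nil
}

// decodeSignedCursor rejects any cursor whose frontier was tampered with.
func decodeSignedCursor(key []byte, token string) ([]string, bool) {
	data, err := base64.RawURLEncoding.DecodeString(token)
	if err != nil {
		return nil, false
	}
	var cur signedCursor
	if err := json.Unmarshal(data, &cur); err != nil {
		return nil, false
	}
	payload, err := json.Marshal(cur.Frontier)
	if err != nil {
		return nil, false
	}
	if !hmac.Equal(cur.Sig, signFrontier(key, payload)) { // constant-time compare
		return nil, false
	}
	return cur.Frontier, true
}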


🟡 Medium: Silent sweep failure when no markers present

File: internal/core/application/sweeper.go

if len(uniqueMarkers) == 0 {
    return nil  // silent no-op
}

If a VTXO has no markers (possible for legacy VTXOs despite the migration backfill, or if marker assignment failed), the sweep silently does nothing. The old code would have swept those VTXOs regardless. A log.Warn at minimum; ideally a fallback to the old SweepVtxos path for VTXOs with empty MarkerIDs.


🟡 Medium: isActive checks only one outpoint (fragile)

File: internal/core/application/token_cache.go

func (c *tokenCache) isActive(hash string) bool {
    for _, expiresAt := range outpoints {
        return time.Now().Before(expiresAt)  // returns on first map iteration
    }
    return false
}

Map iteration order is randomized in Go. If a hash ever has outpoints with different expiries, this silently picks one at random. Currently add() sets all outpoints to the same expiry and touch() extends all of them uniformly, so this holds in practice — but it's fragile. Consider iterating all entries and checking them explicitly, with a comment noting the invariant (see the sketch below).
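
A sketch of the explicit variant, with a simplified cache shape (field names here are assumptions):

package application

import "time"

// tokenCache is a simplified stand-in; invariant assumed from the PR:
// add() and touch() keep every outpoint's expiry equal.
type tokenCache struct {
	byHash map[string]map[string]time.Time // hash -> outpoint -> expiry
}

// isActive is true only if every outpoint is unexpired, so the result
// no longer depends on map iteration order.
func (c *tokenCache) isActive(hash string) bool {
	outpoints := c.byHash[hash]
	if len(outpoints) == 0 {
		return false
	}
	now := time.Now()
	for _, expiresAt := range outpoints {
		if !now.Before(expiresAt) {
			return false
		}
	}
	return true
}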


🟡 Medium: Preload ignores pageSize — memory unbounded per request

File: internal/core/application/indexer.go, walkVtxoChain()

preloadByMarkers eagerly loads the full chain into memory regardless of pageSize. A paginated request for the first 100 entries of a 10,000-VTXO chain still loads all 10,000 VTXOs. The PR notes this explicitly, but it means pagination doesn't actually reduce server memory pressure — it only reduces response size. For production workloads with many concurrent paginated requests on deep chains, this could be a memory cliff.

Worth at minimum adding a hard cap (e.g., MaxPreloadDepth) or making the preload lazy once the cursor approach is proven.
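
One possible shape for the cap; MaxPreloadDepth and the guard are hypothetical, not from the PR:

package application

import "fmt"

const maxPreloadDepth = 10_000 // hypothetical hard cap

// guardPreload rejects eager preloads that would pull more than
// maxPreloadDepth depths of VTXO records into memory for one request.
func guardPreload(frontierDepth uint32) error {
	if frontierDepth > maxPreloadDepth {
		return fmt.Errorf("chain depth %d exceeds preload cap %d; use the lazy paginated walk",
			frontierDepth, maxPreloadDepth)
	}
	return nil
}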


🟡 Medium: GetVtxoChainByIntent uses old-style pagination

GetVtxoChain returns NextPageToken (cursor-based). GetVtxoChainByIntent still uses paginate() with Page/PageResp (offset-based). These have different semantics and different response shapes. Client libraries will need to handle both differently. Is this intentional? If GetVtxoChainByIntent is only ever called for full chains (math.MaxInt32), fine — but the interface inconsistency is surprising.


🟢 Minor: Possible unused log import in indexer.go

The diff adds log "github.com/sirupsen/logrus" to indexer.go but I don't see any log. calls in the new indexer code. If it's unused this will fail to compile. Please verify.


✅ What looks good

  • touch() / isActive() design is clean and the TTL extension logic is correct for long pagination sessions.
  • verifyAuthTokenSignature correctly separates signature verification from timestamp checking — good separation of concerns.
  • Marker DAG domain model is well-designed with clear boundary detection (depth % MarkerInterval == 0).
  • Bulk offchain tx fallback (in-loop GetOffchainTxsByTxids) correctly handles VTXOs outside marker coverage.
  • GIN indexes on vtxo.markers and marker.parent_markers are the right index types for JSONB containment queries.
  • Down migrations present — good.
  • Test coverage is solid across domain, repo, and service layers.

The sweep over-reach and unsigned cursor are the blockers. Happy to look at revised versions of those two pieces.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
internal/core/application/service.go (1)

364-381: ⚠️ Potential issue | 🟠 Major

Reject partial parent loads before deriving child metadata.

This block assumes GetVtxos returned every parent. If the repository only returns a subset, the emitted newVtxos get a shallower Depth and may miss inherited Swept state. Mirror the completeness check already used in SubmitOffchainTx and bail out on partial reads.

🛠️ Suggested guard
 			spentVtxos, err := s.repoManager.Vtxos().GetVtxos(
 				context.Background(), spentVtxoKeys,
 			)
 			if err != nil {
 				log.WithError(err).Warn("failed to get spent vtxos")
 				return
 			}
+			if len(spentVtxos) != len(spentVtxoKeys) {
+				log.WithFields(log.Fields{
+					"expected": len(spentVtxoKeys),
+					"got":      len(spentVtxos),
+					"txid":     txid,
+				}).Warn("failed to load all spent vtxos")
+				return
+			}
 
 			// Calculate depth for new vtxos: max(parent depths) + 1
 			var maxDepth uint32
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/application/service.go` around lines 364 - 381, The code
assumes repoManager.Vtxos().GetVtxos returned all parents but doesn't verify
completeness: before computing maxDepth and setting newVtxos[].Depth, check that
the returned spentVtxos length matches the requested spentVtxoKeys and bail out
(return early with a warning/error) on partial reads, similar to the
completeness guard used in SubmitOffchainTx; use the same pattern around
GetVtxos / spentVtxoKeys / spentVtxos to detect missing parents and avoid
deriving incorrect Depth or Swept state for newVtxos.
internal/core/application/indexer.go (1)

708-716: ⚠️ Potential issue | 🟠 Major

Whitelist checkpoint txids when building chain auth tokens.

createAuthToken(allOutpoints) is what later authorizes GetVirtualTxs, but this branch never adds checkpoint txids to allOutpoints. That means private/withheld clients can receive a checkpoint txid from GetVtxoChain and then get rejected when they try to fetch the corresponding virtual tx.

Suggested fix
 					txid := ptx.UnsignedTx.TxID()
 					checkpointTxs = append(checkpointTxs, ChainTx{
 						Txid:      txid,
 						ExpiresAt: vtxo.ExpiresAt,
 						Type:      IndexerChainedTxTypeCheckpoint,
 						Spends:    []string{ptx.UnsignedTx.TxIn[0].PreviousOutPoint.String()},
 					})
+					allOutpoints = append(allOutpoints, Outpoint{
+						Txid: txid,
+						VOut: 0,
+					})
 
 					chainTx.Spends = append(chainTx.Spends, txid)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/application/indexer.go` around lines 708 - 716, This branch
builds checkpoint ChainTxs (see checkpointTxs append and chainTx.Spends update)
but never includes checkpoint txids in the allOutpoints set used by
createAuthToken(allOutpoints), so clients can be authorized to see chain entries
but then rejected when calling GetVirtualTxs; update the same branch that
constructs checkpointTxs (where txid := ptx.UnsignedTx.TxID()) to also add that
txid to the allOutpoints collection (or a dedicated whitelist passed into
createAuthToken) so createAuthToken(allOutpoints) includes checkpoint txids and
authorizes GetVirtualTxs for those checkpoint entries.
internal/infrastructure/db/service.go (2)

507-548: ⚠️ Potential issue | 🟠 Major

Stop after retry exhaustion instead of continuing with partial projection state.

If any of these loops exhausts all 5 attempts, the handler falls through as if the projection update succeeded. That can leave the round partially projected, and CreateRootMarkersForVtxos can still run after AddVtxos never persisted anything.

🛠️ Make retry success explicit
-		for attempt := range maxProjectionRetry {
+		success := false
+		for attempt := 0; attempt < maxProjectionRetry; attempt++ {
 			if err := repo.SettleVtxos(ctx, spentVtxos, round.CommitmentTxid); err != nil {
 				log.WithError(err).Warnf(
 					"failed to spend vtxos (attempt %d/%d)", attempt+1, maxProjectionRetry,
 				)
 				time.Sleep(100 * time.Millisecond)
 				continue
 			}
 			log.Debugf("spent %d vtxos", len(spentVtxos))
+			success = true
 			break
 		}
+		if !success {
+			log.WithField("round_id", round.Id).Error("failed to settle vtxos after retries")
+			return
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/service.go` around lines 507 - 548, The retry
loops for repo.SettleVtxos, repo.AddVtxos, and
s.markerStore.CreateRootMarkersForVtxos currently swallow failures after
maxProjectionRetry and allow subsequent steps to run, leading to partial
projection state; change each loop (the one iterating over maxProjectionRetry
for spentVtxos and the two loops for newVtxos) to explicitly detect exhaustion
and return or propagate an error (using ctx/round.CommitmentTxid for context)
instead of breaking silently so upstream handler stops processing when
SettleVtxos or AddVtxos or CreateRootMarkersForVtxos ultimately fail; reference
the functions repo.SettleVtxos, repo.AddVtxos and
s.markerStore.CreateRootMarkersForVtxos and the maxProjectionRetry/sleep logic
to implement an explicit failure path after retries.

590-729: ⚠️ Potential issue | 🔴 Critical

Already-swept finalized outputs lose their sweep state in the marker model.

txSwept / outputSwept only sets domain.Vtxo.Swept, but sweep status is now derived from marker membership. On non-boundary depths domain.NewMarker(...) can return nil, so these VTXOs just inherit parentMarkerIDs; nothing here sweeps those markers unless the output is dust. That means expired/already-swept finalized outputs are persisted as active again.

The dust path has the same problem on failures: it still appends dustMarkerID after AddMarker fails, and BulkSweepMarkers only warns on error. In both cases you can end up with VTXOs that should be swept but are not representable as swept.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/service.go` around lines 590 - 729, The code fails
to persist sweep state into the marker model: when NewMarker returns nil
(inherited parentMarkerIDs) or when outputs are expired/swept
(txSwept/outputSwept), you must explicitly mark the involved markers as swept
instead of only setting domain.Vtxo.Swept; call
s.markerStore.BulkSweepMarkers(ctx, markerIDs, sweptAt) (or for per-vtxo dust
use vtxoMarkerIDs) before persisting vtxos whenever txSwept or outputSwept is
true; also only append dustMarkerID to createdDustMarkerIDs and to vtxoMarkerIDs
after s.markerStore.AddMarker successfully returns (do not append on AddMarker
error), and treat BulkSweepMarkers failures as a hard error (return) or retry so
you don’t persist vtxos whose swept state isn’t recorded (adjust control flow
around s.vtxoStore.AddVtxos and the existing BulkSweepMarkers call accordingly).
🧹 Nitpick comments (4)
internal/infrastructure/db/sqlite/sqlc/query.sql (2)

495-504: LIKE-based marker matching is safe but not indexable.

The comment correctly notes that LIKE is used because sqlc cannot parse json_each with view columns, and that txid:vout format IDs are safe. However, queries using LIKE '%"' || @marker_id || '"%' cannot leverage indexes on the markers column. For very large VTXO tables, consider documenting this limitation in the query comments or adding a future optimization note.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/sqlite/sqlc/query.sql` around lines 495 - 504, The
LIKE-based marker matching in the SelectVtxosByMarkerId and
CountUnsweptVtxosByMarkerId queries on view vtxo_vw (using markers LIKE '%"' ||
`@marker_id` || '"%') is not indexable and may perform poorly on large VTXO
tables; update the SQL comment for both query definitions to explicitly document
this limitation and add a short "Future optimization" note suggesting migrating
to a JSON-aware index/search (or rewriting to use json_each on the underlying
table) when sqlc/view support permits so future maintainers know why LIKE is
used and how to improve it.

262-281: Performance concern with json_each in liquidity queries.

Both SelectExpiringLiquidityAmount and SelectRecoverableLiquidityAmount use json_each(v.markers) in correlated subqueries. For large VTXO tables, this will parse JSON for every row. Consider adding a comment documenting expected performance characteristics, or evaluate adding a GIN index on the markers column if query times become problematic at scale.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/sqlite/sqlc/query.sql` around lines 262 - 281,
Both SelectExpiringLiquidityAmount and SelectRecoverableLiquidityAmount
repeatedly call json_each(v.markers) which forces JSON parsing per vtxo row and
will degrade at scale; add a short comment above these queries documenting the
performance risk and expected data sizes, and either (a) denormalize markers
into a dedicated vtxo_marker table and join on sm.marker_id with an index for
fast lookup, or (b) if your SQLite build supports a JSON-indexing extension, add
an appropriate index on the markers column; reference v.markers,
swept_marker.sm.marker_id, and the two query names when applying the change so
reviewers can find and validate the optimization.
internal/infrastructure/db/sqlite/vtxo_repo.go (1)

573-583: Silent error suppression in parseMarkersJSONFromVtxo may hide data corruption.

The function returns nil on unmarshal errors without logging or surfacing the issue. If the markers JSON becomes malformed (e.g., due to a bug or data corruption), this would silently degrade functionality without any indication.

Consider logging a warning when unmarshal fails:

♻️ Proposed fix to add error logging
 func parseMarkersJSONFromVtxo(markersJSON string) []string {
 	if markersJSON == "" {
 		return nil
 	}
 	var markerIDs []string
 	if err := json.Unmarshal([]byte(markersJSON), &markerIDs); err != nil {
+		// Log but don't fail - treat malformed markers as empty
+		// This shouldn't happen in normal operation
+		log.Warnf("failed to parse markers JSON: %v", err)
 		return nil
 	}
 	return markerIDs
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/sqlite/vtxo_repo.go` around lines 573 - 583, The
parseMarkersJSONFromVtxo function currently swallows json.Unmarshal errors;
change it to log a warning when unmarshal fails so data corruption is visible:
in parseMarkersJSONFromVtxo, when json.Unmarshal returns an error, call the
project's logging facility (e.g., log.Printf or the repository logger) to emit a
warning that includes the error and a truncated/sanitized markersJSON value (to
aid debugging without leaking sensitive data), then continue to return nil (or
an empty slice if preferred by callers). Ensure the log message references
parseMarkersJSONFromVtxo and includes the error and the markersJSON snippet.
internal/core/application/indexer_test.go (1)

374-377: The catch-all bulk-fetch mock masks the bulk preload optimization.

The On("GetOffchainTxsByTxids", mock.Anything, mock.Anything).Return([]*domain.OffchainTx{}, nil).Maybe() expectation is registered first with unlimited repeatability (from .Maybe()) and matches any arguments via mock.Anything. In testify/mock, expectations are evaluated in declaration order and the first match wins. Since this catch-all can match indefinitely, all subsequent GetOffchainTxsByTxids calls—regardless of arguments—will match this expectation first, and the specific per-test expectations are never reached.

This means tests set up specific bulk-fetch behaviors but never actually exercise them. The code still works because empty results trigger fallback to GetOffchainTx, but the optimization path is never verified. A regression in the bulk-fetch logic would go undetected.

Consider registering the catch-all with limited repeatability or moving it after specific test expectations so tests can verify the bulk path is actually used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/application/indexer_test.go` around lines 374 - 377, The
catch-all mock for GetOffchainTxsByTxids uses mock.Anything plus .Maybe(), which
matches first and prevents per-test expectations from being exercised; change
the registration so it doesn't override specific expectations by either removing
.Maybe() and setting a limited repeat count (e.g., Once/Times) or moving this
generic On("GetOffchainTxsByTxids", mock.Anything, mock.Anything) call to after
tests register their specific expectations, ensuring the per-test mocks for
GetOffchainTxsByTxids are matched and the bulk preload optimization is actually
exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/core/application/indexer_bench_test.go`:
- Around line 621-630: The benchmark zeroes the repo counters each iteration
because repo.reset() is called inside the for loop, so b.ReportMetric reads only
the final-iteration counts; move repo.reset() to just before the for i := 0; i <
b.N; i++ loop (so repo.bulkCalls and repo.singleCalls accumulate across all
iterations) and then keep the existing
b.ReportMetric(float64(repo.bulkCalls.Load())/float64(b.N), ...) and singleCalls
reporting after b.StopTimer(); apply the same change for the other similar
benchmark block (the one around lines 640-649) to ensure metrics represent
calls/op across the whole run rather than the last iteration.

In `@internal/core/application/indexer.go`:
- Around line 356-357: The auth check currently only verifies vtxoKey via
validateChainAuth(authToken, vtxoKey, ...), allowing a client to supply a forged
decoded pageToken whose frontier contains other outpoints; update the code in
walkVtxoChain and the pageToken handling paths (where pageToken is decoded and
replaces vtxoKey) to validate every outpoint in the decoded frontier against the
token whitelist by calling validateChainAuth(authToken, frontierOutpoint, true)
(or a new helper that accepts an outpoint) for each frontier entry before using
it, or alternatively compute and verify an HMAC/signature over the serialized
cursor when generating and reading pageToken so cursors cannot be tampered with;
ensure the same change is applied to the other affected call sites referenced
(the continuation branch and the other pageToken usage blocks) so decoded
frontiers are never trusted without per-outpoint validation or cryptographic
binding to the auth token.

In `@internal/infrastructure/db/badger/marker_repo.go`:
- Around line 258-266: The current code does a best-effort update of VTXO DTOs
(using vtxoStore.Find -> vtxoStore.Update on filteredDtos / vtxoDTO) after
inserting swept_marker, but readers still return the stale dto.Swept value if
those updates fail; fix by either making the VTXO updates part of the write
contract (perform the vtxoStore.Update calls inside the same transactional
operation and propagate/return errors from the marker insert/update so failures
abort) or stop storing Swept in the DTO and compute it dynamically on read
(modify the Badger getters that read dto.Vtxo/dto.Swept to consult
sweptMarkerStore for marker existence and set Swept accordingly), and apply the
same change for the other similar block at 269-284.

In `@internal/infrastructure/db/postgres/marker_repo.go`:
- Around line 442-451: parseMarkersJSONB currently returns nil on JSON unmarshal
errors which silently treats corrupted marker payloads as "no markers"; change
parseMarkersJSONB to return ([]string, error), return an explicit error when
json.Unmarshal fails (and return an empty slice for empty input if desired), and
update all call sites that call parseMarkersJSONB to handle the error (propagate
or fail fast) instead of assuming a nil/empty slice; search for
parseMarkersJSONB references to update their signatures and error handling
accordingly.
- Around line 233-246: The UpdateVtxoMarkers function can marshal a nil
markerIDs slice to JSON null which breaks JSON-array membership checks; modify
markerRepository.UpdateVtxoMarkers so that before marshalling you normalize nil
marker slices to an empty slice (e.g. if markerIDs == nil set markerIDs =
[]string{} or marshal an explicit empty slice) and then marshal to JSON and pass
that byte array into queries.UpdateVtxoMarkersParams.Markers so the DB always
stores "[]" for no markers instead of "null".

In
`@internal/infrastructure/db/postgres/migration/20260219082956_fix_vtxo_vw_perf.up.sql`:
- Around line 23-26: The current EXISTS subquery scans swept_marker for each v
row; replace it by unnesting v.markers (e.g. using
jsonb_array_elements_text(v.markers) or a LATERAL join) and directly joining the
resulting marker values to swept_marker.marker_id so the planner can use the
swept_marker index; update the view to compute swept as a boolean from that join
(e.g. existence via JOIN + DISTINCT or boolean aggregate like bool_or)
referencing v.markers, swept_marker.marker_id and the swept_marker table instead
of the current EXISTS(...) pattern.

In `@internal/infrastructure/db/postgres/sqlc/queries/query.sql.go`:
- Around line 118-131: The recursive CTE getDescendantMarkerIds can emit
duplicates because it uses UNION ALL; change the CTE to use set semantics
(replace UNION ALL with UNION) or deduplicate at the end (use SELECT DISTINCT on
descendant_markers.id) so each marker id appears only once; update the CTE named
descendant_markers / query named GetDescendantMarkerIds accordingly to prevent
duplicate marker rows from converging DAG paths.
- Around line 106-115: The query in the constant countUnsweptVtxosByMarkerId
(used by Queries.CountUnsweptVtxosByMarkerId) uses COUNT(*) which double-counts
VTXOs when vtxo_vw expands assets; change the SQL to count distinct outpoints
instead (e.g. SELECT COUNT(DISTINCT outpoint) FROM vtxo_vw ...), or if the view
does not expose a single outpoint column, use COUNT(DISTINCT (txid, vout)) or
equivalent to deduplicate by transaction id + output index, then keep the rest
of the function (QueryRowContext + Scan into int64) unchanged.

In `@internal/infrastructure/db/postgres/sqlc/query.sql`:
- Around line 475-488: GetDescendantMarkerIds' recursive CTE can cycle; modify
the CTE (descendant_markers) to carry a visited set/path (e.g., an array or
jsonb of marker ids) and only recurse into rows whose id is not already in that
visited set, appending the new id to visited on each recursion; keep the same
final SELECT filtering out ids in swept_marker and preserve the base case
(SELECT marker.id) but initialize visited with that id so cycles like A->B->A
are prevented.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 27894188-09fc-45f9-8457-7b9e080f0e53

📥 Commits

Reviewing files that changed from the base of the PR and between dc70231 and a266382.

⛔ Files ignored due to path filters (2)
  • api-spec/protobuf/gen/ark/v1/indexer.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • api-spec/protobuf/gen/ark/v1/types.pb.go is excluded by !**/*.pb.go, !**/gen/**
📒 Files selected for processing (55)
  • api-spec/openapi/swagger/ark/v1/indexer.openapi.json
  • api-spec/openapi/swagger/ark/v1/service.openapi.json
  • api-spec/openapi/swagger/ark/v1/types.openapi.json
  • api-spec/protobuf/ark/v1/indexer.proto
  • api-spec/protobuf/ark/v1/types.proto
  • docker-compose.regtest.yml
  • internal/config/config.go
  • internal/core/application/indexer.go
  • internal/core/application/indexer_bench_test.go
  • internal/core/application/indexer_exposure_test.go
  • internal/core/application/indexer_test.go
  • internal/core/application/service.go
  • internal/core/application/sweeper.go
  • internal/core/application/sweeper_test.go
  • internal/core/application/token_cache.go
  • internal/core/application/types.go
  • internal/core/application/utils.go
  • internal/core/application/utils_test.go
  • internal/core/domain/marker.go
  • internal/core/domain/marker_test.go
  • internal/core/domain/offchain_tx.go
  • internal/core/domain/offchain_tx_event.go
  • internal/core/domain/offchain_tx_repo.go
  • internal/core/domain/offchain_tx_test.go
  • internal/core/domain/vtxo.go
  • internal/infrastructure/db/badger/ark_repo.go
  • internal/infrastructure/db/badger/marker_repo.go
  • internal/infrastructure/db/badger/vtxo_repo.go
  • internal/infrastructure/db/postgres/marker_repo.go
  • internal/infrastructure/db/postgres/migration/20260210100000_add_depth_and_markers.up.sql
  • internal/infrastructure/db/postgres/migration/20260219082956_fix_vtxo_vw_perf.up.sql
  • internal/infrastructure/db/postgres/migration/20260409140000_checkpoint_tx_offchain_txid_index.down.sql
  • internal/infrastructure/db/postgres/migration/20260409140000_checkpoint_tx_offchain_txid_index.up.sql
  • internal/infrastructure/db/postgres/offchain_tx_repo.go
  • internal/infrastructure/db/postgres/sqlc/queries/models.go
  • internal/infrastructure/db/postgres/sqlc/queries/query.sql.go
  • internal/infrastructure/db/postgres/sqlc/query.sql
  • internal/infrastructure/db/postgres/vtxo_repo.go
  • internal/infrastructure/db/service.go
  • internal/infrastructure/db/service_test.go
  • internal/infrastructure/db/sqlite/marker_repo.go
  • internal/infrastructure/db/sqlite/migration/20260210000000_add_depth_and_markers.up.sql
  • internal/infrastructure/db/sqlite/migration/20260219082956_fix_vtxo_vw_perf.up.sql
  • internal/infrastructure/db/sqlite/migration/20260409140000_checkpoint_tx_offchain_txid_index.down.sql
  • internal/infrastructure/db/sqlite/migration/20260409140000_checkpoint_tx_offchain_txid_index.up.sql
  • internal/infrastructure/db/sqlite/offchain_tx_repo.go
  • internal/infrastructure/db/sqlite/round_repo.go
  • internal/infrastructure/db/sqlite/sqlc/queries/models.go
  • internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go
  • internal/infrastructure/db/sqlite/sqlc/query.sql
  • internal/infrastructure/db/sqlite/vtxo_repo.go
  • internal/interface/grpc/handlers/indexer.go
  • internal/interface/grpc/handlers/parser.go
  • internal/interface/grpc/handlers/parser_test.go
  • internal/test/e2e/vtxo_chain_test.go
✅ Files skipped from review due to trivial changes (10)
  • internal/infrastructure/db/sqlite/migration/20260409140000_checkpoint_tx_offchain_txid_index.up.sql
  • internal/infrastructure/db/postgres/migration/20260409140000_checkpoint_tx_offchain_txid_index.down.sql
  • internal/infrastructure/db/sqlite/migration/20260409140000_checkpoint_tx_offchain_txid_index.down.sql
  • internal/infrastructure/db/postgres/migration/20260409140000_checkpoint_tx_offchain_txid_index.up.sql
  • internal/core/domain/offchain_tx_event.go
  • docker-compose.regtest.yml
  • internal/infrastructure/db/sqlite/round_repo.go
  • api-spec/openapi/swagger/ark/v1/service.openapi.json
  • api-spec/protobuf/ark/v1/types.proto
  • internal/infrastructure/db/sqlite/offchain_tx_repo.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • internal/interface/grpc/handlers/parser.go
  • api-spec/openapi/swagger/ark/v1/types.openapi.json
  • internal/core/application/utils.go
  • api-spec/protobuf/ark/v1/indexer.proto
  • internal/core/domain/marker.go
  • internal/infrastructure/db/postgres/sqlc/queries/models.go
  • internal/infrastructure/db/postgres/migration/20260210100000_add_depth_and_markers.up.sql
  • internal/core/application/sweeper_test.go
  • internal/infrastructure/db/postgres/vtxo_repo.go
  • internal/infrastructure/db/sqlite/marker_repo.go

Comment on lines +258 to +266
// Update Swept field on VTXOs that contain this marker.
// This keeps the stored Swept field in sync for query compatibility.
// Errors here are non-fatal since swept_marker is already recorded.
var filteredDtos []vtxoDTO
if err := r.vtxoStore.Find(
	&filteredDtos,
	badgerhold.Where("MarkerIDs").Contains(markerID),
); err != nil {
	return nil
Contributor


⚠️ Potential issue | 🟠 Major

This update cannot be best-effort while reads still trust dto.Swept.

Once swept_marker is inserted, the source of truth says the VTXO is swept. But if any of these follow-up vtxoStore.Update(...) calls fail, the Badger getters in this file still return the stale stored Swept=false value because they read dto.Vtxo directly. That makes Badger diverge from the Postgres/SQLite behavior on transient write conflicts.

Either make the VTXO update part of the write contract here, or compute Swept dynamically from sweptMarkerStore on read.

Also applies to: 269-284


Comment on lines +442 to +451
// parseMarkersJSONB parses a JSONB array into a slice of strings
func parseMarkersJSONB(markers json.RawMessage) []string {
	if len(markers) == 0 {
		return nil
	}
	var markerIDs []string
	if err := json.Unmarshal(markers, &markerIDs); err != nil {
		return nil
	}
	return markerIDs
Contributor


⚠️ Potential issue | 🟠 Major

Don't silently treat corrupt marker JSON as “no markers”.

Returning nil on unmarshal failure makes a bad markers payload behave like an empty marker set, which can break chain preloading and swept resolution without surfacing any error. This helper should return an error so callers can fail fast instead of silently changing semantics.
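
The error-returning shape the comment asks for might look like this (a sketch; every call site would then need to handle the error):

package pgdb

import (
	"encoding/json"
	"fmt"
)

// parseMarkersJSONB surfaces malformed payloads instead of treating
// them as an empty marker set.
func parseMarkersJSONB(markers json.RawMessage) ([]string, error) {
	if len(markers) == 0 {
		return nil, nil
	}
	var markerIDs []string
	if err := json.Unmarshal(markers, &markerIDs); err != nil {
		return nil, fmt.Errorf("malformed markers jsonb: %w", err)
	}
	return markerIDs, nil
}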


Comment on lines +23 to +26
EXISTS (
  SELECT 1 FROM swept_marker sm
  WHERE v.markers @> jsonb_build_array(sm.marker_id)
) AS swept,
Contributor


⚠️ Potential issue | 🟠 Major

Avoid scanning swept_marker for every VTXO row.

This EXISTS is written from the swept_marker side, so PostgreSQL has to test each swept marker against the outer row’s JSONB array. On a large swept_marker table that becomes an O(rows × swept_markers) check inside the view. Expanding v.markers and joining on sm.marker_id keeps the lookup on the indexed key instead.

Suggested rewrite
-  EXISTS (
-    SELECT 1 FROM swept_marker sm
-    WHERE v.markers @> jsonb_build_array(sm.marker_id)
-  ) AS swept,
+  EXISTS (
+    SELECT 1
+    FROM jsonb_array_elements_text(v.markers) AS vm(marker_id)
+    JOIN swept_marker sm ON sm.marker_id = vm.marker_id
+  ) AS swept,

Comment on lines +475 to +488
-- name: GetDescendantMarkerIds :many
-- Recursively get a marker and all its descendants (markers whose parent_markers contain it)
WITH RECURSIVE descendant_markers(id) AS (
    -- Base case: the marker being swept
    SELECT marker.id FROM marker WHERE marker.id = @root_marker_id
    UNION ALL
    -- Recursive case: find markers whose parent_markers jsonb array contains any descendant
    SELECT m.id FROM marker m
    INNER JOIN descendant_markers dm ON (
        m.parent_markers @> jsonb_build_array(dm.id)
    )
)
SELECT descendant_markers.id AS marker_id FROM descendant_markers
WHERE descendant_markers.id NOT IN (SELECT sm.marker_id FROM swept_marker sm);
Contributor


⚠️ Potential issue | 🟠 Major

Guard recursive descendant traversal against marker cycles.

This CTE has no visited set. If bad data ever creates A -> B -> A, GetDescendantMarkerIds can recurse forever and wedge SweepMarkerWithDescendants.

🛡️ Add cycle protection to the recursive CTE
-WITH RECURSIVE descendant_markers(id) AS (
+WITH RECURSIVE descendant_markers(id, path) AS (
     -- Base case: the marker being swept
-    SELECT marker.id FROM marker WHERE marker.id = @root_marker_id
+    SELECT marker.id, ARRAY[marker.id]::text[]
+    FROM marker
+    WHERE marker.id = @root_marker_id
     UNION ALL
     -- Recursive case: find markers whose parent_markers jsonb array contains any descendant
-    SELECT m.id FROM marker m
+    SELECT m.id, dm.path || m.id
+    FROM marker m
     INNER JOIN descendant_markers dm ON (
         m.parent_markers @> jsonb_build_array(dm.id)
     )
+    WHERE NOT m.id = ANY(dm.path)
 )
-SELECT descendant_markers.id AS marker_id FROM descendant_markers
+SELECT DISTINCT descendant_markers.id AS marker_id
+FROM descendant_markers
 WHERE descendant_markers.id NOT IN (SELECT sm.marker_id FROM swept_marker sm);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- name: GetDescendantMarkerIds :many
-- Recursively get a marker and all its descendants (markers whose parent_markers contain it)
WITH RECURSIVE descendant_markers(id) AS (
    -- Base case: the marker being swept
    SELECT marker.id FROM marker WHERE marker.id = @root_marker_id
    UNION ALL
    -- Recursive case: find markers whose parent_markers jsonb array contains any descendant
    SELECT m.id FROM marker m
    INNER JOIN descendant_markers dm ON (
        m.parent_markers @> jsonb_build_array(dm.id)
    )
)
SELECT descendant_markers.id AS marker_id FROM descendant_markers
WHERE descendant_markers.id NOT IN (SELECT sm.marker_id FROM swept_marker sm);
-- name: GetDescendantMarkerIds :many
-- Recursively get a marker and all its descendants (markers whose parent_markers contain it)
WITH RECURSIVE descendant_markers(id, path) AS (
    -- Base case: the marker being swept
    SELECT marker.id, ARRAY[marker.id]::text[]
    FROM marker
    WHERE marker.id = @root_marker_id
    UNION ALL
    -- Recursive case: find markers whose parent_markers jsonb array contains any descendant
    SELECT m.id, dm.path || m.id
    FROM marker m
    INNER JOIN descendant_markers dm ON (
        m.parent_markers @> jsonb_build_array(dm.id)
    )
    WHERE NOT m.id = ANY(dm.path)
)
SELECT DISTINCT descendant_markers.id AS marker_id
FROM descendant_markers
WHERE descendant_markers.id NOT IN (SELECT sm.marker_id FROM swept_marker sm);

@bitcoin-coder-bob
Collaborator Author

@arkanaai @coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Apr 17, 2026

@bitcoin-coder-bob Sure, I'll kick off a review of the latest changes on this PR.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@arkanaai
Contributor

arkanaai Bot commented Apr 17, 2026

Arkana Code Review — PR #908: Scale the DAG

Reviewed the full diff. This is a large and well-thought-out optimization. The marker DAG design is sound and the benchmark numbers are compelling. A few things need attention before merge.


🚨 Protocol-Critical — Requires Human Sign-Off

1. Sweep path change in sweeper.go

The checkpoint sweep now routes through SweepVtxoOutpoints (per-outpoint swept_vtxo table) instead of the old SweepVtxos. The justification — that marker-based sweeping over-reaches across independent subtrees when offchain txs consolidate inputs from multiple lineages — is correct and well-reasoned. But this directly modifies how VTXOs transition to swept state, which is part of the forfeit/exit path. Needs explicit human review and sign-off.

2. swept column → computed EXISTS subquery

Replacing the direct swept BOOLEAN column with a view-computed EXISTS (SELECT 1 FROM swept_marker ... OR swept_vtxo ...) is a protocol-level change. Every VTXO lifecycle check now hits this subquery. Two concerns:

  • Correctness at scale: the double-EXISTS in vtxo_vw on every read could be slow. Has this been EXPLAIN ANALYZE'd on production-scale data (millions of VTXOs, GIN index)?
  • Dual-path complexity: swept state is now written via two different tables (swept_marker for batch/round sweeps, swept_vtxo for checkpoint sweeps). Future contributors need to maintain both paths. Worth adding a comment to the view definition explaining this.

🐛 Bug: Down Migration Doesn't Restore swept Column

20260210100000_add_depth_and_markers.down.sql drops swept_marker and marker tables, but does not restore the swept BOOLEAN column on vtxo. The up migration drops that column; the down migration needs to add it back (and backfill from swept_marker before dropping it).

After a rollback, the vtxo table would have no swept column, the view does v.* so swept would be absent, and anything reading vtxo.swept would fail. This would corrupt state on rollback.

Fix needed in the down migration:

-- Before dropping swept_marker:
ALTER TABLE vtxo ADD COLUMN swept BOOLEAN NOT NULL DEFAULT false;
UPDATE vtxo v SET swept = true
  WHERE EXISTS (SELECT 1 FROM swept_marker sm WHERE v.markers @> jsonb_build_array(sm.marker_id));
-- Then drop swept_marker, marker, etc.

⚠️ Operational Concerns

3. Backfill migration may be very slow on large DBs

The INSERT INTO marker ... SELECT + UPDATE vtxo SET markers = ... runs over every VTXO in a single transaction. On a production DB with millions of rows, this will hold a lock for a long time and may OOM. Consider a batched backfill script outside the migration, or at minimum document the expected duration for operators.
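
An illustrative batched backfill run outside the migration; the vtxo/markers/txid/vout names follow the PR's schema, while the batching predicate and per-VTXO marker format are assumptions (only the vtxo-update half of the backfill is sketched):

package main

import (
	"context"
	"database/sql"
)

const batchSize = 5000

// backfillMarkers tags unmarked VTXOs in small batches so each UPDATE
// holds locks briefly instead of rewriting the whole table at once.
func backfillMarkers(ctx context.Context, db *sql.DB) error {
	for {
		res, err := db.ExecContext(ctx, `
			UPDATE vtxo SET markers = jsonb_build_array(txid || ':' || vout::text)
			WHERE (txid, vout) IN (
				SELECT txid, vout FROM vtxo
				WHERE markers IS NULL OR markers = '[]'::jsonb
				LIMIT $1
			)`, batchSize)
		if err != nil {
			return err
		}
		n, err := res.RowsAffected()
		if err != nil {
			return err
		}
		if n == 0 {
			return nil // done
		}
	}
}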

4. preloadByMarkers loads full chain regardless of pageSize

PR notes acknowledge this. Just flagging it's a real OOM risk for very large chains (100k+ VTXOs) on a memory-constrained server. A process-level LRU by marker ID (noted as out of scope) would address this — worth tracking as a follow-up issue.

5. isActive iterates map non-deterministically

for _, expiresAt := range outpoints {
    return time.Now().Before(expiresAt) // returns on first entry
}

This returns on the first (arbitrary) map iteration. In practice all outpoints get the same expiry via touch, so this is fine — but it's fragile. A simple any(expiresAt) approach or checking the first outpoint's value explicitly would be clearer.
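A minimal sketch of the stricter form, with the map type inferred from the snippet above; it is order-independent and equivalent to the current behavior whenever all entries share one expiry:

// Hedged sketch (assumes "time" is imported): active only if the set is
// non-empty and every tracked outpoint is still unexpired.
func isActive(outpoints map[string]time.Time) bool {
	if len(outpoints) == 0 {
		return false
	}
	now := time.Now()
	for _, expiresAt := range outpoints {
		if !now.Before(expiresAt) {
			return false
		}
	}
	return true
}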


✅ Things That Look Good

  • HMAC-signed cursors for pagination prevent cursor forgery in exposurePrivate mode. Key derivation from privkey.Serialize() is clean (see the sketch after this list).
  • validateChainAuth pagination continuation (expired-timestamp but cache-live) is a reasonable UX tradeoff and the fallback correctly re-verifies the Schnorr signature.
  • The in-loop bulk offchain tx fallback for VTXOs without marker coverage is a good defensive design.
  • The dual-marker + per-outpoint sweep split correctly avoids over-reaching. The comment in sweeper.go explaining this is clear.
  • Test coverage (timing breakdown, call-counting, benchmarks, pagination e2e) is solid.
  • SQLite + Postgres + Badger repo-level tests for new interfaces.
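For illustration, a sketch of what such an HMAC-signed cursor can look like (imports: crypto/hmac, crypto/sha256, encoding/base64, fmt); the payload layout and function names are assumptions, not the PR's exact encoding:

// Hypothetical cursor signing/verification. The key would be derived from the
// operator's private key bytes, per the review note above.
func signCursor(key, payload []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return base64.RawURLEncoding.EncodeToString(append(payload, mac.Sum(nil)...))
}

func verifyCursor(key []byte, token string) ([]byte, error) {
	raw, err := base64.RawURLEncoding.DecodeString(token)
	if err != nil || len(raw) < sha256.Size {
		return nil, fmt.Errorf("malformed cursor")
	}
	payload, sig := raw[:len(raw)-sha256.Size], raw[len(raw)-sha256.Size:]
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return nil, fmt.Errorf("cursor signature mismatch")
	}
	return payload, nil
}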

Summary: The optimization design is correct and the benchmarks justify the complexity. The down migration swept column restoration bug needs a fix before merge. The sweep path and swept computation changes are protocol-critical — flagging for mandatory human sign-off from one of the assigned reviewers (@Kukks, @altafan, @louisinger).

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/infrastructure/db/postgres/sqlc/queries/query.sql.go (2)

1-15: ⚠️ Potential issue | 🟡 Minor

Reminder: generated file — propagate fixes via the .sql source.

Header marks this as sqlc-generated (DO NOT EDIT). Any adjustment suggested on this file (e.g., missing casts, query shape) must be made in internal/infrastructure/db/postgres/sqlc/query.sql and regenerated with sqlc generate, otherwise the next regeneration will revert manual edits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/postgres/sqlc/queries/query.sql.go` around lines 1
- 15, This is a generated sqlc file (package queries) and must not be edited
directly; make any changes to the SQL source file
internal/infrastructure/db/postgres/sqlc/query.sql (adjust query shapes, casts,
or types that produced the missing/incorrect Go code) and then run sqlc generate
to regenerate query.sql.go so changes to functions/types in package queries are
preserved; do not modify query.sql.go manually.

2720-2743: ⚠️ Potential issue | 🟠 Major

Add ::jsonb cast to @markers parameter in UpsertVtxo query for consistency with UpdateVtxoMarkers.

The source query.sql shows UpsertVtxo passes @markers (json.RawMessage) without a cast, while the identically-typed UpdateVtxoMarkers explicitly casts with @markers::jsonb. Since both statements assign to the same jsonb markers column and use json.RawMessage, the inconsistency creates fragility: INSERT/UPSERT operations may fail when lib/pq's parameter-type inference does not handle the uncast json.RawMessage as jsonb.

Fix in internal/infrastructure/db/postgres/sqlc/query.sql: cast the markers parameter to jsonb in the VALUES clause (similar to line 2349 in UpdateVtxoMarkers).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/postgres/sqlc/queries/query.sql.go` around lines
2720 - 2743, The UpsertVtxo query is passing the markers parameter without an
explicit type which can break when lib/pq doesn't infer jsonb; update the VALUES
clause of the UpsertVtxo statement to cast the markers parameter to jsonb (i.e.,
use the same `@markers`::jsonb style used in UpdateVtxoMarkers) so the markers
column (jsonb) is always supplied as jsonb and the ON CONFLICT/UPDATE assignment
to markers continues to work reliably.
🧹 Nitpick comments (6)
internal/core/application/service.go (1)

1100-1125: Sort parentMarkerIDs before emitting the accepted event.

This slice is built from map iteration, so its order is nondeterministic. The same logical parent set can therefore serialize differently across runs, which is brittle once it is persisted in events/JSON.

♻️ Suggested change
 	parentMarkerIDs := make([]string, 0, len(parentMarkerSet))
 	for id := range parentMarkerSet {
 		parentMarkerIDs = append(parentMarkerIDs, id)
 	}
+	slices.Sort(parentMarkerIDs)
 
 	change, err := offchainTx.Accept(
 		fullySignedArkTx, signedCheckpointTxsMap,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/application/service.go` around lines 1100 - 1125, The
parentMarkerIDs slice is populated from parentMarkerSet (map iteration) and thus
has nondeterministic order; sort parentMarkerIDs (e.g., call
sort.Strings(parentMarkerIDs)) after building it and before calling
offchainTx.Accept so the emitted accepted event is deterministic, and add the
required import for the sort package if missing; reference variables:
parentMarkerSet, parentMarkerIDs and the call offchainTx.Accept.
internal/infrastructure/db/sqlite/marker_repo.go (2)

142-185: "Bulk" sweep methods are actually per-row inserts inside a transaction.

PR objectives explicitly call for "bulk INSERTs into swept_markers" to keep sweeping cheap at 20k+ depth. In this SQLite implementation both BulkSweepMarkers (142–162) and SweepVtxoOutpoints (164–185) loop one INSERT per element. For a window covering thousands of markers/outpoints this becomes thousands of round-trips even though it's a single transaction. SQLite supports multi-VALUES inserts — prefer a single statement per batch (chunked to SQLITE_MAX_VARIABLE_NUMBER / params-per-row):

  • Add a sqlc query such as BulkInsertSweptMarkers that takes sqlc.slice and performs INSERT ... VALUES (?, ?), (?, ?), ... ON CONFLICT DO NOTHING, or
  • Build the multi-row statement dynamically in Go and execute once per chunk.

Same guidance for CreateRootMarkersForVtxos (lines 314–340).
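A sketch of the second option (imports: context, database/sql, strings), assuming a swept_marker(marker_id, swept_at) shape consistent with InsertSweptMarker; the helper name and chunk size are illustrative:

// Hedged sketch: one multi-row INSERT per chunk instead of one INSERT per
// marker. 999 is SQLite's historical default SQLITE_MAX_VARIABLE_NUMBER
// (newer builds allow 32766).
func bulkInsertSweptMarkers(ctx context.Context, tx *sql.Tx, markerIDs []string, sweptAt int64) error {
	const paramsPerRow = 2
	const maxParams = 999
	chunkSize := maxParams / paramsPerRow
	for start := 0; start < len(markerIDs); start += chunkSize {
		end := start + chunkSize
		if end > len(markerIDs) {
			end = len(markerIDs)
		}
		batch := markerIDs[start:end]
		placeholders := make([]string, len(batch))
		args := make([]any, 0, len(batch)*paramsPerRow)
		for i, id := range batch {
			placeholders[i] = "(?, ?)"
			args = append(args, id, sweptAt)
		}
		query := "INSERT OR IGNORE INTO swept_marker (marker_id, swept_at) VALUES " +
			strings.Join(placeholders, ", ")
		if _, err := tx.ExecContext(ctx, query, args...); err != nil {
			return err
		}
	}
	return nil
}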

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/sqlite/marker_repo.go` around lines 142 - 185,
Both BulkSweepMarkers and SweepVtxoOutpoints currently perform per-row INSERTs
inside a transaction; change them to perform batched multi-row INSERTs instead
(chunked to SQLITE_MAX_VARIABLE_NUMBER / params-per-row) to avoid thousands of
round-trips: either add sqlc queries like BulkInsertSweptMarkers that accept
slices and emit INSERT ... VALUES (...),(...),... ON CONFLICT DO NOTHING, or
dynamically build and exec a single multi-row statement per chunk in the
markerRepository methods (BulkSweepMarkers and SweepVtxoOutpoints); apply the
same batching approach to CreateRootMarkersForVtxos so all large sweeps use
chunked multi-value INSERTs rather than per-row inserts.

380-411: GetVtxoChainByMarkers issues one full-scan LIKE query per marker.

Combined with the SQLite query that already scans vtxo_vw, calling this with N markers produces N full table scans plus an in-memory dedup. For preloadByMarkers this scales with chain depth (N/MarkerInterval), so the "bulk fetch" benefit narrows on large chains.

Options, in order of effort:

  1. Add a sqlc query that accepts sqlc.slice('ids') and unions the per-marker sub-queries (still LIKE but one round-trip).
  2. Re-express using EXISTS (SELECT 1 FROM json_each(markers) j WHERE j.value IN (…)) inside a CTE — single statement, more accurate matching, still scan-bound in SQLite but avoids the Go-side fanout.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/sqlite/marker_repo.go` around lines 380 - 411,
GetVtxoChainByMarkers currently issues one LIKE query per marker causing N full
scans; replace the per-marker loop with a single SQL query and update the method
to call it. Add a new sqlc query (e.g., SelectVtxoChainByMarkers(ids
sqlc.slice('ids'))) that accepts a slice of marker IDs and returns distinct vtxo
rows (use UNION of per-marker subqueries or a single query with json_each/EXISTS
and SELECT DISTINCT txid, vout to dedupe). Then update
markerRepository.GetVtxoChainByMarkers to call querier.SelectVtxoChainByMarkers
once with the markerIDs slice, map the returned rows via
rowToVtxoFromChainQuery, and remove the per-marker loop and seen map.
internal/infrastructure/db/sqlite/sqlc/query.sql (1)

548-550: Parameter style inconsistency in InsertSweptVtxo.

Uses positional ? placeholders while every other statement in this file uses named @ parameters (e.g., InsertSweptMarker right above). sqlc will still generate it, but the positional form yields parameter names like Column1/Column2/Column3 in Go instead of Txid/Vout/SweptAt. Align with the rest of the file.

♻️ Proposed diff
 -- name: InsertSweptVtxo :exec
 INSERT OR IGNORE INTO swept_vtxo (txid, vout, swept_at)
-VALUES (?, ?, ?);
+VALUES (@txid, @vout, @swept_at);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/sqlite/sqlc/query.sql` around lines 548 - 550,
Replace the positional parameter placeholders in the SQL statement for
InsertSweptVtxo with named @ parameters to match the rest of the file;
specifically change the VALUES (?, ?, ?) usage to use `@txid`, `@vout`, `@swept_at`
(matching columns txid, vout, swept_at) so sqlc generates Go params named
Txid/Vout/SweptAt instead of Column1/2/3.
internal/infrastructure/db/postgres/sqlc/query.sql (1)

475-488: Cycle handling in the recursive CTE — functionally safe, but worth a comment.

Unlike SQLite where the past review comment was addressed via json_each, this PG version uses plain UNION (not UNION ALL), so Postgres' recursive-CTE semantics deduplicate the working set and a cycle like A → B → A self-terminates. That's subtle; a one-line SQL comment stating "UNION (set semantics) handles cycles by filtering rows already produced" would prevent a future refactor to UNION ALL from silently introducing an infinite loop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/postgres/sqlc/query.sql` around lines 475 - 488,
The recursive CTE GetDescendantMarkerIds relies on UNION's set semantics to
deduplicate rows and thus prevent infinite loops on cycles; add a concise SQL
comment above the WITH RECURSIVE descendant_markers(...) noting that "UNION (set
semantics) handles cycles by filtering rows already produced, do not change to
UNION ALL" so future maintainers won't convert to UNION ALL and accidentally
reintroduce infinite recursion; reference descendant_markers and the use of
UNION in the comment.
internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go (1)

100-113: NOT IN against swept_marker is fine today but fragile; prefer NOT EXISTS / anti-join.

GetDescendantMarkerIds terminates with WHERE descendant_markers.id NOT IN (SELECT sm.marker_id FROM swept_marker sm). Correctness today relies on swept_marker.marker_id being NOT NULL (PK) — if that ever changes or a view is substituted, a single NULL in the subquery collapses the result to empty silently. NOT EXISTS is also typically better for SQLite's planner on larger swept_marker tables. Same applies to the Postgres twin (Line 147 in postgres/sqlc/queries/query.sql.go).

Suggested query shape (apply in both `.sql` sources, regenerate)
-SELECT descendant_markers.id AS marker_id FROM descendant_markers
-WHERE descendant_markers.id NOT IN (SELECT sm.marker_id FROM swept_marker sm)
+SELECT dm.id AS marker_id
+FROM descendant_markers dm
+WHERE NOT EXISTS (
+    SELECT 1 FROM swept_marker sm WHERE sm.marker_id = dm.id
+)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go` around lines 100
- 113, The recursive query GetDescendantMarkerIds uses "WHERE
descendant_markers.id NOT IN (SELECT sm.marker_id FROM swept_marker sm)" which
is fragile if sm.marker_id can ever be NULL; change this to an anti-join using
NOT EXISTS (e.g. WHERE NOT EXISTS (SELECT 1 FROM swept_marker sm WHERE
sm.marker_id = descendant_markers.id)) or a LEFT JOIN ... WHERE sm.marker_id IS
NULL to avoid NULL-subquery issues; update the getDescendantMarkerIds SQL in
query.sql.go (referencing descendant_markers, swept_marker, marker_id) and make
the corresponding change in the Postgres twin so you can regenerate the sqlc
artifacts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/core/application/indexer.go`:
- Around line 611-620: The preload step currently aborts on any marker-related
error; change preloadByMarkers usage so transient marker repo errors (from
MarkerRepo.GetVtxoChainByMarkers or MarkerRepo.GetMarkersByIds surfaced inside
preloadByMarkers) are logged and ignored rather than returned: inside
preloadByMarkers (or at the call site in indexer.go where preloadByMarkers is
invoked), catch errors from marker lookups, call the existing logger to record
the error and context, and continue without returning an error so the caller can
proceed relying on per-hop Vtxos().GetVtxos + ensureVtxosCached fallback; keep
real failures from Vtxos().GetVtxos propagated as before but ensure
marker-retrieval failures do not hard-abort GetVtxoChain.

In `@internal/core/application/sweeper.go`:
- Around line 767-778: GetAllChildrenVtxos is being called with only vtxo.Txid
which returns descendants of the whole transaction and can include siblings;
change the lookup to be outpoint-scoped and pass only descendants for that
output. Replace the call to s.repoManager.Vtxos().GetAllChildrenVtxos(ctx,
vtxo.Txid) with an outpoint-aware query (e.g.,
s.repoManager.Vtxos().GetAllChildrenVtxosByOutpoint(ctx, vtxo.Txid, vtxo.Vout)
or extend GetAllChildrenVtxos to accept an outpoint struct), ensure the
resulting childrenVtxos only include descendants of that txid:vout, then call
s.repoManager.Markers().SweepVtxoOutpoints(ctx, childrenVtxos, sweptAt) as
before; update the Vtxos repo implementation accordingly to perform
txid+vout-scoped traversal.

In
`@internal/infrastructure/db/postgres/migration/20260416120000_add_swept_vtxo.down.sql`:
- Around line 1-54: The down migration currently drops swept_vtxo and recreates
vtxo_vw/intent_with_inputs_vw without restoring checkpoint-sweep state, silently
resurrecting swept VTXOs; instead either implement a translation that recreates
swept_vtxo from swept_marker/markers before dropping it or make the migration
explicitly irreversible. Modify 20260416120000_add_swept_vtxo.down.sql: remove
the DROP TABLE IF EXISTS swept_vtxo + view recreation path and replace with a
RAISE EXCEPTION 'irreversible migration: cannot restore swept_vtxo state' (or,
if you prefer a restore, create swept_vtxo and populate it by joining
swept_marker to vtxo via markers before recreating vtxo_vw and
intent_with_inputs_vw) so vtxo_vw, intent_with_inputs_vw and swept_vtxo
consistency is preserved.

In `@internal/infrastructure/db/sqlite/marker_repo.go`:
- Around line 284-312: Wrap the three DB calls in SweepVtxosByMarker (GetMarker,
CountUnsweptVtxosByMarkerId, InsertSweptMarker) inside a single transaction
using the repository's execTx helper: start a transaction, run GetMarker, then
CountUnsweptVtxosByMarkerId, then InsertSweptMarker within that tx context,
commit on success and rollback on error, and return the counted value; this
ensures the returned count matches the sweep insert atomically for
metrics/accounting while keeping the existing idempotent insert behavior.
- Around line 540-549: parseMarkersJSON currently swallows json.Unmarshal errors
and returns nil, making corrupt markers indistinguishable from empty ones;
change parseMarkersJSON(markersJSON string) to return ([]string, error) instead
of silently returning nil, have it return (nil, err) when json.Unmarshal fails
(or (nil, nil) when markersJSON==""), and update all callers of parseMarkersJSON
to handle the error (log at warn/error or propagate) so corrupt markers are
surfaced rather than treated as empty.

In `@internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go`:
- Around line 93-98: The marker-ID nullability is inconsistent across SQLite
queries (CountUnsweptVtxosByMarkerId, SelectVtxosByMarkerId vs
SelectVtxoChainByMarker); make them consistent by ensuring the sql.NullString
parameter's Valid flag is set deterministically before use—update
CountUnsweptVtxosByMarkerId and SelectVtxosByMarkerId to set markerID.Valid =
len(markerID.String) > 0 (or markerID.Valid = markerID.String != "") before
calling q.db.QueryRowContext / q.db.QueryContext so all three functions treat
empty IDs the same, or alternatively change all three signatures to accept plain
string and adjust the SQL (CAST/COALESCE) so sqlc emits non-nullable strings;
pick one approach and apply it to the named functions to restore consistency.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8faa340e-f86d-44fa-80ef-cb16a7d8d6d1

📥 Commits

Reviewing files that changed from the base of the PR and between a266382 and 2ff4b78.

📒 Files selected for processing (25)
  • internal/core/application/indexer.go
  • internal/core/application/indexer_bench_test.go
  • internal/core/application/indexer_test.go
  • internal/core/application/service.go
  • internal/core/application/sweeper.go
  • internal/core/application/sweeper_test.go
  • internal/core/domain/marker_repo.go
  • internal/infrastructure/db/badger/marker_repo.go
  • internal/infrastructure/db/postgres/marker_repo.go
  • internal/infrastructure/db/postgres/migration/20260416120000_add_swept_vtxo.down.sql
  • internal/infrastructure/db/postgres/migration/20260416120000_add_swept_vtxo.up.sql
  • internal/infrastructure/db/postgres/sqlc/queries/models.go
  • internal/infrastructure/db/postgres/sqlc/queries/query.sql.go
  • internal/infrastructure/db/postgres/sqlc/query.sql
  • internal/infrastructure/db/postgres/vtxo_repo.go
  • internal/infrastructure/db/service.go
  • internal/infrastructure/db/service_test.go
  • internal/infrastructure/db/sqlite/marker_repo.go
  • internal/infrastructure/db/sqlite/migration/20260416120000_add_swept_vtxo.down.sql
  • internal/infrastructure/db/sqlite/migration/20260416120000_add_swept_vtxo.up.sql
  • internal/infrastructure/db/sqlite/round_repo.go
  • internal/infrastructure/db/sqlite/sqlc/queries/models.go
  • internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go
  • internal/infrastructure/db/sqlite/sqlc/query.sql
  • internal/infrastructure/db/sqlite/vtxo_repo.go
✅ Files skipped from review due to trivial changes (1)
  • internal/core/domain/marker_repo.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • internal/infrastructure/db/postgres/marker_repo.go
  • internal/infrastructure/db/postgres/sqlc/queries/models.go
  • internal/core/application/indexer_bench_test.go
  • internal/infrastructure/db/service.go
  • internal/infrastructure/db/badger/marker_repo.go

Comment on lines +378 to 400
// Determine page size.
// Backward compat: nil page + empty token → return full chain (no pagination).
pageSize := math.MaxInt32
if page != nil {
	pageSize = int(page.PageSize)
	if pageSize <= 0 {
		pageSize = maxPageSizeVtxoChain
	}
} else if pageToken != "" {
	pageSize = maxPageSizeVtxoChain
}

// Determine frontier: decode pageToken, or use [vtxoKey] for first page.
var frontier []domain.Outpoint
if pageToken != "" {
	decoded, err := i.decodeChainCursor(pageToken)
	if err != nil {
		return nil, fmt.Errorf("invalid page_token: %w", err)
	}
	frontier = decoded
} else {
	frontier = []domain.Outpoint{vtxoKey}
}
Contributor


⚠️ Potential issue | 🟡 Minor

Backward-compat fallback leaves GetVtxoChain memory-unbounded.

When both page == nil and pageToken == "", pageSize becomes math.MaxInt32, so walkVtxoChain collects the full chain into chain/allOutpoints/visited/vtxoCache. For the 20k+ depths this PR targets, a public-exposure call without pagination can still OOM the server. The prior reviewer's thread already flagged the scalability concern for GetVtxoChainByIntent; the same hazard applies here for legacy clients. Consider defaulting to maxPageSizeVtxoChain and emitting a NextPageToken, then documenting the change, rather than preserving unbounded traversal.
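A minimal sketch of the suggested default; the Page type and constant names are taken from the snippet above:

// Hedged sketch: always bound the page size, so walkVtxoChain paginates even
// for legacy callers (they get a nextPageToken instead of the full chain).
func resolvePageSize(page *Page) int {
	if page != nil && page.PageSize > 0 {
		return int(page.PageSize)
	}
	return maxPageSizeVtxoChain
}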

Comment thread internal/core/application/indexer.go
Comment on lines +612 to +623
if i.repoManager.Markers() != nil {
	startVtxos, err := i.repoManager.Vtxos().GetVtxos(ctx, nextVtxos)
	if err != nil {
		return nil, nil, "", err
	}
	if err := i.preloadByMarkers(ctx, startVtxos, vtxoCache, offchainTxCache); err != nil {
		return nil, nil, "", err
	}
}

for len(nextVtxos) > 0 {
	if err := i.ensureVtxosCached(ctx, nextVtxos, vtxoCache, loadedMarkers); err != nil {
Contributor


⚠️ Potential issue | 🟡 Minor

Redundant GetVtxos for the frontier on the miss path.

When Markers() != nil but the frontier VTXO isn't persisted (e.g., bad client input), line 613 issues GetVtxos(frontier) which returns empty, preloadByMarkers is a no-op, and then ensureVtxosCached at line 623 issues GetVtxos(frontier) a second time for the same outpoints before ultimately erroring with "vtxo not found". Minor, but easy to avoid by seeding the cache with whatever the first call returned even on the empty result path — or by reusing the pre-fetch results in the main loop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/application/indexer.go` around lines 612 - 623, The code is
redundantly calling repoManager.Vtxos().GetVtxos(frontier) twice: once before
the loop and again inside ensureVtxosCached; to fix, have the initial GetVtxos
result be used to seed the vtxoCache/loadedMarkers so ensureVtxosCached can skip
re-fetching the same outpoints — specifically, when Markers() != nil capture
startVtxos returned by Vtxos().GetVtxos and pass those into preloadByMarkers and
also insert them into vtxoCache (and mark them in loadedMarkers) so that the
subsequent call path in ensureVtxosCached sees them as cached and does not call
Vtxos().GetVtxos for the same nextVtxos again.
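A sketch of the seeding fix, reusing the names from the snippet above; the cache key shape is an assumption:

if i.repoManager.Markers() != nil {
	startVtxos, err := i.repoManager.Vtxos().GetVtxos(ctx, nextVtxos)
	if err != nil {
		return nil, nil, "", err
	}
	// Seed the cache with the frontier fetch so ensureVtxosCached treats
	// these outpoints as loaded and skips a second GetVtxos round-trip.
	for _, v := range startVtxos {
		vtxoCache[v.Outpoint.String()] = v // cache key shape assumed
	}
	if err := i.preloadByMarkers(ctx, startVtxos, vtxoCache, offchainTxCache); err != nil {
		return nil, nil, "", err
	}
}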

Comment on lines +372 to +378
if len(spentVtxos) != len(spentVtxoKeys) {
log.Warnf(
"incomplete parent read: got %d of %d spent vtxos for tx %s",
len(spentVtxos), len(spentVtxoKeys), txid,
)
return
}
Contributor


⚠️ Potential issue | 🟠 Major

Don’t silently drop finalized offchain tx propagation on a short parent read.

Returning here means no TransactionEvent is emitted and no watcher is started for that finalized tx. A partial read now leaves those outputs untracked until some later recovery path happens to notice them. Please fail/retry this path, or fall back to metadata already carried by the accepted event instead of returning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/application/service.go` around lines 372 - 378, The current
code in the block checking len(spentVtxos) != len(spentVtxoKeys) returns early,
which silently drops propagation for a finalized offchain tx — instead, on a
short/partial parent read either retry/fail this path (propagate an error so the
caller can requeue/retry) or fall back to the accepted event metadata and
continue emitting the TransactionEvent and starting the watcher; specifically,
replace the early return in the block that references spentVtxos, spentVtxoKeys
and txid with logic that (a) attempts a retry/read-backoff (or returns an
explicit error) and if retries are exhausted, (b) constructs/uses the metadata
present on the accepted event to emit the TransactionEvent and invoke the
watcher startup path so outputs remain tracked rather than dropped.
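One way to implement the retry option, as a sketch; the repository interface name, attempt count, and backoff schedule are assumptions:

// Hedged sketch (imports: context, fmt, time): retries a short parent read a
// few times before surfacing an explicit error the caller can requeue on.
func getSpentVtxosWithRetry(
	ctx context.Context,
	repo domain.VtxoRepository, // interface name assumed
	keys []domain.Outpoint,
) ([]domain.Vtxo, error) {
	var lastErr error
	for attempt := 1; attempt <= 3; attempt++ {
		vtxos, err := repo.GetVtxos(ctx, keys)
		switch {
		case err != nil:
			lastErr = err
		case len(vtxos) != len(keys):
			lastErr = fmt.Errorf("got %d of %d spent vtxos", len(vtxos), len(keys))
		default:
			return vtxos, nil
		}
		time.Sleep(time.Duration(attempt) * 100 * time.Millisecond)
	}
	return nil, fmt.Errorf("incomplete parent read after retries: %w", lastErr)
}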

Comment thread internal/core/application/sweeper.go Outdated
Comment thread internal/infrastructure/db/sqlite/marker_repo.go
Comment thread internal/infrastructure/db/sqlite/marker_repo.go Outdated
Comment thread internal/infrastructure/db/sqlite/sqlc/queries/query.sql.go