Add streaming workflow design doc, renumber design docs #677
karthikiyer56 wants to merge 4 commits into feature/full-history
karthikiyer56 commented on Apr 15, 2026
- Add 02-streaming-workflow.md: streaming mode design covering startup validation, first-start .bin loading, per-ledger ingestion loop, three independent sub-flow transitions (LFS, events, txhash), crash recovery invariants, backfill-to-streaming migration, and error handling
- Rename 03-backfill-workflow.md → 01-backfill-workflow.md
- Update README with new numbering, reading order, and completeness status
- Add _ref-old-* files to .gitignore (local reference only)
Pull request overview
Adds and renumbers Full History “design-docs” documentation to cover both backfill and streaming ingestion workflows, and updates supporting references/ignore rules.
Changes:
- Added a new streaming workflow design doc (startup validation, ingestion loop, transitions, recovery invariants).
- Renumbered/added the backfill workflow doc to `01-*` and updated the design-docs README with new reading order/status.
- Updated `.gitignore` to ignore local `_ref-old-*` reference files.
Reviewed changes
Copilot reviewed 2 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| full-history/design-docs/README.md | Reworked overview, doc list/status, reading order, and shared concepts. |
| full-history/design-docs/02-streaming-workflow.md | New streaming-mode design covering ingestion, transitions, and crash recovery. |
| full-history/design-docs/01-backfill-workflow.md | Backfill workflow design doc under new numbering. |
| .gitignore | Ignores local `_ref-old-*` files under full-history/design-docs/. |
    if stored_cpi is None:
        # First ever run writes the value. Backfill writes this on first run;
        # if streaming runs first (no prior backfill), streaming writes it.
        meta_store.put("config:chunks_per_txhash_index", config.chunks_per_txhash_index)
Step 1 says streaming can be the first-ever run and will write config:chunks_per_txhash_index, but Step 2 immediately fatals if no backfill chunk data exists (“run backfill first”). Please reconcile this (either document that backfill is required before streaming, or describe the supported streaming-from-scratch bootstrap behavior).
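One possible reconciliation, as a minimal sketch only: it assumes backfill is required before streaming, so the backfill check runs before the first-run write. `MetaStore`, `validate_startup`, and the `has_backfill_chunks` flag are hypothetical stand-ins, not names from the design doc, and the mismatch branch is an assumption about how a changed config value would be handled.

```python
CPI_KEY = "config:chunks_per_txhash_index"


class MetaStore:
    """Hypothetical in-memory stand-in for the metadata store."""

    def __init__(self):
        self._kv = {}

    def get(self, key):
        return self._kv.get(key)

    def put(self, key, value):
        self._kv[key] = value


def validate_startup(meta_store, configured_cpi, has_backfill_chunks):
    """Return the effective chunks_per_txhash_index or raise on invalid state."""
    # Assumed ordering: refuse to start before touching the meta store,
    # so a failed start never leaves a half-initialized config behind.
    if not has_backfill_chunks:
        raise RuntimeError("no backfill chunk data found; run backfill first")

    stored_cpi = meta_store.get(CPI_KEY)
    if stored_cpi is None:
        # Only reachable if backfill produced chunks without writing the key.
        meta_store.put(CPI_KEY, configured_cpi)
    elif stored_cpi != configured_cpi:
        # Assumption: the value is immutable once written.
        raise RuntimeError(
            f"{CPI_KEY} mismatch: stored={stored_cpi}, configured={configured_cpi}"
        )
    return meta_store.get(CPI_KEY)
```

With this ordering, the "streaming writes it" comment would only describe a corner case; if streaming-from-scratch is meant to be supported instead, the check would have to be relaxed rather than hoisted.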
    # 4. If index boundary: trigger index-level transitions.
    # The index boundary ledger is always also a chunk boundary ledger.
    current_index = current_chunk // chunks_per_txhash_index
    if ledger_seq == index_last_ledger(current_index):
In process_ledger, chunks_per_txhash_index is referenced but not defined in the pseudocode’s scope. To keep the design unambiguous, pass it in (e.g., via config) or reference config.chunks_per_txhash_index as done elsewhere in the doc.
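A sketch of what passing the value through `config` could look like. The constants `FIRST_LEDGER` and `LEDGERS_PER_CHUNK`, the `Config` dataclass, the helper `boundaries`, and the two-argument form of `index_last_ledger` are illustrative assumptions; the design doc's actual chunk size and formulas may differ.

```python
from dataclasses import dataclass

FIRST_LEDGER = 2            # assumption: ledger numbering starts at 2
LEDGERS_PER_CHUNK = 10_000  # assumption: illustrative chunk size only


@dataclass(frozen=True)
class Config:
    chunks_per_txhash_index: int


def chunk_of(ledger_seq):
    """Chunk number that contains ledger_seq."""
    return (ledger_seq - FIRST_LEDGER) // LEDGERS_PER_CHUNK


def chunk_last_ledger(chunk):
    """Last ledger sequence belonging to the given chunk."""
    return (chunk + 1) * LEDGERS_PER_CHUNK + FIRST_LEDGER - 1


def index_last_ledger(index, config):
    """Last ledger of the last chunk in the given txhash index."""
    last_chunk = (index + 1) * config.chunks_per_txhash_index - 1
    return chunk_last_ledger(last_chunk)


def boundaries(ledger_seq, config):
    """Return (at_chunk_boundary, at_index_boundary) for one ledger."""
    current_chunk = chunk_of(ledger_seq)
    at_chunk = ledger_seq == chunk_last_ledger(current_chunk)
    # Every name used here is in scope: the index width comes from config.
    current_index = current_chunk // config.chunks_per_txhash_index
    at_index = ledger_seq == index_last_ledger(current_index, config)
    return at_chunk, at_index
```

Under these assumed constants and `chunks_per_txhash_index = 4`, ledger 10001 closes chunk 0 only, while ledger 40001 closes both chunk 3 and index 0, which matches the comment that an index boundary is always also a chunk boundary.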
- Replace DAG mermaid diagram with pseudocode dependency comments
- Convert remaining prose paragraphs to bullet lists
- Restructure crash recovery invariants as bullet sub-lists
- Fix section heading from "First-Boot TxHash Store" to "Load Backfill TxHash Data into RocksDB"
- Replace all "first boot" with "first start in streaming mode"
- Add inline comments explaining boundary math (subtract 2, chunk_last_ledger formula)
- Add concise main flow (run_streaming) near top of doc
- Add dynamic vs static DAG explanation
- Call out .bin loading as one-time cost

- .bin files only exist if backfill left a partial txhash index
- If backfill ended on an index boundary, step 3 is a no-op
- Remove language implying .bin loading always happens on first start

- The events system persists (term_key, event_id) pairs per ledger in the embedded DB for crash recovery. On restart, deltas are replayed to rebuild in-memory bitmaps.
- WAL is an internal DB mechanism; nobody reads it directly.
- Replace all "WAL-backed deltas" / "Events WAL" with "persisted deltas" / "persisted index deltas" for events-specific references.
- RocksDB WAL references for ledger/txhash stores remain unchanged (correct usage).
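
The replay described in the last commit can be pictured roughly as follows. This is a hedged sketch, not the events system's code: `replay_deltas` is a hypothetical name, a plain dict stands in for the embedded DB, and Python sets stand in for the in-memory bitmaps.

```python
from collections import defaultdict


def replay_deltas(persisted_deltas):
    """Rebuild term_key -> event-id sets from per-ledger persisted deltas.

    persisted_deltas maps ledger_seq -> list of (term_key, event_id) pairs,
    mimicking what the embedded DB would return on restart.
    """
    bitmaps = defaultdict(set)
    # Replay in ledger order so the rebuilt state matches ingestion order.
    for ledger_seq in sorted(persisted_deltas):
        for term_key, event_id in persisted_deltas[ledger_seq]:
            bitmaps[term_key].add(event_id)
    return dict(bitmaps)
```

Because adding an id to a set is idempotent, replaying the same ledger's deltas twice leaves the result unchanged, which is the property a crash-recovery replay would want.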