Add streaming workflow design doc, renumber design docs #677

Draft
karthikiyer56 wants to merge 4 commits into feature/full-history from karthik/streaming-design-doc

@karthikiyer56
Contributor

  • Add 02-streaming-workflow.md: streaming mode design covering startup validation, first-start .bin loading, per-ledger ingestion loop, three independent sub-flow transitions (LFS, events, txhash), crash recovery invariants, backfill-to-streaming migration, and error handling
  • Rename 03-backfill-workflow.md → 01-backfill-workflow.md
  • Update README with new numbering, reading order, and completeness status
  • Add _ref-old-* files to .gitignore (local reference only)


Copilot AI left a comment

Pull request overview

Adds and renumbers Full History “design-docs” documentation to cover both backfill and streaming ingestion workflows, and updates supporting references/ignore rules.

Changes:

  • Added a new streaming workflow design doc (startup validation, ingestion loop, transitions, recovery invariants).
  • Renumbered/added the backfill workflow doc to 01-* and updated the design-docs README with new reading order/status.
  • Updated .gitignore to ignore local _ref-old-* reference files.

Reviewed changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| full-history/design-docs/README.md | Reworked overview, doc list/status, reading order, and shared concepts. |
| full-history/design-docs/02-streaming-workflow.md | New streaming-mode design covering ingestion, transitions, and crash recovery. |
| full-history/design-docs/01-backfill-workflow.md | Backfill workflow design doc under new numbering. |
| .gitignore | Ignores local _ref-old-* files under full-history/design-docs/. |

Comment on lines +172 to +175
if stored_cpi is None:
    # First ever run writes the value. Backfill writes this on first run;
    # if streaming runs first (no prior backfill), streaming writes it.
    meta_store.put("config:chunks_per_txhash_index", config.chunks_per_txhash_index)

Copilot AI Apr 16, 2026

Step 1 says streaming can be the first-ever run and will write config:chunks_per_txhash_index, but Step 2 immediately fatals if no backfill chunk data exists (“run backfill first”). Please reconcile this (either document that backfill is required before streaming, or describe the supported streaming-from-scratch bootstrap behavior).
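
One way to reconcile the two steps, sketched here under the assumption that backfill is required before streaming: check for backfill chunk data before touching the meta store, so streaming never writes config on a run that is about to fail. The function name `validate_startup`, the `has_backfill_chunks` flag, the `FatalStartupError` type, and the dict standing in for `meta_store` are all hypothetical. Treating a stored/configured mismatch as fatal is also an assumption, not something stated in the design doc.

```python
class FatalStartupError(Exception):
    """Raised when streaming cannot start safely (hypothetical error type)."""


KEY = "config:chunks_per_txhash_index"


def validate_startup(meta_store: dict, configured_cpi: int,
                     has_backfill_chunks: bool) -> int:
    """Return the effective chunks_per_txhash_index or raise.

    meta_store is a plain dict standing in for the real meta store.
    """
    # Assumed ordering: refuse to start before writing anything, so a failed
    # start leaves the meta store untouched.
    if not has_backfill_chunks:
        raise FatalStartupError("no backfill chunk data found; run backfill first")

    stored_cpi = meta_store.get(KEY)
    if stored_cpi is None:
        # Only reachable when chunk data exists but the key was never written.
        meta_store[KEY] = configured_cpi
        return configured_cpi

    # Assumption: a changed value is treated as a fatal misconfiguration.
    if stored_cpi != configured_cpi:
        raise FatalStartupError(
            f"{KEY} mismatch: stored={stored_cpi}, configured={configured_cpi}"
        )
    return stored_cpi
```

If streaming-from-scratch is meant to be supported instead, the first check would be dropped and the doc would need to describe that bootstrap path.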

Comment on lines +403 to +406
# 4. If index boundary: trigger index-level transitions.
# The index boundary ledger is always also a chunk boundary ledger.
current_index = current_chunk // chunks_per_txhash_index
if ledger_seq == index_last_ledger(current_index):

Copilot AI Apr 16, 2026

In process_ledger, chunks_per_txhash_index is referenced but not defined in the pseudocode’s scope. To keep the design unambiguous, pass it in (e.g., via config) or reference config.chunks_per_txhash_index as done elsewhere in the doc.
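
A minimal sketch of what passing the value in via config could look like. The `Config` dataclass, the `boundaries` helper, the field defaults, and the boundary formulas are illustrative assumptions. In particular, the `first_ledger = 2` offset is only inferred from the "subtract 2" wording in the commit notes and is not taken from the design doc itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Config:
    # All three defaults are placeholders for illustration only.
    ledgers_per_chunk: int = 10_000
    chunks_per_txhash_index: int = 1_000
    first_ledger: int = 2


def chunk_of(cfg: Config, ledger_seq: int) -> int:
    """Chunk that contains ledger_seq (assumed formula)."""
    return (ledger_seq - cfg.first_ledger) // cfg.ledgers_per_chunk


def chunk_last_ledger(cfg: Config, chunk: int) -> int:
    """Last ledger sequence in the given chunk (assumed formula)."""
    return (chunk + 1) * cfg.ledgers_per_chunk + cfg.first_ledger - 1


def index_last_ledger(cfg: Config, index: int) -> int:
    """Last ledger of the last chunk in the given txhash index."""
    last_chunk = (index + 1) * cfg.chunks_per_txhash_index - 1
    return chunk_last_ledger(cfg, last_chunk)


def boundaries(cfg: Config, ledger_seq: int) -> Tuple[bool, bool]:
    """Return (is_chunk_boundary, is_index_boundary) for ledger_seq."""
    current_chunk = chunk_of(cfg, ledger_seq)
    # The value is read from cfg, so nothing is referenced out of scope.
    current_index = current_chunk // cfg.chunks_per_txhash_index
    is_chunk = ledger_seq == chunk_last_ledger(cfg, current_chunk)
    is_index = ledger_seq == index_last_ledger(cfg, current_index)
    return is_chunk, is_index
```

With these formulas, every index boundary is also a chunk boundary, which matches the comment in the quoted pseudocode.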

- Replace DAG mermaid diagram with pseudocode dependency comments
- Convert remaining prose paragraphs to bullet lists
- Restructure crash recovery invariants as bullet sub-lists
- Fix section heading from "First-Boot TxHash Store" to "Load Backfill TxHash Data into RocksDB"
- Replace all "first boot" with "first start in streaming mode"
- Add inline comments explaining boundary math (subtract 2, chunk_last_ledger formula)
- Add concise main flow (run_streaming) near top of doc
- Add dynamic vs static DAG explanation
- Call out .bin loading as one-time cost
- .bin files only exist if backfill left a partial txhash index
- If backfill ended on an index boundary, step 3 is a no-op
- Remove language implying .bin loading always happens on first start
- Events system persists (term_key, event_id) pairs per ledger in the
  embedded DB for crash recovery. On restart, deltas are replayed to
  rebuild in-memory bitmaps.
- WAL is an internal DB mechanism — nobody reads it directly.
- Replace all "WAL-backed deltas" / "Events WAL" with "persisted deltas"
  / "persisted index deltas" for events-specific references.
- RocksDB WAL references for ledger/txhash stores remain unchanged
  (correct usage).
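
The events recovery note above (persisted (term_key, event_id) pairs replayed on restart) can be illustrated with a small sketch. The function name `replay_persisted_deltas`, the dict keyed by ledger sequence standing in for the embedded DB, and the use of Python sets in place of real bitmaps are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Mapping, Sequence, Set, Tuple

Delta = Tuple[str, int]  # (term_key, event_id)


def replay_persisted_deltas(
    deltas_by_ledger: Mapping[int, Sequence[Delta]],
) -> Dict[str, Set[int]]:
    """Rebuild in-memory term -> event-id sets from persisted per-ledger deltas.

    Sets stand in for bitmaps; adding an id twice is harmless, so replaying
    the same ledger more than once does not change the result.
    """
    bitmaps: Dict[str, Set[int]] = defaultdict(set)
    # Replay in ledger order, mirroring the order of original ingestion.
    for ledger_seq in sorted(deltas_by_ledger):
        for term_key, event_id in deltas_by_ledger[ledger_seq]:
            bitmaps[term_key].add(event_id)
    return dict(bitmaps)
```

Because the rebuild reads only the persisted pairs, it never needs to look at the database's internal WAL, which is consistent with the terminology change described in the commit notes.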