Development TODOs: Financial Report Processing API

M1: Project Bootstrap

  • Choose implementation stack (Node.js + Fastify OR Python + FastAPI)
  • Scaffold project (deps, scripts, formatter, linter, tests)
  • Implement config loader and validation for .env and ./config/values.json
  • Ensure directories exist: ./reports, ./preprocessing, ./processed, ./config
  • Add structured logging and basic metrics scaffolding
  • Create initial unit tests for utilities (env, fs, json helpers)
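
The directory and config items above could start from something like the following. This is a minimal sketch, assuming the Python option is chosen and that ./config/values.json holds a non-empty JSON object. Its real schema is not specified here, and the helper names ensure_directories and load_values_config are hypothetical.

```python
import json
from pathlib import Path

# Directory names taken from the M1 list; everything else is illustrative.
REQUIRED_DIRS = ("reports", "preprocessing", "processed", "config")


def ensure_directories(base: Path) -> list[Path]:
    """Create each required directory under base if it does not exist yet."""
    paths = []
    for name in REQUIRED_DIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths


def load_values_config(path: Path) -> dict:
    """Load values.json and apply a basic shape check.

    Assumption: the file is a non-empty JSON object. Field-level
    validation would follow once the schema is settled.
    """
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError as exc:
        raise ValueError(f"missing config file: {path}") from exc
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON in {path}: {exc}") from exc
    if not isinstance(data, dict) or not data:
        raise ValueError(f"{path} must contain a non-empty JSON object")
    return data
```

Raising ValueError for every config problem keeps startup failures in one place, which also makes the planned utility tests straightforward to write.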

M2: Core Processing Pipeline

  • Implement Step 1: cache check against LAST_MODIFIED in .env
  • Implement Step 2: list PDFs in ./reports and process them in parallel with bounded concurrency
  • Implement Steps 3.1–3.6 as a sequential per-file flow
    • 3.1 Processed file check (./processed/[filename].json)
    • 3.2 Preprocessing file check (./preprocessing/[filename])
    • 3.3 Appendix detection via Gemini (start/end pages)
    • 3.4 PDF appendix extraction to ./preprocessing/[filename]
    • 3.5 Field extraction via Gemini using ./config/values.json
    • 3.6 Save JSON to ./processed/[filename].json (validate company name)
  • Implement Step 4: completion synchronization and validation of expected outputs
  • Centralize error handling and map failures to HTTP 500 per the PRD
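
One way to approach the bounded parallelism in Step 2 is an asyncio semaphore wrapped around the per-file flow. The sketch below assumes the Python option. The names list_pdfs and process_all, and the default limit of 4, are hypothetical, and process_one stands in for the sequential Steps 3.1 to 3.6.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable


def list_pdfs(reports_dir: Path) -> list[Path]:
    """Return the PDFs in the reports directory in a stable order."""
    return sorted(reports_dir.glob("*.pdf"))


async def process_all(
    pdfs: list[Path],
    process_one: Callable[[Path], Awaitable[object]],
    max_concurrency: int = 4,  # assumed default; tune during M4
) -> list[object]:
    """Run process_one for every PDF with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(pdf: Path) -> object:
        # Each file waits for a free slot, then runs its steps in order.
        async with semaphore:
            return await process_one(pdf)

    # gather returns results in input order and raises the first failure,
    # which a central error handler could then map to a 500 response.
    return await asyncio.gather(*(guarded(pdf) for pdf in pdfs))
```

Because gather only returns once every task has finished, it also serves as a natural synchronization point before the Step 4 validation of expected outputs.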

M3: Consolidation and API

  • Implement Step 5: consolidate ./processed/*.json and write ./processed/result.json
  • Implement Step 6: API endpoint to serve consolidated JSON (omit LAST_MODIFIED)
  • Add end-to-end tests with fixture PDFs and stubbed Gemini
  • Validate idempotency with unchanged LAST_MODIFIED
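
Steps 5 and 6 might be sketched as below, assuming the Python option. The output shape (a "reports" object keyed by file stem) and the assumption that LAST_MODIFIED, if present, is a top-level key are both illustrative and not taken from the PRD. The names consolidate and public_view are hypothetical.

```python
import json
from pathlib import Path

RESULT_NAME = "result.json"


def consolidate(processed_dir: Path) -> dict:
    """Merge every per-report JSON file into one result and write it to disk."""
    reports = {}
    for path in sorted(processed_dir.glob("*.json")):
        if path.name == RESULT_NAME:
            # Skip the previous consolidated output so reruns stay idempotent.
            continue
        reports[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    result = {"reports": reports}
    (processed_dir / RESULT_NAME).write_text(
        json.dumps(result, indent=2), encoding="utf-8"
    )
    return result


def public_view(result: dict) -> dict:
    """Return a copy of the payload for the API without LAST_MODIFIED."""
    return {key: value for key, value in result.items() if key != "LAST_MODIFIED"}
```

Excluding result.json from the glob matters because it lives in the same directory as its inputs; without that check a second run would fold the previous result into the new one.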

M4: Hardening and Ops

  • Add retries with exponential backoff for Gemini calls
  • Tune concurrency, timeouts, memory use; stream PDFs where possible
  • Add observability: request/file correlation IDs, step durations, counts
  • Extend negative tests (missing files, invalid PDFs, Gemini errors)
  • Document setup/run/deploy in README and create runbook
  • (Optional) Add CI workflow to run tests and lint on PRs
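
The retry item for Gemini calls could start from a small generic wrapper like the one below. This is a sketch under stated assumptions: the attempt count and delays are placeholder values, retry_with_backoff is a hypothetical helper, and the call argument stands in for whatever function ends up wrapping the Gemini request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    *,
    max_attempts: int = 4,      # assumed default
    base_delay: float = 0.5,    # seconds before the first retry (assumed)
    max_delay: float = 8.0,     # upper bound on any single wait (assumed)
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call call() until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller's error handling take over.
                raise
            # Waits grow as base_delay * 1, 2, 4, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter lets the negative tests run without real waiting. In practice retry_on would likely be narrowed to transient errors such as timeouts and rate limits, and a random jitter could be added to the delay so that parallel files do not retry in lockstep.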