This repository contains a minimal, end‑to‑end pipeline to analyze sections in 10‑K filings for several retail companies over the past several years.
- Scripts to download 10‑K filings from SEC EDGAR, extract the intended section, and save it to disk
- LLM-based tagging via Anthropic Claude
- EDA helpers and plotting stubs
- A CLI (`run.py`) to orchestrate the pipeline
- A standalone script (`run_report.py`) to generate a report from existing data
- Processed data containing the extracted section is stored in `data/processed/` as JSONL files
- Final results are under `outputs/`, which holds plots and tables for individual companies as well as aggregated evolution summaries
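
As an illustration of the JSONL layout, here is a minimal sketch of writing and reading one record per line. The helper names (`write_jsonl`, `read_jsonl`) and the record fields mentioned below are assumptions for illustration; the actual helpers in `utils/io.py` and the real schema may differ.

```python
import json
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

A record might look like `{"ticker": "XYZ", "fiscal_year": 2023, "section_text": "..."}` (hypothetical field names), with one such line per company-year.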
aibizops/
├── data/
│   └── processed/        # JSONL with extracted section per company-year
├── notebooks/
├── outputs/
│   ├── figures/          # charts
│   └── tables/           # CSVs/TSVs
├── src/aibizops/
│   ├── __init__.py
│   ├── edgar.py          # download & locate filings
│   ├── parse.py          # extract specific section
│   ├── llm.py            # Claude classification
│   ├── pipeline.py       # high-level pipeline steps
│   ├── eda/
│   │   ├── plots.py
│   │   ├── novelty.py
│   │   └── stats.py
│   └── utils/
│       ├── io.py
│       └── text.py
├── requirements.txt
├── run.py
└── run_report.py
- We focus on annual 10-K filings and only certain sections.
- We cover multiple years by walking each company's EDGAR submissions and selecting its most recent 10-Ks.
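
The selection step can be sketched as follows. This is a minimal illustration, not the code in `edgar.py`: it assumes the payload is the parsed JSON from EDGAR's submissions endpoint, where `filings.recent` holds parallel arrays such as `form`, `filingDate`, `accessionNumber`, and `primaryDocument`. The function name `latest_10k_filings` and the output keys are hypothetical.

```python
def latest_10k_filings(submissions, n=5):
    """Return up to n most recent 10-K filings from an EDGAR submissions payload."""
    recent = submissions["filings"]["recent"]
    # The payload is column-oriented: zip the parallel arrays into rows.
    rows = zip(
        recent["form"],
        recent["filingDate"],
        recent["accessionNumber"],
        recent["primaryDocument"],
    )
    annual = [
        {"filing_date": date, "accession_number": acc, "primary_document": doc}
        for form, date, acc, doc in rows
        if form == "10-K"  # exact match, so amendments such as 10-K/A are skipped
    ]
    # ISO dates (YYYY-MM-DD) sort correctly as strings; newest first.
    annual.sort(key=lambda filing: filing["filing_date"], reverse=True)
    return annual[:n]
```

Note that `filings.recent` only covers a company's most recent filings; older ones are listed in additional files referenced by the payload, which this sketch does not follow.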