Currently, 4 models are hardcoded in MODEL_REGISTRY (run.py):
- GLM-OCR (0.9B)
- DeepSeek-OCR (4B)
- LightOnOCR-2 (1B)
- dots.ocr (1.7B)
Candidates to add:
- Qwen2.5-VL-7B — general-purpose VLM with OCR capability
- GOT-OCR2 — dedicated OCR model
- Florence-2 — Microsoft's vision foundation model
- Larger models (7B+) to test whether size helps on hard documents
Also consider making the registry configurable (via a YAML/TOML file, or a CLI flag that points at a custom script URL) so users can add their own models without forking.