A CLI toolkit for downloading, converting, quantizing, and uploading Hugging Face models. Supports full end-to-end pipelines from HF Hub to S3-compatible storage.
Note: The `convert`, `quantize`, and `pipeline` commands require llama.cpp. Run `mdl bootstrap-llamacpp` first to fetch and build it automatically (requires `git`, `cmake`, `make`).
- Batch downloads from Hugging Face Hub with resume and state tracking
- GGUF conversion via llama.cpp (`convert_hf_to_gguf.py`)
- Quantization to Q4_K_M, Q5_K_M, Q8_0, etc. via `llama-quantize`
- Ollama Modelfile generation — auto-detects chat format from the model's actual config files (supports ChatML, LLaMA-3, Gemma, Phi, Mistral, DeepSeek, and more)
- S3 upload to MinIO or any S3-compatible endpoint
- Pipeline mode — download → convert → quantize → upload in one command
- Bootstrap — fetch and build llama.cpp automatically
- Per-model error handling, dry-run mode, disk-space checks, and YAML configuration
- Python 3.11+
- uv (recommended) or pip
- git, cmake, make (for `bootstrap-llamacpp`)
```bash
git clone https://github.com/fuzzylabs/mdl.git
cd mdl
uv sync    # or: pip install -e .
```

To enable faster Hugging Face downloads:

```bash
uv add hf_transfer    # or: pip install hf_transfer
```

Copy the example files and fill in your values:

```bash
cp .env.example .env
cp models.yaml.example models.yaml
cp pipeline.yaml.example pipeline.yaml
```

See `.env.example` for all available environment variables.
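A minimal `.env` might look like the following sketch. All values here are placeholders; the variable names are the ones documented in the environment-variable tables further down:

```env
# Hugging Face (placeholder token — use your own)
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
HF_HUB_ENABLE_HF_TRANSFER=1

# MinIO / S3 (placeholder endpoint and credentials)
MINIO_ENDPOINT=minio.example.com:9000
MINIO_ACCESS_KEY=your-access-key
MINIO_SECRET_KEY=your-secret-key
MINIO_BUCKET=models
```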
All commands are accessed through the `mdl` entry point:

```bash
mdl [COMMAND] [OPTIONS]
```
Batch-download models from Hugging Face Hub.
```bash
mdl download --config models.yaml            # download all models in config
mdl download -r google/gemma-3-1b-it         # download a single model
mdl download -r org/model-a -r org/model-b   # download multiple by repo ID
mdl download --config models.yaml --dry-run  # preview without downloading
mdl download --clear-state                   # reset download tracking
```

| Option | Description |
|---|---|
| `-r, --repo-id` | Repo ID to download (repeatable) |
| `-c, --config PATH` | YAML config file (models.yaml format) |
| `-n, --dry-run` | Preview what would be downloaded |
| `--clear-state` | Clear download state and exit |
| `--min-disk-space INT` | Minimum free disk space in GB (default: 10) |
| `--delete-after` | Delete model from HF cache after download |
| `-v, --verbose` | Enable debug logging |
```yaml
google:
  - gemma-3-1b-it
  - gemma-3-4b-it
meta-llama:
  - Llama-3.2-1B
  - Llama-3.2-1B-Instruct
```

Each entry becomes the repo ID `org/model` when downloaded.
Convert a downloaded Hugging Face model to F16 GGUF format.
```bash
mdl convert -m /path/to/model-dir                          # output defaults to models/<org>/<model>/<model>.f16.gguf
mdl convert -m /path/to/model-dir -o custom/path/out.gguf  # explicit output path
```

| Option | Description |
|---|---|
| `-m, --model-dir PATH` | Path to the downloaded HF model directory (required) |
| `-o, --output PATH` | Output F16 GGUF path (default: `models/<org>/<model>/<model>.f16.gguf`) |
| `--llama-cpp-dir PATH` | Override `LLAMA_CPP_DIR` env var |
| `-v, --verbose` | Enable debug logging |
Quantize an F16 GGUF file to a smaller representation.
```bash
mdl quantize -i model.f16.gguf                                # output defaults to model.Q4_K_M.gguf alongside input
mdl quantize -i model.f16.gguf -o out.gguf -t Q5_K_M          # explicit output and type
mdl quantize -i model.f16.gguf --model-dir /path/to/hf-model  # also generates Modelfile + config files + README
```

| Option | Description |
|---|---|
| `-i, --input PATH` | Input F16 GGUF file (required) |
| `-o, --output PATH` | Output quantized GGUF path (default: `<input_dir>/<model>.<type>.gguf`) |
| `-t, --type TEXT` | Quantization type — e.g. Q4_K_M, Q5_K_M, Q8_0 (default: Q4_K_M) |
| `--llama-cpp-dir PATH` | Override `LLAMA_CPP_DIR` env var |
| `--model-dir PATH` | HF model directory — generates Ollama Modelfile, copies config files, and creates a MODELFILE_README.md next to the output |
| `-v, --verbose` | Enable debug logging |
Upload a file to MinIO / S3-compatible storage.
```bash
mdl upload -f model.Q4_K_M.gguf
mdl upload -f model.Q4_K_M.gguf -p models/gemma -b my-bucket
```

| Option | Description |
|---|---|
| `-f, --file PATH` | Local file to upload (required) |
| `-k, --s3-key TEXT` | Explicit S3 object key (defaults to filename) |
| `-p, --s3-prefix TEXT` | Prefix (directory) in the bucket |
| `-b, --bucket TEXT` | Override `MINIO_BUCKET` env var |
| `-v, --verbose` | Enable debug logging |
Run the full pipeline: download → convert → quantize → upload.
```bash
mdl pipeline                             # uses pipeline.yaml by default
mdl pipeline -c pipeline.yaml --dry-run  # preview without executing
mdl pipeline --no-upload                 # skip the S3 upload step
mdl pipeline --force                     # reprocess completed models
mdl pipeline --keep-quantized            # keep GGUF files locally after upload
mdl pipeline --clear-state               # reset pipeline state and exit
```

| Option | Description |
|---|---|
| `-c, --config PATH` | Pipeline config file (default: pipeline.yaml) |
| `-n, --dry-run` | Preview actions without executing |
| `--clear-state` | Clear pipeline state and exit |
| `--force` | Reprocess already-completed models |
| `--no-upload` | Skip S3 upload step |
| `--keep-download` | Keep downloaded model files after processing |
| `--keep-quantized` | Keep quantized GGUF files after upload |
| `--min-disk-space INT` | Minimum free disk space in GB (default: 10) |
| `-v, --verbose` | Enable debug logging |

Note: `--no-upload` only skips the S3 upload — it does not keep files on disk. The pipeline works in a temporary directory that is deleted after each model. To retain the quantized GGUF and related files locally, pass `--keep-quantized` (e.g. `mdl pipeline --no-upload --keep-quantized`).
```yaml
models:
  - repo_id: google/gemma-3-1b-it      # required
    # quantize: true                   # default: true
    # upload: true                     # default: true
    # quantization: Q4_K_M             # default: Q4_K_M
    # output_name: custom-name.gguf    # default: model.QTYPE.gguf
    # revision: main                   # pin a git revision / branch / tag
  - repo_id: meta-llama/Llama-3.2-1B
  - repo_id: microsoft/Phi-4-mini-instruct
```

All output files are organised under `models/<org>/<model_name>/`:
- Local (with `--keep-quantized`): `models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf`
- S3: `s3://<bucket>/models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf`
The pipeline automatically generates an Ollama Modelfile alongside each quantized GGUF. It reads the model's actual config files — `config.json`, `tokenizer_config.json`, and `generation_config.json` — to derive everything dynamically:
- `FROM` — path to the GGUF file
- `TEMPLATE` — Ollama Go template, detected from the model's `eos_token` (not just `model_type`). This correctly handles fine-tunes that change the chat format (e.g. Dolphin-Mistral uses ChatML despite being `model_type: mistral`)
- `SYSTEM` — default system prompt, extracted from the Jinja2 `chat_template` via regex
- `PARAMETER` — `num_ctx`, `temperature`, `top_p`, `top_k`, `repeat_penalty`, and stop tokens — all from the model's own configs
The raw Jinja2 `chat_template` is also included as comments at the bottom of the Modelfile for cross-reference when editing.
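As an illustration only (not verbatim tool output), a generated Modelfile for a Gemma-style model might look roughly like this — the paths, prompt, and parameter values are placeholders:

```
FROM ./gemma-3-1b-it.Q4_K_M.gguf

TEMPLATE """<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
"""

SYSTEM """You are a helpful assistant."""

PARAMETER num_ctx 4096
PARAMETER temperature 0.7
PARAMETER stop <end_of_turn>
```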
Supported model families (via EOS token and model_type detection):
| Format | Models |
|---|---|
| ChatML (`<|im_end|>`) | Qwen, Qwen2, Qwen3, Dolphin, Yi, InternLM2, DeepSeek-V2/V3/R1, Jamba |
| LLaMA-3 (`<|eot_id|>`) | LLaMA-3, LLaMA-3.1, LLaMA-3.2, LLaMA-3.3 |
| Gemma (`<end_of_turn>`) | Gemma, Gemma 2, Gemma 3 |
| Phi (`<|end|>`) | Phi-3, Phi-3.5, Phi-4 |
| Mistral (`</s>`) | Mistral-7B, Mixtral |
| Command-R | Cohere Command-R, Command-R+ |
| Completion | StarCoder2, Falcon |
Unknown models fall back to a generic template with a warning.
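The core of this detection can be pictured as a lookup from EOS token to chat format, with a generic fallback. The sketch below is a simplification, not the actual `modelfile.py` implementation (which also consults `model_type` and the Jinja2 `chat_template`):

```python
# Simplified sketch of EOS-token-based chat-format detection.
# The mapping mirrors the table above; it is illustrative only.
EOS_TO_FORMAT = {
    "<|im_end|>": "chatml",
    "<|eot_id|>": "llama3",
    "<end_of_turn>": "gemma",
    "<|end|>": "phi",
    "</s>": "mistral",
}

def detect_chat_format(tokenizer_config: dict) -> str:
    """Map a model's eos_token to a known chat format, else fall back."""
    eos = tokenizer_config.get("eos_token")
    # eos_token may be a plain string or an AddedToken-style dict
    if isinstance(eos, dict):
        eos = eos.get("content")
    return EOS_TO_FORMAT.get(eos, "generic")  # unknown -> generic template

print(detect_chat_format({"eos_token": "<|im_end|>"}))                  # chatml
print(detect_chat_format({"eos_token": {"content": "<end_of_turn>"}}))  # gemma
print(detect_chat_format({}))                                           # generic
```

This is why a fine-tune like Dolphin-Mistral lands on ChatML: its `tokenizer_config.json` carries `<|im_end|>` as the EOS token even though its `model_type` is still `mistral`.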
Output files (alongside the GGUF):
| File | Purpose |
|---|---|
| `Modelfile` | Ollama-ready model definition |
| `MODELFILE_README.md` | Guide explaining each Modelfile section |
| `config.json` | Model architecture reference (copied from HF) |
| `tokenizer_config.json` | Chat template & tokens reference (copied from HF) |
| `generation_config.json` | Generation params reference (copied from HF) |
| `special_tokens_map.json` | Special tokens reference (copied from HF) |
These files are:
- Uploaded to S3 at `models/<org>/<model>/`
- Saved locally when using `--keep-quantized`
- Logged at DEBUG level when using `--verbose`
To use the generated Modelfile with Ollama:
```bash
cd models/google/gemma-3-1b-it/
ollama create gemma3-1b -f Modelfile
ollama run gemma3-1b
```

The pipeline also writes a URL registry to `model_urls.json` locally and mirrors it to `s3://<bucket>/metadata/model_urls.json`. Each entry includes a `download_url` and a curl download reference.
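An entry in the registry might look roughly like the sketch below. This is illustrative only: the exact schema and field names other than `download_url` are assumptions, and the endpoint is a placeholder:

```json
{
  "google/gemma-3-1b-it": {
    "download_url": "https://minio.example.com/models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf",
    "curl": "curl -O https://minio.example.com/models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf"
  }
}
```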
Clone, build, and extract the required llama.cpp binaries. Requires git, cmake, and make.
```bash
mdl bootstrap-llamacpp
```

This fetches llama.cpp from GitHub, builds it with cmake + make, and copies the required binaries and headers to a `llama.cpp-dist/` directory. If `llama.cpp/` already exists, the clone step is skipped.
This is a prerequisite for `mdl convert`, `mdl quantize`, and `mdl pipeline`.
Print the installed version.
```bash
mdl --version
```

All variables are set in `.env` (see `.env.example`).
| Variable | Description | Default |
|---|---|---|
| `HF_TOKEN` | Auth token for private/gated models | — |
| `HF_ENDPOINT` | Custom HF endpoint (mirror/proxy) | https://huggingface.co |
| `HF_HOME` | HF cache directory | `~/.cache/huggingface/` |
| `HF_HUB_DOWNLOAD_TIMEOUT` | Download timeout in seconds | 120 |
| `HF_HUB_ETAG_TIMEOUT` | ETag timeout in seconds | 10 |
| `HF_HUB_ENABLE_HF_TRANSFER` | Enable fast transfers (requires `hf_transfer`) | 0 |
| Variable | Description | Default |
|---|---|---|
| `MINIO_ENDPOINT` | S3 endpoint (host:port) | — |
| `MINIO_ACCESS_KEY` | Access key | — |
| `MINIO_SECRET_KEY` | Secret key | — |
| `MINIO_BUCKET` | Target bucket | models |
| `MINIO_SECURE` | Use HTTPS | true |
| `MINIO_PUBLIC_URL` | Public base URL for downloads | — |
| `MINIO_PRESIGN_DAYS` | Presigned URL expiry in days | 7 |
| Variable | Description | Default |
|---|---|---|
| `LLAMA_CPP_DIR` | Path to llama.cpp directory | `llama.cpp` |
- Load environment — reads `.env` before any HF imports
- Parse config — loads YAML and builds the model list
- Validate — checks credentials, disk space, and config structure
- Process models — download, convert, quantize, and upload each model
- Track state — persists progress to `.download_state.json` / `.pipeline_state.json`
- Handle errors — logs failures per model and continues with the rest
- Summarise — prints totals for successful, failed, and skipped models
Resume is automatic. Completed models are skipped on re-run. Use --clear-state to start fresh or --force to reprocess.
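The skip-on-re-run behaviour can be pictured as a check against the state file before each model is processed. This is a simplified sketch assuming a hypothetical state layout, not the actual `.pipeline_state.json` schema:

```python
import json
import tempfile
from pathlib import Path

def should_skip(repo_id: str, state_path: Path, force: bool = False) -> bool:
    """Skip models already marked completed, unless --force is given."""
    if force or not state_path.exists():
        return False
    state = json.loads(state_path.read_text())
    # hypothetical layout: {"completed": ["org/model", ...]}
    return repo_id in state.get("completed", [])

# demo with a throwaway state file
state = Path(tempfile.gettempdir()) / ".pipeline_state.json"
state.write_text(json.dumps({"completed": ["google/gemma-3-1b-it"]}))

print(should_skip("google/gemma-3-1b-it", state))              # True  (already done)
print(should_skip("meta-llama/Llama-3.2-1B", state))           # False (not yet processed)
print(should_skip("google/gemma-3-1b-it", state, force=True))  # False (--force reprocesses)
```

`--clear-state` corresponds to deleting the state file entirely, so every model is treated as unprocessed on the next run.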
| Problem | Solution |
|---|---|
| `RepositoryNotFoundError` / `GatedRepoError` | Set `HF_TOKEN` in `.env`. For gated models, accept terms on the model page first. |
| Downloads timing out | Increase `HF_HUB_DOWNLOAD_TIMEOUT` in `.env` |
| Disk space errors | Set `HF_HOME` to a larger drive, or use `--min-disk-space` |
| Slow downloads | Install `hf_transfer` and set `HF_HUB_ENABLE_HF_TRANSFER=1` |
| Re-downloading completed models | Don't delete `.download_state.json`. Use `--clear-state` only intentionally. |
```bash
uv sync --all-extras      # install dev dependencies
uv run pytest             # run tests
uv run pytest --cov=mdl   # run tests with coverage
```

```
src/mdl/
├── __init__.py           # package version
├── cli/
│   ├── __init__.py       # Click group & subcommand registration
│   ├── bootstrap.py      # mdl bootstrap-llamacpp
│   ├── convert.py        # mdl convert
│   ├── download.py       # mdl download
│   ├── pipeline.py       # mdl pipeline
│   ├── quantize.py       # mdl quantize
│   └── upload.py         # mdl upload
└── core/
    ├── config.py         # env loading & logging setup
    ├── downloader.py     # HF Hub download logic & state
    ├── modelfile.py      # Ollama Modelfile generator
    ├── quantizer.py      # llama.cpp convert & quantize
    ├── uploader.py       # MinIO / S3 upload client
    └── url_manager.py    # model URL registry
```
See LICENSE.
Contributions welcome. Please follow the existing code style, add tests for new features, and verify with `--dry-run` before submitting.