mdl — Model Downloader

A CLI toolkit for downloading, converting, quantizing, and uploading Hugging Face models. Supports full end-to-end pipelines from HF Hub to S3-compatible storage.

Note: The convert, quantize, and pipeline commands require llama.cpp. Run mdl bootstrap-llamacpp first to fetch and build it automatically (requires git, cmake, make).

Features

  • Batch downloads from Hugging Face Hub with resume and state tracking
  • GGUF conversion via llama.cpp (convert_hf_to_gguf.py)
  • Quantization to Q4_K_M, Q5_K_M, Q8_0, etc. via llama-quantize
  • Ollama Modelfile generation — auto-detects chat format from the model's actual config files (supports ChatML, LLaMA-3, Gemma, Phi, Mistral, DeepSeek, and more)
  • S3 upload to MinIO or any S3-compatible endpoint
  • Pipeline mode — download → convert → quantize → upload in one command
  • Bootstrap — fetch and build llama.cpp automatically
  • Per-model error handling, dry-run mode, disk-space checks, and YAML configuration

Requirements

  • Python 3.11+
  • uv (recommended) or pip
  • git, cmake, make (for bootstrap-llamacpp)

Installation

git clone https://github.com/fuzzylabs/mdl.git
cd mdl
uv sync            # or: pip install -e .

To enable faster Hugging Face downloads:

uv add hf_transfer   # or: pip install hf_transfer

Configuration

Copy the example files and fill in your values:

cp .env.example .env
cp models.yaml.example models.yaml
cp pipeline.yaml.example pipeline.yaml

See .env.example for all available environment variables.

CLI Reference

All commands are accessed through the mdl entry point:

mdl [COMMAND] [OPTIONS]

mdl download

Batch-download models from Hugging Face Hub.

mdl download --config models.yaml                # download all models in config
mdl download -r google/gemma-3-1b-it             # download a single model
mdl download -r org/model-a -r org/model-b       # download multiple by repo ID
mdl download --config models.yaml --dry-run      # preview without downloading
mdl download --clear-state                        # reset download tracking
| Option | Description |
| --- | --- |
| `-r, --repo-id` | Repo ID to download (repeatable) |
| `-c, --config PATH` | YAML config file (`models.yaml` format) |
| `-n, --dry-run` | Preview what would be downloaded |
| `--clear-state` | Clear download state and exit |
| `--min-disk-space INT` | Minimum free disk space in GB (default: 10) |
| `--delete-after` | Delete model from HF cache after download |
| `-v, --verbose` | Enable debug logging |

models.yaml format

google:
  - gemma-3-1b-it
  - gemma-3-4b-it

meta-llama:
  - Llama-3.2-1B
  - Llama-3.2-1B-Instruct

Each entry becomes the repo ID org/model when downloaded.
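The expansion from org keys to repo IDs can be sketched as follows (a hypothetical helper for illustration, not mdl's internal API; the config is shown already parsed from YAML):

```python
def expand_repo_ids(config: dict[str, list[str]]) -> list[str]:
    """Expand a models.yaml-style mapping of org -> [model, ...] into repo IDs."""
    return [f"{org}/{model}" for org, models in config.items() for model in models]

config = {
    "google": ["gemma-3-1b-it", "gemma-3-4b-it"],
    "meta-llama": ["Llama-3.2-1B"],
}
print(expand_repo_ids(config))
# → ['google/gemma-3-1b-it', 'google/gemma-3-4b-it', 'meta-llama/Llama-3.2-1B']
```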


mdl convert

Convert a downloaded Hugging Face model to F16 GGUF format.

mdl convert -m /path/to/model-dir                           # output defaults to models/<org>/<model>/<model>.f16.gguf
mdl convert -m /path/to/model-dir -o custom/path/out.gguf   # explicit output path
| Option | Description |
| --- | --- |
| `-m, --model-dir PATH` | Path to the downloaded HF model directory (required) |
| `-o, --output PATH` | Output F16 GGUF path (default: `models/<org>/<model>/<model>.f16.gguf`) |
| `--llama-cpp-dir PATH` | Override `LLAMA_CPP_DIR` env var |
| `-v, --verbose` | Enable debug logging |
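The default output path can be illustrated with a small sketch. This is a hypothetical helper that assumes the model directory ends in `<org>/<model>`, matching the `models/` layout used elsewhere in this README:

```python
from pathlib import Path

def default_f16_output(model_dir: str) -> Path:
    """Derive the default F16 GGUF path from a <...>/<org>/<model> directory."""
    p = Path(model_dir)
    org, model = p.parent.name, p.name
    return Path("models") / org / model / f"{model}.f16.gguf"

print(default_f16_output("downloads/google/gemma-3-1b-it"))
# → models/google/gemma-3-1b-it/gemma-3-1b-it.f16.gguf
```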

mdl quantize

Quantize an F16 GGUF file to a smaller representation.

mdl quantize -i model.f16.gguf                              # output defaults to model.Q4_K_M.gguf alongside input
mdl quantize -i model.f16.gguf -o out.gguf -t Q5_K_M        # explicit output and type
mdl quantize -i model.f16.gguf --model-dir /path/to/hf-model # also generates Modelfile + config files + README
| Option | Description |
| --- | --- |
| `-i, --input PATH` | Input F16 GGUF file (required) |
| `-o, --output PATH` | Output quantized GGUF path (default: `<input_dir>/<model>.<type>.gguf`) |
| `-t, --type TEXT` | Quantization type, e.g. `Q4_K_M`, `Q5_K_M`, `Q8_0` (default: `Q4_K_M`) |
| `--llama-cpp-dir PATH` | Override `LLAMA_CPP_DIR` env var |
| `--model-dir PATH` | HF model directory — generates Ollama Modelfile, copies config files, and creates a `MODELFILE_README.md` next to the output |
| `-v, --verbose` | Enable debug logging |

mdl upload

Upload a file to MinIO / S3-compatible storage.

mdl upload -f model.Q4_K_M.gguf
mdl upload -f model.Q4_K_M.gguf -p models/gemma -b my-bucket
| Option | Description |
| --- | --- |
| `-f, --file PATH` | Local file to upload (required) |
| `-k, --s3-key TEXT` | Explicit S3 object key (defaults to filename) |
| `-p, --s3-prefix TEXT` | Prefix (directory) in the bucket |
| `-b, --bucket TEXT` | Override `MINIO_BUCKET` env var |
| `-v, --verbose` | Enable debug logging |

mdl pipeline

Run the full pipeline: download → convert → quantize → upload.

mdl pipeline                                     # uses pipeline.yaml by default
mdl pipeline -c pipeline.yaml --dry-run          # preview without executing
mdl pipeline --no-upload                         # skip the S3 upload step
mdl pipeline --force                             # reprocess completed models
mdl pipeline --keep-quantized                    # keep GGUF files locally after upload
mdl pipeline --clear-state                       # reset pipeline state and exit
| Option | Description |
| --- | --- |
| `-c, --config PATH` | Pipeline config file (default: `pipeline.yaml`) |
| `-n, --dry-run` | Preview actions without executing |
| `--clear-state` | Clear pipeline state and exit |
| `--force` | Reprocess already-completed models |
| `--no-upload` | Skip S3 upload step |
| `--keep-download` | Keep downloaded model files after processing |
| `--keep-quantized` | Keep quantized GGUF files after upload |
| `--min-disk-space INT` | Minimum free disk space in GB (default: 10) |
| `-v, --verbose` | Enable debug logging |

Note: `--no-upload` only skips the S3 upload — it does not keep files on disk. The pipeline works in a temporary directory that is deleted after each model. To retain the quantized GGUF and related files locally, pass `--keep-quantized` (e.g. `mdl pipeline --no-upload --keep-quantized`).

pipeline.yaml format

models:
  - repo_id: google/gemma-3-1b-it        # required
    # quantize: true                      # default: true
    # upload: true                        # default: true
    # quantization: Q4_K_M               # default: Q4_K_M
    # output_name: custom-name.gguf      # default: model.QTYPE.gguf
    # revision: main                     # pin a git revision / branch / tag

  - repo_id: meta-llama/Llama-3.2-1B
  - repo_id: microsoft/Phi-4-mini-instruct

Output paths

All output files are organised under models/<org>/<model_name>/:

  • Local (with --keep-quantized): models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf
  • S3: s3://<bucket>/models/google/gemma-3-1b-it/gemma-3-1b-it.Q4_K_M.gguf

Ollama Modelfile

The pipeline automatically generates an Ollama Modelfile alongside each quantized GGUF. It reads the model's actual config files — config.json, tokenizer_config.json, and generation_config.json — to derive everything dynamically:

  • FROM — path to the GGUF file
  • TEMPLATE — Ollama Go template, detected from the model's eos_token (not just model_type). This correctly handles fine-tunes that change the chat format (e.g. Dolphin-Mistral uses ChatML despite being model_type: mistral)
  • SYSTEM — default system prompt, extracted from the Jinja2 chat_template via regex
  • PARAMETER — num_ctx, temperature, top_p, top_k, repeat_penalty, and stop tokens — all from the model's own configs

The raw Jinja2 chat_template is also included as comments at the bottom of the Modelfile for cross-reference when editing.

Supported model families (via EOS token and model_type detection):

| Format | Models |
| --- | --- |
| ChatML (`<\|im_end\|>`) | Qwen, Qwen2, Qwen3, Dolphin, Yi, InternLM2, DeepSeek-V2/V3/R1, Jamba |
| LLaMA-3 (`<\|eot_id\|>`) | LLaMA-3, LLaMA-3.1, LLaMA-3.2, LLaMA-3.3 |
| Gemma (`<end_of_turn>`) | Gemma, Gemma 2, Gemma 3 |
| Phi (`<\|end\|>`) | Phi-3, Phi-3.5, Phi-4 |
| Mistral (`</s>`) | Mistral-7B, Mixtral |
| Command-R | Cohere Command-R, Command-R+ |
| Completion | StarCoder2, Falcon |

Unknown models fall back to a generic template with a warning.
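The detection idea can be illustrated with a minimal sketch. The token-to-format mapping mirrors the table above; the function itself is illustrative only, not mdl's actual code:

```python
# Illustrative sketch of EOS-token-based chat-format detection (not mdl's code).
EOS_TO_FORMAT = {
    "<|im_end|>": "chatml",
    "<|eot_id|>": "llama-3",
    "<end_of_turn>": "gemma",
    "<|end|>": "phi",
    "</s>": "mistral",
}

def detect_chat_format(tokenizer_config: dict) -> str:
    """Pick a chat format from the eos_token, falling back to 'generic'."""
    eos = tokenizer_config.get("eos_token")
    if isinstance(eos, dict):  # some configs wrap the token in an object
        eos = eos.get("content")
    return EOS_TO_FORMAT.get(eos, "generic")

# A fine-tune like Dolphin-Mistral reports <|im_end|>, so it is detected as
# ChatML even though its model_type is "mistral".
print(detect_chat_format({"eos_token": "<|im_end|>"}))
# → chatml
```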

Output files (alongside the GGUF):

| File | Purpose |
| --- | --- |
| `Modelfile` | Ollama-ready model definition |
| `MODELFILE_README.md` | Guide explaining each Modelfile section |
| `config.json` | Model architecture reference (copied from HF) |
| `tokenizer_config.json` | Chat template & tokens reference (copied from HF) |
| `generation_config.json` | Generation params reference (copied from HF) |
| `special_tokens_map.json` | Special tokens reference (copied from HF) |

These files are:

  • Uploaded to S3 at models/<org>/<model>/
  • Saved locally when using --keep-quantized
  • Logged at DEBUG level when using --verbose

To use the generated Modelfile with Ollama:

cd models/google/gemma-3-1b-it/
ollama create gemma3-1b -f Modelfile
ollama run gemma3-1b

The pipeline also writes a URL registry to model_urls.json locally and mirrors it to s3://<bucket>/metadata/model_urls.json. Each entry includes a download_url and a curl download reference.
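A registry entry might look like the following sketch. Only the `download_url` field and the curl reference are documented above; the exact `model_urls.json` schema and the helper below are assumptions for illustration:

```python
def registry_entry(bucket_url: str, org: str, model: str, filename: str) -> dict:
    """Build a hypothetical model_urls.json entry (schema assumed, not mdl's)."""
    url = f"{bucket_url}/models/{org}/{model}/{filename}"
    return {"download_url": url, "curl": f"curl -L -o {filename} {url}"}

entry = registry_entry("https://minio.example.com/models-bucket",
                       "google", "gemma-3-1b-it", "gemma-3-1b-it.Q4_K_M.gguf")
print(entry["download_url"])
```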


mdl bootstrap-llamacpp

Clone, build, and extract the required llama.cpp binaries. Requires git, cmake, and make.

mdl bootstrap-llamacpp

This fetches llama.cpp from GitHub, builds it with cmake + make, and copies the required binaries and headers to a llama.cpp-dist/ directory. If llama.cpp/ already exists, the clone step is skipped.

This is a prerequisite for mdl convert, mdl quantize, and mdl pipeline.


mdl --version

Print the installed version.

mdl --version

Environment Variables

All variables are set in .env (see .env.example).

Hugging Face

| Variable | Description | Default |
| --- | --- | --- |
| `HF_TOKEN` | Auth token for private/gated models | |
| `HF_ENDPOINT` | Custom HF endpoint (mirror/proxy) | `https://huggingface.co` |
| `HF_HOME` | HF cache directory | `~/.cache/huggingface/` |
| `HF_HUB_DOWNLOAD_TIMEOUT` | Download timeout in seconds | `120` |
| `HF_HUB_ETAG_TIMEOUT` | ETag timeout in seconds | `10` |
| `HF_HUB_ENABLE_HF_TRANSFER` | Enable fast transfers (requires `hf_transfer`) | `0` |

MinIO / S3

| Variable | Description | Default |
| --- | --- | --- |
| `MINIO_ENDPOINT` | S3 endpoint (host:port) | |
| `MINIO_ACCESS_KEY` | Access key | |
| `MINIO_SECRET_KEY` | Secret key | |
| `MINIO_BUCKET` | Target bucket | `models` |
| `MINIO_SECURE` | Use HTTPS | `true` |
| `MINIO_PUBLIC_URL` | Public base URL for downloads | |
| `MINIO_PRESIGN_DAYS` | Presigned URL expiry in days | `7` |

llama.cpp

| Variable | Description | Default |
| --- | --- | --- |
| `LLAMA_CPP_DIR` | Path to llama.cpp directory | `llama.cpp` |
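Putting the variables above together, a `.env` might look like this. All values are placeholders for illustration; see `.env.example` for the authoritative list:

```shell
# .env — placeholder values, adjust for your setup
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
HF_HUB_ENABLE_HF_TRANSFER=1

MINIO_ENDPOINT=minio.example.com:9000
MINIO_ACCESS_KEY=your-access-key
MINIO_SECRET_KEY=your-secret-key
MINIO_BUCKET=models
MINIO_SECURE=true

LLAMA_CPP_DIR=llama.cpp
```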

How It Works

  1. Load environment — reads .env before any HF imports
  2. Parse config — loads YAML and builds the model list
  3. Validate — checks credentials, disk space, and config structure
  4. Process models — download, convert, quantize, and upload each model
  5. Track state — persists progress to .download_state.json / .pipeline_state.json
  6. Handle errors — logs failures per model and continues with the rest
  7. Summarise — prints totals for successful, failed, and skipped models

Resume is automatic. Completed models are skipped on re-run. Use --clear-state to start fresh or --force to reprocess.
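The skip-on-resume behaviour can be sketched as follows, assuming a simple JSON state schema with a `completed` list (the real `.pipeline_state.json` layout may differ):

```python
import json
from pathlib import Path

STATE_FILE = Path(".pipeline_state.json")  # file name taken from the docs above

def load_completed() -> set[str]:
    """Read the set of completed repo IDs from the state file, if present."""
    if STATE_FILE.exists():
        return set(json.loads(STATE_FILE.read_text()).get("completed", []))
    return set()

def should_process(repo_id: str, force: bool = False) -> bool:
    """Completed models are skipped on re-run unless --force is passed."""
    return force or repo_id not in load_completed()

def mark_completed(repo_id: str) -> None:
    """Persist a model as completed so re-runs skip it."""
    done = load_completed()
    done.add(repo_id)
    STATE_FILE.write_text(json.dumps({"completed": sorted(done)}))
```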

Troubleshooting

| Problem | Solution |
| --- | --- |
| `RepositoryNotFoundError` / `GatedRepoError` | Set `HF_TOKEN` in `.env`. For gated models, accept the terms on the model page first. |
| Downloads timing out | Increase `HF_HUB_DOWNLOAD_TIMEOUT` in `.env` |
| Disk space errors | Set `HF_HOME` to a larger drive, or use `--min-disk-space` |
| Slow downloads | Install `hf_transfer` and set `HF_HUB_ENABLE_HF_TRANSFER=1` |
| Re-downloading completed models | Don't delete `.download_state.json`. Use `--clear-state` only intentionally. |

Development

uv sync --all-extras          # install dev dependencies
uv run pytest                 # run tests
uv run pytest --cov=mdl       # run tests with coverage

Project Structure

src/mdl/
├── __init__.py               # package version
├── cli/
│   ├── __init__.py           # Click group & subcommand registration
│   ├── bootstrap.py          # mdl bootstrap-llamacpp
│   ├── convert.py            # mdl convert
│   ├── download.py           # mdl download
│   ├── pipeline.py           # mdl pipeline
│   ├── quantize.py           # mdl quantize
│   └── upload.py             # mdl upload
└── core/
    ├── config.py             # env loading & logging setup
    ├── downloader.py         # HF Hub download logic & state
    ├── modelfile.py          # Ollama Modelfile generator
    ├── quantizer.py          # llama.cpp convert & quantize
    ├── uploader.py           # MinIO / S3 upload client
    └── url_manager.py        # model URL registry

License

See LICENSE.

Contributing

Contributions welcome. Please follow the existing code style, add tests for new features, and verify with --dry-run before submitting.
