Benchmarks the speed of reading JPEG images and converting them to RGB numpy arrays across popular Python libraries. Targets machine learning training pipelines on macOS ARM64 (Apple M-series) using the ImageNet validation set.
Figure: Performance on Apple Silicon (M-series)
All decoders output (H, W, 3) uint8 RGB numpy arrays for a fair comparison. Libraries that default to other formats (OpenCV → BGR, torchvision → CHW tensor, TensorFlow → EagerTensor) include a conversion step. Note that in real ML pipelines the conversion is often unnecessary.
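As a minimal sketch of the conversion step, assuming a decoder returns either a BGR array (as OpenCV does by default) or a channels-first array (as torchvision does), the layout can be normalized with plain NumPy. The helper names `bgr_to_rgb` and `chw_to_hwc` are hypothetical and are not taken from the benchmark code; synthetic arrays stand in for decoded images.

```python
import numpy as np


def bgr_to_rgb(image_bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an (H, W, 3) array: BGR -> RGB."""
    # The reversed slice is a view with a negative stride;
    # ascontiguousarray copies it into a C-ordered buffer.
    return np.ascontiguousarray(image_bgr[..., ::-1])


def chw_to_hwc(image_chw: np.ndarray) -> np.ndarray:
    """Move channels last: (3, H, W) -> (H, W, 3)."""
    return np.ascontiguousarray(np.transpose(image_chw, (1, 2, 0)))


# Synthetic stand-ins for decoder output (hypothetical 400x500 image).
bgr = np.zeros((400, 500, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR order
rgb = bgr_to_rgb(bgr)

chw = np.zeros((3, 400, 500), dtype=np.uint8)
hwc = chw_to_hwc(chw)
```

This sketch always copies so that every result has the same contiguous layout. Whether a copy is needed in practice depends on what the downstream code accepts.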
Memory mode (default): images are pre-loaded as bytes before the timed loop. This measures pure decode throughput with no disk I/O.
Disk mode: each decode call reads the file from disk. Includes I/O latency.
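The difference between the two modes can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: `time_decode` is a hypothetical helper, and `decode` is any callable that accepts JPEG bytes.

```python
import time
from pathlib import Path
from typing import Callable, Sequence


def time_decode(
    decode: Callable[[bytes], object],
    paths: Sequence[Path],
    mode: str = "memory",
) -> float:
    """Return images per second for one pass over `paths`."""
    if mode == "memory":
        # Pre-load outside the timed region so only decoding is measured.
        payloads = [p.read_bytes() for p in paths]
        start = time.perf_counter()
        for data in payloads:
            decode(data)
        elapsed = time.perf_counter() - start
    elif mode == "disk":
        # Each iteration reads the file, so I/O latency is included.
        start = time.perf_counter()
        for p in paths:
            decode(p.read_bytes())
        elapsed = time.perf_counter() - start
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Guard against a zero reading on very small inputs.
    return len(paths) / elapsed if elapsed > 0 else float("inf")
```

In memory mode the file reads happen before the timer starts, so the two numbers differ only by the cost of I/O, assuming the same files and decoder.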
ImageNet validation set — 50,000 JPEG images, ~500×400px.
# Download
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
mkdir -p imagenet/val
tar -xf ILSVRC2012_img_val.tar -C imagenet/val

brew install jpeg-turbo # required by simplejpeg and turbojpeg
brew install vips # required by pyvips (NOT bundled in the pip wheel)

# Install uv if needed
pip install uv
# Install the package in editable mode with dev dependencies
uv sync --extra dev

# Make executable (first time)
chmod +x run_benchmarks.sh
# Show help
./run_benchmarks.sh --help
# Run all libraries — memory mode, 2000 images, 20 timed runs
./run_benchmarks.sh /path/to/imagenet/val
# Custom settings
./run_benchmarks.sh /path/to/imagenet/val 2000 20 memory
# Single library
BENCHMARK_LIBRARY=opencv python imread_benchmark/benchmark_single.py \
--data-dir /path/to/imagenet/val \
--output-dir output \
--mode memory

The DataLoader benchmark measures throughput inside a PyTorch DataLoader with varying worker counts, the most relevant metric for ML training pipelines.
BENCHMARK_LIBRARY=opencv python imread_benchmark/benchmark_dataloader.py \
--data-dir /path/to/imagenet/val \
--output-dir output \
--workers 0 1 2 4 8

output/
└── darwin_Apple-M4-Max/
    ├── opencv_results.json
    ├── pillow_results.json
    ├── opencv_dataloader_results.json
    └── ...
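A short sketch of collecting these files for analysis, relying only on the naming pattern shown in the tree. `load_results` and the `"decode"` / `"dataloader"` labels are hypothetical, and nothing is assumed about the fields inside each JSON file.

```python
import json
from pathlib import Path


def load_results(platform_dir: Path) -> dict:
    """Map each library name to its parsed result files."""
    results: dict = {}
    for path in sorted(platform_dir.glob("*_results.json")):
        stem = path.name[: -len("_results.json")]
        if stem.endswith("_dataloader"):
            # e.g. opencv_dataloader_results.json
            library, kind = stem[: -len("_dataloader")], "dataloader"
        else:
            # e.g. opencv_results.json
            library, kind = stem, "decode"
        with path.open() as f:
            results.setdefault(library, {})[kind] = json.load(f)
    return results
```

Calling `load_results(Path("output/darwin_Apple-M4-Max"))` would return one entry per library, each holding whichever of the two result files exists.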
- simplejpeg — CFFI binding; zero-copy decode from bytes
- turbojpeg (PyTurboJPEG) — Python binding for libjpeg-turbo
- jpeg4py — direct libjpeg-turbo binding (Linux only)
- kornia-rs — Rust implementation using libjpeg-turbo
- OpenCV (opencv-python-headless)
- imagecodecs — uses libjpeg-turbo 3.x; prebuilt ARM64 wheels
- pyvips — libvips bindings (bundled in wheels)
- Pillow
- Pillow-SIMD (Linux x86-64 only)
- scikit-image
- imageio
- torchvision
- tensorflow
- All benchmarks run single-threaded unless using the DataLoader benchmark
- Memory mode is the recommended baseline — it isolates decode speed from storage
- Results based on ImageNet JPEG images (~500×400px)
- Use simplejpeg, turbojpeg, or kornia-rs for maximum single-thread decode speed
- Use the DataLoader benchmark to find the best num_workers for your CPU
- kornia-rs and opencv offer the most consistent cross-platform performance
- opencv remains the best choice when you need more than just JPEG decoding
# Run tests
uv run pytest tests/ -v
# Run linters
uv run pre-commit run --all-files

See CONTRIBUTING.md for how to add a new decoder.
If you found this work useful, please cite:
@misc{iglovikov2025speed,
title={Need for Speed: A Comprehensive Benchmark of JPEG Decoders in Python},
author={Vladimir Iglovikov},
year={2025},
eprint={2501.13131},
archivePrefix={arXiv},
primaryClass={eess.IV},
doi={10.48550/arXiv.2501.13131}
}