ternaus/imread_benchmark

Image Loading Benchmark

Overview

Benchmarks the speed of reading JPEG images and converting them to RGB numpy arrays across popular Python libraries. Targets machine learning training pipelines on macOS ARM64 (Apple M-series) using the ImageNet validation set.

Performance on Apple Silicon (M-series)


Important Note on Image Conversion

All decoders output (H, W, 3) uint8 RGB numpy arrays for a fair comparison. Libraries that default to other formats (OpenCV → BGR, torchvision → CHW tensor, TensorFlow → EagerTensor) include a conversion step. Note that in real ML pipelines the conversion is often unnecessary.
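The conversions can be sketched in pure Python on a toy 2×2 image (real pipelines would use numpy, e.g. `img[:, :, ::-1]` for BGR→RGB, or `tensor.permute(1, 2, 0)` for a CHW tensor; the helpers below are illustrative only):

```python
# Toy 2x2 "image" as nested lists of (B, G, R) pixels, standing in for an
# OpenCV-style uint8 array. Real code would operate on numpy arrays instead.
bgr_image = [
    [(255, 0, 0), (0, 255, 0)],   # row 0: blue pixel, green pixel
    [(0, 0, 255), (10, 20, 30)],  # row 1: red pixel, dark pixel
]

def bgr_to_rgb(image):
    """Reverse the channel order of every pixel (BGR -> RGB)."""
    return [[pixel[::-1] for pixel in row] for row in image]

def chw_to_hwc(chw):
    """Convert channels-first (C, H, W) to channels-last (H, W, C)."""
    c, h, w = len(chw), len(chw[0]), len(chw[0][0])
    return [[tuple(chw[ch][y][x] for ch in range(c)) for x in range(w)]
            for y in range(h)]

rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image[0][0])  # -> (0, 0, 255): the blue pixel, now in RGB order
```

The point of the conversion step in the benchmark is exactly this channel and layout shuffle: it is cheap relative to decoding, but it is still timed so every library is measured against the same (H, W, 3) RGB target.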

Benchmark Modes

Memory mode (default): images are pre-loaded as bytes before the timed loop. This measures pure decode throughput with no disk I/O.

Disk mode: each decode call reads the file from disk. Includes I/O latency.
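The two modes differ only in where the file read happens relative to the timed loop. A minimal sketch (the `run_benchmark` helper and best-of-N reporting are illustrative, not the repo's actual harness):

```python
import time
from pathlib import Path

def run_benchmark(paths, decode, mode="memory", runs=3):
    """Time `decode` over all images, returning the best-of-N wall time."""
    if mode == "memory":
        # Pre-load every file as bytes so the timed loop is pure decoding.
        payloads = [Path(p).read_bytes() for p in paths]
        work = lambda: [decode(buf) for buf in payloads]
    else:
        # Disk mode: every decode call also pays file-read latency.
        work = lambda: [decode(Path(p).read_bytes()) for p in paths]

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        work()
        timings.append(time.perf_counter() - start)
    return min(timings)  # best-of-N is a common micro-benchmark convention
```

With a real decoder such as `simplejpeg.decode_jpeg` passed as `decode`, memory mode isolates the library's decode throughput, while disk mode approximates what a cold-cache training job would see.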

Dataset

ImageNet validation set — 50,000 JPEG images, ~500×400px.

# Download
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
mkdir -p imagenet/val
tar -xf ILSVRC2012_img_val.tar -C imagenet/val
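After extraction, a quick sanity check that all 50,000 images arrived can be useful (a hypothetical helper, not part of the repo; ImageNet files use the `.JPEG` extension):

```python
from pathlib import Path

def count_jpegs(root):
    """Count JPEG files under `root`, case-insensitively.

    For the ImageNet validation set this should return 50,000.
    """
    return sum(
        1
        for p in Path(root).rglob("*")
        if p.suffix.lower() in {".jpg", ".jpeg"}
    )
```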

System Requirements (macOS)

brew install jpeg-turbo   # required by simplejpeg and turbojpeg
brew install vips         # required by pyvips (NOT bundled in the pip wheel)

Installation

# Install uv if needed
pip install uv

# Install the package in editable mode with dev dependencies
uv sync --extra dev

Running the Benchmark

# Make executable (first time)
chmod +x run_benchmarks.sh

# Show help
./run_benchmarks.sh --help

# Run all libraries — memory mode, 2000 images, 20 timed runs
./run_benchmarks.sh /path/to/imagenet/val

# Custom settings
./run_benchmarks.sh /path/to/imagenet/val 2000 20 memory

# Single library
BENCHMARK_LIBRARY=opencv python imread_benchmark/benchmark_single.py \
    --data-dir /path/to/imagenet/val \
    --output-dir output \
    --mode memory

DataLoader Benchmark

Measures throughput inside a PyTorch DataLoader with varying worker counts — the most relevant metric for ML training pipelines.

BENCHMARK_LIBRARY=opencv python imread_benchmark/benchmark_dataloader.py \
    --data-dir /path/to/imagenet/val \
    --output-dir output \
    --workers 0 1 2 4 8
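The idea behind the worker sweep can be sketched without PyTorch, using a thread pool as a stand-in for DataLoader workers (the `fake_decode` workload and worker counts below are illustrative; real DataLoader workers are separate processes, not threads):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_decode(payload):
    # Stand-in for a JPEG decode: a CPU-bound checksum over the bytes.
    return sum(payload) & 0xFF

def throughput(payloads, workers):
    """Images per second for a given worker count (0 = in-line, no pool)."""
    start = time.perf_counter()
    if workers == 0:
        for buf in payloads:
            fake_decode(buf)
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(fake_decode, payloads))
    return len(payloads) / (time.perf_counter() - start)

payloads = [bytes(range(256)) * 64 for _ in range(32)]
for w in (0, 1, 2, 4):
    print(f"workers={w}: {throughput(payloads, w):,.0f} images/s")
```

The shape of the curve, not the absolute numbers, is what matters: throughput typically rises with worker count until decode stops being the bottleneck, which is why the benchmark sweeps `--workers` rather than fixing one value.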

Results Structure

output/
└── darwin_Apple-M4-Max/
    ├── opencv_results.json
    ├── pillow_results.json
    ├── opencv_dataloader_results.json
    └── ...
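Per-library JSON files in a flat per-machine directory make post-hoc comparison easy. A sketch of aggregating them (the JSON schema used here — `{"library": ..., "images_per_second": ...}` — is an assumption for illustration, not taken from the repo):

```python
import json
from pathlib import Path

def summarize(output_dir):
    """Collect per-library throughput, fastest first.

    Assumes each *_results.json contains at least:
        {"library": str, "images_per_second": float}
    """
    rows = []
    for path in sorted(Path(output_dir).glob("*/*_results.json")):
        data = json.loads(path.read_text())
        rows.append((data["library"], data["images_per_second"]))
    return sorted(rows, key=lambda row: -row[1])
```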

Libraries Benchmarked

Direct libjpeg-turbo (fastest)

  • simplejpeg — CFFI binding; zero-copy decode from bytes
  • turbojpeg (PyTurboJPEG) — Python binding for libjpeg-turbo
  • jpeg4py — direct libjpeg-turbo binding (Linux only)
  • kornia-rs — Rust implementation using libjpeg-turbo
  • OpenCV (opencv-python-headless) — wheels are built against libjpeg-turbo

Comprehensive codec libraries

  • imagecodecs — uses libjpeg-turbo 3.x; prebuilt ARM64 wheels
  • pyvips — libvips bindings (on macOS ARM64, libvips is not bundled in the pip wheel; install it via Homebrew as shown above)

Standard libjpeg

  • Pillow
  • Pillow-SIMD (Linux x86-64 only)
  • scikit-image
  • imageio

ML framework components

  • torchvision
  • tensorflow

Performance Considerations

  • All benchmarks run single-threaded, except the DataLoader benchmark
  • Memory mode is the recommended baseline — it isolates decode speed from storage
  • Results based on ImageNet JPEG images (~500×400px)

Recommendations

High-throughput ML training

  • Use simplejpeg, turbojpeg, or kornia-rs for maximum single-thread decode speed
  • Use the DataLoader benchmark to find the best num_workers for your CPU

Cross-platform

  • kornia-rs and opencv offer the most consistent cross-platform performance

Feature-rich applications

  • opencv remains the best choice when you need more than just JPEG decoding

Development

# Run tests
uv run pytest tests/ -v

# Run linters
uv run pre-commit run --all-files

See CONTRIBUTING.md for how to add a new decoder.

Citation

If you find this work useful, please cite:

@misc{iglovikov2025speed,
      title={Need for Speed: A Comprehensive Benchmark of JPEG Decoders in Python},
      author={Vladimir Iglovikov},
      year={2025},
      eprint={2501.13131},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      doi={10.48550/arXiv.2501.13131}
}
