Add LZ4 compression via nvCOMP and python-lz4 #1051

@brendancol

Author of Proposal: @brendan

Reason or problem

nvCOMP ships batch LZ4 compress/decompress, but the geotiff package doesn't use it. LZ4 decompresses 2-4x faster than deflate at the cost of lower compression ratios. That tradeoff is worth it for Dask chunked reads where decompression speed matters more than file size.

GDAL uses compression code 50004 (a value of the TIFF Compression tag, 259) for LZ4-compressed GeoTIFFs. We don't support it yet.

Proposal

Design:

  1. _compression.py: COMPRESSION_LZ4 = 50004, CPU decompress/compress via lz4.frame (from python-lz4), with the usual LZ4_AVAILABLE flag
  2. _gpu_decode.py: Wire nvcompBatchedLZ4DecompressAsync / nvcompBatchedLZ4CompressAsync into the existing nvCOMP ctypes code. Just another elif next to deflate and ZSTD.
  3. _writer.py: Add 'lz4' to _compression_tag()
  4. Hook into gpu_decode_tiles() and gpu_compress_tiles()

The nvCOMP batch API for LZ4 uses the same calling convention as deflate/ZSTD, so this is mostly copy-paste with different function names.
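One way to picture the "different function names" point is a lookup from compression code to nvCOMP entry-point names. This is only a sketch: the table, `NvcompCodec`, and `nvcomp_symbol()` are hypothetical and are not how `_gpu_decode.py` is necessarily organized (the proposal describes an elif chain). The symbol names are nvCOMP's batched C API names; their argument lists have changed between nvCOMP releases, so the ctypes `argtypes` would still need to be set per version.

```python
"""Sketch: choosing nvCOMP batched entry points by TIFF compression code."""
from typing import NamedTuple

COMPRESSION_DEFLATE = 8
COMPRESSION_ZSTD = 50000
COMPRESSION_LZ4 = 50004  # code given in this proposal


class NvcompCodec(NamedTuple):
    decompress: str  # name of the batched decompress symbol
    compress: str    # name of the batched compress symbol


NVCOMP_CODECS = {
    COMPRESSION_DEFLATE: NvcompCodec(
        "nvcompBatchedDeflateDecompressAsync",
        "nvcompBatchedDeflateCompressAsync",
    ),
    COMPRESSION_ZSTD: NvcompCodec(
        "nvcompBatchedZstdDecompressAsync",
        "nvcompBatchedZstdCompressAsync",
    ),
    COMPRESSION_LZ4: NvcompCodec(
        "nvcompBatchedLZ4DecompressAsync",
        "nvcompBatchedLZ4CompressAsync",
    ),
}


def nvcomp_symbol(lib, compression: int, direction: str):
    """Return the function object for `compression` from `lib`.

    `lib` is any object exposing the nvCOMP symbols as attributes,
    for example a ctypes.CDLL handle. `direction` is "decompress"
    or "compress".
    """
    if direction not in ("decompress", "compress"):
        raise ValueError(f"unknown direction: {direction!r}")
    codec = NVCOMP_CODECS.get(compression)
    if codec is None:
        raise ValueError(f"no nvCOMP codec for compression code {compression}")
    return getattr(lib, getattr(codec, direction))
```

With a table like this, adding LZ4 is one new entry; with the existing elif structure it is one new branch. Either way the surrounding buffer setup can stay shared.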

Usage:

write_geotiff(data, "fast.tif", compression="lz4")
da = read_geotiff("fast.tif")

Stakeholders and impacts

Users with large rasters who want fast reads over small files. Useful for Dask workflows where tiles get decompressed on every chunk access. Additive, nothing existing changes.

Drawbacks

  • Lower compression ratio than deflate/ZSTD
  • Compression code 50004 is a GDAL extension, not baseline TIFF. Files won't open in every TIFF reader.
  • python-lz4 is another optional dependency

Alternatives

  • ZSTD at a low compression level is a decent middle ground
  • Uncompressed is the fastest read but wastes disk

Unresolved questions

  • LZ4 frame format vs block format (GDAL uses frame for tag 50004)
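The frame/block question matters for the GPU path too: as far as I understand, nvCOMP's batched LZ4 API operates on raw LZ4 blocks, while `lz4.frame` wraps blocks in a frame with a header. That is an assumption worth verifying before implementation. A small stdlib-only check for which format a tile payload uses might look like this; `is_lz4_frame` is a hypothetical helper, and the only fixed fact it relies on is the LZ4 frame magic number `0x184D2204`, stored little-endian.

```python
"""Sketch: tell LZ4 frame data apart from raw block data by its magic number."""
import struct

LZ4_FRAME_MAGIC = 0x184D2204  # first 4 bytes of an LZ4 frame, little-endian


def is_lz4_frame(buf: bytes) -> bool:
    """Return True if `buf` starts with the LZ4 frame magic number.

    Raw LZ4 blocks carry no magic number or header, so a False result
    means "not a frame", not "definitely a valid block".
    """
    if len(buf) < 4:
        return False
    (magic,) = struct.unpack_from("<I", buf)
    return magic == LZ4_FRAME_MAGIC
```

If tiles turn out to be frames, the GPU path would need to strip the frame header and locate the block payload before handing it to a block-level decompressor.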

Labels: enhancement (New feature or request), gpu (CuPy / CUDA GPU support)
