
Add non-power-of-2 shapes for Morton coding to benchmarks #3717

Open: mkitti wants to merge 4 commits into zarr-developers:main from mkitti:mkitti-morton-benchmarks

mkitti (Contributor) commented Feb 20, 2026

  • tests: Add non-power-of-2 shard shapes to benchmarks
  • tests: Add near-miss power-of-2 shape (33

[Description of PR]

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

mkitti and others added 2 commits February 20, 2026 16:27
Add (30,30,30) to large_morton_shards and (10,10,10), (20,20,20),
(30,30,30) to morton_iter_shapes to benchmark the scalar fallback path
for non-power-of-2 shapes, which are not fully covered by the vectorized
hypercube path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
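To make the two paths concrete, here is a minimal pure-Python sketch. It assumes the "vectorized hypercube path" covers the largest power-of-2 cube that fits inside the shard, and that the "scalar fallback" decodes the remaining Morton codes one at a time and discards the out-of-bounds ones. `decode_morton` and `morton_order_sketch` are hypothetical names, not zarr's API, and the hypercube region is decoded with the same scalar loop here rather than with NumPy.

```python
from itertools import product


def decode_morton(code: int, ndim: int) -> tuple[int, ...]:
    """Split an interleaved Morton code: bit (i*ndim + d) becomes bit i of axis d."""
    coords = [0] * ndim
    i = 0
    while code >> (i * ndim):
        for d in range(ndim):
            coords[d] |= ((code >> (i * ndim + d)) & 1) << i
        i += 1
    return tuple(coords)


def morton_order_sketch(shape: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Hypothetical two-path Morton iteration (not zarr's implementation)."""
    ndim = len(shape)
    # Floor hypercube: largest power of 2 that fits in every axis.
    side = 1 << (min(shape).bit_length() - 1)
    n_hyper = side ** ndim
    # Every code below n_hyper maps inside the shape (the "vectorized" region,
    # decoded with the scalar helper here for brevity).
    order = [decode_morton(c, ndim) for c in range(n_hyper)]
    # Enclosing power-of-2 cube: next power of 2 >= the largest axis.
    n_z = (1 << (max(shape) - 1).bit_length()) ** ndim
    # Scalar fallback: decode the remaining codes and drop out-of-bounds ones.
    for c in range(n_hyper, n_z):
        coords = decode_morton(c, ndim)
        if all(x < s for x, s in zip(coords, shape)):
            order.append(coords)
    return order


order = morton_order_sketch((10, 10, 10))
assert len(order) == 1000
assert set(order) == set(product(range(10), repeat=3))
```

For a power-of-2 shape such as (32,32,32) the fallback loop never runs, because the floor hypercube and the enclosing cube coincide.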
Documents the performance penalty when a shard shape is just above a
power-of-2 boundary, causing n_z to jump from 32,768 to 262,144.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
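The jump follows from rounding the largest dimension up to the next power of 2 and raising it to the number of dimensions. A sketch, assuming n_z is computed that way (the helper name `n_z` is hypothetical):

```python
def n_z(shape: tuple[int, ...]) -> int:
    """Hypothetical helper: Morton codes in the enclosing power-of-2 cube."""
    side = 1 << (max(shape) - 1).bit_length()  # next power of 2 >= largest axis
    return side ** len(shape)


for shape in [(32, 32, 32), (33, 33, 33)]:
    print(shape, n_z(shape))
# (32, 32, 32) 32768
# (33, 33, 33) 262144
```

Going from 32 to 33 per side doubles the enclosing cube's edge (32 to 64), so n_z grows by 2³ = 8×.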
github-actions bot added the needs release notes label Feb 20, 2026
mkitti (Contributor, Author) commented Feb 20, 2026

Benchmark Results

These benchmarks were run on this branch (which includes the vectorized get_chunk_slice from #3713) to characterize Morton order performance across power-of-2 and non-power-of-2 shard shapes.

test_morton_order_iter: pure Morton computation, no I/O, LRU cache cleared each round

| Shape | Elements | Type | Mean time |
|---|---|---|---|
| (8,8,8) | 512 | power-of-2 | 0.45 ms |
| (16,16,16) | 4,096 | power-of-2 | 3.6 ms |
| (32,32,32) | 32,768 | power-of-2 | 28.9 ms |
| (10,10,10) | 1,000 | non-power-of-2 | 9.6 ms |
| (20,20,20) | 8,000 | non-power-of-2 | 88.2 ms |
| (30,30,30) | 27,000 | non-power-of-2 | 125.6 ms |
| (33,33,33) | 35,937 | near-miss (+1 above 32³) | 767 ms |

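The near-miss penalty is striking: (33,33,33) has only ~10% more elements than (32,32,32) (35,937 vs 32,768) but takes about 27× longer (767 ms vs 28.9 ms). This is because the current floor-hypercube approach only guarantees the 32³ = 32,768 codes of the largest power-of-2 cube that fits, so it must scalar-decode many Morton codes beyond that in-bounds region, and n_z for this shape jumps to 262,144.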
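A rough cost model makes the gap visible. It assumes the fallback scans every Morton code from the end of the floor hypercube up to n_z (the enclosing power-of-2 cube) and keeps only the in-bounds ones; `fallback_workload` is a hypothetical helper, and whether the real implementation can stop earlier is not stated here.

```python
from math import prod


def fallback_workload(shape: tuple[int, ...]) -> tuple[int, int]:
    """Hypothetical cost model: (codes scalar-decoded, in-bounds elements they yield)."""
    ndim = len(shape)
    hyper = (1 << (min(shape).bit_length() - 1)) ** ndim  # floor hypercube, always in bounds
    n_z = (1 << (max(shape) - 1).bit_length()) ** ndim     # enclosing power-of-2 cube
    decoded = n_z - hyper          # assumes every code in [hyper, n_z) is decoded
    found = prod(shape) - hyper    # elements left for the fallback to find
    return decoded, found


for shape in [(32, 32, 32), (30, 30, 30), (33, 33, 33)]:
    print(shape, fallback_workload(shape))
# (32, 32, 32) (0, 0)
# (30, 30, 30) (28672, 22904)
# (33, 33, 33) (229376, 3169)
```

Under that assumption, (33,33,33) decodes 229,376 codes to recover just 3,169 elements, while (32,32,32) never enters the fallback.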

test_sharded_morton_write_single_chunk: write 1 chunk to a large shard, cache cleared each round

| Shape | Chunks/shard | Mean time |
|---|---|---|
| (32,32,32) | 32,768 | 35.7 ms |
| (30,30,30) | 27,000 | 127.5 ms |
| (33,33,33) | 35,937 | 767.8 ms |

test_sharded_morton_single_chunk: read 1 chunk from a large shard (cached after first access)

| Shape | Mean time |
|---|---|
| (32,32,32) | 0.73 ms |
| (30,30,30) | 0.69 ms |
| (33,33,33) | 0.71 ms |

Reads are fast across all shapes once the Morton order cache is warm: the first call pays the penalty, and subsequent reads are served from the cache.
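The cold/warm split can be reproduced with `functools.lru_cache`. In this sketch, `morton_order_cached` is a hypothetical stand-in for a cached Morton order computation, not zarr's function: clearing the cache before the first call mimics the iter and write benchmarks, while a second call with the same shape is a cache hit, as in the read benchmark.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def morton_order_cached(shape: tuple[int, ...]) -> tuple[tuple[int, ...], ...]:
    """Hypothetical stand-in for an expensive, cached Morton order computation."""
    ndim = len(shape)
    n_z = (1 << (max(shape) - 1).bit_length()) ** ndim
    out = []
    for code in range(n_z):
        coords = [0] * ndim
        for bit in range(code.bit_length()):
            if (code >> bit) & 1:
                # Bit b of the code is bit (b // ndim) of axis (b % ndim).
                coords[bit % ndim] |= 1 << (bit // ndim)
        if all(c < s for c, s in zip(coords, shape)):
            out.append(tuple(coords))
    return tuple(out)


def time_call(shape: tuple[int, ...]) -> float:
    start = time.perf_counter()
    morton_order_cached(shape)
    return time.perf_counter() - start


shape = (16, 16, 16)
morton_order_cached.cache_clear()  # cold start, as in the iter/write benchmarks
cold = time_call(shape)            # computes and stores the order
warm = time_call(shape)            # cache hit, as in repeated reads
print(f"cold {cold * 1e3:.2f} ms, warm {warm * 1e3:.4f} ms")
print(morton_order_cached.cache_info())
```

Because the cache is keyed on the shape tuple, every read after the first one skips the Morton computation entirely, which is why the read timings barely differ between shapes.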

Interpretation

The benchmarks confirm that non-power-of-2 shard shapes carry a significant Morton computation penalty under the current implementation, and that near-miss shapes such as (33,33,33) are especially slow. These benchmarks provide a baseline for measuring improvements from follow-on optimization work.

mkitti and others added 2 commits February 20, 2026 16:48
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions bot removed the needs release notes label Feb 21, 2026