Add non-power-of-2 shapes for Morton coding to benchmarks#3717
mkitti wants to merge 4 commits into zarr-developers:main
Conversation
Add (30,30,30) to large_morton_shards and (10,10,10), (20,20,20), (30,30,30) to morton_iter_shapes to benchmark the scalar fallback path for non-power-of-2 shapes, which are not fully covered by the vectorized hypercube path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the performance penalty when a shard shape is just above a power-of-2 boundary, causing n_z to jump from 32,768 to 262,144. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
**Benchmark Results**

These benchmarks were run on this branch (which includes the vectorized hypercube path).
| Shape | Elements | Type | Mean time |
|---|---|---|---|
| (8,8,8) | 512 | power-of-2 | 0.45 ms |
| (16,16,16) | 4,096 | power-of-2 | 3.6 ms |
| (32,32,32) | 32,768 | power-of-2 | 28.9 ms |
| (10,10,10) | 1,000 | non-power-of-2 | 9.6 ms |
| (20,20,20) | 8,000 | non-power-of-2 | 88.2 ms |
| (30,30,30) | 27,000 | non-power-of-2 | 125.6 ms |
| (33,33,33) | 35,937 | near-miss (+1 above 32³) | 767 ms |
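The scalar fallback path these shapes exercise can be sketched as a decode-and-filter loop. This is an illustrative sketch, not zarr-python's actual implementation; `decode_morton_3d` and `morton_order` are hypothetical names:

```python
import math

def decode_morton_3d(code):
    """Deinterleave a 3-D Morton code into per-axis coordinates.
    Illustrative scalar decode: one bit per axis per loop iteration."""
    coords = [0, 0, 0]
    bit = 0
    while code:
        for axis in range(3):
            coords[axis] |= (code & 1) << bit
            code >>= 1
        bit += 1
    return tuple(coords)

def morton_order(shape):
    """Chunk visit order for a shard: scan all codes up to the padded
    power-of-2 bound and keep only in-bounds coordinates. For
    non-power-of-2 shapes, many codes decode to out-of-bounds
    coordinates and are discarded, which is the scalar-fallback cost
    the benchmarks above measure."""
    bits = max(math.ceil(math.log2(s)) for s in shape)
    order = []
    for code in range(1 << (bits * len(shape))):
        c = decode_morton_3d(code)
        if all(ci < si for ci, si in zip(c, shape)):
            order.append(c)
    return order
```

For a power-of-2 shape every scanned code is in bounds; for (33,33,33) only 35,937 of 262,144 scanned codes survive the bounds check.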
The near-miss penalty is striking: (33,33,33) has only ~10% more elements than (32,32,32) but takes 27× longer. This is because the current floor-hypercube approach must scalar-decode many Morton codes beyond the guaranteed in-bounds region.
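A back-of-envelope sketch of why the near-miss is so costly, assuming the scanned-code count comes from padding every axis to the next power of two (consistent with the n_z jump from 32,768 to 262,144 noted in the commit message); `morton_code_count` is a hypothetical helper:

```python
import math

def morton_code_count(shape):
    """Number of Morton codes scanned for a shard shape: each axis is
    padded to the largest per-axis next power of two before the bits
    are interleaved."""
    bits = max(math.ceil(math.log2(s)) for s in shape)
    return 1 << (bits * len(shape))

print(morton_code_count((32, 32, 32)))  # 32768  (5 bits/axis)
print(morton_code_count((33, 33, 33)))  # 262144 (6 bits/axis: 8x more codes)
```

Crossing the 32-per-axis boundary adds one bit to every axis, multiplying the scan space by 2³ = 8 even though the shard itself grew by only ~10%.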
test_sharded_morton_write_single_chunk — write 1 chunk to a large shard, cache cleared each round
| Shape | Chunks/shard | Mean time |
|---|---|---|
| (32,32,32) | 32,768 | 35.7 ms |
| (30,30,30) | 27,000 | 127.5 ms |
| (33,33,33) | 35,937 | 767.8 ms |
test_sharded_morton_single_chunk — read 1 chunk from a large shard (cached after first access)
| Shape | Mean time |
|---|---|
| (32,32,32) | 0.73 ms |
| (30,30,30) | 0.69 ms |
| (33,33,33) | 0.71 ms |
Reads are fast across all shapes once the Morton order cache is warm (the first call pays the penalty, subsequent reads are cached).
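The warm-cache behavior can be illustrated with a simple memoization sketch. This is not zarr-python's actual cache mechanism; `chunk_order` is a hypothetical stand-in that uses plain row-major order in place of the real Morton computation:

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def chunk_order(shape):
    # Stand-in for the expensive Morton-order computation: memoized
    # per shard shape, so only the first read pays the cost.
    return tuple(product(*(range(s) for s in shape)))

chunk_order((30, 30, 30))             # first call: computes the order
chunk_order((30, 30, 30))             # second call: served from cache
print(chunk_order.cache_info().hits)  # 1
```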
Interpretation
The benchmarks confirm that non-power-of-2 shard shapes carry a significant Morton computation penalty under the current implementation, with near-miss shapes (like (33,33,33)) being especially slow. These benchmarks provide a baseline to measure improvements from follow-on optimization work.