Add support for structured dtypes to zarr3 driver, open structs as void #271
Open
BrianMichell wants to merge 63 commits into google:master from
Conversation
…t for raw bits dtype
Implement shim for `open_as_void` driver-level flag
* Begin removing void field shim
* Fully removed void string shim
* Cleanup debug prints
* Remove shimmed validation
* Remove unnecessary comment
* Prefer false over zero for ternary clarity
* Implement a more general and portable example set
* Fix driver cache bug
* Update example for template
* Cleanup example
* Remove testing examples from source
* Use the appropriate fill value for open_as_void structured data
* Cleanup
Collaborator
I'll try to get to this in about a week. Before I look this one over, please double-check that the prior PR works for you. Also look over this one and see if any of the suggestions from the other one apply.
Matches the pattern from the zarr v2 driver (PR google#272). When both "field" and "open_as_void" are specified in the spec, return an error, since these options are mutually exclusive: "field" selects a specific field from a structured array, while "open_as_void" provides raw byte access to the entire structure.
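A minimal sketch of the check this describes (a free-standing stand-in for illustration; the real validation happens during spec parsing in the zarr3 driver, and the ToUrl() guard in the next commit follows the same shape):

    #include <optional>
    #include <string>

    #include "absl/status/status.h"

    // Illustrative only: stand-ins for the spec members named in the commit.
    absl::Status ValidateVoidAccess(const std::optional<std::string>& field,
                                    bool open_as_void) {
      if (field.has_value() && open_as_void) {
        return absl::InvalidArgumentError(
            "\"field\" and \"open_as_void\" are mutually exclusive");
      }
      return absl::OkStatus();
    }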
The zarr3 URL syntax cannot represent field selection or void access mode. Following the pattern from the zarr v2 driver (PR google#272), ToUrl() now returns an error when either of these options is specified, instead of silently ignoring them.
…trip

Following the pattern from the zarr v2 driver (PR google#272), override GetBoundSpecData in ZarrDataCache to set spec.open_as_void from ChunkCacheImpl::open_as_void_. This ensures that when you open a store with open_as_void=true and then call spec(), the resulting spec correctly has open_as_void=true set. Without this fix, opening a store with open_as_void=true and then getting its spec would lose the open_as_void flag, causing incorrect behavior if the spec were used to re-open the store.
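A hedged sketch of the override, using names from the commit message (the actual GetBoundSpecData signature in the kvs-backed chunk driver framework, and whether the base class provides its own implementation, are assumptions here):

    absl::Status ZarrDataCache::GetBoundSpecData(SpecData& spec) const {
      // Let the base populate whatever it already round-trips...
      TENSORSTORE_RETURN_IF_ERROR(ChunkCacheImpl::GetBoundSpecData(spec));
      // ...then preserve the runtime flag so spec() reports
      // open_as_void=true for a store opened with open_as_void=true.
      spec.open_as_void = ChunkCacheImpl::open_as_void_;
      return absl::OkStatus();
    }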
Apply changes based on feedback from google#272
Add assertions in EncodeChunk and DecodeChunk to verify that arrays are C-contiguous before performing direct memcpy operations:

- In EncodeChunk: verify component arrays are C-contiguous.
- In DecodeChunk: verify decoded byte arrays are C-contiguous.

These assertions validate assumptions about array layouts that the chunk cache relies on for correct operation. The chunk cache write path (AsyncWriteArray) allocates C-order arrays, and the codec chain produces C-contiguous decoded arrays. Also adds the necessary includes and BUILD dependencies for IsContiguousLayout and c_order.
Replace raw memcpy loops with CopyArray using strided ArrayViews for structured type encoding and decoding. This follows the standard TensorStore pattern (as used in zarr v2 with internal::EncodeArray), where array copies are done via IterateOverArrays, which safely handles any source/destination strides.

The key insight is creating an ArrayView with strides that represent the interleaved field positions within the struct layout:

- For a field at byte offset B within a struct of size S,
- the strides are [..., S] instead of [..., field_size],
- which allows CopyArray to correctly interleave/deinterleave fields.

This approach:

1. Removes the need for contiguity assertions (CopyArray handles any layout).
2. Is consistent with zarr v2's use of internal::EncodeArray.
3. Uses the standard IterateOverArrays iteration pattern.

The void access decode path retains its memcpy with assertion because it's a simple byte reinterpretation where both arrays are known to be C-contiguous (the destination is freshly allocated, the source comes from the codec chain).
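A minimal sketch of what the strided copy achieves, written as an explicit loop for the 1-d case (function name and parameters are illustrative, not the PR's; the real code uses CopyArray so any rank and stride combination is handled):

    #include <cstddef>
    #include <cstring>

    // Element i of a field at `byte_offset` within a struct of
    // `struct_size` bytes lives at src + i * struct_size + byte_offset,
    // which is exactly the [..., S] stride described above.
    void DeinterleaveField(const unsigned char* src, unsigned char* dest,
                           std::size_t n, std::size_t struct_size,
                           std::size_t byte_offset, std::size_t field_size) {
      for (std::size_t i = 0; i < n; ++i) {
        std::memcpy(dest + i * field_size,
                    src + i * struct_size + byte_offset, field_size);
      }
    }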
Replace manual stride computation loops with ComputeStrides() from
contiguous_layout.h. This is the standard TensorStore utility for
computing C-order (or Fortran-order) byte strides given a shape
and innermost element stride.
The manual loop:

    Index stride = bytes_per_outer_element;
    for (DimensionIndex i = rank; i-- > 0;) {
      strides[i] = stride;
      stride *= shape[i];
    }

is exactly equivalent to:

    ComputeStrides(c_order, bytes_per_outer_element, shape, strides);
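For example, assuming the ComputeStrides overload from contiguous_layout.h takes (order, element_stride, shape, strides) as the equivalence above suggests, a quick worked check (header paths and local names are mine, not the PR's):

    #include "tensorstore/contiguous_layout.h"
    #include "tensorstore/index.h"

    void Example() {
      const tensorstore::Index shape[] = {2, 3};
      tensorstore::Index strides[2];
      // 8 bytes per innermost element => C-order strides {24, 8}: the last
      // dimension advances by the element size, the first by 3 * 8 = 24.
      tensorstore::ComputeStrides(tensorstore::c_order,
                                  /*element_stride=*/8, shape, strides);
    }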
Replace manual loops with standard library and TensorStore utilities:

1. DimensionSet::UpTo(rank) creates a DimensionSet with bits [0, rank)
   set to true, replacing:

       DimensionSet s(false);
       for (i = 0; i < rank; ++i) s[i] = true;

2. std::fill_n for origins (all zeros) and std::copy_n for the shape copy.

This is more idiomatic and clearer than explicit index loops. These are standard patterns used throughout TensorStore for similar operations on dimension sets and shape vectors; a sketch follows.
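Put together, a hedged sketch of the replacement (local variable names and header paths are assumed; DimensionSet::UpTo, std::fill_n, and std::copy_n are the pieces the commit names):

    #include <algorithm>

    #include "tensorstore/index.h"
    #include "tensorstore/util/dimension_set.h"
    #include "tensorstore/util/span.h"

    void InitChunkLayout(tensorstore::DimensionIndex rank,
                         tensorstore::span<tensorstore::Index> origin,
                         tensorstore::span<const tensorstore::Index> shape,
                         tensorstore::span<tensorstore::Index> chunk_shape) {
      // All dimensions [0, rank) participate in chunking.
      auto chunked_dims = tensorstore::DimensionSet::UpTo(rank);
      // Origins are all zero; the shape is copied verbatim.
      std::fill_n(origin.begin(), rank, tensorstore::Index{0});
      std::copy_n(shape.begin(), rank, chunk_shape.begin());
      (void)chunked_dims;
    }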
The sub-chunk cache in sharding mode uses a grid from the sharding codec state, which doesn't know about void access. This caused two issues:

1. Shape mismatch: the grid's component shape was [4, 4] but decoded arrays
   had shape [4, 4, 4] (with the bytes dimension).
2. Invalid key generation: the grid's chunk_shape affected cell indexing.

Fix by:

- adding a `grid_has_void_dimension_` flag to track whether the grid
  includes the bytes dimension (false for sub-chunk caches);
- for sub-chunk caches with void access on non-structured types, creating a
  modified grid with:
  - a component chunk_shape including the bytes dimension, [4, 4, 4];
  - the grid chunk_shape unchanged, [4, 4] (for cell indexing);
  - a proper chunked_to_cell_dimensions mapping.

This enables void access to work correctly with sharding codecs.
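A rough sketch of the shape bookkeeping described above (container types and variable names are assumed; the real code builds an internal grid specification):

    #include <cstdint>
    #include <vector>

    void Example() {
      // For a [4, 4] grid of 4-byte elements opened as void:
      std::vector<std::int64_t> grid_chunk_shape = {4, 4};  // for cell keys
      std::vector<std::int64_t> component_chunk_shape = grid_chunk_shape;
      component_chunk_shape.push_back(4);  // trailing bytes dimension
      // grid_has_void_dimension_ stays false for the sub-chunk cache: only
      // the component shape gains the bytes dimension, not the grid itself.
    }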
The ZarrShardSubChunkCache template had duplicate member variables (open_as_void_, original_is_structured_, bytes_per_element_) that were already present in the base class ChunkCacheImpl (ZarrLeafChunkCache). Access these through the ChunkCacheImpl:: prefix instead, to follow the DRY principle and maintain consistency with other TensorStore patterns.
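A toy illustration of why the qualification matters (not the PR's code): in a class template whose base depends on a template parameter, unqualified base-class members are not found by ordinary name lookup, so the flag must be reached as ChunkCacheImpl::open_as_void_ (or this->open_as_void_) rather than shadowed by a duplicate declaration:

    struct LeafCache {
      bool open_as_void_ = false;
    };

    template <typename Base>
    class SubChunkCache : public Base {
     public:
      bool VoidAccess() const {
        // Unqualified `open_as_void_` would fail to compile here because
        // the base is dependent; qualifying defers lookup to instantiation.
        return Base::open_as_void_;
      }
    };

    // Usage: SubChunkCache<LeafCache>{}.VoidAccess() == false.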
Reviewed the code for potential inconsistencies and fixed some bugs
laramiel reviewed Feb 3, 2026
Collaborator
FWIW, I see a new assert failure in this PR. It's best to run all the tests as you're developing; I typically do. So far this is mostly build-based issues. I'll look at more of the structure later.
…ubspan does not exceed the grid size. This prevents potential out-of-bounds access when generating keys. Resolves: google#271 (comment)
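A hedged sketch of the kind of bound this describes (names hypothetical; the actual clamp lives in the sharded key-generation path):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <span>

    // Clamp the subspan length to the grid rank before generating a chunk
    // key, so indices beyond the grid are never read.
    std::span<const std::int64_t> GridKeyIndices(
        std::span<const std::int64_t> cell_indices, std::size_t grid_rank) {
      return cell_indices.subspan(
          0, std::min(cell_indices.size(), grid_rank));
    }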
laramiel reviewed Feb 3, 2026
      return base_dtype;
    }
    return absl::InvalidArgumentError(
        tensorstore::StrCat("Data type not supported: ", dtype));
Collaborator
Still quite a few tensorstore::StrCat in error messages. Please check all the files for these.
Supersedes #264
Resolves comments 1 and 2