Add support for opening structured dtypes as void for zarr driver#272
Merged
copybara-service[bot] merged 21 commits intogoogle:masterfrom Jan 22, 2026
Merged
Conversation
Collaborator
|
Ok, I imported this locally and am looking at a few things.
|
laramiel
reviewed
Jan 6, 2026
laramiel
reviewed
Jan 6, 2026
laramiel
reviewed
Jan 6, 2026
…ussion_r2663351949` and extend test coverage
…ussion_r2663353574` and extend test coverage
Contributor
Author
My bad, I don't know how I missed the compilation issue for the spec test. I've fixed the issue and resolved your comments. Thanks for taking a look and getting back to this so fast! |
laramiel
reviewed
Jan 6, 2026
laramiel
reviewed
Jan 6, 2026
…ges/BASE..a42b6f511375dc1bd402ad525519f57210546735#r2666111665` Enforce schema validation for one-of `field` and `open_as_void`
…ussion_r2666195572` Use synthesized open_as_void field
laramiel
reviewed
Jan 7, 2026
laramiel
reviewed
Jan 7, 2026
laramiel
reviewed
Jan 7, 2026
laramiel
reviewed
Jan 7, 2026
jbms
reviewed
Jan 7, 2026
Contributor
Author
|
Sorry for the delay responding to the latest round of feedback. I believe everything raised so far has been addressed. |
laramiel
reviewed
Jan 12, 2026
laramiel
reviewed
Jan 12, 2026
laramiel
reviewed
Jan 12, 2026
laramiel
reviewed
Jan 12, 2026
laramiel
reviewed
Jan 12, 2026
jbms
reviewed
Jan 13, 2026
tensorstore/driver/zarr/driver.cc
Outdated
| // since we're treating the data as raw bytes regardless of the actual dtype. | ||
| // Shape is allowed to differ (handled by base class for resizing). | ||
| // Other fields like compressor, order, chunks must still match. | ||
| if (existing_metadata.dtype.bytes_per_outer_element != |
Collaborator
There was a problem hiding this comment.
I think we could just rely on the normal validate but applied to the void metadata.
jbms
reviewed
Jan 13, 2026
jbms
reviewed
Jan 13, 2026
jbms
reviewed
Jan 13, 2026
jbms
reviewed
Jan 13, 2026
…684331789` and `https://github.com/google/tensorstore/pull/272#discussion_r2684406823` Guard cache with `absl::call_once` for thread safety
…683758437` Document and enforce `selected_field` and `open_as_void` exclusivity
…684312496` Validate schema against `open_as_void` metadata but `partial_metadata` against regular metadata
…684307471` Allow original metadata to cache pointer on first access
…684302699` Rely on normal metadata validation for void metadata
Collaborator
|
I think that this is getting pretty close. I have a few minor edits in my import which I can just add in. Getting jeremy to look over it. Thanks. Edits:
|
BrianMichell
added a commit
to BrianMichell/tensorstore
that referenced
this pull request
Jan 26, 2026
Matches the pattern from zarr v2 driver (PR google#272). When both "field" and "open_as_void" are specified in the spec, return an error since these options are mutually exclusive - field selects a specific field from a structured array, while open_as_void provides raw byte access to the entire structure.
BrianMichell
added a commit
to BrianMichell/tensorstore
that referenced
this pull request
Jan 26, 2026
The zarr3 URL syntax cannot represent field selection or void access mode. Following the pattern from zarr v2 driver (PR google#272), ToUrl() now returns an error when either of these options is specified instead of silently ignoring them.
BrianMichell
added a commit
to BrianMichell/tensorstore
that referenced
this pull request
Jan 26, 2026
…trip Following the pattern from zarr v2 driver (PR google#272), override GetBoundSpecData in ZarrDataCache to set spec.open_as_void from ChunkCacheImpl::open_as_void_. This ensures that when you open a store with open_as_void=true and then call spec(), the resulting spec correctly has open_as_void=true set. Without this fix, opening a store with open_as_void=true and then getting its spec would lose the open_as_void flag, causing incorrect behavior if the spec is used to re-open the store.
BrianMichell
added a commit
to BrianMichell/tensorstore
that referenced
this pull request
Jan 26, 2026
Add comprehensive tests for open_as_void functionality following the patterns from zarr v2 driver (PR google#272): Tests that PASS: - OpenAsVoidSimpleType: Verifies simple type arrays can be opened with open_as_void, gaining an extra dimension for bytes - OpenAsVoidSpecRoundtrip: Verifies open_as_void preserved in spec JSON - OpenAsVoidGetBoundSpecData: Verifies spec() on void store returns open_as_void=true (tests the GetBoundSpecData fix) - OpenAsVoidCannotUseWithField: Verifies mutual exclusivity validation - OpenAsVoidUrlNotSupported: Verifies ToUrl() rejects open_as_void - FieldSelectionUrlNotSupported: Verifies ToUrl() rejects selected_field Tests marked TODO (pending codec chain implementation): - OpenAsVoidStructuredType - OpenAsVoidWithCompression - OpenAsVoidReadWrite - OpenAsVoidWriteRoundtrip Also fixes BUILD file: adds :metadata dependency to :chunk_cache target to provide the dtype.h header that chunk_cache.h includes.
BrianMichell
added a commit
to BrianMichell/tensorstore
that referenced
this pull request
Jan 26, 2026
The codec chain is prepared for the original dtype and chunk shape (without the extra bytes dimension). For void access: DecodeChunk: - Strip the bytes dimension from grid's chunk_shape to get original shape - Decode using the original codec shape - Reinterpret the decoded bytes as [chunk_shape..., bytes_per_elem] EncodeChunk: - Input has shape [chunk_shape..., bytes_per_elem] of byte_t - Create a view with the original chunk shape and element_size - Encode using the original codec This follows the pattern from zarr v2 (PR google#272) where the void metadata has the chunk_layout computed to match encoded/decoded layouts.
BrianMichell
added a commit
to BrianMichell/tensorstore
that referenced
this pull request
Jan 26, 2026
For void access, the codec handling differs between: - Non-structured types: codec prepared for [chunk_shape] with original dtype Need to decode/encode then reinterpret bytes. - Structured types: codec already prepared for [chunk_shape, bytes_per_elem] with byte dtype. Just decode/encode directly. Add original_is_structured parameter to cache constructors to properly distinguish these cases in DecodeChunk and EncodeChunk. This follows the pattern from zarr v2 (PR google#272) where CreateVoidMetadata() creates a modified metadata for void access.
BrianMichell
added a commit
to BrianMichell/tensorstore
that referenced
this pull request
Jan 26, 2026
Apply changes based on feedback from google#272
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Supersedes #264
Removes
bool open_as_void = falsedefault in C++ function declarations and definitions.Uses a derived DataCache implementation similar to strategy used in
neuroglancer_precomputeddriver forUnshardedDataCacheandShardedDataCache