logically separate codec metadata from codec execution

I'm working on adding a feature to zarr-python that will let us declare the codec classes we want to use at array-access time, something like this: `zarr.open_array(..., config={"codec_class_map": {**default_codecs, "blosc": MyBloscCodec}})`. This will make it MUCH easier to experiment with different codecs.

Here's a sketch of how this probably should work:

```
open_array(..., config=config_that_declares_codec_classes)
    # 1. get array metadata
    # 2. create Array class
    #   2.a create chunk encoding machinery based on array metadata
    #     2.a.1 resolve codec metadata into actual codecs that do encoding / decoding.
```
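To make the proposed flow concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical and for illustration only: `ArrayMetadataSketch`, `ArraySketch`, `open_array_sketch`, and the toy `ReverseCodec` are not zarr-python APIs, and the `codec_class_map` argument just mirrors the config key suggested above. The point is only that the metadata object holds plain codec descriptions, and runnable codecs appear in step 2.a.1.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


class ReverseCodec:
    """Toy codec (hypothetical): reverses the bytes of a chunk."""

    def __init__(self, **configuration: Any) -> None:
        self.configuration = configuration

    def encode(self, data: bytes) -> bytes:
        return data[::-1]

    def decode(self, data: bytes) -> bytes:
        return data[::-1]


@dataclass(frozen=True)
class ArrayMetadataSketch:
    """Step 1: metadata only. Codecs are kept as plain dicts, not runnable objects."""

    shape: tuple[int, ...]
    codecs: tuple[dict[str, Any], ...]


class ArraySketch:
    """Step 2: the array owns the runtime encoding / decoding machinery."""

    def __init__(self, metadata: ArrayMetadataSketch, codec_class_map: dict[str, type]) -> None:
        self.metadata = metadata
        # Step 2.a.1: resolve codec metadata into actual codec instances.
        self.pipeline = tuple(
            codec_class_map[c["name"]](**c.get("configuration", {}))
            for c in metadata.codecs
        )

    def encode_chunk(self, data: bytes) -> bytes:
        for codec in self.pipeline:
            data = codec.encode(data)
        return data

    def decode_chunk(self, data: bytes) -> bytes:
        # Decoding applies the codecs in reverse order.
        for codec in reversed(self.pipeline):
            data = codec.decode(data)
        return data


def open_array_sketch(metadata_document: dict[str, Any], codec_class_map: dict[str, type]) -> ArraySketch:
    metadata = ArrayMetadataSketch(
        shape=tuple(metadata_document["shape"]),
        codecs=tuple(metadata_document["codecs"]),
    )
    return ArraySketch(metadata, codec_class_map)
```

With this shape, swapping in a different implementation for a codec name is just a matter of passing a different `codec_class_map`; the metadata object never changes.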

In this design, the codec classes that actually encode and decode chunks are only introduced when we create the chunk encoding / decoding machinery. But in the current codebase, we create functional codec classes that do encoding / decoding when we create our models of array metadata documents (`ArrayV3Metadata` and `ArrayV2Metadata`). So the flow today looks like this:

```
open_array(..., config=config_that_declares_codec_classes)
    # 1. get array metadata
    #   1.a resolve codec metadata into actual codecs that do encoding / decoding.
    # 2. create Array class
    #   2.a create chunk encoding machinery based on array metadata
```

So the `Array` class doesn't own the codec classes that do encoding / decoding -- they're owned by the array metadata instead. IMO this is not correct. The array metadata classes should be narrowly scoped to representing array metadata, and the array class should be scoped to all the runtime stuff necessary for materializing the chunks made accessible via an array metadata document and a storage backend.

I'd like to figure out how we can move these chunk encoding / decoding classes off the array metadata documents. Minimally, for each codec we might define lightweight `CodecMetadata` classes that only exist to facilitate array creation + basic invariant checking. We could even have a totally generic `CodecMetadata` class that doesn't know anything about the set of codec implementations -- this would allow us to model array metadata documents where the chunks can't be decoded (because we don't have the right codec implementations), but the attributes can still be read / written. BTW this is yet another example of why the v3 spec should _structurally_ define the separate codec types, instead of putting them all in a flat array.
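As a rough illustration of the generic variant, here is one possible shape. This is a sketch under assumptions, not a proposal for the final API: the field names follow the `name` / `configuration` layout of v3 codec entries, while `from_dict`, `to_dict`, and `resolve_codec` are hypothetical helpers invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class CodecMetadata:
    """Generic codec description; knows nothing about codec implementations."""

    name: str
    configuration: Mapping[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> CodecMetadata:
        # Basic invariant checking only: a codec entry must have a name.
        if "name" not in data:
            raise ValueError("codec metadata requires a 'name' field")
        return cls(name=data["name"], configuration=dict(data.get("configuration", {})))

    def to_dict(self) -> dict[str, Any]:
        out: dict[str, Any] = {"name": self.name}
        if self.configuration:
            out["configuration"] = dict(self.configuration)
        return out


def resolve_codec(meta: CodecMetadata, codec_class_map: Mapping[str, type]) -> Any:
    """Hypothetical resolution step, run only when chunks must be encoded / decoded."""
    try:
        codec_cls = codec_class_map[meta.name]
    except KeyError:
        raise LookupError(f"no implementation registered for codec {meta.name!r}") from None
    return codec_cls(**meta.configuration)
```

Because `CodecMetadata.from_dict` never looks up an implementation, a metadata document that names an unknown codec can still be parsed and round-tripped; the failure only surfaces when `resolve_codec` is called to actually touch chunk data.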

Open to feedback if people think this direction is not a good idea. We might want to consider doing the same thing for the `ZDType` classes as well. 

logically separate codec metadata from codec execution #3884
