Zarr v3 Struct Support & Tensorstore #241

@tasansal

Hi @jbms @laramiel

Zarr-Python 3.1.0 was just released, and v3 arrays now support flexible data types. One implementation (not yet marked stable) is NumPy structured dtypes, stored as they were in Zarr v2. The example below uses Xarray to write a Zarr v3 store with two variables: one structured (headers) and one plain float32 (seismic), the latter with sharding.

My understanding is that the Zarr v3 driver in TensorStore does not support NumPy structs either. However, I believe adding support should be straightforward: the binary layout of the data did not change between Zarr v2 and v3, only the metadata definition of the struct did. Could the v3 driver parse the new metadata and reuse the v2 driver's logic for reading structured fields?
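To make the layout argument concrete, here is a minimal sketch of what "same binary, different metadata" means for the struct used in this issue. It assumes an uncompressed chunk of packed, little-endian records (int32 foo followed by int64 bar, 12 bytes each, no alignment padding), and it uses only the standard library so it runs without NumPy. The helper name decode_fields is hypothetical and is not part of TensorStore or Zarr-Python.

```python
import struct

# Packed little-endian record matching the issue's dtype: foo=int32, bar=int64.
# "<" disables alignment padding, so each record is 4 + 8 = 12 bytes.
RECORD = struct.Struct("<iq")
FIELD_NAMES = ("foo", "bar")


def decode_fields(chunk: bytes) -> dict[str, list[int]]:
    """Split a buffer of interleaved records into one list per field.

    Hypothetical helper for illustration only.
    """
    if len(chunk) % RECORD.size != 0:
        raise ValueError("chunk length is not a multiple of the record size")
    columns: dict[str, list[int]] = {name: [] for name in FIELD_NAMES}
    for record in RECORD.iter_unpack(chunk):
        for name, value in zip(FIELD_NAMES, record):
            columns[name].append(value)
    return columns


# Three records stored back to back, as they would sit in a decoded chunk.
raw = b"".join(RECORD.pack(foo, bar) for foo, bar in [(1, 10), (2, 20), (3, 30)])
fields = decode_fields(raw)
```

If the bytes are indeed identical across formats, only the step that derives the record layout (field names, types, offsets) from the metadata would need a v3-specific code path; the field extraction itself could stay the same.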

What are your recommendations for implementation?

import numpy as np
import xarray as xr

dtype = np.dtype(
    {
        "names": ["foo", "bar"],
        "formats": ["int32", "int64"],
    }
)

# Per-variable chunking; only "seismic" is sharded.
encoding = {
    "headers": {"chunks": (128, 128)},
    "seismic": {"chunks": (16, 16, 16), "shards": (128, 128, 128)},
}
seis = xr.DataArray(
    name="seismic",
    dims=["inline", "crossline", "depth"],
    data=np.zeros((512, 512, 512), dtype="float32"),
)
# The structured variable: one (foo, bar) record per (inline, crossline).
hdr = xr.DataArray(
    name="headers",
    dims=["inline", "crossline"],
    data=np.zeros((512, 512), dtype=dtype),
)

# Write both variables to a Zarr v3 store.
ds = xr.Dataset({"seismic": seis, "headers": hdr})
ds.to_zarr("tmp", mode="w", zarr_format=3, encoding=encoding)
