zarr-python's consolidated metadata implementation violates the spec

[Previously](https://github.com/zarr-developers/zarr-specs/blob/350d87e6b55b1ab3440308b9533d28d33892d72e/docs/v3/core/v3.0.rst?plain=1#L688-L692), additional fields in metadata were allowed as long as they were JSON objects containing the key-value pair `"must_understand": false`. zarr-python's consolidated metadata implementation complied with this requirement (see script below).

The recent redefinition of extra fields in metadata documents added the [requirement](https://github.com/zarr-developers/zarr-specs/blob/dc3e95ed36060d9533361364ab7f54fe3e53f82b/docs/v3/core/index.rst?plain=1#L1528-L1531) that such extra fields have a `name` key which is a string. zarr-python's consolidated metadata does not contain a `name` key, and so it is out of spec.
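To make the mismatch concrete, here is a minimal sketch of the `name` check as described in this issue, applied to the `consolidated_metadata` object copied from the script output further down. The helper `has_string_name` is a hypothetical name for illustration; it is not part of zarr-python or of the spec text.

```python
# The consolidated metadata field as written by zarr-python 3.1.0
# (copied from the script output in this issue).
consolidated_metadata = {"kind": "inline", "metadata": {}, "must_understand": False}


def has_string_name(field: object) -> bool:
    """Hypothetical check: is `field` a JSON object with a string-valued `name` key?"""
    return isinstance(field, dict) and isinstance(field.get("name"), str)


# The object has no `name` key, so it fails the newer requirement.
print(has_string_name(consolidated_metadata))  # False
```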

As consolidated metadata is used heavily by xarray users, this is a very high-impact change. The recent spec refactor has thus made many (most?) Zarr v3 xarray datasets technically out of spec.

```python
# /// script
# dependencies = [
#   "zarr==3.1.0",
# ]
# ///

import zarr
from pprint import pprint
import json

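# A plain dict is used as an in-memory store.
store = {}
zarr.create_group(store)
# Write the consolidated metadata into the root group's zarr.json.
zarr.consolidate_metadata(store)
# Print the stored metadata document.
pprint(json.loads(store["zarr.json"].to_bytes()))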
"""
{'attributes': {},
 'consolidated_metadata': {'kind': 'inline',
                           'metadata': {},
                           'must_understand': False},
 'node_type': 'group',
 'zarr_format': 3}
"""
``` 

I think we should treat this as a regression in the spec. A fix could be:
- clarify that readers may ignore _any_ additional field that is a JSON object containing the key-value pair `"must_understand": false`, no matter what other keys that object has.
- remove the requirement that top-level extra fields in array / group metadata objects have a `name` field if they are JSON objects.

Without these two changes, or changes that achieve the same effect, a large volume of zarr data is out of spec, and we need to fix that.
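As a rough illustration of the first proposed change, a reader following that rule might look like the sketch below. The names `may_ignore`, `blocking_fields`, and `KNOWN_KEYS` are hypothetical, and `KNOWN_KEYS` is only an illustrative subset of group metadata keys, not the full list from the spec.

```python
from typing import Any

# Illustrative subset of keys a reader understands for group metadata (assumption).
KNOWN_KEYS = {"zarr_format", "node_type", "attributes"}


def may_ignore(field: Any) -> bool:
    """Proposed rule: any JSON object with `must_understand: false` may be ignored,
    regardless of its other keys (no `name` key required)."""
    return isinstance(field, dict) and field.get("must_understand") is False


def blocking_fields(metadata: dict[str, Any]) -> list[str]:
    """Return the unknown top-level fields that a reader could not safely ignore."""
    return sorted(
        key
        for key, value in metadata.items()
        if key not in KNOWN_KEYS and not may_ignore(value)
    )


# Group metadata as written by zarr-python 3.1.0 (from the script output in this issue).
group_metadata = {
    "attributes": {},
    "consolidated_metadata": {"kind": "inline", "metadata": {}, "must_understand": False},
    "node_type": "group",
    "zarr_format": 3,
}

print(blocking_fields(group_metadata))  # []
```

Under this rule the existing `consolidated_metadata` field stays readable as-is, which is the effect the two changes are meant to achieve.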
