Previously, additional fields in metadata were allowed as long as they were JSON objects with a `must_understand: false` key-value pair. zarr-python's consolidated metadata implementation complied with this requirement (see script below).
The recent redefinition of extra fields in metadata documents added the requirement that such extra fields have a `name` key whose value is a string. zarr-python's consolidated metadata does not contain a `name` key, so it is out of spec.
Because consolidated metadata is used heavily by xarray users, this is a very high-impact change: the recent spec refactor has made many (most?) zarr v3 xarray datasets technically out of spec.
```python
# /// script
# dependencies = [
# "zarr==3.1.0",
# ]
# ///
import json
from pprint import pprint

import zarr

# Use a plain dict as an in-memory store.
store = {}
zarr.create_group(store)
# Write consolidated metadata into the root group's zarr.json.
consolidated = zarr.consolidate_metadata(store)
# Print the root zarr.json document as written by zarr-python.
pprint(json.loads(store["zarr.json"].to_bytes()))
"""
{'attributes': {},
 'consolidated_metadata': {'kind': 'inline',
                           'metadata': {},
                           'must_understand': False},
 'node_type': 'group',
 'zarr_format': 3}
"""
```
I think we should treat this as a regression in the spec. A fix could be:
- clarify that readers may ignore any additional field that is a JSON object with a `must_understand: false` key-value pair, no matter what other keys that object has.
- remove the requirement that top-level extra fields in array / group metadata objects have a `name` field if they are JSON objects.
Without these two changes, or changes that achieve the same effect, a large volume of zarr data is out of spec, and we need to fix that.
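The first proposed change can be sketched as reader-side logic: skip any unknown top-level field that is a JSON object with `must_understand: false`, without inspecting its other keys, and reject anything else. This is a minimal sketch, assuming that `GROUP_KEYS` (a hypothetical name) lists the core v3 group fields and that a reader treats every other top-level key as an extra field; `ignorable_extra_fields` is likewise a hypothetical helper, not a zarr-python API.

```python
# Hypothetical reader check for zarr v3 group metadata under the proposed rule.
# GROUP_KEYS is an assumption: the core group fields from the v3 spec.
GROUP_KEYS = {"zarr_format", "node_type", "attributes"}


def ignorable_extra_fields(doc, known_keys=GROUP_KEYS):
    """Return the names of extra fields a reader may skip.

    Raises ValueError for any extra field that is not a JSON object with
    "must_understand": false. Other keys in the object (such as "name")
    are deliberately not inspected.
    """
    skipped = []
    for key, value in doc.items():
        if key in known_keys:
            continue
        if isinstance(value, dict) and value.get("must_understand") is False:
            skipped.append(key)
        else:
            raise ValueError(f"cannot ignore extra field {key!r}")
    return skipped


# The root zarr.json produced by the zarr-python 3.1.0 script.
doc = {
    "attributes": {},
    "consolidated_metadata": {"kind": "inline", "metadata": {}, "must_understand": False},
    "node_type": "group",
    "zarr_format": 3,
}
print(ignorable_extra_fields(doc))  # ['consolidated_metadata']
```

Under this rule, existing consolidated metadata stays readable, while a field marked `must_understand: true` still forces a reader that does not recognize it to fail.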