Zarr store URL clarification #3

@emmanuelmathot

This is a follow-up to a practical interoperability problem we encountered in the EOPF Explorer project (EOPF-Explorer/data-model#124), raised by @m-mohr when building a Zarr client in OpenLayers.

Reframing the problem: don't reverse-parse, open directly

The original question was "how do I split https://example.stac/path/something.zarr/my/group into a store root and a group path?". After discussion with @d-v-b and the EOPF team, we believe this is actually the wrong question. The right answer is: don't split at all.

The Zarr v3 specification defines a node by whether {path}/zarr.json exists, which means any Zarr group path is itself a valid store. A client such as xarray, zarr-python, GDAL or OpenLayers should simply open the asset href directly, treating it as its own store entry point.
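To make the "open directly" idea concrete, here is a minimal sketch of what a client does with an asset href: it appends zarr.json to the href and reads node_type from the document it gets back. The helper names `metadata_url` and `node_type` are hypothetical, and the HTTP fetch is left out so the snippet stays self-contained.

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_url(href: str) -> str:
    """Return the URL of the zarr.json document for the node at `href`.

    Hypothetical helper: the asset href is used as-is, with no attempt
    to split it into a store root and a group path.
    """
    parts = urlsplit(href)
    path = parts.path.rstrip("/") + "/zarr.json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def node_type(metadata: dict) -> str:
    """Read node_type from an already-parsed Zarr v3 zarr.json document."""
    if metadata.get("zarr_format") != 3:
        raise ValueError("not a Zarr v3 metadata document")
    kind = metadata.get("node_type")
    if kind not in ("group", "array"):
        raise ValueError(f"unexpected node_type: {kind!r}")
    return kind


# The example URL from the original question is opened as its own entry point.
print(metadata_url("https://example.stac/path/something.zarr/my/group"))
```

A real client would fetch that URL and pass the parsed JSON to `node_type`; a 404 would mean the href does not point at a Zarr v3 node.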

Three open questions for the extension

1. Should zarr:consolidated: true be required on asset-level groups?

Consolidated metadata embeds the sub-hierarchy structure inside the group's own zarr.json, so a client opening the asset href gets everything it needs in a single request. Without it, a client pointed at a sub-group cannot discover that group's contents without traversing upward.

Question: should the extension require zarr:consolidated: true for any asset with zarr:node_type: group, or keep it informational?
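As an illustration of the single-request point, the sketch below lists a group's children from an already-parsed zarr.json. It assumes the inline layout that zarr-python 3 writes (a `consolidated_metadata` object with a `metadata` mapping from relative path to node document); other writers may differ, and `consolidated_children` is a hypothetical helper name.

```python
def consolidated_children(group_doc: dict) -> dict[str, str]:
    """Map each child path to its node_type using consolidated metadata.

    Assumption: the layout follows zarr-python 3's inline convention,
    i.e. group_doc["consolidated_metadata"]["metadata"] is a mapping of
    relative paths to the children's own metadata documents.
    Returns an empty dict when the group carries no consolidated metadata.
    """
    block = group_doc.get("consolidated_metadata") or {}
    entries = block.get("metadata") or {}
    return {path: doc.get("node_type", "unknown") for path, doc in entries.items()}


# Illustrative document; the child names are made up for this example.
example_group = {
    "zarr_format": 3,
    "node_type": "group",
    "consolidated_metadata": {
        "kind": "inline",
        "must_understand": False,
        "metadata": {
            "b02": {"zarr_format": 3, "node_type": "array"},
            "quality": {"zarr_format": 3, "node_type": "group"},
        },
    },
}

print(consolidated_children(example_group))
```

An empty result is the case the question is about: the client has the group's zarr.json but no listing of what sits below it.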

2. Should rel=store be a MUST?

The STAC Zarr Best Practices already recommend a rel=store link to the top-level hierarchy root. Single-group consumers can ignore it (they just open the asset href), but multi-group consumers have no other machine-readable way to recover the store root without URL parsing.

If we make it a MUST, it also enables a clean multi-group pattern: sibling assets sharing the same rel=store link can be treated by clients as co-resident in the same store, and opened together with shared consolidated metadata. This maps directly to the openlayers#17409 API shape: url = rel=store target, bands[].group = per-asset node path.

Question: is the interop gain (no URL parsing, multi-group pattern) worth the producer burden of requiring the link on every Zarr asset?
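A rough sketch of the multi-group pattern, assuming the rel=store link lives in the item-level links array and that Zarr asset hrefs sit under the link target. The function name `store_and_groups` and the sample item are hypothetical; the per-asset path is obtained by stripping the known store root, not by guessing where the store ends.

```python
def store_and_groups(item: dict) -> tuple[str, dict[str, str]]:
    """Return the store root and the node path of each asset inside it.

    Assumptions: `item` is a STAC Item as a dict, it has one link with
    rel == "store", and assets outside that store are simply skipped.
    """
    store = next(
        (link["href"] for link in item.get("links", []) if link.get("rel") == "store"),
        None,
    )
    if store is None:
        raise ValueError("item has no rel=store link")

    root = store.rstrip("/") + "/"
    groups: dict[str, str] = {}
    for key, asset in item.get("assets", {}).items():
        href = asset["href"].rstrip("/") + "/"
        if href.startswith(root):
            # Path of the node relative to the store root ("" for the root itself).
            groups[key] = href[len(root):].rstrip("/")
    return store.rstrip("/"), groups


# Made-up item: two groups in one store plus an unrelated asset.
example_item = {
    "links": [{"rel": "store", "href": "https://example.stac/path/something.zarr"}],
    "assets": {
        "r10m": {"href": "https://example.stac/path/something.zarr/measurements/r10m"},
        "r20m": {"href": "https://example.stac/path/something.zarr/measurements/r20m"},
        "thumbnail": {"href": "https://example.stac/path/thumb.png"},
    },
}

url, groups = store_and_groups(example_item)
print(url, groups)
```

The returned `url` and per-asset paths correspond to the url and bands[].group roles described above; how they are passed to a given client is outside this sketch.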

3. Should we add a naming hygiene SHOULD?

A lower-stakes question: should the extension recommend that .zarr appears at most once in a URL (at the store root) and not in group or array names? It doesn't affect how clients open URLs, but prevents human confusion and keeps any fallback URL inspection unambiguous.
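For producers, such a recommendation could be checked with something like the sketch below. It takes one possible reading of the rule, namely that at most one path segment ends in .zarr; the function names are hypothetical and the second URL is invented to show a violation.

```python
from urllib.parse import urlsplit


def zarr_suffix_segments(url: str) -> list[str]:
    """Return the path segments of `url` that end in '.zarr'."""
    return [seg for seg in urlsplit(url).path.split("/") if seg.endswith(".zarr")]


def follows_naming_hygiene(url: str) -> bool:
    """True when '.zarr' ends at most one path segment (one reading of the SHOULD)."""
    return len(zarr_suffix_segments(url)) <= 1


# The store root is the only '.zarr' segment here.
print(follows_naming_hygiene("https://example.stac/path/something.zarr/my/group"))
# A group named like a store makes the URL ambiguous to a human reader.
print(follows_naming_hygiene("https://example.stac/path/something.zarr/inner.zarr/group"))
```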

Happy to open a PR for the README and schema changes if the direction is agreed.
