Caveat: I'm not a CF expert, so I would appreciate any corrections from folks who know the spec better.
CF defines semantics for a "grid mapping variable".
Please let me know if there's a better reference than this. From the linked text:
When the coordinate variables for a horizontal grid are not longitude and latitude, it is required that the true latitude and longitude coordinates be supplied via the coordinates attribute. If in addition it is desired to describe the mapping between the given coordinate variables and the true latitude and longitude coordinates, the attribute grid_mapping may be used to supply this description. This attribute is attached to data variables so that variables with different mappings may be present in a single file. The attribute takes a string value which is the name of another variable in the file that provides the description of the mapping via a collection of attached attributes. This variable is called a grid mapping variable and is of arbitrary type since it contains no data. Its purpose is to act as a container for the attributes that define the mapping. The one attribute that all grid mapping variables must have is grid_mapping_name which takes a string value that contains the mapping's name. The other attributes that define a specific mapping depend on the value of grid_mapping_name. The valid values of grid_mapping_name along with the attributes that provide specific map parameter values are described in Appendix F, Grid Mappings.
My naive understanding is that the grid mapping variable is really just a collection of attributes. It's only represented as a variable because of limitations in the type system netCDF uses for attributes, namely the lack of a key: value data structure like JSON objects. For this reason CF says a grid mapping variable can be of arbitrary type, since it contains no data.
If we want to translate "grid mapping variable" semantics to GeoZarr, there are a few concerns:
- GeoZarr coordinate variables must be 1D, but in CF grid mapping variables are scalars (0D). Maybe that's not a hard CF requirement, and we could use empty 1D arrays in GeoZarr here, except for the second concern:
- Zarr uses JSON for attributes, which supports arbitrary key: value data structures (JSON objects). So, as far as I can tell, in Zarr there is never any need for a "no data, just attributes" array like the grid mapping variable. Instead, the grid mapping variable's attributes can be inlined exactly where that variable is declared.
So my recommendation for GeoZarr would be to describe how the CF Grid Mapping Variable semantics should be translated to Zarr, e.g.:
CF defines a special scalar variable called a "Grid Mapping Variable" which is referenced by name in the "grid_mapping" key of another variable in the same Dataset. The CF grid mapping variable contains no array data and is effectively just a collection of attributes. In GeoZarr, this relationship is simplified: instead of using a separate Zarr array to represent a grid mapping, all the attributes that would be associated with a CF grid mapping variable are defined directly in the attributes of the DataArray under the "grid_mapping" key.
This would of course have examples and probably the language could be made more clear.
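As a rough illustration of the translation proposed above, here is a minimal sketch in plain Python that operates only on attribute dictionaries. The function name `inline_grid_mapping`, the variable names `temperature` and `crs`, and the parameter values are all hypothetical and chosen for illustration. It assumes the simple CF form where "grid_mapping" holds a single variable name, and does not handle the extended `"name: coord coord"` form.

```python
import json


def inline_grid_mapping(variables):
    """Inline CF grid mapping variables into the variables that reference them.

    `variables` maps a variable name to its attribute dict (hypothetical layout).
    A string-valued "grid_mapping" attribute is replaced by a copy of the
    referenced variable's attributes, and the referenced variable is dropped.
    A reference to a missing variable raises KeyError.
    """
    # Names of all variables that are used as grid mapping containers.
    referenced = {
        attrs["grid_mapping"]
        for attrs in variables.values()
        if isinstance(attrs.get("grid_mapping"), str)
    }

    result = {}
    for name, attrs in variables.items():
        if name in referenced:
            # No array data to keep: its attributes are inlined elsewhere.
            continue
        new_attrs = dict(attrs)
        target = attrs.get("grid_mapping")
        if isinstance(target, str):
            new_attrs["grid_mapping"] = dict(variables[target])
        result[name] = new_attrs
    return result


# CF-style layout: a data variable plus a data-less grid mapping variable.
cf_style = {
    "temperature": {"units": "K", "grid_mapping": "crs"},
    "crs": {
        "grid_mapping_name": "lambert_conformal_conic",
        "standard_parallel": [25.0, 25.0],
        "longitude_of_central_meridian": 265.0,
        "latitude_of_projection_origin": 25.0,
    },
}

geozarr_style = inline_grid_mapping(cf_style)
print(json.dumps(geozarr_style, indent=2))
```

The result contains a single `temperature` entry whose "grid_mapping" value is a JSON object holding what used to be the attributes of `crs`. One trade-off of this sketch is that several data variables sharing a grid mapping each get their own copy of it.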
More broadly, I'm wondering how many other things in CF would take a different form when the attribute model has JSON's type system. It seems generally important for GeoZarr to declare these explicitly. For example, in CF the "coordinates" field is a whitespace-separated list of strings. This only makes sense if the type system of the attributes does not support arrays of strings. Zarr's attributes type system does support arrays of strings, so IMO we should use this in GeoZarr.
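To make the "coordinates" example concrete, here is a small sketch of converting between the two representations. The helper names `coordinates_to_list` and `coordinates_to_cf_string` are hypothetical, and the array form is the suggestion made in this issue, not something GeoZarr currently specifies.

```python
def coordinates_to_list(attrs):
    """Return a copy of `attrs` with a CF-style "coordinates" string
    split on whitespace into a list of variable names."""
    new_attrs = dict(attrs)
    value = attrs.get("coordinates")
    if isinstance(value, str):
        new_attrs["coordinates"] = value.split()
    return new_attrs


def coordinates_to_cf_string(attrs):
    """Inverse direction: join a list-valued "coordinates" attribute
    back into the blank-separated string that CF expects."""
    new_attrs = dict(attrs)
    value = attrs.get("coordinates")
    if isinstance(value, list):
        new_attrs["coordinates"] = " ".join(value)
    return new_attrs


cf_attrs = {"units": "K", "coordinates": "lat lon time"}
json_attrs = coordinates_to_list(cf_attrs)
print(json_attrs["coordinates"])
print(coordinates_to_cf_string(json_attrs)["coordinates"])
```

The round trip is lossless apart from whitespace normalization, which suggests a reader could accept either form during a transition.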