Kerchunk doesn't fully decode the JSON for zarr metadata (the .zgroup, .zarray and .zattrs entries), instead leaving those dictionaries as long JSON strings. For example:
import xarray as xr
import kerchunk.hdf

# create example netCDF4 file
xr.tutorial.open_dataset('air_temperature').to_netcdf('air.nc')
kerchunk.hdf.SingleHdf5ToZarr('air.nc', inline_threshold=300).translate()
{'version': 1,
'refs': {'.zgroup': '{"zarr_format":2}',
'.zattrs': '{"Conventions":"COARDS","description":"Data is from NMC initialized reanalysis\\n(4x\\/day). These are the 0.9950 sigma level values.","platform":"Model","references":"http:\\/\\/www.esrl.noaa.gov\\/psd\\/data\\/gridded\\/data.ncep.reanalysis.html","title":"4x daily NMC reanalysis (1948)"}',
'air/.zarray': '{"chunks":[2920,25,53],"compressor":null,"dtype":"<i2","fill_value":null,"filters":null,"order":"C","shape":[2920,25,53],"zarr_format":2}',
'air/.zattrs': '{"GRIB_id":11,"GRIB_name":"TMP","_ARRAY_DIMENSIONS":["time","lat","lon"],"actual_range":[185.16000366210938,322.1000061035156],"dataset":"NMC Reanalysis","level_desc":"Surface","long_name":"4xDaily Air temperature at sigma level 995","parent_stat":"Other","precision":2,"scale_factor":0.01,"statistic":"Individual Obs","units":"degK","var_desc":"Air temperature"}',
'air/0.0.0': ['air.nc', 15419, 7738000],
'lat/.zarray': '{"chunks":[25],"compressor":null,"dtype":"<f4","fill_value":"NaN","filters":null,"order":"C","shape":[25],"zarr_format":2}',
'lat/.zattrs': '{"_ARRAY_DIMENSIONS":["lat"],"axis":"Y","long_name":"Latitude","standard_name":"latitude","units":"degrees_north"}',
'lat/0': ['air.nc', 5179, 100],
'lon/.zarray': '{"chunks":[53],"compressor":null,"dtype":"<f4","fill_value":"NaN","filters":null,"order":"C","shape":[53],"zarr_format":2}',
'lon/.zattrs': '{"_ARRAY_DIMENSIONS":["lon"],"axis":"X","long_name":"Longitude","standard_name":"longitude","units":"degrees_east"}',
'lon/0': ['air.nc', 5279, 212],
'time/.zarray': '{"chunks":[2920],"compressor":null,"dtype":"<f4","fill_value":"NaN","filters":null,"order":"C","shape":[2920],"zarr_format":2}',
'time/.zattrs': '{"_ARRAY_DIMENSIONS":["time"],"calendar":"standard","long_name":"Time","standard_name":"time","units":"hours since 1800-01-01"}',
'time/0': ['air.nc', 7757515, 11680]}}
Notice that this is only partially decoded: the top two levels are nested Python dictionaries, but below that the various zarr metadata entries are stored as long JSON strings, e.g.:
'{"chunks":[2920,25,53],"compressor":null,"dtype":"<i2","fill_value":null,"filters":null,"order":"C","shape":[2920,25,53],"zarr_format":2}'
This seems silly. Why not just decode the whole thing properly at the beginning, so you can always treat it like a nested Python dictionary? (Or, even better, use a dedicated abstraction as suggested in #375.)
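To make the request concrete, here is a rough sketch of the decoding I have in mind. It is not kerchunk code: decode_metadata is a hypothetical helper name, and it assumes that only keys ending in .zgroup, .zarray or .zattrs hold JSON, so that chunk references and any inlined chunk data are left untouched.

```python
import json

# Zarr v2 metadata keys whose values appear as JSON strings in the refs.
METADATA_NAMES = (".zgroup", ".zarray", ".zattrs")


def decode_metadata(refs):
    """Return a copy of a kerchunk-style ``refs`` mapping with metadata decoded.

    Hypothetical helper for illustration; not part of kerchunk's API.
    """
    decoded = {}
    for key, value in refs.items():
        is_metadata = key.split("/")[-1] in METADATA_NAMES
        if is_metadata and isinstance(value, str):
            # Metadata entry: parse the JSON string into Python objects.
            decoded[key] = json.loads(value)
        else:
            # Chunk references ([path, offset, length]) or inlined data: keep as-is.
            decoded[key] = value
    return decoded


# A small subset of the output shown above.
refs = {
    ".zgroup": '{"zarr_format":2}',
    "lat/.zarray": '{"chunks":[25],"compressor":null,"dtype":"<f4","fill_value":"NaN","filters":null,"order":"C","shape":[25],"zarr_format":2}',
    "lat/0": ["air.nc", 5179, 100],
}

decoded = decode_metadata(refs)
print(decoded["lat/.zarray"]["chunks"])
```

With something like this applied up front, decoded["lat/.zarray"]["chunks"] is a plain list rather than a substring of a JSON blob. Anything that writes the references back out would need to re-encode these entries with json.dumps.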