JSON not properly decoded by backends

Kerchunk doesn't fully decode the JSON for zarr metadata (`.zgroup`, `.zarray`, `.zattrs`), instead leaving those dictionaries as long strings. For example:

```python
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr

# create example netCDF4 file
xr.tutorial.open_dataset('air_temperature').to_netcdf('air.nc')

# translate it into a kerchunk reference dictionary
SingleHdf5ToZarr('air.nc', inline_threshold=300).translate()
```
```python
{'version': 1,
 'refs': {'.zgroup': '{"zarr_format":2}',
  '.zattrs': '{"Conventions":"COARDS","description":"Data is from NMC initialized reanalysis\\n(4x\\/day).  These are the 0.9950 sigma level values.","platform":"Model","references":"http:\\/\\/www.esrl.noaa.gov\\/psd\\/data\\/gridded\\/data.ncep.reanalysis.html","title":"4x daily NMC reanalysis (1948)"}',
  'air/.zarray': '{"chunks":[2920,25,53],"compressor":null,"dtype":"<i2","fill_value":null,"filters":null,"order":"C","shape":[2920,25,53],"zarr_format":2}',
  'air/.zattrs': '{"GRIB_id":11,"GRIB_name":"TMP","_ARRAY_DIMENSIONS":["time","lat","lon"],"actual_range":[185.16000366210938,322.1000061035156],"dataset":"NMC Reanalysis","level_desc":"Surface","long_name":"4xDaily Air temperature at sigma level 995","parent_stat":"Other","precision":2,"scale_factor":0.01,"statistic":"Individual Obs","units":"degK","var_desc":"Air temperature"}',
  'air/0.0.0': ['air.nc', 15419, 7738000],
  'lat/.zarray': '{"chunks":[25],"compressor":null,"dtype":"<f4","fill_value":"NaN","filters":null,"order":"C","shape":[25],"zarr_format":2}',
  'lat/.zattrs': '{"_ARRAY_DIMENSIONS":["lat"],"axis":"Y","long_name":"Latitude","standard_name":"latitude","units":"degrees_north"}',
  'lat/0': ['air.nc', 5179, 100],
  'lon/.zarray': '{"chunks":[53],"compressor":null,"dtype":"<f4","fill_value":"NaN","filters":null,"order":"C","shape":[53],"zarr_format":2}',
  'lon/.zattrs': '{"_ARRAY_DIMENSIONS":["lon"],"axis":"X","long_name":"Longitude","standard_name":"longitude","units":"degrees_east"}',
  'lon/0': ['air.nc', 5279, 212],
  'time/.zarray': '{"chunks":[2920],"compressor":null,"dtype":"<f4","fill_value":"NaN","filters":null,"order":"C","shape":[2920],"zarr_format":2}',
  'time/.zattrs': '{"_ARRAY_DIMENSIONS":["time"],"calendar":"standard","long_name":"Time","standard_name":"time","units":"hours since 1800-01-01"}',
  'time/0': ['air.nc', 7757515, 11680]}}
```
Notice that this is only partially decoded: the top two levels are nested Python dictionaries, but below that the various zarr metadata entries are stored as long strings, e.g.:
```
'{"chunks":[2920,25,53],"compressor":null,"dtype":"<i2","fill_value":null,"filters":null,"order":"C","shape":[2920,25,53],"zarr_format":2}'
```

This seems silly. Why not just decode the whole thing properly at the beginning, so you can always treat it like a nested Python dictionary? (Or, even better, use a dedicated abstraction like the one suggested in #375.)
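To make the request concrete, here is a minimal sketch of the kind of post-processing I currently have to do by hand. The names `decode_refs` and `METADATA_SUFFIXES` are hypothetical helpers of my own, not part of kerchunk, and the sketch assumes that only keys ending in `.zgroup`, `.zarray` or `.zattrs` hold JSON documents (inlined chunk data may also be a string, so it is deliberately left alone). The example input is a subset of the output above.

```python
import json

# Assumption: only these zarr v2 metadata keys hold JSON documents.
METADATA_SUFFIXES = (".zgroup", ".zarray", ".zattrs")


def decode_refs(references):
    """Return a copy of a reference dict with metadata strings parsed.

    Hypothetical helper, not a kerchunk function.
    """
    decoded = {}
    for key, value in references["refs"].items():
        if key.endswith(METADATA_SUFFIXES) and isinstance(value, str):
            # Metadata entry stored as a JSON string: parse it into a dict.
            decoded[key] = json.loads(value)
        else:
            # Chunk references (and inlined chunk data) are kept unchanged.
            decoded[key] = value
    return {**references, "refs": decoded}


# A subset of the translate() output shown above.
example = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format":2}',
        "lat/.zarray": '{"chunks":[25],"compressor":null,"dtype":"<f4","fill_value":"NaN","filters":null,"order":"C","shape":[25],"zarr_format":2}',
        "lat/.zattrs": '{"_ARRAY_DIMENSIONS":["lat"],"axis":"Y","long_name":"Latitude","standard_name":"latitude","units":"degrees_north"}',
        "lat/0": ["air.nc", 5179, 100],
    },
}

decoded = decode_refs(example)
print(decoded["refs"]["lat/.zarray"]["chunks"])  # [25]
```

With this applied, `decoded["refs"]["lat/.zattrs"]["units"]` can be read directly instead of calling `json.loads` at every access site.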

JSON not properly decoded by backends #415