Dimension mismatch in MARS data #398

Open
juntyr opened this issue Aug 15, 2024 · 2 comments

juntyr commented Aug 15, 2024

What happened?

xarray failed to open a GRIB file with the cfgrib engine, raising a dimension mismatch error.

What are the steps to reproduce the bug?

import xarray as xr
xr.open_dataset("_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib", engine="cfgrib")

Version

0.9.14.0

Platform (OS and architecture)

macOS; also occurs on Pyodide

Relevant log output

ecCodes provides no latitudes/longitudes for gridType='sh'
skipping variable: paramId==133 shortName='q'
Traceback (most recent call last):
  File "venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 723, in build_dataset_components
    dict_merge(dimensions, dims)
  File "venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 639, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='values' value=1639680 new_value=6599680
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "venv/lib/python3.10/site-packages/xarray/backends/api.py", line 588, in open_dataset
    backend_ds = backend.open_dataset(
  File "venv/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 141, in open_dataset
    ds = xr.Dataset(vars, attrs=attrs)
  File "venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 713, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File "venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 427, in merge_data_and_coords
    return merge_core(
  File "venv/lib/python3.10/site-packages/xarray/core/merge.py", line 705, in merge_core
    dims = calculate_dimensions(variables)
  File "venv/lib/python3.10/site-packages/xarray/core/variable.py", line 3009, in calculate_dimensions
    raise ValueError(
ValueError: conflicting sizes for dimension 'values': length 6599680 on 'latitude' and length 1639680 on {'step': 'step', 'hybrid': 'hybrid', 'values': 't'}

Accompanying data

https://faubox.rrze.uni-erlangen.de/dl/fiVj21QV6ihsyWC8UEZYTT/_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib

Organisation

University of Helsinki

juntyr added the bug label Aug 15, 2024

juntyr commented Aug 15, 2024

CC @SF-N

iainrussell (Member) commented

Hi @juntyr,

The problem is that the file contains two different variables whose geographical coordinates do not match: q is on a reduced Gaussian grid, while t is a spectral field and not on a grid at all. They therefore cannot be combined into a single hypercube.

% grib_ls ./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
edition      centre       date         dataType     gridType     stepRange    typeOfLevel  level        shortName    packingType
2            ecmf         20240811     cf           sh           354          hybrid       1            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   354          hybrid       1            q            grid_ccsds
2            ecmf         20240811     cf           sh           354          hybrid       2            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   354          hybrid       2            q            grid_ccsds
2            ecmf         20240811     cf           sh           360          hybrid       1            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   360          hybrid       1            q            grid_ccsds
2            ecmf         20240811     cf           sh           360          hybrid       2            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   360          hybrid       2            q            grid_ccsds
8 of 8 messages in ./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
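
The two sizes in the error message (1639680 and 6599680) are consistent with this listing. As a rough check, the sketch below assumes that t uses a triangular spectral truncation of T1279 and that q is on an octahedral reduced Gaussian grid O1280. Neither resolution is stated in this thread; they are inferred only because the standard counting formulas then reproduce the numbers in the traceback. The function names are made up for illustration.

```python
def spectral_value_count(truncation: int) -> int:
    """Real values stored for a triangular spectral truncation T.

    There are (T+1)(T+2)/2 complex coefficients, each stored as two reals.
    """
    return (truncation + 1) * (truncation + 2)


def octahedral_point_count(n: int) -> int:
    """Grid points on an octahedral reduced Gaussian grid O<n>.

    Row i (counted from the pole) has 4*i + 16 points, with n rows per
    hemisphere, which sums to 4*n*(n + 9) over both hemispheres.
    """
    return 4 * n * (n + 9)


# Assumed resolutions (not stated in the issue): T1279 for t, O1280 for q.
t_size = spectral_value_count(1279)
q_size = octahedral_point_count(1280)
print(t_size, q_size)  # 1639680 6599680
```

Under those assumptions, the single 'values' dimension would have to be 1639680 long for t and 6599680 long for q at the same time, which is the conflict that both cfgrib and xarray report.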

You can, however, use a bit of built-in functionality from cfgrib to split the data into two datasets - one for each variable:

import cfgrib
ds = cfgrib.open_datasets('_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib')
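
Since cfgrib.open_datasets returns a list of datasets rather than a single one, a small helper can pick out the dataset that holds a given variable. This is only a sketch: find_dataset and open_split are hypothetical names, and it assumes each returned dataset exposes its variables through data_vars, as xarray datasets do. cfgrib is imported lazily so the helper itself can be used without it.

```python
def find_dataset(datasets, var_name):
    """Return the first dataset whose data variables include var_name, else None."""
    for ds in datasets:
        if var_name in ds.data_vars:
            return ds
    return None


def open_split(path):
    """Open a GRIB file as a list of datasets, one per compatible group of fields."""
    import cfgrib  # lazy import: only needed when a file is actually opened

    return cfgrib.open_datasets(path)


# Example usage (needs cfgrib, ecCodes and the GRIB file from this issue):
# datasets = open_split(fname)
# ds_q = find_dataset(datasets, "q")
```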

Alternatively, to get more control, you can use the backend kwargs to load just selected fields according to their properties, e.g.

fname = "_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib"
ds = xr.open_dataset(fname, engine="cfgrib", backend_kwargs={'filter_by_keys': {'gridType': 'reduced_gg'}})
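
The same filter_by_keys mechanism should work with other GRIB keys shown in the grib_ls listing, such as shortName. The sketch below wraps the pattern in two hypothetical helpers (grib_filter and open_filtered are not part of cfgrib or xarray). Note that for the spectral field t, the log above says ecCodes provides no latitudes/longitudes, so a dataset filtered to t would presumably come without those coordinates.

```python
def grib_filter(**keys):
    """Build the backend_kwargs dict that restricts cfgrib to matching GRIB messages."""
    return {"filter_by_keys": dict(keys)}


def open_filtered(path, **keys):
    """Open only the GRIB messages whose keys match, e.g. shortName='q'."""
    import xarray as xr  # lazy import: only needed when a file is actually opened

    return xr.open_dataset(path, engine="cfgrib", backend_kwargs=grib_filter(**keys))


# Example usage (needs xarray, cfgrib, ecCodes and the GRIB file from this issue):
# ds_q = open_filtered(fname, shortName="q")
# ds_t = open_filtered(fname, gridType="sh")
```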

I hope this helps!
