Is your feature request related to a problem?
I am grouping data in a Dataset and computing statistics. I wanted to take the median over (two) groups, but I got the following message:
>>> ds.groupby(['x', 'y']).median()
# NotImplementedError: The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel
while ds.groupby(['x']).median() works without any problem.
I noticed that this happens because the DataArrays are backed by dask arrays: with numpy arrays, there is no problem. In addition, if .median() is replaced by .quantile(0.5), there is no problem either. See below:
import dask.array as da
import numpy as np
import xarray as xr

rng = da.random.default_rng(0)
ds = xr.Dataset(
    {'a': (('x', 'y'), rng.random((10, 10)))},
    coords={'x': np.arange(5).repeat(2), 'y': np.arange(5).repeat(2)}
)

# Raises:
# NotImplementedError: The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel
try:
    ds.groupby(['x', 'y']).median()
except NotImplementedError as e:
    print(e)

# No problems with the following:
ds.groupby(['x']).median()
ds.groupby(['x', 'y']).quantile(0.5)
ds.compute().groupby(['x', 'y']).median()  # Implicit conversion to numpy array
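The error message itself comes from dask rather than from the groupby machinery. As a minimal sketch (the array `x` and its chunking are made up for illustration), `da.nanmedian` accepts a reduction along one axis but refuses a reduction over the whole array:

```python
import dask.array as da
import numpy as np

# Small illustrative array, split into two chunks along the second axis.
x = da.from_array(np.arange(12.0).reshape(3, 4), chunks=(3, 2))

# Reducing along a single axis is supported.
col_medians = da.nanmedian(x, axis=0).compute()

# Reducing over the whole array (axis=None) is rejected when the call is made,
# before anything is computed.
try:
    da.nanmedian(x)
    full_reduction_error = None
except NotImplementedError as err:
    full_reduction_error = str(err)

print(col_medians)
print(full_reduction_error)
```

This only shows where the message originates; it does not by itself explain why grouping by two variables ends up requesting a full reduction.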
Describe the solution you'd like
A straightforward solution seems to be to have DatasetGroupBy.median() use DatasetGroupBy.quantile(0.5) when the data are grouped by multiple variables.
Describe alternatives you've considered
No response
Additional context
My xr.show_versions():
xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.4.1
h5py: 3.12.1
zarr: 2.18.3
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.11.2
distributed: None
matplotlib: 3.9.2
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.5.0
pip: 24.3.1
conda: None
pytest: None
mypy: None
IPython: 8.29.0
sphinx: 7.4.7