Splitting out from #62 this other approach to managing memory, to address the running-out-of-memory issue described in #57 (comment), complementing xarray time coordinates and variables #68.
Dask Array
We also want to consider using dask.array within Xarray, which under the hood is just a chunked collection of numpy.ndarray objects. This gives us the ability to handle arrays that are larger than memory by accessing only certain chunks at a time (e.g. one timestep at a time). These docs (at the bottom of the section) explain how lazy writing works:
Once you’ve manipulated a Dask array, you can still write a dataset too big to fit into memory back to disk by using to_netcdf() in the usual way...
By setting the compute argument to False, to_netcdf() will return a dask.delayed object that can be computed later.
from dask.diagnostics import ProgressBar

delayed_obj = ds.to_netcdf("manipulated-example-data.nc", compute=False)
with ProgressBar():
    results = delayed_obj.compute()
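For context, here is a minimal sketch (with a hypothetical toy dataset) of how chunking makes Xarray operations lazy; only individual chunks are materialized when .compute() runs:

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset standing in for our real data.
ds = xr.Dataset({"t": ("time", np.arange(10.0))})

# Chunk along time: the variable is now backed by dask.array,
# split into five 2-element numpy chunks.
ds = ds.chunk({"time": 2})

mean = ds["t"].mean()           # lazy: builds a task graph, loads nothing
result = float(mean.compute())  # triggers chunk-by-chunk computation
```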
NOTE: We can use the Dataset.to_zarr() method the same way.
The solution near the bottom of this thread describes a smart approach to doing exactly what we need. Let's implement something similar (see the suggested code in response 8): https://discourse.pangeo.io/t/processing-large-too-large-for-memory-xarray-datasets-and-writing-to-netcdf/1724/8
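One pattern along these lines (an illustrative sketch, not a verbatim copy of the linked response; the slicing step and file names are made up) is to split the dataset along time and write each piece lazily with save_mfdataset, so only one chunk's worth of data passes through memory at a time:

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Stand-in for a dataset too large for memory.
ds = xr.Dataset(
    {"temp": (("time", "x"), np.arange(32.0).reshape(8, 4))}
).chunk({"time": 2})

outdir = tempfile.mkdtemp()

# Split along time into per-chunk datasets and matching output paths.
step = 2
pieces = [ds.isel(time=slice(i, i + step)) for i in range(0, ds.sizes["time"], step)]
paths = [os.path.join(outdir, f"part_{i}.nc") for i in range(len(pieces))]

# All writes are delayed until .compute(); dask then streams the
# chunks to disk rather than loading the whole dataset at once.
delayed = xr.save_mfdataset(pieces, paths, compute=False)
delayed.compute()
```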