
Reduce memory usage: lazy writing to dask.array #69

@aufdenkampe

Description


Splitting out from #62: this issue tracks another approach to managing memory, addressing the running-out-of-memory issue described in #57 (comment) and complementing that work.

Dask Array

We also want to consider using dask.array within Xarray, which under the hood is just a chunked collection of numpy.ndarray objects. This gives us the ability to handle arrays that are larger than memory by accessing only certain chunks at a time (e.g., by timestep). These docs (at the bottom of the section) explain how lazy writing works:
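As a minimal sketch of the chunking idea (the variable names `flow`/`reach` and the chunk size are made up for illustration; this assumes `dask` is installed):

```python
import numpy as np
import xarray as xr

# Build a small example dataset, then chunk it so each variable is
# backed by a dask.array instead of an in-memory numpy.ndarray.
ds = xr.Dataset(
    {"flow": (("time", "reach"), np.random.rand(1000, 50))},
    coords={"time": np.arange(1000), "reach": np.arange(50)},
)
ds = ds.chunk({"time": 100})  # 10 chunks along the time dimension

# Operations are now lazy: this builds a task graph, nothing is computed yet.
monthly_mean = ds["flow"].mean(dim="reach")

print(type(ds["flow"].data))  # dask.array, not numpy.ndarray
```

Because every downstream operation stays lazy, only the chunks touched by a computation (or by a later `to_netcdf()`/`to_zarr()` call) need to fit in memory at once.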

Once you’ve manipulated a Dask array, you can still write a dataset too big to fit into memory back to disk by using to_netcdf() in the usual way...
By setting the compute argument to False, to_netcdf() will return a dask.delayed object that can be computed later.

from dask.diagnostics import ProgressBar

delayed_obj = ds.to_netcdf("manipulated-example-data.nc", compute=False)

with ProgressBar():
    results = delayed_obj.compute()

NOTE: We can use the Dataset.to_zarr() method the same way.

The solution near the bottom of this thread describes a smart approach to doing exactly what we need. Let's implement something similar (see the suggested code in response 8): https://discourse.pangeo.io/t/processing-large-too-large-for-memory-xarray-datasets-and-writing-to-netcdf/1724/8

Metadata


    Labels

    enhancement (New feature or request), refactor-core (For core functionality, not model specific)
