-
Notifications
You must be signed in to change notification settings - Fork 229
Cache gmt accessor info for non-virtualfile grid/image inputs #4008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Regression test based on code at #4005, checking that GMTDataArrayAccessor registration/gtype is preserved after temporary file is deleted.
Need to load the gmt accessor at the xr.Dataset level instead of the xr.DataArray level, otherwise the info gets lost at the `.to_dataset()` call. Had to add the `@xr.register_dataset_accessor("gmt")` decorator on `class GMTDataArrayAccessort` to prevent `AttributeError: 'Dataset' object has no attribute 'gmt'`.
raster_dataset: xr.Dataset = raster.to_dataset() | ||
_ = raster_dataset.gmt # Load GMTDataArray accessor information | ||
return raster_dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally posted by @seisman in #3932 (comment)
Actually, it's likely that the accessor information will be lost when converting via
to_dataset
.
So we can set the accessor info at the xr.Dataset
level, but not sure if it's a good idea because then xr.Dataset
would have GMT-functions enabled...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is still failing. I guess that the accessor information is lost in both dataarray->dataset and dataset->dataarray conversions.
When calling xr.open_dataarray
with the "gmt" engine, it actually calls the GMTBackendEntry's open_dataset
method first, so the conversion from dataset to dataarray cannot be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> import pygmt
>>> import xarray as xr
>>> grid = xr.open_dataarray("@static_earth_relief.nc", engine="gmt", raster_kind="grid")
>>> grid.encoding["source"]
np.str_('/home/seisman/.gmt/cache/static_earth_relief.nc')
>>> grid.gmt.registration, grid.gmt.gtype
(<GridRegistration.PIXEL: 1>, <GridType.GEOGRAPHIC: 1>)
>>> # Set source encoding to a fake path, similar to a temporary file being deleted
>>> grid.encoding["source"] = "/path/to/nothing"
>>> grid.gmt.registration, grid.gmt.gtype
(<GridRegistration.PIXEL: 1>, <GridType.GEOGRAPHIC: 1>)
>>> # Convert a dataarray to a dataset. The source encoding is not kept.
>>> da = grid.to_dataset()
>>> da.encoding
{}
>>> # Convert a dataset to a dataarray.
>>> # Xref: https://github.com/pydata/xarray/blob/fb49a3b0a83272467aa01ac189b95e9e7ebe01aa/xarray/backends/api.py#L950
>>> (ds, ) = da.data_vars.values()
>>> ds.encoding
{'source': '/path/to/nothing'}
>>> ds.gmt.registration, ds.gmt.gtype
(<GridRegistration.GRIDLINE: 0>, <GridType.CARTESIAN: 0>)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As shown above, the accessor information is lost during the dataset->dataarray conversion, and we rely on the source encoding to decide the registration and gtype, which default to 0 for temporary files being deleted.
I don't think we have any solutions here, and need to document the limitation and recommend:
- Avoiding using temporary files if possible
- If temporary files cannot be avoided, use
_ = grid.gmt
before deleting the temporary file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I found this upstream issue - pydata/xarray#3268 that accessors are not meant to cache a state. I do wish though that the @xr.register_dataarray_accessor
decorator could work by us returning an xr.DataArray
directly (instead of having to return an xr.Dataset
that then gets converted to an xr.DataArray
).
Agree that we should document that temporary files should be avoided and/or trigger caching using _ = grid.gmt
, then close #4005 as wontfix (with a workaround).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wish though that the
@xr.register_dataarray_accessor
decorator could work by us returning anxr.DataArray
directly (instead of having to return anxr.Dataset
that then gets converted to anxr.DataArray
).
Opened an upstream issue on xarray for this -> pydata/xarray#10562
Description of proposed changes
Ensure that using
load_dataarray/open_dataarray(..., engine="gmt", ...)
caches theGMTDataArrayAccessor
info, so that the correct registration/gtype is returned even if the original source NetCDF file is deleted.TODO:
Fixes #4005
Reminders
make format
andmake check
to make sure the code follows the style guide.doc/api/index.rst
.Slash Commands
You can write slash commands (
/command
) in the first line of a comment to performspecific operations. Supported slash command is:
/format
: automatically format and lint the code