fsspec-related issue or question #825
Comments
I had forgotten about open_local. Readers that might need this should probably have an extra flag in their kwargs, and that might only be XArrayDatasetReader. It is highly unusual for a package to accept fsspec files sometimes, but require local file names (not file-like objects) in other situations!
Is there such a flag I can use to get the behavior I need?
I am not sure what you mean about
Should I be taking a different route altogether?
Something like this:
--- a/intake/readers/readers.py
+++ b/intake/readers/readers.py
@@ -1080,7 +1080,7 @@ class XArrayDatasetReader(FileReader):
other_urls = {"xarray:open_dataset": "filename_or_obj"}
url_arg = "paths"
- def _read(self, data, **kw):
+ def _read(self, data, open_local=False, **kw):
from xarray import open_dataset, open_mfdataset
if "engine" not in kw:
@@ -1100,6 +1100,9 @@ class XArrayDatasetReader(FileReader):
kw["group"] = data.path
if isinstance(data.url, (tuple, set, list)) or "*" in data.url:
# use fsspec.open_files? (except for zarr)
+ if open_local:
+ files = fsspec.open_local(data.url, **(data.storage_options or {}))
+ return open_mfdataset(files)
return open_mfdataset(data.url, **kw)
else:
And then
(I haven't tested this, but it would be something similar.)
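For readers following along outside intake, here is a minimal sketch of the idea in the diff: resolve the URL(s) to local file names with fsspec before handing them to xarray. The helper names `is_multi_file` and `as_local_paths` are hypothetical and are not part of intake or fsspec; only `fsspec.open_local` is a real call. As far as I know, `open_local` only works when the outermost filesystem keeps local copies, so a remote URL would need a caching prefix such as "simplecache::".

```python
def is_multi_file(url):
    """Mirror the check in the diff: a list-like of URLs or a glob means many files."""
    return isinstance(url, (tuple, set, list)) or "*" in url


def as_local_paths(url, storage_options=None):
    """Return local file name(s) for `url` as a list of strings.

    Hypothetical helper. fsspec is imported lazily so that is_multi_file
    can be used without fsspec installed.
    """
    import fsspec

    # open_local returns one path for a single plain URL, otherwise a list.
    paths = fsspec.open_local(url, **(storage_options or {}))
    return [paths] if isinstance(paths, str) else list(paths)
```

The resulting list could then be passed to `xarray.open_mfdataset`, which is what the `open_local=True` branch in the diff does.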
Oh ok sure, I can try making a PR like that. I wasn't quite getting your drift before.
PR #828 addresses this. I wasn't careful with the url and in my intake catalog case above, the url wasn't in the "simplecache" encasing. Adding that as a case to the fsspec logic fixed my issue, but I also added the case for
This code snippet didn't work before and now does:
So far I have been using the NetCDF3 reader for my netCDF4 files. Is that your recommendation?
I suppose they are all passing to xarray, which handles all these file types and makes a decent guess. netCDF3 is NOT the same as netCDF4; the latter is just a special case of an HDF5 file. So you probably wanted HDF5, and then maybe you didn't need a local copy at all.
I have been experimenting with HDF5 too, since like you said netCDF4 is a special case of HDF5. But, for example, without the code in the PR, the following again doesn't work with the same error as before:
But sometimes one might want the option to get the local cache of the file too. I have been trying netCDF3 vs HDF5 in other cases too and haven't reached a clear conclusion about what to do, other than that it seemed like I should use netCDF3 because it worked more consistently than HDF5.
If you want the caching to be optional, you could make a user parameter, where url="{MAYBE_CACHE}https://researchworkspace.com/files/42712165/lower-ci_system-B_2006-2007.nc" and MAYBE_CACHE can have the values ["", "simplecache::"]. But your actual problem is "The HTTP server doesn't appear to support range requests.", which the remote reader needs in order to have random access. You are actually seeing fsspec/filesystem_spec#1631 and fsspec/filesystem_spec#1626, which will be resolved somehow soon. I checked: your server does not explicitly say that it doesn't accept Range. Actually, the server sends the whole file every time, whatever you ask for; I wonder if we should explicitly handle that case when the file is small enough.
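To make the MAYBE_CACHE suggestion concrete, here is a small sketch using plain `str.format`. It only illustrates the templating idea; how intake itself declares and substitutes user parameters is not shown in this thread, so `render_url` and `ALLOWED_PREFIXES` are hypothetical names.

```python
# Allowed values for the optional caching prefix, as listed in the comment above.
ALLOWED_PREFIXES = ("", "simplecache::")

URL_TEMPLATE = (
    "{MAYBE_CACHE}https://researchworkspace.com/files/42712165/"
    "lower-ci_system-B_2006-2007.nc"
)


def render_url(maybe_cache=""):
    """Fill in the template, rejecting prefixes outside the allowed set."""
    if maybe_cache not in ALLOWED_PREFIXES:
        raise ValueError(f"unsupported prefix: {maybe_cache!r}")
    return URL_TEMPLATE.format(MAYBE_CACHE=maybe_cache)
```

With the empty prefix the file is read remotely; with "simplecache::" fsspec first copies it to a local cache.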
Ah, interesting. Thanks for that idea. What I meant at the moment though is that I think it's worth adding the modification I made in the PR to intake because that way a user can have the "simplecache" prefix to their url and have it work; otherwise I don't think it will go through the logic in the xarray reader correctly.
I see! Yes, every error I have been hitting has been that one, and "simplecache" has been an accidental workaround. Thanks.
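For anyone who wants to check a server for this behavior themselves, the following is a rough standard-library sketch, not what fsspec does internally. It asks for a single byte and looks at the reply: a 206 status with a Content-Range header means ranges are honored, while a plain 200 means the server sent the whole file regardless. The function names are hypothetical.

```python
import urllib.request


def interpret_range_reply(status, headers):
    """Classify the reply to a 'Range: bytes=0-0' probe."""
    names = {key.lower() for key in headers}
    if status == 206 and "content-range" in names:
        return "ranges-supported"
    if status == 200:
        # The server ignored the Range header and sent the full body.
        return "whole-file-returned"
    return "unknown"


def probe_range_support(url, timeout=10):
    """Send a one-byte range request to `url` and classify the response."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return interpret_range_reply(response.status, dict(response.headers.items()))
```

A result of "whole-file-returned" matches the situation described above, where caching the file locally is the practical workaround.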
(this caches in memory, not on disk)
This looks like a good way to move forward with this catalog. Thank you!
This might be user error, in which case I apologize in advance. I'm not very versed in the nuances of fsspec. I am able to open this file with xarray directly, but I am not able to figure out the right combination in my intake catalog, at least in v2.
Hits error at
https://github.com/intake/intake/blob/cdea0c903948187784451f4a92804c349b4da700/intake/readers/readers.py#L1113C24-L1113C45
which uses fsspec.open instead of fsspec.open_local, so I assume that is the issue. Can I create the same behavior using data.storage_options?
Thanks!