Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set expand to True by default but make configurable #1558

Merged
merged 3 commits into from
Apr 4, 2024

Conversation

martindurant
Copy link
Member

Fixes fsspec/gcsfs#616 and perhaps others

cc @Skylion007 - this is what I was proposing, so you can override with argument to open(), for the whole session using the config, or temporarily using the module variable. The only question: should the same default chain exist in open_files?

@martindurant
Copy link
Member Author

cc @shobsi , @cofin

@Skylion007
Copy link
Contributor

I propose this should not be the default. expand should not be the default as it seems like a footgun to use this feature without explicitly enabling it.

fsspec/core.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Skylion007 Skylion007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should not have open_expand=True. It was a defect before and we shouldn't complicate future support just for the sake of bw compatibility for an unsupported edgecase in one popular library (pandas).

@Skylion007
Copy link
Contributor

open_files has always taken an expand=True, expand=False arg, there was just a bug where it wasn't respected before. I don't think we should support this old behavior as it was an error. It's difficult to throw a deprecation warning here either and the lack of semantic versioning in this repo complicates things. We could add some better logging maybe on the filenotfound path is a filename has magic chars maybe? We could provide a suggestion to the user to explicitly set the expand=True etc in that case.

fsspec.open's call structure is so similar to builtins.open that I often use it as a drop in replacement. There really only a couple of edge cases (append filemode etc), opening the file outside a with context, that requires any adjustment. As such, I'm strongly against reverting the default behavior to the old, buggy one.

Finally, the old behavior, even when working properly is a footgun. Even if the glob is expanded, it's unexpected that will it open the first file it matches without erroring. What if I want to open a glob of CSV files and have special handling for that if fsspec.open returns file not found? The old behavior would incorrectly only open the first csv file silently and that could lead silent data corruption. The new behavior would require me to explicitly handle globs myself.

@martindurant
Copy link
Member Author

We could add some better logging maybe on the filenotfound path is a filename has magic chars maybe?

This is reasonable: catch FileNotFound in open(), and if there are any special characters, add extra information to the exception saying they might want to try expand=True.

@martindurant
Copy link
Member Author

Everyone happy with the wording and test here?

@shobsi
Copy link

shobsi commented Apr 4, 2024

We could add some better logging maybe on the filenotfound path is a filename has magic chars maybe?

This is reasonable: catch FileNotFound in open(), and if there are any special characters, add extra information to the exception saying they might want to try expand=True.

I think this is the best trade-off having learnt that the old behavior was an unintended one.

fsspec/core.py Outdated Show resolved Hide resolved
@martindurant martindurant merged commit 518984d into fsspec:master Apr 4, 2024
10 checks passed
@martindurant martindurant deleted the no_glob branch April 4, 2024 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FileNotFoundError since 2024.3.1
3 participants