-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ArrowFSWrapper should not use "/" root_marker for all filesystems #1467
Conversation
except BaseException: | ||
root_marker = "" | ||
|
||
wrapper = type(cls.__name__, (cls,), {"root_marker": root_marker}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is effectively a metaclass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no metaclasses involved, but it works similarly yes. For such a simple case, metaclasses might be overkill, but maybe a top-level function that isn't attached to ArrowFSWrapper
instead?
def wrap_arrow_fs(fs: pyarrow.fs.FileSystem, **kwargs) -> ArrowFSWrapper:
# build class dynamically based on fs
Because it's a class attribute, I don't think we can get around creating different classes per arrow filesystem. An alternative to the dynamic class construction would be to have wrapper subclasses for each root_marker
variant and a lookup table (dict) based on the provided arrow filesystem.
type_name = override.get(fs.type_name, fs.type_name) | ||
|
||
try: | ||
fs.from_uri(f"{type_name}:///") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the foolproof way to know whether the first character can/should be "/"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely works for (3/4) Local, S3, and GCS. I'm not sure about Hadoop, I'm never used it and am not sure how to test locally.
https://arrow.apache.org/docs/python/filesystems.html#filesystem-interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hadoop has a derived class anyway, HadoopFileSystem. We should make sure it has "/", and it doesn't need to go through the check in this PR.
How about b129860 , does this doe everything, but without |
Hey @martindurant yes that commit looks great. Dropping the |
I don't immediately see any downside, except maybe we should guard against being called as |
This one got lost... should we finish it off? |
Perhaps a bit too hacky to be merged, but Fixes #1464 , in theory. Basic idea is to check if the provided arrow filesystem supports a
/
path, then create anArrowFSWrapper
class dynamically based on the result.