Description
Feature request
nested_pandas.read_parquet will support both local and cloud (e.g., S3, but not HTTP) directory reads once lincc-frameworks/nested-pandas#393 is merged. That implementation calls .is_dir on every cloud path that does not end with "/", so each path we read may incur an extra round trip. In HATS, however, we can distinguish leaf directories from leaf files. This enables an optimization: if HATS passes leaf directories with a trailing "/", nested_pandas will not call is_dir on them.
Unfortunately, nested_pandas would still call is_dir on leaf files. HATS currently makes that call anyway, so although the proposed re-implementation would not be optimal, it would not increase the total number of round trips when accessing cloud files.
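The idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual HATS or nested_pandas code: `with_trailing_slash`, `CountingFS`, and `reader_is_dir` are hypothetical names, the paths are made up, and `reader_is_dir` merely mimics the behavior described above (skip `is_dir` when the path ends with "/").

```python
def with_trailing_slash(path: str, is_leaf_dir: bool) -> str:
    """Hypothetical HATS-side helper: mark known leaf directories with a trailing "/"."""
    if is_leaf_dir and not path.endswith("/"):
        return path + "/"
    return path


class CountingFS:
    """Toy stand-in for a cloud filesystem that counts is_dir round trips."""

    def __init__(self, dirs):
        self.dirs = {d.rstrip("/") for d in dirs}
        self.is_dir_calls = 0

    def is_dir(self, path: str) -> bool:
        self.is_dir_calls += 1  # each call models one remote round trip
        return path.rstrip("/") in self.dirs


def reader_is_dir(fs: CountingFS, path: str) -> bool:
    """Assumed reader behavior: a trailing "/" means directory, no is_dir call."""
    if path.endswith("/"):
        return True
    return fs.is_dir(path)


if __name__ == "__main__":
    fs = CountingFS(["s3://bucket/catalog/leaf_dir"])  # made-up paths
    # Leaf directory: HATS already knows it is a directory, so no round trip.
    reader_is_dir(fs, with_trailing_slash("s3://bucket/catalog/leaf_dir", True))
    # Leaf file: the reader still has to ask the filesystem.
    reader_is_dir(fs, with_trailing_slash("s3://bucket/catalog/leaf.parquet", False))
    print(fs.is_dir_calls)
```

With this shape, only leaf files cost an `is_dir` call, which matches the round trips HATS already makes today.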
Before submitting
Please check the following:
- I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
- I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
- If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.