use pyarrow in LazyReferenceMapper.write? #1563

Open
raybellwaves opened this issue Apr 4, 2024 · 2 comments
Comments

@raybellwaves
Contributor

Currently the LazyReferenceMapper.write uses fastparquet to write the parquet file (https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/reference.py#L467)

I came across this as I didn't have fastparquet in my env.

It would be nice to have a fallback to pyarrow, or even to use only pyarrow, since it is more commonly used; e.g. dask has deprecated fastparquet AFAICT (https://github.com/dask/dask/blob/main/dask/dataframe/io/parquet/core.py#L259)

That said, I don't know if there would be any difficulty switching.
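
To make the fallback idea concrete, here is a minimal sketch of how an engine could be chosen at runtime. It is not the fsspec implementation: the helper name `pick_parquet_engine` and its `preferred` parameter are made up for illustration, and it only checks whether a package is importable. It does not account for any engine-specific keyword arguments the real write call may rely on.

```python
import importlib.util


def pick_parquet_engine(preferred=("fastparquet", "pyarrow")):
    """Return the first importable package name from ``preferred``.

    Hypothetical helper for illustration only; not part of fsspec.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    raise ImportError(
        "No parquet engine found; install one of: " + ", ".join(preferred)
    )
```

Assuming the write goes through pandas, the returned name could be passed as `engine=` to `DataFrame.to_parquet`, which accepts both `"fastparquet"` and `"pyarrow"`. Any options that only one engine understands would still need to be handled separately.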

@martindurant
Member

It might not matter here, but by using fastparquet in the read, I have stricter control over the data flow. Since pandas will require pyarrow sometime soon, fastparquet is moving away from pandas on a similar timeline. In the use here, we don't need pandas at all, so fastparquet will be more natural and more performant after the switch. Furthermore, pyarrow does not build for wasm, but fastparquet does, so although I am not aware of any browser use (because async and threads don't work), it would be nice to keep that option open.

It may be reasonable to have kerchunk depend on fastparquet, to prevent your specific hurdle?

@raybellwaves
Contributor Author

> It may be reasonable to have kerchunk depend on fastparquet, to prevent your specific hurdle?

I don't want to bloat the deps just for me. I can install it separately for now.

Thanks for the information.
