It might not matter here, but by using fastparquet in the read, I have stricter control over the data flow. Since pandas will require pyarrow sometime soon, fastparquet is quitting pandas over a similar timeline. In the use here, we don't need pandas at all, so fastparquet will be more natural and more performant after the switch. Furthermore, pyarrow does not build for wasm, but fastparquet does, so although I am not aware of any browser use (because async and threads don't work), it would be nice to keep that option open.
It may be reasonable to have kerchunk depend on fastparquet, to prevent your specific hurdle?
Currently, `LazyReferenceMapper.write` uses fastparquet to write the parquet file (https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/reference.py#L467). I came across this because I didn't have fastparquet in my env.
It would be nice to have a fallback to pyarrow, or even to use only pyarrow, as it's more commonly used; e.g. dask has deprecated fastparquet AFAICT (https://github.com/dask/dask/blob/main/dask/dataframe/io/parquet/core.py#L259).
That said, I don't know if there would be any difficulty switching.
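To make the fallback idea concrete, here is a minimal sketch of how an engine could be selected at write time. It is not fsspec's actual code: `pick_parquet_engine` and its `is_available` parameter are hypothetical names introduced only for illustration, and the sketch assumes the writer can accept an engine name rather than importing fastparquet directly.

```python
import importlib.util


def pick_parquet_engine(preferred=("fastparquet", "pyarrow"), is_available=None):
    """Return the first importable engine name from `preferred`.

    Hypothetical helper, not part of fsspec. `is_available` can be
    overridden (e.g. in tests) to simulate which packages are installed.
    """
    if is_available is None:
        # find_spec returns None for a top-level module that is not installed
        def is_available(name):
            return importlib.util.find_spec(name) is not None

    for name in preferred:
        if is_available(name):
            return name
    raise ImportError(
        "No parquet engine found; install one of: " + ", ".join(preferred)
    )
```

If the write path goes through pandas, the returned name could be passed as the `engine=` argument of `DataFrame.to_parquet`, which accepts both `"fastparquet"` and `"pyarrow"`. Keeping fastparquet first in `preferred` preserves the current behaviour when both are installed.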