Access via Dask read_parquet #22

Answered by TomAugspurger
mikeskaug asked this question in Places
The filters argument doesn't do anything

I can't tell from the documentation at https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html whether referencing struct members like 'bbox.min' is supported. Maybe @jorisvandenbossche knows offhand? Perhaps you need to build some kind of pyarrow.compute.Expression, but I couldn't work out how.

I haven't figured out how to load the "geometry" column. I understand that the geometry column is WKB, so I've tried loading it as bytes, but dask always tries to decode it as UTF-8, which fails.

If you load the data as binary, you can use geopandas.GeoSeries.from_wkb to parse it into polygons:

import dask.dataframe as dd
import 
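The parsing step on its own can be sketched as follows. This is not the snippet from the answer (which is cut off above), just a small illustration that assumes the "geometry" column has already been loaded as raw WKB bytes. The pandas Series here is a hypothetical stand-in for that column.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Hypothetical stand-in for a "geometry" column loaded as raw bytes (WKB).
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
wkb_column = pd.Series([square.wkb, square.wkb], name="geometry")

# Parse the WKB bytes into shapely geometries.
geometries = gpd.GeoSeries.from_wkb(wkb_column)

# The result behaves like any other GeoSeries.
areas = geometries.area.tolist()
```

With a Dask DataFrame, the same call could presumably be applied per partition (for example through map_partitions), provided the column arrives as bytes rather than as decoded strings.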

Answer selected by mikeskaug