This might be a naive question, but I have spent a bit of time trying to figure it out and haven't made much progress. I'm trying to do this workflow for a parquet file:
Without a protocol specified, this sort of workflow finds that the parquet file is a directory and raises an IsADirectoryError exception, so I am trying to figure out which protocol to use. Looking through the docs, two built-in implementations mention parquet files, but both seem aimed at kerchunk reference files specifically, and I'm not sure whether they can be used for anything else. I tried protocol="reference", but then I wasn't sure what to pass for fo. I am using a local parquet file, so I passed that for fo, something like this:
but then it couldn't find my file, even though it is sitting in the same directory and I had given just the file name in "path_to_file". I am using local files for now, but that won't always be the case.
Am I taking the wrong approach altogether? Any idea for how to approach this? Thanks.
I am a little confused about what you want to do. As you say, a parquet dataset is (usually) a collection of files in a directory or tree. fsspec is for reading bytes or doing filesystem manipulations, so it makes no sense to "open" a directory. You can list or open the individual files instead:
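import fsspec
fs = fsspec.filesystem("file")  # the protocol is required; "file" is the local filesystem
fs.find(path)  # list all files under the directory
fsspec.open_files(path + "/**/*.parquet", "rb")  # "open" all matching data files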
However, the parquet libraries understand the layout of parquet files, so you don't need to do this.
pd.read_parquet(path)
will call fsspec as needed (via arrow, which also has a concept of filesystems, or via fastparquet). Same goes for dask, polars, etc.
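To make the directory-versus-file distinction concrete, here is a minimal sketch using only fsspec on the local filesystem. The directory name `dataset.parquet`, the `year=...` partition folders, and the placeholder file contents are all assumptions made up for illustration; the files are not valid parquet, since only the layout matters here.

```python
import os
import tempfile

import fsspec

# Build a hypothetical directory-style dataset with two partition folders.
# The file contents are placeholder bytes, not valid parquet data.
root = tempfile.mkdtemp()
dataset = os.path.join(root, "dataset.parquet")
for part_dir in ("year=2023", "year=2024"):
    os.makedirs(os.path.join(dataset, part_dir))
    with open(os.path.join(dataset, part_dir, "part.0.parquet"), "wb") as f:
        f.write(b"placeholder")

fs = fsspec.filesystem("file")  # local filesystem

# The dataset path is a directory, which is why opening it as a file fails.
is_directory = fs.isdir(dataset)

# find() walks the tree and returns every file path below the directory.
all_files = fs.find(dataset)

# open_files() expands the glob and returns one lazy OpenFile per match.
open_files = fsspec.open_files(dataset + "/**/*.parquet", "rb")

# Read the raw bytes of each match; a parquet library would normally do this.
sizes = []
for of in open_files:
    with of as f:
        sizes.append(len(f.read()))
```

For a remote dataset the same calls should apply with a different protocol in place of "file", provided the matching backend package is installed.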
And of course Intake:
import intake
data = intake.readers.datatypes.Parquet(path)
reader = data.to_reader("pandas")
# or
reader = intake.auto_pipeline(path, "pandas:DataFrame") # works if path matches *.parquet