Description
Currently, the way to transfer a dataframe from the remote Spark session to the local kernel is the %%spark -o df magic. However, this is not an optimal solution: it is less efficient and does not preserve data types the way the normal .toPandas() method of a Spark dataframe does.
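For reference, the current approach looks roughly like this (the -o flag pulls the named dataframe down to the local kernel as a pandas DataFrame; the table name "data" is just a placeholder):

%%spark -o df
df = spark.table("data")

%%local
df.dtypes  # the types here may not match what df.toPandas() would give on the remote side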
The easiest workaround is to write the dataframe to a file system that both the remote and local environments have access to. For example, you could do:
# remote cell: write the dataframe to shared storage
df = spark.table("data")
df.toPandas().to_parquet("/file/system/data.parquet")

%%local
# local cell: read it back with pandas
import pandas as pd
df = pd.read_parquet("/file/system/data.parquet")
If you could transfer the serialized parquet bytes directly to the local kernel, you could avoid the external file system entirely:
# remote cell: serialize to parquet bytes in memory
df = spark.table("data")
buf = df.toPandas().to_parquet(path=None)  # path=None returns the parquet bytes

%%send_bytes buf

%%local
import io
import pandas as pd
df = pd.read_parquet(io.BytesIO(buf))
How difficult would it be to add the %%send_bytes magic? From looking through the code, it seems as though it should be doable, but I may be missing something. I am happy to help as best I can with the implementation, but I am not very familiar with the codebase.
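To make the idea concrete, here is a minimal sketch of the round trip that %%send_bytes would need to perform, assuming the payload can be shuttled through a text-based statement output channel; the base64 step and all variable names below are my own illustration, not anything that exists in sparkmagic today:

import base64
import io
import pandas as pd

# "Remote" side: serialize to parquet bytes, then base64-encode so the
# payload survives a text-only transport channel.
df_remote = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
payload = base64.b64encode(df_remote.to_parquet(path=None)).decode("ascii")

# "Local" side: decode the text payload and rebuild the dataframe,
# keeping the dtypes that the parquet format carries.
df_local = pd.read_parquet(io.BytesIO(base64.b64decode(payload)))
assert df_local.equals(df_remote)

If the existing statement-output plumbing can carry a string like payload above, the magic itself might only need to wrap this encode/decode pair, but I may well be underestimating the work involved.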