How to return data as an arrow table instead of pandas dataframe? #6347
-
Hi all, I am using Ibis with MSSQL, and I understand that by default it returns data from the database as a pandas DataFrame. However, I am keen to do some further processing afterwards in Polars (through Ibis). Is there any way to change the default so that it returns the data as an Arrow table? I am wondering whether I can avoid serialising from one format to another by changing the default to always return Arrow tables. Thanks!
Replies: 2 comments 1 reply
-
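Hi @nikhilmakan02, I'd be interested in learning more about your use case here! I don't think you can set the default of .execute() in any easy way. I saw earlier that Ibis will likely move toward an explicit to_pandas() method in docs & examples to avoid any confusion. You can output an Ibis table to Arrow with the to_pyarrow() (or to_pyarrow_batches()) method and create an Ibis table from an Arrow table with ibis.memtable(). I wasn't able to validate with mssql easily (just due to local setup issues) but did with another backend (clickhouse). Note that this would pull the entire table into local memory and could be slow and/or costly depending on your setup and size of data:
# import ibis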
import ibis
# configure ibis
ibis.options.interactive = True
ibis.set_backend("polars")
# connect to backend
con_cl = ibis.clickhouse.connect()
# get a "remote" table
test_remote = con_cl.tables["test"]
# load the remote table into Polars via Arrow
test_local = ibis.memtable(test_remote.to_pyarrow())
-
@lostmygithubaccount and @gforsyth, sorry for the super delayed reply on this, and thank you both for the answer. As noted, the above option is useful for small tables. If it's of interest, I can provide a bit more context on my use case. We pull data through Ibis from a cloud database, which involves crunching big data and is therefore well suited to a distributed query engine. However, once we have received the data, there can be several minor transformations that need to happen on this dataset for visualisation purposes or even interactive analyses. It is in this instance that I would like to switch backends to something like Polars to execute these transformations, and I simply thought it would be inefficient to incur serialisation from pandas to Polars. I appreciate this would usually be a small cost if the data is quite small; however, response times can be quite strict in a dashboard or interactive setting for a good user experience. Thank you for raising it as a new feature request. What drove me to post this question in the first place was the docs stating that by default it returns a pandas DataFrame, which gave me the impression that maybe it could be changed.