[WIP] `df.apply`: add support for `engine='bodo'` #60622

scott-routledge2 · 2024-12-30T01:27:01Z

Adds support for specifying `engine="bodo"` when executing UDFs on DataFrames via `df.apply`.

closes ENH: Add support for executing UDF's using Bodo as the engine #60668
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

WillAyd · 2024-12-30T23:05:47Z

Is there an issue open to discuss this? Adding another engine internally is a non-trivial amount of work and maintenance, so I'm not sure we would even add this without discussion up front.

scott-routledge2 · 2024-12-30T23:40:56Z

> Is there an issue open to discuss this? Adding another engine internally is a non-trivial amount of work and maintenance, so I'm not sure we would even add this without discussion up front.

@WillAyd Hi Will, thank you for the feedback. I would be happy to open an issue and start a conversation. The intention of this draft was mainly experimentation.

datapythonista

Changes look good to me. I added some comments with ideas, but just minor things.

It'd still be good to have an issue for this (or maybe just update the description of the PR), so everybody understands in detail why this is being proposed, and what use cases this addresses. But in general, this seems good.

datapythonista · 2025-01-15T22:10:08Z

pandas/core/apply.py

@@ -870,13 +870,16 @@ def apply(self) -> DataFrame | Series:
                    "the 'numba' engine doesn't support using "
                    "a string as the callable function"
                )
+            if self.engine == "bodo":


Suggested change

-            if self.engine == "bodo":
+            elif self.engine == "bodo":

datapythonista · 2025-01-15T22:13:42Z

pandas/core/apply.py

            results, res_index = self.apply_series_numba()
+        else:


Suggested change

-        else:
+        elif self.engine == "else":

I know what you wrote is consistent with what we have now, where numba is the default for the if, but I think it's clearer to avoid a default (or make python the default). Maybe raise a ValueError if self.engine isn't known?
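The explicit-dispatch idea in this comment could look roughly like the sketch below. It is only an illustration: `ApplySketch`, `KNOWN_ENGINES`, and the returned strings are hypothetical stand-ins, not the actual code in `pandas/core/apply.py`.

```python
class ApplySketch:
    """Hypothetical stand-in for the apply dispatcher; not the pandas class."""

    # Assumed set of engine names, for illustration only.
    KNOWN_ENGINES = ("python", "numba", "bodo")

    def __init__(self, engine: str = "python") -> None:
        self.engine = engine

    def apply_standard(self) -> str:
        # Every branch names its engine explicitly, so no engine is the
        # implicit fallthrough of a bare `else`.
        if self.engine == "python":
            return "python path"
        elif self.engine == "numba":
            return "numba path"
        elif self.engine == "bodo":
            return "bodo path"
        else:
            # Unknown engines fail loudly instead of silently taking a default.
            raise ValueError(
                f"Unknown engine {self.engine!r}; "
                f"expected one of {self.KNOWN_ENGINES}"
            )
```

With this shape, a typo such as `engine="bod"` raises immediately rather than running the Python path unnoticed.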

datapythonista · 2025-01-15T22:16:30Z

pandas/core/apply.py

@@ -1089,6 +1098,26 @@ def apply_series_numba(self):
        results = self.apply_with_numba()
        return results, self.result_index

+    def apply_series_bodo(self) -> DataFrame | Series:
+        bodo = import_optional_dependency("bodo")


Not sure if importing bodo is immediate, but maybe better to have the checks that raise the errors first, and only import if those pass?
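A minimal sketch of the ordering suggested here, under stated assumptions: `apply_series_sketch` and its `module_name` parameter are hypothetical, the specific checks are illustrative, and `importlib.import_module` stands in for the `import_optional_dependency` helper used in the diff.

```python
import importlib


def apply_series_sketch(func, engine_kwargs=None, module_name="json"):
    """Hypothetical helper: validate arguments first, import the backend last."""
    # Cheap checks run first, so invalid calls fail before any import cost.
    if isinstance(func, str):
        raise NotImplementedError(
            "this engine doesn't support using a string as the callable function"
        )
    if engine_kwargs is not None and not isinstance(engine_kwargs, dict):
        raise TypeError("engine_kwargs must be a dict or None")

    # Import only once validation has passed. "json" is a stand-in for the
    # optional dependency so that the sketch runs without extra packages.
    backend = importlib.import_module(module_name)
    return backend, func
```

Because the checks precede the import, passing a string callable raises `NotImplementedError` even when the backend module is not installed.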

datapythonista · 2025-01-15T22:17:14Z

pandas/core/frame.py

@@ -10288,6 +10288,8 @@ def apply(
            <https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html>`_
            in numba to learn what you can or cannot use in the passed function.

+            TODO: describe bodo


Adding this comment to the review, so the TODO is not forgotten

datapythonista · 2025-01-15T22:21:15Z

pandas/tests/apply/test_bodo.py

+    frame = pd.DataFrame(
+        {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7.0, 8.0, 9.0]},
+    )
+    f = lambda x: x["c"]


Not sure why our linter doesn't complain, but I think it's considered a bad practice to use lambda when assigning to a variable. I'd use def instead.
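For reference, PEP 8 recommends a `def` statement over binding a lambda to a name, and flake8 reports the pattern as E731. The sketch below contrasts the two forms; `select_c` is a hypothetical name, and a plain dict stands in for a DataFrame row so the example runs without pandas.

```python
# Discouraged: a lambda bound directly to a name (flake8 E731).
f = lambda x: x["c"]  # noqa: E731


# Preferred: a def gives the function a real name in tracebacks and reprs.
def select_c(x):
    return x["c"]


row = {"a": 1, "b": 4, "c": 7.0}  # stand-in for one DataFrame row

print(f.__name__)         # <lambda>
print(select_c.__name__)  # select_c
```

Both callables return the same value for `row`; the difference is only in naming and readability.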

datapythonista · 2025-01-15T22:23:09Z

ci/deps/actions-310-minimum_versions.yaml

@@ -3,6 +3,7 @@
 name: pandas-dev
 channels:
  - conda-forge
+  - bodo.ai


Do you have plans to package Bodo for conda-forge? Is there a reason not to? I think it'd be better for users and our CI if we could simply use conda-forge.

scott-routledge2 · 2025-01-17

Yes, we currently have an in-progress PR here: conda-forge/staged-recipes#28648

WillAyd · 2025-01-17T15:06:48Z

pandas/tests/apply/test_bodo.py

@@ -0,0 +1,107 @@
+import numpy as np


Rather than creating separate tests for bodo is there a way that we can create a fixture for the three different engines? Structuring the tests that way would be very helpful to ensure result consistency
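The consistency check described here can be sketched in plain Python. In the pandas test suite this would more likely be a parametrized pytest fixture over the engine names. Everything below is hypothetical: `run_python`, `run_alternative`, `ENGINES`, and `check_engines_agree` are illustrative stand-ins, and no real numba or bodo call is made.

```python
def run_python(func, rows):
    """Baseline engine: a plain Python loop over the rows."""
    return [func(row) for row in rows]


def run_alternative(func, rows):
    """Stand-in for a compiled engine: different strategy, same contract."""
    return list(map(func, rows))


# Hypothetical registry; a pytest fixture would parametrize over its keys.
ENGINES = {"python": run_python, "alternative": run_alternative}


def check_engines_agree(func, rows):
    """Run the same UDF through every engine and compare to the baseline."""
    expected = ENGINES["python"](func, rows)
    for name, run in ENGINES.items():
        result = run(func, rows)
        assert result == expected, f"engine {name!r} disagrees with python"
    return expected


def select_c(row):
    return row["c"]


rows = [{"a": 1, "b": 4, "c": 7.0}, {"a": 2, "b": 5, "c": 8.0}]
print(check_engines_agree(select_c, rows))  # [7.0, 8.0]
```

Writing each test once against the registry means a newly added engine is checked against the same expectations as the existing ones.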

added 7 commits December 29, 2024 20:22

add basic support for engine=bodo, df.apply

1e62d38

Merge remote-tracking branch 'upstream/main' into scott/bodo_udf_engine

7e2c2c3

fix test

27fbc0a

adjust minimum version requirements

4c2e94a

update ci envs

4349d61

add channel

0872285

try skipping some tests

cd94be9

WillAyd added the "Needs Discussion" label (Requires discussion from core team before further action) Dec 31, 2024

scott-routledge2 mentioned this pull request Jan 6, 2025

ENH: Add support for executing UDF's using Bodo as the engine #60668

Open

3 tasks

datapythonista reviewed Jan 15, 2025

datapythonista and others added 2 commits January 15, 2025 23:26

Merge branch 'main' into scott/bodo_udf_engine

9a90fa0

apply feedback

dcdd00e

WillAyd requested changes Jan 17, 2025
