Skip to content

Conversation

@philippjfr
Copy link
Member

@philippjfr philippjfr commented Nov 8, 2024

As we begin to broaden support for different DataFrame libraries in our ecosystem we need to start at the bottom and work our way up. This PR makes it possible for param.DataFrame to be extended with support for other DataFrame libraries and adds initial support for accepting polars DataFrame and LazyFrame objects. For now we maintain backwards compatibility here and require the user to explicitly define which libraries they want to support by providing a libraries argument.

@codecov
Copy link

codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 86.38%. Comparing base (31f5717) to head (80752ac).
Report is 41 commits behind head on main.

Files with missing lines Patch % Lines
param/parameters.py 75.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
- Coverage   86.70%   86.38%   -0.32%     
==========================================
  Files          10       10              
  Lines        5151     5201      +50     
==========================================
+ Hits         4466     4493      +27     
- Misses        685      708      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@philippjfr
Copy link
Member Author

@hoxbro @MarcSkovMadsen would love your thoughts. I'd like to think about Narwhals support but am wondering if that may be better implemented as a new parameter type.

Co-authored-by: Simon Høxbro Hansen <[email protected]>
Co-authored-by: Simon Høxbro Hansen <[email protected]>
@MarcSkovMadsen
Copy link
Collaborator

My POC for how this should be done from param over panel to bokeh is in https://github.com/panel-extensions/panel-graphic-walker/blob/89d6bac66d119d6ad1f75dfe27ec984d18d895ed/src/panel_gwalker/_tabular_data.py#L1

My broader vision for tabular support is in narwhals-dev/narwhals#1289 (comment).

My recommendation would by to first implement this e2e in panel-graphic-walker. That is a place where its really easy to get this working and experimenting e2e. Its also a practical example that will help us have a specific discussion solving real problems.

@philippjfr
Copy link
Member Author

@hoxbro @maximlt I know you two discussed this before and your initial suggestion was that we should introduce a new Parameter class, because this is potentially confusing and over-complicated. I'm wary of creating a new parameter given the history of situations like CalendarDate, Date and the ObjectSelector, Selector mess, which in my experience also just leads to confusion.

Overloading the Parameter does carry risk of confusion too and backward compatibility is a bit hard to reason about. So my concerns with this implementation are:

  • We now allow any (eager) DataFrame type by default. This does break backward compatibility.
  • If you don't have Narwhals installed then it falls back to pandas. This is potentially confusing since support for e.g. Polars, depends on another library being installed. At minimum I guess we need to improve the error messages to suggest installing Narwhals if it falls down the pandas validation path.
  • Should we even attempt to validate rows and columns if allow_lazy=True.
  • What do we do about (de)serialization? By default it'll always use pandas during deserialization now.

@maximlt
Copy link
Member

maximlt commented Nov 3, 2025

Well, Pandas and Polars DataFrames have pretty different APIs. So how useful is a DataFrame parameter if in the code I have to add if isinstance(self.df, pd.DataFrame): ... elif isinstance(self.df, pl.DataFrame) ... everywhere it's used? Could it be that the DataFrame as designed in the PR assumes the user will then leverage Narwhals and its unified API to interact with the dataframe? This implementation would be easier to deal with if polars didn't exist, as the other dataframe libraries have more or less copied pandas, but not polars 🙃

@maximlt
Copy link
Member

maximlt commented Nov 4, 2025

If we want to have a single DataFrame parameter without breaking backwards compatibility, we can do the kind of usual migration:

  • First, allow users to opt in to the new behavior (e.g. parameter attribute, global setting)
  • Second, later, emit a warning when the user has not opted in
  • Finally, the new behavior becomes the default

@jbednar
Copy link
Member

jbednar commented Nov 4, 2025

We now allow any (eager) DataFrame type by default. This does break backward compatibility.

I do not think we should do this. It breaks our contract with the user of Parameterized code. Users of a DataFrame parameter have been able to write code where they know they can invoke Pandas methods on the df they are provided, and thus they don't need to check that it's a Pandas object. Letting in other objects with different syntax and semantics violates our promises to the users.

That said, I'm concerned about a libraries argument because of how Narwhals works -- can't people then accept a wide variety of objects that should all work, via Narwhals? If so seems like we just have the choice between Pandas and Narwhals, rather than have people try to think of all the things that Narwhals might support now or in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants