Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-51253][SQL][CONNECT] Mapped KVDS.agg should support sql-expressions #49999

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

hvanhovell
Copy link
Contributor

@hvanhovell hvanhovell commented Feb 19, 2025

What changes were proposed in this pull request?

In we fixed KeyValueGroupedDataset.agg(...) for KeyValueGroupedDatasets for which the values where mapped. We approached this by transforming the grouping key(s) and the aggregates to use different inputs. Unfortunately we forgot that a user is also allowed to define expressions in sql using the expr(..) functionality, this did not work in our approach because the sql expression contains an opaque (from the connect POV) string.

We opted to fix this in a way that is not to dissimilar to the classic KeyValueGroupedDataset implementation. If the KVDS contains a map values function we execute the function in an intermediate projection, and then we apply the aggregation.

Why are the changes needed?

It fixes a bug.
It cleans up a bunch of code in

Does this PR introduce any user-facing change?

No. It fixes a bug.

How was this patch tested?

Existing tests pass. I have modified a test to use expr(...) now.

Was this patch authored or co-authored using generative AI tooling?

No.

@hvanhovell
Copy link
Contributor Author

@xupefei PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant