fix(substrait): schema errors for Aggregates with no groupings #17909
Which issue does this PR close?
Closes #16590
Rationale for this change
When consuming Substrait plans containing aggregates with no groupings, we would see the following error:
The Substrait plan had one fewer field than DataFusion expected, because DataFusion was adding an extra `__grouping_id` field to the output of the Aggregate node. This happens when the condition at `datafusion/datafusion/expr/src/logical_plan/plan.rs` line 3418 (commit daeb659) is true.
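To make the mismatch concrete, here is a minimal, self-contained Rust sketch. `GroupExpr` and `output_fields` are hypothetical stand-ins, not DataFusion's `Aggregate` API; they only model the behavior described above: a group by that consists of exactly one grouping-set expression gets an extra `__grouping_id` field, while an empty group by does not. The position of the extra field in this model is illustrative.

```rust
// Hypothetical, simplified model of how an aggregate's output fields are derived.
// These types are NOT DataFusion's API; they only mirror the behavior described above.

#[derive(Debug, Clone, PartialEq)]
enum GroupExpr {
    Column(String),
    GroupingSet(Vec<Vec<String>>),
}

fn output_fields(group_exprs: &[GroupExpr], aggregates: &[&str]) -> Vec<String> {
    let mut fields: Vec<String> = Vec::new();
    // Assumption: models a check that the group by is exactly one grouping-set expression.
    let is_grouping_set = matches!(group_exprs, [GroupExpr::GroupingSet(_)]);
    for expr in group_exprs {
        match expr {
            GroupExpr::Column(name) => fields.push(name.clone()),
            GroupExpr::GroupingSet(sets) => {
                // Distinct columns across all sets, in first-seen order.
                for set in sets {
                    for col in set {
                        if !fields.contains(col) {
                            fields.push(col.clone());
                        }
                    }
                }
            }
        }
    }
    if is_grouping_set {
        // The extra internal column that Substrait does not know about.
        fields.push("__grouping_id".to_string());
    }
    fields.extend(aggregates.iter().map(|a| a.to_string()));
    fields
}

fn main() {
    // Before: the group by is a single, empty grouping set.
    let before = output_fields(&[GroupExpr::GroupingSet(vec![vec![]])], &["count(*)"]);
    // After: no group expressions at all.
    let after = output_fields(&[], &["count(*)"]);
    println!("{:?}", before); // ["__grouping_id", "count(*)"]
    println!("{:?}", after); // ["count(*)"]
}
```

A Substrait aggregate with no groupings and one measure declares a single output field, so in this model only the empty group by lines up with it; the single empty grouping set yields one field too many.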
A natural follow-up question is: "Why are we creating an Aggregate with a single empty GroupingSet for the group by, instead of just leaving the group by empty?"
What changes are included in this PR?
Instead of setting `group_exprs` to a vector containing a single empty grouping set, this PR leaves `group_exprs` empty. As a result, the `is_grouping_set` check is not triggered, and the DataFusion schema matches the Substrait schema.
Are these changes tested?
Yes
I have added direct tests using example Substrait plans.
Are there any user-facing changes?
Substrait plans that were not consumable before are now consumable.