Skip to content

Conversation

vbarua
Copy link
Contributor

@vbarua vbarua commented Oct 3, 2025

Which issue does this PR close?

Closes #16590

Rationale for this change

When consuming Substrait plans containing aggregates with no groupings, we would see the following error

Error: Substrait("Named schema must contain names for all fields")

The Substrait plan had one less field than DataFusion expected because DataFusion was adding an extra "__grouping_id" to the output of the Aggregate node. This happens when the

let is_grouping_set = matches!(group_expr.as_slice(), [Expr::GroupingSet(_)]);

condition is true.

A natural followup question to this is "Why are we creating an Aggregate with a single empty GroupingSet for the group by, instead of just leaving the group by entirely?".

What changes are included in this PR?

Instead of setting group_exprs to a vector with a single empty grouping set, let's just leave group_exprs empty entirely. This means that the is_grouping_set is not triggered, so the Datafusion schema matches the Substrait schema.

Are these changes tested?

Yes

I have added direct tests via example Substrait plans

Are there any user-facing changes?

Substrait plans that were not consumable before are now consumable.

@github-actions github-actions bot added the substrait Changes to the substrait crate label Oct 3, 2025
@vbarua vbarua changed the title test: verify handling of groupings in Substrait fix(substrait): schema errors for Aggregates with no groupings Oct 4, 2025
When groupings is empty, we should set group_exprs to the empty vector
instead of a vector containing a single empty GroupingSet.

The issue with the latter is that it triggers the addition of the extra
"__grouping_id" column in the Aggregate node, which is redundant in this
case AND causes conflicts in the output schema because Substrait does
not expect it.
@vbarua vbarua force-pushed the vbarua/substrait/handle-empty-groupings branch from 0b3a022 to 59561f0 Compare October 4, 2025 00:38
@vbarua
Copy link
Contributor Author

vbarua commented Oct 4, 2025

I originally tried to add a test for the handling of plans with multiple grouping sets, however I encountered a different issues which I documented in #17910.

@vbarua
Copy link
Contributor Author

vbarua commented Oct 6, 2025

cc: @Blizzara

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
substrait Changes to the substrait crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Logical plan creation for Substrait plans with aggregate relations
1 participant