fix(substrait): schema errors for Aggregates with no groupings #17909

vbarua · 2025-10-03T23:34:43Z

Which issue does this PR close?

Rationale for this change

When consuming Substrait plans containing aggregates with no groupings, we would see the following error:

Error: Substrait("Named schema must contain names for all fields")

The Substrait plan had one fewer field than DataFusion expected, because DataFusion was adding an extra "__grouping_id" column to the output of the Aggregate node. This happens when the following condition in datafusion/datafusion/expr/src/logical_plan/plan.rs (line 3418 at commit daeb659) is true:

let is_grouping_set = matches!(group_expr.as_slice(), [Expr::GroupingSet(_)]);
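The effect of that check can be illustrated in isolation. The following is a minimal sketch, not DataFusion code. It assumes a simplified, hypothetical `Expr` enum in place of DataFusion's real `Expr`, where the real `GroupingSet` variant wraps a richer `GroupingSet` type. `output_field_count` is a hypothetical helper that approximates the schema rule: group-by columns, plus one `__grouping_id` field for grouping-set aggregates, plus the aggregate results.

```rust
use std::collections::BTreeSet;

// Hypothetical, simplified stand-in for DataFusion's logical `Expr`.
// Only the variants needed to illustrate the check are modeled.
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    // GROUPING SETS: each inner Vec is one grouping set (a simplification).
    GroupingSet(Vec<Vec<Expr>>),
}

/// Mirrors the quoted check: the aggregate is treated as a grouping-set
/// aggregate when the group-by list is exactly one `GroupingSet` expression.
fn is_grouping_set(group_expr: &[Expr]) -> bool {
    matches!(group_expr, [Expr::GroupingSet(_)])
}

/// Collects the distinct column names referenced by a group-by list.
fn column_names(exprs: &[Expr]) -> BTreeSet<String> {
    let mut names = BTreeSet::new();
    for e in exprs {
        match e {
            Expr::Column(name) => {
                names.insert(name.clone());
            }
            Expr::GroupingSet(sets) => {
                for set in sets {
                    names.extend(column_names(set));
                }
            }
        }
    }
    names
}

/// Hypothetical helper: approximate output field count of an aggregate with
/// `n_aggr` aggregate expressions. Grouping-set aggregates get one extra
/// `__grouping_id` field.
fn output_field_count(group_expr: &[Expr], n_aggr: usize) -> usize {
    let grouping_id = usize::from(is_grouping_set(group_expr));
    column_names(group_expr).len() + grouping_id + n_aggr
}
```

Under these assumptions, an aggregate such as `count(*)` whose group-by is a single empty grouping-set expression reports two fields (`count` plus `__grouping_id`). The same aggregate with an empty group-by list reports one field, which is what the Substrait plan declares.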

A natural follow-up question is: "Why are we creating an Aggregate with a single empty GroupingSet for the group by, instead of just leaving the group by empty?"

What changes are included in this PR?

Instead of setting group_exprs to a vector containing a single empty grouping set, leave group_exprs empty. The is_grouping_set condition is then not triggered, so the DataFusion schema matches the Substrait schema.
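The shape of the fix can be sketched as a conversion from Substrait groupings to a group-by list. This is a hypothetical illustration, not the consumer's actual code. It reuses a simplified stand-in `Expr` enum and represents each Substrait grouping as a plain list of expressions. The mapping of one grouping to a plain group-by and several groupings to one `GroupingSet` expression is an assumption made for the example. The only point it illustrates from this PR is the empty case.

```rust
// Hypothetical, simplified stand-in for DataFusion's logical `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    // GROUPING SETS: each inner Vec is one grouping set (a simplification).
    GroupingSet(Vec<Vec<Expr>>),
}

/// Same shape as the `is_grouping_set` check quoted above.
fn is_grouping_set(group_expr: &[Expr]) -> bool {
    matches!(group_expr, [Expr::GroupingSet(_)])
}

/// Hypothetical conversion from Substrait groupings (each a list of
/// grouping expressions) to a group-by list.
fn group_exprs_from_groupings(mut groupings: Vec<Vec<Expr>>) -> Vec<Expr> {
    match groupings.len() {
        // No groupings: leave the group-by empty instead of wrapping an empty
        // grouping set, so `is_grouping_set` is false and no extra
        // `__grouping_id` column is added.
        0 => Vec::new(),
        // One grouping (assumed): a plain GROUP BY over its expressions.
        1 => groupings.pop().unwrap(),
        // Several groupings (assumed): one GROUPING SETS expression.
        _ => vec![Expr::GroupingSet(groupings)],
    }
}
```

With this mapping, an aggregate with no groupings produces an empty group-by list, and `is_grouping_set` returns false for it.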

Are these changes tested?

Yes

I have added direct tests via example Substrait plans

Are there any user-facing changes?

Substrait plans that were not consumable before are now consumable.

When groupings is empty, we should set group_exprs to the empty vector instead of a vector containing a single empty GroupingSet. The issue with the latter is that it triggers the addition of the extra "__grouping_id" column in the Aggregate node, which is redundant in this case AND causes conflicts in the output schema because Substrait does not expect it.

vbarua · 2025-10-04T00:58:46Z

I originally tried to add a test for the handling of plans with multiple grouping sets, but I encountered a different issue, which I documented in #17910.

vbarua · 2025-10-06T17:09:46Z

cc: @Blizzara

test: verify handling of groupings in Substrait

2f7d7d3

github-actions bot added the substrait Changes to the substrait crate label Oct 3, 2025

vbarua changed the title ~~test: verify handling of groupings in Substrait~~ fix(substrait): schema errors for Aggregates with no groupings Oct 4, 2025

vbarua force-pushed the vbarua/substrait/handle-empty-groupings branch from 0b3a022 to 59561f0 on October 4, 2025 00:38

vbarua marked this pull request as ready for review October 4, 2025 01:06

chenkovsky mentioned this pull request Oct 6, 2025

feat: optimize grouping and introduced unparsing and substrait support #16161

Open
