Multi-key groupby([names]).sum() densifies to a dense cartesian grid (memory) #757

> [!NOTE]
> AI-written (Claude Code, prompted by @FBumann). memray numbers verified on the #751 branch.

Follow-up to #753, splitting out the part that #751 did **not** solve. #753's fast-path half shipped in #751 (multi-key `groupby([names])` now takes the reindex path, one dimension per key, non-breaking); this issue tracks the memory half.

## Problem

`expr.groupby(["period","season"]).sum()` returns separate `period` × `season` dims. In **dense** xarray that is a full cartesian grid: every absent key combination is materialized as a fill cell. For a sparse or correlated crossing, memory therefore grows with the size of the grid rather than with the number of observed combinations. memray peak memory for a diagonal crossing (N=1000 observed combos):

| grouping | output | peak memory |
|---|---|---|
| `groupby([names])` | `{period:1000, season:1000}` dense grid | **33.3 MB** |
| `groupby(df)` | `{group:1000}` MultiIndex, observed-only | **0.33 MB** |

That is a ~100× gap at N=1000, scaling as N: the grid holds N² cells against N observed combinations. The whole difference is the final densification.
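To make the cell-count argument concrete, here is a minimal sketch in plain pandas, used as a stand-in for the xarray behavior. It is not the linopy code path: the column names and the `value` column are illustrative assumptions, and `unstack` plays the role of the final densification.

```python
import numpy as np
import pandas as pd

N = 1000  # number of observed (period, season) combinations

# Diagonal crossing: each period occurs with exactly one season.
df = pd.DataFrame(
    {
        "period": np.arange(N),
        "season": np.arange(N),
        "value": np.ones(N),
    }
)

# Observed-only result: one row per combination that actually occurs.
stacked = df.groupby(["period", "season"])["value"].sum()

# Separate-dimension result: one cell per (period, season) pair,
# filled with NaN wherever the combination was never observed.
grid = stacked.unstack("season")

observed_cells = stacked.size
grid_cells = grid.shape[0] * grid.shape[1]
fill_cells = int(grid.isna().to_numpy().sum())

print(observed_cells, grid_cells, fill_cells)
```

The cell counts are the point here, not the byte counts: the memray figures in the table come from the real expression objects, which this sketch does not reproduce.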

## Why it's inherent (in dense xarray)

Separate dims *are* a dense grid; the only compact form is a stacked `MultiIndex` (what the `DataFrame` grouper returns). "Separate dims **and** compact" is impossible without a genuinely sparse store.

## What #751 shipped (mitigation, not a fix)

- A `UserWarning` when the grid ≫ observed combinations, nudging users to the `DataFrame` grouper.
- The `DataFrame` grouper as the compact (observed-only, stacked) escape hatch.
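The kind of check behind such a warning can be sketched as follows. This is a hypothetical helper, not the code that shipped in #751: the function name, the `ratio_threshold` default, and the message text are all assumptions chosen for illustration.

```python
import math
import warnings


def warn_if_grid_much_larger(keys, ratio_threshold=10.0):
    """Warn when the dense grid dwarfs the observed key combinations.

    Hypothetical helper for illustration only.
    `keys` is an iterable of equal-length tuples, one per observation.
    Returns the ratio of grid cells to observed combinations.
    """
    observed = set(keys)
    if not observed:
        return 0.0
    # Distinct values per key give the length of each output dimension.
    dim_sizes = [len(set(column)) for column in zip(*observed)]
    grid_cells = math.prod(dim_sizes)
    ratio = grid_cells / len(observed)
    if ratio > ratio_threshold:
        warnings.warn(
            f"dense grid has {grid_cells} cells for {len(observed)} observed "
            "combinations; consider grouping by a DataFrame instead",
            UserWarning,
            stacklevel=2,
        )
    return ratio
```

Under these assumptions a diagonal crossing of 100 keys gives a ratio of 100 and warns, while a full 3 × 4 crossing gives a ratio of 1 and stays silent.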

## Real fix → #756

The sparse / long-format `_term` kernel (umbrella **#756**, which lists `groupby` densification as an entry point). Under a long-format kernel, `groupby(k).sum()` is a relational `group_by().agg()` over observed combinations only — no grid, no padding.
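As a rough picture of what "relational over observed combinations" means, the sketch below uses a pandas table as a stand-in for a long-format term store. The schema (`period`, `season`, `var`, `coeff`) is an assumption made for illustration and is not the design proposed in #756.

```python
import pandas as pd

# Hypothetical long-format term table: one row per (group keys, variable) term.
terms = pd.DataFrame(
    {
        "period": [0, 0, 0, 1],
        "season": ["winter", "winter", "winter", "summer"],
        "var": [1, 1, 2, 3],
        "coeff": [2.0, 3.0, 1.0, 4.0],
    }
)

# Relational group-by: only key combinations present in the table
# produce output rows, so nothing is padded.
grouped = terms.groupby(["period", "season", "var"], as_index=False)["coeff"].sum()

# Number of observed (period, season) groups versus the dense grid size.
observed_groups = grouped[["period", "season"]].drop_duplicates().shape[0]
dense_grid_cells = terms["period"].nunique() * terms["season"].nunique()

print(grouped)
print(observed_groups, dense_grid_cells)
```

In this toy table the result has rows for two observed groups, whereas separate `period` and `season` dimensions would need four cells, two of them fill.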


