
[FEA] Better control over the output dtype in aggregations #15852

Open
@wence-

Description

Is your feature request related to a problem? Please describe.

For the cudf-polars work, I'd like to match polars dtypes where possible. Ideally this would happen without casting the result of a libcudf call after the fact, when the interface nominally supports specifying an output type.

For whole-frame aggregations (cudf::reduce), one can specify an output_dtype, but it is not respected for several aggregations. Specifically:

  • MEDIAN (always returns the datatype matching double)
  • NUNIQUE (always returns the datatype matching cudf::size_type)
  • QUANTILE (always returns the datatype matching double)

The same is true of many grouped aggregations.

Describe the solution you'd like

I'd like aggregations to respect the output dtype specified by the user.

Describe alternatives you've considered

Post-hoc unary casting of the result. However, this costs another kernel launch and extra memory, because the cast materializes a second copy of the result.
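To make the cost of that workaround concrete, here is a minimal sketch in Python with NumPy standing in for the GPU. grouped_median is a hypothetical helper (not a libcudf or pylibcudf API) that, like MEDIAN as described above, always returns float64; float32 is an assumed target dtype chosen for illustration.

```python
import numpy as np


def grouped_median(keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a grouped MEDIAN that, like the behaviour
    described in this issue, always produces float64 regardless of the
    requested output type."""
    groups = np.unique(keys)  # sorted unique group keys
    return np.array(
        [np.median(values[keys == k]) for k in groups], dtype=np.float64
    )


keys = np.array([0, 0, 1, 1, 1])
values = np.array([1, 3, 2, 4, 6], dtype=np.int32)

# Assumption: the frontend wants float32 for this result.
result = grouped_median(keys, values)  # first allocation: float64 column
wanted = result.astype(np.float32)     # extra pass + second allocation

print(result.dtype, wanted.dtype, wanted.tolist())  # float64 float32 [2.0, 4.0]
```

On the GPU, the astype step corresponds to the extra cast kernel and the temporary float64 buffer, both of which would be avoided if the aggregation produced the requested dtype directly.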

Labels: Python, cudf-polars, feature request, libcudf