Round up small-type groupby outputs to 4-byte boundary #20455

PointKernel · 2025-10-31T23:06:59Z

Description

A temporary workaround for NVIDIA/cccl#6430.

This PR updates the groupby logic to round up output buffer sizes to a 4-byte boundary when the column data type is smaller than 4 bytes. This prevents false-positive compute-sanitizer memcheck failures caused by CCCL, which emulates 1B and 2B atomic operations with 4B CAS loops.
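To illustrate the mechanism, here is a host-side C++ model (not CCCL's actual implementation) of emulating a 1-byte atomic add with a 4-byte compare-and-swap loop on the aligned word that contains the target byte. `emulated_fetch_add_u8` and `last_byte_touched` are hypothetical names. The second helper assumes a 4-byte-aligned buffer base and shows that, when the element count is not a multiple of 4, the word enclosing the last element extends past the end of an unpadded allocation. That is assumed here to be the kind of access memcheck reports.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a byte array stored as 4-byte words. A 1-byte atomic add
// on byte `byte_index` is emulated by a CAS loop on the enclosing 4-byte word,
// so the whole word (including the three neighboring bytes) is read and written.
std::uint8_t emulated_fetch_add_u8(std::vector<std::atomic<std::uint32_t>>& words,
                                   std::size_t byte_index,
                                   std::uint8_t value)
{
  assert(byte_index / 4 < words.size());
  auto& word               = words[byte_index / 4];
  auto const shift         = 8u * static_cast<unsigned>(byte_index % 4);
  std::uint32_t const mask = 0xFFu << shift;

  std::uint32_t expected = word.load();
  std::uint32_t desired  = 0;
  do {
    // Update only the target byte; the other bytes are written back unchanged.
    auto const old_byte = static_cast<std::uint8_t>((expected & mask) >> shift);
    auto const new_byte = static_cast<std::uint8_t>(old_byte + value);
    desired = (expected & ~mask) | (static_cast<std::uint32_t>(new_byte) << shift);
  } while (!word.compare_exchange_weak(expected, desired));  // reloads `expected` on failure

  return static_cast<std::uint8_t>((expected & mask) >> shift);  // byte value before the add
}

// Last byte offset accessed by the CAS loop when updating the final element of a
// buffer holding `count` elements of `elem_size` bytes (elem_size 1 or 2).
constexpr std::size_t last_byte_touched(std::size_t count, std::size_t elem_size)
{
  std::size_t const last_elem_offset = (count - 1) * elem_size;
  return (last_elem_offset / 4) * 4 + 3;  // end of the enclosing 4-byte word
}
```

With 5 one-byte elements the allocation covers bytes 0 to 4, but `last_byte_touched(5, 1)` is 7. Rounding the element count up to a multiple of 4 makes the allocation a multiple of 4 bytes, so the enclosing word of the last element stays inside the buffer.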

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.



davidwendt · 2025-10-31T23:16:43Z

cpp/src/groupby/hash/output_utils.cu

      auto const mask_flag = nullable ? mask_state::ALL_NULL : mask_state::UNALLOCATED;
      return make_fixed_width_column(
-        cudf::detail::target_type(col_type, agg), output_size, mask_flag, stream, mr);
+        cudf::detail::target_type(col_type, agg), adjusted_size, mask_flag, stream, mr);


Does this mean we change the number of rows in the output column?

Yeah, we should not do this. The size of the output results should exactly match the number of output keys. I think the only way to fix this issue is to modify the aggregation kernel instead, adding specialized code for output types smaller than 4 bytes.

Yeah, I did notice this issue, but all the C++ tests passed when I ran them locally, including the Compute Sanitizer checks. Let me take a closer look.

How about this instead? This will create the column manually instead of using the factory methods.

auto const col_type = is_dictionary(col.type())
                        ? dictionary_column_view(col).keys().type()
                        : col.type();
auto const nullable = agg != aggregation::COUNT_VALID && agg != aggregation::COUNT_ALL &&
                      col.has_nulls();

// For sub-4-byte types, pad the data buffer to a multiple of 4 elements while
// keeping the logical row count (and the null mask) sized to `size`.
auto const make_empty_column = [&](data_type dt, size_type size, mask_state state) {
  auto const type_size = cudf::size_of(dt);
  if (type_size < 4) {
    auto adjusted_size = cudf::util::round_up_safe(size, static_cast<size_type>(4));
    auto buffer        = rmm::device_buffer(adjusted_size * type_size, stream, mr);
    auto mask          = cudf::detail::create_null_mask(size, state, stream, mr);
    auto null_count    = state == mask_state::UNINITIALIZED ? 0 : state_null_count(state, size);
    return std::make_unique<column>(dt, size, std::move(buffer), std::move(mask), null_count);
  }
  return make_fixed_width_column(dt, size, state, stream, mr);
};

if (agg != aggregation::SUM_WITH_OVERFLOW) {
  auto const target_type = cudf::detail::target_type(col_type, agg);
  auto const mask_flag   = nullable ? mask_state::ALL_NULL : mask_state::UNALLOCATED;
  return make_empty_column(target_type, output_size, mask_flag);
}

// SUM_WITH_OVERFLOW: INT64 sum child plus BOOL8 overflow-flag child
auto make_children = [&make_empty_column](size_type size) {
  std::vector<std::unique_ptr<column>> children;
  children.push_back(
    make_empty_column(data_type{type_id::INT64}, size, mask_state::UNALLOCATED));
  children.push_back(
    make_empty_column(data_type{type_id::BOOL8}, size, mask_state::UNALLOCATED));
  return children;
};
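The key property of the snippet above is that the column keeps `size` rows while its data buffer holds `round_up(size, 4) * type_size` bytes. The host-only C++ sketch below models just that arithmetic. `round_up_checked`, `output_layout`, and `padded_output_layout` are hypothetical names, `size_type` is assumed to be a 32-bit signed integer like `cudf::size_type`, and the overflow handling is this sketch's own choice rather than a description of `cudf::util::round_up_safe`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

using size_type = std::int32_t;  // assumption: mirrors cudf::size_type

// Hypothetical checked round-up: rounds `value` up to a multiple of `modulus`,
// throwing instead of overflowing size_type.
size_type round_up_checked(size_type value, size_type modulus)
{
  assert(value >= 0 && modulus > 0);
  size_type const remainder = value % modulus;
  if (remainder == 0) { return value; }
  if (value > std::numeric_limits<size_type>::max() - (modulus - remainder)) {
    throw std::overflow_error("round-up exceeds size_type range");
  }
  return value + (modulus - remainder);
}

struct output_layout {
  size_type rows;            // logical size reported by the column
  std::size_t buffer_bytes;  // bytes allocated for the data buffer
};

// Only the allocation is padded for sub-4-byte types; the row count stays equal
// to the number of output keys.
output_layout padded_output_layout(size_type rows, std::size_t type_size)
{
  assert(rows >= 0 && type_size > 0);
  if (type_size >= 4) { return {rows, static_cast<std::size_t>(rows) * type_size}; }
  auto const padded_rows = round_up_checked(rows, 4);
  return {rows, static_cast<std::size_t>(padded_rows) * type_size};
}
```

For example, a 5-row BOOL8 result keeps `rows == 5` with an 8-byte buffer. By contrast, the earlier diff passed `adjusted_size` as the row count, so the same result would have reported 8 rows, which is the mismatch raised in review.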

This passed all the tests as well.

I tried using a buffer directly with cursor instead of the column factory but didn’t have any luck. Thanks to @davidwendt for showing how to get it working.

Aminsed · 2025-11-02T16:56:27Z

@PointKernel Thanks for putting this workaround together! FYI I just opened NVIDIA/cccl#6442, which fixes the issue directly by making cuda::atomic_ref for sub-4-byte types use a byte-granular path. Once that lands and RAPIDS pulls the new CCCL, compute-sanitizer no longer flags the groupby run, so the padding workaround shouldn’t be necessary anymore. Thanks again for keeping things unblocked in the meantime!


ttnghia · 2025-11-03T17:35:17Z

Once that lands and RAPIDS pulls the new CCCL, compute-sanitizer no longer flags the groupby run, so the padding workaround shouldn’t be necessary anymore.

@PointKernel Should we just close this PR and wait instead?

PointKernel · 2025-11-03T17:40:05Z


The CCCL PR was closed with the conclusion that silencing the warning, rather than dispatching on smaller types, is the right approach, so the proper fix may take longer than expected. IMO, we should make this work in the meantime, but I’m open to suggestions.


PointKernel · 2025-11-04T00:59:35Z

/merge

PointKernel added 2 commits October 31, 2025 16:00

Fix the groupby memcheck failure with 1B and 2B cols

1852856

Merge remote-tracking branch 'upstream/main' into fix/groupby-atomic-memcheck

817802b

PointKernel self-assigned this Oct 31, 2025

PointKernel requested a review from a team as a code owner October 31, 2025 23:07

PointKernel added the libcudf and improvement labels Oct 31, 2025

PointKernel requested review from karthikeyann and mythrocks October 31, 2025 23:07

PointKernel added the non-breaking and bug labels and removed the improvement label Oct 31, 2025

davidwendt reviewed Oct 31, 2025

cpp/src/groupby/hash/output_utils.cu (outdated, resolved)

davidwendt reviewed Oct 31, 2025

PointKernel added 2 commits October 31, 2025 16:22

Update comments

3b7e4e5

Cleanups

143c9d3

PointKernel added 2 commits November 3, 2025 09:10

Merge remote-tracking branch 'upstream/main' into fix/groupby-atomic-memcheck

f9e0ee4

Create groupby output column using adjust size buffer

f178623

PointKernel changed the title from "Align groupby output buffers to 4 bytes for small data types" to "Round up small-type groupby outputs to 4-byte boundary" Nov 3, 2025

davidwendt approved these changes Nov 3, 2025

PointKernel requested a review from ttnghia November 3, 2025 19:17

ttnghia reviewed Nov 3, 2025

cpp/src/groupby/hash/output_utils.cu (outdated, resolved)

ttnghia reviewed Nov 3, 2025

cpp/src/groupby/hash/output_utils.cu (outdated, resolved)

ttnghia mentioned this pull request Nov 3, 2025

Support signed integers and decimals in SUM_WITH_OVERFLOW groupby #19598 (merged)

PointKernel added 3 commits November 3, 2025 14:06

Merge remote-tracking branch 'upstream/main' into fix/groupby-atomic-memcheck

d03dd33

Renaming for clarity

a8232d0

Cleanup

db67f08

ttnghia approved these changes Nov 3, 2025

rapids-bot bot merged commit fae01c4 into rapidsai:main Nov 4, 2025
137 checks passed

PointKernel deleted the fix/groupby-atomic-memcheck branch November 4, 2025 00:59

4 participants