This sample is a simple program that illustrates binary-partitioned cooperative groups and reduction within a thread block.
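The idea can be sketched as follows. This is a minimal illustration, not the sample's actual source: the kernel name `oddEvenSketch`, the helper `runOddEvenSketch`, the result layout, and the fixed input size are all assumptions made for the example. Each 32-thread tile is split with `cg::binary_partition` into an "odd" and an "even" subgroup, and each subgroup sums its own values with `cg::reduce`.

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/reduce.h>
#include <cstdio>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical kernel. Assumed result layout:
// results[0] = count of odd values, results[1] = sum of odd values,
// results[2] = sum of even values.
// Assumes the launch covers the input exactly and blockDim.x is a multiple
// of 32, so every thread of each 32-thread tile takes part in the partition.
__global__ void oddEvenSketch(const int *input, int *results) {
  cg::thread_block block = cg::this_thread_block();
  cg::thread_block_tile<32> tile32 = cg::tiled_partition<32>(block);

  int elem = input[blockIdx.x * blockDim.x + threadIdx.x];
  bool isOdd = (elem & 1) != 0;

  // Split the tile into two subgroups according to the predicate.
  cg::coalesced_group part = cg::binary_partition(tile32, isOdd);

  // Each subgroup reduces only the values held by its own members.
  int partSum = cg::reduce(part, elem, cg::plus<int>());

  // One thread per subgroup publishes the partial result.
  if (part.thread_rank() == 0) {
    if (isOdd) {
      atomicAdd(&results[0], static_cast<int>(part.size()));
      atomicAdd(&results[1], partSum);
    } else {
      atomicAdd(&results[2], partSum);
    }
  }
}

// Hypothetical host helper: runs the kernel on the values 0..255.
bool runOddEvenSketch(int results[3]) {
  const int n = 256;               // multiple of the block size below
  const int threadsPerBlock = 128; // multiple of 32
  int hInput[n];
  for (int i = 0; i < n; ++i) hInput[i] = i;

  int *dInput = nullptr, *dResults = nullptr;
  if (cudaMalloc(&dInput, n * sizeof(int)) != cudaSuccess) return false;
  if (cudaMalloc(&dResults, 3 * sizeof(int)) != cudaSuccess) {
    cudaFree(dInput);
    return false;
  }
  cudaMemcpy(dInput, hInput, n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(dResults, 0, 3 * sizeof(int));

  oddEvenSketch<<<n / threadsPerBlock, threadsPerBlock>>>(dInput, dResults);

  // This blocking copy also waits for the kernel to finish.
  cudaError_t err = cudaMemcpy(results, dResults, 3 * sizeof(int),
                               cudaMemcpyDeviceToHost);
  cudaFree(dInput);
  cudaFree(dResults);
  return err == cudaSuccess;
}

int main() {
  int results[3] = {0, 0, 0};
  if (!runOddEvenSketch(results)) {
    fprintf(stderr, "CUDA error while running the sketch\n");
    return 1;
  }
  printf("odd count = %d, odd sum = %d, even sum = %d\n",
         results[0], results[1], results[2]);
  return 0;
}
```

For the inputs 0..255 used here, half of the values are odd, so the expected results are 128 odd values, an odd sum of 16384, and an even sum of 16256. A kernel that loops over a larger array would also need to make sure that all threads of a tile reach `cg::binary_partition` together.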
Key concepts: Cooperative Groups
Supported SM architectures: SM 5.0, SM 5.2, SM 5.3, SM 6.0, SM 6.1, SM 7.0, SM 7.2, SM 7.5, SM 8.0, SM 8.6, SM 8.7, SM 8.9, SM 9.0
Supported OSes: Linux, Windows
Supported CPU architectures: x86_64, armv7l
CUDA APIs involved: cudaStreamCreateWithFlags, cudaFree, cudaMallocHost, cudaFreeHost, cudaStreamSynchronize, cudaMalloc, cudaMemsetAsync, cudaMemcpyAsync, cudaOccupancyMaxPotentialBlockSize
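As a rough guide to how these runtime calls usually fit together, here is a hedged sketch of a host-side flow. It is not the sample's code: the kernel `countOddsSketch`, the helper `runHostFlowSketch`, and the `CHECK` macro are hypothetical names, and the kernel is a plain grid-stride loop so that the launch configuration suggested by `cudaOccupancyMaxPotentialBlockSize` can be used directly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error-check macro for this sketch. It returns early on
// failure without releasing resources, which is acceptable only in a demo.
#define CHECK(call)                                                \
  do {                                                             \
    cudaError_t e_ = (call);                                       \
    if (e_ != cudaSuccess) {                                       \
      fprintf(stderr, "%s failed: %s\n", #call,                    \
              cudaGetErrorString(e_));                             \
      return false;                                                \
    }                                                              \
  } while (0)

// Hypothetical kernel: counts odd values using a grid-stride loop.
__global__ void countOddsSketch(const int *input, int *oddCount, int n) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    if (input[i] & 1) atomicAdd(oddCount, 1);
  }
}

// Hypothetical helper: fills input with 0..n-1 and counts the odd values.
bool runHostFlowSketch(int n, int *oddCountOut) {
  cudaStream_t stream;
  CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

  // Pinned host memory lets the async copies overlap with other work.
  int *hInput = nullptr, *hCount = nullptr;
  CHECK(cudaMallocHost(&hInput, n * sizeof(int)));
  CHECK(cudaMallocHost(&hCount, sizeof(int)));
  for (int i = 0; i < n; ++i) hInput[i] = i;

  int *dInput = nullptr, *dCount = nullptr;
  CHECK(cudaMalloc(&dInput, n * sizeof(int)));
  CHECK(cudaMalloc(&dCount, sizeof(int)));

  // All transfers and the kernel are queued on the same stream, in order.
  CHECK(cudaMemcpyAsync(dInput, hInput, n * sizeof(int),
                        cudaMemcpyHostToDevice, stream));
  CHECK(cudaMemsetAsync(dCount, 0, sizeof(int), stream));

  // Ask the runtime for a block size that maximizes occupancy.
  int minGridSize = 0, blockSize = 0;
  CHECK(cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                           countOddsSketch, 0, 0));

  countOddsSketch<<<minGridSize, blockSize, 0, stream>>>(dInput, dCount, n);
  CHECK(cudaGetLastError());

  CHECK(cudaMemcpyAsync(hCount, dCount, sizeof(int),
                        cudaMemcpyDeviceToHost, stream));
  CHECK(cudaStreamSynchronize(stream)); // wait before reading hCount
  *oddCountOut = *hCount;

  CHECK(cudaFree(dInput));
  CHECK(cudaFree(dCount));
  CHECK(cudaFreeHost(hInput));
  CHECK(cudaFreeHost(hCount));
  CHECK(cudaStreamDestroy(stream));
  return true;
}

int main() {
  int oddCount = -1;
  if (!runHostFlowSketch(1024, &oddCount)) return 1;
  printf("odd values among 0..1023: %d\n", oddCount);
  return 0;
}
```

With the input 0..1023, half of the values are odd, so the count should be 512. The non-blocking stream flag is one reasonable choice for the sketch; the default flag would work as well.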
Download and install the CUDA Toolkit 12.5 for your platform.