Blackwell cluster-level work-stealing #3908

Closed
gonzalobg wants to merge 10 commits into NVIDIA:main from gonzalobg:try_cancel_cluster

@gonzalobg
Contributor

@gonzalobg gonzalobg commented Feb 22, 2025

Description

This PR extends the work-stealing APIs with for_each_canceled_cluster to enable cluster-level work stealing (see work-stealing tracking issue: #3870).

It improves the example to show prologue and epilogue reuse, and moves the example to the examples directory so that it can be tested.

We should probably explore whether we actually need two APIs, or whether for_each_canceled_block could detect a cluster internally and switch to cluster-level work stealing on its own. Exploring this is tracked in #3870.
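For readers unfamiliar with the pattern, here is a minimal host-side sketch of what "prologue and epilogue reuse" means in a work-stealing loop. It is a plain C++ simulation, not the API from this PR and not the Blackwell hardware mechanism: `pending_blocks`, `try_cancel`, `for_each_canceled`, and `run_worker` are hypothetical names chosen for illustration, and an atomic counter stands in for the set of blocks that have not started yet.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the queue of blocks that have not started yet.
class pending_blocks {
public:
  pending_blocks(int first_pending, int total)
      : next_(first_pending), total_(total) {}

  // Tries to claim ("cancel") one pending block.
  // Returns its index, or -1 once no pending block is left.
  int try_cancel() {
    const int idx = next_.fetch_add(1, std::memory_order_relaxed);
    return idx < total_ ? idx : -1;
  }

  int total() const { return total_; }

private:
  std::atomic<int> next_;
  int total_;
};

// Runs `body` for the worker's own index, then for every index it steals.
template <class Body>
void for_each_canceled(pending_blocks& queue, int own_index, Body body) {
  for (int idx = own_index; idx >= 0; idx = queue.try_cancel()) {
    assert(idx < queue.total());
    body(idx);
  }
}

struct worker_stats {
  int prologues;
  int epilogues;
  int blocks_done;
};

// One worker: the prologue and epilogue run once, the body runs per block.
inline worker_stats run_worker(pending_blocks& queue, int own_index,
                               std::vector<int>& out) {
  worker_stats stats{0, 0, 0};
  ++stats.prologues; // one-time setup shared by all stolen blocks
  for_each_canceled(queue, own_index, [&](int idx) {
    out[static_cast<std::size_t>(idx)] = 2 * idx; // per-block work
    ++stats.blocks_done;
  });
  ++stats.epilogues; // one-time teardown
  return stats;
}

} // namespace sketch
```

Calling `run_worker(queue, 0, out)` on a queue constructed as `pending_blocks(1, 8)` processes all eight indices while paying for setup and teardown only once, which is the saving the example in this PR is meant to demonstrate.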

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@gonzalobg gonzalobg requested review from a team as code owners February 22, 2025 13:14
@copy-pr-bot
Contributor

copy-pr-bot bot commented Feb 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@gonzalobg
Contributor Author

pre-commit.ci autofix

@bernhardmgruber
Contributor

pre-commit.ci autofix

@bernhardmgruber
Contributor

/ok to test

@github-actions
Contributor

🟨 CI finished in 1h 08m: Pass: 75%/158 | Total: 22h 42m | Avg: 8m 37s | Max: 40m 15s | Hits: 95%/151073
  • 🟨 libcudacxx: Pass: 11%/43 | Total: 4h 58m | Avg: 6m 56s | Max: 25m 21s | Hits: 97%/5880

    🔍 sm: 90 🔍
      🟩 75                 Pass: 100%/2   | Total: 30m 25s | Avg: 15m 12s | Max: 15m 47s | Hits:  90%/40    
      🔍 90                 Pass:  50%/2   | Total: 17m 36s | Avg:  8m 48s | Max: 13m 17s | Hits:  99%/2920  
      🟩 90;90a;100         Pass: 100%/1   | Total: 15m 45s | Avg: 15m 45s | Max: 15m 45s | Hits:  95%/2920  
    🟨 jobs
      🟨 Build              Pass:   5%/37  | Total:  4h 12m | Avg:  6m 49s | Max: 25m 21s | Hits:  97%/5840  
      🟩 NVRTC              Pass: 100%/2   | Total: 30m 25s | Avg: 15m 12s | Max: 15m 47s | Hits:  90%/40    
      🟥 Test               Pass:   0%/3   | Total: 13m 17s | Avg:  4m 25s | Max: 13m 17s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
    🟨 cpu
      🟨 amd64              Pass:  12%/41  | Total:  4h 50m | Avg:  7m 05s | Max: 25m 21s | Hits:  97%/5880  
      🟥 arm64              Pass:   0%/2   | Total:  8m 00s | Avg:  4m 00s | Max:  4m 11s
    🟨 ctk
      🟥 12.0               Pass:   0%/5   | Total: 37m 01s | Avg:  7m 24s | Max: 21m 59s
      🟥 12.5               Pass:   0%/2   | Total: 18m 19s | Avg:  9m 09s | Max:  9m 15s
      🟨 12.8               Pass:  13%/36  | Total:  4h 03m | Avg:  6m 45s | Max: 25m 21s | Hits:  97%/5880  
    🟨 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  4m 48s | Avg:  2m 24s | Max:  2m 29s
      🟥 nvcc12.0           Pass:   0%/5   | Total: 37m 01s | Avg:  7m 24s | Max: 21m 59s
      🟥 nvcc12.5           Pass:   0%/2   | Total: 18m 19s | Avg:  9m 09s | Max:  9m 15s
      🟨 nvcc12.8           Pass:  14%/34  | Total:  3h 58m | Avg:  7m 00s | Max: 25m 21s | Hits:  97%/5880  
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  4m 48s | Avg:  2m 24s | Max:  2m 29s
      🟨 nvcc               Pass:  12%/41  | Total:  4h 53m | Avg:  7m 10s | Max: 25m 21s | Hits:  97%/5880  
    🟨 cxx
      🟥 Clang14            Pass:   0%/4   | Total: 19m 26s | Avg:  4m 51s | Max:  6m 58s
      🟥 Clang15            Pass:   0%/2   | Total:  9m 27s | Avg:  4m 43s | Max:  4m 50s
      🟥 Clang16            Pass:   0%/2   | Total: 11m 58s | Avg:  5m 59s | Max:  7m 29s
      🟥 Clang17            Pass:   0%/2   | Total:  9m 18s | Avg:  4m 39s | Max:  4m 53s
      🟥 Clang18            Pass:   0%/6   | Total: 17m 48s | Avg:  2m 58s | Max:  4m 32s
      🟥 GCC7               Pass:   0%/2   | Total:  7m 26s | Avg:  3m 43s | Max:  3m 58s
      🟥 GCC8               Pass:   0%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
      🟥 GCC9               Pass:   0%/2   | Total: 11m 43s | Avg:  5m 51s | Max:  7m 51s
      🟥 GCC10              Pass:   0%/2   | Total:  8m 10s | Avg:  4m 05s | Max:  4m 13s
      🟥 GCC11              Pass:   0%/2   | Total:  8m 13s | Avg:  4m 06s | Max:  4m 18s
      🟥 GCC12              Pass:   0%/2   | Total:  8m 32s | Avg:  4m 16s | Max:  4m 24s
      🟨 GCC13              Pass:  50%/10  | Total:  1h 19m | Avg:  7m 55s | Max: 15m 47s | Hits:  97%/5880  
      🟥 MSVC14.29          Pass:   0%/2   | Total: 47m 20s | Avg: 23m 40s | Max: 25m 21s
      🟥 MSVC14.42          Pass:   0%/2   | Total: 37m 38s | Avg: 18m 49s | Max: 25m 06s
      🟥 NVHPC24.7          Pass:   0%/2   | Total: 18m 19s | Avg:  9m 09s | Max:  9m 15s
    🟨 cxx_family
      🟥 Clang              Pass:   0%/16  | Total:  1h 07m | Avg:  4m 14s | Max:  7m 29s
      🟨 GCC                Pass:  23%/21  | Total:  2h 07m | Avg:  6m 04s | Max: 15m 47s | Hits:  97%/5880  
      🟥 MSVC               Pass:   0%/4   | Total:  1h 24m | Avg: 21m 14s | Max: 25m 21s
      🟥 NVHPC              Pass:   0%/2   | Total: 18m 19s | Avg:  9m 09s | Max:  9m 15s
    🟨 gpu
      🟨 h100               Pass:  50%/2   | Total: 17m 36s | Avg:  8m 48s | Max: 13m 17s | Hits:  99%/2920  
      🟨 rtx2080            Pass:   9%/41  | Total:  4h 41m | Avg:  6m 51s | Max: 25m 21s | Hits:  95%/2960  
    🟨 std
      🟨 17                 Pass:   4%/21  | Total:  2h 47m | Avg:  7m 58s | Max: 25m 21s | Hits:  90%/20    
      🟨 20                 Pass:  14%/21  | Total:  2h 08m | Avg:  6m 08s | Max: 15m 45s | Hits:  97%/5860  
    
  • 🟩 cub: Pass: 100%/45 | Total: 8h 19m | Avg: 11m 06s | Max: 30m 42s | Hits: 93%/53485

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  8h 08m | Avg: 11m 22s | Max: 30m 42s | Hits:  92%/51055 
      🟩 arm64              Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 44s | Hits:  99%/2430  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 51m 42s | Avg: 10m 20s | Max: 29m 21s | Hits:  85%/5908  
      🟩 12.5               Pass: 100%/2   | Total: 20m 06s | Avg: 10m 03s | Max: 10m 25s | Hits:  98%/2248  
      🟩 12.8               Pass: 100%/38  | Total:  7h 08m | Avg: 11m 15s | Max: 30m 42s | Hits:  94%/45329 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 07s | Hits: 100%/2100  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 51m 42s | Avg: 10m 20s | Max: 29m 21s | Hits:  85%/5908  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 20m 06s | Avg: 10m 03s | Max: 10m 25s | Hits:  98%/2248  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  6h 58m | Avg: 11m 36s | Max: 30m 42s | Hits:  93%/43229 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 07s | Hits: 100%/2100  
      🟩 nvcc               Pass: 100%/43  | Total:  8h 09m | Avg: 11m 23s | Max: 30m 42s | Hits:  92%/51385 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 09s | Avg:  5m 47s | Max:  6m 19s | Hits: 100%/4868  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  6m 23s | Hits: 100%/2430  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 17s | Avg:  6m 08s | Max:  6m 10s | Hits: 100%/2430  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 29s | Avg:  6m 14s | Max:  6m 22s | Hits: 100%/2430  
      🟩 Clang18            Pass: 100%/7   | Total:  1h 11m | Avg: 10m 11s | Max: 22m 33s | Hits: 100%/8175  
      🟩 GCC7               Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  6m 22s | Hits:  99%/2434  
      🟩 GCC8               Pass: 100%/1   | Total:  6m 22s | Avg:  6m 22s | Max:  6m 22s | Hits:  99%/1217  
      🟩 GCC9               Pass: 100%/2   | Total: 12m 13s | Avg:  6m 06s | Max:  6m 18s | Hits:  99%/2434  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 40s | Avg:  6m 20s | Max:  6m 21s | Hits:  99%/2434  
      🟩 GCC11              Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max:  6m 27s | Hits:  99%/2430  
      🟩 GCC12              Pass: 100%/2   | Total: 13m 15s | Avg:  6m 37s | Max:  6m 47s | Hits:  99%/2430  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 40m | Avg: 14m 37s | Max: 23m 56s | Hits:  99%/13365 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 58m 46s | Avg: 29m 23s | Max: 29m 25s | Hits:  15%/2080  
      🟩 MSVC14.42          Pass: 100%/2   | Total: 59m 20s | Avg: 29m 40s | Max: 30m 42s | Hits:  15%/2080  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 20m 06s | Avg: 10m 03s | Max: 10m 25s | Hits:  98%/2248  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 11m | Avg:  7m 45s | Max: 22m 33s | Hits: 100%/20333 
      🟩 GCC                Pass: 100%/22  | Total:  3h 49m | Avg: 10m 26s | Max: 23m 56s | Hits:  99%/26744 
      🟩 MSVC               Pass: 100%/4   | Total:  1h 58m | Avg: 29m 31s | Max: 30m 42s | Hits:  15%/4160  
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 06s | Avg: 10m 03s | Max: 10m 25s | Hits:  98%/2248  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 51m 48s | Avg: 17m 16s | Max: 23m 56s | Hits:  99%/3645  
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 09m | Avg:  9m 05s | Max: 30m 42s | Hits:  91%/40120 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 18m | Avg: 17m 21s | Max: 23m 16s | Hits:  99%/9720  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 27m | Avg:  8m 50s | Max: 30m 42s | Hits:  91%/43765 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 47s | Avg: 20m 47s | Max: 20m 47s | Hits:  99%/1215  
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 41s | Avg: 17m 41s | Max: 17m 41s | Hits:  99%/1215  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 09m | Avg: 23m 15s | Max: 23m 56s | Hits:  99%/3645  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 28s | Max: 22m 55s | Hits:  99%/3645  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 51m 48s | Avg: 17m 16s | Max: 23m 56s | Hits:  99%/3645  
      🟩 90;90a;100         Pass: 100%/1   | Total:  7m 14s | Avg:  7m 14s | Max:  7m 14s | Hits:  99%/1215  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 15m | Avg:  9m 45s | Max: 29m 25s | Hits:  88%/23535 
      🟩 20                 Pass: 100%/25  | Total:  5h 04m | Avg: 12m 11s | Max: 30m 42s | Hits:  96%/29950 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 6h 31m | Avg: 8m 42s | Max: 33m 31s | Hits: 96%/80136

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 56s | Avg:  8m 28s | Max: 11m 02s | Hits:  99%/3564  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  6h 22m | Avg:  8m 53s | Max: 33m 31s | Hits:  96%/76573 
      🟩 arm64              Pass: 100%/2   | Total:  9m 30s | Avg:  4m 45s | Max:  5m 03s | Hits:  99%/3563  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 41m 40s | Avg:  8m 20s | Max: 22m 03s | Hits:  94%/8901  
      🟩 12.5               Pass: 100%/2   | Total: 26m 35s | Avg: 13m 17s | Max: 13m 34s | Hits:  99%/3562  
      🟩 12.8               Pass: 100%/38  | Total:  5h 23m | Avg:  8m 30s | Max: 33m 31s | Hits:  96%/67673 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  5m 16s | Hits: 100%/3562  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 41m 40s | Avg:  8m 20s | Max: 22m 03s | Hits:  94%/8901  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 26m 35s | Avg: 13m 17s | Max: 13m 34s | Hits:  99%/3562  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  5h 13m | Avg:  8m 42s | Max: 33m 31s | Hits:  96%/64111 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  5m 16s | Hits: 100%/3562  
      🟩 nvcc               Pass: 100%/43  | Total:  6h 21m | Avg:  8m 52s | Max: 33m 31s | Hits:  96%/76574 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 03s | Avg:  5m 00s | Max:  5m 17s | Hits: 100%/7124  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 01s | Avg:  5m 30s | Max:  5m 42s | Hits: 100%/3562  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 55s | Hits: 100%/3562  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  6m 15s | Hits: 100%/3562  
      🟩 Clang18            Pass: 100%/7   | Total: 43m 20s | Avg:  6m 11s | Max: 10m 10s | Hits: 100%/12467 
      🟩 GCC7               Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  5m 25s | Hits:  99%/3564  
      🟩 GCC8               Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s | Hits:  99%/1782  
      🟩 GCC9               Pass: 100%/2   | Total: 10m 47s | Avg:  5m 23s | Max:  5m 44s | Hits:  99%/3564  
      🟩 GCC10              Pass: 100%/2   | Total: 10m 46s | Avg:  5m 23s | Max:  5m 28s | Hits:  99%/3564  
      🟩 GCC11              Pass: 100%/2   | Total: 11m 43s | Avg:  5m 51s | Max:  5m 54s | Hits:  99%/3564  
      🟩 GCC12              Pass: 100%/2   | Total: 11m 40s | Avg:  5m 50s | Max:  6m 12s | Hits:  99%/3564  
      🟩 GCC13              Pass: 100%/10  | Total:  1h 17m | Avg:  7m 43s | Max: 11m 23s | Hits:  99%/17820 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 45m 52s | Avg: 22m 56s | Max: 23m 49s | Hits:  70%/3550  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  1h 23m | Avg: 27m 54s | Max: 33m 31s | Hits:  70%/5325  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 26m 35s | Avg: 13m 17s | Max: 13m 34s | Hits:  99%/3562  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 37m | Avg:  5m 42s | Max: 10m 10s | Hits: 100%/30277 
      🟩 GCC                Pass: 100%/21  | Total:  2h 18m | Avg:  6m 35s | Max: 11m 23s | Hits:  99%/37422 
      🟩 MSVC               Pass: 100%/5   | Total:  2h 09m | Avg: 25m 54s | Max: 33m 31s | Hits:  70%/8875  
      🟩 NVHPC              Pass: 100%/2   | Total: 26m 35s | Avg: 13m 17s | Max: 13m 34s | Hits:  99%/3562  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 04s | Avg:  8m 02s | Max: 11m 23s | Hits:  99%/3564  
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 09m | Avg:  7m 32s | Max: 23m 49s | Hits:  97%/58769 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 06m | Avg: 12m 38s | Max: 33m 31s | Hits:  94%/17803 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  4h 57m | Avg:  7m 50s | Max: 26m 59s | Hits:  96%/67671 
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 53s | Avg: 16m 37s | Max: 33m 31s | Hits:  90%/5338  
      🟩 TestGPU            Pass: 100%/4   | Total: 43m 52s | Avg: 10m 58s | Max: 11m 23s | Hits:  99%/7127  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 04s | Avg:  8m 02s | Max: 11m 23s | Hits:  99%/3564  
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 18s | Avg:  6m 18s | Max:  6m 18s | Hits:  99%/1782  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  2h 51m | Avg:  8m 33s | Max: 23m 49s | Hits:  95%/35611 
      🟩 20                 Pass: 100%/23  | Total:  3h 23m | Avg:  8m 50s | Max: 33m 31s | Hits:  97%/40961 
    
  • 🟩 cudax: Pass: 100%/22 | Total: 1h 57m | Avg: 5m 20s | Max: 14m 07s | Hits: 97%/11264

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  1h 45m | Avg:  5m 50s | Max: 14m 07s | Hits:  97%/9036  
      🟩 arm64              Pass: 100%/4   | Total: 12m 12s | Avg:  3m 03s | Max:  3m 13s | Hits:  99%/2228  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 42s | Avg:  9m 42s | Max:  9m 42s | Hits:  61%/262   
      🟩 12.5               Pass: 100%/2   | Total: 10m 15s | Avg:  5m 07s | Max:  5m 12s | Hits:  96%/710   
      🟩 12.8               Pass: 100%/19  | Total:  1h 37m | Avg:  5m 07s | Max: 14m 07s | Hits:  98%/10292 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 42s | Avg:  9m 42s | Max:  9m 42s | Hits:  61%/262   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 15s | Avg:  5m 07s | Max:  5m 12s | Hits:  96%/710   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 37m | Avg:  5m 07s | Max: 14m 07s | Hits:  98%/10292 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  1h 57m | Avg:  5m 20s | Max: 14m 07s | Hits:  97%/11264 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s | Hits: 100%/559   
      🟩 Clang15            Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s | Hits: 100%/557   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s | Hits: 100%/557   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s | Hits: 100%/557   
      🟩 Clang18            Pass: 100%/4   | Total: 21m 32s | Avg:  5m 23s | Max: 11m 49s | Hits:  99%/2228  
      🟩 GCC10              Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s | Hits:  99%/559   
      🟩 GCC11              Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s | Hits:  99%/557   
      🟩 GCC12              Pass: 100%/2   | Total: 17m 25s | Avg:  8m 42s | Max: 14m 07s | Hits:  99%/1114  
      🟩 GCC13              Pass: 100%/6   | Total: 29m 37s | Avg:  4m 56s | Max: 13m 48s | Hits:  99%/3342  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 42s | Avg:  9m 42s | Max:  9m 42s | Hits:  61%/262   
      🟩 MSVC14.42          Pass: 100%/1   | Total:  8m 59s | Avg:  8m 59s | Max:  8m 59s | Hits:  61%/262   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 15s | Avg:  5m 07s | Max:  5m 12s | Hits:  96%/710   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 34m 54s | Avg:  4m 21s | Max: 11m 49s | Hits:  99%/4458  
      🟩 GCC                Pass: 100%/10  | Total: 53m 35s | Avg:  5m 21s | Max: 14m 07s | Hits:  99%/5572  
      🟩 MSVC               Pass: 100%/2   | Total: 18m 41s | Avg:  9m 20s | Max:  9m 42s | Hits:  61%/524   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 15s | Avg:  5m 07s | Max:  5m 12s | Hits:  96%/710   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 18s | Avg:  8m 39s | Max: 13m 48s | Hits:  98%/1114  
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 40m | Avg:  5m 00s | Max: 14m 07s | Hits:  97%/10150 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 17m | Avg:  4m 05s | Max:  9m 42s | Hits:  97%/9593  
      🟩 Test               Pass: 100%/3   | Total: 39m 44s | Avg: 13m 14s | Max: 14m 07s | Hits:  99%/1671  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 20m 52s | Avg:  6m 57s | Max: 13m 48s | Hits:  98%/1671  
      🟩 90a                Pass: 100%/1   | Total:  2m 58s | Avg:  2m 58s | Max:  2m 58s | Hits:  99%/557   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 14m 43s | Avg:  3m 40s | Max:  5m 03s | Hits:  98%/2026  
      🟩 20                 Pass: 100%/18  | Total:  1h 42m | Avg:  5m 42s | Max: 14m 07s | Hits:  97%/9238  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 15m 09s | Avg: 7m 34s | Max: 12m 46s | Hits: 98%/308

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 15m 09s | Avg:  7m 34s | Max: 12m 46s | Hits:  98%/308   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 15m 09s | Avg:  7m 34s | Max: 12m 46s | Hits:  98%/308   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 15m 09s | Avg:  7m 34s | Max: 12m 46s | Hits:  98%/308   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 15m 09s | Avg:  7m 34s | Max: 12m 46s | Hits:  98%/308   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 15m 09s | Avg:  7m 34s | Max: 12m 46s | Hits:  98%/308   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 15m 09s | Avg:  7m 34s | Max: 12m 46s | Hits:  98%/308   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 09s | Avg:  7m 34s | Max: 12m 46s | Hits:  98%/308   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 23s | Avg:  2m 23s | Max:  2m 23s | Hits:  98%/154   
      🟩 Test               Pass: 100%/1   | Total: 12m 46s | Avg: 12m 46s | Max: 12m 46s | Hits:  98%/154   
    
  • 🟩 python: Pass: 100%/1 | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 40m 15s | Avg: 40m 15s | Max: 40m 15s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 158)

# Runner
111 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
5 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

#include <cuda/std/__utility/unreachable.h>
#include <cuda/std/cstdint>

#include <cooperative_groups.h>
Contributor

Cooperative groups does not currently work with nvc++, so this include needs to be guarded, and we need to make sure the code still builds without it.
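A guard along these lines might address the comment. This is only a sketch: `EXAMPLE_HAS_COOPERATIVE_GROUPS` and the `example` namespace are hypothetical names, not CCCL macros, and it assumes that checking the `__NVCOMPILER` macro predefined by the NVHPC compilers is an acceptable way to detect nvc++ here. The additional `__CUDACC__` check keeps the header out of host-only translation units.

```cpp
// Hypothetical feature macro: 1 only when the header can be included.
#if defined(__CUDACC__) && !defined(__NVCOMPILER)
#  include <cooperative_groups.h>
#  define EXAMPLE_HAS_COOPERATIVE_GROUPS 1
#else
#  define EXAMPLE_HAS_COOPERATIVE_GROUPS 0
#endif

namespace example {

// Lets callers branch on availability without touching the preprocessor.
constexpr bool has_cooperative_groups() {
  return EXAMPLE_HAS_COOPERATIVE_GROUPS != 0;
}

#if EXAMPLE_HAS_COOPERATIVE_GROUPS
// Only declared when the header is present; code using it needs the same guard.
__device__ inline unsigned thread_rank_in_block() {
  namespace cg = cooperative_groups;
  return static_cast<unsigned>(cg::this_thread_block().thread_rank());
}
#endif

} // namespace example
```

Any API that depends on cooperative groups would then sit behind the same macro, so a build with nvc++ simply does not see it.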

@jrhemstad
Collaborator

@gonzalobg are we still interested in carrying this forward?

@jrhemstad jrhemstad added the "needs info" label (Cannot make progress without more information.) Feb 4, 2026
@gonzalobg
Contributor Author

Yes, but I don't know when I can get to it.
Let me close this for now.

@gonzalobg gonzalobg closed this Feb 4, 2026
@github-project-automation github-project-automation bot moved this from In Progress to Done in CCCL Feb 4, 2026