Blackwell compile error: “Register allocation failed with register count of ‘7’” — where should I report? #7071
-
Hi! I’m not sure whether this belongs in the CCCL forum, but I’m looking for guidance on the right place to ask. I have CUDA code that compiles fine for Ampere and Ada but fails when targeting Blackwell with: (C7600) Register allocation failed with register count of '7'. Compile the program with a higher register target. This doesn’t look like genuine register pressure: the same code uses fewer than 64 registers on Ampere/Ada, and even -maxrregcount=128 doesn’t help.
I can share a repro and the exact nvcc command line if that helps.
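A hypothetical sketch of how such a repro could be packaged (not the author's actual script). It builds, without running, the commands for a two-step build in which nvcc emits PTX once and ptxas then assembles that same PTX for each target, which isolates whether the failure lives in ptxas. The file name kernel.cu and the use of compute_89 as the PTX virtual architecture are assumptions.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def ptx_then_ptxas_commands(src: str,
                            ptx_arch: str = "compute_89",
                            targets: Sequence[str] = ("sm_89", "sm_120"),
                            ptxas_flags: Sequence[str] = ()) -> List[List[str]]:
    """Build (but do not run) the commands for a two-step repro.

    1. nvcc emits PTX once for a virtual arch (ptx_arch is an assumption).
    2. ptxas assembles that same PTX for each real target; -v makes ptxas
       print per-kernel register usage when assembly succeeds.
    """
    stem = Path(src).with_suffix("")
    ptx = f"{stem}.ptx"
    cmds = [["nvcc", f"-arch={ptx_arch}", "-ptx", src, "-o", ptx]]
    for sm in targets:
        cmds.append(["ptxas", f"-arch={sm}", "-v", *ptxas_flags,
                     ptx, "-o", f"{stem}_{sm}.cubin"])
    return cmds


if __name__ == "__main__":
    # "kernel.cu" is a hypothetical file name; print a shell-ready script.
    for cmd in ptx_then_ptxas_commands("kernel.cu"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the repro easy to paste into an issue, and the -v output makes it simple to compare register counts between the sm_89 and sm_120 runs.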
Replies: 2 comments
-
I narrowed this down further, and it looks like a ptxas optimization issue on sm_120 rather than real register pressure. If I compile for sm_89, everything is fine. If I take the same generated PTX and assemble it directly:
This happens even though the PTX does not use .maxnreg. Workaround experiments on sm_120:
So for sm_120, it appears that only -O0 or -Ofast-compile=max avoid the failure; every other setting reliably triggers it, with the suspicious “register count of 7”.
ptxas: NVIDIA (R) Ptx optimizing assembler
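To make flag experiments like these reproducible, a sketch such as the following could assemble one PTX file under several ptxas settings and pull the register count out of any C7600 message. The flag list is a hypothetical reconstruction of the experiments described above (spelled as in the post), not the author's exact set, and the script assumes ptxas is on PATH.

```python
import os
import re
import subprocess
import sys
from typing import Dict, List, Optional

# Extracts N from the C7600 message quoted in this thread, e.g.
# "Register allocation failed with register count of '7'".
# Straight or curly quotes around the number are both accepted.
_REGALLOC_RE = re.compile(
    r"register allocation failed with register count of ['\u2018\u2019]?(\d+)",
    re.IGNORECASE,
)


def failed_register_count(ptxas_output: str) -> Optional[int]:
    """Return N from a C7600 register-allocation failure, or None if absent."""
    m = _REGALLOC_RE.search(ptxas_output)
    return int(m.group(1)) if m else None


# Hypothetical sweep modeled on the experiments in the post; the author's
# exact flag set is not known. Flags are spelled as in the post.
FLAG_SETS: List[List[str]] = [
    ["-O0"], ["-O1"], ["-O2"], ["-O3"], ["-Ofast-compile=max"],
]


def sweep_ptxas(ptx: str, arch: str = "sm_120") -> Dict[str, str]:
    """Assemble one PTX file under each flag set and summarize the outcome.

    Requires ptxas on PATH. The cubin is discarded (os.devnull) because only
    the exit status and diagnostics matter here.
    """
    results: Dict[str, str] = {}
    for flags in FLAG_SETS:
        cmd = ["ptxas", f"-arch={arch}", *flags, ptx, "-o", os.devnull]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        key = " ".join(flags)
        if proc.returncode == 0:
            results[key] = "ok"
        else:
            n = failed_register_count(proc.stdout + proc.stderr)
            results[key] = (f"C7600, register count {n}"
                            if n is not None else "failed (other error)")
    return results


if __name__ == "__main__":
    # Usage: python sweep.py kernel.ptx [arch]
    if len(sys.argv) > 1:
        arch = sys.argv[2] if len(sys.argv) > 2 else "sm_120"
        for flags, outcome in sweep_ptxas(sys.argv[1], arch).items():
            print(f"{flags:>20}: {outcome}")
```

Separating the regex from the subprocess call keeps the parsing testable without a CUDA toolchain, and reporting the extracted count makes it easy to confirm whether every failing setting hits the same suspicious value.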
-
See here for reporting non-CCCL bugs.