@polarathene

What does this PR do?

Pruning cuBLAS for CC 7.5 now also retains sm_70 in addition to the sm_75 target. See #610 (comment) for more information.

polarathene commented Jun 13, 2025

NOTE: There is no known need to do this for TEI. However, Nvidia encourages retaining the major CC and any minor CCs in between when using nvprune on cuBLAS.


Feel free to close the PR if you'd prefer to hold off until there's a relevant bug report. My understanding is that it should only be an issue when using a cuBLAS kernel that defers to the sm_70 cubin because a separate sm_75 build would have been equivalent.

For example, in the base image currently used for the build, sm_70 has 184 cubins while sm_75 has only 8:

$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_70.*\.' | wc -l
184

$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_75.*\.' | wc -l
8

# Individual cubins:
$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -E '\.sm_75.*\.'
ELF file    5: libcublas_static.5.sm_75.cubin
ELF file   13: libcublas_static.13.sm_75.cubin
ELF file   21: libcublas_static.21.sm_75.cubin
ELF file   29: libcublas_static.29.sm_75.cubin
ELF file   37: libcublas_static.37.sm_75.cubin
ELF file   45: libcublas_static.45.sm_75.cubin
ELF file   53: libcublas_static.53.sm_75.cubin
ELF file   61: libcublas_static.61.sm_75.cubin
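
The per-arch counts above can also be gathered in one pass. This is only a rough sketch: `count_cubins_per_arch` is a hypothetical helper name (not part of the PR or the CUDA toolkit), and it assumes the `cuobjdump --list-elf` output keeps the `<name>.<index>.sm_XX.cubin` naming shown above.

```shell
# Hypothetical helper: reads `cuobjdump --list-elf` output on stdin and
# prints one "<count> sm_XX" line per architecture found.
count_cubins_per_arch() {
  grep -oE '\.sm_[0-9]+[a-z]?\.' \
    | tr -d '.' \
    | sort \
    | uniq -c
}

# Usage (requires the CUDA toolkit; path taken from the commands above):
#   cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | count_cubins_per_arch
```

Running it before and after `nvprune` would be one way to check which arch cubins were actually retained.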

I'm not entirely sure why retaining the minor CC versions in between (when present) might matter.


The concern does not apply to the other two supported real archs handled via nvprune: sm_80 is already retained for the CC 8.x targets, while sm_90 has nothing else to retain (since it's the only arch for that CC major):

nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
elif [ ${CUDA_COMPUTE_CAP} -eq 90 ]; \
then \
nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
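
To summarize the retention rule across all three cases, here is a minimal sketch of the mapping from compute capability to `nvprune` flags. `nvprune_flags_for_cc` is a hypothetical helper for illustration, not what the Dockerfile actually contains. The CC 7.5 case follows this PR's description (sm_70 plus sm_75), the other cases follow the branches quoted above, and collapsing CC 8.0 to a single sm_80 flag is an assumption made here to avoid a duplicate.

```shell
# Hypothetical helper: print the --generate-code flags nvprune would receive
# for a given compute capability (e.g. 75, 80, 86, 90).
nvprune_flags_for_cc() {
  cc="$1"
  case "$cc" in
    75) archs="70 75" ;;   # this PR: keep the major (sm_70) alongside sm_75
    80) archs="80" ;;      # assumption: avoid listing sm_80 twice
    8?) archs="80 $cc" ;;  # quoted branch: sm_80 plus the requested minor
    90) archs="90" ;;      # quoted branch: sm_90 only
    *)  echo "unsupported compute capability: $cc" >&2; return 1 ;;
  esac
  flags=""
  for arch in $archs; do
    flags="$flags --generate-code code=sm_$arch"
  done
  echo "${flags# }"        # drop the leading space
}

# Usage (requires the CUDA toolkit; path taken from the snippet above):
#   lib=/usr/local/cuda/lib64/libcublas_static.a
#   nvprune $(nvprune_flags_for_cc "${CUDA_COMPUTE_CAP}") "$lib" -o "$lib"
```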
