This repository was archived by the owner on Aug 15, 2025. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 230
Update sm90 to -> sm90a #1878
Closed
Closed
Update sm90 to -> sm90a #1878
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
pytorchmergebot
pushed a commit
to pytorch/pytorch
that referenced
this pull request
Jun 25, 2024
…129402) # Summary Instead of landing global builder changes; pytorch/builder#1878 This PR targets only the Rowwise file and adds the sm90a featurs. Verified locally by setting: ``` TORCH_CUDA_ARCH_LIST=9.0 ``` We can see in the build.ninja file that the proper flags are set: ``` build caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RowwiseScaledMM.cu.o: CUDA_COMPILER__torch_cuda_unscanned_Release /home/drisspg/meta/pytorch/aten/src/ATen/native/cuda/RowwiseScaledMM.cu || cmake_object_order_depends_target_torch_cuda DEFINES = -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS DEP_FILE = caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/RowwiseScaledMM.cu.o.d FLAGS = -DLIBCUDACXX_ENABLE_SIMPLIFIED_COMPLEX_OPERATIONS -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_90,code=sm_90 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -O3 -DNDEBUG -std=c++17 -Xcompiler=-fPIC -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Xcompiler=-Wall,-Wextra,-Wdeprecated,-Wno-unused-parameter,-Wno-missing-field-initializers,-Wno-unknown-pragmas,-Wno-type-limits,-Wno-array-bounds,-Wno-unknown-pragmas,-Wno-strict-overflow,-Wno-strict-aliasing,-Wno-unused-function,-Wno-maybe-uninitialized -Wno-deprecated-copy -gencode arch=compute_90a,code=sm_90a INCLUDES = -I/home/drisspg/meta/pytorch/build/aten/src -I/home/drisspg/meta/pytorch/aten/src -I/home/drisspg/meta/pytorch/build -I/home/drisspg/meta/pytorch -I/home/drisspg/meta/pytorch/third_party/onnx -I/home/drisspg/meta/pytorch/build/third_party/onnx -I/home/drisspg/meta/pytorch/third_party/foxi -I/home/drisspg/meta/pytorch/build/third_party/foxi -I/home/drisspg/meta/pytorch/aten/src/THC -I/home/drisspg/meta/pytorch/aten/src/ATen/cuda -I/home/drisspg/meta/pytorch/aten/src/ATen/../../../third_party/cutlass/include -I/home/drisspg/meta/pytorch/aten/src/ATen/../../../third_party/cutlass/tools/util/include -I/home/drisspg/meta/pytorch/build/caffe2/aten/src -I/home/drisspg/meta/pytorch/aten/src/ATen/.. -I/home/drisspg/meta/pytorch/build/nccl/include -I/home/drisspg/meta/pytorch/c10/cuda/../.. -I/home/drisspg/meta/pytorch/c10/.. -I/home/drisspg/meta/pytorch/third_party/tensorpipe -I/home/drisspg/meta/pytorch/build/third_party/tensorpipe -I/home/drisspg/meta/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/drisspg/meta/pytorch/torch/csrc/api -I/home/drisspg/meta/pytorch/torch/csrc/api/include -isystem /home/drisspg/meta/pytorch/build/third_party/gloo -isystem /home/drisspg/meta/pytorch/cmake/../third_party/gloo -isystem /home/drisspg/meta/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /home/drisspg/meta/pytorch/third_party/protobuf/src -isystem /home/drisspg/meta/pytorch/third_party/ittapi/include -isystem /home/drisspg/meta/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda-12.3/include -isystem /home/drisspg/meta/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /home/drisspg/meta/pytorch/third_party/ideep/include -isystem /home/drisspg/meta/pytorch/cmake/../third_party/cudnn_frontend/include OBJECT_DIR = caffe2/CMakeFiles/torch_cuda.dir OBJECT_FILE_DIR = caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda ``` Pull Request resolved: #129402 Approved by: https://github.com/malfet
Adding sm90a to the base docker will also help Torchbench docker remove the current workaround: pytorch/benchmark#2338 |
The concencus here from @ptrblck and @malfet was that we weren't comfortable with doing a full swap and instead, at least for pytorch there is only one translation unit that required sm90a: pytorch/pytorch#129402 |
Synced with @ptrblck we should not merge this one in favor of pytorch/pytorch#129402 |
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Sm90a is needed to enable some features found in Cutlass. For some reason google as the worst SEO for finding more information about this gencode. This is the best description I have found: NVIDIA/cccl#1270
I am not sure if we want to instead build for sm90 and sm90a ?
Pytorch Features requiring this change: pytorch/pytorch#128989
cc @ptrblck @nWEIdia