-
Notifications
You must be signed in to change notification settings - Fork 25
Add a custom gpu-module-to-binary pass.
#525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a custom gpu-module-to-binary pass.
#525
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a custom gpu-module-to-binary pass for Water/Wave that compiles GPU modules to HSACO binaries. The implementation is based on MLIR's upstream gpu-module-to-binary pass but streamlined to support only ROCDL/AMD targets. A key enhancement is the ability to dump and override intermediate compilation artifacts (LLVM IR, assembly) for debugging purposes.
Key changes:
- Implements
WaterGPUModuleToBinarypass that translates GPU modules to LLVM IR, optimizes them, compiles to ISA, and assembles to HSACO binaries - Adds
assembleISAToHSACOutility function that uses LLVM MC infrastructure and ld.lld to produce HSACO files - Provides dump-intermediates and override-intermediates options for debugging compilation stages
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| water/lib/Transforms/GPUModuleToBinary.cpp | Main pass implementation handling GPU module serialization through multiple stages (LLVM IR translation, linking, optimization, ISA compilation, and HSACO assembly) |
| water/lib/Transforms/AssembleISA.cpp | Assembly utilities for converting ISA to HSACO using LLVM MC infrastructure and ld.lld linker |
| water/lib/Transforms/AssembleISA.h | Header defining assembly interface and AMDGPU target initialization |
| water/include/water/Transforms/Passes.td | Pass definition with options for toolkit path, bitcode linking, and intermediate file dumping/overriding |
| water/lib/Transforms/CMakeLists.txt | Build configuration adding new source files and MLIRROCDLTarget library dependency |
| water/tools/water-opt/water-opt.cpp | Registration of LLVM conversion interfaces, GPU passes, and translation modules needed for the new pass |
| water/tools/water-opt/CMakeLists.txt | Added LLVM IR translation library dependencies |
| water/test/Transforms/gpu-module-to-binary.mlir | Basic test verifying GPU module to binary conversion |
| water/test/Transforms/gpu-module-to-binary-dump.mlir | Test validating intermediate file dumping functionality |
| water/test/Transforms/gpu-module-to-binary-override.mlir | Test demonstrating intermediate file override capability for debugging |
Comments suppressed due to low confidence (3)
water/lib/Transforms/CMakeLists.txt:29
- Missing library dependency:
GPUModuleToBinary.cppcallstranslateModuleToLLVMIRwhich requires theMLIRTargetLLVMIRExportlibrary. This should be added to theLINK_LIBS PUBLICsection to ensure proper linking.
LINK_LIBS PUBLIC
MLIRAnalysis
MLIRArithDialect
MLIRControlFlowDialect
MLIRFuncDialect
MLIRGPUDialect
MLIRIR
MLIRLLVMDialect
MLIRMemRefDialect
MLIRPass
MLIRROCDLTarget
MLIRRewrite
MLIRTransformUtils
MLIRVectorDialect
MLIRWaterAnalysis
water/lib/Transforms/CMakeLists.txt:29
- Missing library dependency:
GPUModuleToBinary.cppusesmakeOptimizingTransformerfrommlir/ExecutionEngine/OptUtils.h, which requires theMLIRExecutionEnginelibrary. This should be added to theLINK_LIBS PUBLICsection.
LINK_LIBS PUBLIC
MLIRAnalysis
MLIRArithDialect
MLIRControlFlowDialect
MLIRFuncDialect
MLIRGPUDialect
MLIRIR
MLIRLLVMDialect
MLIRMemRefDialect
MLIRPass
MLIRROCDLTarget
MLIRRewrite
MLIRTransformUtils
MLIRVectorDialect
MLIRWaterAnalysis
water/lib/Transforms/CMakeLists.txt:29
- Missing library dependency:
GPUModuleToBinary.cppusesROCDL::ROCDLTargetAttrandROCDL::getROCMPath()which require theMLIRROCDLDialectlibrary. This should be added to theLINK_LIBS PUBLICsection.
LINK_LIBS PUBLIC
MLIRAnalysis
MLIRArithDialect
MLIRControlFlowDialect
MLIRFuncDialect
MLIRGPUDialect
MLIRIR
MLIRLLVMDialect
MLIRMemRefDialect
MLIRPass
MLIRROCDLTarget
MLIRRewrite
MLIRTransformUtils
MLIRVectorDialect
MLIRWaterAnalysis
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Is the ability to inject a different assembly something we should rather put upstream? @fabianmcg may be interested and he created the current design for this pass. |
| (void)initialized; | ||
| } | ||
|
|
||
| FailureOr<SmallVector<char, 0>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preferred data type for raw memory is unsigned char or std::byte.
edit: I now see it used everywhere like this in the MLIR codebase. I don't understand why.
Using unsigned char or std::byte over char is done to disambiguate strings which are null terminated from raw bytes, mostly to avoid their accidental use in C string apis.
In this case it might be worth disambiguating because hsaco has both a serialized and a text representation. Which I would expect to be const char* and unsigned char* respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, all llvm classes like MemoryBuffer are using char*.
| return op->emitError("Failed to read HSACO from temporary file"); | ||
|
|
||
| StringRef buffer = (*hsacoFile)->getBuffer(); | ||
| return SmallVector<char, 0>(buffer.begin(), buffer.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return SmallVector<char, 0>(buffer.begin(), buffer.end()); | |
| return SmallVector<unsigned char, 0>(buffer.bytes_begin(), buffer.bytes_end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the prev comment, I can it, but it will be a lot of pointer casts.
Maybe in a month (or three), after we bikeshed callback return types enough llvm/llvm-project#170134 |
|
FWIW, it's been in my TODO list for a while to break From @Hardcode84 original PR upstream, there seemed no need to modify LLVM or assembly, my issue with the PR is the signal of an error inside a callback without propagation. |
I went upstream with just dump first as most non-controversial, and even it got stuck (see PR on callbacks) |
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
Signed-off-by: Ivan Butygin <[email protected]>
1228cce to
7f134f6
Compare
You shouldn't need errors if you just dump though :) It becomes controversial because people pick up on what you actually intend to do. |
|
Anyway, lets merge this one for now? |
|
ping. This one is actually blocking a few other things as there is no way to dump asm in the new pipeline. |
ftynse
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have it, but remove it by end of January after making upstream to work properly. Having a copy of the code is a long-term maintenance liability.
|
FWIW if you only care about dumping the ASM to check it, you can use |
|
|
This is mostly copypasted from the upstream
gpu-module-to-binarywith non-essential parts stripped down (like other vendors support, hehe).The one improvement over the upstream pass is the ability to dump and override LLVM IR and assembly, for debugging purposes.