@Hardcode84
Contributor

This is mostly copy-pasted from the upstream gpu-module-to-binary pass, with non-essential parts stripped out (like support for other vendors, hehe).

The one improvement over the upstream pass is the ability to dump and override LLVM IR and assembly, for debugging purposes.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a custom gpu-module-to-binary pass for Water/Wave that compiles GPU modules to HSACO binaries. The implementation is based on MLIR's upstream gpu-module-to-binary pass but streamlined to support only ROCDL/AMD targets. A key enhancement is the ability to dump and override intermediate compilation artifacts (LLVM IR, assembly) for debugging purposes.

Key changes:

  • Implements WaterGPUModuleToBinary pass that translates GPU modules to LLVM IR, optimizes them, compiles to ISA, and assembles to HSACO binaries
  • Adds assembleISAToHSACO utility function that uses LLVM MC infrastructure and ld.lld to produce HSACO files
  • Provides dump-intermediates and override-intermediates options for debugging compilation stages
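The dump/override behaviour listed above can be pictured roughly as follows. This is a minimal sketch in plain C++, not the code from the PR: `runStage`, `StageFn`, `readFile`, and the one-file-per-stage naming are hypothetical, and the real pass works on MLIR/LLVM objects rather than strings.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical stage signature: text in, text out (e.g. LLVM IR -> assembly).
using StageFn = std::function<std::string(const std::string &)>;

// Read a whole file; returns std::nullopt if it cannot be opened.
static std::optional<std::string> readFile(const fs::path &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return std::nullopt;
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Run one stage. If `overrideDir` contains a file called `name`, its contents
// replace the computed result. If `dumpDir` is set, the result is written
// there. Returns std::nullopt when the dump cannot be written.
std::optional<std::string>
runStage(const std::string &name, const StageFn &stage,
         const std::string &input, const std::optional<fs::path> &dumpDir,
         const std::optional<fs::path> &overrideDir) {
  assert(!name.empty() && "stage file name must not be empty");

  std::optional<std::string> replaced;
  if (overrideDir)
    replaced = readFile(*overrideDir / name);
  std::string result = replaced ? *replaced : stage(input);

  if (dumpDir) {
    std::error_code ec;
    fs::create_directories(*dumpDir, ec);
    if (ec)
      return std::nullopt;
    std::ofstream out(*dumpDir / name, std::ios::binary);
    out << result;
    if (!out)
      return std::nullopt;
  }
  return result;
}
```

A debugging session under these assumptions would first run with a dump directory, then edit the dumped file by hand and rerun with the same directory as the override directory.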

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| water/lib/Transforms/GPUModuleToBinary.cpp | Main pass implementation handling GPU module serialization through multiple stages (LLVM IR translation, linking, optimization, ISA compilation, and HSACO assembly) |
| water/lib/Transforms/AssembleISA.cpp | Assembly utilities for converting ISA to HSACO using LLVM MC infrastructure and ld.lld linker |
| water/lib/Transforms/AssembleISA.h | Header defining assembly interface and AMDGPU target initialization |
| water/include/water/Transforms/Passes.td | Pass definition with options for toolkit path, bitcode linking, and intermediate file dumping/overriding |
| water/lib/Transforms/CMakeLists.txt | Build configuration adding new source files and MLIRROCDLTarget library dependency |
| water/tools/water-opt/water-opt.cpp | Registration of LLVM conversion interfaces, GPU passes, and translation modules needed for the new pass |
| water/tools/water-opt/CMakeLists.txt | Added LLVM IR translation library dependencies |
| water/test/Transforms/gpu-module-to-binary.mlir | Basic test verifying GPU module to binary conversion |
| water/test/Transforms/gpu-module-to-binary-dump.mlir | Test validating intermediate file dumping functionality |
| water/test/Transforms/gpu-module-to-binary-override.mlir | Test demonstrating intermediate file override capability for debugging |

Comments suppressed due to low confidence (3)

water/lib/Transforms/CMakeLists.txt:29

  • Missing library dependency: GPUModuleToBinary.cpp calls translateModuleToLLVMIR which requires the MLIRTargetLLVMIRExport library. This should be added to the LINK_LIBS PUBLIC section to ensure proper linking.
  LINK_LIBS PUBLIC
  MLIRAnalysis
  MLIRArithDialect
  MLIRControlFlowDialect
  MLIRFuncDialect
  MLIRGPUDialect
  MLIRIR
  MLIRLLVMDialect
  MLIRMemRefDialect
  MLIRPass
  MLIRROCDLTarget
  MLIRRewrite
  MLIRTransformUtils
  MLIRVectorDialect
  MLIRWaterAnalysis

water/lib/Transforms/CMakeLists.txt:29

  • Missing library dependency: GPUModuleToBinary.cpp uses makeOptimizingTransformer from mlir/ExecutionEngine/OptUtils.h, which requires the MLIRExecutionEngine library. This should be added to the LINK_LIBS PUBLIC section.
  LINK_LIBS PUBLIC
  MLIRAnalysis
  MLIRArithDialect
  MLIRControlFlowDialect
  MLIRFuncDialect
  MLIRGPUDialect
  MLIRIR
  MLIRLLVMDialect
  MLIRMemRefDialect
  MLIRPass
  MLIRROCDLTarget
  MLIRRewrite
  MLIRTransformUtils
  MLIRVectorDialect
  MLIRWaterAnalysis

water/lib/Transforms/CMakeLists.txt:29

  • Missing library dependency: GPUModuleToBinary.cpp uses ROCDL::ROCDLTargetAttr and ROCDL::getROCMPath() which require the MLIRROCDLDialect library. This should be added to the LINK_LIBS PUBLIC section.
  LINK_LIBS PUBLIC
  MLIRAnalysis
  MLIRArithDialect
  MLIRControlFlowDialect
  MLIRFuncDialect
  MLIRGPUDialect
  MLIRIR
  MLIRLLVMDialect
  MLIRMemRefDialect
  MLIRPass
  MLIRROCDLTarget
  MLIRRewrite
  MLIRTransformUtils
  MLIRVectorDialect
  MLIRWaterAnalysis

@ftynse
Contributor

ftynse commented Dec 4, 2025

Is the ability to inject a different assembly something we should rather put upstream? @fabianmcg may be interested and he created the current design for this pass.

(void)initialized;
}

FailureOr<SmallVector<char, 0>>
Contributor

@tgymnich tgymnich Dec 4, 2025

The preferred data type for raw memory is unsigned char or std::byte.
edit: I now see it used everywhere like this in the MLIR codebase. I don't understand why.

Using unsigned char or std::byte instead of char disambiguates raw bytes from null-terminated strings, mostly to avoid accidentally passing raw bytes to C string APIs.

In this case it might be worth disambiguating because HSACO has both a text and a serialized representation, which I would expect to be const char* and unsigned char* respectively.

Contributor Author

Yeah, all llvm classes like MemoryBuffer are using char*.

return op->emitError("Failed to read HSACO from temporary file");

StringRef buffer = (*hsacoFile)->getBuffer();
return SmallVector<char, 0>(buffer.begin(), buffer.end());
Contributor

Suggested change
- return SmallVector<char, 0>(buffer.begin(), buffer.end());
+ return SmallVector<unsigned char, 0>(buffer.bytes_begin(), buffer.bytes_end());

Contributor Author

See the previous comment. I can do it, but it will require a lot of pointer casts.
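To make the trade-off in this thread concrete, here is a minimal sketch using only the standard library. It is not code from the PR: `toBytes` and `asChars` are hypothetical helpers, and `std::string_view` stands in for a char-based buffer such as the one `MemoryBuffer` hands out.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Copy a char-based buffer into an unsigned char container so that it can no
// longer be passed to C string APIs by accident. The reinterpret_cast here is
// the kind of pointer cast the author refers to.
std::vector<unsigned char> toBytes(std::string_view buffer) {
  const auto *begin = reinterpret_cast<const unsigned char *>(buffer.data());
  return std::vector<unsigned char>(begin, begin + buffer.size());
}

// View the raw bytes as chars again for an API that only accepts char data.
// The explicit length matters: binary data may contain embedded zero bytes.
std::string_view asChars(const std::vector<unsigned char> &bytes) {
  return std::string_view(reinterpret_cast<const char *>(bytes.data()),
                          bytes.size());
}
```

Every boundary between a byte-typed container and a char-based API needs one such cast, which is why keeping `char` throughout is the path of least resistance when most of the surrounding interfaces already use it.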

@Hardcode84
Contributor Author

Is the ability to inject a different assembly something we should rather put upstream?

Maybe in a month (or three), after we bikeshed callback return types enough (llvm/llvm-project#170134).

@fabianmcg

fabianmcg commented Dec 4, 2025

FWIW, it's been on my TODO list for a while to break gpu-module-to-binary into gpu-compile-module-to and gpu-compile-binary-to (allowing compilation to resume or stop at different representations). That would essentially allow what you want as well; however, it hasn't been on my critical path, and I don't think it will be done before the end of the year.

From @Hardcode84 original PR upstream, there seemed no need to modify LLVM or assembly, my issue with the PR is the signal of an error inside a callback without propagation.
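The concern about an error being signalled inside a callback without propagation can be illustrated with a small sketch. Everything here is hypothetical (`DumpCallback`, `compileWithCallback`, the `bool` return convention) and is not the upstream API; it only shows the shape of a callback whose failure reaches the caller.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical callback type: returns false to signal failure.
using DumpCallback = std::function<bool(const std::string &stageOutput)>;

// The driver checks the callback result and aborts the pipeline on failure,
// instead of letting the callback report an error that nobody observes.
bool compileWithCallback(const std::string &kernelName,
                         const DumpCallback &onIsa) {
  assert(!kernelName.empty() && "expects a non-empty kernel name");
  std::string isa = "; isa for " + kernelName; // stand-in for real codegen
  if (onIsa && !onIsa(isa))
    return false; // propagate the callback's failure to the caller
  return true;
}
```

With a `void` callback the driver would have no way to tell that a dump failed, which is the situation the comment objects to.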

@Hardcode84
Contributor Author

Hardcode84 commented Dec 4, 2025

From @Hardcode84 original PR upstream, there seemed no need to modify LLVM or assembly, my issue with the PR is the signal of an error inside a callback without propagation.

I went upstream with just dump first as most non-controversial, and even it got stuck (see PR on callbacks)

Signed-off-by: Ivan Butygin <[email protected]>
@Hardcode84 Hardcode84 force-pushed the water-dump-intermediates-clear branch from 1228cce to 7f134f6 Compare December 4, 2025 18:12
@ftynse
Contributor

ftynse commented Dec 5, 2025

I went upstream with just dump first as most non-controversial, and even it got stuck (see PR on callbacks)

You shouldn't need errors if you just dump though :) It becomes controversial because people pick up on what you actually intend to do.

@Hardcode84
Contributor Author

Anyway, let's merge this one for now?

@Hardcode84
Contributor Author

ping. This one is actually blocking a few other things as there is no way to dump asm in the new pipeline.

Contributor

@ftynse ftynse left a comment

Let's have it, but remove it by the end of January after making upstream work properly. Having a copy of the code is a long-term maintenance liability.

@fabianmcg

FWIW if you only care about dumping the ASM to check it, you can use debug-only=serialize-to-isa https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVM/ROCDL/Target.cpp#L431-L436

@Hardcode84
Contributor Author

debug-only only works if you build in debug/with assertions (which we do currently, but still)
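The reason is that debug output of this kind is compiled out when `NDEBUG` is defined. The sketch below imitates that mechanism with a hypothetical `SKETCH_DEBUG` macro and a hypothetical `debugLog` sink; it does not reproduce LLVM's actual macro or the runtime filtering done by the debug-only flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a debug-output macro: the body is compiled only when NDEBUG is
// not defined, i.e. in builds with assertions enabled.
#ifndef NDEBUG
#define SKETCH_DEBUG(...)                                                      \
  do {                                                                         \
    __VA_ARGS__;                                                               \
  } while (false)
#else
#define SKETCH_DEBUG(...)                                                      \
  do {                                                                         \
  } while (false)
#endif

// Hypothetical log sink so the effect is observable without reading stderr.
std::vector<std::string> debugLog;

std::string serializeToIsa(const std::string &kernelName) {
  std::string isa = "; isa for " + kernelName; // stand-in for real codegen
  SKETCH_DEBUG(debugLog.push_back(isa));       // vanishes in release builds
  return isa;
}
```

In a release build without assertions, `debugLog` stays empty no matter which flags are passed at run time, so a dump option that is part of the pass itself remains useful there.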

@Hardcode84 Hardcode84 merged commit 28c5473 into iree-org:main Dec 8, 2025
16 checks passed
@Hardcode84 Hardcode84 deleted the water-dump-intermediates-clear branch December 8, 2025 18:25
4 participants