Skip to content
This repository has been archived by the owner on Apr 23, 2021. It is now read-only.

[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

[ROCm] Adding pass to generate the HSACO binary blob from the GPU kernel function #181

wants to merge 1 commit into from

Conversation

deven-amd
Copy link
Contributor

This is the first of four PRs to add support for the ROCm backend for AMD GPUs. This PR adds the pass to generate the HSA Code objects corresponding the GPU kernels. This PR builds on the work done by @whchung in PR #63

This PR is a follow-up to request in PR #169. It addresses all but two of the code-review comments made in that PR for the ConvertKernelFuncToHSACO.cpp file.

  1. [ROCm] Adding the ROCM backend implementation for GPU dialect + the mlir-rocm-runner utility #169 (comment)

Why are you wrapping the vector in a unique_ptr instead of just using vector by value?

There are a bunch of places where the code could be simplified, for instance:

   const char data[] = "HSACO";
   return std::make_unique<std::vector<char>>(data, data + sizeof(data) - 1);

would likely become:

  return "HSACO";

Two reasons for this.

  • The binary blob owned by this variable could potentially be large, and hence pass-by-pointer instead of pass-by-value (Note that the "HSACO" string is only returned in the "test" scenario)

  • Consistency with the CUDA side (where we OwnedCubin, which is also unique_ptr wrapper on a vector of chars)

  1. [ROCm] Adding the ROCM backend implementation for GPU dialect + the mlir-rocm-runner utility #169 (comment)

Do we have to go through files? Isn't lld usable as a library?

At least for the time being, it is easier to go through files. Dumping out the .ll file is helpful for debugging purposes, so that is something we would like to keep regardless. Also I am still trying to figure out how to invoke lld as an API call, within the mlir setup :(


@joker-eph @whchung

lib/Conversion/GPUToROCM/ConvertKernelFuncToHSACO.cpp Outdated Show resolved Hide resolved
lib/Conversion/GPUToROCM/ConvertKernelFuncToHSACO.cpp Outdated Show resolved Hide resolved
lib/Conversion/GPUToROCM/ConvertKernelFuncToHSACO.cpp Outdated Show resolved Hide resolved
lib/Conversion/GPUToROCM/ConvertKernelFuncToHSACO.cpp Outdated Show resolved Hide resolved
lib/Conversion/GPUToROCM/ConvertKernelFuncToHSACO.cpp Outdated Show resolved Hide resolved
include/mlir/Conversion/GPUToROCM/GPUToROCMPass.h Outdated Show resolved Hide resolved
set(MLIR_ROCM_CONVERSIONS_ENABLED 1)
else()
set(MLIR_ROCM_CONVERSIONS_ENABLED 0)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct I believe: at the moment you have a dependency on the environment (lld, ROCm runtime bitcode libraries, etc.)

As mentioned elsewhere: there should be a step detecting the environment (and we should probably put this behind an opt-in CMake flag by the way) and generating a rocm_config.h that can be included in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have address this in the CMakeLists.txt file in the GPUToROCM dir

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is much better :)

Copy link
Contributor Author

@deven-amd deven-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joker-eph , I have pushed out a new rebased commit that addresses the feedback you provided.

Please re-review....thanks

deven

set(MLIR_ROCM_CONVERSIONS_ENABLED 1)
else()
set(MLIR_ROCM_CONVERSIONS_ENABLED 0)
endif()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have address this in the CMakeLists.txt file in the GPUToROCM dir

@deven-amd
Copy link
Contributor Author

@joker-eph , gentle ping

@joker-eph
Copy link
Contributor

@joker-eph , gentle ping

Sorry, I missed the update (and LLVM Dev Meeting happened). I'm catching up on everything and will get to it this week. Ping on Monday if you don't see an update!

if (EXISTS ${ROCM_INSTALL_DIR})
message("-- ROCm Install Dir - ${ROCM_INSTALL_DIR}")
else()
message(SEND_ERROR "-- NOT FOUND : ROCm Install Dir - ${ROCM_INSTALL_DIR}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't issue an error when this will be the normal behavior (a complete config on a machine without ROCM).

We should differentiate when the user is setting the value and when it is the default one. When the user provides the path to ROCM and we don't find it, we should just use FATAL_ERROR, otherwise a normal STATUS should be enough.

Can you also set the variables to empty in the else branch?

Finally, if ROCM is correctly detected, it seems like we should set a variable so that we can distinguish on it in the code and in lit tests as well: basically we have two levels that we should distinguish:

  1. AMDGPU backend is built-in: we can run the test pass.
  2. We have 1) and ROCM is installed/configured on the system. In this case we can run another test that exercise the linker invocation as well.

In lit you have config.run_rocm_tests at the moment, which is only indicating 1) above. We should have another flag for 2), and a macro in the code as well ideally.

#define MLIR_CONVERSION_GPUTOROCM_ROCMCONFIG_H_

/// The code to generate the HSACO binary blobs (corresponding the GPU kernels)
/// assumes the presense of ROCm libraries/utilities. The location of these
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// assumes the presense of ROCm libraries/utilities. The location of these
/// assumes the presence of ROCm libraries/utilities. The location of these

static constexpr const char *kHSACOAnnotation = "amdgpu.hsaco";
static constexpr const char *kHSACOGetterAnnotation = "amdgpu.hsacogetter";
static constexpr const char *kHSACOGetterSuffix = "_hsaco";
static constexpr const char *kHSACOStorageSuffix = "_hsaco_cst";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a use for kHSACOGetterSuffix and kHSACOStorageSuffix?

Also are all these definition sintended to be defined in the public header of are they more internal detail of the pass (and could be in the implementation file).

set(MLIR_ROCM_CONVERSIONS_ENABLED 1)
else()
set(MLIR_ROCM_CONVERSIONS_ENABLED 0)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is much better :)

llvm::TargetMachine::CGFT_ObjectFile);
codegenPasses.run(llvmModule);
isabinFileStream.flush();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to keep maximum coverage for the test, since the pass is only available when the AMDGPU backend is built in, we should be able to get until there for the basic test. We could just reload the binary file isabinFilename print it as hexadecimal and return.

Something like:

if (testMode) {
  auto hsacoFileOrError = llvm::MemoryBuffer::getFileAsStream(isabinFilename);
  if ((ec = hsacoFileOrError.getError()))
    return {};
  std::unique_ptr<llvm::MemoryBuffer> hsacoFile = std::move(hsacoFileOrError.get());
  SmallVector<char, 1> buffer;
  {
    llvm::raw_svector_ostream os(buffer);
    for (auto char : hascoFile.getBuffer())
      os.write_hex(char);
  }
  return std::make_unique<std::vector<char>>(buffer.begin(), buffer.end());
}

The test would be able to check that we actually correctly generate an ELF binary (by checking the ELF magic in hexadecimal) and cover everything but the linker when ROCM is not available.


static PassRegistration<GpuKernelToHSACOPass>
pass("test-kernel-to-hsaco",
"Convert all kernel functions to HSA code object blobs");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a macro that define if ROCM is installed or not, we can then conditionally register another version of the pass "rocm-hsaco-lowering" or something like which actually run the linker as well, and add a test for this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants