
[WIP] [spirv] Add Vulkan runtime. #118

Open · wants to merge 1 commit into base: master

Conversation

@denis0x0D denis0x0D commented Sep 2, 2019

This patch addresses #60.
Implements an initial version of the Vulkan runtime to test spirv::ModuleOp.
Creates and runs a Vulkan compute pipeline.

Provides one unit test that multiplies two spv.arrays of float type.

Requirements:

  • Vulkan capable device and drivers installed.
  • Vulkan SDK installed and VULKAN_SDK environment variable is set.

How to build:

  • Provide -DMLIR_VULKAN_RUNNER_ENABLED as a CMake variable.

@antiagainst @MaheshRavishankar can you please take a look?
Thanks!

@denis0x0D denis0x0D changed the title [spirv] Add Vulkan runtime. [WIP] [spirv] Add Vulkan runtime. Sep 3, 2019

@antiagainst antiagainst left a comment


Awesome, thanks @denis0x0D!

I have quite a few inline comments. But generally, I think we should add more documentation and improve how errors are reported. :)

@@ -0,0 +1,59 @@
//===- VulkanRuntime.h - MLIR Vulkan runtime ------------------------------===//

Nit: -*- C++ -*-===// at the end.
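For reference, a sketch of the amended banner (assuming the usual LLVM style, with dashes padding the line to 80 columns):

//===- VulkanRuntime.h - MLIR Vulkan runtime -------------------*- C++ -*-===//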

// limitations under the License.
// =============================================================================
//
// This file specifies VulkanDeviceMemoryBuffer,

"This file provides a library for running a module on a Vulkan device." ?


#include <vulkan/vulkan.h>

using Descriptor = uint32_t;

Can you add more comments to this? Normally a descriptor includes two numbers. We only have one here. Would be nice to be clear on what it means.
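A possible clarifying sketch, assuming the two numbers are the descriptor set index and the binding index (both type names here are illustrative, not from the patch):

/// Index of a descriptor set within the pipeline layout.
using DescriptorSetIndex = uint32_t;
/// Index of a binding within one descriptor set.
using BindingIndex = uint32_t;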


using Descriptor = uint32_t;

/// Represents device memory buffer.

" Struct containing information regarding a ..." ?

VkDeviceMemory deviceMemory;
};

/// Represents host memory buffer.

"Struct containing information regarding a ..."?

}

LogicalResult VulkanRuntime::run() {
if (failed(vulkanCreateInstance())) {

if (... || ... || ...) 
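A sketch of the suggested chaining; apart from vulkanCreateInstance, the step names below are assumptions about the surrounding methods:

// Chain the failable steps so one early return covers them all.
if (failed(vulkanCreateInstance()) || failed(vulkanCreateDevice()) ||
    failed(vulkanCreatePipeline()))
  return failure();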

VkApplicationInfo applicationInfo;
applicationInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
applicationInfo.pNext = nullptr;
applicationInfo.pApplicationName = "Vulkan MLIR runtime";

MLIR Vulkan runtime

vkEnumeratePhysicalDevices(instance, &physicalDeviceCount, 0),
"vkEnumeratePhysicalDevices");

llvm::SmallVector<VkPhysicalDevice, 0> physicalDevices(physicalDeviceCount);

1? Most hosts should have just one physical device, I think.

vkCreateDevice(physicalDevice, &deviceCreateInfo, 0, &device),
"vkCreateDevice");

VkPhysicalDeviceMemoryProperties properties;

Let's initialize these local variables with = {}, otherwise MSAN might be unhappy.
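For example, the declaration below would become:

VkPhysicalDeviceMemoryProperties properties = {};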

VkPhysicalDeviceMemoryProperties properties;
vkGetPhysicalDeviceMemoryProperties(physicalDevice, &properties);

for (uint32_t i = 0, e = properties.memoryTypeCount; i < e; ++i) {

Can you put comments here on what we are doing here so it's easier for others to follow?
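For context, loops over memoryTypeCount typically search for a memory type the host can map; a sketch of the common pattern (memoryTypeIndex is an assumed member, not necessarily the patch's name):

// Pick the first memory type that is both host-visible and host-coherent,
// so the CPU can map it and read/write without explicit flushes.
const VkMemoryPropertyFlags required =
    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;
for (uint32_t i = 0, e = properties.memoryTypeCount; i < e; ++i) {
  if ((properties.memoryTypes[i].propertyFlags & required) == required) {
    memoryTypeIndex = i;
    break;
  }
}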

@denis0x0D

@antiagainst thanks a lot for the review! I will update the patch according to the comments.

@@ -0,0 +1,59 @@
//===- VulkanRuntime.h - MLIR Vulkan runtime ------------------------------===//
@River707 River707 commented Sep 4, 2019

This file should not be in /Support.

Contributor Author

@River707 thanks, I'll delete it. Since it's just for test purposes, I think I can declare the function inside the test file.

Implements initial version of Vulkan runtime to test spirv::ModuleOp.
Creates and runs Vulkan computation pipeline.

Requirements:
- Vulkan capable device and drivers installed.
- Vulkan SDK installed and VULKAN_SDK environment variable is set.

How to build:
- Provide -DMLIR_VULKAN_RUNNER_ENABLED=ON as cmake variable.
@denis0x0D

@antiagainst I've updated the patch according to your comments, and I have some thoughts about the error reporting machinery in particular and about the vulkan-runner in general.
The following may repeat things that have already been agreed on, sorry about that, but I want to make sure I understand everything correctly. Can you please comment on the following, and correct me if I'm wrong?

  1. I was looking at the current implementation of other runtime libs inside MLIR, and they are using llvm::errs(); for example, the CUDA runtime wrappers: https://github.com/tensorflow/mlir/blob/master/tools/mlir-cuda-runner/cuda-runtime-wrappers.cpp#L34

  2. It could be better for the Vulkan runtime to be independent of the MLIR context. For example, the full pipeline for vulkan-runner could look like this:

2.1. Compile-time passes:

a. A pass which serializes the SPIR-V module into binary form and inserts it as an attribute, similar to GpuToCubinPass:

function.setAttr(kCubinAnnotation,
                 builder.getStringAttr({cubin->data(), cubin->size()}));

b. A pass which creates a global constant containing the SPIR-V binary, similar to GpuGenerateCubinAccessorsPass:

Value *startPtr = LLVM::createGlobalString(
    loc, builder, StringRef(nameBuffer), blob.getValue(), llvmDialect);
builder.create<LLVM::ReturnOp>(loc, startPtr);

c. And a final pass which converts the launch call into calls to the Vulkan runtime wrappers, inserts calls for buffer registration into the Vulkan runtime, and so on.

2.2. The runtime must consist of two parts: Vulkan runtime wrappers over the actual runtime, and the actual Vulkan runtime, which manages memory, descriptors, layouts, the pipeline, and so on.
In this case the mlir module could look like this:

module {
  func @main() {
    // ...
    call @vkPopulateBuffer(%buff : memref<?xf32>, %descriptorSet : i32, %descriptorBinding : i32, %storageClass : i32) ...
    // ...
    call @vkLaunchShader(...) ...
    // ...
  }
  // extern @vkPopulateBuffer
  // extern @vkLaunchShader
}

Vulkan runtime wrappers could look like this:

extern "C" void vkPopulateBuffer(const memref_t buff, int32_t descriptorSet,
 int32_t decsriptorBinding, int32_t stroageClass) {

     VulkanRuntimeManager::instance()->registerBuffer(
      vulkanHostMemoryBuffer(buff.values, buff.length * sizeof(float)),
      decsriptorSet, descriptorBinding, storageClass);
}

VulkanRuntimeManager could look like this:

class VulkanRuntimeManager {
public:
  static VulkanRuntimeManager *instance() {
    static VulkanRuntimeManager *mng = new VulkanRuntimeManager;
    return mng;
  }
  void registerBuffer(VulkanHostMemoryBuffer buffer, int32_t set,
                      int32_t binding, int32_t storageClass) {
    std::lock_guard<std::mutex> lock(m);
    runtime.registerBuffer(...);
  }
  void launchShader(...);

private:
  std::mutex m;
  VulkanRuntime runtime;
};

In this case we can first register all the information the Vulkan runtime needs, such as resources (set, binding), the entry point, and the number of workgroups; then create device memory buffers, descriptor set layouts, and the pipeline layout, and bind them to the compute pipeline; then create a command buffer and finally submit it to the work queue. What do you think?
Thanks.
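For orientation, the Vulkan calls behind that sequence would be roughly the following (a sketch of the usual compute setup order, not the patch's exact code; all the *Info structs are assumed to be filled in beforehand):

vkAllocateMemory(device, &allocInfo, nullptr, &deviceMemory);        // device buffers
vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &dsLayout);
vkCreatePipelineLayout(device, &pipelineLayoutInfo, nullptr, &pipelineLayout);
vkCreateComputePipelines(device, VK_NULL_HANDLE, 1, &pipelineInfo, nullptr, &pipeline);
vkAllocateCommandBuffers(device, &cmdAllocInfo, &commandBuffer);
vkQueueSubmit(queue, 1, &submitInfo, VK_NULL_HANDLE);                // run the shader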

namespace {

/// Vulkan runtime.
/// The purpose of this class is to run spir-v computaiton shader on Vulkan

nit: computaiton -> computation

/// spir-v shader, number of work groups and entry point. After the creation of
/// VulkanRuntime, special methods must be called in the following
/// sequence: initRuntime(), run(), updateHostMemoryBuffers(), destroy();
/// each method in the sequence returns sussses or failure depends on the Vulkan

nit: sussses -> success

VkDevice device;
VkQueue queue;

/// Specifies VulkanDeviceMemoryBuffers devided into sets.

nit: devided -> divided

return failure();
}

// Descriptor bindings devided into sets. Each descriptor binding

nit: devided -> divided


TEST_F(RuntimeTest, SimpleTest) {
// SPIRV module embedded into the string.
// This module contains 4 resource variables devided into 2 sets.

nit: devided -> divided

# https://cmake.org/cmake/help/v3.7/module/FindVulkan.html
if (NOT CMAKE_VERSION VERSION_LESS 3.7.0)
find_package(Vulkan)
endif()

I'd add an else and a proper error message here.


I think the error message is at L29. :) The probe is not done yet here.


I see that there is a fallback, so throwing an error here is not possible indeed, but I still feel that "you're not using the right CMake version" should be surfaced at some point as the root cause of failure.
Something like adding here an else: if ("$ENV{VULKAN_SDK}" STREQUAL "") message(FATAL_ERROR "Please use at least CMake 3.7.0 or provide the VULKAN_SDK path as an environment variable")
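Assembled into the CMake block above, that suggestion would read roughly as follows (a sketch, untested):

if (NOT CMAKE_VERSION VERSION_LESS 3.7.0)
  find_package(Vulkan)
elseif ("$ENV{VULKAN_SDK}" STREQUAL "")
  message(FATAL_ERROR
    "Please use at least CMake 3.7.0 or provide the VULKAN_SDK path as an environment variable")
endif()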

uint32_t z{1};
};

// This is a temporary function and will be removed in the future.

Can you clarify the temporary aspect? There is a layering violation right now where you're poking at the implementation of a tool.

destroyResourceVarFloat(vars[1][1]);
destroyResourceVarFloat(fmulResult);
destroyResourceVarFloat(expected);
}

Can we avoid C++ unit tests and use regular lit+FileCheck testing, please? I think that is the whole point of using a runner.

See the cuda testing for reference.


I think this was just a temporary step to have the functionality implemented incrementally. We don't have all the shims and conversions the CUDA side has yet. We already have >1k LOC here; with all the above, we are expecting another >1k LOC. So I'm fine with landing this as-is. But if you are very uncomfortable with this, I see two ways forward: 1) remove the C++ tests and land the functionality; 2) leave this PR open and have follow-up commits implement the rest of the functionality, so we can finally squash the intermediate state away.


I'm still of the same opinion as in September: build the right tool step by step, with the right testing at every step along the way.


@antiagainst antiagainst left a comment


Sorry for the late reply! On my side I just have a few more nits.


struct VulkanDeviceMemoryBuffer {
BindingIndex bindingIndex{0};
VkDescriptorType descriptorType{VK_DESCRIPTOR_TYPE_MAX_ENUM};
VkDescriptorBufferInfo bufferInfo;

Zero initialize the other fields too? (Using VK_NULL_HANDLE and other reasonable defaults)
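A sketch of the fully initialized struct, assuming the remaining fields are the VkBuffer and VkDeviceMemory handles shown earlier in the diff:

struct VulkanDeviceMemoryBuffer {
  BindingIndex bindingIndex{0};
  VkDescriptorType descriptorType{VK_DESCRIPTOR_TYPE_MAX_ENUM};
  VkDescriptorBufferInfo bufferInfo{};
  VkBuffer buffer{VK_NULL_HANDLE};
  VkDeviceMemory deviceMemory{VK_NULL_HANDLE};
};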

uint32_t z{1};
};

/// Struct containing information regarding to a descriptor set.

s/to//

namespace {

/// Vulkan runtime.
/// The purpose of this class is to run spir-v computaiton shader on Vulkan

s/spir-v/SPIR-V/ globally


void createResourceVarFloat(uint32_t id, uint32_t elementCount) {
std::srand(unsigned(std::time(0)));
float *ptr = new float[elementCount];

No, actually it's fine for VulkanHostMemoryBuffer to hold a raw pointer. I was just saying that we should wrap these raw pointers in unique_ptr to avoid potential leaks. For example, you can let this function return the unique_ptr and assign it to a local variable in the test, so we don't need to explicitly call delete later.
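A minimal sketch of that suggestion (the signature change is the assumption; requires <memory> and <cstdlib>):

std::unique_ptr<float[]> createResourceVarFloat(uint32_t elementCount) {
  // Ownership moves to the caller, so no explicit delete is needed in the test.
  auto data = std::make_unique<float[]>(elementCount);
  for (uint32_t i = 0; i < elementCount; ++i)
    data[i] = static_cast<float>(std::rand()) / RAND_MAX;
  return data;
}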


@antiagainst

Re: #118 (comment)

  1. Okay, using llvm::errs() SGTM. :)
  2. What you said sounds reasonable to me!

Again, sorry for the delay in reviewing! Quite a few interesting things that relate to this have happened in the meantime. Our prototyping on the Vulkan runtime side, IREE, was open sourced. (I see you've already noticed, because you posted questions there. ;-P) It reflects how we actually want to approach this in a more Vulkan-native way.

Compared to CUDA, which provides more developer-friendly, middle-level host API abstractions, Vulkan's host API is lower-level and much more verbose. We think that by modelling it in IR form (not literally, but selectively on core features), we can gain the benefits of running compiler optimizations on it. It is not fully proven yet, but it looks quite promising thus far.

We are also thinking about how to structure the components between MLIR core and IREE. The SPIR-V dialect is in MLIR core right now, but the lowering from the high-level dialect used by IREE is not; it's inside IREE. That is partially because we don't have a proper modelling of the Vulkan host side in MLIR core. Here, differently from CUDA again, a simple gpu.launch op that connects the host and device code does not really work for Vulkan (I mean, yes, we can still lower it to Vulkan, but it doesn't gain all the expected performance benefits from Vulkan). So it would be nice to have proper modelling of Vulkan in core, so we can have the lowering in core and potentially share it with other code paths.

Just wanted to point out recent developments to keep you informed of where we are going; what you have here is certainly very valuable for testing and running in core, and as a building block for future Vulkan modelling. Thanks for the contribution again! :)


@joker-eph joker-eph left a comment


Actually, I just noticed that the PR wasn't updated by Denis recently; I got misled because it popped up in my inbox after recent comments from kiszk@.


@denis0x0D

@kiszk @joker-eph @antiagainst @MaheshRavishankar thanks for the review! Actually, I was thinking that IREE covers this PR as well and that this PR is not needed anymore, but if it is still relevant for testing the lowering, that sounds great to me!

@joker-eph

IREE is an independent project, and I am not aware of a plan to integrate IREE within MLIR, so having an end-to-end codegen story and testing capability still seems important to me.

@denis0x0D

@joker-eph @antiagainst

having an end-to-end codegen story and testing capability still seems important to me.

sounds great to me!

If it's OK, on the next iteration I'll rebase the current patch on trunk, update it according to the comments, delete the C++ unit tests, and leave this PR under WIP.
After that I can start work on a separate PR, which will implement a pass that:

  1. Translates the module into spv.module.
  2. Serializes spv.module into binary form.
  3. Sets the serialized module as an attribute.
    This is similar to what GpuKernelToCubinPass does: https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp

It seems to me that implementing a fully working mlir-vulkan-runner also requires one more PR, to cover functionality similar to https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp and to update the runtime part to be consistent with the new passes and mlir-vulkan-runner. I can cover that in future PRs.
What do you think? If you see a better plan, please correct me.
Thanks!

@antiagainst

+1 to what @joker-eph said in #118 (comment).

@denis0x0D: what you said SGTM! Just wanted to let you know, we are working on enabling the lowering flow from the Linalg dialect to SPIR-V right now. I think that has a higher priority. We'll create issues for the tasks soon and would really appreciate it if you could help there first. :) Sorry for keeping this thread open for an even longer time, but since this is entirely new code it shouldn't be much of a problem to rebase against the master branch later. Thanks!

@denis0x0D

@antiagainst

we are working on enabling the lowering flow from the Linalg dialect to SPIR-V right now. I think that has a higher priority. We'll create issues for the tasks soon and would really appreciate it if you could help there first.

Sounds great, I would like to help with it if possible!
Thanks!

@MaheshRavishankar

MaheshRavishankar commented Nov 24, 2019

Just catching up on all the discussion here now. Apologies for not paying more attention here.

As everyone here agrees, having an mlir-vulkan-runner in MLIR core is extremely useful for testing, so thanks again @denis0x0D for taking this up. Just to provide more context: as @antiagainst mentioned, we have been trying to get all the pieces in place to convert from Linalg ops to the GPU dialect to the SPIR-V dialect. Some of the changes w.r.t. these will land soon (in a couple of days). I have also been making some fairly significant changes to the SPIR-V lowering infrastructure to make it easier to build upon (these should also land in a couple of days). While these changes don't directly affect the development of a vulkan-runner (the runner should only be concerned with the SPIR-V binary generated from the dialect), they do motivate some requirements on the vulkan-runner, summarized below.

  1. As structured right now, going to the GPU dialect will result in two sub-modules within the main module:
module attributes {gpu.container_module} {
  // Host side code which will contain the gpu.launch_func op
  module @<some_name> attributes {gpu.kernel_module} {
    // Kernel functions, i.e., FuncOps with the gpu.kernel attribute
  }
}

The functions in the gpu.kernel_module with the gpu.kernel attribute are targeted for SPIR-V lowering. The lowering will create a new spv.module within the gpu.kernel_module:

module attributes {gpu.container_module} {
  // Host side code which will contain the gpu.launch_func op
  module @<some_name> attributes {gpu.kernel_module} {
    spv.module {
      ...
    }
    // Kernel functions, i.e., FuncOps with the gpu.kernel attribute
  }
}

So it would be useful if the vulkan runner could handle such a module. Given this, I have some follow-up below on a previous comment.

@joker-eph @antiagainst

having an end-to-end codegen story and testing capability still seems important to me.

sounds great to me!

If it's OK, on the next iteration I'll rebase the current patch on trunk, update it according to the comments, delete the C++ unit tests, and leave this PR under WIP.
After that I can start work on a separate PR, which will implement a pass that:

  1. Translates the module into spv.module.

I am not sure we can translate a module into spv.module directly, since the module contains both host- and device-side components. Do you mean extracting the spv.module from within a module and running it?

  2. Serializes spv.module into binary form.
  3. Sets the serialized module as an attribute.
    This is similar to what GpuKernelToCubinPass does: https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp

It seems to me that implementing a fully working mlir-vulkan-runner also requires one more PR, to cover functionality similar to https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp and to update the runtime part to be consistent with the new passes and mlir-vulkan-runner. I can cover that in future PRs.
What do you think? If you see a better plan, please correct me.
Thanks!

@joker-eph

joker-eph commented Nov 24, 2019 via email

@denis0x0D

denis0x0D commented Nov 24, 2019

@MaheshRavishankar thanks for the feedback,

I am not sure we can translate a module into spv.module directly, since the module contains both host- and device-side components. Do you mean extracting the spv.module from within a module and running it?

I was wrong about translating to spv.module inside this pass, sorry about that; I lost the context of this task. :)
The GPU -> SPIR-V lowering already exists, and it should be added to the pass manager in mlir-vulkan-runner.

So to enable mlir-vulkan-runner we need:

  1. A pass which runs right after "-convert-gpu-to-spirv". It consumes a module with the {gpu.container_module} attribute, serializes the spv.module into binary form using the existing spirv::Serializer, and attaches the serialized spv.module as an attribute, similar to this:
    https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertKernelFuncToCubin.cpp#L85

  2. A pass which instruments the host part, the part which contains gpu.launch_func, with calls to the runtime wrappers, similar to this (the API should be discussed):

module {
  func @main() {
    // ...
    call @vkPopulateBuffer(%buff : memref<?xf32>, %descriptorSet : i32, %descriptorBinding : i32, %storageClass : i32) ...
    // ...
    call @vkLaunchShader(...) ...
    // ...
  }
  // extern @vkPopulateBuffer
  // extern @vkLaunchShader
}

The host part is lowered to LLVM IR (the LLVM dialect), and the serialized module becomes an llvm.global https://github.com/tensorflow/mlir/blob/master/lib/Conversion/GPUToCUDA/ConvertLaunchFuncToCudaCalls.cpp#L370 containing just the binary data, so we can pass a pointer to that data to the runtime https://github.com/tensorflow/mlir/pull/118/files#diff-13d6d98d70fab37eda97625d19b84998R685; to create a shader module we just need to populate VkShaderModuleCreateInfo with the size of, and a pointer to, the binary shader.
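Concretely, that last step could look like this (a sketch; binary and binarySize stand for the pointer to and size of the serialized blob):

VkShaderModuleCreateInfo shaderModuleCreateInfo = {};
shaderModuleCreateInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
shaderModuleCreateInfo.codeSize = binarySize; // Size of the SPIR-V blob in bytes.
shaderModuleCreateInfo.pCode = reinterpret_cast<const uint32_t *>(binary);
VkShaderModule shaderModule;
vkCreateShaderModule(device, &shaderModuleCreateInfo, /*pAllocator=*/nullptr,
                     &shaderModule);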

So as a result, after the second pass we get a module which can be run under JitRunner; the pipeline should look similar to this code: https://github.com/tensorflow/mlir/blob/master/tools/mlir-cuda-runner/mlir-cuda-runner.cpp#L112
Thanks!

@MaheshRavishankar

Thanks @denis0x0D. Overall what you are suggesting makes sense to me, but I am not fluent enough in Vulkan-speak to provide more feedback on the specifics. The interaction with the kernel-to-SPIR-V-binary compilation makes sense, though.

@joker-eph: Thanks for catching my mistake. I edited my post to reflect the actual layout. The main point I was raising, though, was that you cannot serialize the entire module to a spv.module.

@MaheshRavishankar

As far as I can tell, at the moment the host code is in the module enclosing the kernel module; they aren't siblings: https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/GPU.md#gpulaunch_func Did I miss anything?

Why is the spv.module nested inside the kernel_module instead of lowering the kernel_module into a spv.module? What do the FuncOps with the gpu.kernel attribute become in your representation above? (Is there an example I could follow in the repo, maybe?)

-- Mehdi

The conversion of a FuncOp with the gpu.kernel attribute to spv.module predates the existence of kernel_module. It is pretty straightforward (and would actually clean up the conversion a little bit) to lower a kernel_module to a spv.module. It is on my list of things to do; I will probably get to it in a day or so.

@denis0x0D: I will try to send the change to lower kernel_module to spv.module as a pull request to the mlir github, so that might be something to look out for w.r.t. the vulkan runner. Thanks!

@denis0x0D

@MaheshRavishankar

I will try to send the change to lower kernel_module to spv.module as a pull request to the mlir github, so that might be something to look out for w.r.t. the vulkan runner. Thanks!

Sounds great to me! By the way, those conversions look great: https://github.com/tensorflow/mlir/tree/master/test/Conversion/GPUToSPIRV

@antiagainst

Thanks @MaheshRavishankar for the good catch! I was actually implicitly thinking that way and didn't notice that it is not obvious to everyone. :)

I think we are already creating the spv.module inside the top-level module, so it is not embedded in gpu.kernel_module:

void GPUToSPIRVPass::runOnModule() {
  auto context = &getContext();
  auto module = getModule();
  SmallVector<Operation *, 4> spirvModules;
  module.walk([&module, &spirvModules](FuncOp funcOp) {
    if (!gpu::GPUDialect::isKernel(funcOp)) {
      return;
    }
    OpBuilder builder(module.getBodyRegion());

But yes, going directly from gpu.kernel_module instead of gpu.kernel is cleaner, and we can have multiple entry points in the same spv.module! (We need to do more verification along the way too.)

@MaheshRavishankar

@antiagainst: At top of tree it is now added just before the FuncOp:

OpBuilder builder(funcOp.getOperation());

But the correct way is to convert the entire module into a spv.module.

@denis0x0D: The conversion from GPU to SPIR-V has now changed with 8c152c5.
The CL introduces attributes that you use to specify the ABI for entry-point functions. So when you convert from the GPU to the SPIR-V dialect, you add attributes to the function arguments and the function itself that describe various ABI-related aspects (like storage class, descriptor_set, binding, etc. for spv.globalVariables), as well as the workgroup size for the entry-point function.
A later pass (just before serialization) will materialize this ABI.
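To make that concrete, a hypothetical sketch of such annotations on a kernel function (the attribute names and fields here are illustrative only, not the exact spelling the CL introduces):

func @fmul_kernel(%arg0: memref<?xf32>
    {spv.interface_var_abi = {descriptor_set = 0 : i32, binding = 0 : i32}})
  attributes {spv.entry_point_abi = {local_size = dense<[32, 1, 1]> : vector<3xi32>}} {
  // ...
}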

@denis0x0D

@MaheshRavishankar thanks for mentioning this!

@sherhut

sherhut commented Nov 28, 2019

Late to the game, but I just wanted to comment that it would be really awesome if we had a lowering for the host side of the gpu dialect in the Vulkan/SPIR-V context as well. I think using a C++ library to implement a higher-level API should make the host-side code-generation part relatively straightforward. However, as far as I have been told, it still requires quite some code to write such a C++ library.

I would not spend too much energy on the design of the C++ library's API and would just build what makes sense for easy code generation. The host side of the dialect will likely evolve a fair bit, so these abstractions won't be forever, and the mlir-vulkan-runner is not meant to replace a real runtime, so simplicity is more important.

@antiagainst

I would not spend too much energy on the design of the C++ library's API and would just build what makes sense for easy code generation.

That's the approach that has been taken in the current pull request. You'll need thousands of lines of code to bootstrap a basic Vulkan compute shader. Right now that's all hidden in a single entry point, runOnVulkan.
