
No-op slices cause executables to drop outputs #496

Open
pranavm-nvidia opened this issue Feb 12, 2025 · 4 comments
Labels
mlir-tensorrt Pull request for the mlir-tensorrt project

Comments

@pranavm-nvidia
Collaborator

If we have a no-op slice as follows:

def func(x):
    return x[:2]  # no-op: x already has shape (2,), so this slice returns x unchanged

fast_func = tp.compile(func, args=[tp.InputInfo((2,), dtype=tp.float32)])

MLIR-TRT correctly optimizes the generated StableHLO into an identity op, so we end up with:

module @ins_x_outs_t9_1 attributes {executor.process_grid_shape = array<i64: 1, 1>, plan.cluster_kinds = [#plan.tensorrt_cluster<disallow_shape_tensor_calculations = false, benefit = 10, tensorrt_major_version = 10>, #plan.host_cluster<benefit = 9>]} {
  func.func @main(%arg0: tensor<2xf32> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) -> tensor<2xf32> {
    return %arg0 : tensor<2xf32>
  }
}
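For reference, the StableHLO generated for the slice presumably looks something like the sketch below before optimization (the exact IR is an assumption; only the folding behavior matters here). Since the slice spans the entire dimension, it folds away to the operand itself:

```mlir
// Hypothetical pre-optimization StableHLO for x[:2] on a tensor<2xf32>.
// The slice covers the whole tensor, so it canonicalizes to an identity.
func.func @main(%arg0: tensor<2xf32>) -> tensor<2xf32> {
  %0 = stablehlo.slice %arg0 [0:2] : (tensor<2xf32>) -> tensor<2xf32>
  return %0 : tensor<2xf32>
}
```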

After the bufferization passes, we have:

module @ins_x_outs_t9_1 attributes {executor.process_grid_shape = array<i64: 1, 1>, plan.cluster_kinds = [#plan.tensorrt_cluster<disallow_shape_tensor_calculations = false, benefit = 10, tensorrt_major_version = 10>, #plan.host_cluster<benefit = 9>]} {
  func.func @main(%arg0: memref<2xf32, #plan.memory_space<device>> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) -> memref<2xf32, #plan.memory_space<device>> {
    return %arg0 : memref<2xf32, #plan.memory_space<device>>
  }
}

However, after the DropEquivalentBufferResults pass, we lose the function return value:

module @ins_x_outs_t9_1 attributes {executor.process_grid_shape = array<i64: 1, 1>, plan.cluster_kinds = [#plan.tensorrt_cluster<disallow_shape_tensor_calculations = false, benefit = 10, tensorrt_major_version = 10>, #plan.host_cluster<benefit = 9>]} {
  func.func @main(%arg0: memref<2xf32, #plan.memory_space<device>> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) {
    return
  }
}
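For context on why this pass behaves this way: dropping a result that is equivalent to a block argument is normally sound for internal functions, because every caller can be rewritten to use the operand it passed in. A hedged sketch of the usual caller-side rewrite (illustrative names, not actual output of the pass):

```mlir
// Before: a private function returns a buffer equivalent to its argument.
func.func private @callee(%arg0: memref<2xf32>) -> memref<2xf32> {
  return %arg0 : memref<2xf32>
}
func.func @caller(%m: memref<2xf32>) {
  %r = call @callee(%m) : (memref<2xf32>) -> memref<2xf32>
  // ... uses of %r ...
  return
}

// After DropEquivalentBufferResults: each caller substitutes its operand
// (%m) for the dropped result, so nothing is lost.
func.func private @callee(%arg0: memref<2xf32>) {
  return
}
func.func @caller(%m: memref<2xf32>) {
  call @callee(%m) : (memref<2xf32>) -> ()
  // ... uses of %m where %r was used ...
  return
}
```

An entrypoint has no caller to rewrite, which is why its output silently disappears.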
@christopherbate
Collaborator

christopherbate commented Feb 12, 2025

We need to look at the IR after the plan-alloc-tensors pass. I would say that the output you showed after plan-bufferization is actually incorrect.

We never discussed whether it's OK for a result or output argument to alias an input, but I would guess that it is not. Therefore, we should be making a copy of %arg0. If force-entrypoint-return-allocs were off and there were an %arg1 destination argument, then we would have a bufferization.alloc_tensor copy(...) or a use of bufferization.materialize_in_destination to make the copy. If force-entrypoint-return-allocs=true, then we need to insert copies whenever an argument is returned. This is all done in the plan-alloc-tensors pass.
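To illustrate the destination-argument case described above, a sketch of what the copy could look like (%arg1 as an assumed output argument; not actual pass output):

```mlir
// Hypothetical IR when force-entrypoint-return-allocs is off and the caller
// provides a destination tensor %arg1: the input is materialized into the
// destination rather than aliased.
func.func @main(%arg0: tensor<2xf32>, %arg1: tensor<2xf32>) -> tensor<2xf32> {
  %0 = bufferization.materialize_in_destination %arg0 in %arg1
      : (tensor<2xf32>, tensor<2xf32>) -> tensor<2xf32>
  return %0 : tensor<2xf32>
}
```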

I believe that this case is handled correctly when force-entrypoint-return-allocs=false. If so, then we just need to handle the other case.

@pranavm-nvidia
Collaborator Author

@christopherbate Here's the IR after the plan-alloc-tensors pass for reference:

// -----// IR Dump After PlanAllocTensorsPass (plan-alloc-tensors) //----- //
module @ins_x_outs_t9_1 attributes {executor.process_grid_shape = array<i64: 1, 1>, plan.cluster_kinds = [#plan.tensorrt_cluster<disallow_shape_tensor_calculations = false, benefit = 10, tensorrt_major_version = 10>, #plan.host_cluster<benefit = 9>]} {
  func.func @main(%arg0: tensor<2xf32> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) -> tensor<2xf32> {
    return %arg0 : tensor<2xf32>
  }
}

@christopherbate
Collaborator

OK. So if force-entrypoint-return-allocs is set, then for each operand of the return that is a block argument, we should insert a copy:

  func.func @main(%arg0: tensor<2xf32> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) -> tensor<2xf32> {
    %cpy = bufferization.alloc_tensor() copy(%arg0) : tensor<2xf32>
    return %cpy : tensor<2xf32>
  }

@pranavm-nvidia pranavm-nvidia added the mlir-tensorrt Pull request for the mlir-tensorrt project label Feb 13, 2025
@pranavm-nvidia
Collaborator Author

After discussing offline, we can fix this by adding an attribute to block arguments: bufferization.writable = false.
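A sketch of what that would look like on the entrypoint (attribute placement assumed; the attribute itself is the standard one-shot bufferization argument attribute):

```mlir
// Marking the block argument non-writable tells bufferization it cannot
// update %arg0 in place, forcing a copy instead of an alias on return.
func.func @main(%arg0: tensor<2xf32> {bufferization.writable = false,
                                      plan.shape_profile = #plan.bounds<shape, [2], [2]>})
    -> tensor<2xf32> {
  return %arg0 : tensor<2xf32>
}
```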
