
No-op slices cause executables to drop outputs #496

Open
pranavm-nvidia opened this issue Feb 12, 2025 · 4 comments
Labels
mlir-tensorrt Pull request for the mlir-tensorrt project

Comments

@pranavm-nvidia
Collaborator

If we have a no-op slice as follows:

def func(x):
    return x[:2]  # no-op: x already has shape (2,), so this slice returns x unchanged

fast_func = tp.compile(func, args=[tp.InputInfo((2,), dtype=tp.float32)])

MLIR-TRT correctly optimizes the generated StableHLO into an identity op, so we end up with:

module @ins_x_outs_t9_1 attributes {executor.process_grid_shape = array<i64: 1, 1>, plan.cluster_kinds = [#plan.tensorrt_cluster<disallow_shape_tensor_calculations = false, benefit = 10, tensorrt_major_version = 10>, #plan.host_cluster<benefit = 9>]} {
  func.func @main(%arg0: tensor<2xf32> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) -> tensor<2xf32> {
    return %arg0 : tensor<2xf32>
  }
}
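For reference, the StableHLO generated for the slice presumably looks something like the sketch below before optimization (the exact IR is an assumption; only the folding behavior matters here). Since the slice spans the entire dimension, it folds away to the operand itself:

```mlir
// Hypothetical pre-optimization StableHLO for x[:2] on a tensor<2xf32>.
// The slice covers the whole tensor, so it canonicalizes to an identity.
func.func @main(%arg0: tensor<2xf32>) -> tensor<2xf32> {
  %0 = stablehlo.slice %arg0 [0:2] : (tensor<2xf32>) -> tensor<2xf32>
  return %0 : tensor<2xf32>
}
```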

After the bufferization passes, we have:

module @ins_x_outs_t9_1 attributes {executor.process_grid_shape = array<i64: 1, 1>, plan.cluster_kinds = [#plan.tensorrt_cluster<disallow_shape_tensor_calculations = false, benefit = 10, tensorrt_major_version = 10>, #plan.host_cluster<benefit = 9>]} {
  func.func @main(%arg0: memref<2xf32, #plan.memory_space<device>> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) -> memref<2xf32, #plan.memory_space<device>> {
    return %arg0 : memref<2xf32, #plan.memory_space<device>>
  }
}

However, after the DropEquivalentBufferResults pass, we lose the function return value:

module @ins_x_outs_t9_1 attributes {executor.process_grid_shape = array<i64: 1, 1>, plan.cluster_kinds = [#plan.tensorrt_cluster<disallow_shape_tensor_calculations = false, benefit = 10, tensorrt_major_version = 10>, #plan.host_cluster<benefit = 9>]} {
  func.func @main(%arg0: memref<2xf32, #plan.memory_space<device>> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) {
    return
  }
}
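For context on why this pass behaves this way: dropping a result that is equivalent to a block argument is normally sound for internal functions, because every caller can be rewritten to use the operand it passed in. A hedged sketch of the usual caller-side rewrite (illustrative names, not actual output of the pass):

```mlir
// Before: a private function returns a buffer equivalent to its argument.
func.func private @callee(%arg0: memref<2xf32>) -> memref<2xf32> {
  return %arg0 : memref<2xf32>
}
func.func @caller(%m: memref<2xf32>) {
  %r = call @callee(%m) : (memref<2xf32>) -> memref<2xf32>
  // ... uses of %r ...
  return
}

// After DropEquivalentBufferResults: each caller substitutes its operand
// (%m) for the dropped result, so nothing is lost.
func.func private @callee(%arg0: memref<2xf32>) {
  return
}
func.func @caller(%m: memref<2xf32>) {
  call @callee(%m) : (memref<2xf32>) -> ()
  // ... uses of %m where %r was used ...
  return
}
```

An entrypoint has no caller to rewrite, which is why its output silently disappears.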
@christopherbate
Collaborator

christopherbate commented Feb 12, 2025

We need to look at the IR after the plan-alloc-tensors pass. I would say that the output you showed after plan-bufferization is actually incorrect.

We never discussed whether it's OK for a result or output argument to alias an input, but I would guess that it is not. Therefore, we should be making a copy of %arg0. If force-entrypoint-return-allocs were off and there were an %arg1 destination argument, then we would have a bufferization.alloc_tensor copy(...) or a use of bufferization.materialize_in_destination to make the copy. If force-entrypoint-return-allocs=true, then we need to insert copies whenever an argument is returned. This is all done in the plan-alloc-tensors pass.
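To illustrate the destination-argument case described above, a sketch of what the copy could look like (%arg1 as an assumed output argument; not actual pass output):

```mlir
// Hypothetical IR when force-entrypoint-return-allocs is off and the caller
// provides a destination tensor %arg1: the input is materialized into the
// destination rather than aliased.
func.func @main(%arg0: tensor<2xf32>, %arg1: tensor<2xf32>) -> tensor<2xf32> {
  %0 = bufferization.materialize_in_destination %arg0 in %arg1
      : (tensor<2xf32>, tensor<2xf32>) -> tensor<2xf32>
  return %0 : tensor<2xf32>
}
```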

I believe that this case is handled correctly when force-entrypoint-return-allocs=false. If so, then we just need to handle the other case.

@pranavm-nvidia
Collaborator Author

@christopherbate Here's the IR after the plan-alloc-tensors pass for reference:

// -----// IR Dump After PlanAllocTensorsPass (plan-alloc-tensors) //----- //
module @ins_x_outs_t9_1 attributes {executor.process_grid_shape = array<i64: 1, 1>, plan.cluster_kinds = [#plan.tensorrt_cluster<disallow_shape_tensor_calculations = false, benefit = 10, tensorrt_major_version = 10>, #plan.host_cluster<benefit = 9>]} {
  func.func @main(%arg0: tensor<2xf32> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) -> tensor<2xf32> {
    return %arg0 : tensor<2xf32>
  }
}

@christopherbate
Collaborator

OK. So if force-entrypoint-return-allocs is set, then for each operand of the return that is a block argument, we should insert a copy:

  func.func @main(%arg0: tensor<2xf32> {plan.shape_profile = #plan.bounds<shape, [2], [2]>}) -> tensor<2xf32> {
    %cpy = bufferization.alloc_tensor() copy(%arg0) : tensor<2xf32>
    return %cpy : tensor<2xf32>
  }

@pranavm-nvidia pranavm-nvidia added the mlir-tensorrt Pull request for the mlir-tensorrt project label Feb 13, 2025
@pranavm-nvidia
Collaborator Author

After discussing offline, we can fix this by adding an attribute to block arguments: bufferization.writable = false.
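A sketch of what that would look like on the entrypoint (attribute placement assumed; the attribute itself is the standard one-shot bufferization argument attribute):

```mlir
// Marking the block argument non-writable tells bufferization it cannot
// update %arg0 in place, forcing a copy instead of an alias on return.
func.func @main(%arg0: tensor<2xf32> {bufferization.writable = false,
                                      plan.shape_profile = #plan.bounds<shape, [2], [2]>})
    -> tensor<2xf32> {
  return %arg0 : tensor<2xf32>
}
```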
