Conversation

@zoecarver
Contributor

Adds support for tensor[row, col] syntax in TTL kernels to access specific tiles within multi-tile tensors. Previously, tensor indexing was restricted to [0, 0].

  • Add TensorSliceType and TensorSliceOp to the TTL dialect for representing tile-indexed tensor views
  • Update Python DSL to emit ttl.tensor_slice ops when tensor subscript syntax is used
  • Add lowering in ConvertTTLToTTKernel to compute correct linear tile offsets (row * num_cols + col) for NOC read/write operations
  • Add simple_tensor_slice.py lit test and pytest_tensor_slice.py parameterized test covering 1x1 through 16x16 tile shapes
  • Based on #212 (Fix multi-tile CB addressing and add elementwise shape sweep tests)
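As a sketch of the offset computation described in the lowering bullet above (the helper name and tile-size arithmetic here are illustrative, not the actual pass code):

```python
def tile_linear_offset(row: int, col: int, num_cols: int, tile_size_bytes: int) -> int:
    """Byte offset of tile (row, col) in a row-major multi-tile tensor.

    Mirrors the row * num_cols + col addressing the lowering emits for
    NOC read/write operations (illustrative, not the pass code itself).
    """
    tile_index = row * num_cols + col
    return tile_index * tile_size_bytes


# A 64x64 bf16 tensor is a 2x2 grid of 32x32 tiles; each tile is
# 32 * 32 * 2 = 2048 bytes, so tile (1, 1) starts at byte 6144.
offset = tile_linear_offset(row=1, col=1, num_cols=2, tile_size_bytes=2048)
print(offset)  # → 6144
```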

@zoecarver zoecarver requested a review from a team as a code owner January 6, 2026 16:37
@zoecarver
Contributor Author

test/python/test_tensor_slice_indices.py .................................................................................... [ 18%]
............................................................................................................................. [ 46%]
................sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 74%]
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss          [100%]

================================================= 225 passed, 225 skipped in 18.08s =================================================

Contributor

@brnorris03 left a comment

This fails for me on QB (some MLIR lit tests fail, and it seems not to have the TTNN build fixes). Is it ready for review? I tried my usual clean build + test with a pre-built tt-mlir that includes ttnn jit:

rm -rf build; deactivate; cmake -GNinja -B build -DTTMLIR_BUILD_DIR=$HOME/tt/tt-mlir/build-ttlang && source build/env/activate && ninja -C build && time ninja -C build check-ttlang-all

I got "[1/1] Skipping ttlang Python lit tests (TTNN not available)" and

Failed Tests (8):
  TTLang :: ttlang/Conversion/TTLToTTKernel/compute_fused_chain.mlir
  TTLang :: ttlang/Conversion/TTLToTTKernel/dma_single_core.mlir
  TTLang :: ttlang/Translate/TTLToCpp/compute_fused_chain_to_cpp.mlir
  TTLang :: ttlang/Translate/TTLToCpp/compute_with_data_movement.mlir
  TTLang :: ttlang/Translate/TTLToCpp/dma_loop_multi_tile_nontrivial_cb.mlir
  TTLang :: ttlang/Translate/TTLToCpp/dma_multi_tile_batched_in_user_loop.mlir
  TTLang :: ttlang/Translate/TTLToCpp/dma_multi_tile_read.mlir
  TTLang :: ttlang/Translate/TTLToCpp/dma_multi_tile_same_layout_different_cb.mlir

@zoecarver
Contributor Author

This is ready for review. I will look into the test failures.

@zoecarver zoecarver force-pushed the zoecarver/sweep-over-shapes branch 6 times, most recently from 7367541 to 55b68d2 Compare January 8, 2026 15:27
Base automatically changed from zoecarver/sweep-over-shapes to main January 8, 2026 15:39
@brnorris03
Contributor

brnorris03 commented Jan 8, 2026

High level question first -- why is the tensor dialect not appropriate to use for this (necessitating custom ops)? For example, tensor.extract_slice for extracting slices.

@zoecarver zoecarver force-pushed the zoecarver/dynamic-tensor-subscript branch from d601524 to 821b6e4 Compare January 8, 2026 16:12
@zoecarver zoecarver force-pushed the zoecarver/dynamic-tensor-subscript branch from 821b6e4 to e9061a4 Compare January 8, 2026 16:17
@zoecarver zoecarver force-pushed the zoecarver/dynamic-tensor-subscript branch from a141858 to e925932 Compare January 8, 2026 21:20
@zoecarver zoecarver force-pushed the zoecarver/dynamic-tensor-subscript branch from e925932 to e91abda Compare January 8, 2026 21:29
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%slice = ttl.tensor_slice %tensor[%c0, %c1]
: tensor<64x64xbf16, #layout> -> !ttl.tensor_slice<tensor<64x64xbf16, #layout>>
Contributor

Can we have the return type also be a tensor of the new resulting slice size? In this case it should be 32x32xbf16.

Contributor

@arichinsTT commented Jan 8, 2026

I don't think there's a benefit to a tensor_slice value type. If a tensor is the result of a slice, it is up to the optimization to check the parent of the value; otherwise this adds unnecessary baggage.

Contributor

@brnorris03 left a comment

looks good in general, thank you

Comment on lines +165 to +180
def visit_Subscript(self, node):
"""Handle tensor[row, col] indexing for TTL tensor slices."""
tbl = self._var_exists(node.value.id)
if not tbl:
self._raise_error(node, f"Unknown variable: {node.value.id}")

tensor = tbl[node.value.id]
if not isinstance(getattr(tensor, "type", None), RankedTensorType):
self._raise_error(node, "TTL only supports subscripting tensors")

if isinstance(node.slice, ast.Tuple):
indices = [self._build_index_value(elt) for elt in node.slice.elts]
else:
indices = [self._build_index_value(node.slice)]

return (tensor, indices)
Contributor

What are the constraints on the subscripts (row, col) -- can they be arbitrary expressions, e.g., calls to range? Is node.slice a python slice object?

Contributor Author

It is an AST object, not a python object. We don't handle ranges today.

Currently supported:

  1. Integer literals: tensor[0, 1] → ast.Constant nodes → arith.ConstantOp with IndexType
  2. Loop induction variables: tensor[r, c] where r, c come from for r in range(N) → already IndexType from scf.ForOp
  3. Arithmetic expressions: tensor[i+1, j*2] → visits BinOp, produces i64, then casts to index via IndexCastOp

What happens with a slice like tensor[0:2, 0:3]?

  1. node.slice would be an ast.Tuple containing two ast.Slice objects
  2. _build_index_value is called on each ast.Slice
  3. ast.Slice is not ast.Constant, so it calls self.visit(node) on the Slice
  4. No visit_Slice method exists
  5. ast.Slice is not in supported_nodes (base_ast.py:107-128)
  6. Error: NotImplementedError("visit Slice not supported")
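The AST distinctions described above can be reproduced with Python's ast module alone. This standalone sketch (the classify_subscript helper is hypothetical, not part of the DSL code) shows which node types each supported and unsupported index form produces:

```python
import ast


def classify_subscript(src: str) -> str:
    """Return the AST node kinds used as indices in a subscript expression."""
    node = ast.parse(src, mode="eval").body
    assert isinstance(node, ast.Subscript)
    sl = node.slice
    # Python 3.9+: node.slice is the index expression itself (a Tuple for
    # tensor[a, b]), with no ast.Index wrapper.
    elts = sl.elts if isinstance(sl, ast.Tuple) else [sl]
    kinds = []
    for e in elts:
        if isinstance(e, ast.Constant):
            kinds.append("constant")   # tensor[0, 1]
        elif isinstance(e, ast.Name):
            kinds.append("name")       # tensor[r, c]
        elif isinstance(e, ast.BinOp):
            kinds.append("binop")      # tensor[i+1, j*2]
        elif isinstance(e, ast.Slice):
            kinds.append("slice")      # tensor[0:2, 0:3] -- unsupported in the DSL
        else:
            kinds.append("other")
    return ",".join(kinds)


print(classify_subscript("tensor[0, 1]"))      # → constant,constant
print(classify_subscript("tensor[r, c]"))      # → name,name
print(classify_subscript("tensor[i+1, j*2]"))  # → binop,binop
print(classify_subscript("tensor[0:2, 0:3]"))  # → slice,slice
```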

@arichinsTT
Contributor

High level question first -- why is the tensor dialect not appropriate to use for this (necessitating custom ops)? For example, tensor.extract_slice for extracting slices.

I vote yes, especially if we plan on utilizing the memref and bufferization dialects or something similar, but it definitely depends on the lowering that follows, and on what lowers into ttkernel.

Contributor

@arichinsTT left a comment

No tensor_slice value type; have it return the reduced tensor, which should remove a lot of cases. I think it is worthwhile to have handling for multiple-tile slices from the get-go.

@zoecarver
Contributor Author

High level question first -- why is the tensor dialect not appropriate to use for this (necessitating custom ops)? For example, tensor.extract_slice for extracting slices.
I vote yes, especially if we plan on utilizing memref and bufferization dialects or something similar, but it def depends on the lowering after, and what lowers into ttkernel

Based on offline discussion, this is going to add a lot of logic and checking in both lowering and in building the static + dynamic offsets in python. You can see the diff here: https://github.com/tenstorrent/tt-lang/compare/zoecarver/dynamic-tensor-subscript...zoecarver/ttl-tensor-slice-to-mlir-tensor-extract?expand=1

My recommendation is to land this, and then investigate how to move to tensor.extract_slice after the fact, maybe there is a cleaner way to map it. Is that OK with you, Alex?

Regarding your comment about removing TensorSliceType and using tensor directly, the biggest inconsistency I see with that is the layout. If we use tensor type directly, it will have to point to a layout with a different shape:

  #ttnn_layout = #ttnn.ttnn_layout<..., memref<1x4x!ttcore.tile<32x32, bf16>, #l1>, ...>
                                         ^^^^ shape encoded here

I don't think this is the end of the world; it won't affect lowering today, but it might be confusing in the future if we wanted to, e.g., validate that the tensor shape == layout shape, or use the layout to make some decision. I'm generally against this kind of defensive design, but I also think the separate type adds some semantic clarity by explicitly saying "this is a slice".

Given this, what do you think? Do you still want me to remove TensorSliceType and just use the tensor type? I'm happy either way.

@zoecarver
Contributor Author

Oops accidentally pushed to the wrong branch 🫣

All comments have been addressed or responded to. Thank you!

Contributor

@brnorris03 left a comment

lgtm, thank you for updating!


// Copy A to CB0
%xf_a = ttl.copy %a, %cb0 : (tensor<64x64xf32, #layout>, !ttl.cb<[2, 2], f32, 2>) -> !ttl.transfer_handle<read>
%slice_a = ttl.tensor_slice %a[%c0, %c0] : tensor<64x64xf32, #layout> -> tensor<64x64xf32, #layout>
Contributor

This slice is capturing more than a single tile, based on the output type and the copy into a 2x2 CB. Is this allowed with index slicing?

Contributor Author

Yes, we should be allowed to copy 2x2 tiles at a time if the CB shape is 2x2.
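As a sanity check of the tile math implied here (hypothetical helper, not PR code): a 64x64 tensor of 32x32 tiles is a 2x2 tile grid, which matches the [2, 2] CB shape, so the whole tensor fits in a single copy.

```python
def num_tiles(shape, tile=(32, 32)):
    """Tile-grid dimensions for a tile-aligned tensor shape."""
    rows, cols = shape
    th, tw = tile
    assert rows % th == 0 and cols % tw == 0, "shape must be tile-aligned"
    return rows // th, cols // tw


# 64x64 elements over 32x32 tiles gives a 2x2 tile grid.
print(num_tiles((64, 64)))  # → (2, 2)
```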

Contributor

Seems like range-based slicing is a later TODO.

@zoecarver zoecarver force-pushed the zoecarver/dynamic-tensor-subscript branch from 049da97 to a3a5d90 Compare January 9, 2026 22:15
@phizalev-TT
Contributor

phizalev-TT commented Jan 11, 2026

When testing on QB with CB shape (>1, >1) it hangs.

Contributor

@arichinsTT left a comment

thanks for discussion! looks good



@zoecarver zoecarver merged commit 3ddf92f into main Jan 12, 2026
5 checks passed
@zoecarver zoecarver deleted the zoecarver/dynamic-tensor-subscript branch January 12, 2026 18:50