Implement tensor_slice op and dynamic tensor indexing #213
Conversation
brnorris03 left a comment:
This fails for me on qb (some MLIR lit tests, and it seems to not have the ttnn build fixes) -- is it ready for review? I tried my usual clean build + test with a pre-built tt-mlir that includes ttnn jit:
```sh
rm -rf build; deactivate; cmake -GNinja -B build -DTTMLIR_BUILD_DIR=$HOME/tt/tt-mlir/build-ttlang && source build/env/activate && ninja -C build && time ninja -C build check-ttlang-all
```
I got `[1/1] Skipping ttlang Python lit tests (TTNN not available)` and:
```
Failed Tests (8):
  TTLang :: ttlang/Conversion/TTLToTTKernel/compute_fused_chain.mlir
  TTLang :: ttlang/Conversion/TTLToTTKernel/dma_single_core.mlir
  TTLang :: ttlang/Translate/TTLToCpp/compute_fused_chain_to_cpp.mlir
  TTLang :: ttlang/Translate/TTLToCpp/compute_with_data_movement.mlir
  TTLang :: ttlang/Translate/TTLToCpp/dma_loop_multi_tile_nontrivial_cb.mlir
  TTLang :: ttlang/Translate/TTLToCpp/dma_multi_tile_batched_in_user_loop.mlir
  TTLang :: ttlang/Translate/TTLToCpp/dma_multi_tile_read.mlir
  TTLang :: ttlang/Translate/TTLToCpp/dma_multi_tile_same_layout_different_cb.mlir
```
This is ready for review. I will look into the test failures.
Force-pushed from 7367541 to 55b68d2
High level question first -- why is the …
Force-pushed from d601524 to 821b6e4 (…unch of dead code)
Force-pushed from 821b6e4 to e9061a4
Force-pushed from a141858 to e925932
Force-pushed from e925932 to e91abda
```mlir
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%slice = ttl.tensor_slice %tensor[%c0, %c1]
    : tensor<64x64xbf16, #layout> -> !ttl.tensor_slice<tensor<64x64xbf16, #layout>>
```
Can we have the return type also be a tensor of the new resulting slice size? In this case it should be `32x32xbf16`.
I don't think there's a benefit to a tensor_slice value type; if a tensor is the result of a slice, it is on the optimization to check the parent of the value. Otherwise this adds unnecessary baggage.
brnorris03 left a comment:
looks good in general, thank you
```python
def visit_Subscript(self, node):
    """Handle tensor[row, col] indexing for TTL tensor slices."""
    tbl = self._var_exists(node.value.id)
    if not tbl:
        self._raise_error(node, f"Unknown variable: {node.value.id}")

    tensor = tbl[node.value.id]
    if not isinstance(getattr(tensor, "type", None), RankedTensorType):
        self._raise_error(node, "TTL only supports subscripting tensors")

    if isinstance(node.slice, ast.Tuple):
        indices = [self._build_index_value(elt) for elt in node.slice.elts]
    else:
        indices = [self._build_index_value(node.slice)]

    return (tensor, indices)
```
What are the constraints on the subscripts (row, col) -- can they be arbitrary expressions, e.g., calls to `range`? Is `node.slice` a Python slice object?
It is an AST node, not a Python slice object. We don't handle ranges today.
Currently supported:
- Integer literals: `tensor[0, 1]` → `ast.Constant` nodes → `arith.ConstantOp` with `IndexType`
- Loop induction variables: `tensor[r, c]` where `r`, `c` come from `for r in range(N)` → already `IndexType` from `scf.ForOp`
- Arithmetic expressions: `tensor[i+1, j*2]` → visits `BinOp`, produces `i64`, then casts to `index` via `IndexCastOp`

What happens with a slice like `tensor[0:2, 0:3]`? Traced step by step below (see also the runnable sketch after this list):
- `node.slice` would be an `ast.Tuple` containing two `ast.Slice` objects
- `_build_index_value` is called on each `ast.Slice`
- `ast.Slice` is not `ast.Constant`, so it calls `self.visit(node)` on the `Slice`
- No `visit_Slice` method exists
- `ast.Slice` is not in `supported_nodes` (base_ast.py:107-128)
- Error: `NotImplementedError("visit Slice not supported")`
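For reference, a quick stdlib-only check (separate from the PR's code) of how CPython parses the two subscript forms discussed above:

```python
import ast

# tensor[0, 1]: node.slice is an ast.Tuple of ast.Constant indices.
expr = ast.parse("tensor[0, 1]", mode="eval").body
print(type(expr.slice).__name__)                      # Tuple
print([type(e).__name__ for e in expr.slice.elts])    # ['Constant', 'Constant']

# tensor[0:2, 0:3]: the tuple elements are ast.Slice nodes instead,
# which is why _build_index_value falls through to the unsupported path.
sliced = ast.parse("tensor[0:2, 0:3]", mode="eval").body
print([type(e).__name__ for e in sliced.slice.elts])  # ['Slice', 'Slice']
```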
I vote yes, especially if we plan on utilizing memref and bufferization dialects or something similar, but it definitely depends on the lowering after, and on what lowers into ttkernel.
arichinsTT left a comment:
No tensor_slice value type; have it return the reduced tensor, which should remove a lot of cases. I think it is worthwhile to have handling for multiple tile slices from the get-go.
Based on offline discussion, this is going to add a lot of logic and checking, both in lowering and in building the static + dynamic offsets in Python. You can see the diff here: https://github.com/tenstorrent/tt-lang/compare/zoecarver/dynamic-tensor-subscript...zoecarver/ttl-tensor-slice-to-mlir-tensor-extract?expand=1

My recommendation is to land this, and then investigate how to move to tensor.extract_slice after the fact; maybe there is a cleaner way to map it. Is that OK with you, Alex?

Regarding your comment about removing TensorSliceType and using tensor directly, the biggest inconsistency I see with that is the layout. If we use the tensor type directly, it will have to point to a layout with a different shape. I don't think this is the end of the world, and it won't affect lowering today, but it might be confusing in the future if we wanted to, e.g., validate that tensor shape == layout shape, or use the layout to make some decision. I'm generally against this kind of defensive design, but I also think the separate type adds some semantic clarity by explicitly saying "this is a slice". Given this, what do you think? Do you still want me to remove TensorSliceType and just use the tensor type? I'm happy either way.
Oops, accidentally pushed to the wrong branch 🫣 All comments have been addressed or responded to. Thank you!
brnorris03 left a comment:
lgtm, thank you for updating!
```mlir
// Copy A to CB0
%xf_a = ttl.copy %a, %cb0 : (tensor<64x64xf32, #layout>, !ttl.cb<[2, 2], f32, 2>) -> !ttl.transfer_handle<read>
%slice_a = ttl.tensor_slice %a[%c0, %c0] : tensor<64x64xf32, #layout> -> tensor<64x64xf32, #layout>
```
This slice is capturing more than a single tile, based on the output type and the copy into a 2x2 CB; is this allowed with index slicing?
Yes, we should be allowed to copy 2x2 tiles at a time if the CB shape is 2x2.
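A stdlib-only sketch of the arithmetic behind this reply; the helper is hypothetical, and the 32x32 tile size is an assumption inferred from the 64x64 → 32x32 slice discussion earlier in the review:

```python
# Hypothetical helper, not from the PR: how many tiles a tensor spans,
# assuming the standard 32x32 tile size implied by the review discussion.
TILE = 32

def tile_grid(shape):
    """Number of tiles along each dimension of a tile-aligned tensor."""
    return tuple(dim // TILE for dim in shape)

# A 64x64 tensor is a 2x2 grid of tiles, matching the
# !ttl.cb<[2, 2], f32, 2> shape, so one copy can move all four tiles.
assert tile_grid((64, 64)) == (2, 2)
```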
Seems like range-based slicing is a later TODO.
Force-pushed from 049da97 to a3a5d90
When testing on QB with CB shape (>1, >1) it hangs.
arichinsTT left a comment:
thanks for discussion! looks good
Adds support for `tensor[row, col]` syntax in TTL kernels to access specific tiles within multi-tile tensors. Previously, tensor indexing was restricted to `[0, 0]`.
- Adds `TensorSliceType` and `TensorSliceOp` to the TTL dialect for representing tile-indexed tensor views
- Emits `ttl.tensor_slice` ops when tensor subscript syntax is used
- Updates `ConvertTTLToTTKernel` to compute correct linear tile offsets (`row * num_cols + col`) for NOC read/write operations (see the sketch below)
- Adds a `simple_tensor_slice.py` lit test and a `pytest_tensor_slice.py` parameterized test covering 1x1 through 16x16 tile shapes
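A minimal sketch of the offset computation named above; the function name is hypothetical, and only the `row * num_cols + col` formula comes from the PR description:

```python
def linear_tile_offset(row: int, col: int, num_cols: int) -> int:
    """Row-major linear offset of tile (row, col) in a num_rows x num_cols tile grid."""
    return row * num_cols + col

# For a 2x2 tile grid (e.g. a 64x64 tensor with 32x32 tiles):
assert linear_tile_offset(0, 0, 2) == 0  # first tile
assert linear_tile_offset(0, 1, 2) == 1
assert linear_tile_offset(1, 0, 2) == 2
assert linear_tile_offset(1, 1, 2) == 3  # last tile
```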
tensor[row, col]syntax in TTL kernels to access specific tiles within multi-tile tensors. Previously, tensor indexing was restricted to[0, 0].TensorSliceTypeandTensorSliceOpto the TTL dialect for representing tile-indexed tensor viewsttl.tensor_sliceops when tensor subscript syntax is usedConvertTTLToTTKernelto compute correct linear tile offsets (row * num_cols + col) for NOC read/write operationssimple_tensor_slice.pylit test andpytest_tensor_slice.pyparameterized test covering 1x1 through 16x16 tile shapes