
Conversation

@zoecarver zoecarver commented Dec 23, 2025

Port the Python bindings from d2m to ttl. Refactor to remove the existing runtime tests and replace them with parameterized pytest tests. Write new lit tests.

Removed the metal runtime path and all d2m logic, and renamed accordingly.

Overhauled operators.py to auto-generate almost everything (kind of cool).

Filed a few issues for patterns we don't support; there is still more work to do:

  • get pytest running (it hangs right now; maybe this can be deferred)
  • get lit working natively (hacked right now because I lost the battle with the lit config; required)
  • add a large-ish fused kernel (likely a follow-up change)
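A minimal sketch of what the parameterized pytest replacement could look like. The op table and names here are illustrative, not the actual ttl registry; in the real tests the entries would be auto-generated from the op definitions.

```python
import operator

import pytest

# Illustrative op table; real entries would be auto-generated from the
# op definitions rather than listed by hand.
SINGLE_OPS = [
    pytest.param(operator.add, 2.0, 3.0, 5.0, id="add"),
    pytest.param(operator.mul, 2.0, 3.0, 6.0, id="mul"),
]


@pytest.mark.parametrize("op,lhs,rhs,expected", SINGLE_OPS)
def test_single_op(op, lhs, rhs, expected):
    # Stand-in for compiling and running a single-op kernel, then
    # comparing the device result against a host-side reference.
    assert op(lhs, rhs) == expected
```

Each table entry becomes its own test case, so adding an op is a one-line change (or none at all, once the table is generated).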

Base automatically changed from bnorris/integrate-store to main December 23, 2025 17:55
@zoecarver zoecarver force-pushed the zoecarver/ttl-python-bindings-take-three branch 2 times, most recently from e8ce977 to 2fe77d6 Compare December 24, 2025 00:33
Comment on lines +113 to +116
// TODO: Revisit shape rank validation for TTNN tensors.
// TTNN tensors have 4D device shape (grid + shard) while CBs have 2D shard
// shape. For now, only validate element types match. The relationship between
// tensor shape and CB shape needs further investigation.
zoecarver (author):

@brnorris03 curious to hear your thoughts on this

Contributor:

I am fine with doing this validation later in a separate PR. I myself need to do a bit more reading to understand that fully.
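The deferred check described in the TODO can be sketched as below; the function name and string representation of element types are hypothetical, for illustration only.

```python
def validate_tensor_cb_binding(tensor_elem_type: str, cb_elem_type: str) -> bool:
    # Per the TODO above: TTNN tensors carry a 4D device shape
    # (grid + shard) while CBs carry a 2D shard shape, so rank/shape
    # validation is deferred; only element types are compared for now.
    return tensor_elem_type == cb_elem_type
```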

@@ -0,0 +1,142 @@
#!/usr/bin/env python3
zoecarver (author):

Also curious to hear your thoughts on this

Contributor:

Definitely good to generate from the defs and eliminate manual additions
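The auto-generation idea could be sketched like this, with a hypothetical op-definition table standing in for the real defs (the actual operators.py generates from the dialect's op definitions, not from a dict like this).

```python
# Hypothetical op-definition table; illustrative only.
OP_DEFS = {
    "add": 2,  # name -> arity
    "exp": 1,
}


def _make_op(name, arity):
    def op(*args):
        if len(args) != arity:
            raise TypeError(f"{name} expects {arity} operand(s), got {len(args)}")
        # Stand-in for emitting the corresponding IR operation.
        return (name, args)

    op.__name__ = name
    return op


# One generated wrapper per definition; no manual additions needed.
OPS = {name: _make_op(name, arity) for name, arity in OP_DEFS.items()}
```

Adding an op to the defs automatically yields a wrapper, which is what eliminates the manual additions.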

Comment on lines 227 to 229
# TODO(XX): Fix TensorAccessorArgs CTA offsets. C++ emits placeholder 42+idx;
# replace it with the actual offset idx * args_per_tensor.
if args_per_tensor > 0:
zoecarver (author):

We are going to need to totally overhaul this, but I think that should maybe come as a post-commit fix if we can tolerate this awful hack for a bit.
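A hypothetical reconstruction of the regex rewrite the TODO describes (the actual implementation may differ): the C++ side emits literal offsets of 42 + idx as placeholders, and the Python side rewrites each to the real offset idx * args_per_tensor.

```python
import re


def patch_cta_placeholders(cpp_source: str, args_per_tensor: int) -> str:
    # Hypothetical sketch of the placeholder rewrite: any
    # TensorAccessorArgs<N> with N >= 42 is treated as the
    # placeholder 42 + idx and replaced with idx * args_per_tensor.
    def repl(match):
        value = int(match.group(1))
        if value < 42:
            return match.group(0)  # already a real offset; leave it
        return f"TensorAccessorArgs<{(value - 42) * args_per_tensor}>"

    return re.sub(r"TensorAccessorArgs<(\d+)>", repl, cpp_source)
```

The fragility the discussion below calls out is visible here: the rewrite only works while no legitimate offset ever reaches 42.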

test/lit.cfg.py Outdated
)
)

# Run tests via tt-lang-hw-sim VM (for tests requiring simulator)
zoecarver (author):

Temporary; I will remove this once I build up the courage to fight the lit configs again.

# UNSUPPORTED: system-darwin
# RUN: %python -m pytest %s -v

# NOT YET RUNNING: this is just a placeholder for how this could look.
zoecarver (author):

Also would love your thoughts on this pytest strategy.

brnorris03 (Contributor) commented Dec 26, 2025:

I put together some pytest scaffolding for the middle end (but re-usable for DSL inputs, too) at #167 -- it should enable very minimalistic test definitions (with single ops fully auto-generated, as you already started here). I think it would be great to have a single test base that does all the logistics of compiling and running.

In any case, after you merge this, I should be able to consolidate the DSL-based and middle-end end-to-end tests if the design works as expected.
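The "single test base" idea could look roughly like this; the class names and the trivial host backend are invented for illustration, not taken from #167.

```python
class KernelTestBase:
    # Hypothetical base class: subclasses supply compile_and_run, and
    # the base handles the compare-against-reference logistics.
    def compile_and_run(self, kernel, *inputs):
        raise NotImplementedError

    def check(self, kernel, reference, *inputs):
        actual = self.compile_and_run(kernel, *inputs)
        expected = reference(*inputs)
        assert actual == expected, f"{actual!r} != {expected!r}"


class HostSimTest(KernelTestBase):
    # Trivial "backend" for illustration: runs the kernel directly on host.
    def compile_and_run(self, kernel, *inputs):
        return kernel(*inputs)
```

A test then only names the kernel and its reference, keeping each test definition minimal.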

@zoecarver zoecarver marked this pull request as ready for review December 24, 2025 00:40
@zoecarver zoecarver requested a review from a team as a code owner December 24, 2025 00:40
)


def create_stream_layout_for_input(
Contributor:

Should this be removed?

zoecarver (author):

Good catch. Was able to remove a lot of stuff in the last commit :)

core: Target core coordinates for multicast
mcast: Multicast dimensions
"""
return d2m.semaphore_set(
Contributor:

Do we have corresponding ttl ops?

zoecarver (author):

Not yet, I suggest we do this after the fact given the size of this PR and the prerequisite of semaphore support landing.

Contributor:

would be good to create a separate PR for this bugfix

Comment on lines +438 to +443
// TODO(XX): Placeholder CTA offsets - Python regex replaces 42+argIdx
// with actual offsets from ttnn.TensorAccessorArgs.get_compile_time_args().
auto argIdx = getTensorFuncArgIndex(tensor);
int32_t ctaPlaceholder =
42 + (failed(argIdx) ? 0 : static_cast<int32_t>(*argIdx));
constexpr int32_t crtaPlaceholder = 0;
Contributor:

Regex replacement in the C++? Sounds kind of risky... Can you instead track the actual compile-time args offsets during MLIR generation? Something like this:

  1. Compute args_per_tensor early in _compile_and_run_kernel (where actual tensors are available):

    args_per_tensor = len(ttnn.TensorAccessorArgs(args[0]).get_compile_time_args())
  2. Pass it through the compilation pipeline as a kwarg to TTLGenericCompiler.__init__ and store it.

  3. In ttl_ast.py _emit_entry, compute CTA offsets as tensor arguments are collected:

    cta_offset = tensor_index * self.args_per_tensor
  4. Attach each CTA offset as a function argument attribute when creating the func.FuncOp:

    self.func_entry.setArgAttr(i, "ttl.cta_offset", IntegerAttr.get(..., cta_offset))
  5. When compiling, in materializeTensorAccessor, we'd just read the attribute from the function argument
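The offset computation in step 3 can be sketched with a small helper (hypothetical name, assuming every tensor contributes the same number of compile-time args):

```python
def cta_offsets(num_tensors: int, args_per_tensor: int) -> list[int]:
    # Step 3 above: each tensor's compile-time args occupy a contiguous
    # block, so accessor i starts at offset i * args_per_tensor.
    return [i * args_per_tensor for i in range(num_tensors)]
```

For example, three tensors with four compile-time args each yield offsets [0, 4, 8].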

brnorris03 (Contributor) commented Dec 26, 2025:

Although this may be overkill? I re-read the documentation and played a bit with my examples: you always start the args at 0, then call next_compile_time_args_offset() for the rest of the args. For example, for the simple binary op below, the CTA offset is always 0. When wouldn't it be 0 for ttl dm threads?

What runtime parameters do we have at the moment?

  const uint32_t a_addr = get_arg_val<uint32_t>(0);
  const uint32_t b_addr = get_arg_val<uint32_t>(1);
  const uint32_t n_tiles = get_arg_val<uint32_t>(2);
  const uint32_t start_id =
      get_arg_val<uint32_t>(3); // Starting tile ID for this core

  constexpr auto cb_in0 = tt::CBIndex::c_0;
  constexpr auto cb_in1 = tt::CBIndex::c_1;

  const uint32_t tile_size_bytes = get_tile_size(cb_in0);

  constexpr auto args_a = TensorAccessorArgs<0>();
  constexpr auto args_b =
      TensorAccessorArgs<args_a.next_compile_time_args_offset()>();

  const auto a = TensorAccessor(args_a, a_addr, tile_size_bytes);
  const auto b = TensorAccessor(args_b, b_addr, tile_size_bytes);

I think in our case this should always be correct, right? And if not 0, you control the args, so you would know which is the first tensor (accessor) arg; but at the moment I don't think we have any others, so it should be 0.

// First accessor at offset 0
auto args1 = TensorAccessorArgs<0>();
// Second accessor starts after first one's CTA args
auto args2 = TensorAccessorArgs<args1.next_compile_time_args_offset()>();
// Third accessor starts after second one's CTA args  
auto args3 = TensorAccessorArgs<args2.next_compile_time_args_offset()>();
...

or if we have runtime args at some point:

// First tensor accessor starts at CRTA offset 0
auto args_src = TensorAccessorArgs<0, 0>();

// Second tensor accessor starts after the first one's CRTA args
auto args_dst = TensorAccessorArgs<args_src.next_compile_time_args_offset(), 
                                   args_src.next_common_runtime_args_offset()>();

I checked and ttkernel doesn't support this yet, but I can add it if needed (first would try to figure out if we can do entirely in tt-lang).
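The chaining above can be modeled with a small helper (hypothetical name). Unlike a flat i * args_per_tensor scheme, it also covers accessors whose compile-time arg counts differ:

```python
def chained_cta_offsets(args_counts: list[int]) -> list[int]:
    # Mirrors the TensorAccessorArgs chaining shown above: accessor 0
    # starts at offset 0, and accessor i starts where accessor i-1's
    # compile-time args end (next_compile_time_args_offset()).
    offsets = []
    offset = 0
    for count in args_counts:
        offsets.append(offset)
        offset += count
    return offsets
```

For example, accessors contributing 4, 4, and 6 compile-time args start at offsets [0, 4, 8].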

Contributor:

I propose that this (hacky) fix is split out from this PR and merged first. In the meantime I am working on adding support for CTA and CRTA argument chaining as shown above (to tt-mlir). When that is merged, we can get rid of the hack and update the AST code generator to include the base CTA and CRTA indices in the dm threads.

brnorris03 (Contributor) left a review:

Much offline discussion on the tensor accessors, but nothing that would stop merging this. I would like to get the current e2e working within some limited context (with known next steps to remove limitations).

Comment on lines 338 to 343
args_per_tensor = len(ttnn.TensorAccessorArgs(args[0]).get_compile_time_args())

# Write all kernels to /tmp for debugging
for name, thread_type in kernel_info:
cpp_source = ttkernel_to_cpp_by_name(module, name)
_write_kernel_to_tmp(name, cpp_source)
_write_kernel_to_tmp(name, cpp_source, args_per_tensor)
Contributor:

Assumes args_per_tensor is the same for all tensors, which is not necessarily true: we could have different-layout tensors in non-elementwise kinds of functions (but OK for now).
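The per-tensor variant could be sketched as below; get_cta is a stand-in for ttnn.TensorAccessorArgs(t).get_compile_time_args(), injected here so the sketch stays self-contained.

```python
def args_per_tensor_list(tensors, get_cta):
    # Compute the compile-time arg count for each tensor individually
    # instead of assuming tensors[0] is representative, which handles
    # mixed-layout tensors in non-elementwise functions.
    return [len(get_cta(t)) for t in tensors]
```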

zoecarver (author):

Now it is just num_tensors = len(args); does that sound good?

Contributor:

minor: consider putting the invalid test in a subdirectory

Comment on lines +43 to +45
lhs_accessor = TensorAccessor(lhs)
rhs_accessor = TensorAccessor(rhs)
out_accessor = TensorAccessor(out)
Contributor:

I am starting to wonder whether having these explicitly specified by the user has any value at all. If we generate them, we can ensure consistency among reads/compute/writes. I guess that's more of a discussion point, not something you need to address in this PR.

zoecarver (author) commented Dec 26, 2025:

I think we agreed that TAs can go away (in the spec at least)

// CircularBufferType
//===--------------------------------------------------------------------===//

tt_type_class<CircularBufferType>(m, "CircularBufferType")
Contributor:

thank you for adding it!

brnorris03 (Contributor) left a review:

LGTM!

@zoecarver zoecarver enabled auto-merge (squash) December 27, 2025 00:24
@zoecarver zoecarver force-pushed the zoecarver/ttl-python-bindings-take-three branch from 126e1bd to c4ef41e Compare December 27, 2025 00:24
@zoecarver zoecarver merged commit 3fd3933 into main Dec 27, 2025
5 checks passed
@zoecarver zoecarver deleted the zoecarver/ttl-python-bindings-take-three branch December 27, 2025 00:27