[TTL] Fuse DMA tile loops via pre-conversion grouping #95

brnorris03 · 2025-12-15T07:44:32Z

What?

Restructure TTL-to-TTKernel lowering to emit setup ops before tile loops, enabling DMA tile loop fusion directly during conversion.

Why?

Previously, setup ops (tensor accessor creation, CB pointer retrieval) were emitted inline before each tile loop. Because these ops sat between the loops, the loops were no longer adjacent, which blocked the FuseSiblingTileLoops pass from fusing them. This change emits all setup ops first, followed by a single fused loop for copies with matching tile grids.

Pre-conversion grouping is more efficient than pattern-based lowering followed by post-hoc fusion: setup ops are emitted once before the fused loop rather than once before each individual loop.

How?

- Add pre-conversion grouping in ConvertTTLToTTKernel.cpp that collects adjacent copy ops with matching tile grid bounds.
- Emit fused loops directly during conversion (setup block → single tile loop with all DMAs).
- Recursively process nested regions (e.g., scf.for loop bodies).
- Partial fusion: when dominance fails mid-group, split the group into subgroups and fuse what can be fused (e.g., 4 copies with a CB bind in the middle -> two fused loops instead of four separate loops).
- Remove the fuse-tile-loops pipeline option (no longer needed).
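The grouping and partial-fusion steps above can be sketched in plain C++ as follows. This is a hypothetical model, not the code in ConvertTTLToTTKernel.cpp: Op, Group, groupCopies, and the rule that a fused loop is emitted at the position of its group's first copy are assumptions for illustration, and an index comparison stands in for a real dominance check.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of an op in a block. In the real lowering these
// would be MLIR operations and the availability check would use dominance info.
struct Op {
  bool isCopy = false;                   // DMA copy vs. another op (e.g. a CB bind)
  std::array<long, 2> tileGrid{0, 0};    // tile grid bounds; meaningful for copies only
  std::vector<std::size_t> operandDefs;  // indices of the ops defining this op's operands
};

using Group = std::vector<std::size_t>;

// Groups copies with matching tile grids. Each group is assumed to become one
// fused loop emitted at the position of its first copy, so a copy may only join
// the current group if all of its operands are defined before that position.
// When that fails, a new subgroup starts (partial fusion) instead of giving up.
std::vector<Group> groupCopies(const std::vector<Op>& ops) {
  std::vector<Group> groups;
  Group current;
  auto flush = [&] {
    if (!current.empty()) groups.push_back(current);
    current.clear();
  };
  for (std::size_t i = 0; i < ops.size(); ++i) {
    const Op& op = ops[i];
    if (!op.isCopy) continue;  // simplification: only copies are grouped here
    if (!current.empty()) {
      const std::size_t insertPoint = current.front();
      const bool sameGrid = ops[insertPoint].tileGrid == op.tileGrid;
      bool operandsAvailable = true;
      for (std::size_t def : op.operandDefs)
        if (def >= insertPoint) operandsAvailable = false;
      if (!sameGrid || !operandsAvailable) flush();
    }
    current.push_back(i);
  }
  flush();
  return groups;
}
```

With a CB bind before copies 1-2 and another before copies 4-5, all four copies share a 2x2 tile grid, but copies 4 and 5 use a CB defined after copy 1, so the sketch returns two groups, {1, 2} and {4, 5}, mirroring the four-copies-to-two-loops example.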

How to Test?

llvm-lit -sv test/ttlang/Conversion/TTLToTTKernel/
llvm-lit -sv test/ttlang/Translate/TTLToCpp/

Checklist:

Self-reviewed (style, logic)
Added tests (or justified none needed)
PR is small and focused (one task)


- Added assertSupportedLayoutForTileLoop(), which asserts that sharded layouts are not yet supported, with a reference to issue #118.
- Updated getTileGridShape() to take a Location parameter and call the assertion.
- Updated getTileGridShapeFromValue() to pass v.getLoc() to getTileGridShape().
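A minimal sketch of the layout guard and tile-grid computation described above, assuming 32x32 tiles and ceil division for partial tiles. The real functions operate on MLIR types and take a Location; here the Layout enum, the integer parameters, and the rounding behavior are hypothetical simplifications.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the tensor layout attribute.
enum class Layout { Interleaved, Sharded };

// Tenstorrent tiles are 32x32 elements.
constexpr std::int64_t kTileDim = 32;

// Sharded layouts are not yet supported by the tile-loop lowering (issue #118).
void assertSupportedLayoutForTileLoop(Layout layout) {
  assert(layout != Layout::Sharded && "sharded layouts are not yet supported (#118)");
  (void)layout;  // avoid an unused-parameter warning when NDEBUG is defined
}

// Number of tiles along each dimension (assumption: partial tiles round up).
std::array<std::int64_t, 2> getTileGridShape(std::int64_t rows, std::int64_t cols,
                                             Layout layout) {
  assertSupportedLayoutForTileLoop(layout);
  return {(rows + kTileDim - 1) / kTileDim, (cols + kTileDim - 1) / kTileDim};
}
```

Two copies are fusion candidates in this model only when their tile grids compare equal, e.g. a 64x96 tensor and another 64x96 tensor both yield a 2x3 grid.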

zoecarver · 2025-12-17T15:43:39Z

Thinking out loud: so the main motivator for the compute op is that we can fuse trivially, but these are data movement ops, so we can't use a compute op and have to do the fusing more manually? Is there any way we could abstract that further?

zoecarver · 2025-12-17T15:46:34Z

lib/Dialect/TTKernel/Transforms/FuseSiblingTileLoops.cpp

+
+namespace {
+
+static constexpr llvm::StringLiteral kTileLoopMarker = "ttkernel.tile_loop";


nit: duplicate definition

zoecarver · 2025-12-17T15:56:26Z

lib/Dialect/TTKernel/Transforms/FuseSiblingTileLoops.cpp

+}
+
+/// Check if two loops are adjacent in the same block with only constants
+/// between them.


What if these alias?

```
Lowered UNFUSED (correct):
for tile in 0..4:
  noc_async_read_tile(tile, A, cb0)    // issue all reads
for tile in 0..4:
  noc_async_write_tile(tile, cb0, B)   // issue all writes to DRAM
for tile in 0..4:
  noc_async_read_tile(tile, B, cb1)    // issue all reads from DRAM
noc_async_read_barrier()   // wait for loop 1
noc_async_write_barrier()  // wait for loop 2 - ALL writes to B complete
noc_async_read_barrier()   // wait for loop 3

Lowered FUSED (loops 2+3 have same bounds):
for tile in 0..4:
  noc_async_read_tile(tile, A, cb0)
for tile in 0..4:
  noc_async_write_tile(tile, cb0, B)   // issue write to B[tile]
  noc_async_read_tile(tile, B, cb1)    // immediately read B[tile] - RACE!
                                       // write is async, hasn't landed yet
noc_async_read_barrier()
noc_async_write_barrier()
noc_async_read_barrier()
```
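One conservative way to guard against this hazard is to refuse fusion whenever two copies in a candidate group touch the same buffer and at least one of them writes it. The sketch below is hypothetical (Dma, mayFuseWithoutAliasing, and identifying buffers by name are assumptions, not the PR's implementation); real IR would compare SSA values and may need alias analysis for views of the same buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one DMA in a candidate fused loop body. Buffers are
// identified by name here; real IR would compare values instead.
struct Dma {
  std::string src;  // buffer read by the transfer
  std::string dst;  // buffer written by the transfer
};

// Conservative check: reject fusion if two copies in the group touch the same
// buffer and at least one of them writes it. Transfers are async and the
// barriers only come after the loop, so such pairs could race per tile.
bool mayFuseWithoutAliasing(const std::vector<Dma>& group) {
  for (std::size_t i = 0; i < group.size(); ++i) {
    for (std::size_t j = i + 1; j < group.size(); ++j) {
      const Dma& a = group[i];
      const Dma& b = group[j];
      const bool rawOrWaw = a.dst == b.src || a.dst == b.dst;  // later op reads/writes what a writes
      const bool war = a.src == b.dst;                          // later op overwrites what a reads
      if (rawOrWaw || war) return false;
    }
  }
  return true;
}
```

For the fused loop in the example (write cb0 to B, then read B into cb1), the check rejects fusion because the second copy reads the buffer the first one writes; copies that read from and write to disjoint buffers are still allowed to fuse, and two copies that only read the same buffer are not treated as a conflict.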

zoecarver · 2025-12-17T15:57:37Z

lib/Dialect/TTL/Transforms/ConvertTTLToTTKernel.cpp

    // corresponding global barrier. Untyped handles are rejected by the
    // verifier, but we also fail the rewrite defensively.
-    auto kind = getTransferKindFromHandleType(adaptor.getXf().getType());
+    auto kind = getTransferKindFromHandleType(op.getXf().getType());


Can you explain this change?

The TTL type is required to determine the transfer direction: the adaptor returns the already-converted operand, whose type no longer carries that information.
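A toy model of why the direction has to come from the original TTL type: the handle type carries the transfer kind, and a wait on the handle lowers to the matching global barrier. TtlTransferHandleType, barrierForWait, and this signature of getTransferKindFromHandleType are hypothetical stand-ins, not the dialect's real C++ API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins: the TTL transfer-handle type records the direction
// of the transfer; a converted handle type without this field could not.
enum class TransferKind { Read, Write };

struct TtlTransferHandleType {
  std::optional<TransferKind> kind;  // std::nullopt models an untyped handle
};

// Returns std::nullopt for untyped handles so the caller can fail the rewrite.
std::optional<TransferKind> getTransferKindFromHandleType(const TtlTransferHandleType& type) {
  return type.kind;
}

// Picks the global barrier that a wait on this handle lowers to.
std::optional<std::string> barrierForWait(const TtlTransferHandleType& type) {
  std::optional<TransferKind> kind = getTransferKindFromHandleType(type);
  if (!kind) return std::nullopt;  // defensive failure path
  return *kind == TransferKind::Read ? "noc_async_read_barrier" : "noc_async_write_barrier";
}
```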

zoecarver · 2025-12-17T15:59:34Z

test/ttlang/Translate/TTLToCpp/dma_multi_tile_1d_fused.mlir

+// CHECK:     }
+
+// Consecutive barriers deduplicated to single barrier.
+// CHECK:   noc_async_read_barrier();


Use CHECK-NOT/CHECK-NEXT here to make sure this is actually deduplicated?

There are many CHECK-NOT checks in other tests; I don't think they need to be everywhere.
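For reference, collapsing consecutive identical barriers can be sketched over a flat list of emitted call names. This is a hypothetical illustration (isBarrier and dedupConsecutiveBarriers are not from the PR); it relies on the fact that a second identical barrier issued with no transfers in between has nothing left to wait for.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True for the global NoC barriers emitted after DMA loops.
bool isBarrier(const std::string& call) {
  return call == "noc_async_read_barrier" || call == "noc_async_write_barrier";
}

// Collapses runs of identical consecutive barriers. Nothing is issued between
// the two calls, so the second one has no outstanding transfers to wait for.
std::vector<std::string> dedupConsecutiveBarriers(std::vector<std::string> calls) {
  auto newEnd = std::unique(calls.begin(), calls.end(),
                            [](const std::string& a, const std::string& b) {
                              return a == b && isBarrier(a);
                            });
  calls.erase(newEnd, calls.end());
  return calls;
}
```

Non-identical neighbors such as a read barrier followed by a write barrier are kept, since each waits on a different class of transfers, and repeated non-barrier calls (e.g. two tile reads) are never merged.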


creation, CB pointer retrieval) before tile loops, enabling DMA tile loop fusion directly during conversion.
- Add pre-conversion grouping that collects adjacent copy ops with matching tile grid bounds and emits fused loops
- Recursively process nested regions (e.g., scf.for loop bodies)
- Remove fuse-tile-loops pipeline option (no longer needed)
- Update test expectations for fused output

- Add dominance check in emitGroupedCopies to prevent use-before-def when CB/tensor operands are defined between copy operations
- Remove TTKernelFuseSiblingTileLoops pass (pre-conversion grouping handles fusion during ConvertTTLToTTKernel)
- Add edge case tests for grouping rejection and multi-tile writes
- Update dma_single_core.mlir to remove FUSED pipeline checks


zoecarver

LGTM! Thank you!

brnorris03 added 9 commits December 14, 2025 16:01

remove unrealized cast

ddfa470

updte tests -- ops.mlir separate from conversions; simplify lowering

d27b90a

add ttl -> C++ tests

ea38edc

fix tests

0add69a

add tile offset computation; add more tests

470e1fe

add more tests

ecf5b54

hoist tensoraccessor creation to func entry

645caa2

hoist tensoraccessors to beginning of function; update tests

b460baa

fuse sibling dm kernel loops

d548ec6

brnorris03 changed the base branch from main to bnorris/ttl-dm-kernel-lowering December 15, 2025 07:45

move pass to ttkernel dir

99dfbfb

brnorris03 force-pushed the bnorris/ttl-dm-kernel-lowering-fuse-sibling-loops branch from a52dfaa to 99dfbfb Compare December 15, 2025 14:48

add tile loop marker

88f4f60

brnorris03 mentioned this pull request Dec 15, 2025

[ttl] Implement block lowering of ttl data movement threads #94

Merged


Base automatically changed from bnorris/ttl-dm-kernel-lowering to main December 15, 2025 22:12

brnorris03 added 3 commits December 15, 2025 17:58

Merge remote-tracking branch 'origin/main' into bnorris/ttl-dm-kernel…

84e5497

…-lowering-fuse-sibling-loops

post-merge fixes; add ttl-to-ttkernel tests for fused sibling loops

168e0ea

brnorris03 force-pushed the bnorris/ttl-dm-kernel-lowering-fuse-sibling-loops branch from 4806159 to c5913c0 Compare December 16, 2025 03:01

add actual implementation

6d6b1e8

brnorris03 changed the title ~~[ttl] kernel lowering fuse sibling loops in dm threads~~ [ttkernel] Fuse sibling loops in dm threads Dec 16, 2025

brnorris03 marked this pull request as ready for review December 16, 2025 03:33

brnorris03 requested a review from a team as a code owner December 16, 2025 03:33

brnorris03 requested a review from zoecarver December 16, 2025 03:33

brnorris03 changed the title ~~[ttkernel] Fuse sibling loops in dm threads~~ [ttkernel] Fuse generated sibling loops in dm threads Dec 16, 2025

brnorris03 force-pushed the bnorris/ttl-dm-kernel-lowering-fuse-sibling-loops branch 2 times, most recently from e5a511b to 9554082 Compare December 16, 2025 04:04

use tile loop marker and remove it when done; add a couple more tests

d952f07

brnorris03 force-pushed the bnorris/ttl-dm-kernel-lowering-fuse-sibling-loops branch from 9554082 to d952f07 Compare December 16, 2025 04:05

zoecarver reviewed Dec 17, 2025


brnorris03 added 4 commits December 24, 2025 15:44

update agents.md

a958363

Merge remote-tracking branch 'origin/main' into bnorris/ttl-dm-kernel…

5c3cc39

…-lowering-fuse-sibling-loops

post-merge fixes

ebdc74b

brnorris03 changed the title ~~[ttkernel] Fuse generated sibling loops in dm threads~~ [TTL] Fuse DMA tile loops via pre-conversion grouping Dec 25, 2025

brnorris03 added 3 commits December 24, 2025 20:24

precommit format

ab5baba

add tests for more edge cases for dma groupings; remove ttkernel.tile…

046cb6c

…_loop marker

brnorris03 force-pushed the bnorris/ttl-dm-kernel-lowering-fuse-sibling-loops branch from 6685250 to 7be41b3 Compare December 25, 2025 06:20

- Improve DMA grouping to split groups into subgroups when dominance …

5429a7c

…fails, enabling partial fusion instead of falling back to no fusion. - Add partial_fusion_four_copies test verifying two fused loops are generated when CB bindings break the chain.

brnorris03 force-pushed the bnorris/ttl-dm-kernel-lowering-fuse-sibling-loops branch from 7be41b3 to 5429a7c Compare December 25, 2025 06:21

brnorris03 added 2 commits December 26, 2025 10:38

clean up

726d562

fix non-portable 1LL literal usage

8f67eec

zoecarver approved these changes Dec 29, 2025
