[TTL] Add block transfer optimization framework with layout analysis #166
What?
Adds layout-aware transfer optimization to TTL-to-TTKernel lowering. The lowering analyzes tensor layouts and selects optimal DMA strategies based on data contiguity.
Why?
Block transfers (`noc_async_read`/`noc_async_write`) are more efficient than tile-by-tile transfers for row-major data. This enables faster transfers when the data is contiguous in memory.
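For intuition, a hedged kernel-side fragment contrasting the two approaches, written in the style of the tt-metal dataflow calls named above (the buffer addresses, sizes, and helper functions here are illustrative assumptions, not code from this PR):

```cpp
// Illustrative sketch only; assumes the tt-metal dataflow primitives
// noc_async_read / noc_async_read_barrier are in scope (dataflow_api.h).

void read_tile_by_tile(uint64_t src_noc_addr, uint32_t dst_l1_addr,
                       uint32_t num_tiles, uint32_t tile_bytes) {
  // One NOC transaction per tile: per-transaction overhead is paid num_tiles times.
  for (uint32_t t = 0; t < num_tiles; ++t)
    noc_async_read(src_noc_addr + t * tile_bytes,
                   dst_l1_addr + t * tile_bytes, tile_bytes);
  noc_async_read_barrier();
}

void read_as_block(uint64_t src_noc_addr, uint32_t dst_l1_addr,
                   uint32_t num_tiles, uint32_t tile_bytes) {
  // A single larger transaction when the source data is contiguous,
  // amortizing the per-transaction overhead across the whole block.
  noc_async_read(src_noc_addr, dst_l1_addr, num_tiles * tile_bytes);
  noc_async_read_barrier();
}
```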
How?
Introduces a `ContiguityLevel` classification: the lowering inspects `TTNNLayoutAttr` to determine whether the tensor uses tiles or scalar elements, then dispatches to the appropriate transfer strategy.
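A minimal sketch of what such a classification and dispatch could look like; the enum values, `classifyLayout`, and the boolean layout queries below are illustrative assumptions rather than the PR's actual code:

```cpp
// Hypothetical sketch -- names and layout predicates are assumptions.
enum class ContiguityLevel {
  FullyContiguous, // whole buffer is one contiguous run -> single block DMA
  RowContiguous,   // each row is contiguous -> one block transfer per row
  NonContiguous    // no exploitable contiguity -> tile-by-tile transfers
};

// Classify the tensor from facts derived from its layout attribute
// (e.g. TTNNLayoutAttr) and pick a transfer strategy accordingly.
ContiguityLevel classifyLayout(bool isTiled, bool bufferContiguous,
                               bool rowsContiguous) {
  if (isTiled)
    return ContiguityLevel::NonContiguous; // tiled data keeps the tile path
  if (bufferContiguous)
    return ContiguityLevel::FullyContiguous;
  return rowsContiguous ? ContiguityLevel::RowContiguous
                        : ContiguityLevel::NonContiguous;
}
```

Keeping tile-by-tile as the fallback means the block-transfer path only applies where the layout analysis can prove contiguity.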
Deferred
How to Test?
Checklist: