@brnorris03
Contributor

Description

This pull request introduces a fused elementwise operation example for TTNN, showing how to efficiently perform chained elementwise computations (such as Output = exp(A + B) + C) without storing intermediate results in memory. The implementation maximizes DST register utilization, processes tiles in blocks for better memory access patterns, and reduces memory bandwidth by eliminating intermediate reads/writes. The changes include a detailed README, a compute kernel for the fused operation, and a ternary reader kernel for block-based input loading.
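To make the fusion idea concrete, here is a minimal host-side sketch in plain C++. It is not the kernel from this PR and does not use any TTNN or TT-Metal API. The function names `unfused_exp_add` and `fused_exp_add` are hypothetical, and the local variable `acc` only stands in for a value held in a DST register. The unfused version materializes two intermediate buffers, while the fused version computes exp(A + B) + C in a single pass.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused reference: each step writes a full intermediate buffer to memory.
std::vector<float> unfused_exp_add(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   const std::vector<float>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<float> sum(a.size());  // intermediate 1: A + B
    for (std::size_t i = 0; i < a.size(); ++i) sum[i] = a[i] + b[i];
    std::vector<float> act(a.size());  // intermediate 2: exp(A + B)
    for (std::size_t i = 0; i < a.size(); ++i) act[i] = std::exp(sum[i]);
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = act[i] + c[i];
    return out;
}

// Fused version: one pass over the inputs, no intermediate buffers.
// `acc` is a stand-in for a value that stays in a register between steps.
std::vector<float> fused_exp_add(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        float acc = a[i] + b[i];  // step 1: A + B
        acc = std::exp(acc);      // step 2: unary op
        out[i] = acc + c[i];      // step 3: + C, the only write to memory
    }
    return out;
}
```

Both functions return the same values. The difference is that the fused form writes each output element once and never reads or writes the intermediate results, which is the bandwidth saving described above.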

  • Implemented kernels/compute/fused_elementwise.cpp: a block-based compute kernel that fuses three operations in DST registers: A + B, a unary op (exp or relu), and a final + C. It processes up to 4 tiles per DST cycle and handles remainder tiles for arbitrary input sizes.

  • Added kernels/dataflow/reader_ternary.cpp: a reader kernel that loads tiles from three input tensors in blocks, reserves buffer space efficiently, and synchronizes reads with a single barrier for improved performance.

  • The kernels avoid intermediate buffer storage, use block-based synchronization, and maximize DST register usage to reduce DRAM accesses and improve throughput.
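The block and remainder handling described in the list above can be sketched on the host as follows. This is an illustrative model only, not the code in kernels/compute/fused_elementwise.cpp. The names `block_sizes`, `fused_blocked`, `UnaryOp`, and `apply_unary` are hypothetical, and the 32x32 tile size is an assumption. The limit of 4 tiles per block is taken from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kTileElems = 32 * 32;   // assumed tile shape (32x32 elements)
constexpr std::size_t kMaxTilesPerBlock = 4;  // from the PR: up to 4 tiles per DST cycle

enum class UnaryOp { Exp, Relu };

inline float apply_unary(UnaryOp op, float x) {
    return op == UnaryOp::Exp ? std::exp(x) : std::max(x, 0.0f);
}

// Split num_tiles into full blocks of kMaxTilesPerBlock plus one
// smaller remainder block when num_tiles is not a multiple of it.
std::vector<std::size_t> block_sizes(std::size_t num_tiles) {
    std::vector<std::size_t> sizes;
    for (std::size_t t = 0; t < num_tiles; t += kMaxTilesPerBlock) {
        sizes.push_back(std::min(kMaxTilesPerBlock, num_tiles - t));
    }
    return sizes;
}

// Apply unary(A + B) + C block by block. Each iteration of the outer
// loop models one acquire/compute/release cycle over up to 4 tiles.
std::vector<float> fused_blocked(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& c,
                                 std::size_t num_tiles, UnaryOp op) {
    const std::size_t total = num_tiles * kTileElems;
    assert(a.size() == total && b.size() == total && c.size() == total);
    std::vector<float> out(total);
    std::size_t tile = 0;
    for (std::size_t n : block_sizes(num_tiles)) {
        const std::size_t begin = tile * kTileElems;
        const std::size_t end = (tile + n) * kTileElems;
        for (std::size_t e = begin; e < end; ++e) {
            out[e] = apply_unary(op, a[e] + b[e]) + c[e];
        }
        tile += n;
    }
    return out;
}
```

With 10 tiles, for example, `block_sizes(10)` yields blocks of 4, 4, and 2 tiles, so the final block is the remainder case the compute kernel has to handle.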

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Build/CI changes

Related Issues

Testing

Test Configuration

  • OS:
  • Python version:
  • Compiler:

Tests Added/Modified

  • Unit tests added
  • Integration tests added
  • Existing tests updated
  • All tests pass locally

MLIR Changes (if applicable)

  • Verified with TTLANG_VERBOSE_PASSES=1
  • Checked initial and final MLIR outputs
  • No unexpected IR transformations

Checklist

  • Code follows the project's style guidelines
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Self-review completed
  • Comments added for complex code
  • Documentation updated (if needed)
  • No new compiler warnings
  • Breaking changes documented (if any)
  • CHANGELOG.md updated (if needed)

Additional Notes

@brnorris03 brnorris03 force-pushed the bnorris/ttnn-elementwise-example branch from 91f7a49 to 4cd07ef Compare December 8, 2025 22:02
