Replacing L1 base address increment instructions with CFGSHIFTMASK #17723

Merged
atatuzunerTT merged 4 commits into main from atuzuner/CFGSHIFTMASK on Feb 19, 2025

Conversation

atatuzunerTT (Contributor) commented Feb 7, 2025

Ticket

Link to GitHub Issue: tenstorrent/tt-llk-bh#4

Problem description

Blackhole has a new CFGSHIFTMASK instruction that can update addresses for the unpacker instructions inside the MOP/replay buffers. If an operation is unpacker-bound, using this instruction should improve performance.

What's changed

Replaced the L1 base address increment code, which used cfg read/write and TDMA GPR operations, with the new CFGSHIFTMASK instruction in the unpack AB matmul LLK API. This replacement saves 6 instructions in the MOP replay buffer. No notable performance improvement was observed.

Only affects Blackhole (BH) and addresses an issue in the BH third-party repo.
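
To illustrate the shape of the change, here is a minimal toy model, not the actual LLK code or the Tensix ISA. `ToyCore`, `increment_via_gpr`, and `increment_in_place` are hypothetical names, and the masked add is only an assumed stand-in for what a single in-place config update could look like; the real `CFGSHIFTMASK` operands and semantics are not shown in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: every name here is hypothetical and not part of the LLK API.
struct ToyCore {
    std::array<std::uint32_t, 8> cfg{};  // stand-in for config registers
    std::array<std::uint32_t, 8> gpr{};  // stand-in for TDMA GPRs
    std::uint32_t scratch = 0;           // stand-in for a preloaded increment
    int issued = 0;                      // number of modelled instructions issued
};

// Old style: read the config register into a GPR, add in the GPR, write it back.
void increment_via_gpr(ToyCore& c, std::size_t cfg_idx,
                       std::size_t tmp_gpr, std::size_t inc_gpr) {
    c.gpr[tmp_gpr] = c.cfg[cfg_idx];   ++c.issued;  // cfg read
    c.gpr[tmp_gpr] += c.gpr[inc_gpr];  ++c.issued;  // GPR add
    c.cfg[cfg_idx] = c.gpr[tmp_gpr];   ++c.issued;  // cfg write
}

// New style (assumed semantics): one read-modify-write on the config register,
// where only the bits selected by `mask` are updated.
void increment_in_place(ToyCore& c, std::size_t cfg_idx, std::uint32_t mask) {
    const std::uint32_t old_val = c.cfg[cfg_idx];
    const std::uint32_t updated = (old_val + c.scratch) & mask;
    c.cfg[cfg_idx] = (old_val & ~mask) | updated;
    ++c.issued;                                     // single modelled instruction
}
```

In this model both paths produce the same base address, but the in-place path issues one modelled instruction per address instead of three. The counts are properties of the toy model only and are not meant to reproduce the 6 instructions saved in the real MOP replay buffer.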

Checklist

atatuzunerTT force-pushed the atuzuner/CFGSHIFTMASK branch 2 times, most recently from 24313ee to 3bc159f, February 18, 2025 20:50
atatuzunerTT marked this pull request as ready for review February 19, 2025 14:27
atatuzunerTT merged commit b26e037 into main Feb 19, 2025
12 checks passed
atatuzunerTT deleted the atuzuner/CFGSHIFTMASK branch February 19, 2025 14:52
dgomezTT pushed a commit that referenced this pull request Feb 19, 2025
…17723)

### Ticket
[Link to Github
Issue](tenstorrent/tt-llk-bh#4)

### Problem description
Blackhole has a new `CFGSHIFTMASK` instruction that can update addresses
for the unpacker instructions inside the MOP/replay buffers. If an
operation is unpacker-bound, using this instruction should improve
performance.

### What's changed
Replaced the L1 base address increment code, which used cfg read/write
and TDMA GPR operations, with the new `CFGSHIFTMASK` instruction in the
unpack AB matmul LLK API. This replacement saves 6 instructions in the
MOP replay buffer. No notable performance improvement was observed.

Only affects Blackhole (BH) and addresses an issue in the BH third-party
repo.

### Checklist
- [x] [All post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml) [CI passes](https://github.com/tenstorrent/tt-metal/actions/runs/13399863311)
- [x] [Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml) [CI passes](https://github.com/tenstorrent/tt-metal/actions/runs/13399865409) (if applicable)
- [ ] [Model regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) CI passes (if applicable)
- [ ] [Device performance regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) CI passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models tests](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) CI passes (if applicable)
- [ ] New/Existing tests provide coverage for changes
hschoi4448 pushed a commit that referenced this pull request Feb 20, 2025
…17723)

3 participants