Xuncai/all gather fixes #17155

caixunshiren · 2025-01-27T18:05:52Z

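This PR fixes two issues exposed when integrating CCL async into TG Llama:

- Program caching does not handle input tensor specs properly.
- All gather does not handle subdevices properly.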

SeanNijjar

Do you know if there are any tests you can add along with the PR? Is there one from model team that is usable?

caixunshiren · 2025-01-30T20:47:28Z

> Do you know if there are any tests you can add along with the PR? Is there one from model team that is usable?

Would it be better if I add the test along with my CCL minimal PR, so that these specific CCL shapes can be covered?

### Ticket

This PR fixes two issues exposed when integrating CCL async into TG Llama:

- Program caching does not handle input tensor specs properly
- All gather does not handle subdevices properly

### Checklist

- [x] Post commit CI passes: https://github.com/tenstorrent/tt-metal/actions/runs/13059537328
- [x] TG Nightly: https://github.com/tenstorrent/tt-metal/actions/runs/13059546044
- [x] TG unit frequent: https://github.com/tenstorrent/tt-metal/actions/runs/13059554924
- [x] T3K: https://github.com/tenstorrent/tt-metal/actions/runs/13059564134

---------

Co-authored-by: avoraTT <[email protected]>
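To make the program-caching issue concrete, here is a minimal Python sketch of why a program cache key should include the input tensor specs and not only the op attributes. Everything in it is hypothetical and is not the tt-metal API: `TensorSpec`, `AllGatherAttrs`, `ProgramCache`, `cache_key`, `attrs_only_key`, and `run_all_gather` are illustrative names, and `attrs_only_key` is just one plausible way a cache could "not handle input tensor specs properly".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple


# Hypothetical stand-in for the input properties a compiled program depends on.
@dataclass(frozen=True)
class TensorSpec:
    shape: Tuple[int, ...]
    dtype: str
    layout: str
    memory_config: str


# Hypothetical op attributes for an all-gather call.
@dataclass(frozen=True)
class AllGatherAttrs:
    dim: int
    num_links: int


class ProgramCache:
    """Toy cache mapping a hashable key to a 'compiled program' (here, a string)."""

    def __init__(self) -> None:
        self._programs: Dict[Hashable, str] = {}
        self.compilations = 0

    def get_or_compile(self, key: Hashable, compile_fn: Callable[[], str]) -> str:
        # Compile only on a cache miss; otherwise reuse the stored program.
        if key not in self._programs:
            self._programs[key] = compile_fn()
            self.compilations += 1
        return self._programs[key]


def cache_key(attrs: AllGatherAttrs, inputs: Tuple[TensorSpec, ...]) -> Hashable:
    # Key on attributes AND input specs: different shapes, layouts, or memory
    # configs get their own program.
    return (attrs, inputs)


def attrs_only_key(attrs: AllGatherAttrs, inputs: Tuple[TensorSpec, ...]) -> Hashable:
    # Hypothetical buggy key: ignores the inputs, so calls with the same
    # attributes but different inputs collide on one cached program.
    return attrs


def run_all_gather(cache, attrs, inputs, key_fn) -> str:
    key = key_fn(attrs, inputs)
    return cache.get_or_compile(key, lambda: f"program(dim={attrs.dim}, inputs={inputs})")


attrs = AllGatherAttrs(dim=3, num_links=1)
a = TensorSpec((1, 1, 32, 128), "bfloat16", "TILE", "DRAM")
b = TensorSpec((1, 1, 32, 256), "bfloat16", "TILE", "L1_SHARDED")

good = ProgramCache()
for spec in (a, b, a):
    run_all_gather(good, attrs, (spec,), cache_key)
print(good.compilations)  # 2: one program per distinct input spec

bad = ProgramCache()
for spec in (a, b):
    run_all_gather(bad, attrs, (spec,), attrs_only_key)
print(bad.compilations)  # 1: the call with `b` reused the program built for `a`
```

With the attributes-only key, the second call silently gets a program built for a different input shape and memory config, which is the kind of mismatch that including the input specs in the key avoids.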

caixunshiren self-assigned this Jan 27, 2025

caixunshiren requested review from SeanNijjar, jvegaTT and tt-aho as code owners January 27, 2025 18:05

caixunshiren added the P1 and op_cat: ccl labels Jan 27, 2025

caixunshiren and others added 2 commits January 30, 2025 14:04

added proper program caching in all gather and reduce scatter

433f602

Adding subdevice support in all gather. doesn't fix hang on synchronize devices.

e5564d4

caixunshiren force-pushed the xuncai/all-gather-fixes branch from dfc4c8b to e5564d4 January 30, 2025 19:04

SeanNijjar approved these changes Jan 30, 2025


caixunshiren merged commit a53f8dd into main Jan 31, 2025
205 of 245 checks passed

caixunshiren deleted the xuncai/all-gather-fixes branch January 31, 2025 15:26
