@Aya-ZIbra

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2159

This diff adds swizzling to the tile scheduling in the CUTLASS Blackwell (BWD) kernel. It adds L2 cache swizzle parameters, computes them on the host, and uses them to swizzle the order in which tiles are scheduled. The swizzling is intended to improve L2 cache utilization and reduce memory accesses.

Differential Revision: D87572384
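
The swizzled tile scheduling described above can be illustrated with a generic grouped-rasterization sketch. This is not the code from this diff: the function name `swizzled_tile`, the parameter `group_m` (standing in for a host-computed swizzle parameter), and the 2D row/column tile layout are all assumptions made for illustration.

```python
def swizzled_tile(linear_id, num_m, num_n, group_m):
    """Map a linear tile ID to (m, n) using grouped rasterization.

    Hypothetical helper: tiles are walked in bands of `group_m` rows,
    column by column inside each band, instead of plain row-major order.
    """
    tiles_per_group = group_m * num_n
    group_id = linear_id // tiles_per_group
    first_m = group_id * group_m
    # The last band may be shorter than group_m rows.
    group_size = min(num_m - first_m, group_m)
    local = linear_id % tiles_per_group
    m = first_m + local % group_size
    n = local // group_size
    return m, n


def schedule(num_m, num_n, group_m):
    """Return the full swizzled visiting order as a list of (m, n) tiles."""
    return [swizzled_tile(i, num_m, num_n, group_m) for i in range(num_m * num_n)]
```

With `group_m = 2`, consecutive linear IDs stay inside a band of two tile rows, so tiles scheduled close together in time are more likely to reuse data that is still resident in L2. The actual kernel may use a different mapping and different parameters.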

Summary:
X-link: facebookresearch/FBGEMM#2158

As title.

Reviewed By: sryap

Differential Revision: D86386070

Summary:
X-link: facebookresearch/FBGEMM#2073

For causal and local causal masks, we should use the order below in deterministic mode.

```
qi\ki  ki=0  ki=1  ki=2  ki=3  ki=4  ki=5  ki=6  ki=7  ki=8
-----------------------------------------------------------
qi=0   1     0     -     -     -     -     -     -     -
qi=1   2     1     0     -     -     -     -     -     -
qi=2   -     2     1     0     -     -     -     -     -
qi=3   -     -     2     1     0     -     -     -     -
qi=4   -     -     -     2     1     0     -     -     -
qi=5   -     -     -     -     2     1     0     -     -
qi=6   -     -     -     -     -     2     1     0     -
qi=7   -     -     -     -     -     -     2     1     0
```
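
The table above can be reproduced with a short sketch. The reading used here is an assumption inferred from the table alone: each row `qi` covers the `ki` tiles in a window of one tile to the left and one to the right, and the entry is the tile's position in that row's order, counting down from the rightmost valid `ki`. The names `reduction_order`, `left`, and `right` are hypothetical and do not come from the diff.

```python
def reduction_order(num_q, num_k, left=1, right=1):
    """Build the per-row order table; None marks a masked-out tile.

    Assumed rule (inferred from the table): within row qi, the valid
    ki range is [qi - left, qi + right] clipped to [0, num_k - 1], and
    the rightmost valid ki gets order 0.
    """
    table = []
    for qi in range(num_q):
        lo = max(qi - left, 0)
        hi = min(qi + right, num_k - 1)
        row = [hi - ki if lo <= ki <= hi else None for ki in range(num_k)]
        table.append(row)
    return table


def render(table):
    """Format the table with '-' for masked-out tiles."""
    lines = []
    for qi, row in enumerate(table):
        cells = "     ".join("-" if v is None else str(v) for v in row)
        lines.append(f"qi={qi}   {cells}")
    return "\n".join(lines)
```

Calling `render(reduction_order(8, 9))` yields the same rows as the table. Because the order depends only on `qi` and `ki`, it is the same on every run, which is the property deterministic mode needs.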

Reviewed By: v0i0

Differential Revision: D85308820
@meta-cla meta-cla bot added the cla signed label Nov 20, 2025

meta-codesync bot commented Nov 20, 2025

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87572384.