Init moe support #16
airMeng commented on Oct 17, 2025
- Update MoE scatter and gather to the latest main
- Add cutlass-based MoE GEMM
6589915 to 0004dab
}
};

void moe_grouped_mm_nt(
From a BLAS perspective, this one is actually tt. That might confuse some folks.
Yes, I will align with the cutlass side later.
Sorry, I wasn't referring to the cutlass convention. I meant that terms such as nt and tt are BLAS conventions.
cutlass uses terms such as K-major instead.
If A is [M, K] with strides [K, 1], then BLAS still considers it transposed, and uses the notation t for it.
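The stride argument above can be checked with a small sketch. The helper names (`strided_offset`, `col_major_offset`, `row_major_is_blas_transposed`) are hypothetical and are not part of this PR, cutlass, or any BLAS library; the only assumption is that BLAS reads a buffer as column-major with a leading dimension.

```cpp
#include <cstddef>

// Hypothetical helper: linear offset of logical element A(i, j)
// for a 2-D tensor with explicit strides.
constexpr std::size_t strided_offset(std::size_t i, std::size_t j,
                                     std::size_t stride_i, std::size_t stride_j) {
    return i * stride_i + j * stride_j;
}

// Column-major (BLAS-style) addressing: element (r, c) of a matrix
// with leading dimension ld lives at r + c * ld.
constexpr std::size_t col_major_offset(std::size_t r, std::size_t c, std::size_t ld) {
    return r + c * ld;
}

// Returns true when every A(i, j) of an [M, K] tensor with strides [K, 1]
// sits exactly where a column-major K x M matrix (ld = K) stores element (j, i).
// In that case a column-major BLAS sees the buffer as A^T and needs op = 't'
// to recover A.
constexpr bool row_major_is_blas_transposed(std::size_t M, std::size_t K) {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < K; ++j) {
            if (strided_offset(i, j, K, 1) != col_major_offset(j, i, K)) {
                return false;
            }
        }
    }
    return true;
}
```

The check holds for any M and K, which is why an A that is contiguous along K is labeled t in BLAS terms even though nothing in the buffer has been physically transposed.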
using GmemTiledCopyB = XE_2D_U16x16x16_LD_T;

// Workgroup-level tile
using TileShape = Shape<_256, _256, _32>;
Please consider using a smaller workgroup tile M dimension for decoding, maybe 8 or 16.
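A rough way to see the motivation, assuming that during decoding each expert receives only a few token rows (the review does not give numbers, so the values used here are illustrative): rows are padded up to a multiple of the tile M dimension, and everything beyond the real rows is wasted work. The helpers `padded_rows` and `tile_m_utilization` are hypothetical and are not part of this PR or of cutlass.

```cpp
#include <cstddef>

// Hypothetical helper: number of rows actually processed when m rows
// are rounded up to a whole number of tiles of height tile_m.
constexpr std::size_t padded_rows(std::size_t m, std::size_t tile_m) {
    return ((m + tile_m - 1) / tile_m) * tile_m;
}

// Fraction of the processed rows that carry real data (0.0 for an empty group).
constexpr double tile_m_utilization(std::size_t m, std::size_t tile_m) {
    return m == 0
        ? 0.0
        : static_cast<double>(m) / static_cast<double>(padded_rows(m, tile_m));
}
```

For example, an expert with 4 rows uses 4 of 256 rows with a tile M of 256, but 4 of 8 rows with a tile M of 8. Which tile shapes are actually valid depends on the MMA atom and copy operations chosen, so this only sketches the utilization argument, not a drop-in configuration.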