[QST] How to use groupwise scaling along M for FP8 GEMM to implement per-token-per-128-channel and blockwise scaling? #2087
Comments
@yizhang2077 I am also working on this with similar code, which compiles successfully but produces incorrect results. Have you solved this problem?

vLLM's version works.

@xuzhenqi Still not; if you make any progress, please let me know. Thank you very much!

The stride of

PR opened: #2095
What is your question?
Hi, I am trying to use `KernelTmaWarpSpecializedCooperativeFP8BlockScaledAccum` to implement DeepSeek-V3 block-wise FP8 as well as per-token-per-128-channel scaling, but I find it does not work. However, when I just replace `sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp` with the same file from vLLM, it works correctly. I think this commit is critical.
I am not familiar with CUTLASS; can someone help me figure out what the problem is? Thank you very much!
Base code
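For context, here is a minimal sketch of how such a block-scaled FP8 GEMM is typically instantiated with the SM90 collective builders. The tile/cluster shapes, layouts, element types, and alignments below are illustrative assumptions, not the exact configuration from the base code linked above:

```cpp
// Sketch only: instantiating an SM90 FP8 GEMM with the block-scaled kernel
// schedule via the CUTLASS 3.x collective builders. Shapes and layouts are
// assumptions for illustration.
#include "cutlass/cutlass.h"
#include "cutlass/numeric_types.h"
#include "cutlass/gemm/collective/collective_builder.hpp"
#include "cutlass/epilogue/collective/collective_builder.hpp"
#include "cutlass/gemm/kernel/gemm_universal.hpp"
#include "cutlass/gemm/device/gemm_universal_adapter.h"

using namespace cute;

using ElementA   = cutlass::float_e4m3_t;   // FP8 activations
using ElementB   = cutlass::float_e4m3_t;   // FP8 weights
using ElementC   = cutlass::bfloat16_t;
using ElementD   = cutlass::bfloat16_t;
using ElementAcc = float;                   // accumulate (and scale) in FP32

using LayoutA = cutlass::layout::RowMajor;
using LayoutB = cutlass::layout::ColumnMajor;
using LayoutC = cutlass::layout::RowMajor;

// 128x128x128 tile: one scale per 128x128 block of B (blockwise), and with
// groupwise scaling along M, per-token x 128-channel scales for A.
using TileShape    = Shape<_128, _128, _128>;
using ClusterShape = Shape<_1, _1, _1>;

using KernelSchedule   = cutlass::gemm::KernelTmaWarpSpecializedCooperativeFP8BlockScaledAccum;
using EpilogueSchedule = cutlass::epilogue::TmaWarpSpecializedCooperative;

using CollectiveEpilogue = typename cutlass::epilogue::collective::CollectiveBuilder<
    cutlass::arch::Sm90, cutlass::arch::OpClassTensorOp,
    TileShape, ClusterShape,
    cutlass::epilogue::collective::EpilogueTileAuto,
    ElementAcc, ElementAcc,
    ElementC, LayoutC, 128 / cutlass::sizeof_bits<ElementC>::value,
    ElementD, LayoutC, 128 / cutlass::sizeof_bits<ElementD>::value,
    EpilogueSchedule>::CollectiveOp;

using CollectiveMainloop = typename cutlass::gemm::collective::CollectiveBuilder<
    cutlass::arch::Sm90, cutlass::arch::OpClassTensorOp,
    ElementA, LayoutA, 128 / cutlass::sizeof_bits<ElementA>::value,
    ElementB, LayoutB, 128 / cutlass::sizeof_bits<ElementB>::value,
    ElementAcc,
    TileShape, ClusterShape,
    cutlass::gemm::collective::StageCountAutoCarveout<
        static_cast<int>(sizeof(typename CollectiveEpilogue::SharedStorage))>,
    KernelSchedule>::CollectiveOp;

using GemmKernel = cutlass::gemm::kernel::GemmUniversal<
    Shape<int, int, int, int>,   // problem shape (M, N, K, L)
    CollectiveMainloop,
    CollectiveEpilogue>;

using Gemm = cutlass::gemm::device::GemmUniversalAdapter<GemmKernel>;
```

The block-scaled mainloop additionally takes pointers to the FP32 scale tensors for A and B in its arguments; the exact field names and, in particular, the expected strides of those scale tensors differ between CUTLASS versions and the vLLM copy of `sm90_mma_tma_gmma_ss_warpspecialized_fp8_blockwise_scaling.hpp`, which appears to be where the discrepancy discussed in the comments comes from.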