flashinfer shrink vs cutlass #25

Closed
YLGH opened this issue Nov 30, 2023 · 6 comments

Comments

@YLGH

YLGH commented Nov 30, 2023

Hi, I really enjoyed learning about SGMV.

I was reading through the code and wanted to check my understanding. It seems there are two implementations of SGMV: one based on cutlass Grouped GEMM and a hand-written one (using some utilities from flashinfer). How do the two compare in performance?

@abcdabcd987
Contributor

Thanks for taking a close look!

We'll deprecate the cutlass implementation in the future. See discussions here: #2

@YLGH
Author

YLGH commented Nov 30, 2023

Makes sense, thanks!

So is the recommendation to use the hand-written version for shrink:

https://github.com/punica-ai/punica/blob/master/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh

and, in the meantime, the cutlass-based version for expand?

https://github.com/punica-ai/punica/blob/master/csrc/sgmv/sgmv_cutlass.cuh#L81C3-L81C3

@abcdabcd987
Contributor

abcdabcd987 commented Nov 30, 2023

Correct. Once we get time to push out a custom expand, we'll deprecate cutlass. For now, you can use punica.add_lora_sgmv_custom_cutlass() for LoRA.

Related: #11
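
For readers following the thread, here is a minimal plain-Python sketch of what the shrink and expand steps compute in a segmented LoRA update. It only illustrates the math, assuming the usual SGMV definition (each segment of the batch is multiplied by its own adapter weight). The names `sgmv`, `add_lora_reference`, and `seg` are hypothetical and are not Punica's API or kernel signatures.

```python
# Reference semantics only: not Punica's CUDA kernels, API, or signatures.

def matmul(a, b):
    """Multiply two row-major matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def sgmv(y, x, weights, seg):
    """For each segment i: y[seg[i]:seg[i+1]] += x[seg[i]:seg[i+1]] @ weights[i]."""
    for i, w in enumerate(weights):
        lo, hi = seg[i], seg[i + 1]
        prod = matmul(x[lo:hi], w)
        for r, row in enumerate(prod):
            for c, val in enumerate(row):
                y[lo + r][c] += val
    return y

def add_lora_reference(y, x, wa, wb, seg, rank):
    """Shrink (h_in -> rank), then expand (rank -> h_out), accumulating into y."""
    v = [[0.0] * rank for _ in x]
    sgmv(v, x, wa, seg)  # shrink: wide input, narrow output
    sgmv(y, v, wb, seg)  # expand: narrow input, wide output
    return y

# Three requests, two adapters: rows 0-1 use adapter 0, row 2 uses adapter 1.
x = [[1, 2], [3, 4], [5, 6]]
seg = [0, 2, 3]
wa = [[[1], [1]], [[1], [0]]]   # per-adapter shrink weights, each 2 x 1
wb = [[[1, 2]], [[10, 0]]]      # per-adapter expand weights, each 1 x 2
out = add_lora_reference([[0.0, 0.0] for _ in x], x, wa, wb, seg, rank=1)
print(out)  # [[3.0, 6.0], [7.0, 14.0], [50.0, 0.0]]
```

Both steps are the same segmented operation with different weight shapes, which is why one step can use one implementation while the other uses a different one.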

@YLGH
Author

YLGH commented Nov 30, 2023

Sounds great, thanks!

@YLGH YLGH closed this as completed Nov 30, 2023
@jcao-ai

jcao-ai commented Dec 1, 2023

@abcdabcd987 Can't wait for the customized version. So far we use the current version in production, and performance seems good for multi-LoRA deployment.

@abcdabcd987
Contributor

@jcao-ai Glad that Punica got deployed and serves your use case :)
