Support for Cohere models #248

Open
nyxkrage opened this issue Sep 15, 2024 · 1 comment
Comments

@nyxkrage

🚀 The feature, motivation and pitch

I would love to see support for the Cohere models. (https://huggingface.co/CohereForAI/c4ai-command-r-08-2024 & https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
As far as I can tell, the FusedLinearCrossEntropy kernel should just need to support scaling the logits by the logit_scale from the config, though I'm unsure whether the rest of the kernels would work as is.
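
To make the request concrete, here is a minimal pure-Python sketch of what "scaling the logits by logit_scale" would mean for a single token. It is not the kernel's code and not the Cohere implementation: the helper names (`cross_entropy`, `scaled_cross_entropy`) are hypothetical, the value 0.0625 is only illustrative, and it assumes the scale is multiplied into the output of the final linear projection before the softmax.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one token: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def scaled_cross_entropy(hidden, weight, target, logit_scale):
    """Linear projection, then logit scaling, then cross-entropy.

    Assumption: logits = (weight @ hidden) * logit_scale, i.e. the scale
    is applied after the projection and before the softmax.
    """
    logits = [
        logit_scale * sum(w * h for w, h in zip(row, hidden))
        for row in weight
    ]
    return cross_entropy(logits, target)

# Toy example: hidden size 2, vocabulary size 3.
hidden = [1.0, 2.0]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = scaled_cross_entropy(hidden, weight, target=2, logit_scale=0.0625)
```

With `logit_scale=1.0` this reduces to the ordinary linear plus cross-entropy path, so a fused kernel would presumably only need one extra multiply in the forward pass and the matching factor in the gradient.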

Thanks for the work

Alternatives

No response

Additional context

No response

@nyxkrage
Author

nyxkrage commented Sep 15, 2024

OK, after some experimentation and editing of the tests, the SwiGLU and LayerNorm kernels pass the correctness tests when compared with the reference implementations from the Cohere modeling code. RoPE is different, however: the tests don't pass, although from the error the outputs appear to contain the same values. I first assumed it had something to do with how Cohere calculates RoPE in float32 and downcasts afterwards. Seeing the comment that was just added on the rotate_half function in the Cohere modeling code, the cause seems obvious: Cohere slices by odds and evens rather than splitting in half.
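
The difference between the two conventions can be sketched in plain Python on a single head vector. The function names below are hypothetical and the code is a simplified illustration, not the actual modeling or kernel code; it assumes the "split in half" variant returns the negated second half followed by the first half, and the "odds and evens" variant rotates each adjacent (even, odd) pair in place.

```python
def rotate_half_split(x):
    """Split-in-half convention: [-second_half, first_half]."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]

def rotate_half_interleaved(x):
    """Odds-and-evens convention: each pair (even, odd) becomes (-odd, even)."""
    evens, odds = x[0::2], x[1::2]
    out = []
    for e, o in zip(evens, odds):
        out.extend([-o, e])
    return out

x = [1, 2, 3, 4]
split = rotate_half_split(x)              # [-3, -4, 1, 2]
interleaved = rotate_half_interleaved(x)  # [-2, 1, -4, 3]

# For this input both results hold the same magnitudes, only paired and
# ordered differently.
same_magnitudes = sorted(map(abs, split)) == sorted(map(abs, interleaved))
```

Both variants rearrange and negate the same input entries but pair them differently, which would explain a test failure where the values look familiar yet the element-wise comparison does not match.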
