When will Gemma 3 be supported? #3143
Comments
Hi bebill, we haven't finalized a plan to support Gemma 3 yet. If you are interested, you are welcome to contribute this model support to TensorRT-LLM, and we can provide the needed support and consulting. June
I'm just an AI application developer. Does adapting Gemma 3 require a strong, professional AI development background? If not, could you give me some guidance?
Hi, I would recommend using the PyTorch workflow to add Gemma 3 model support, which has a less steep learning curve for AI application developers. You can follow this guide: and this example code (LLaMA) to add Gemma 3; a rough sketch of what such a model definition looks like is shown below. If you hit any specific questions while adding Gemma 3, please let us know. Thanks
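To give a feel for what "adding the model definition" involves, here is a minimal, self-contained sketch of a pre-norm decoder block of the kind the LLaMA example defines. It deliberately uses only plain torch.nn modules rather than TensorRT-LLM's actual PyTorch modeling API (the real class names and attention/KV-cache primitives are in the linked guide), and the class name, layer sizes, and GELU MLP are illustrative assumptions, not the real Gemma 3 architecture:

```python
import torch
from torch import nn


class ToyGemmaDecoderLayer(nn.Module):
    """Illustrative pre-norm decoder block (attention + MLP); not the real Gemma 3 layer."""

    def __init__(self, hidden_size: int, num_heads: int, intermediate_size: int):
        super().__init__()
        self.input_layernorm = nn.RMSNorm(hidden_size)  # requires PyTorch >= 2.4
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.post_attention_layernorm = nn.RMSNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.GELU(),
            nn.Linear(intermediate_size, hidden_size),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Causal mask: True marks positions a token is NOT allowed to attend to.
        seq_len = hidden_states.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=hidden_states.device),
            diagonal=1,
        )

        # Pre-norm self-attention with a residual connection.
        residual = hidden_states
        h = self.input_layernorm(hidden_states)
        attn_out, _ = self.self_attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        hidden_states = residual + attn_out

        # Pre-norm feed-forward with a residual connection.
        residual = hidden_states
        h = self.post_attention_layernorm(hidden_states)
        return residual + self.mlp(h)


if __name__ == "__main__":
    layer = ToyGemmaDecoderLayer(hidden_size=256, num_heads=8, intermediate_size=1024)
    x = torch.randn(2, 16, 256)   # (batch, sequence, hidden)
    print(layer(x).shape)         # torch.Size([2, 16, 256])
```

In the actual contribution, each of these plain modules would be replaced by the corresponding TensorRT-LLM building block from the guide so that the optimized attention kernels and KV cache management are picked up automatically.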
@juney-nvidia If I use the method you mentioned, is it necessary to convert the model to the native TensorRT format before inference? If conversion is not required, can the performance match that of the native TensorRT format?
For the PyTorch workflow, you don't need to convert the PyTorch model to TensorRT format. Instead, you follow the step-by-step guide to add your new model, which includes writing the model definition with the TensorRT-LLM PyTorch modeling API and implementing the weight-loading logic.

As for performance: based on our internal benchmarks on key models such as LLaMA/Mistral/Mixtral, the PyTorch workflow is on par with (or even faster than) the TensorRT workflow. This is because the customized high-performance kernels are reused in both the TensorRT workflow (as TensorRT plugins) and the PyTorch workflow (as torch custom ops), and the high-performance C++ runtime building blocks (such as the Batch Scheduler, KV Cache Manager, and disaggregated-serving logic) are also shared between the two. In addition, thanks to the flexibility of PyTorch, further optimizations can be added quickly to push the performance boundary. The recently announced world-class DeepSeek R1 performance numbers on Blackwell were all measured with the PyTorch workflow, and we only support DeepSeek R1 in the PyTorch workflow for now.

Please let me know if there are any further questions. Thanks
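On the weight-loading step mentioned above, here is a minimal sketch of the general pattern: copy tensors from a Hugging Face checkpoint's state dict into the target model's parameters, renaming where the two naming schemes differ. The mapping entries and function name are hypothetical placeholders, not the actual Gemma 3 parameter names or TensorRT-LLM helpers:

```python
import torch


def load_hf_checkpoint(model: torch.nn.Module, hf_state_dict: dict) -> list:
    """Copy Hugging Face checkpoint tensors into the target model's parameters.

    The name mapping below is purely hypothetical; a real Gemma 3 port would
    enumerate every parameter name the checkpoint actually contains.
    """
    # Hypothetical mapping: HF parameter name -> target model parameter name.
    rename = {
        "model.embed_tokens.weight": "embedding.weight",
        "model.norm.weight": "final_norm.weight",
    }
    params = dict(model.named_parameters())
    unmatched = []
    for hf_name, tensor in hf_state_dict.items():
        target_name = rename.get(hf_name, hf_name)
        if target_name in params:
            with torch.no_grad():
                params[target_name].copy_(tensor)
        else:
            unmatched.append(hf_name)  # surfaced to the caller for debugging
    return unmatched
```

Returning the unmatched names makes it easy to spot checkpoint tensors that the model definition does not yet cover while iterating on the port.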
Thank you for your guidance. I'll go and give it a try.
Thanks, looking forward to your contribution MR :) June