When will Gemma 3 be supported? #3143

Open
bebilli opened this issue Mar 29, 2025 · 7 comments
Assignees: juney-nvidia
Labels: feature request (New feature or request), triaged (Issue has been triaged by maintainers)

Comments


bebilli commented Mar 29, 2025

No description provided.

juney-nvidia (Collaborator) commented:

@bebilli

Hi bebilli,

We haven't finalized the plan to support Gemma 3 yet. If you are interested, you are welcome to contribute this model support to TensorRT-LLM, and we can provide the needed support and consulting.

June

juney-nvidia self-assigned this Mar 29, 2025
juney-nvidia added the triaged and feature request labels Mar 29, 2025
bebilli (Author) commented Mar 29, 2025

I'm just an AI application developer. Does adapting Gemma 3 require a strong, professional AI development background? If not, could you give me some guidance?

juney-nvidia (Collaborator) commented:

@bebilli

Hi,

I would recommend using the PyTorch workflow to add Gemma 3 model support, which has a less steep learning curve for AI application developers. You can follow this guide, along with the LLaMA example code, to add Gemma 3.
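
To make the shape of that work concrete, here is a minimal, hypothetical sketch of a decoder-only Gemma 3 definition in plain PyTorch. It is not the TensorRT-LLM PyTorch modeling API; in a real contribution the attention, normalization, and projection blocks would come from the modules used in the LLaMA example, and the class and config names below (Gemma3Config, Gemma3ForCausalLM) are placeholders:

```python
# Hypothetical sketch only: class names, config fields, and layer choices are
# placeholders, not the TensorRT-LLM PyTorch modeling API. In a real port the
# building blocks come from the LLaMA example in the TensorRT-LLM repository.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class Gemma3Config:
    vocab_size: int = 262144   # placeholder values; the real ones come from the checkpoint config
    hidden_size: int = 1152
    num_layers: int = 26
    num_heads: int = 4


class Gemma3DecoderLayer(nn.Module):
    def __init__(self, cfg: Gemma3Config):
        super().__init__()
        self.norm = nn.LayerNorm(cfg.hidden_size)  # stand-in for the real normalization layer
        self.attn = nn.MultiheadAttention(cfg.hidden_size, cfg.num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(cfg.hidden_size, 4 * cfg.hidden_size),
            nn.GELU(),
            nn.Linear(4 * cfg.hidden_size, cfg.hidden_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm(x))


class Gemma3ForCausalLM(nn.Module):
    """Decoder-only skeleton: embeddings -> N decoder layers -> LM head."""

    def __init__(self, cfg: Gemma3Config):
        super().__init__()
        self.embed = nn.Embedding(cfg.vocab_size, cfg.hidden_size)
        self.layers = nn.ModuleList(Gemma3DecoderLayer(cfg) for _ in range(cfg.num_layers))
        self.lm_head = nn.Linear(cfg.hidden_size, cfg.vocab_size, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(input_ids)
        for layer in self.layers:
            x = layer(x)
        return self.lm_head(x)  # logits over the vocabulary
```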

If you hit any specific questions while adding Gemma 3, please let us know.

Thanks
June

bebilli (Author) commented Mar 30, 2025

@juney-nvidia If the method you mentioned is used, is it necessary to convert to the native TensorRT format before inference? If conversion is not required, can the performance match that of the native TensorRT format?

juney-nvidia (Collaborator) commented Mar 30, 2025

> @juney-nvidia If the method you mentioned is used, is it necessary to convert to the native TensorRT format before inference? If conversion is not required, can the performance match that of the native TensorRT format?

For the PyTorch workflow, you don't need to convert the PyTorch model to TensorRT format. Instead, you follow the step-by-step guide to add your new model, which includes writing the model definition against the TensorRT-LLM PyTorch modeling API and implementing the weight-loading logic.
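
As a hedged illustration of the weight-loading half, the work is typically a rename-and-copy pass over the checkpoint: parameter names in the source (e.g. Hugging Face) checkpoint rarely match the new model definition one-to-one, so the loader walks a mapping table and copies tensors in. The checkpoint keys and module names below are hypothetical placeholders, not the actual Gemma 3 or TensorRT-LLM names:

```python
# Hedged sketch of weight loading: the checkpoint keys and module parameter
# names below are hypothetical placeholders, not the real Gemma 3 / TensorRT-LLM names.
import torch
import torch.nn as nn


def load_weights(model: nn.Module, checkpoint: dict[str, torch.Tensor]) -> None:
    """Copy checkpoint tensors into the model, renaming keys along the way."""
    # Map checkpoint parameter names to the names used by the new model
    # definition. In a real port this table is derived from the Gemma 3
    # checkpoint layout and the module hierarchy of the new model class.
    rename = {
        "model.embed_tokens.weight": "embed.weight",
        "lm_head.weight": "lm_head.weight",
    }

    params = dict(model.named_parameters())
    with torch.no_grad():
        for src_name, tensor in checkpoint.items():
            dst_name = rename.get(src_name, src_name)
            if dst_name not in params:
                # Unmapped tensors (e.g. precomputed caches) are often skipped or recomputed.
                continue
            params[dst_name].copy_(tensor)
```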

As to performance, based on our internal benchmarks on key models such as LLaMA/Mistral/Mixtral, the PyTorch workflow is on par with (or even faster than) the TensorRT workflow. This is because the customized high-performance kernels are reused in both workflows (as TensorRT plugins in one, as torch custom ops in the other), and the high-performance C++ runtime building blocks (such as the batch scheduler, KV cache manager, and disaggregated-serving logic) are also shared between them.

Also, thanks to the flexibility of PyTorch, more optimizations can be added quickly to push the performance boundary further.

The recently announced world-class DeepSeek R1 performance numbers on Blackwell were all measured with the PyTorch workflow, and for now we only support DeepSeek R1 in the PyTorch workflow.

Please let me know if there is any further question.

Thanks
June

bebilli (Author) commented Mar 30, 2025

Thank you for your guidance. I'll go and give it a try.

juney-nvidia (Collaborator) commented:

> Thank you for your guidance. I'll go and give it a try.

Thanks, looking forward to your contribution MR :)

June
