How to build TensorRT-LLM engine on host and deploy to Jetson Orin Nano Super? #3149

Sesameisgod opened this issue Mar 29, 2025 · 17 comments

@Sesameisgod

Hi, I’m currently working with TensorRT-LLM and trying to deploy a model (e.g., Qwen2-VL-2B-Instruct) on a Jetson Orin Nano Super. However, due to limited memory on the Nano, I’m unable to build the TensorRT engine directly on the device.

Is there any official or recommended approach to build the TensorRT-LLM engine on a more powerful host machine (with sufficient memory and GPU), and then transfer the generated engine file to the Jetson Orin Nano Super for inference?

If so, are there any considerations or compatibility issues I should be aware of when cross-building the engine on x86 and deploying it on Jetson (aarch64)?

Thanks in advance!

@juney-nvidia
Collaborator

@Sesameisgod
Hi, TensorRT-LLM has two backends now: one based on TensorRT (the first workflow supported in TensorRT-LLM) and the other based on PyTorch (the new workflow supported since the 0.17 release).

For the TensorRT workflow, engine building requires an AoT (ahead-of-time) tuning phase to select the best combination of kernels, so although it is technically possible to build the TensorRT engine on another GPU with a similar hardware architecture, it is not the recommended way.
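
For context, that offline build is typically a two-step flow; a rough sketch only (the convert script location under examples/ and the exact flags vary by model family and TensorRT-LLM release):

# Step 1: convert the Hugging Face checkpoint into TensorRT-LLM checkpoint format
python3 convert_checkpoint.py --model_dir ./Qwen2-VL-2B-Instruct \
    --output_dir ./tllm_checkpoint --dtype float16

# Step 2: build the engine; this is the AoT tuning step, so it should run on the
# same (or a very similar) GPU architecture as the deployment target
trtllm-build --checkpoint_dir ./tllm_checkpoint \
    --output_dir ./qwen2vl_engine --gemm_plugin float16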

June

@juney-nvidia self-assigned this Mar 29, 2025
@juney-nvidia added the question and triaged labels Mar 29, 2025
@juney-nvidia
Collaborator

@sunnyqgg for visibility, in case she may have more input on this question.

June

@Sesameisgod
Author

Thank you for your response!

I’d like to follow up and ask — is there any recommended approach for building a TensorRT engine for Qwen2-VL-2B-Instruct directly on a Jetson Orin Nano Super (8GB RAM)?

I’ve tested running the model via Hugging Face Transformers on the Nano and it works successfully, which suggests that the model can run on the device.

However, the OOM issue occurs during the TensorRT engine building phase. Are there any strategies (e.g., using swap) to make engine building feasible directly on the Nano?

@juney-nvidia
Collaborator

juney-nvidia commented Mar 30, 2025

@Sesameisgod I am not aware that the TensorRT engine building process can use swap memory during offline engine building.

An alternative is to try running the Qwen2-VL model in the newly introduced PyTorch workflow:

The TensorRT-LLM PyTorch workflow has been available since the 0.17 release. Based on our internal performance evaluation on popular models like LLaMA/Mistral/Mixtral, the PyTorch workflow's performance is on par with (or even faster than) the TensorRT workflow. The customized kernels are reused in both workflows (as plugins in the TensorRT workflow and as torch custom ops in the PyTorch workflow), the existing C++ runtime building blocks (BatchManager, KV CacheManager, etc.) are shared between them as well, and more optimizations are being added to the PyTorch workflow.

We are also shifting more of our attention to enhancing the PyTorch workflow; for example, the recently announced DeepSeek R1 performance numbers are all based on it.

One thing we cannot commit to right now, due to bandwidth limitations, is official support for the Jetson platform. So you will need to try running TensorRT-LLM on Jetson yourself and observe the behavior.
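
If you want to experiment with it, here is a minimal text-only sketch using the high-level LLM API. Treat it as a sketch only: the import path has moved between releases (in 0.17-era builds the PyTorch-backend LLM class lives under tensorrt_llm._torch), and Qwen2-VL image inputs additionally require the model-specific multimodal example scripts.

# Minimal text-only sketch of the PyTorch workflow via the high-level LLM API.
# No separate engine-build step; the Hugging Face checkpoint is loaded directly.
# NOTE: on 0.17-era builds the import may instead be: from tensorrt_llm._torch import LLM
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")   # HF model id or a local path
params = SamplingParams(max_tokens=64)

outputs = llm.generate(["Describe the Jetson Orin Nano Super in one sentence."],
                       sampling_params=params)
for out in outputs:
    print(out.outputs[0].text)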

Thanks
June

@Sesameisgod
Author

Got it, I’ll try running Qwen2-VL with the PyTorch workflow on the Jetson Orin Nano Super and see how it performs.
Really appreciate your help!

@juney-nvidia
Collaborator

@Sesameisgod to ensure you are aware of this Qwen2.5-VL effort from @yechank-nvidia

https://github.com/NVIDIA/TensorRT-LLM/pull/3156/files

Thanks
June

@sunnyqgg
Collaborator

Hi @Sesameisgod

  1. You can use swap memory during the engine building process, but in my experience the system locks up if you allocate more than 8GB. TRT engine generation requires roughly 4 times as much memory as the model size.
  2. You can try building a W4A16 (INT4) engine (a rough sketch follows this list).
  3. As discussed, you can use the same TRT and TRT-LLM versions on a Jetson Orin (64GB) to build the engine and then run it on the Jetson Orin Nano Super.
  4. To save memory during the inference phase, you can use -mmap; please refer to https://github.com/NVIDIA/TensorRT-LLM/blob/v0.12.0-jetson/README4Jetson.md#3-reference-memory-usage
  5. We actually have a branch for Jetson devices: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.12.0-jetson, but unfortunately it doesn't support Qwen2-VL.
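
For item 2, a rough sketch of the weight-only INT4 conversion, assuming the convert_checkpoint.py for your model family accepts the standard weight-only flags (the exact script location and flag support vary by release):

# Same two-step flow as usual, but with weight-only INT4 enabled at conversion time
python3 convert_checkpoint.py --model_dir ./Qwen2-VL-2B-Instruct \
    --output_dir ./tllm_ckpt_int4 --dtype float16 \
    --use_weight_only --weight_only_precision int4
# ...then build the engine from ./tllm_ckpt_int4 with trtllm-build as usual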

Thanks,
Sunny.

@Sesameisgod
Author

@juney-nvidia
That's great to hear! It's really nice to see support for the latest models coming so quickly. Qwen is truly an impressive VLM series.

@sunnyqgg
Understood, thank you. For now, we're not considering quantization yet. We're planning to explore building the engine on a Jetson device with 64GB of memory — so far, it seems like the Jetson AGX Orin is the only one that meets this requirement, and we plan to purchase one for testing.

Also, as you mentioned, TensorRT-LLM v0.12 doesn't support Qwen2-VL, so I'm currently using the aarch64 Docker image from this repository. After running the container and executing the import tensorrt_llm command, the version appears correctly, so it seems to be running, though further testing is needed to confirm whether everything works properly.

If there’s any progress later on, I’ll be happy to share it here. Thanks again for your help!

@xiaohuihuige

Hi @Sesameisgod, I am doing the same thing as you, but I have run into a big problem. My board is a Jetson NX (8GB), and I ran VILA-3B (MLC), which is quite fast, but its understanding ability is a bit poor. Now I am trying to run Qwen2.5-VL and Qwen2-VL. TensorRT-LLM only provides the v0.12 version for Jetson, and setting up the environment is very troublesome for me. Unfortunately, the TensorRT-LLM Docker image you mentioned cannot be pulled from within China.

@Sesameisgod
Author

Hi @xiaohuihuige ,
Have you tried using a VPN? Or maybe I can share the Docker image with you.
However, I'm not quite sure what the best way to transfer it would be.

@garvitpathak

Hi @Sesameisgod, if I pull it on my Jetson Orin Nano 8GB, can I convert Qwen2-VL-2B to TensorRT and run inference with it?

@garvitpathak

Hi @Sesameisgod, can you tell me how you installed the latest version of TensorRT-LLM on your Jetson Orin Nano?

@Sesameisgod
Author

Hi @garvitpathak,

I didn’t install the latest version of TensorRT-LLM directly on the Jetson Orin Nano. Instead, I used the ARM64 image provided by Trystan on Docker Hub (https://hub.docker.com/r/trystan/tensorrt_llm/tags). You can start by pulling the image with:

docker pull trystan/tensorrt_llm:aarch64-0.17.0.post1_90

Then, I used jetson-containers to run the container, which saves the trouble of manually setting a lot of parameters. You can do it like this:

# Navigate to the jetson-containers repo
cd jetson-containers
# Use run.sh to start the container
./run.sh trystan/tensorrt_llm:aarch64-0.17.0.post1_90

Once you're inside the container, try running:

python3 -c "import tensorrt_llm"

If everything is set up correctly, it should print the version number of TensorRT-LLM (which is 0.17.0 in this container).

The official guide (https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html) also includes instructions on how to build the image from source, but I haven’t tried that on the Jetson Orin Nano yet.

Unfortunately, as I mentioned earlier, I ran into an OOM issue and couldn't successfully convert the original-precision Qwen2-VL model to an .engine file. I also haven't tested whether inference actually works — so far, I've only confirmed that importing tensorrt_llm succeeds. Maybe you can try building an INT4 model as suggested by sunnyqgg.

That’s my current progress — hope it helps!

@garvitpathak

garvitpathak commented Apr 9, 2025

Hi @Sesameisgod, can you help me with this on the Jetson Orin Nano 8GB? An INT4 version doesn't seem to be available for Qwen2-VL-2B; if there is one, can you provide me a link for the conversion?

@garvitpathak

garvitpathak commented Apr 11, 2025

Hi @xiaohuihuige ,
I am trying to run VILA-3B, but I am facing a llama/llava KeyError while converting to TensorRT for quantization. Can you help me with that?

@Sesameisgod
Author

Hi @garvitpathak,
The official team has released an INT4 version of the model using GPTQ on Hugging Face (https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4), but I haven’t personally tried this model yet.

BTW, I’ve successfully deployed a Docker image with the latest version of TensorRT-LLM on the Jetson Orin Nano using the official guide (https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html).
For reference, I’m using a 1TB SSD as the system drive along with 64GB of SWAP.

@garvitpathak

Hi @Sesameisgod, can you tell me how you created the swap memory? Step-by-step instructions would be appreciated.
