Lookahead decoding and multimodal input support #3137
@lfr-0531 may provide some quick comments on this issue.
Thank you for the fast response. Do you know if it's a bug or an inherent limitation of the current implementation?
Currently, lookahead decoding cannot support multimodal cases. Can you share your command? We can try to provide a fix.
@lfr-0531 Thank you for the reply. I am currently testing Llama 3.2 1B with the following command.
Compilation always works, but when the engine is processing and I submit batch_size > 1, it crashes with the following error.
With batch_size == 1 it always works. Maybe the shapes are not adjusted correctly for the batch_size > 1 case?
I can reproduce this issue. It happens because TensorRT-LLM cannot currently support PromptTuning/multimodal together with lookahead decoding. When setting [...] For the batch_size = 1 case, the [...]
So we just need to broadcast the second dimension when batch_size > 1? I can open a PR.
Yes, we need to expand it along that dimension. You are welcome to contribute the code to TensorRT-LLM directly.
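A minimal sketch of the kind of fix being discussed, assuming the multimodal/prompt-tuning input is built with a leading dimension of 1 and must be broadcast to the batch size (the function name and shapes here are hypothetical, not TensorRT-LLM's actual internals):

```python
import numpy as np

def expand_for_batch(tensor: np.ndarray, batch_size: int) -> np.ndarray:
    """Broadcast a (1, N) tensor to (batch_size, N).

    Hypothetical illustration of expanding a per-request input that was
    prepared for batch_size == 1 so it matches a larger batch.
    """
    if tensor.shape[0] == 1 and batch_size > 1:
        # broadcast_to creates a read-only view; no data is copied
        return np.broadcast_to(tensor, (batch_size, tensor.shape[1]))
    return tensor

# As built for the batch_size == 1 case: shape (1, 4)
table = np.arange(4).reshape(1, 4)
expanded = expand_for_batch(table, 3)
print(expanded.shape)  # (3, 4)
```

The same idea applies regardless of framework: the shapes only line up for batch_size == 1 by coincidence, so the second case needs an explicit expand along the batch dimension.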
Hi,
I get the following error when:
The model is Llama 8B.
Do the max_multimodal_len or the lookahead decoding parameters need to match a specific shape in this case?