RTX 5090 + FP4 + Open WebUI via TensorRT-LLM (because vLLM made me cry at 2am) #8334
rdumasia303
started this conversation in
Show and tell
Replies: 0 comments
So… after a late-night slap fight with vLLM on Blackwell and FP4, I did the unthinkable: I got GPT-5 to read the docs and tried NVIDIA’s own TensorRT-LLM. Turns out the fix was hiding in plain sight (right next to my empty coffee mug).
Repo: https://github.com/rdumasia303/tensorrt-llm_with_open-webui
Why you might care
I haven't got multimodal models working, but it works, and it's fast, so that's me done for tonight.
Apologies if this has been done before - but all I could find were folks saying 'Can it be done?' So I made it.