How to reproduce 150 TPS using FP8 + MTP=0 + BSZ=1 on H200? #3108

Open
ghostplant opened this issue Mar 26, 2025 · 2 comments
Labels
triaged Issue has been triaged by maintainers

Comments

ghostplant commented Mar 26, 2025

Glad to see the TRT-LLM update that brings R1 on H200x8 up to 150 TPS. Locally, however, I only get 7 TPS. What is the correct command to reproduce 150 TPS?

ghostplant changed the title from "What's the TPS for R1 on H200 respectively for Prefill & Decode?" to "How to reproduce 150 TPS using FP8 + MTP=0 + BSZ=1 on H200?" on Mar 26, 2025
juney-nvidia (Collaborator) commented

@jiahanc Hi Cyrus, I think you are the right person to answer this question? :)

cc @NVGaryJi for visibility as well.

juney-nvidia added the "triaged" label (Issue has been triaged by maintainers) on Mar 27, 2025
jiahanc (Collaborator) commented Apr 2, 2025

Hi @ghostplant,
The 150 TPS figure was measured with MTP = 3. We have a PR that documents the reproduction steps on both Hopper and Blackwell: #3232
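
For context on why the MTP setting changes the headline number, here is a rough back-of-the-envelope sketch. It assumes MTP is used as speculative decoding, and it assumes a simplified model in which each drafted token is accepted independently with the same probability and a rejection discards the rest of the draft. The function name and the acceptance rates are hypothetical, chosen only for illustration; nothing here is TRT-LLM's API or a measured result.

```python
def expected_tokens_per_step(num_draft_tokens: int, acceptance_rate: float) -> float:
    """Expected tokens emitted per decode step under a simplified chain model.

    Assumption (not from TRT-LLM): draft token k is only kept if all earlier
    draft tokens were accepted, each with the same independent probability.
    The step always emits at least one token from the main model.
    """
    if num_draft_tokens < 0:
        raise ValueError("num_draft_tokens must be non-negative")
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    # 1 guaranteed token + a + a^2 + ... + a^n for n draft tokens
    return sum(acceptance_rate ** k for k in range(num_draft_tokens + 1))


if __name__ == "__main__":
    # Hypothetical acceptance rates, for illustration only.
    for rate in (0.5, 0.8):
        for mtp in (0, 3):
            tokens = expected_tokens_per_step(mtp, rate)
            print(f"MTP={mtp}, acceptance={rate}: {tokens:.3f} tokens/step")
```

Under this model MTP = 0 always yields one token per step, so a figure measured with MTP = 3 is not directly comparable to an MTP = 0 run. The sketch does not account for the rest of the gap reported above; the PR referenced in the reply is the place to look for the actual reproduction steps.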
