Is the NCCL communication performance normal when running DeepSeek-R1 on the H200? #3559

Closed Answered by hcyz33
hcyz33 asked this question in Q&A
The cause appears to be DP attention, which I had enabled. During MLA, a DP process that has no requests does almost no work and enters the all-gather operator almost immediately, then waits there for the other processes to finish MLA. That waiting time is counted as part of the all-gather, which is why the operator appears to take so long.
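
To make the effect concrete, here is a rough sketch of the timing, not a measurement. It assumes a blocking all-gather that cannot complete until the slowest rank has entered it. The function name `observed_allgather_ms` and all the numbers are made up for illustration.

```python
def observed_allgather_ms(mla_ms, comm_ms):
    """Per-rank time spent inside a blocking all-gather (simplified model).

    Assumption: the collective cannot finish before the slowest rank enters
    it, so each rank measures (wait for slowest rank) + (actual transfer).
    """
    slowest = max(mla_ms)
    return [(slowest - t) + comm_ms for t in mla_ms]


# Hypothetical numbers: rank 0 has no request (near-zero MLA time),
# ranks 1-3 are busy with MLA.
mla_ms = [0.5, 10.0, 9.0, 10.0]
comm_ms = 0.25  # assumed real transfer time, identical on every rank

observed = observed_allgather_ms(mla_ms, comm_ms)
for rank, t in enumerate(observed):
    print(f"rank {rank}: all-gather appears to take {t:.2f} ms")
```

In this model the idle rank reports the longest all-gather even though the transfer itself is the same on every rank, so a per-operator profile from that rank can make NCCL look slow when the time is really spent waiting.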

Answer selected by hcyz33