DP: dispatch tensor in FusedMoEMethod #680
base: main
Conversation
Pull request overview
This PR refactors the data parallel (DP) dispatching logic for mixture of experts (MoE) models by moving the tensor dispatch operation from the HPU communicator into the FusedMoEMethod implementations. The key change is that instead of dispatching router_logits, the code now dispatches the already-computed topk_ids and topk_weights tensors directly in the FusedMoE forward pass.
Key changes:
- Modified `HPUDPMetadata` to store `topk_ids_across_dp` and `topk_weights_across_dp` instead of `router_logits_across_dp`
- Added a new `dispatch_tensor` utility function to handle all-gather operations across DP ranks (a sketch of this helper follows the list)
- Updated FusedMoE implementations to perform tensor dispatching inline rather than in the communicator
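For readers unfamiliar with the dispatch step, below is a minimal sketch of what an all-gather based `dispatch_tensor` helper could look like, assuming a `torch.distributed` process group that spans the DP ranks. The argument names (`dp_group`, `max_tokens_across_dp`) and the zero-padding behavior are assumptions for illustration, not the exact API in `hpu_dp_utils.py`.

```python
import torch
import torch.distributed as dist


def dispatch_tensor(t: torch.Tensor,
                    dp_group: dist.ProcessGroup,
                    max_tokens_across_dp: int) -> torch.Tensor:
    """All-gather a per-rank tensor (e.g. topk_ids or topk_weights)
    across DP ranks, padding each rank's slice to a common length."""
    num_tokens = t.shape[0]
    if num_tokens < max_tokens_across_dp:
        # Pad so every rank contributes an equally sized tensor to the collective.
        pad = t.new_zeros((max_tokens_across_dp - num_tokens, *t.shape[1:]))
        t = torch.cat([t, pad], dim=0)
    world_size = dist.get_world_size(group=dp_group)
    out = t.new_empty((world_size * max_tokens_across_dp, *t.shape[1:]))
    dist.all_gather_into_tensor(out, t.contiguous(), group=dp_group)
    return out
```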
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_dp_utils.py | Refactored metadata to store topk tensors instead of router logits; added dispatch_tensor utility function |
| vllm_gaudi/ops/hpu_fused_moe.py | Integrated dispatch logic for topk_ids and topk_weights directly in forward pass (sketched after this table) |
| vllm_gaudi/ops/hpu_fp8.py | Integrated dispatch logic for topk_ids and topk_weights in FP8 variant |
| vllm_gaudi/distributed/device_communicators/hpu_communicator.py | Removed dispatch implementation as it's now handled in FusedMoE methods |
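As a rough illustration of the changes to the FusedMoE forward path (the hpu_fused_moe.py and hpu_fp8.py rows above), the selected topk tensors would be converted to the expected dtypes and then dispatched across DP ranks inline. The sketch below reuses the hypothetical `dispatch_tensor` from the previous block; the wrapper name `dispatch_topk` and its arguments are likewise hypothetical, and the dtype guards follow the reviewer suggestions further down.

```python
import torch
import torch.distributed as dist


def dispatch_topk(topk_ids: torch.Tensor,
                  topk_weights: torch.Tensor,
                  x: torch.Tensor,
                  dp_group: dist.ProcessGroup,
                  max_tokens_across_dp: int):
    """Sketch of the inline dispatch in a FusedMoE-style forward pass:
    normalize dtypes, then all-gather the topk tensors across DP ranks."""
    # Only convert when needed, as the review comments below suggest.
    if topk_ids.dtype != torch.int64:
        topk_ids = topk_ids.to(torch.int64)
    if topk_weights.dtype != x.dtype:
        topk_weights = topk_weights.to(x.dtype)
    topk_ids = dispatch_tensor(topk_ids, dp_group, max_tokens_across_dp)
    topk_weights = dispatch_tensor(topk_weights, dp_group, max_tokens_across_dp)
    return topk_ids, topk_weights
```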
| "No expert found in the model config. Please check the model config." | ||
| num_experts_per_tok = 0 | ||
| num_experts_per_tok = getattr(vllm_config.model_config.hf_text_config, "num_experts_per_tok", 0) | ||
| assert num_experts_per_tok > 0, ("No expert found in the model config. Please check the model config.") |
Copilot AI commented on Dec 4, 2025:
The error message 'No expert found in the model config' is misleading since the assertion checks num_experts_per_tok (number of experts per token), not the existence of experts. Update the message to reflect what is actually being validated, e.g., 'num_experts_per_tok must be greater than 0 in model config'.
Suggested change:
```diff
-assert num_experts_per_tok > 0, ("No expert found in the model config. Please check the model config.")
+assert num_experts_per_tok > 0, ("num_experts_per_tok must be greater than 0 in model config. Please check the model config.")
```
```python
break
assert num_experts > 0, \
    "No expert found in the model config. Please check the model config."
num_experts_per_tok = 0
```
Copilot AI commented on Dec 4, 2025:
Line 32 initializes num_experts_per_tok to 0, which is immediately overwritten on line 33. Remove the redundant initialization on line 32.
Suggested change:
```diff
-num_experts_per_tok = 0
```
```python
topk_weights = topk_weights.to(x.dtype)

topk_ids = topk_ids.to(torch.int64)
topk_weights = topk_weights.to(x.dtype)
```
Copilot AI commented on Dec 4, 2025:
These type conversions are performed unconditionally but may be redundant if the tensors are already in the correct dtype. Consider adding guards to skip the conversion when unnecessary, e.g., if topk_ids.dtype != torch.int64: topk_ids = topk_ids.to(torch.int64).
Suggested change:
```diff
-topk_weights = topk_weights.to(x.dtype)
+if topk_weights.dtype != x.dtype:
+    topk_weights = topk_weights.to(x.dtype)
```
```python
topk_weights = topk_weights.to(x.dtype)

topk_ids = topk_ids.to(torch.int64)
topk_weights = topk_weights.to(x.dtype)
```
Copilot AI commented on Dec 4, 2025:
These type conversions are performed unconditionally but may be redundant if the tensors are already in the correct dtype. Consider adding guards to skip the conversion when unnecessary, e.g., if topk_ids.dtype != torch.int64: topk_ids = topk_ids.to(torch.int64).
Suggested change:
```diff
-topk_weights = topk_weights.to(x.dtype)
+if topk_weights.dtype != x.dtype:
+    topk_weights = topk_weights.to(x.dtype)
```
Signed-off-by: Xinyu Chen <[email protected]>
Force-pushed from ad8b9cc to 1aeb69b
✅ CI Passed: All checks passed successfully against the following vllm commit:
No description provided.