
Conversation

@xinyu-intel
Contributor

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the data parallel (DP) dispatching logic for mixture of experts (MoE) models by moving the tensor dispatch operation from the HPU communicator into the FusedMoEMethod implementations. The key change is that instead of dispatching router_logits, the code now dispatches the already-computed topk_ids and topk_weights tensors directly in the FusedMoE forward pass.

Key changes:

  • Modified HPUDPMetadata to store topk_ids_across_dp and topk_weights_across_dp instead of router_logits_across_dp
  • Added a new dispatch_tensor utility function to handle all-gather operations across DP ranks (a sketch follows this list)
  • Updated FusedMoE implementations to perform tensor dispatching inline rather than in the communicator
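
For reference, here is a minimal sketch of what an all-gather dispatch utility along these lines could look like. This is an illustration only, not the actual hpu_dp_utils.py implementation: the name dispatch_tensor comes from the PR, but the signature, the max_tokens_across_dp padding argument, and the use of plain torch.distributed are assumptions made for the example.

import torch
import torch.distributed as dist

def dispatch_tensor(local: torch.Tensor, max_tokens_across_dp: int,
                    dp_group: dist.ProcessGroup) -> torch.Tensor:
    # Pad the per-rank tensor (e.g. topk_ids) to the largest token count
    # across DP ranks so every rank contributes an equally sized chunk.
    num_tokens = local.shape[0]
    if num_tokens < max_tokens_across_dp:
        pad = local.new_zeros((max_tokens_across_dp - num_tokens, *local.shape[1:]))
        local = torch.cat([local, pad], dim=0)
    # All-gather the padded chunks from every DP rank into one tensor.
    dp_size = dist.get_world_size(group=dp_group)
    gathered = local.new_empty((dp_size * max_tokens_across_dp, *local.shape[1:]))
    dist.all_gather_into_tensor(gathered, local, group=dp_group)
    return gathered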

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Changed files:

  • vllm_gaudi/v1/worker/hpu_dp_utils.py: Refactored the metadata to store topk tensors instead of router logits; added the dispatch_tensor utility function
  • vllm_gaudi/ops/hpu_fused_moe.py: Integrated dispatch logic for topk_ids and topk_weights directly in the forward pass
  • vllm_gaudi/ops/hpu_fp8.py: Integrated dispatch logic for topk_ids and topk_weights in the FP8 variant
  • vllm_gaudi/distributed/device_communicators/hpu_communicator.py: Removed the dispatch implementation, as it is now handled in the FusedMoE methods
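
To make the first two rows concrete, a rough sketch of how the metadata and the dispatch step could fit together is shown below. The field names topk_ids_across_dp and topk_weights_across_dp come from the PR description; the dispatch_topk helper, the max_tokens_across_dp argument, and the reuse of the dispatch_tensor sketch above are assumptions for illustration, not the real vllm_gaudi API.

from dataclasses import dataclass
from typing import Optional
import torch
import torch.distributed as dist

@dataclass
class HPUDPMetadata:
    # Formerly router_logits_across_dp; now the already-computed routing results.
    topk_ids_across_dp: Optional[torch.Tensor] = None
    topk_weights_across_dp: Optional[torch.Tensor] = None

def dispatch_topk(topk_ids: torch.Tensor, topk_weights: torch.Tensor,
                  max_tokens_across_dp: int,
                  dp_group: dist.ProcessGroup) -> HPUDPMetadata:
    # Reuses the dispatch_tensor sketch above: both routing tensors are
    # all-gathered so every DP rank sees the full routing decision.
    return HPUDPMetadata(
        topk_ids_across_dp=dispatch_tensor(topk_ids, max_tokens_across_dp, dp_group),
        topk_weights_across_dp=dispatch_tensor(topk_weights, max_tokens_across_dp, dp_group),
    )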

"No expert found in the model config. Please check the model config."
num_experts_per_tok = 0
num_experts_per_tok = getattr(vllm_config.model_config.hf_text_config, "num_experts_per_tok", 0)
assert num_experts_per_tok > 0, ("No expert found in the model config. Please check the model config.")

Copilot AI Dec 4, 2025

The error message 'No expert found in the model config' is misleading since the assertion checks num_experts_per_tok (number of experts per token), not the existence of experts. Update the message to reflect what is actually being validated, e.g., 'num_experts_per_tok must be greater than 0 in model config'.

Suggested change
- assert num_experts_per_tok > 0, ("No expert found in the model config. Please check the model config.")
+ assert num_experts_per_tok > 0, ("num_experts_per_tok must be greater than 0 in model config. Please check the model config.")

break
assert num_experts > 0, \
"No expert found in the model config. Please check the model config."
num_experts_per_tok = 0

Copilot AI Dec 4, 2025

Line 32 initializes num_experts_per_tok to 0, which is immediately overwritten on line 33. Remove the redundant initialization on line 32.

Suggested change
- num_experts_per_tok = 0

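Taken together, the two suggestions above amount to roughly the following (context names are taken from the snippets; this is only an illustration of the combined fix):

num_experts_per_tok = getattr(vllm_config.model_config.hf_text_config,
                              "num_experts_per_tok", 0)
assert num_experts_per_tok > 0, (
    "num_experts_per_tok must be greater than 0 in the model config. "
    "Please check the model config.")
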
topk_weights = topk_weights.to(x.dtype)

topk_ids = topk_ids.to(torch.int64)
topk_weights = topk_weights.to(x.dtype)

Copilot AI Dec 4, 2025

These type conversions are performed unconditionally but may be redundant if the tensors are already in the correct dtype. Consider adding guards to skip the conversion when unnecessary, e.g., if topk_ids.dtype != torch.int64: topk_ids = topk_ids.to(torch.int64).

Suggested change
- topk_weights = topk_weights.to(x.dtype)
+ if topk_weights.dtype != x.dtype:
+     topk_weights = topk_weights.to(x.dtype)

topk_weights = topk_weights.to(x.dtype)

topk_ids = topk_ids.to(torch.int64)
topk_weights = topk_weights.to(x.dtype)

Copilot AI Dec 4, 2025

These type conversions are performed unconditionally but may be redundant if the tensors are already in the correct dtype. Consider adding guards to skip the conversion when unnecessary, e.g., if topk_ids.dtype != torch.int64: topk_ids = topk_ids.to(torch.int64).

Suggested change
- topk_weights = topk_weights.to(x.dtype)
+ if topk_weights.dtype != x.dtype:
+     topk_weights = topk_weights.to(x.dtype)

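Since the same cast pattern appears in both hpu_fused_moe.py and hpu_fp8.py (per the file list above), the guard suggested in these two comments, extended to both tensors, would look roughly like this (variable names are taken from the snippets; illustrative only):

# Skip the conversions when the tensors already have the target dtypes.
if topk_ids.dtype != torch.int64:
    topk_ids = topk_ids.to(torch.int64)
if topk_weights.dtype != x.dtype:
    topk_weights = topk_weights.to(x.dtype)
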
@xinyu-intel force-pushed the dev/xinyu/hpu-dispatch-tensor branch from ad8b9cc to 1aeb69b on December 5, 2025 at 06:21

github-actions bot commented Dec 5, 2025

✅ CI Passed

All checks passed successfully against the following vllm commit:
1b7c7f5159484063af28cb47809d79e83d3301ec
