DP: dispatch tensor in FusedMoEMethod #680
base: main
Conversation
Pull request overview
This PR refactors the data parallel (DP) dispatching logic for mixture of experts (MoE) models by moving the tensor dispatch operation from the HPU communicator into the FusedMoEMethod implementations. The key change is that instead of dispatching router_logits, the code now dispatches the already-computed topk_ids and topk_weights tensors directly in the FusedMoE forward pass.
Key changes:
- Modified `HPUDPMetadata` to store `topk_ids_across_dp` and `topk_weights_across_dp` instead of `router_logits_across_dp`
- Added a new `dispatch_tensor` utility function to handle all-gather operations across DP ranks (a sketch of this helper follows the list)
- Updated FusedMoE implementations to perform tensor dispatching inline rather than in the communicator
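For readers unfamiliar with the dispatch step, below is a minimal sketch of what an all-gather based `dispatch_tensor` helper could look like, assuming a `torch.distributed` process group that spans the DP ranks. The argument names (`dp_group`, `max_tokens_across_dp`) and the zero-padding behavior are assumptions for illustration, not the exact API in `hpu_dp_utils.py`.

```python
import torch
import torch.distributed as dist


def dispatch_tensor(t: torch.Tensor,
                    dp_group: dist.ProcessGroup,
                    max_tokens_across_dp: int) -> torch.Tensor:
    """All-gather a per-rank tensor (e.g. topk_ids or topk_weights)
    across DP ranks, padding each rank's slice to a common length."""
    num_tokens = t.shape[0]
    if num_tokens < max_tokens_across_dp:
        # Pad so every rank contributes an equally sized tensor to the collective.
        pad = t.new_zeros((max_tokens_across_dp - num_tokens, *t.shape[1:]))
        t = torch.cat([t, pad], dim=0)
    world_size = dist.get_world_size(group=dp_group)
    out = t.new_empty((world_size * max_tokens_across_dp, *t.shape[1:]))
    dist.all_gather_into_tensor(out, t.contiguous(), group=dp_group)
    return out
```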
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_dp_utils.py | Refactored metadata to store topk tensors instead of router logits; added dispatch_tensor utility function |
| vllm_gaudi/ops/hpu_fused_moe.py | Integrated dispatch logic for topk_ids and topk_weights directly in forward pass (sketched after this table) |
| vllm_gaudi/ops/hpu_fp8.py | Integrated dispatch logic for topk_ids and topk_weights in FP8 variant |
| vllm_gaudi/distributed/device_communicators/hpu_communicator.py | Removed dispatch implementation as it's now handled in FusedMoE methods |
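As a rough illustration of the changes to the FusedMoE forward path (the hpu_fused_moe.py and hpu_fp8.py rows above), the selected topk tensors would be converted to the expected dtypes and then dispatched across DP ranks inline. The sketch below reuses the hypothetical `dispatch_tensor` from the previous block; the wrapper name `dispatch_topk` and its arguments are likewise hypothetical, and the dtype guards follow the reviewer suggestions further down.

```python
import torch
import torch.distributed as dist


def dispatch_topk(topk_ids: torch.Tensor,
                  topk_weights: torch.Tensor,
                  x: torch.Tensor,
                  dp_group: dist.ProcessGroup,
                  max_tokens_across_dp: int):
    """Sketch of the inline dispatch in a FusedMoE-style forward pass:
    normalize dtypes, then all-gather the topk tensors across DP ranks."""
    # Only convert when needed, as the review comments below suggest.
    if topk_ids.dtype != torch.int64:
        topk_ids = topk_ids.to(torch.int64)
    if topk_weights.dtype != x.dtype:
        topk_weights = topk_weights.to(x.dtype)
    topk_ids = dispatch_tensor(topk_ids, dp_group, max_tokens_across_dp)
    topk_weights = dispatch_tensor(topk_weights, dp_group, max_tokens_across_dp)
    return topk_ids, topk_weights
```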
| "No expert found in the model config. Please check the model config." | ||
| num_experts_per_tok = 0 | ||
| num_experts_per_tok = getattr(vllm_config.model_config.hf_text_config, "num_experts_per_tok", 0) | ||
| assert num_experts_per_tok > 0, ("No expert found in the model config. Please check the model config.") |
Copilot AI commented on Dec 4, 2025:
The error message 'No expert found in the model config' is misleading since the assertion checks num_experts_per_tok (number of experts per token), not the existence of experts. Update the message to reflect what is actually being validated, e.g., 'num_experts_per_tok must be greater than 0 in model config'.
Suggested change:
```diff
-assert num_experts_per_tok > 0, ("No expert found in the model config. Please check the model config.")
+assert num_experts_per_tok > 0, ("num_experts_per_tok must be greater than 0 in model config. Please check the model config.")
```
```python
break
assert num_experts > 0, \
    "No expert found in the model config. Please check the model config."
num_experts_per_tok = 0
```
Copilot AI commented on Dec 4, 2025:
Line 32 initializes num_experts_per_tok to 0, which is immediately overwritten on line 33. Remove the redundant initialization on line 32.
Suggested change:
```diff
-num_experts_per_tok = 0
```
```python
topk_weights = topk_weights.to(x.dtype)

topk_ids = topk_ids.to(torch.int64)
topk_weights = topk_weights.to(x.dtype)
```
Copilot AI commented on Dec 4, 2025:
These type conversions are performed unconditionally but may be redundant if the tensors are already in the correct dtype. Consider adding guards to skip the conversion when unnecessary, e.g., if topk_ids.dtype != torch.int64: topk_ids = topk_ids.to(torch.int64).
Suggested change:
```diff
-topk_weights = topk_weights.to(x.dtype)
+if topk_weights.dtype != x.dtype:
+    topk_weights = topk_weights.to(x.dtype)
```
```python
topk_weights = topk_weights.to(x.dtype)

topk_ids = topk_ids.to(torch.int64)
topk_weights = topk_weights.to(x.dtype)
```
Copilot AI commented on Dec 4, 2025:
These type conversions are performed unconditionally but may be redundant if the tensors are already in the correct dtype. Consider adding guards to skip the conversion when unnecessary, e.g., if topk_ids.dtype != torch.int64: topk_ids = topk_ids.to(torch.int64).
Suggested change:
```diff
-topk_weights = topk_weights.to(x.dtype)
+if topk_weights.dtype != x.dtype:
+    topk_weights = topk_weights.to(x.dtype)
```
Signed-off-by: Xinyu Chen <[email protected]>
Force-pushed from ad8b9cc to 1aeb69b
✅ CI Passed: All checks passed successfully against the following vllm commit:
No description provided.