DP: dispatch fp8 hidden_states in INC #684
base: main
Conversation
Signed-off-by: Xinyu Chen <[email protected]>
Pull request overview
This PR implements FP8 hidden state dispatching in INC (Intel Neural Compressor) for data parallel (DP) execution. The main purpose is to optimize MoE (Mixture of Experts) layer communication by dispatching FP8-quantized hidden states and routing information across DP ranks, rather than full precision tensors.
Key changes:
- Replaces router logits dispatching with topk IDs and weights dispatching for more efficient communication
- Adds FP8 dtype support for hidden states when INC quantization is enabled
- Introduces dispatch functions that are passed to MoE operators for flexible tensor distribution
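
As a rough illustration of the idea (not the PR's actual code; the function name, the `dp_group` handle, and the collective used here are assumptions), a DP dispatch helper along these lines would gather the FP8 activations together with the topk routing metadata:

```python
import torch
import torch.distributed as dist


def dispatch_hidden_states(hidden_states: torch.Tensor,
                           topk_ids: torch.Tensor,
                           topk_weights: torch.Tensor,
                           dp_group: dist.ProcessGroup):
    """Gather per-rank MoE inputs across data-parallel ranks.

    With INC quantization active, hidden_states can already be FP8
    (e.g. torch.float8_e4m3fn), so only the quantized bytes cross the
    interconnect instead of bf16/fp32 activations.
    """
    world_size = dist.get_world_size(group=dp_group)

    def _all_gather(t: torch.Tensor) -> torch.Tensor:
        # Some collective backends reject FP8 dtypes, so ship the raw
        # bytes as uint8 and reinterpret them after the gather.
        is_fp8 = t.dtype in (torch.float8_e4m3fn, torch.float8_e5m2)
        payload = t.contiguous().view(torch.uint8) if is_fp8 else t.contiguous()
        out = torch.empty((world_size * payload.shape[0], *payload.shape[1:]),
                          dtype=payload.dtype,
                          device=payload.device)
        dist.all_gather_into_tensor(out, payload, group=dp_group)
        return out.view(t.dtype) if is_fp8 else out

    # Dispatch the (possibly FP8) activations plus topk_ids/topk_weights
    # rather than full-precision router logits.
    return (_all_gather(hidden_states),
            _all_gather(topk_ids),
            _all_gather(topk_weights))
```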
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_dp_utils.py | Adds dispatch_tensor and dispatch_hidden_states functions; updates HPUDPMetadata to store topk_ids and topk_weights instead of router_logits; adds FP8 dtype detection for INC quantization |
| vllm_gaudi/ops/hpu_fused_moe.py | Integrates dispatch functions into unquantized MoE processing; dispatches topk_ids and topk_weights when DP is enabled |
| vllm_gaudi/ops/hpu_fp8.py | Integrates dispatch functions into FP8 MoE processing; dispatches topk_ids and topk_weights when DP is enabled |
| vllm_gaudi/extension/ops.py | Adds dispatch_fn parameter to VllmMixtureOfExpertsOp, VllmMixtureOfExpertsOpFP8, and VllmMixtureOfExpertsOpFP8PerChannel constructors with _get_dispatch_func accessor method |
| vllm_gaudi/distributed/device_communicators/hpu_communicator.py | Removes dispatch implementation, delegating to plugin FusedMoEMethod for better performance |
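
To make the vllm_gaudi/extension/ops.py row concrete, the dispatch_fn injection pattern could look roughly like this (a hypothetical sketch: the constructor signature and attribute name are assumptions, only the class and method names come from the summary above):

```python
from typing import Callable, Optional

# A dispatch function maps per-rank tensors (hidden_states, topk_ids,
# topk_weights) to their DP-wide counterparts.
DispatchFn = Callable[..., tuple]


class VllmMixtureOfExpertsOp:
    def __init__(self, num_experts: int, dispatch_fn: Optional[DispatchFn] = None):
        self.num_experts = num_experts
        # The op is agnostic to how DP spreads tensors across ranks;
        # it just stores whatever dispatch callable the MoE method passes in.
        self.dispatch_func = dispatch_fn

    def _get_dispatch_func(self) -> Optional[DispatchFn]:
        # Accessor used by forward paths to fetch the dispatch callable
        # before expert computation.
        return self.dispatch_func
```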
```python
quant_config = os.getenv("QUANT_CONFIG", None) is not None
is_quant_with_inc = vllm_config.model_config.quantization == "inc" or quant_config
```
Copilot AI commented on Dec 4, 2025:
The variable name quant_config is ambiguous as it suggests it holds configuration data, but it actually holds a boolean indicating whether the environment variable exists. Consider renaming to has_quant_config or quant_config_enabled for clarity.
Suggested change:
```diff
-quant_config = os.getenv("QUANT_CONFIG", None) is not None
-is_quant_with_inc = vllm_config.model_config.quantization == "inc" or quant_config
+quant_config_enabled = os.getenv("QUANT_CONFIG", None) is not None
+is_quant_with_inc = vllm_config.model_config.quantization == "inc" or quant_config_enabled
```
```python
fn = self.dispatch_func
return fn
```
Copilot AI commented on Dec 4, 2025:
The intermediate variable fn is unnecessary. The method can directly return self.dispatch_func, simplifying the code and improving readability.
Suggested change:
```diff
-fn = self.dispatch_func
-return fn
+return self.dispatch_func
```
```python
fn = self.dispatch_func
return fn
```
Copilot AI commented on Dec 4, 2025:
The intermediate variable fn is unnecessary. The method can directly return self.dispatch_func, simplifying the code and improving readability.
Suggested change:
```diff
-fn = self.dispatch_func
-return fn
+return self.dispatch_func
```
```python
fn = self.dispatch_func
return fn
```
Copilot AI commented on Dec 4, 2025:
The intermediate variable fn is unnecessary. The method can directly return self.dispatch_func, simplifying the code and improving readability.
Suggested change:
```diff
-fn = self.dispatch_func
-return fn
+return self.dispatch_func
```
Signed-off-by: Xinyu Chen <[email protected]>
🚧 CI Blocked: The main CI workflow was not started for the following reason:
depends on #680