
Conversation

@mandy-li (Contributor) commented Dec 4, 2025

This PR enables INC dynamic quantization for MoE models by adding channel-wise weight dequantization to the MoE op.
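For context, channel-wise dequantization applies one scale per output channel of the FP8 weight. A minimal sketch, assuming an (out_features, in_features) FP8 weight with an (out_features, 1) scale and bfloat16 compute; the function name and shapes are illustrative, not the actual op this PR adds:

import torch

def dequant_channel_wise_fp8_weight(weight: torch.Tensor,
                                    scale: torch.Tensor) -> torch.Tensor:
    # Illustrative sketch (names/shapes assumed, not this PR's op):
    # weight: (out_features, in_features) FP8 tensor
    # scale:  (out_features, 1) per-output-channel scale
    # Upcast to the compute dtype, then broadcast each row's scale across it.
    return weight.to(torch.bfloat16) * scale.to(torch.bfloat16)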

github-actions bot commented Dec 4, 2025

✅ CI Passed

All checks passed successfully against the following vllm commit:
1b7c7f5159484063af28cb47809d79e83d3301ec

Comment on lines +901 to +904
if weight_shape[0] / scale_shape[0] == self.block_size[0] and \
        weight_shape[1] / scale_shape[1] == self.block_size[1]:  # block-wise
    return dequant_block_fp8_weight_naive(
        self.weight,
Contributor:

This check feels a bit fragile. Could we infer the block-wise scale by inspecting the quant_method instead? https://github.com/vllm-project/vllm/blob/408cf42f67dbcd50027fcd0f6ba35df83ced9107/vllm/model_executor/layers/fused_moe/layer.py#L1326 cc @xuechendi

Please add UTs and an e2e test as well, thx!
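For illustration, the suggestion above could look something like the sketch below: branch on the quant config instead of on weight/scale shape ratios, which can misclassify when a block dimension happens to divide the shapes exactly. This assumes the config exposes a weight_block_size attribute (as vLLM's Fp8Config does for block-quantized checkpoints); the exact attribute path is an assumption, not the final implementation.

def uses_block_wise_scales(quant_config) -> bool:
    # Assumption: block-quantized FP8 configs carry a block size such as
    # [128, 128] in weight_block_size, while channel-wise configs leave it
    # unset (None). Attribute name taken from vLLM's Fp8Config.
    return getattr(quant_config, "weight_block_size", None) is not None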

Collaborator:

Is this copied from vllm-fork? If so, please paste the original link.

@mandy-li (Author) replied Dec 9, 2025:

No, it's partially from vllm-hpu-extension. For the channel-wise part to work, we need a bug fix in inc-fork. Please confirm whether the vllm-gaudi CI uses the latest commit from inc-fork or not.

Collaborator:

No, the CI only uses the public Habana Docker image.

@xuechendi self-assigned this Dec 8, 2025