
Conversation

@mandy-li (Contributor) commented Dec 4, 2025

This PR enables INC dynamic quantization for MoE models by adding channel-wise weight dequantization to the MoE op.
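For context, channel-wise dequantization applies one scale per output channel of the FP8 weight. A minimal sketch, assuming an (out_features, in_features) FP8 weight with an (out_features, 1) scale and bfloat16 compute; the function name and shapes are illustrative, not the actual op this PR adds:

import torch

def dequant_channel_wise_fp8_weight(weight: torch.Tensor,
                                    scale: torch.Tensor) -> torch.Tensor:
    # Illustrative sketch (names/shapes assumed, not this PR's op):
    # weight: (out_features, in_features) FP8 tensor
    # scale:  (out_features, 1) per-output-channel scale
    # Upcast to the compute dtype, then broadcast each row's scale across it.
    return weight.to(torch.bfloat16) * scale.to(torch.bfloat16)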

github-actions bot commented Dec 4, 2025

✅ CI Passed

All checks passed successfully against the following vllm commit:
1b7c7f5159484063af28cb47809d79e83d3301ec

Comment on lines +901 to +904
if weight_shape[0] / scale_shape[0] == self.block_size[0] and \
        weight_shape[1] / scale_shape[1] == self.block_size[1]:  # block-wise
    return dequant_block_fp8_weight_naive(
        self.weight,
Contributor:

This check feels a bit fragile. Could we infer the block-wise scale by inspecting the quant_method instead? https://github.com/vllm-project/vllm/blob/408cf42f67dbcd50027fcd0f6ba35df83ced9107/vllm/model_executor/layers/fused_moe/layer.py#L1326 cc @xuechendi

Please add UTs and an e2e test as well, thx!
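For illustration, the suggestion above could look something like the sketch below: branch on the quant config instead of on weight/scale shape ratios, which can misclassify when a block dimension happens to divide the shapes exactly. This assumes the config exposes a weight_block_size attribute (as vLLM's Fp8Config does for block-quantized checkpoints); the exact attribute path is an assumption, not the final implementation.

def uses_block_wise_scales(quant_config) -> bool:
    # Assumption: block-quantized FP8 configs carry a block size such as
    # [128, 128] in weight_block_size, while channel-wise configs leave it
    # unset (None). Attribute name taken from vLLM's Fp8Config.
    return getattr(quant_config, "weight_block_size", None) is not None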

Collaborator:

Is this copied from vllm-fork? If so, please paste the original link.

@mandy-li (Author) replied Dec 9, 2025:

No, it's partially from vllm-hpu-extension. For the channel-wise part to work, we need a bug fix in inc-fork. Please confirm whether the vllm-gaudi CI uses the latest commit from inc-fork or not.

Collaborator:

No, the CI only uses the public Habana Docker image.

@xuechendi self-assigned this Dec 8, 2025