Optimize MoE via chunk settings #658
base: main
Conversation
Pull request overview
This PR adds optimization support for Mixture of Experts (MoE) operations through configurable chunk settings. The changes introduce a new `moe_chunk` configuration option and implement dynamic chunk sizing based on token count to improve MoE performance.
Key Changes:
- Added a `global_num_experts` parameter to all MoE operator classes to support chunk configuration
- Implemented dynamic chunk size selection based on token count (64 for ≤1536 tokens, 256 for 1536-4096 tokens, 512 for >4096 tokens)
- Added a new `moe_chunk` feature flag controlled by the `VLLM_MOE_CHUNK` environment variable (see the sketch after this list)
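For illustration, a minimal sketch of how such an environment-variable-backed flag could be wired up. The helper name and the set of accepted values are assumptions; the real wiring in `vllm_gaudi/extension/features.py` is not shown in this PR:

```python
import os

# Minimal sketch of an environment-variable-backed feature flag, assuming a
# plain os.environ lookup. Accepted truthy values here are an assumption.
def moe_chunk_enabled() -> bool:
    return os.environ.get("VLLM_MOE_CHUNK", "0").lower() in ("1", "true")
```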
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| vllm_gaudi/extension/ops.py | Added global_num_experts parameter and _get_extra_kwargs method to MoE operator classes; implemented dynamic chunk sizing logic |
| vllm_gaudi/extension/features.py | Added moe_chunk configuration option with environment variable support |
| vllm_gaudi/ops/hpu_fused_moe.py | Updated VllmMixtureOfExpertsOp instantiation to pass global_num_experts |
| vllm_gaudi/ops/hpu_fp8.py | Updated FP8 MoE operator instantiations to pass global_num_experts |
| vllm_gaudi/ops/hpu_compressed_tensors.py | Updated compressed tensors MoE operator instantiation to pass global_num_experts |
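As an illustration of the call-site changes listed above, a hedged sketch of the updated instantiation in `vllm_gaudi/ops/hpu_fused_moe.py`. Every argument other than `global_num_experts` is an assumption, since the constructor signature is not shown here:

```python
# Sketch only: num_experts is an assumed pre-existing argument;
# global_num_experts is the new parameter this PR threads through.
moe_op = VllmMixtureOfExpertsOp(
    num_experts=num_local_experts,
    global_num_experts=global_num_experts,
)
```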
```python
def _get_extra_kwargs(self, tokens_num: int):
    if self.enable_moe_chunk:
        if tokens_num <= 1536:
            chunk_size = 64
        elif tokens_num > 1536 and tokens_num <= 4096:
            chunk_size = 256
        else:
            chunk_size = 512
        kwargs = {
            "chunk_size": chunk_size,
            "total_experts": self.global_num_experts,
        }
    else:
        kwargs = {}
    return kwargs
```
Copilot AI commented on Nov 28, 2025:
The `_get_extra_kwargs` method is duplicated identically across three classes (`VllmMixtureOfExpertsOp`, `VllmMixtureOfExpertsOpFP8`, and `VllmMixtureOfExpertsOpFP8PerChannel`). Consider extracting this logic into a shared base class or mixin to reduce code duplication and improve maintainability.
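For reference, a minimal sketch of the extraction Copilot suggests. The mixin name is a hypothetical placeholder, and the class-level annotations simply document the attributes the method reads; the thresholds mirror the diff under review:

```python
class MoeChunkKwargsMixin:
    """Shared chunk-kwargs logic (hypothetical name, not from this PR)."""
    enable_moe_chunk: bool
    global_num_experts: int

    def _get_extra_kwargs(self, tokens_num: int) -> dict:
        if not self.enable_moe_chunk:
            return {}
        # Threshold values copied from the PR's diff.
        if tokens_num <= 1536:
            chunk_size = 64
        elif tokens_num <= 4096:
            chunk_size = 256
        else:
            chunk_size = 512
        return {
            "chunk_size": chunk_size,
            "total_experts": self.global_num_experts,
        }
```

The three operator classes would then inherit from this mixin instead of each carrying its own copy of the method.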
```python
        if tokens_num <= 1536:
            chunk_size = 64
        elif tokens_num > 1536 and tokens_num <= 4096:
            chunk_size = 256
        else:
            chunk_size = 512
```
Copilot AI commented on Nov 28, 2025:
The magic numbers for token thresholds (1536, 4096) and chunk sizes (64, 256, 512) should be defined as named constants at the module or class level to improve code maintainability and make it easier to tune these values.
Suggested change:

```diff
-        if tokens_num <= 1536:
-            chunk_size = 64
-        elif tokens_num > 1536 and tokens_num <= 4096:
-            chunk_size = 256
-        else:
-            chunk_size = 512
+        if tokens_num <= TOKEN_THRESHOLD_1:
+            chunk_size = CHUNK_SIZE_SMALL
+        elif tokens_num > TOKEN_THRESHOLD_1 and tokens_num <= TOKEN_THRESHOLD_2:
+            chunk_size = CHUNK_SIZE_MEDIUM
+        else:
+            chunk_size = CHUNK_SIZE_LARGE
```
```python
    if self.enable_moe_chunk:
        if tokens_num <= 1536:
            chunk_size = 64
        elif tokens_num > 1536 and tokens_num <= 4096:
```
Copilot AI commented on Nov 28, 2025:
The condition `tokens_num > 1536` is redundant in the `elif` clause since the previous `if` statement already handles `tokens_num <= 1536`. Simplify to `elif tokens_num <= 4096:`.
Suggested change:

```diff
-        elif tokens_num > 1536 and tokens_num <= 4096:
+        elif tokens_num <= 4096:
```
@yiliu30 @ranzhejiang please help review.
Signed-off-by: Xinyu Chen <[email protected]>
✅ CI Passed. All checks passed successfully against the following vllm commit:
```python
            chunk_size = 256
        else:
            chunk_size = 512
        kwargs = {
```
As suggested by Copilot, it would be better to define named constants for these magic numbers.
Also, please add some unit tests.
Others LGTM.
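For illustration, a hedged sketch of unit tests like those requested above. The import path follows the file list in this PR, but the constructor signature is not shown here, so the test bypasses `__init__` and sets only the two attributes `_get_extra_kwargs` reads; that bypass is an assumption for brevity:

```python
import unittest

from vllm_gaudi.extension.ops import VllmMixtureOfExpertsOp


class TestGetExtraKwargs(unittest.TestCase):
    def setUp(self):
        # Bypass __init__ (its signature is not shown in this PR) and set
        # only the attributes that _get_extra_kwargs reads.
        self.op = VllmMixtureOfExpertsOp.__new__(VllmMixtureOfExpertsOp)
        self.op.enable_moe_chunk = True
        self.op.global_num_experts = 8

    def test_chunk_size_boundaries(self):
        # Thresholds mirror the diff: <=1536 -> 64, <=4096 -> 256, else 512.
        self.assertEqual(self.op._get_extra_kwargs(1536)["chunk_size"], 64)
        self.assertEqual(self.op._get_extra_kwargs(1537)["chunk_size"], 256)
        self.assertEqual(self.op._get_extra_kwargs(4096)["chunk_size"], 256)
        self.assertEqual(self.op._get_extra_kwargs(4097)["chunk_size"], 512)

    def test_total_experts_passthrough(self):
        self.assertEqual(self.op._get_extra_kwargs(100)["total_experts"], 8)

    def test_disabled_returns_empty(self):
        self.op.enable_moe_chunk = False
        self.assertEqual(self.op._get_extra_kwargs(1024), {})
```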
Agree with liuyi's comments, LGTM
Add a new feature flag, `VLLM_MOE_CHUNK`. With this enabled, `chunk_size` and `global_num_experts` will be passed to `torch.ops.hpu.mixture_of_experts` for better performance.
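For context, a minimal sketch of how those values could be forwarded at the call site. The op's positional arguments are collapsed into `moe_args` because its full signature is not shown in this PR, and deriving the token count from `hidden_states.shape[0]` is an assumption:

```python
# Sketch only: moe_args stands in for the op's real positional arguments.
tokens_num = hidden_states.shape[0]  # assumed token-count source
extra_kwargs = self._get_extra_kwargs(tokens_num)
output = torch.ops.hpu.mixture_of_experts(*moe_args, **extra_kwargs)
```

When the `moe_chunk` feature is disabled, `extra_kwargs` is empty and the call is unchanged, so the flag is safe to roll out behind the environment variable.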