DP: Fix for torch.compile #722
base: main
Conversation
Pull request overview
This PR implements fixes to enable torch.compile compatibility for data parallel operations on HPU devices. The main issue addressed is that stack-style all-gather operations are incompatible with torch.compile, requiring a switch to concat-style operations.
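For context, a minimal sketch of the two all-gather styles, using plain torch.distributed calls for illustration (the actual HPU communicator code in this PR may differ):

```python
import torch
import torch.distributed as dist

def all_gather_stack_style(x: torch.Tensor, world_size: int) -> torch.Tensor:
    # Stack-style: gather into a Python list, then stack along a new dim.
    # The intermediate tensor list is hard for torch.compile to trace.
    chunks = [torch.empty_like(x) for _ in range(world_size)]
    dist.all_gather(chunks, x)
    return torch.stack(chunks, dim=0)  # shape: (world_size, *x.shape)

def all_gather_concat_style(x: torch.Tensor, world_size: int) -> torch.Tensor:
    # Concat-style: gather into one preallocated tensor along dim 0;
    # a single collective op that torch.compile can capture in the graph.
    out = torch.empty((world_size * x.shape[0], *x.shape[1:]),
                      dtype=x.dtype, device=x.device)
    dist.all_gather_into_tensor(out, x)
    return out
```

The concat result is the stack result with its first two dimensions flattened, so callers switch from indexing `out[rank]` to slicing `out[rank * n : (rank + 1) * n]`.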
Key Changes:
- Modified all-gather operations to use concat-style instead of stack-style for torch.compile compatibility
- Added padding logic to ensure token counts are divisible by tensor parallel size for sequence parallel MoE (see the sketch after this list)
- Added null checks for DP metadata access to prevent potential errors during compilation
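The padding rule amounts to rounding the token count up to the next multiple of tp_size; a short sketch (the function name is hypothetical, not from the PR):

```python
def pad_to_tp_multiple(num_tokens: int, tp_size: int) -> int:
    # Round num_tokens up so the token dimension splits evenly across TP ranks.
    return (num_tokens + tp_size - 1) // tp_size * tp_size

assert pad_to_tp_multiple(10, 4) == 12  # 2 padding tokens added
assert pad_to_tp_multiple(8, 4) == 8    # already divisible, unchanged
```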
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_dp_utils.py | Adds padding to ensure num_tokens is divisible by tp_size for sequence parallel MoE |
| vllm_gaudi/ops/hpu_fused_moe.py | Adds null-safety checks when accessing DP metadata fields |
| vllm_gaudi/ops/hpu_fp8.py | Adds null-safety checks when accessing DP metadata fields |
| vllm_gaudi/distributed/device_communicators/hpu_communicator.py | Converts all-gather from stack-style to concat-style for torch.compile compatibility |
| tests/full_tests/ci_tests.sh | Enables and adds tests for DP2xTP2 configuration with both lazy mode and torch.compile mode |
vllm_gaudi/ops/hpu_fused_moe.py
Outdated

```python
dp_metadata = get_hpu_dp_metadata()
hidden_states_across_dp = dp_metadata.hidden_states_across_dp if dp_metadata is not None else None
```

Copilot (AI) · Dec 16, 2025
The code retrieves dp_metadata once but then checks if it's None on every subsequent attribute access. If dp_metadata is None on line 60, all three conditional expressions (lines 61, 64, 67) will set their variables to None, which could cause dispatch_tensor to fail since it may not handle None for the output parameter correctly. Consider adding an early check after line 60 or ensuring that dispatch_tensor properly handles None values.
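One way to realize the suggested early check, as a sketch only (the fallback helper is hypothetical, not part of this PR):

```python
dp_metadata = get_hpu_dp_metadata()
if dp_metadata is None:
    # No DP metadata (e.g. running without data parallelism): take a
    # local-only path instead of threading None outputs into dispatch_tensor.
    return forward_without_dp(hidden_states)  # hypothetical fallback
hidden_states_across_dp = dp_metadata.hidden_states_across_dp
```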
vllm_gaudi/ops/hpu_fp8.py
Outdated

```python
dp_metadata = get_hpu_dp_metadata()
hidden_states_across_dp = dp_metadata.hidden_states_across_dp if dp_metadata is not None else None
```

Copilot (AI) · Dec 16, 2025
Same issue as in hpu_fused_moe.py: if dp_metadata is None, all three variables will be set to None, potentially causing issues in dispatch_tensor. The None handling should be consistent with the function's expectations or an early guard should be added.
tests/full_tests/ci_tests.sh

```bash
echo HABANA_VISIBLE_DEVICES=all VLLM_SKIP_WARMUP=true PT_HPU_LAZY_MODE=0 python -u vllm-gaudi/examples/data_parallel.py --dp-size 2 --tp-size 2
HABANA_VISIBLE_DEVICES=all VLLM_SKIP_WARMUP=true PT_HPU_LAZY_MODE=0 python -u vllm-gaudi/examples/data_parallel.py --dp-size 2 --tp-size 2
if [ $? -ne 0 ]; then
    echo "Error: Test failed for data parallel size 2 + tensor parallel size 2" >&2
```
Copilot (AI) · Dec 16, 2025
The error message is ambiguous as it doesn't distinguish between the lazy mode test and the torch.compile mode test. The message should specify 'torch.compile mode' to match the test being run.
| echo "Error: Test failed for data parallel size 2 + tensor parallel size 2" >&2 | |
| echo "Error: Test failed for data parallel size 2 + tensor parallel size 2 (torch.compile mode)" >&2 |
Force-pushed from b4b9690 to 2076a64
vllm_gaudi/ops/hpu_fused_moe.py
Outdated

```diff
 if layer.dp_size > 1:
-    hidden_states_across_dp = get_hpu_dp_metadata().hidden_states_across_dp
+    dp_metadata = get_hpu_dp_metadata()
+    hidden_states_across_dp = dp_metadata.hidden_states_across_dp if dp_metadata is not None else None
```
if dp_size > 1, get_hpu_dp_metadata() should not return None; instead of checking for None, please add an assert here
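Roughly what is being asked for (exact placement within hpu_fused_moe.py assumed):

```python
if layer.dp_size > 1:
    dp_metadata = get_hpu_dp_metadata()
    # Fail fast: metadata must exist whenever data parallelism is active.
    assert dp_metadata is not None, \
        "HPU DP metadata must be initialized when dp_size > 1"
    hidden_states_across_dp = dp_metadata.hidden_states_across_dp
```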
It's only available under lazy + HPU graph mode, since it can reuse the memory.
✅ CI Passed: all checks passed successfully against the following vllm commit:
🚧 CI Blocked: the main CI workflow was not started for the following reason:
Force-pushed from 9f163f4 to 9639ca5
Signed-off-by: Xinyu Chen <[email protected]>
Force-pushed from 9639ca5 to 30289bf
✅ CI Passed: all checks passed successfully against the following vllm commit: