
Conversation

@xinyu-intel (Contributor)

No description provided.


Copilot AI left a comment


Pull request overview

This PR implements fixes to enable torch.compile compatibility for data parallel operations on HPU devices. The main issue addressed is that stack-style all-gather operations are incompatible with torch.compile, requiring a switch to concat-style operations.

Key Changes:

  • Modified all-gather operations to use concat-style instead of stack-style for torch.compile compatibility (see the sketch below)
  • Added padding logic to ensure token counts are divisible by tensor parallel size for sequence parallel MOE
  • Added null checks for DP metadata access to prevent potential errors during compilation
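For context, a minimal sketch of the two all-gather flavors using plain torch.distributed calls; this is illustrative only, not the PR's actual HpuCommunicator code:

```python
import torch
import torch.distributed as dist

def all_gather_stack(t: torch.Tensor, world_size: int) -> torch.Tensor:
    # Stack-style: gather into a Python list of per-rank tensors, then
    # stack. Result shape: (world_size, *t.shape). Per this PR, this is
    # the pattern torch.compile cannot handle on HPU.
    outs = [torch.empty_like(t) for _ in range(world_size)]
    dist.all_gather(outs, t)
    return torch.stack(outs, dim=0)

def all_gather_concat(t: torch.Tensor, world_size: int) -> torch.Tensor:
    # Concat-style: gather into a single pre-allocated output tensor.
    # Result shape: (world_size * t.shape[0], *t.shape[1:]).
    out = torch.empty((world_size * t.shape[0], *t.shape[1:]),
                      dtype=t.dtype, device=t.device)
    dist.all_gather_into_tensor(out, t)
    return out
```

Note that the stacked layout is recoverable from the concat-style result with a view (out.view(world_size, *t.shape)), so callers that expect the old shape only need a reshape.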

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • vllm_gaudi/v1/worker/hpu_dp_utils.py: adds padding to ensure num_tokens is divisible by tp_size for sequence-parallel MoE (see the sketch after this list)
  • vllm_gaudi/ops/hpu_fused_moe.py: adds null-safety checks when accessing DP metadata fields
  • vllm_gaudi/ops/hpu_fp8.py: adds null-safety checks when accessing DP metadata fields
  • vllm_gaudi/distributed/device_communicators/hpu_communicator.py: converts all-gather from stack-style to concat-style for torch.compile compatibility
  • tests/full_tests/ci_tests.sh: enables and adds tests for the DP2xTP2 configuration in both lazy mode and torch.compile mode
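On the hpu_dp_utils.py change: the padding presumably uses the standard round-up-to-multiple arithmetic; a hypothetical helper (names are illustrative, not the PR's code):

```python
def pad_num_tokens(num_tokens: int, tp_size: int) -> int:
    # Round num_tokens up to the next multiple of tp_size so the token
    # dimension splits evenly across tensor-parallel ranks in
    # sequence-parallel MoE, e.g. pad_num_tokens(10, 4) == 12.
    return (num_tokens + tp_size - 1) // tp_size * tp_size
```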


Comment on lines 60 to 61 (vllm_gaudi/ops/hpu_fused_moe.py)
dp_metadata = get_hpu_dp_metadata()
hidden_states_across_dp = dp_metadata.hidden_states_across_dp if dp_metadata is not None else None

Copilot AI Dec 16, 2025


The code retrieves dp_metadata once but then checks if it's None on every subsequent attribute access. If dp_metadata is None on line 60, all three conditional expressions (lines 61, 64, 67) will set their variables to None, which could cause dispatch_tensor to fail since it may not handle None for the output parameter correctly. Consider adding an early check after line 60 or ensuring that dispatch_tensor properly handles None values.
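A sketch of the suggested early check, branching once on dp_metadata instead of guarding each of the three attribute accesses separately (only hidden_states_across_dp appears in this hunk; the other field names are elided):

```python
dp_metadata = get_hpu_dp_metadata()
if dp_metadata is not None:
    hidden_states_across_dp = dp_metadata.hidden_states_across_dp
    # ... read the remaining two metadata fields the same way ...
else:
    # Explicit fallback: dispatch_tensor must then be prepared to
    # allocate its own outputs when handed None.
    hidden_states_across_dp = None
```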

Comment on lines 159 to 160 (vllm_gaudi/ops/hpu_fp8.py)
dp_metadata = get_hpu_dp_metadata()
hidden_states_across_dp = dp_metadata.hidden_states_across_dp if dp_metadata is not None else None

Copilot AI Dec 16, 2025


Same issue as in hpu_fused_moe.py: if dp_metadata is None, all three variables will be set to None, potentially causing issues in dispatch_tensor. The None handling should be consistent with the function's expectations or an early guard should be added.

echo HABANA_VISIBLE_DEVICES=all VLLM_SKIP_WARMUP=true PT_HPU_LAZY_MODE=0 python -u vllm-gaudi/examples/data_parallel.py --dp-size 2 --tp-size 2
HABANA_VISIBLE_DEVICES=all VLLM_SKIP_WARMUP=true PT_HPU_LAZY_MODE=0 python -u vllm-gaudi/examples/data_parallel.py --dp-size 2 --tp-size 2
if [ $? -ne 0 ]; then
echo "Error: Test failed for data parallel size 2 + tensor parallel size 2" >&2

Copilot AI Dec 16, 2025


The error message is ambiguous as it doesn't distinguish between the lazy mode test and the torch.compile mode test. The message should specify 'torch.compile mode' to match the test being run.

Suggested change:
- echo "Error: Test failed for data parallel size 2 + tensor parallel size 2" >&2
+ echo "Error: Test failed for data parallel size 2 + tensor parallel size 2 (torch.compile mode)" >&2

@xinyu-intel force-pushed the dev/xinyu/fix-dp-compile branch from b4b9690 to 2076a64 on December 16, 2025 at 08:03
if layer.dp_size > 1:
-    hidden_states_across_dp = get_hpu_dp_metadata().hidden_states_across_dp
+    dp_metadata = get_hpu_dp_metadata()
+    hidden_states_across_dp = dp_metadata.hidden_states_across_dp if dp_metadata is not None else None
Collaborator


If dp_size > 1, get_hpu_dp_metadata() should not return None; instead of checking for None, please add an assert here.

@xinyu-intel (Contributor, Author)


It's only available under lazy + HPU graph mode, since that mode can reuse the memory.
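For contrast, the assert-based variant the reviewer asked for would look roughly like this (illustrative sketch, not code from the PR); per the author's reply, it only fits lazy + HPU graph mode, where the pre-allocated buffers are guaranteed to exist:

```python
dp_metadata = get_hpu_dp_metadata()
# Safe only in lazy + HPU graph mode; under torch.compile,
# get_hpu_dp_metadata() legitimately returns None, so this assert
# would fire there.
assert dp_metadata is not None, "DP metadata missing despite dp_size > 1"
hidden_states_across_dp = dp_metadata.hidden_states_across_dp
```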

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
17fec3af0942da83bcebe2ca0cb4f6ae81c634d8

@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@xinyu-intel force-pushed the dev/xinyu/fix-dp-compile branch from 9f163f4 to 9639ca5 on December 17, 2025 at 01:37
@xinyu-intel force-pushed the dev/xinyu/fix-dp-compile branch from 9639ca5 to 30289bf on December 17, 2025 at 02:36
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
f21f5ea38c6fa0e824bc00d5762d17e049199cd3
