
enable ut test for xpu devices#11712

Merged
Kangyan-Zhou merged 49 commits into sgl-project:main from DiweiSun:molly/ut_enabling_xpu
Feb 3, 2026

Conversation

@DiweiSun
Collaborator

@DiweiSun DiweiSun commented Oct 16, 2025

(Please note that this PR has a large scope. It will be split into smaller PRs as requested.)

This PR enables SGLang UTs on XPU. What we do in this PR:

  1. Enable the multi-hardware config in test/runners.py
  2. Enable the multi-hardware config in test/test_utils.py
  3. Enable multi-hardware support in individual test_*.py files as required.

How to run UTs on XPU:
Apply this PR/diff on SGLang main, then build the sglang environment via docker/Dockerfile.xpu.
Some cases also require PyPI packages that must be installed manually:
flashinfer-python
sentencepiece
ray
accelerate
nest-asyncio

pytest -v test_*.py

@@ -1837,3 +1840,34 @@ def wrapper(self):
return wrapper

return decorator


def get_gpu_rank():
Contributor

@chunyuan-w chunyuan-w Nov 7, 2025


The function is getting the device count, not the current rank or device index. I feel it would be better to name the function something like get_gpu_count and rename the variable from gpu_rank to gpu_count.

gpu_rank = torch.cuda.device_count()
elif is_rocm():
gpu_rank = torch.rocm.device_count()
return gpu_rank
Contributor

Suggest adding a final else to handle the case where none of the device backends apply:

Suggested change
-    return gpu_rank
+    else:
+        gpu_count = 0
+    return gpu_count
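Putting the reviewers' suggestions together (the get_device_count naming plus a final else), a minimal sketch could look like the following. The helper name and the hasattr guard are illustrative, not the PR's actual code; note also that ROCm builds of PyTorch reuse the torch.cuda namespace, so there is no torch.rocm module.

```python
def get_device_count() -> int:
    """Number of available accelerator devices, or 0 when none apply.

    Illustrative sketch of the review suggestions, not the PR's code.
    ROCm builds of PyTorch expose the CUDA API, so the cuda branch
    covers AMD devices as well.
    """
    try:
        import torch
    except ImportError:
        return 0  # no torch at all: treat as zero devices
    if torch.cuda.is_available():  # CUDA and ROCm builds
        return torch.cuda.device_count()
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.xpu.device_count()
    return 0  # final else: no supported backend
```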

if is_cuda():
    return torch.cuda.device_memory_used() / 1024**3
elif is_xpu():
    return torch.xpu.device_memory_used() / 1024**3
Contributor

Suggest adding a final else

return torch.xpu.device_memory_used() / 1024**3
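With the suggested final else applied, the memory helper might read as below. This is a hedged sketch (the get_device_memory_used_gb name is made up) that returns 0.0 on hosts without a supported accelerator rather than failing; it assumes the device_memory_used() API quoted above, which exists in recent PyTorch releases.

```python
def get_device_memory_used_gb() -> float:
    """Used accelerator memory in GiB; 0.0 when no backend applies.

    Sketch only; assumes the device_memory_used() API quoted above.
    """
    try:
        import torch
    except ImportError:
        return 0.0
    if torch.cuda.is_available():
        return torch.cuda.device_memory_used() / 1024**3
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.xpu.device_memory_used() / 1024**3
    return 0.0  # final else, as suggested
```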


def get_gpu_capability():
Contributor

Is it possible to directly reuse the below function?

def get_device_capability(device_id: int = 0) -> Tuple[int, int]:

@@ -149,10 +153,8 @@ def causal_conv1d_opcheck_fn(
@pytest.mark.parametrize("width", [4])
@pytest.mark.parametrize("dim", [2048, 2048 + 16, 4096])
def test_causal_conv1d_update(dim, width, seqlen, has_bias, silu_activation, itype):
    if not torch.cuda.is_available():
        pytest.skip("CUDA device not available")
Contributor

By removing this check, are we expecting this test to run on all devices, or only on cuda and xpu?

Contributor

This case is only in CUDA CI.

@@ -188,10 +190,8 @@ def test_causal_conv1d_update(dim, width, seqlen, has_bias, silu_activation, ity
def test_causal_conv1d_update_with_batch_gather(
    batch_size, with_padding, dim, width, seqlen, has_bias, silu_activation, itype
):
    if not torch.cuda.is_available():
        pytest.skip("CUDA device not available")
Contributor

ditto

@@ -268,11 +268,9 @@ def test_causal_conv1d_update_with_batch_gather(
def test_causal_conv1d_varlen(
    batch, with_padding, dim, seqlen, width, has_bias, silu_activation, itype
):
    if not torch.cuda.is_available():
        pytest.skip("CUDA device not available")
Contributor

ditto


class TestCreateKvIndices(CustomTestCase):
    @classmethod
    def setUpClass(cls):
        if not torch.cuda.is_available():
            raise unittest.SkipTest("CUDA is not available")
Contributor

ditto

device_type = getattr(torch.accelerator.current_accelerator(), "type", "cpu")


def get_gpu_rank():
Contributor

You've added this util function in python/sglang/test/test_utils.py in this PR. Can we directly reuse that one?

from sglang.test.test_utils import empty_gpu_cache

device_type = getattr(torch.accelerator.current_accelerator(), "type", "cpu")
torch.set_default_device(device_type)
Contributor

I feel like you could add a util function for this device setting in python/sglang/test/test_utils.py and then you can reuse this util function in all these files.
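A shared helper along the lines the reviewer describes might look like this sketch (the set_default_accelerator name and the exact fallback behavior are assumptions, not the PR's code):

```python
def set_default_accelerator() -> str:
    """Set torch's default device to the active accelerator, else "cpu".

    Sketch of the suggested shared util; mirrors the
    getattr(torch.accelerator.current_accelerator(), "type", "cpu")
    pattern repeated across the test files.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    device_type = "cpu"
    accelerator_api = getattr(torch, "accelerator", None)  # torch >= 2.6
    if accelerator_api is not None:
        # current_accelerator() returns None when no accelerator exists,
        # and getattr(None, "type", "cpu") then yields the cpu fallback.
        device_type = getattr(accelerator_api.current_accelerator(), "type", "cpu")
    torch.set_default_device(device_type)
    return device_type
```

Each test file could then replace its module-level device_type line with a single call to this helper.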

@airMeng
Collaborator

airMeng commented Nov 7, 2025

@ping1jing2 This PR can benefit Ascend as well; could you give it a review?

Collaborator

@mingfeima mingfeima left a comment

generally LGTM, just some minor changes required.

@@ -1837,3 +1840,34 @@ def wrapper(self):
return wrapper

return decorator


def get_gpu_rank():
Collaborator

Suggested change
-def get_gpu_rank():
+def get_device_count():
+    """
+    Returns the number of available devices depending on the backend.
+    Supports CUDA, ROCm, and XPU.
+    """

Comment on lines 1855 to 1941
def empty_gpu_cache():
    if is_xpu():
        torch.xpu.empty_cache()
    elif is_cuda():
        torch.cuda.empty_cache()
Collaborator

do we need to have rocm here?

and also it needs a final else.
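Addressing both points (ROCm coverage and a final else) could look like the sketch below. Since ROCm builds of PyTorch reuse the torch.cuda API, a single cuda branch covers AMD GPUs too, and CPU-only hosts fall through to a no-op.

```python
def empty_gpu_cache() -> None:
    """Release cached allocator memory on the active backend, if any.

    Illustrative sketch with the requested final else; not the PR's code.
    """
    try:
        import torch
    except ImportError:
        return  # nothing to release without torch
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.empty_cache()
    elif torch.cuda.is_available():  # CUDA and ROCm builds share this API
        torch.cuda.empty_cache()
    # final else: CPU-only host, nothing cached on a device
```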

@@ -23,6 +23,9 @@
from sglang.srt.utils.hf_transformers_utils import get_tokenizer
from sglang.test.test_utils import DEFAULT_SMALL_MODEL_NAME_FOR_TEST, CustomTestCase

device_type = getattr(torch.accelerator.current_accelerator(), "type", "cpu")
Collaborator

what is the purpose of this line?

@mingfeima mingfeima added the xpu, intel, and run-ci labels Nov 7, 2025
@mingfeima
Collaborator

enable CI to verify.

@mingfeima
Collaborator

@DiweiSun make sure that you install pre-commit according to https://docs.sglang.ai/developer_guide/contribution_guide.html#format-code-with-pre-commit

@ping1jing2
Collaborator

@ping1jing2 This PR can benefit Ascend as well; could you give it a review?

Thank you; I can't agree more with chunyuan-w's and mingfeima's comments.

@github-actions github-actions bot added the documentation, quant, amd, dependencies, lora, multi-modal, deepseek, and speculative-decoding labels Nov 21, 2025
@1pikachu
Contributor

1pikachu commented Jan 4, 2026

/rerun-failed-ci

@1pikachu
Contributor

1pikachu commented Jan 9, 2026

/rerun-failed-ci

@1pikachu
Contributor

/rerun-failed-ci

@1pikachu
Contributor

/rerun-failed-ci

device: Device type ("auto", "cuda", "rocm" or "cpu").
If "auto", will detect available platforms automatically.
"""
# Auto-detect device if needed
Collaborator

Why is this removed?

Contributor

The main issue is here:

except (RuntimeError, ImportError) as e:

This change here may not be ideal, but I think we should not fall back to CPU and should raise an error directly.

@1pikachu
Contributor

/rerun-failed-ci

@1pikachu
Contributor

/rerun-failed-ci

@1pikachu
Contributor

1pikachu commented Feb 2, 2026

/rerun-failed-ci

@Kangyan-Zhou
Collaborator

/rerun-failed-ci

1 similar comment
@1pikachu
Contributor

1pikachu commented Feb 2, 2026

/rerun-failed-ci

@1pikachu
Contributor

1pikachu commented Feb 2, 2026

Hello @Kangyan-Zhou, those PR issues aren't caused by my change. Could you help review it?

@Kangyan-Zhou
Collaborator

Hello @Kangyan-Zhou, those PR issues aren't caused by my change. Could you help review it?

Yes, I think the PR generally looks good; I just wanted all the CI checks to pass for more confidence. I'll keep a close eye on it.

@Kangyan-Zhou Kangyan-Zhou self-assigned this Feb 2, 2026
@Kangyan-Zhou Kangyan-Zhou merged commit 495290a into sgl-project:main Feb 3, 2026
267 of 321 checks passed
charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Feb 5, 2026
Co-authored-by: jundu <jun.du@intel.com>
Co-authored-by: Gao, Pengfei <pengfei.gao@intel.com>
sfiisf pushed a commit to sfiisf/sglang that referenced this pull request Feb 5, 2026
Co-authored-by: jundu <jun.du@intel.com>
Co-authored-by: Gao, Pengfei <pengfei.gao@intel.com>

Labels

amd, deepseek, dependencies, documentation, hicache, high priority, intel, lora, model-gateway, multi-modal, npu, quant, run-ci, sgl-kernel, speculative-decoding, xpu

9 participants