[ROCm][CI] Stage C mirrors #42793

Open: AndreasKaratzas wants to merge 4 commits into vllm-project:main from ROCm:akaratza_gating_stage_c

AndreasKaratzas (Collaborator) commented May 15, 2026

Mirrored/gated test groups:

  • V1 attention (H100-MI300)
  • Engine
  • OpenAI API correctness
  • LM Eval Small Models
  • Multi-Modal Models Pooling
  • Spec Decode Draft Model
  • Spec Decode Ngram + Suffix
  • Spec Decode Speculators + MTP

Agent count: 6 x mi300_1, 2 x mi325_1

Notes:
Added Speculators Correctness and Extract Hidden States Integration in test-amd.yaml under mi300_1.

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
AndreasKaratzas marked this pull request as ready for review May 15, 2026 23:04
mergify bot added the ci/build and rocm labels May 15, 2026
github-project-automation bot moved this to Todo in AMD May 15, 2026
AndreasKaratzas requested a review from khluu May 15, 2026 23:05
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates the Buildkite CI configuration to expand AMD GPU testing support for MI300 and MI325 hardware. Key changes include adding AMD mirror configurations across multiple test areas (attention, engine, entrypoints, and speculative decoding) and introducing new integration tests for speculators and hidden states. Reviewers suggested marking the new "Extract Hidden States Integration" test as optional to avoid CI bottlenecks, and recommended adding ROCm-specific files to the parent steps' source_file_dependencies so that relevant code changes trigger the mirrored AMD tests.

Comment thread .buildkite/test-amd.yaml
@@ -13,6 +13,20 @@ steps:
- tests/v1/attention

high

The parent step's source_file_dependencies should include all files that the mirrors depend on. Currently, changes to vllm/_aiter_ops.py, vllm/envs.py, or vllm/platforms/rocm.py will not trigger this step, and consequently, the AMD mirror will not run for those changes.

    - tests/v1/attention
    - vllm/_aiter_ops.py
    - vllm/envs.py
    - vllm/platforms/rocm.py
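
The gating concern in this comment can be illustrated with a small sketch. It assumes that a step runs when any changed file starts with one of its source_file_dependencies entries; that prefix rule, the helper name is_triggered, and the shortened dependency list are assumptions made for illustration, not the actual pipeline generator logic.

```python
def is_triggered(changed_files, source_file_dependencies):
    """Return True if any changed file falls under a listed dependency path.

    Prefix matching is an assumption about how dependencies are evaluated.
    """
    return any(
        changed.startswith(dep)
        for changed in changed_files
        for dep in source_file_dependencies
    )


# Hypothetical, shortened parent list before the suggestion is applied.
parent_before = ["tests/v1/attention"]

# The same list with the three paths from the suggestion appended.
parent_after = parent_before + [
    "vllm/_aiter_ops.py",
    "vllm/envs.py",
    "vllm/platforms/rocm.py",
]

change = ["vllm/platforms/rocm.py"]
print(is_triggered(change, parent_before))  # False: parent and mirror are skipped
print(is_triggered(change, parent_after))   # True: parent runs, so the mirror can run
```

Under that assumption, a ROCm-only change never reaches the mirror unless the parent list names the file as well.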

@@ -149,3 +149,21 @@ steps:
- vllm/model_executor/models/whisper.py

high

The parent step is missing several dependencies that are explicitly required by the AMD mirror (e.g., vllm/platforms/rocm.py, vllm/_aiter_ops.py). If these files are modified, the parent step will not trigger, preventing the AMD mirror from running. The parent's source_file_dependencies should be the union of all dependencies across mirrors.

  - vllm/model_executor/models/whisper.py
  - vllm/model_executor/layers/
  - vllm/v1/attention/backends/
  - vllm/v1/attention/selector.py
  - vllm/_aiter_ops.py
  - vllm/platforms/rocm.py
  - vllm/model_executor/model_loader/
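
The "union of all dependencies across mirrors" rule could be checked mechanically. The following is a minimal sketch, assuming each step is available as a plain list of dependency paths and that a parent entry covers a mirror entry when it is a prefix of it; the function name missing_from_parent and the sample lists are hypothetical.

```python
def missing_from_parent(parent_deps, mirror_deps_by_name):
    """List mirror dependencies that no parent dependency covers.

    A mirror path counts as covered when some parent path is a prefix of it
    (an assumed matching rule, chosen for illustration).
    """
    union = set()
    for deps in mirror_deps_by_name.values():
        union.update(deps)
    return sorted(
        dep
        for dep in union
        if not any(dep.startswith(parent) for parent in parent_deps)
    )


# Hypothetical, shortened lists modelled on the paths named in this comment.
parent = ["vllm/model_executor/models/whisper.py"]
mirrors = {
    "amd": [
        "vllm/model_executor/models/whisper.py",
        "vllm/_aiter_ops.py",
        "vllm/platforms/rocm.py",
    ],
}

print(missing_from_parent(parent, mirrors))
# ['vllm/_aiter_ops.py', 'vllm/platforms/rocm.py']
```

An empty result would mean the parent list already contains the union, which is the state the comment asks for.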

@@ -36,6 +36,21 @@ steps:
- tests/v1/e2e/spec_decode/

high

The parent step should include vllm/platforms/rocm.py in its source_file_dependencies to ensure that ROCm-specific changes trigger the mirrored tests.

    - tests/v1/e2e/spec_decode/
    - vllm/platforms/rocm.py

@@ -60,6 +75,20 @@ steps:
- tests/v1/e2e/spec_decode/

high

The parent step should include vllm/platforms/rocm.py in its source_file_dependencies to ensure that ROCm-specific changes trigger the mirrored tests.

    - tests/v1/e2e/spec_decode/
    - vllm/platforms/rocm.py

@@ -71,6 +100,20 @@ steps:
- tests/v1/e2e/spec_decode/

high

The parent step should include vllm/platforms/rocm.py in its source_file_dependencies to ensure that ROCm-specific changes trigger the mirrored tests.

    - tests/v1/e2e/spec_decode/
    - vllm/platforms/rocm.py

AndreasKaratzas added the ready label May 21, 2026