[Reland] ROCm CI (Infra + Skips) #1581
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1581
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures
As of commit 900cf5b with merge base f6f3322:
NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Warning: Unknown label
Please add the new label to .github/pytorch-probot.yml
Thanks @andrewor14, I will work with the AMD team on that.
@amdfaa I just cherry-picked your infra changes into this PR so we can have a clearer CI signal. Please help review the changes. thx
lgtm from the infra side
The breakage on CUDA doesn't seem related to your changes. It seems to be this test: FAILED test/quantization/test_quant_api.py::TestQuantFlow::test_quantized_tensor_subclass_int8_dyn_quant - torch._inductor.exc.CppCompileError: C++ compile error, in which case @jerryzh168 might need to take a look.
@petrex Looks like more fixes/skips are needed: https://github.com/pytorch/ao/actions/runs/12934492712/job/36084046951?pr=1581
I see the ROCm tests are clean, but the job later fails with
Just to update on the latest status of this PR: we are almost done with enabling/skipping the functionality for the unit tests on ROCm, but we are finalizing the changes to ensure that ROCm CI runs only on pushes to the main branch for now, in light of our limited CI capacity. In a follow-up PR, we intend to expand torchao ROCm CI testing to torchao PRs as well.
@jithunnair-amd maybe lint the changes?
Add a skip decorator for ROCm to prevent test failures during ongoing ROCm enablement
Add ROCm skip decorator to prevent test failures during ongoing ROCm enablement
@msaroufim @supriyar Can we please get an approval from a torchao maintainer so we can merge this PR when we have a clean signal on ROCm CI (just adding more skips at this point)?
@jithunnair-amd, for sure. @jcaip will be the one reviewing and approving from torchao side.
@jcaip ROCm nightly CI passed: https://github.com/pytorch/ao/actions/runs/13464088074/job/37625918055?pr=1581 Please approve and merge, before any more unit test failures creep in :)
cc @jithunnair-amd LGTM, thanks for the PR!
This PR skips the failing unit tests on ROCm and adds the infra changes needed to enable ROCm CI.
NOTE:
This PR aims to enable the ROCm CI testing for torchao only for pushes to main branch. The ROCm tests should start showing up here once this PR is merged: https://hud.pytorch.org/hud/pytorch/ao/main/1?per_page=50&name_filter=regression
Torchao PRs can also trigger the ROCm CI runs using the `ciflow/rocm` PR label (#1749).
Enabling ROCm CI testing on all torchao PRs will be done in a follow-up PR.
This pull request introduces the `skip_if_rocm` decorator across various test files to skip tests that are not yet supported on ROCm. The changes ensure that tests are conditionally skipped if ROCm is detected, improving the test suite's compatibility with different environments.
Key changes include:
- Cherry-pick ROCm CI infra changes from #999
- Configure workflow to trigger ROCm CI only for pushes to main branch, OR on PRs with the `ciflow/rocm` label
- Introduction of `skip_if_rocm` decorator: adds the `skip_if_rocm` import to multiple test files to conditionally skip tests not supported on ROCm. (`test/dtypes/test_affine_quantized.py`, `test/dtypes/test_floatx.py`, `test/float8/test_base.py`, `test/hqq/test_hqq_affine.py`, `test/integration/test_integration.py`, `test/kernel/test_galore_downproj.py`, `test/prototype/test_awq.py`, `test/prototype/test_low_bit_optim.py`, `test/prototype/test_splitk.py`, `test/quantization/test_galore_quant.py`, `test/quantization/test_marlin_qqq.py`, `test/sparsity/test_marlin.py`, `test/test_ops.py`, `test/test_s8s4_linear_cutlass.py`, `torchao/utils.py`)
- Application of `skip_if_rocm` decorator: applies `@skip_if_rocm("ROCm development in progress")` to multiple test functions to skip them when running on ROCm. (`test/dtypes/test_affine_quantized.py`, `test/dtypes/test_floatx.py`, `test/float8/test_base.py`, `test/hqq/test_hqq_affine.py`, `test/integration/test_integration.py`, `test/kernel/test_galore_downproj.py`, `test/prototype/test_awq.py`, `test/prototype/test_low_bit_optim.py`, `test/prototype/test_splitk.py`, `test/quantization/test_galore_quant.py`, `test/quantization/test_marlin_qqq.py`, `test/sparsity/test_marlin.py`)
- Module-level skips for ROCm: `test/test_ops.py`, `test/test_s8s4_linear_cutlass.py`
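
For readers who have not opened the diff, here is a minimal sketch of how a decorator like `skip_if_rocm` could work. It is not the code from `torchao/utils.py`. The `is_rocm` parameter and the `_running_on_rocm` helper are hypothetical names added so the example runs without a GPU or even without torch installed, and it raises `unittest.SkipTest` rather than calling into any particular test runner. The one fact it relies on is that `torch.version.hip` is `None` on non-ROCm builds of PyTorch.

```python
import functools
import unittest


def _running_on_rocm() -> bool:
    """Hypothetical helper: best-effort ROCm detection.

    Returns False when torch is not installed, so the sketch stays runnable.
    """
    try:
        import torch
    except ImportError:
        return False
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return getattr(torch.version, "hip", None) is not None


def skip_if_rocm(message=None, is_rocm=_running_on_rocm):
    """Sketch of a decorator that skips the wrapped test on ROCm.

    `is_rocm` is an assumption of this sketch (not part of the PR); it lets
    the detection be swapped out when exercising the decorator itself.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Detection happens at call time, not at import time.
            if is_rocm():
                reason = "Skipping the test on ROCm"
                if message:
                    reason += f": {message}"
                raise unittest.SkipTest(reason)
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Usage mirroring the PR description:
@skip_if_rocm("ROCm development in progress")
def test_example():
    return "ran"
```

Checking the platform inside the wrapper, rather than when the decorator is applied, keeps the module importable everywhere and defers the decision until the test actually runs. For the two files that are skipped wholesale, a module-level check before the tests are defined is the likelier shape, for example pytest's `pytest.skip(..., allow_module_level=True)`, though the exact call used in the PR is not shown here.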