CI: Run some tests with compute-sanitizer #566
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
FYI right now we use the mini-CTK approach in the CI:
So compute-sanitizer is currently not available in the CI. But I assume it can be grabbed easily. @cryos is refactoring our CI (#555). I suggest we perhaps add another standalone pipeline for running compute-sanitizer?
I was going to look at tools like this next; that is a great point and something I can factor in. Looking at the proposal here, picking out a test run would be reasonable. I know there are other tools we would like to run too.
/ok to test f27e1f4
/ok to test 05a7068
/ok to test dde857b
Yay! The linux-64 tests are failing for the correct reason: compute-sanitizer returns a non-zero exit code because it has detected issues. https://github.com/NVIDIA/cuda-python/actions/runs/14628267586/job/41045781037?pr=566 The Windows tests are failing because I have partially disabled them.
/ok to test a1ea51e
There is no compute-sanitizer wheel, so we can only run it when the CTK is installed system-wide.
Because the sanitizer command depends on the sanitizer version, we need to be able to run the sanitizer before we can set the sanitizer command. Thus, we need to set up the sanitizer after it is installed.
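The version dependence described in this commit message can be sketched in Python. This is only an illustration, not the workflow code: the function name `build_sanitizer_cmd` and the bare `compute-sanitizer` base command are assumptions, while the `202111` threshold and the `--padding=32` flag are taken from the workflow snippet in this PR.

```python
def build_sanitizer_cmd(version, base_cmd=("compute-sanitizer",)):
    """Return the sanitizer command as a list of arguments.

    `version` is the integer the workflow compares against (e.g. 202111).
    `base_cmd` is a hypothetical default; the real workflow may pass more flags.
    """
    cmd = list(base_cmd)
    # The workflow in this PR only adds --padding=32 for versions >= 202111.
    if version >= 202111:
        cmd.append("--padding=32")
    return cmd
```

Since the command cannot be built until the version is known, the version query has to happen after the toolkit is installed, which is the ordering the commit message describes.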
a1ea51e to 0430930
/ok to test 0430930
Leaving some quick comments
Whatever changes to cuda.bindings that prevented |
/ok to test 9c25910
/ok to test 7fb013e
## Test-Time Environment Variables

- `CUDA_PYTHON_SANTIZER_RUNNING` : When set to 1, tests are skipped that would cause [compute-sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) to raise an error.
- `CUDA_PYTHON_SANTIZER_RUNNING` : When set to 1, tests are skipped that would cause [compute-sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) to raise an error.
- `CUDA_PYTHON_SANITIZER_RUNNING` : When set to 1, tests are skipped that would cause [compute-sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) to raise an error.
if [[ "$COMPUTE_SANITIZER_VERSION" -ge 202111 ]]; then
  SANITIZER_CMD="${SANITIZER_CMD} --padding=32"
fi
echo "CUDA_PYTHON_SANITIZER_RUNNING=1" >> $GITHUB_ENV
Do we really need the `CUDA_PYTHON_SANITIZER_RUNNING` variable?
I see below you're doing this:
echo "COMPUTE_SANITIZER_VERSION=${COMPUTE_SANITIZER_VERSION}" >> $GITHUB_ENV
Could we just query that in the tests? E.g. for most cases:
@pytest.mark.skipif(
    os.environ.get("COMPUTE_SANITIZER_VERSION") is not None,
    reason="The compute-sanitizer is running, and this test intentionally causes an API error.",
)
Maybe later we'll have cases that want to look at the version number, then we wouldn't need anything new or special.
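A minimal sketch of what querying `COMPUTE_SANITIZER_VERSION` from the tests could look like, including the later version-number case. The helper names `sanitizer_is_running` and `sanitizer_version_at_least` are hypothetical, and the sketch assumes the variable holds a plain integer such as `202111`, as the `-ge 202111` comparison in the workflow suggests.

```python
import os

ENV_VAR = "COMPUTE_SANITIZER_VERSION"


def sanitizer_is_running(environ=None):
    """True when the workflow has exported COMPUTE_SANITIZER_VERSION."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR) is not None


def sanitizer_version_at_least(minimum, environ=None):
    """True when the sanitizer is running and its version is >= minimum.

    Assumes the variable holds an integer string; int() raises ValueError otherwise.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    if raw is None:
        return False
    return int(raw) >= minimum
```

Either result could then be passed as the condition of `pytest.mark.skipif`, so a single variable would cover both the "is it running" and the "which version" cases.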
@@ -83,6 +84,10 @@ def test_cuda_memcpy():
    assert err == cuda.CUresult.CUDA_SUCCESS


@pytest.mark.skipif(
    os.environ.get("CUDA_PYTHON_SANITIZER_RUNNING", "0") == "1",
    reason="The compute-sanitzer is running, and this test intentionally causes an API error.",
typo: sanitzer → sanitizer (this typo has 10 copies)
We could reduce the copy-paste via a helper in `tests/conftest.py` (we don't have that yet, but good to start one):
skipif_compute_sanitizer_is_running = pytest.mark.skipif(
    os.environ.get("CUDA_PYTHON_SANITIZER_RUNNING", "0") == "1",
    reason="The compute-sanitizer is running, and this test intentionally causes an API error.",
)
Then here it would become:
@skipif_compute_sanitizer_is_running
def test_cuda_array():
Maybe a refinement:
COMPUTE_SANITIZER_IS_RUNNING = os.environ.get("CUDA_PYTHON_SANITIZER_RUNNING", "0") == "1"
skipif_compute_sanitizer_is_running = pytest.mark.skipif(
    COMPUTE_SANITIZER_IS_RUNNING,
    reason="The compute-sanitizer is running, and this test intentionally causes an API error.",
)
Then further down (`test_timing`) you could use `COMPUTE_SANITIZER_IS_RUNNING` instead of spelling out the `os.environ` code again.
Description
Runs the Python 3.12 pytests under compute-sanitizer to check for memory issues and errors from the CUDA API.
Closes #565
Closes #562