Remove test skip logic for GEMM-AR tests #2516
Signed-off-by: Vladimir Cherepanov <[email protected]>
6d511cc to 4fc6c3c
Greptile Overview
Greptile Summary
Removes the multicast support check from
Key changes:
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant Test as GemmAr Test
participant GemmAr as GemmAr Fixture
participant NVTE as nvte_gemm_all_reduce
participant cuBLASMp as cuBLASMp Library
participant HW as Hardware/Device
Note over Test,HW: Before this PR (cuBLASMp < 0.6)
Test->>GemmAr: SetUp()
GemmAr->>HW: Check multicast support
alt Multicast not supported
GemmAr->>Test: GTEST_SKIP()
Note over Test: Test skipped
else Multicast supported
GemmAr->>Test: Continue
Test->>NVTE: Run gemm_all_reduce
NVTE->>cuBLASMp: Execute GEMM+AR with multicast
end
Note over Test,HW: After this PR (cuBLASMp >= 0.6)
Test->>GemmAr: SetUp() [no override]
GemmAr->>Test: Continue (no check)
Test->>NVTE: Run gemm_all_reduce
NVTE->>cuBLASMp: Execute GEMM+AR
alt Hardware has multicast/NVL
cuBLASMp->>HW: Use multicast
else No multicast support
cuBLASMp->>cuBLASMp: Fallback to NCCL
cuBLASMp->>HW: Use NCCL allreduce
end
Additional Comments (1)
- tests/cpp_distributed/test_comm_gemm.cu, line 373, style: if `CUBLASMP_VERSION` is not defined by the cuBLASMp headers, this will evaluate to `#if 0 < 600` (true), causing the old behavior. Consider adding a check like `#if !defined(CUBLASMP_VERSION) || CUBLASMP_VERSION < 600` to make the intent explicit and avoid silent failures if the macro isn't defined.
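The undefined-macro behavior described in this comment can be reproduced in isolation. The sketch below uses `DEMO_VERSION` as a hypothetical stand-in for `CUBLASMP_VERSION` and deliberately never defines it; the constant names are illustrative and do not come from the PR.

```cpp
// DEMO_VERSION is a hypothetical stand-in for CUBLASMP_VERSION.
// It is intentionally never defined in this file.

// Implicit form: an undefined identifier inside #if is replaced by 0,
// so this is evaluated as `#if 0 < 600` and the branch is taken.
#if DEMO_VERSION < 600
constexpr bool kImplicitGuardTaken = true;
#else
constexpr bool kImplicitGuardTaken = false;
#endif

// Explicit form: same outcome, but the "macro is missing" case is
// spelled out instead of relying on the implicit 0.
#if !defined(DEMO_VERSION) || DEMO_VERSION < 600
constexpr bool kExplicitGuardTaken = true;
#else
constexpr bool kExplicitGuardTaken = false;
#endif

static_assert(kImplicitGuardTaken, "undefined macro is treated as 0 in #if");
static_assert(kExplicitGuardTaken, "explicit guard also covers the undefined case");
```

Both forms select the same branch here; the explicit one only documents the intent. Compilers that support `-Wundef` would typically warn about the implicit form.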
1 file reviewed, 1 comment
Signed-off-by: Vladimir Cherepanov <[email protected]>
Additional Comments (1)
- tests/cpp_distributed/test_comm_gemm.cu, lines 336-366, style: Consider verifying cuBLASMp >= 0.6 at build time. If older versions are still supported, the tests will fail on hardware without multicast.
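A build-time check of the kind suggested here could look roughly like the following. This is only a sketch: `DEMO_CUBLASMP_VERSION` is a hypothetical stand-in defined locally so the snippet compiles on its own, and the assumption that 0.6 is encoded as `600` is taken from the `CUBLASMP_VERSION < 600` comparison quoted in the earlier review comment.

```cpp
// Hypothetical stand-in for the version macro; in a real build the value
// would come from the cuBLASMp headers rather than being defined here.
#define DEMO_CUBLASMP_VERSION 600

// Assumed encoding: 600 corresponds to cuBLASMp 0.6.
constexpr int kMinVersionWithFallback = 600;

// Fail early with a readable message if the macro is missing entirely.
#ifndef DEMO_CUBLASMP_VERSION
#error "version macro not defined; cannot verify cuBLASMp >= 0.6"
#endif

// Fail the build if the library is older than the first version that
// is described as falling back to NCCL.
static_assert(DEMO_CUBLASMP_VERSION >= kMinVersionWithFallback,
              "GEMM+AR tests without skip logic assume cuBLASMp >= 0.6");
```

Whether a hard build failure is preferable to keeping a version-gated skip depends on which cuBLASMp versions the project still supports, which the PR does not state.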
1 file reviewed, 1 comment
timmoon10 left a comment
LGTM
Description
cuBLASMp 0.6 and later fall back to NCCL for GEMM+AR if NVL / multicast is not available.
This change removes the test skip logic.
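One way to read the resulting behavior is as a small decision function. The helper below is hypothetical (it is not part of the PR, the test file, or cuBLASMp) and assumes that version 0.6 is encoded as `600`, as in the `CUBLASMP_VERSION < 600` comparison from the review.

```cpp
// Hypothetical helper, not part of the PR: decides whether a GEMM+AR test
// would need to be skipped for a given library version and device.
constexpr bool ShouldSkipGemmAr(int cublasmp_version, bool multicast_supported) {
  // Assumed encoding: 600 corresponds to cuBLASMp 0.6, the first version
  // described as falling back to NCCL when multicast is unavailable.
  constexpr int kFirstVersionWithNcclFallback = 600;
  if (cublasmp_version >= kFirstVersionWithNcclFallback) {
    return false;  // The library handles missing multicast itself.
  }
  return !multicast_supported;  // Older versions need multicast.
}

// New library: never skip, regardless of multicast support.
static_assert(!ShouldSkipGemmAr(600, false), "0.6 falls back to NCCL");
static_assert(!ShouldSkipGemmAr(600, true), "0.6 can use multicast");
// Old library: skip only when multicast is missing.
static_assert(ShouldSkipGemmAr(500, false), "pre-0.6 without multicast");
static_assert(!ShouldSkipGemmAr(500, true), "pre-0.6 with multicast");
```

In a GoogleTest fixture, a result of `true` would correspond to the `GTEST_SKIP()` branch shown in the "before" half of the sequence diagram.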