Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test] Reduce for algorithm testing with debug builds to prevent timeouts #2058

Merged
merged 7 commits into from
Feb 11, 2025

Conversation

mmichel11
Copy link
Contributor

@mmichel11 mmichel11 commented Feb 10, 2025

Description

The input sizes used for testing for-based algorithms is dependent on the number of execution units on the SYCL queue. This approach was used to ensure we generate large enough inputs to test the large for submitter path in oneDPL. A hard cap is set at 10,000,000 elements which is sufficient to test the large submitter path on high-performance GPUs with many EUs.

However, we recently spotted that this approach results in test timeouts on client GPUs such as BMG and Arc in debug mode. To fix this issue, a new approach has been added for debug mode to avoid such timeouts while ensuring a single test case for the large submitter path. The "normal" set of inputs is used (factors of PI up to 100k elements) followed by a single test case above the threshold to invoke the large submitter. For release builds, testing remains unchanged and is more extensive. Test times on BMG with debug builds now in several hundred seconds (as opposed to timing out at 1200) and CI is able to complete with several hours before the timeout threshold.

Details

The previous utility get_pattern_for_max_n utility which returned a single maximum input is replaced with get_pattern_for_test_sizes which returns a monotonically increasing sequence of elements to test with. Each for-based test is touched to accommodate these changes.

@danhoeflinger
Copy link
Contributor

Are we losing coverage on full kernels by shrinking this space or are we still at least hitting all the same kernels on debug?

@mmichel11
Copy link
Contributor Author

mmichel11 commented Feb 10, 2025

Are we losing coverage on full kernels by shrinking this space or are we still at least hitting all the same kernels on debug?

On integrated graphics, we will be hitting all of the kernels in debug mode which make up most of our CI testing. However, on PVC / BMG we will not. I had experimented with a cap of 2 million which would ensure this for PVC, but I saw very long times with debug builds on battlemage (500 - 600 seconds per test).

@mmichel11
Copy link
Contributor Author

mmichel11 commented Feb 11, 2025

Are we losing coverage on full kernels by shrinking this space or are we still at least hitting all the same kernels on debug?

On integrated graphics, we will be hitting all of the kernels in debug mode which make up most of our CI testing. However, on PVC / BMG we will not. I had experimented with a cap of 2 million which would ensure this for PVC, but I saw very long times with debug builds on battlemage (500 - 600 seconds per test).

Moving to draft while I test a new strategy on the following branch that addresses the point @danhoeflinger brought up: https://github.com/uxlfoundation/oneDPL/compare/dev/mmichel11/pfor_debug_input_size_reduce_tmp

@mmichel11 mmichel11 marked this pull request as draft February 11, 2025 02:31
@mmichel11
Copy link
Contributor Author

Are we losing coverage on full kernels by shrinking this space or are we still at least hitting all the same kernels on debug?

@danhoeflinger This issue has been addressed. get_pattern_for_max_n is replaced with get_pattern_for_test_sizes which returns a sequence of elements to test with. In the case of debug mode, we inject an input past the estimated upper-bound threshold for the large submitter. Release build test inputs are unchanged. This fixes the BMG / Arc timeout issue without compromising testing in the debug case.

@mmichel11 mmichel11 marked this pull request as ready for review February 11, 2025 14:56
Copy link
Contributor

@dmitriy-sobolev dmitriy-sobolev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good enough to me leaving aside some minor comments.

test/support/utils.h Show resolved Hide resolved
test/support/utils.h Outdated Show resolved Hide resolved
Signed-off-by: Matthew Michel <[email protected]>
Copy link
Contributor

@dmitriy-sobolev dmitriy-sobolev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good strategy with some flexibility going forward, allowing us to keep general coverage on debug builds. LGTM

@mmichel11 mmichel11 merged commit 4f216b4 into main Feb 11, 2025
22 checks passed
@mmichel11 mmichel11 deleted the dev/mmichel11/pfor_debug_input_size_reduce branch February 11, 2025 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants