-
-
Notifications
You must be signed in to change notification settings - Fork 509
Changed reduce implementation and added tests #6877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Can one of the admins verify this patch? |
hkaiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please separate the formatting changes into a different PR? It's close to impossible to review your actual changes here.
Also, for the formatting - are you sure you're using clang-format V20?
|
Yeah, I was not using V20. |
Simply keep (force) pushing to your branch, this wil update the PR. |
94b487d to
22a1f44
Compare
Signed-off-by: Bhoomish <[email protected]>
Signed-off-by: Bhoomish <[email protected]>
22a1f44 to
25bccb7
Compare
Signed-off-by: Bhoomish <[email protected]>
| { | ||
| if (part_size == 1) | ||
| { | ||
| return HPX_INVOKE(r, *part_begin, *part_begin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not doing the correct thing as it would apply the reduction to the element of a single-element partition 'twice'. We have to avoid creating single-element partitions to begin with.
Currently, (almost) all algorithms use this functionality to determine the chunk size to use:
hpx/libs/core/algorithms/include/hpx/parallel/util/detail/chunk_size.hpp
Lines 182 to 261 in f36e02d
| HPX_CXX_EXPORT template <typename ExPolicy, typename Future, typename F1, | |
| typename IterOrR, typename Stride = std::size_t> | |
| hpx::util::iterator_range<chunk_size_iterator<IterOrR>> | |
| get_bulk_iteration_shape(ExPolicy& policy, std::vector<Future>& workitems, | |
| F1&& f1, IterOrR& it_or_r, std::size_t& count, std::size_t& cores, | |
| Stride s = Stride(1)) | |
| { | |
| if (count == 0) | |
| { | |
| cores = 1; | |
| auto it = chunk_size_iterator(it_or_r, 1); | |
| return hpx::util::iterator_range(it, it); | |
| } | |
| Stride stride = parallel::detail::abs(s); | |
| auto test_function = [&](std::size_t test_chunk_size) -> std::size_t { | |
| if (test_chunk_size == 0) | |
| return 0; | |
| if (stride != 1) | |
| { | |
| // rounding up | |
| test_chunk_size = (std::max) (static_cast<std::size_t>(stride), | |
| (test_chunk_size + stride - 1) / stride * stride); | |
| } | |
| add_ready_future(workitems, f1, it_or_r, test_chunk_size); | |
| test_chunk_size = (std::min) (count, test_chunk_size); | |
| count -= test_chunk_size; | |
| it_or_r = next_or_subrange(it_or_r, test_chunk_size, count); | |
| return test_chunk_size; | |
| }; | |
| // note: running the test function will modify 'count' | |
| auto iteration_duration = | |
| hpx::execution::experimental::measure_iteration( | |
| policy.parameters(), policy.executor(), test_function, count); | |
| cores = hpx::execution::experimental::processing_units_count( | |
| policy.parameters(), policy.executor(), iteration_duration, count); | |
| std::size_t max_chunks = | |
| hpx::execution::experimental::maximal_number_of_chunks( | |
| policy.parameters(), policy.executor(), cores, count); | |
| std::size_t chunk_size = | |
| hpx::execution::experimental::get_chunk_size(policy.parameters(), | |
| policy.executor(), iteration_duration, cores, count); | |
| // make sure, chunk size and max_chunks are consistent | |
| adjust_chunk_size_and_max_chunks(cores, count, max_chunks, chunk_size); | |
| auto last = next_or_subrange(it_or_r, count, 0); | |
| if (stride != 1) | |
| { | |
| chunk_size = (std::max) (static_cast<std::size_t>(stride), | |
| (chunk_size + stride - 1) / stride * stride); | |
| } | |
| // Report the calculated parameters to the corresponding parameters | |
| // object. | |
| hpx::execution::experimental::collect_execution_parameters( | |
| policy.parameters(), policy.executor(), count, cores, max_chunks, | |
| chunk_size); | |
| // update executor with new values | |
| policy = hpx::experimental::prefer( | |
| hpx::execution::experimental::with_processing_units_count, policy, | |
| cores); | |
| auto shape_begin = chunk_size_iterator(it_or_r, chunk_size, count); | |
| auto shape_end = chunk_size_iterator(last, chunk_size, count, count); | |
| return hpx::util::iterator_range(shape_begin, shape_end); | |
| } |
Especially the invocation of the customization point here is responsible for computing the chunk size:
hpx/libs/core/algorithms/include/hpx/parallel/util/detail/chunk_size.hpp
Lines 231 to 233 in f36e02d
| std::size_t chunk_size = | |
| hpx::execution::experimental::get_chunk_size(policy.parameters(), | |
| policy.executor(), iteration_duration, cores, count); |
I'm inclined to think that we should introduce a new customization point replacing the final chunk size adjustments here
| adjust_chunk_size_and_max_chunks(cores, count, max_chunks, chunk_size); |
That would allow to install a custom implementation of said customization point for the reduce algorithm without interfering with any other, possibly user-supplied customizations.
Fixes #6647
Proposed Changes
Created a helper function that reduces the partition value without the need of initial value..
Added a test case that was mentioned in the issue #6647.
Corrected a typo in documentation.
I only did changes in :
And created this :
Any background context you want to provide?
Previously, the implementation assumed that the *first was convertible to T. This assumption does not hold for the actual function logic required here. This PR removes that dependency, ensuring the type handling is correct.
Checklist
Not all points below apply to all pull requests.