parallel_scheduler for P2079 Section 4.1 User Facing API (ref #6601) #6655
base: master
Conversation
Can one of the admins verify this patch?
Coverage summary from Codacy (see diff coverage on Codacy)
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch. Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified.
cb9d1a7 to
1a77a29
Compare
clang format
libs/core/executors/include/hpx/executors/thread_pool_scheduler.hpp
Update:
@hkaiser — ready for re-review when convenient. Thanks for the guidance!
@hkaiser Can you please verify it now? I'm not sure why those 2 tests are failing.
Wow, I'm impressed! I have a couple of minor comments, though.
libs/core/executors/CMakeLists.txt
    # Default location is $HPX_ROOT/libs/executors/include
    set(executors_headers
        hpx/executors/parallel_scheduler.hpp
Would you mind retaining an alphabetical order of the files?
    : scheduler(HPX_FORWARD(S, s))
    , receiver(r)
    {
        // std::cout << "Operation state created" << std::endl;
Please remove commented code
    parallel_scheduler(const parallel_scheduler&) noexcept = default;
    parallel_scheduler(parallel_scheduler&&) noexcept = default;
    parallel_scheduler& operator=(
        const parallel_scheduler&) noexcept = default;
Please use east const syntax everywhere.
    hpx::detail::try_catch_exception_ptr(
        [&]() {
            thread_pool_scheduler exec{os.scheduler.get_thread_pool()};
            for (Shape i = 0; i < os.shape; ++i)
Isn't the parallel_scheduler supposed to run the tasks concurrently?
FWIW, you should be able to rely on the bulk implementation for the thread_pool_scheduler here: https://github.com/STEllAR-GROUP/hpx/blob/master/libs/core/executors/include/hpx/executors/thread_pool_scheduler_bulk.hpp
        return lhs.pool_ == rhs.pool_;
    }

    hpx::threads::thread_pool_base* get_thread_pool() const noexcept
Suggested change:
-    hpx::threads::thread_pool_base* get_thread_pool() const noexcept
+    constexpr hpx::threads::thread_pool_base* get_thread_pool() const noexcept
        hpx::execution::experimental::set_stopped_t()>;

    template <typename Env>
    friend auto tag_invoke(
Suggested change:
-    friend auto tag_invoke(
+    friend constexpr auto tag_invoke(
        hpx::execution::experimental::set_stopped_t()>;

    template <typename Env>
    friend auto tag_invoke(
Suggested change:
-    friend auto tag_invoke(
+    friend constexpr auto tag_invoke(
    {
        exec.execute([i, &os]() mutable {
            // std::cout << "Bulk task executing for index: " << i <<;
            os.f(i);
Suggested change:
-    os.f(i);
+    HPX_INVOKE(os.f, i);
    exec.execute([i, &os]() mutable {
        // std::cout << "Bulk task executing for index: " << i <<;
        os.f(i);
        if (--(*os.tasks_remaining) == 0)
Using std::atomic_ref instead of a std::shared_ptr would be preferable (at least for C++20)
    }

    template <typename Shape, typename F>
    friend auto tag_invoke(bulk_t, parallel_sender&& s, Shape shape, F&& f)
For bulk, we have implemented an extension that allows passing an arbitrary range as the Shape (in addition to an integral value), see here, for instance: https://github.com/STEllAR-GROUP/hpx/blob/master/libs/core/execution/include/hpx/execution/algorithms/bulk.hpp#L273-L300. It might be a good idea to support this here as well.
Also, you may want to move all types that don't have to be visible to users into a detail namespace.
Sure, I'll do that soon
I'm not sure why it's failing for clang-format. I used
@hkaiser The custom bulk chunking might duplicate functionality in the existing thread_pool_scheduler_bulk. It offers a parallel execution option that leverages HPX's thread pool, maintaining high performance for CPU-bound tasks. While this version lacks P2079R7's replaceability API (Section 4.2) and more, it could be extended to support it (I will try to work on this :))
The CI uses an older version of clang-format that does not always agree with newer versions on how to format things. Simply surround the offending lines with // clang-format off and // clang-format on to disable it checking those.
    for (std::size_t t = 0; t < num_threads; ++t)
    {
        std::size_t start = t * chunk_size;
        std::size_t end =
            (std::min)(start + chunk_size, os.size);
        if (start >= os.size)
            break;

        exec.execute([start, end, &os]() mutable {
            if constexpr (std::is_integral_v<Shape>)
            {
                for (std::size_t i = start; i < end; ++i)
                {
                    HPX_INVOKE(os.f, static_cast<Shape>(i));
                }
            }
            else
            {
                // clang-format off
                auto it = std::next(
                    hpx::util::begin(os.shape), start);
                // clang-format off
                for (std::size_t i = start; i < end;
                    ++i, ++it)
                // clang-format on
                {
                    HPX_INVOKE(os.f, *it);
                }    // clang-format on
            }
            if (--(*os.tasks_remaining) == 0)
            {
                hpx::execution::experimental::set_value(
                    os.receiver);
            }
        });
    }
This will still execute the tasks sequentially and not concurrently.
As I said before, wouldn't we be able to simply create a type alias (using parallel_scheduler = thread_pool_scheduler;) instead of trying to re-implement all of it?
If there is an API difference between parallel_scheduler and the existing thread_pool_scheduler (beyond the name), or a semantic difference, please highlight it. In that case a simple wrapper may still be feasible.
@hkaiser
Thanks for your feedback! I've updated it to use hpx::async, hpx::future, and hpx::when_all in default_parallel_scheduler to ensure concurrent execution of bulk tasks (chunks run in parallel via HPX's thread pool), addressing your comment about sequential execution. Single tasks also use hpx::async for lightweight scheduling.
Is it okay to rely on hpx::async, hpx::future, and hpx::when_all for task scheduling and execution, given they’re part of HPX’s lightweight threading system? Or would you recommend something else?
@hkaiser @isidorostsa Implemented parallel_scheduler wrapping thread_pool_policy_scheduler with hpx::launch, aligning with the P2079R7 user-facing API. The replaceability API and bulk operations are the next things I will keep working on.
    #include <exception>
    #include <utility>

    // Forward declarations for execution::experimental
why are those necessary?
Yes, those includes are necessary: <exception> provides std::exception_ptr for error handling, and <utility> gives std::move and std::forward. The forward declarations are there to prevent a circular dependency while still allowing the types to be used.
Ah I apologize, I meant the forward declarations declared in the comment
oh, it's not that necessary. just a declaration
    }

    friend void tag_invoke(start_t, operation_state& op) noexcept
    {
I'm impressed that you are using tag_invoke correctly! However, P2300 has elected not to use tag_invoke anywhere, so you should make this a member function, and do the same for connect, then etc.
Actually, @hkaiser, do you think we should follow along P2300 and avoid tag_invoke for future senders, or use it anyways for backwards compatibility with our stuff (not sure if there are any real issues)
True, but we were using tag_invoke in thread_pool_scheduler.hpp as well, so I used it here. This framework relies on tag_invoke for operations like schedule. If this is not how we want it, I can update this so that it is a member function.
@hkaiser any inputs on this?
Eventually, we need to switch to the non-tag_invoke implementation. Not sure if it's possible to do piecewise, though.
    #ifndef HPX_EXECUTION_EXPERIMENTAL_SENDER_T
    #define HPX_EXECUTION_EXPERIMENTAL_SENDER_T
    struct sender_t
why is this necessary?
Could you also please help me understand where the HPX_EXECUTION_EXPERIMENTAL_SENDER_T directive is coming from?
The #ifndef HPX_EXECUTION_EXPERIMENTAL_SENDER_T guard prevents redefinition of sender_t, which marks sender types. It's a custom guard I defined to ensure sender_t is only defined once, keeping dependencies minimal.
I think an easier way to prevent redefinition is to define the tag in a centralized header that will be included in the translation units that use it. For example sender.hpp.
However, you might have observed the #if defined(HPX_HAVE_STDEXEC) clauses around HPX. We have stopped developing the sender/receiver primitives in HPX and have chosen to use the ones provided by the reference stdexec implementation provided by NVIDIA. This would include the sender_t, etc.
You can build HPX with STDEXEC on linux by defining the following build flags:
-DHPX_WITH_CXX_STANDARD=20
-DHPX_WITH_STDEXEC=ON
I apologize this was not communicated to you earlier
@isidorostsa just want to confirm: should the code rely on STDEXEC?
Yes it should. Eventually this will be replaced with the std native implementation
    friend void tag_invoke(
        ex::set_error_t, test_receiver&& r, std::exception_ptr ep) noexcept
    {
        (void) ep;
This looks interesting, what is the reason behind it?
This ensures errors (via std::exception_ptr) are passed to the receiver when a task fails. The (void) ep; avoids an unused-parameter warning, since we only record that the error arrived.
Just FWIW, instead of suppressing warnings by writing (void) ep; we usually use [[maybe_unused]] std::exception_ptr ep
    #endif

    template <typename Scheduler>
    struct parallel_sender;
Should this maybe be nested inside the scheduler?
Also I'm not sure I understand why the scheduler needs to be templated. Would we instantiate it with anything but parallel_scheduler?
Umm, I'm keeping the sender types separate for flexibility, since P2079 is still being updated and they may be reused across scheduler types.
@hkaiser @isidorostsa this version is using stdexec. I'm not sure about the failing test cases; any hints on what's failing?
9803589 to
3b4121f
Compare
@hkaiser @isidorostsa
Signed-off-by: Sai Charan Arvapally <[email protected]>
ba47be6 to
4e5c18c
Compare
Signed-off-by: Sai Charan Arvapally <[email protected]>
6814725 to
25416ac
Compare
@charan-003 please rebase your branch on top of master to resolve the merge conflicts.
Implement P2079R7 parallel_scheduler in HPX Core

This issue tracks the implementation of hpx::execution::experimental::parallel_scheduler in HPX core, aligning with P2079R7 (Parallel scheduler, Section 4.1) by wrapping thread_pool_policy_scheduler with hpx::launch policies. The goal is to provide a standards-compliant parallel scheduler with global access, cancellation support, task chaining, and bulk operations.

Implementation Checklist

Phase 1: Core Implementation
- Implement the hpx::execution::experimental::parallel_scheduler class per P2079R7 Section 4.1.
- Add schedule() returning parallel_sender with sender_concept and completion_signatures: supports set_value_t(), set_stopped_t(), set_error_t(std::exception_ptr).
- Wrap thread_pool_policy_scheduler with hpx::launch (async/sync policies).
- Implement noexcept move/copy constructors and assignment operators.
- Add operator== returning true for scheduler comparison.
- Return forward_progress_guarantee::parallel via get_forward_progress_guarantee_t.
- Implement the then operation for task chaining with parallel_sender.
- Code to rely on STDEXEC.

Phase 2: Bulk Support
- Extend the bulk_t customization to delegate to thread_pool_scheduler_bulk.
- Implement bulk_sender for parallel bulk execution per P2079R7.
- Support bulk_chunked and bulk_unchunked operations.

Phase 3: Replaceability API
- Implement the std::execution::system_context_replaceability namespace per P2079R7 Section 4.2.
- Add query_parallel_scheduler_backend() returning shared_ptr<parallel_scheduler>.
- Support link-time replaceability using weak symbols.
- Implement the receiver and bulk_item_receiver interfaces for frontend-backend interaction.
- Ensure storage handling for scheduling operations.