src: remove threadpool and use boost asio thread_pool #9768
base: master
Conversation
Do you know why the boost thread_pool wasn't used originally? Is there a benefit / downside to using the boost version, or does this just simplify the codebase?
boost asio thread_pool support in monerod is relatively new. It was added to Boost in 1.66 (2017) [1], which we started supporting pretty recently (a few weeks ago, I believe).
Boost's thread_pool is extremely battle-tested. Removing extra complexity from our code base and using a high-performance, battle-tested library to handle our core threading is a big plus, imho.
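For readers unfamiliar with the API in question, here is a minimal, hedged sketch of boost::asio::thread_pool usage (available since Boost 1.66); the task bodies are placeholders, not code from this PR:

```cpp
#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <iostream>
#include <thread>

int main()
{
  // One pool sized to the hardware; Boost owns and manages the worker threads.
  boost::asio::thread_pool pool(std::thread::hardware_concurrency());

  // post() enqueues work without blocking the caller.
  for (int i = 0; i < 8; ++i)
    boost::asio::post(pool, [i] { std::cout << "task " << i << "\n"; });

  // join() blocks until all submitted work has run and the worker threads exit.
  pool.join();
}
```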
Pinging @hyc for feedback since he added the original thread_pool.
The threadpool code I wrote is pretty simple and low overhead. I would request some performance measurements showing that switching to boost offers any benefit before making any changes. What problems have you seen with the existing code? In what way is it not already battle tested after so many years of use?
Fails to build on Debian 10 and Ubuntu 20.04. I would prefer not to replace known-good implementations with third-party libraries.
@@ -487,17 +494,17 @@ bool load_txt_records_from_dns(std::vector<std::string> &good_records, const std

// send all requests in parallel
std::deque<bool> avail(dns_urls.size(), false), valid(dns_urls.size(), false);
tools::threadpool& tpool = tools::threadpool::getInstanceForIO();
tools::threadpool::waiter waiter(tpool);
int threads = std::thread::hardware_concurrency();
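For illustration only (this is not the PR's actual hunk), the per-call-site pattern that replaces the snippet above would look roughly like the following sketch; lookup() and the container setup are stand-ins for the real DNS code in load_txt_records_from_dns():

```cpp
#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <cstddef>
#include <deque>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-URL DNS lookup done by the daemon.
static bool lookup(const std::string &url) { return !url.empty(); }

int main()
{
  std::vector<std::string> dns_urls = {"a.example", "b.example", "c.example"};
  std::deque<bool> avail(dns_urls.size(), false);

  // A pool created at the call site, as the proposed change does.
  boost::asio::thread_pool pool(std::thread::hardware_concurrency());
  for (std::size_t n = 0; n < dns_urls.size(); ++n)
    boost::asio::post(pool, [&, n] { avail[n] = lookup(dns_urls[n]); });

  // join() plays roughly the role the old waiter did: block until all lookups finish.
  pool.join();
}
```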
This change (and similar ones elsewhere) doesn't just alter the backing implementation, but the semantics, because you end up with many thread pools (one per usage site), rather than just two (one for IO, and one for compute). The prior pattern makes more sense in the vast majority of cases (arguably there should be a pool for disk IO, in addition to the existing IO pool, which appears to be used for network IO).
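To make the alternative concrete, here is a hedged sketch of the shared-pool model the reviewer describes, with the same Boost backing; the namespace, names, and thread counts are illustrative, not monero code:

```cpp
#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <thread>

// Illustrative only: one process-wide pool per resource class, mirroring the
// existing compute / IO split instead of creating a pool at every call site.
namespace pools
{
  inline boost::asio::thread_pool& compute()
  {
    static boost::asio::thread_pool p(std::thread::hardware_concurrency());
    return p;
  }
  inline boost::asio::thread_pool& io()
  {
    static boost::asio::thread_pool p(32); // IO-bound work tolerates oversubscription
    return p;
  }
}

int main()
{
  boost::asio::post(pools::io(),      []{ /* network request */ });
  boost::asio::post(pools::compute(), []{ /* hashing / verification */ });

  // Shared pools are never join()ed per call site, so callers need their own
  // completion mechanism (futures, a latch, or a waiter-style counter).
  pools::io().join();      // here only so this standalone example exits cleanly
  pools::compute().join();
}
```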
Each thread_pool will be destroyed when it goes out of scope.
So your objection that “we will end up with multiple thread pools” is factually not correct.
So your objection that “we will end up with multiple thread pools” is factually not correct.

Yes, it is: call site one creates a pool and enqueues work. So does call site two, and call site three. Now you have three pools; and so on.
The lifetime of the pools doesn't matter; if anything, one could argue a shorter lifetime makes the problem worse. My objection is purely that this model in no way matches underlying resources, which is the whole reason for pooling in the first place.
Wrong! No, it does not fail. That is because of a second update I did, which will be fixed in a few seconds. You can see here that the build is successful: https://github.com/monero-project/monero/actions/runs/13138855698
Respectfully, I disagree. We need to revamp the parallelism mechanism, as it is broken (Cuprate, an alpha project, is beating us). As for requesting numbers, that is a fair ask; I will try to update this with numbers.
You're linking to the Boost 1.66.0 |
I would overall agree with hyc's sentiment: building, and running faster, would make this proposal viable.
Wrong again. Please check the runs before claiming that. You could just look at another action to get your answer; as you can see, it clearly builds without any issue: https://github.com/monero-project/monero/actions/runs/13135976164 The wait method is an experiment to see what we can get away with. The default one is, and was, join.
Well, you can't "get away" with using wait because it doesn't build. If you don't want developers to comment on your unfinished PR, then put [WIP] in the title. Or run your experiment on a separate branch.
No, it is not about commenting. It is about insisting that it does not work while it works. I believe that if you had looked at the link the first time I sent it, there would have been no dispute.
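For reference, a minimal sketch of the two completion calls under discussion. If I recall correctly, thread_pool::wait() was added to Asio later than join(), which would explain the build failures against the older Boost shipped by Debian 10 / Ubuntu 20.04; treat that version detail as an assumption:

```cpp
#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>

int main()
{
  boost::asio::thread_pool pool(4);
  boost::asio::post(pool, []{ /* work */ });

  pool.join();    // portable: present since thread_pool appeared in Boost 1.66
  // pool.wait(); // newer addition; assumed unavailable in older Boost releases
}
```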
I think these numbers should be very interesting for anyone who is willing to take a look at this. Look at the starvation issue monero's thread pool suffers from (max/min). Patch included; you can run this test with this command:
Patch:
Boost provides a 1-3% improvement, although I would still say Boost is worth it due to the simplification it provides. But TBB [1] is a deal breaker.
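Since the command and patch are not reproduced above, the following is only an illustrative sketch (not the author's patch) of the kind of max/min completion-time spread being reported; the task size and counts are arbitrary:

```cpp
#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Illustrative only: submit N identical tasks and report the spread between the
// fastest and slowest completion times, i.e. the "max/min" starvation ratio.
int main()
{
  using clock = std::chrono::steady_clock;
  constexpr int N = 1000;
  std::vector<double> done_ms(N);
  const auto start = clock::now();

  boost::asio::thread_pool pool(std::thread::hardware_concurrency());
  for (int i = 0; i < N; ++i)
    boost::asio::post(pool, [&done_ms, start, i] {
      volatile double x = 0;                          // a small, fixed unit of work
      for (int k = 0; k < 100000; ++k) x += k * 0.5;
      done_ms[i] = std::chrono::duration<double, std::milli>(clock::now() - start).count();
    });
  pool.join();

  const auto [mn, mx] = std::minmax_element(done_ms.begin(), done_ms.end());
  std::printf("min %.2f ms, max %.2f ms, max/min %.2f\n", *mn, *mx, *mx / *mn);
}
```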
Work-stealing queues certainly have their merit. That said, I don't like this benchmark:
Please investigate the benchmarks more deeply. The purpose of that benchmark was not to prove TBB is faster (which it is); the purpose was to prove that our thread pool suffers from severe task starvation. So there is no need for numbers or a std dev across a 1-minute run. You haven’t paid attention to what exactly those numbers prove.
The "numbers" are meaningless without the experimental setup, and, per my comment, which remains unaddressed, throughput is the relevant metric, not fairness, and the numbers you posted (insufficient as they are), do not even suggest a meaningful gap in throughput. |
No, they are not. The purpose is not benchmarking; the purpose is showing / proving starvation.
Wrong; this shows you are not familiar with the internals of our thread pool. Our thread pool has a notion of a leaf job, which basically puts it at the front of the running queue. Basically, you can starve a task. Fairness is extremely important for us, since we want our tasks to finish in roughly relative order.
Wrong. You haven’t understood the numbers. Pay attention to the max-to-min ratio.
If you want to make the case that queue responsiveness for leaf jobs heavily impacts overall daemon performance, this benchmark doesn't do that job, nor any job, as it's missing a basic requisite of any scientific experiment: the experimental setup. Furthermore, if you want to measure latency, this benchmark plainly doesn't, and it imposes an asymmetric constraint (1/3 of tasks being leaf tasks, vs. no such requirement), plus an additional compute burden (an extra divide), on the existing queue implementation.
Again, wrong. Repeatedly stating that doesn’t make it a fact. I explained it in detail in my previous comment. The sole purpose is to prove the variance in the running time of the same task. Please read our discussion from the beginning again to comprehend the objectives.