WIP (Not ready): STYLE: Replace once_flag of ThreadPoolGlobals with static local variable #4171
Code looks good, but it should be tested, preferably with Python multiprocessing via forking.
8e2709d to 2e8ad9f
Thanks @dzenanz, but let's first see if both ITK.Linux.Python and ITK.macOS.Python like the PR!
Wow, segfaults and timeouts! https://open.cdash.org/test/1187815471 says:
https://open.cdash.org/test/1187815689 says:
https://open.cdash.org/test/1187868304 says:
To be continued...
Does anyone have an explanation about what goes wrong here, on both ITK.Linux.Python and ITK.macOS.Python? Is there a fundamental reason why a static local variable initialization in an ITK core CXX file might not be thread-safe, when doing a Python wrapping? And if so, why would |
Static variables get constructed at initialization time, while |
This introduces a race condition with other global static variables and their initializations. The
Do ITK.Linux.Python and ITK.macOS.Python possibly load the ITKCommon lib multiple times concurrently, getting multiple instances of ITKCommon at the same time?
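On the initialization-order concern raised in this thread, a minimal sketch may help separate the two kinds of static variables. Namespace-scope variables in different translation units are dynamically initialized in an unspecified relative order, whereas a function-local static is initialized the first time control passes through its declaration. All names below (`GetModuleNameSketch`, `g_nameLength`, `GetNameLengthSketch`) are hypothetical and not part of ITK.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical accessor (not ITK code): the function-local static is
// constructed the first time control passes through its declaration.
std::string &
GetModuleNameSketch()
{
  static std::string name{ "ITKCommon" };
  return name;
}

// A namespace-scope variable with dynamic initialization. Because it goes
// through the accessor, it does not depend on the initialization order of
// another namespace-scope object.
const std::size_t g_nameLength = GetModuleNameSketch().size();

// Returns the length that was computed during program start-up.
std::size_t
GetNameLengthSketch()
{
  assert(!GetModuleNameSketch().empty());
  return g_nameLength;
}
```

If `g_nameLength` instead read a namespace-scope `std::string` defined in another translation unit, it could observe that string before its constructor had run.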
"Race condition" is not precise. It is more of a dependence on the order of the static initialization across the modules that is problematic and unspecified. |
Sorry, I still don't get it! Looking at the code of this PR, the static local variable only gets initialized when ThreadPool::GetInstance() is called for the very first time, which ensures that the lambda code only gets executed on that first call. Since C++11, static local variable initialization is officially thread-safe. So this should do exactly the same as the original code in ITK/Modules/Core/Common/src/itkThreadPool.cxx (lines 69 to 89 in 2e8ad9f).
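The claim about C++11 can be checked in isolation with a small sketch like the following. It is not the PR's code: `GetInstanceSketch`, `CountInitializations` and `g_initCount` are hypothetical stand-ins, and the example says nothing about what happens inside the Python wrapping.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how often the initializer lambda actually runs.
std::atomic<int> g_initCount{ 0 };

// Hypothetical stand-in for a GetInstance()-like function.
int
GetInstanceSketch()
{
  // Since C++11, concurrent callers wait until this initialization completes.
  static const int instance = [] {
    ++g_initCount;
    return 42;
  }();
  return instance;
}

// Calls GetInstanceSketch() from several threads and returns how many
// times the initializer lambda ran in total.
int
CountInitializations(int numberOfThreads)
{
  assert(numberOfThreads > 0);
  std::vector<std::thread> threads;
  for (int i = 0; i < numberOfThreads; ++i)
  {
    threads.emplace_back([] { GetInstanceSketch(); });
  }
  for (auto & thread : threads)
  {
    thread.join();
  }
  return g_initCount.load();
}
```

On a conforming compiler, `CountInitializations(8)` should return 1, no matter how the threads are scheduled.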
Maybe compiler does some special optimizations for local static variables, which it doesn't do for |
AFAIK, it is possible that compiler with eg
P.S. In fact, it does not prove what exactly happens in the PR. Maybe something else. Edit: |
Small demonstration that the above statement is correct:

    #include <chrono>
    #include <iostream>
    #include <thread>

    void f()
    {
      // t0 is a static local variable, initialized by an immediately invoked lambda.
      static unsigned long long t0 = []()
      {
        return
          std::chrono::duration_cast<std::chrono::milliseconds>
            (std::chrono::system_clock::now().time_since_epoch()).count();
      }();
      // Every call prints the stored value of t0.
      std::cout << t0 << std::endl;
    }

    int main(int, char **)
    {
      // Print the current time at program start, for comparison with t0.
      std::cout <<
        std::chrono::duration_cast<std::chrono::milliseconds>
          (std::chrono::system_clock::now().time_since_epoch()).count()
        << std::endl;
      std::this_thread::sleep_for(std::chrono::milliseconds(3333));
      f();
      std::this_thread::sleep_for(std::chrono::milliseconds(1111));
      f();
      std::this_thread::sleep_for(std::chrono::milliseconds(1111));
      f();
      return 0;
    }
Thanks for having a look, @issakomi !
This particular PR is just a style improvement. If it works, I think a static local variable is preferable to a public once_flag member in a global (file scope) data structure and a Moreover, I think it's worth knowing in general whether we can safely assume that static local variable initialization is thread-safe. And if so, how to use it properly!
The compiler is only allowed to do optimizations that do not change the run-time behavior. (According to the as-if rule.) So if the lambda doesn't do anything observable, it may indeed be optimized away. But if the lambda has an observable side effect, it cannot be optimized away, e.g. (https://godbolt.org/z/7PPKe3Er4): `[[maybe_unused]] static bool x = []{ std::cout << 42; return true; }();`
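A variation on that one-liner, as a sketch: here the lambda writes to a string stream instead of `std::cout`, so that the effect can be inspected by the program itself. Because the result is read afterwards, the write cannot be dropped at any optimization level. The names `g_log`, `FunctionWithSideEffect` and `CallThreeTimes` are made up for this illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Collects the output of the initializer so it can be inspected afterwards.
std::ostringstream g_log;

void
FunctionWithSideEffect()
{
  // The variable itself is never read, but its initializer writes to g_log,
  // so the compiler has to keep that write.
  [[maybe_unused]] static const bool initialized = [] {
    g_log << 42;
    return true;
  }();
}

// Calls the function three times and returns everything that was logged.
std::string
CallThreeTimes()
{
  FunctionWithSideEffect();
  FunctionWithSideEffect();
  FunctionWithSideEffect();
  assert(!g_log.str().empty());
  return g_log.str();
}
```

The three calls should log "42" exactly once, since the initializer of a static local variable runs only on the first call.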
Initialization is the correct term 😃 If a static local variable is initialized by a user-defined constructor, you may see that this constructor is called only once, by putting a print statement inside the constructor. But as both of you (@dzenanz @issakomi ) mention compiler optimization as a possible cause of the CI failures, I can still try to see what happens if optimization is suppressed by adding a |
A static local `bool` variable is more _lightweight_ than a `std::once_flag` data member, while its (static) initialization is just as thread-safe as `std::call_once`.
2e8ad9f to b1a2004
I ran the above test with and without -O3; there is a big difference, especially related to the lambda. The f() in your example is not reduced to just 'ret', but to printing '42' once. Strictly speaking, there is no lambda (see the assembler output).
IMHO, I still think it is a little bit of a hack to replace
Maybe my 1st example is oversimplified, but AFAIK, |
If you really want to call

    ThreadPool::Pointer
    ThreadPool::GetInstance()
    {
      itkInitGlobalsMacro(PimplGlobals);
      static std::once_flag localOnceFlag{};
      std::call_once(localOnceFlag, []() {
        m_PimplGlobals->m_ThreadPoolInstance = ObjectFactory<Self>::Create();
        // ... some more things here...
      });
      return m_PimplGlobals->m_ThreadPoolInstance;
    }

Of course, it would then still depend on proper (thread-safe) support of static local variables... 🤔
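For readers without an ITK build at hand, the same pattern can be tried in a standalone sketch. The types `PoolSketch` and `GlobalsSketch` and the functions `GetGlobalsSketch` and `GetPoolInstanceSketch` are hypothetical replacements for the ITK classes and macros, which are assumed here rather than reproduced.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-ins for ThreadPool and its globals structure.
struct PoolSketch
{};

struct GlobalsSketch
{
  std::shared_ptr<PoolSketch> m_Instance;
  int                         m_CreationCount{ 0 };
};

// Returns the single globals object, created on first use.
GlobalsSketch &
GetGlobalsSketch()
{
  static GlobalsSketch globals;
  return globals;
}

std::shared_ptr<PoolSketch>
GetPoolInstanceSketch()
{
  // The flag is a static local variable instead of a data member of the globals.
  static std::once_flag localOnceFlag{};
  std::call_once(localOnceFlag, [] {
    GlobalsSketch & globals = GetGlobalsSketch();
    globals.m_Instance = std::make_shared<PoolSketch>();
    ++globals.m_CreationCount;
  });
  assert(GetGlobalsSketch().m_Instance != nullptr);
  return GetGlobalsSketch().m_Instance;
}
```

Repeated calls return the same pointer, and the creation code runs once. As noted above, the `once_flag` itself is still a static local variable, so this relies on the same language guarantee.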
Maybe make it private? In fact, I am still not 100% sure that the problem is in optimization, it seems possible. But maybe something else? Edit: |
Could you then make all data members of PimplGlobals private? Would that compile? If so, please feel free to make it a pull request 😃
This (everything is private, except the ctor) works:

    struct ThreadPoolGlobals
    {
      ThreadPoolGlobals() = default;
      friend class ThreadPool;

    private:
      // To lock on the various internal variables.
      std::mutex m_Mutex;
      // To allow singleton creation of ThreadPool.
      std::once_flag m_ThreadPoolOnceFlag;
      // The singleton instance of ThreadPool.
      ThreadPool::Pointer m_ThreadPoolInstance;
    #if defined(_WIN32) && defined(ITKCommon_EXPORTS)
      // ThreadPool's destructor is called during DllMain's DLL_PROCESS_DETACH.
      // Because ITKCommon-5.X.dll is usually being detached due to process termination,
      // lpvReserved is non-NULL meaning that "all threads in the process
      // except the current thread either have exited already or have been
      // explicitly terminated by a call to the ExitProcess function".
      // Therefore we must not wait for the condition_variable.
      std::atomic<bool> m_WaitForThreads{ false };
    #else // In a static library, we have to wait.
      std::atomic<bool> m_WaitForThreads{ true };
    #endif
    };

The ctor has to be public, otherwise there is the error:
I didn't try Python wrapping and test, probably they should work, the |
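The access rules behind that struct can be shown with a reduced sketch. `GlobalsAccessSketch` and `PoolAccessSketch` are hypothetical names; the sketch only illustrates the private-members-plus-friend arrangement, not the ITK macros that create the globals object.

```cpp
#include <cassert>

class PoolAccessSketch;

// Hypothetical globals structure: public constructor, private data.
struct GlobalsAccessSketch
{
  GlobalsAccessSketch() = default; // public, so any code can create the object
  friend class PoolAccessSketch;   // only this class can reach the private members

private:
  int m_Value{ 7 };
};

class PoolAccessSketch
{
public:
  // Allowed because PoolAccessSketch is a friend of GlobalsAccessSketch.
  static int
  ReadValue(const GlobalsAccessSketch & globals)
  {
    return globals.m_Value;
  }
};

// Creates a globals object from non-friend code and reads it via the friend.
int
ReadDefaultValue()
{
  GlobalsAccessSketch globals;
  assert(PoolAccessSketch::ReadValue(globals) == 7);
  return PoolAccessSketch::ReadValue(globals);
}
```

Code outside `PoolAccessSketch` can still construct the object, but an expression like `globals.m_Value` in `ReadDefaultValue` would not compile.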