Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP (Not ready): STYLE: Replace once_flag of ThreadPoolGlobals with static local variable #4171

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

N-Dekker
Copy link
Contributor

A static local bool variable is more lightweight than a std::once_flag data member, while its (static) initialization just as thread-safe as std::call_once.

@github-actions github-actions bot added area:Core Issues affecting the Core module type:Style Style changes: no logic impact (indentation, comments, naming) labels Aug 23, 2023
Copy link
Member

@dzenanz dzenanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, but it should be tested, preferably with Python multiprocessing via forking.

@N-Dekker N-Dekker force-pushed the Replace-ThreadPoolGlobals-ThreadPoolOnceFlag branch from 8e2709d to 2e8ad9f Compare August 23, 2023 16:00
@N-Dekker
Copy link
Contributor Author

Code looks good, but it should be tested, preferably with Python multiprocessing via forking.

Thanks @dzenanz but let's first see if both ITK.Linux.Python and ITK.macOS.Python like the PR!

@N-Dekker N-Dekker changed the title STYLE: Replace once_flag of ThreadPoolGlobals with static local variable WIP (Not ready): STYLE: Replace once_flag of ThreadPoolGlobals with static local variable Aug 23, 2023
@N-Dekker
Copy link
Contributor Author

Wow, segfaults and timeouts!

https://open.cdash.org/test/1187815471 says:

Test: PythonGetNameOfClass (Failed)
Build: Linux-Build8919-PR4171-Replace-ThreadPoolGlobals-ThreadPoolOnceFlag-Python (Azure.fv-az71-143)
...
Loading ITKVTK... done
itkTestDriver: Process exception: Segmentation fault

https://open.cdash.org/test/1187815689 says:

Test: PythonTemplateTest (Failed)
Build: Linux-Build8919-PR4171-Replace-ThreadPoolGlobals-ThreadPoolOnceFlag-Python (Azure.fv-az71-143) on 2023-08-23 16:04:40
Repository revision: 2e8ad9f
Test Details: Completed (Timeout)

https://open.cdash.org/test/1187868304 says:

Test: PythonExtrasTest (Failed)
Build: Darwin-Build8992-PR4171-Replace-ThreadPoolGlobals-ThreadPoolOnceFlag-Python (Azure.Mac-1692806138896) on 2023-08-23 16:02:47
Repository revision: 2e8ad9f
Test Details: Completed (Failed)
...
Traceback (most recent call last):
File "/Users/runner/work/1/s/Wrapping/Generators/Python/Tests/extras.py", line 328, in
itk.transformwrite(transforms, sys.argv[7], compression=True)
File "/Users/runner/work/1/s-build/Wrapping/Generators/Python/itk/support/extras.py", line 1446, in transformwrite
writer.Update()
RuntimeError: thread::join failed: Invalid argument
libc++abi: terminating with uncaught exception of type std::__1::system_error: thread::join failed: Invalid argument
itkTestDriver: Process exception: Subprocess aborted

To be continued...

@N-Dekker
Copy link
Contributor Author

Does anyone have an explanation about what goes wrong here, on both ITK.Linux.Python and ITK.macOS.Python? Is there a fundamental reason why a static local variable initialization in an ITK core CXX file might not be thread-safe, when doing a Python wrapping? And if so, why would std::call_once then be any better?

@dzenanz
Copy link
Member

dzenanz commented Aug 24, 2023

std::call_once might have some special treatment, while an ordinary local static variable does not.

@dzenanz
Copy link
Member

dzenanz commented Aug 24, 2023

Static variables get constructed at initialization time, while std::call_once gets executed at run time.

@blowekamp
Copy link
Member

Static variables get constructed at initialization time, while std::call_once gets executed at run time.

This introduces a race condition with other global static variable and their initializations. The itk::ObjectFactory depends on static initialization so it cannot be used during static initialization, so ITK objects derived from itk::Object can not be used during static initialization.

@N-Dekker
Copy link
Contributor Author

Do ITK.Linux.Python and ITK.macOS.Python possibly load the ITKCommon lib multiple times concurrently, getting multiple instances of ITKCommon at the same time?

@blowekamp
Copy link
Member

"Race condition" is not precise. It is more of a dependence on the order of the static initialization across the modules that is problematic and unspecified.

@N-Dekker
Copy link
Contributor Author

Sorry, I still don't get it! Looking at the code of this PR, the static local variable only gets initialized when ThreadPool::GetInstance() is called for the very first time. Which ensures that the lambda code only gets executed when ThreadPool::GetInstance() is called for the very first time. From C++11, static local variable initialization is officially thread-safe. So this should do exactly the same as the original std::call_once call: (It should do the same, but it does not.)

ThreadPool::GetInstance()
{
// This is called once, on-demand to ensure that m_PimplGlobals is
// initialized.
itkInitGlobalsMacro(PimplGlobals);
// Create a ThreadPool singleton as part of the initialization of a static local variable, to ensure thread-safety.
[[maybe_unused]] static const bool isCalledOnce = [] {
m_PimplGlobals->m_ThreadPoolInstance = ObjectFactory<Self>::Create();
if (m_PimplGlobals->m_ThreadPoolInstance.IsNull())
{
new ThreadPool(); // constructor sets m_PimplGlobals->m_ThreadPoolInstance
}
#if defined(ITK_USE_PTHREADS)
pthread_atfork(ThreadPool::PrepareForFork, ThreadPool::ResumeFromFork, ThreadPool::ResumeFromFork);
#endif
return true;
}();
return m_PimplGlobals->m_ThreadPoolInstance;
}

@dzenanz
Copy link
Member

dzenanz commented Aug 24, 2023

Maybe compiler does some special optimizations for local static variables, which it doesn't do for std::call_once?

@issakomi
Copy link
Member

issakomi commented Aug 24, 2023

Maybe compiler does some special optimizations for local static variables, which it doesn't do for std::call_once?

AFAIK, it is possible that compiler with eg -O3 can optimize away unused variable and doesn't run the lambda at all.

test with -O3

void f()
{
    [[maybe_unused]] static bool x = []{ return true; }();
}

int main()
{
    f();
    return 0;
}
f():
        ret
main:
        xor     eax, eax
        ret

P.S. In fact, it does not prove what exactly happens in the PR. Maybe something else.

Edit:
perhaps it could be possible to test: __attribute__((optimize(0))) before GetInstance() or some #pragma or other compiler stuff (probably bad). But even if the assumption is correct ... maybe it would be better to leave the code as it is (once_flag), the reward is not high, IMHO?

@issakomi
Copy link
Member

issakomi commented Aug 25, 2023

the static local variable only gets initialized when ThreadPool::GetInstance() is called for the very first time. Which ensures that the lambda code only gets executed when ThreadPool::GetInstance() is called for the very first time.

Small demonstration that the above statement is correct

#include <iostream>
#include <chrono>
#include <thread>

void f()
{
	static unsigned long long t0 = []()
	{
		return
			std::chrono::duration_cast<std::chrono::milliseconds>
				(std::chrono::system_clock::now().time_since_epoch()).count();
	}();
	std::cout << t0 << std::endl;
}

int main(int, char **)
{
	std::cout <<
		std::chrono::duration_cast<std::chrono::milliseconds>
			(std::chrono::system_clock::now().time_since_epoch()).count()
		<< std::endl;

	std::this_thread::sleep_for(std::chrono::milliseconds(3333));

	f();

	std::this_thread::sleep_for(std::chrono::milliseconds(1111));

	f();

	std::this_thread::sleep_for(std::chrono::milliseconds(1111));

	f();

	return 0;
}
r@deb2:~$ g++ s.cpp 
r@deb2:~$ ./a.out 
1692981873633
1692981876967
1692981876967
1692981876967

@N-Dekker
Copy link
Contributor Author

N-Dekker commented Aug 26, 2023

Thanks for having a look, @issakomi !

maybe it would be better to leave the code as it is (once_flag), the reward is not high, IMHO?

This particular PR is just a style improvement. If it works, I think a static local variable is preferable to a public once_flag member in a global (file scope) data structure and a call_once call. But only if it works, of course 😺

Moreover, I think it's worth knowing in general whether we can safely assume that static local variable initialization is thread-safe. And if so, how to use it properly!

AFAIK, it is possible that compiler with eg -O3 can optimize away unused variable and doesn't run the lambda at all.

The compiler is only allowed to do optimizations that do not change the run-time behavior. (According to the as-if rule.) So if the lambda doesn't do anything observable, it may indeed be optimized away. But if the lambda has an observable side effect, it cannot be optimized away, e.g. (https://godbolt.org/z/7PPKe3Er4):

[[maybe_unused]] static bool x = []{ std::cout << 42; return true; }();`

the value is assigned (maybe re-initialized is correct term?) at the first run of the function.

Initialization is the correct term 😃 If a static local variable is initialized by a user-defined constructor, you may see that this constructor is called only once, by putting a print statement inside the constructor.

But as both of you (@dzenanz @issakomi ) mention compiler optimization as a possible cause of the CI failures, I can still try to see what happens if optimization is suppressed by adding a volatile keyword. Please wait... (just force-pushed)

A static local `bool` variable is more _lightweight_ than a `std::once_flag`
data member, while its (static) initialization just as thread-safe as
`std::call_once`.
@N-Dekker N-Dekker force-pushed the Replace-ThreadPoolGlobals-ThreadPoolOnceFlag branch from 2e8ad9f to b1a2004 Compare August 26, 2023 10:09
@github-actions github-actions bot removed the type:Style Style changes: no logic impact (indentation, comments, naming) label Aug 26, 2023
@issakomi
Copy link
Member

issakomi commented Aug 26, 2023

But if the lambda has an observable side effect, it cannot be optimized away, e.g. (https://godbolt.org/z/7PPKe3Er4)

I run the above test with and without -O3, there is a big difference, specially related to lambda. The f() in your example is not reduced to just 'ret', but reduced to printing '42' once. Strictly, there is no lambda (s. assembler).

If it works, I think a static local variable is preferable to a public once_flag

IMHO, I still think it is a little bit a hack, to replace call_once with initialization of a unused local static variable, for the compiler it may be not clear what we want.

@issakomi
Copy link
Member

issakomi commented Aug 26, 2023

optimization is suppressed by adding a volatile keyword.

Maybe my 1st example is oversimplified, but volatile doesn't change anything

test

AFAIK, volatile prevents some optimization steps, related to read / write.

@N-Dekker
Copy link
Contributor Author

IMHO, I still think it is a little bit a hack, to replace call_once with initialization of a unused local static variable

If you really want to call call_once, I think that at least in theory, it might be preferable to keep the once_flag as a local variable, instead of having it as a public data member of globally shared data:

ThreadPool::Pointer
ThreadPool::GetInstance()
{
  itkInitGlobalsMacro(PimplGlobals);

  static std::once_flag localOnceFlag{};

  std::call_once(localOnceFlag, []() {
    m_PimplGlobals->m_ThreadPoolInstance = ObjectFactory<Self>::Create();
    // ... some more things here...
  });

  return m_PimplGlobals->m_ThreadPoolInstance;
}

Of course, it would then still depend on proper (thread-safe) support of static local variables... 🤔

@issakomi
Copy link
Member

issakomi commented Aug 27, 2023

instead of having it as a public data member of globally shared data

Maybe make it private? In fact, I am still not 100% sure that the problem is in optimization, it seems possible. But maybe something else?

Edit:
tried, it works (with friend class ThreadPool;).

@N-Dekker
Copy link
Contributor Author

tried, it works (with friend class ThreadPool;).

Could you then make all data members of PimplGlobals private? Would that compile? If so, please feel free to make it a pull request 😃

@issakomi
Copy link
Member

issakomi commented Aug 27, 2023

Could you then make all data members of PimplGlobals private? Would that compile? If so, please feel free to make it a pull request 😃

This (everything is private, except the ctor) works:

struct ThreadPoolGlobals
{
  ThreadPoolGlobals() = default;

  friend class ThreadPool;

private:

  // To lock on the various internal variables.
  std::mutex m_Mutex;

  // To allow singleton creation of ThreadPool.
  std::once_flag m_ThreadPoolOnceFlag;

  // The singleton instance of ThreadPool.
  ThreadPool::Pointer m_ThreadPoolInstance;

#if defined(_WIN32) && defined(ITKCommon_EXPORTS)
  // ThreadPool's destructor is called during DllMain's DLL_PROCESS_DETACH.
  // Because ITKCommon-5.X.dll is usually being detached due to process termination,
  // lpvReserved is non-NULL meaning that "all threads in the process
  // except the current thread either have exited already or have been
  // explicitly terminated by a call to the ExitProcess function".
  // Therefore we must not wait for the condition_variable.
  std::atomic<bool> m_WaitForThreads{ false };
#else // In a static library, we have to wait.
  std::atomic<bool> m_WaitForThreads{ true };
#endif
};

The ctor has to be public, otherwise there is the error:

In file included from /home/r/itk/ITK/Modules/Core/Common/src/itkThreadPool.cxx:25:
/home/r/itk/ITK/Modules/Core/Common/include/itkSingleton.h: In instantiation of ‘T* itk::Singleton(const char*, std::function<void(void*)>, std::function<void()>) [with T = ThreadPoolGlobals]’:
/home/r/itk/ITK/Modules/Core/Common/src/itkThreadPool.cxx:66:1:   required from here
/home/r/itk/ITK/Modules/Core/Common/include/itkSingleton.h:121:16: error: ‘constexpr itk::ThreadPoolGlobals::ThreadPoolGlobals()’ is private within this context
  121 |     instance = new T;
      |                ^~~~~
/home/r/itk/ITK/Modules/Core/Common/src/itkThreadPool.cxx:42:3: note: declared private here
   42 |   ThreadPoolGlobals() = default;

I didn't try Python wrapping and test, probably they should work, the ThreadPoolGlobals struct in in itkThreadPool.cxx file, nobody else see it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Core Issues affecting the Core module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants