-
-
Notifications
You must be signed in to change notification settings - Fork 243
fix(shutdown): Prevent race condition when GlobalObject destruction routine unlocks global mutex #8652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix(shutdown): Prevent race condition when GlobalObject destruction routine unlocks global mutex #8652
Conversation
…ine unlocks global mutex Unlocking global mutex in GlobalObject destruction routine made it possible for a new attachment to slip in, so it will be creating new GlobalObject and using it, while destroying routine still in action. This can lead to an undefined state of the global objects, such as shared memory, where one thread is actively using it while another thread is destroying it.
Found issue with current fix using bool: If we set |
Looks like instead of using global flag, common for all databases, we need per-database flag. Raw idea: remove When concurrent creator founds that constant, it should wait for no-entry in hash table before attempt to create new instance. Instead of fixed constant, consider to use some sync object (mutex? special instance of GlobalObjectHolder ?) that could be used to wait for, instead of poll + sleep in a loop. |
Looks like as a pattern for mutex + condition variable that waits while g_shuttingDown is true and is notified when it's false. |
…routine for GlobalObjectHolder
I reimplemented the fix by adding a mutex to |
} | ||
// Now we are the one who owned DbId object. | ||
// It also was removed from hash table, so simply delete it and recreate it next. | ||
fb_assert(entry->getRefCount() == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are many concurrent initializers, then this assert could be violated.
I think it is not needed.
Instead, you may nullify entry->holder
at ~GlobalObjectHolder()
and check it for nullptr here.
// Stole the object from the hash table without incrementing ref counter, so we will be the one who will delete the object | ||
// at the end of this function. | ||
RefPtr<Database::GlobalObjectHolder::DbId> entry(REF_NO_INCR, g_hashTable->lookup(m_id)); | ||
fb_assert(entry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add also fb_assert(entry->holder == this)
?
@@ -616,7 +649,8 @@ namespace Jrd | |||
m_eventMgr = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On enter g_mutex
should be locked.
At line 639 the shutdownMutex
is locked.
So we have order of mutexes: g_mutex
then shutdownMutex
.
At line 643 g_mutex
is unlocked, while shutdownMutex
still locked.
At line 646 g_mutex
will be locked again and we have inverted order of mutexes: shutdownMutex
then g_mutex
.
This is a way for deadlock.
Am I wrong ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I can see every instance of DbId
is linked to a specific GlobalObjectHolder
, so shutdownMutex
is a different object every time the ~GlobalObjectHolder()
is called. Therefore, a deadlock can only occur if the desctructor will be called twice on the same GlobalObjectHolder
, which is not possible, so we are safe here.
shutdownMutex
can be locked by another thread in GlobalObjectHolder::init
, but only when g_mutex
is unlocked. Therefore, a deadlock is also not possible here too.
@@ -616,7 +649,8 @@ namespace Jrd | |||
m_eventMgr = nullptr; | |||
m_replMgr = nullptr; | |||
|
|||
delete entry; | |||
if (!g_hashTable->remove(m_id)) | |||
fb_assert(false); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add entry->holder = nullptr
?
Unlocking global mutex in GlobalObject destruction routine made it possible for a new attachment to slip in, so it will create new GlobalObject and use it, while destroying routine still in action. This can lead to an undefined state of the global objects, such as shared memory, where one thread is actively using it while another thread is destroying it.
v5 is also affected.
Examples of this race condition:
Example 1 - Deadlock.
Thread 1 is holding
sh_mem_mutex
(atJrd::LockManager::~LockManager
), and waiting for flock oninitFile
;Thread 2 is holding flock on
initFile
(atFirebird::SharedMemoryBase::SharedMemoryBase
), and waiting forsh_mem_mutex
;Trace
Example 2 - Crash.
Thread 1 - New attachment trying to use deleted shared file for
LockManager
.Thread 2 - Complete
JRD_shutdown_database
routine, clear GlobalObject, and leave without any trace...Trace