
fix(shutdown): Prevent race condition when GlobalObject destruction routine unlocks global mutex #8652


Open: wants to merge 3 commits into base: master

TreeHunter9
Contributor

Unlocking the global mutex inside the GlobalObject destruction routine lets a new attachment slip in: it creates a new GlobalObject and starts using it while the destruction routine is still running. This can leave global objects such as shared memory in an undefined state, with one thread actively using them while another thread destroys them.

v5 is also affected.

Examples of this race condition:

Example 1 - Deadlock.
Thread 1 holds sh_mem_mutex (in Jrd::LockManager::~LockManager) and waits for the flock on initFile.
Thread 2 holds the flock on initFile (in Firebird::SharedMemoryBase::SharedMemoryBase) and waits for sh_mem_mutex.
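
The circular wait in Example 1 can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: two in-process `std::timed_mutex` objects stand in for sh_mem_mutex and the flock on initFile, and a timeout replaces the real indefinite wait so the program terminates. All names (`holdAndTry`, `bothThreadsTimeOut`) are hypothetical and do not come from the Firebird sources.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Takes `first`, waits until the peer holds its own first lock, then tries
// to take `second` with a timeout. Keeps `first` until the peer has tried too.
void holdAndTry(std::timed_mutex& first, std::timed_mutex& second,
                std::promise<void>& iHold, std::future<void>& peerHolds,
                std::promise<void>& iTried, std::future<void>& peerTried,
                bool& gotSecond)
{
    std::unique_lock<std::timed_mutex> hold(first);
    assert(hold.owns_lock());
    iHold.set_value();
    peerHolds.wait();                 // both first locks are held from here on
    gotSecond = second.try_lock_for(std::chrono::milliseconds(50));
    if (gotSecond)
        second.unlock();
    iTried.set_value();
    peerTried.wait();                 // do not release `first` before the peer has tried
}

// Returns true when neither thread could take its second lock,
// i.e. when both would have blocked forever without the timeout.
bool bothThreadsTimeOut()
{
    std::timed_mutex shMemMutex, initFileLock;   // stand-ins, see lead-in
    std::promise<void> t1Holds, t2Holds, t1Tried, t2Tried;
    std::future<void> t1HoldsF = t1Holds.get_future();
    std::future<void> t2HoldsF = t2Holds.get_future();
    std::future<void> t1TriedF = t1Tried.get_future();
    std::future<void> t2TriedF = t2Tried.get_future();
    bool t1Got = false, t2Got = false;

    // Thread 1: shared-memory mutex first, then the file lock.
    std::thread t1([&] {
        holdAndTry(shMemMutex, initFileLock, t1Holds, t2HoldsF, t1Tried, t2TriedF, t1Got);
    });
    // Thread 2: file lock first, then the shared-memory mutex.
    std::thread t2([&] {
        holdAndTry(initFileLock, shMemMutex, t2Holds, t1HoldsF, t2Tried, t1TriedF, t2Got);
    });
    t1.join();
    t2.join();
    return !t1Got && !t2Got;
}
```

Calling `bothThreadsTimeOut()` reports whether both attempts failed. With blocking waits instead of timed ones, as in the traces, neither thread would ever make progress.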

Trace
thread #1
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffff65f76b8) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7ffff65f76b8) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffff65f76b8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007ffff7693a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffff65f7708, cond=0x7ffff65f7690) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x7ffff65f7690, mutex=0x7ffff65f7708) at ./nptl/pthread_cond_wait.c:627
#5  0x00007ffff57e6651 in Firebird::Condition::wait (this=0x7ffff65f7690, m=...) at /src/common/../common/classes/condition.h:192
#6  0x00007ffff5e96ac1 in Firebird::SharedFileInfo::lock (this=0x7ffff65f7680, shared=false, wait=true, init=0x0) at /src/common/isc_sync.cpp:359
#7  0x00007ffff5e91075 in Firebird::FileLock::setlock (this=0x7ffff7849930, mode=Firebird::FileLock::FLM_EXCLUSIVE) at /src/common/isc_sync.cpp:508
#8  0x00007ffff5e910b6 in Firebird::FileLock::setlock (this=0x7ffff7849930, status=0x7ffff15fc8a0, mode=Firebird::FileLock::FLM_EXCLUSIVE) at /src/common/isc_sync.cpp:517
#9  0x00007ffff5e90c7a in (anonymous namespace)::FileLockHolder::FileLockHolder (this=0x7ffff15fc9d0, l=0x7ffff7849930) at /src/common/isc_sync.cpp:178
#10 0x00007ffff5e91cd5 in Firebird::SharedMemoryBase::removeMapFile (this=0x7ffff4e6b650) at /src/common/isc_sync.cpp:1138
#11 0x00007ffff5d9bac2 in Jrd::LockManager::~LockManager (this=0x7ffff319b820, __in_chrg=<optimized out>) at /src/lock/lock.cpp:247
#12 0x00007ffff57f3dad in Firebird::SimpleDelete<Jrd::LockManager>::clear (ptr=0x7ffff319b820) at /src/include/../common/classes/auto.h:46
#13 0x00007ffff57f351f in Firebird::AutoPtr<Jrd::LockManager, Firebird::SimpleDelete>::operator= (this=0x7ffff784fde0, v=0x0) at /src/include/../common/classes/auto.h:122
#14 0x00007ffff57f0ee5 in Jrd::Database::GlobalObjectHolder::~GlobalObjectHolder (this=0x7ffff784fd80, __in_chrg=<optimized out>) at /src/jrd/Database.cpp:634
#15 0x00007ffff57f1010 in Jrd::Database::GlobalObjectHolder::~GlobalObjectHolder (this=0x7ffff784fd80, __in_chrg=<optimized out>) at /src/jrd/Database.cpp:641
#16 0x00007ffff57a162b in Firebird::RefCounted::release (this=0x7ffff784fd80) at /src/include/../common/classes/RefCounted.h:47
#17 0x00007ffff57f0ade in Jrd::Database::GlobalObjectHolder::release (this=0x7ffff784fd80) at /src/jrd/Database.cpp:580
#18 0x00007ffff57f2d43 in Firebird::RefPtr<Jrd::Database::GlobalObjectHolder>::~RefPtr (this=0x7ffff297f1f8, __in_chrg=<optimized out>) at /src/include/../common/classes/RefCounted.h:140
#19 0x00007ffff57ef49d in Jrd::Database::~Database (this=0x7ffff297ecd0, __in_chrg=<optimized out>) at /src/jrd/Database.cpp:185
#20 0x00007ffff5a2b735 in Jrd::Database::destroy (toDelete=0x7ffff297ecd0) at /src/jrd/../jrd/../jrd/Database.h:422
#21 0x00007ffff5a2252c in JRD_shutdown_database (dbb=0x7ffff297ecd0, flags=3) at /src/jrd/jrd.cpp:8274
#22 0x00007ffff5a23793 in purge_attachment (tdbb=0x7ffff15fd518, sAtt=0x7ffff319b040, flags=2) at /src/jrd/jrd.cpp:8678
#23 0x00007ffff5a0fd9f in Jrd::JAttachment::freeEngineData (this=0x7ffff319b250, user_status=0x7ffff15fd6d0, forceFree=false) at /src/jrd/jrd.cpp:3455
#24 0x00007ffff5a0fb50 in Jrd::JAttachment::internalDetach (this=0x7ffff319b250, user_status=0x7ffff15fd6d0) at /src/jrd/jrd.cpp:3392
#25 0x00007ffff5a0fba7 in Jrd::JAttachment::detach (this=0x7ffff319b250, user_status=0x7ffff15fd6d0) at /src/jrd/jrd.cpp:3404
#26 0x00007ffff5a3aad4 in Firebird::IAttachmentBaseImpl<Jrd::JAttachment, Firebird::CheckStatusWrapper, Firebird::IReferenceCountedImpl<Jrd::JAttachment, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Jrd::JAttachment, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IAttachment> > > > >::cloopdetachDispatcher (self=0x7ffff319b258, status=0x7ffff15fd908) at /src/include/firebird/IdlFbInterfaces.h:12325
#27 0x00007ffff7af228b in Firebird::IAttachment::detach<Firebird::CheckStatusWrapper> (this=0x7ffff319b258, status=0x7ffff15fd900) at /src/include/firebird/IdlFbInterfaces.h:2846
#28 0x00007ffff7ad6d4e in operator() (__closure=0x7ffff15fd880) at /src/yvalve/why.cpp:6081
#29 0x00007ffff7ae37f6 in std::__invoke_impl<void, Why::YAttachment::detach(Firebird::CheckStatusWrapper*)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#30 0x00007ffff7ae1b4a in std::__invoke_r<void, Why::YAttachment::detach(Firebird::CheckStatusWrapper*)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:111
#31 0x00007ffff7adf5a5 in std::_Function_handler<void(), Why::YAttachment::detach(Firebird::CheckStatusWrapper*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/11/bits/std_function.h:290
#32 0x00007ffff7af7068 in std::function<void ()>::operator()() const (this=0x7ffff15fd880) at /usr/include/c++/11/bits/std_function.h:590
#33 0x00007ffff7af2366 in Why::done<Why::YAttachment>(Firebird::CheckStatusWrapper*, Why::YEntry<Why::YAttachment>&, Why::YAttachment*, std::function<void ()>, std::function<void ()>) (status=0x7ffff15fd900, entry=..., y=0x7ffff7e74450, newClose=..., oldClose=...) at /src/yvalve/why.cpp:1360
#34 0x00007ffff7ad6ebd in Why::YAttachment::detach (this=0x7ffff7e74450, status=0x7ffff15fd900) at /src/yvalve/why.cpp:6080
#35 0x00007ffff7b1a5e8 in Firebird::IAttachmentBaseImpl<Why::YAttachment, Firebird::CheckStatusWrapper, Firebird::IReferenceCountedImpl<Why::YAttachment, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Why::YAttachment, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IAttachment> > > > >::cloopdetachDispatcher (self=0x7ffff7e74458, status=0x7ffff15fd9a8) at /src/include/firebird/IdlFbInterfaces.h:12325
#36 0x00005555555df102 in Firebird::IAttachment::detach<Firebird::CheckStatusWrapper> (this=0x7ffff7e74458, status=0x7ffff15fd9a0) at /src/include/firebird/IdlFbInterfaces.h:2846
#37 0x00005555555c7fb4 in rem_port::end_database (this=0x7ffff7e54e50, sendL=0x7ffff2d9f068) at /src/remote/server/server.cpp:3274
#38 0x00005555555cf3f3 in process_packet (port=0x7ffff7e54e50, sendL=0x7ffff2d9f068, receive=0x7ffff2d9f640, result=0x7ffff15fdc58) at /src/remote/server/server.cpp:5187
#39 0x00005555555d578a in loopThread () at /src/remote/server/server.cpp:6987
#40 0x00005555555fe5ec in (anonymous namespace)::ThreadArgs::run (this=0x7ffff15fdd90) at /src/common/ThreadStart.cpp:78
#41 0x00005555555fe6bc in (anonymous namespace)::threadStart (arg=0x7ffff7e653d0) at /src/common/ThreadStart.cpp:94

thread #2
#0  futex_wait (private=128, expected=2, futex_word=0x7ffff65eb010) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7ffff65eb010, private=128) at ./nptl/lowlevellock.c:49
#2  0x00007ffff7698002 in lll_mutex_lock_optimized (mutex=0x7ffff65eb010) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7ffff65eb010) at ./nptl/pthread_mutex_lock.c:93
#4  0x00007ffff5e92f5a in Firebird::SharedMemoryBase::mutexLock (this=0x7ffff4eef950) at /src/common/isc_sync.cpp:2754
#5  0x00007ffff5d9e17f in Jrd::LockManager::acquire_shmem (this=0x7ffff319fbc0, owner_offset=78584) at /src/lock/lock.cpp:1067
#6  0x00007ffff5da6f04 in Jrd::LockManager::LockTableGuard::LockTableGuard (this=0x7fffe26c4110, lm=0x7ffff319fbc0, f=0x7ffff615a6d7 "enqueue", owner=78584) at /src/lock/../lock/lock_proto.h:310
#7  0x00007ffff5d9c2cf in Jrd::LockManager::enqueue (this=0x7ffff319fbc0, tdbb=0x7fffe26c6f78, statusVector=0x7fffe26c4340, prior_request=0, series=25, value=0x7ffff0038ad8 "", length=8, type=3 '\003', ast_routine=0x7ffff5aa5e5c <Jrd::TipCache::tpc_block_blocking_ast(void*)>, ast_argument=0x7ffff0038a50, data=0, lck_wait=1, owner_offset=78584) at /src/lock/lock.cpp:468
#8  0x00007ffff5a440d6 in enqueue (tdbb=0x7fffe26c6f78, statusVector=0x7fffe26c4340, lock=0x7ffff0038a60, level=3, wait=1) at /src/jrd/lck.cpp:948
#9  0x00007ffff5a4590d in ENQUEUE (tdbb=0x7fffe26c6f78, statusVector=0x7fffe26c4340, lock=0x7ffff0038a60, level=3, wait=1) at /src/jrd/lck.cpp:149
#10 0x00007ffff5a43671 in LCK_lock (tdbb=0x7fffe26c6f78, lock=0x7ffff0038a60, level=3, wait=1) at /src/jrd/lck.cpp:675
#11 0x00007ffff5aa49b8 in Jrd::TipCache::StatusBlockData::StatusBlockData (this=0x7ffff0038a50, tdbb=0x7fffe26c6f78, tipCache=0x7ffff0038740, blockSize=4194304, blkNumber=0) at /src/jrd/tpc.cpp:387
#12 0x00007ffff5aa51eb in Jrd::TipCache::createTransactionStatusBlock (this=0x7ffff0038740, blockSize=4194304, blockNumber=0) at /src/jrd/tpc.cpp:500
#13 0x00007ffff5aa45f7 in Jrd::TipCache::loadInventoryPages (this=0x7ffff0038740, tdbb=0x7fffe26c6f78, header=0x7ffff4e24000) at /src/jrd/tpc.cpp:340
#14 0x00007ffff5aa333d in Jrd::TipCache::GlobalTpcInitializer::initialize (this=0x7ffff0038760, sm=0x7fffe2c86b50, initFlag=true) at /src/jrd/tpc.cpp:80
#15 0x00007ffff5e9267d in Firebird::SharedMemoryBase::SharedMemoryBase (this=0x7fffe2c86b50, filename=0x7ffff00389a0 "fb_tpc_0203010000000000f2552c0000000000", length=136, callback=0x7ffff0038760, skipLock=false) at /src/common/isc_sync.cpp:1383
#16 0x00007ffff5aa8587 in Firebird::SharedMemory<Jrd::TipCache::GlobalTpcHeader>::SharedMemory (this=0x7fffe2c86b50, fileName=0x7ffff00389a0 "fb_tpc_0203010000000000f2552c0000000000", size=136, cb=0x7ffff0038760, skipLock=false) at /src/jrd/../jrd/../jrd/../jrd/../lock/../common/isc_s_proto.h:344
#17 0x00007ffff5aa3f60 in Jrd::TipCache::initializeTpc (this=0x7ffff0038740, tdbb=0x7fffe26c6f78) at /src/jrd/tpc.cpp:251
#18 0x00007ffff5a2c27d in Jrd::TipCache::create (tdbb=0x7fffe26c6f78) at /src/jrd/../jrd/tpc_proto.h:64
#19 0x00007ffff5a096a9 in Jrd::JProvider::internalAttach (this=0x7ffff319ec80, user_status=0x7fffe26c8010, filename=0x7ffff2d9caf0 "employee", dpb_length=366, dpb=0x7ffff2d9c950 "\001J>/gen/Debug/firebird/bin/isqlP'LI-T6.0.0.1036-dev Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM", existingId=0x0) at /src/jrd/jrd.cpp:1917
#20 0x00007ffff5a0892f in Jrd::JProvider::attachDatabase (this=0x7ffff319ec80, user_status=0x7fffe26c8010, filename=0x7ffff2d9caf0 "employee", dpb_length=366, dpb=0x7ffff2d9c950 "\001J>/gen/Debug/firebird/bin/isqlP'LI-T6.0.0.1036-dev Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/jrd/jrd.cpp:1665
#21 0x00007ffff57ec9b7 in Firebird::IProviderBaseImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::IPluginBaseImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IReferenceCountedImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IProvider> > > > > > >::cloopattachDatabaseDispatcher (self=0x7ffff319ec88, status=0x7fffe26c8668, fileName=0x7ffff2d9caf0 "employee", dpbLength=366, dpb=0x7ffff2d9c950 "\001J>/gen/Debug/firebird/bin/isqlP'LI-T6.0.0.1036-dev Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/include/firebird/IdlFbInterfaces.h:12652
#22 0x00007ffff7af39c9 in Firebird::IProvider::attachDatabase<Firebird::CheckStatusWrapper> (this=0x7ffff319ec88, status=0x7fffe26c8660, fileName=0x7ffff2d9caf0 "employee", dpbLength=366, dpb=0x7ffff2d9c950 "\001J>/gen/Debug/firebird/bin/isqlP'LI-T6.0.0.1036-dev Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/include/firebird/IdlFbInterfaces.h:3033
#23 0x00007ffff7ad9538 in Why::Dispatcher::attachOrCreateDatabase (this=0x7ffff2d9c7e0, status=0x7fffe26c8660, createFlag=false, filename=0x7ffff7e6fb6c "employee", dpbLength=366, dpb=0x7ffff7e6fbf0 "\001J>/gen/Debug/firebird/bin/isqlP'LI-T6.0.0.1036-dev Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/yvalve/why.cpp:6579
#24 0x00007ffff7ad8fa4 in Why::Dispatcher::attachDatabase (this=0x7ffff2d9c7e0, status=0x7fffe26c8660, filename=0x7ffff7e6fb6c "employee", dpbLength=366, dpb=0x7ffff7e6fbf0 "\001J>/gen/Debug/firebird/bin/isqlP'LI-T6.0.0.1036-dev Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/yvalve/why.cpp:6489
#25 0x00007ffff7a81c7f in Firebird::IProviderBaseImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::IPluginBaseImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IReferenceCountedImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IProvider> > > > > > >::cloopattachDatabaseDispatcher (self=0x7ffff2d9c7e8, status=0x7fffe26c8758, fileName=0x7ffff7e6fb6c "employee", dpbLength=366, dpb=0x7ffff7e6fbf0 "\001J>/gen/Debug/firebird/bin/isqlP'LI-T6.0.0.1036-dev Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/include/firebird/IdlFbInterfaces.h:12652
#26 0x00005555555bd4af in Firebird::IProvider::attachDatabase<Firebird::CheckStatusWrapper> (this=0x7ffff2d9c7e8, status=0x7fffe26c8750, fileName=0x7ffff7e6fb6c "employee", dpbLength=366, dpb=0x7ffff7e6fbf0 "\001J>/gen/Debug/firebird/bin/isqlP'LI-T6.0.0.1036-dev Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/include/firebird/IdlFbInterfaces.h:3033
#27 0x00005555555c59ea in (anonymous namespace)::DatabaseAuth::accept (this=0x7ffff7e6faf0, send=0x7ffff2da8568, authBlock=0x7ffff2da95a8) at /src/remote/server/server.cpp:2602
#28 0x00005555555c0199 in (anonymous namespace)::ServerAuth::authenticate (this=0x7ffff7e6faf0, send=0x7ffff2da8568, flags=0) at /src/remote/server/server.cpp:676
#29 0x00005555555c5582 in attach_database (port=0x7ffff2da3ed0, operation=op_attach, attach=0x7ffff2da8ca8, send=0x7ffff2da8568) at /src/remote/server/server.cpp:2539
#30 0x00005555555cf0dc in process_packet (port=0x7ffff2da3ed0, sendL=0x7ffff2da8568, receive=0x7ffff2da8b40, result=0x7fffe26c8c58) at /src/remote/server/server.cpp:5106
#31 0x00005555555d578a in loopThread () at /src/remote/server/server.cpp:6987
#32 0x00005555555fe5ec in (anonymous namespace)::ThreadArgs::run (this=0x7fffe26c8d90) at /src/common/ThreadStart.cpp:78
#33 0x00005555555fe6bc in (anonymous namespace)::threadStart (arg=0x7ffff7e6cb80) at /src/common/ThreadStart.cpp:94

Example 2 - Crash.
Thread 1 - A new attachment tries to use the already deleted shared file for LockManager.
Thread 2 - Completes the JRD_shutdown_database routine, clears the GlobalObject, and leaves without any trace...

Trace
thread #1
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736995522112) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140736995522112) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140736995522112, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7642476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff76287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff5e4a2b7 in fb_utils::logAndDie (text=0x7fffe29fa290 "Fatal lock manager error: Process disappeared in LockManager::acquire_shmem, errno: 1\n--Operation not permitted") at /src/common/utils.cpp:1452
#6  0x00007ffff5d3971b in Jrd::LockManager::bug (this=0x7ffff4ed7b60, statusVector=0x0, string=0x7ffff60970e0 "Process disappeared in LockManager::acquire_shmem") at /src/lock/lock.cpp:1643
#7  0x00007ffff5d37de5 in Jrd::LockManager::acquire_shmem (this=0x7ffff4ed7b60, owner_offset=78584) at /src/lock/lock.cpp:1075
#8  0x00007ffff5d3f272 in Jrd::LockManager::LockTableGuard::LockTableGuard (this=0x7fffe29fc4e0, lm=0x7ffff4ed7b60, f=0x7ffff6097080 "enqueue", owner=78584) at /src/lock/../lock/lock_proto.h:310
#9  0x00007ffff5d361d7 in Jrd::LockManager::enqueue (this=0x7ffff4ed7b60, tdbb=0x7fffe29fcf98, statusVector=0x7fffe29fc710, prior_request=0, series=29, value=0x7ffff7889f78 "", length=0, type=6 '\006', ast_routine=0x7ffff57cdb68 <Jrd::CryptoManager::blockingAstChangeCryptState(void*)>, ast_argument=0x7ffff54e6e40, data=0, lck_wait=0, owner_offset=78584) at /src/lock/lock.cpp:468
#10 0x00007ffff5a0978d in enqueue (tdbb=0x7fffe29fcf98, statusVector=0x7fffe29fc710, lock=0x7ffff7889f00, level=6, wait=0) at /src/jrd/lck.cpp:948
#11 0x00007ffff5a0a9cf in ENQUEUE (tdbb=0x7fffe29fcf98, statusVector=0x7fffe29fc710, lock=0x7ffff7889f00, level=6, wait=0) at /src/jrd/lck.cpp:149
#12 0x00007ffff5a09069 in LCK_lock (tdbb=0x7fffe29fcf98, lock=0x7ffff7889f00, level=6, wait=0) at /src/jrd/lck.cpp:675
#13 0x00007ffff57c9174 in Jrd::CryptoManager::lockAndReadHeader (this=0x7ffff54e6e40, tdbb=0x7fffe29fcf98, flags=1) at /src/jrd/CryptoManager.cpp:379
#14 0x00007ffff57cbc31 in Jrd::CryptoManager::attach (this=0x7ffff54e6e40, tdbb=0x7fffe29fcf98, att=0x7ffff65ae040) at /src/jrd/CryptoManager.cpp:892
#15 0x00007ffff59d1283 in Jrd::JProvider::internalAttach (this=0x7ffff4ed54c0, user_status=0x7fffe29fe030, filename=0x7ffff7f95e40 "employee", dpb_length=362, dpb=0x7ffff7f9c1c0 "\001J>/gen/Debug/firebird/bin/isqlP#LI-T6.0.0.1036 Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM", existingId=0x0) at /src/jrd/jrd.cpp:1909
#16 0x00007ffff59d05cf in Jrd::JProvider::attachDatabase (this=0x7ffff4ed54c0, user_status=0x7fffe29fe030, filename=0x7ffff7f95e40 "employee", dpb_length=362, dpb=0x7ffff7f9c1c0 "\001J>/gen/Debug/firebird/bin/isqlP#LI-T6.0.0.1036 Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/jrd/jrd.cpp:1665
#17 0x00007ffff57d48f7 in Firebird::IProviderBaseImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::IPluginBaseImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IReferenceCountedImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Jrd::JProvider, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IProvider> > > > > > >::cloopattachDatabaseDispatcher (self=0x7ffff4ed54c8, status=0x7fffe29fe688, fileName=0x7ffff7f95e40 "employee", dpbLength=362, dpb=0x7ffff7f9c1c0 "\001J>/gen/Debug/firebird/bin/isqlP#LI-T6.0.0.1036 Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/include/firebird/IdlFbInterfaces.h:12652
#18 0x00007ffff7aee01d in Firebird::IProvider::attachDatabase<Firebird::CheckStatusWrapper> (this=0x7ffff4ed54c8, status=0x7fffe29fe680, fileName=0x7ffff7f95e40 "employee", dpbLength=362, dpb=0x7ffff7f9c1c0 "\001J>/gen/Debug/firebird/bin/isqlP#LI-T6.0.0.1036 Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/include/firebird/IdlFbInterfaces.h:3033
#19 0x00007ffff7ad4e1a in Why::Dispatcher::attachOrCreateDatabase (this=0x7ffff7f9ca40, status=0x7fffe29fe680, createFlag=false, filename=0x7ffff7e68afc "employee", dpbLength=362, dpb=0x7ffff7e66b90 "\001J>/gen/Debug/firebird/bin/isqlP#LI-T6.0.0.1036 Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/yvalve/why.cpp:6579
#20 0x00007ffff7ad4886 in Why::Dispatcher::attachDatabase (this=0x7ffff7f9ca40, status=0x7fffe29fe680, filename=0x7ffff7e68afc "employee", dpbLength=362, dpb=0x7ffff7e66b90 "\001J>/gen/Debug/firebird/bin/isqlP#LI-T6.0.0.1036 Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/yvalve/why.cpp:6489
#21 0x00007ffff7a80693 in Firebird::IProviderBaseImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::IPluginBaseImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IReferenceCountedImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IVersionedImpl<Why::Dispatcher, Firebird::CheckStatusWrapper, Firebird::Inherit<Firebird::IProvider> > > > > > >::cloopattachDatabaseDispatcher (self=0x7ffff7f9ca48, status=0x7fffe29fe778, fileName=0x7ffff7e68afc "employee", dpbLength=362, dpb=0x7ffff7e66b90 "\001J>/gen/Debug/firebird/bin/isqlP#LI-T6.0.0.1036 Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/include/firebird/IdlFbInterfaces.h:12652
#22 0x00005555555ba779 in Firebird::IProvider::attachDatabase<Firebird::CheckStatusWrapper> (this=0x7ffff7f9ca48, status=0x7fffe29fe770, fileName=0x7ffff7e68afc "employee", dpbLength=362, dpb=0x7ffff7e66b90 "\001J>/gen/Debug/firebird/bin/isqlP#LI-T6.0.0.1036 Firebird 6.0 Initial>9/gen/Debug/firebird/binR\006treepcS\ntreehunterM") at /src/include/firebird/IdlFbInterfaces.h:3033
#23 0x00005555555c27fa in (anonymous namespace)::DatabaseAuth::accept (this=0x7ffff7e68a80, send=0x7ffff65a7358, authBlock=0x7ffff65a8398) at /src/remote/server/server.cpp:2602
#24 0x00005555555bd3a5 in (anonymous namespace)::ServerAuth::authenticate (this=0x7ffff7e68a80, send=0x7ffff65a7358, flags=0) at /src/remote/server/server.cpp:676
#25 0x00005555555c23ec in attach_database (port=0x7ffff65a2cc0, operation=op_attach, attach=0x7ffff65a7a98, send=0x7ffff65a7358) at /src/remote/server/server.cpp:2539
#26 0x00005555555cba76 in process_packet (port=0x7ffff65a2cc0, sendL=0x7ffff65a7358, receive=0x7ffff65a7930, result=0x7fffe29fec78) at /src/remote/server/server.cpp:5106
#27 0x00005555555d1ea4 in loopThread () at /src/remote/server/server.cpp:6987
#28 0x00005555555f89c6 in (anonymous namespace)::ThreadArgs::run (this=0x7fffe29fed90) at /src/common/ThreadStart.cpp:78
#29 0x00005555555f8a5d in (anonymous namespace)::threadStart (arg=0x7ffff7e67490) at /src/common/ThreadStart.cpp:94
#30 0x00007ffff7694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#31 0x00007ffff7726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

thread #2
It completed the `JRD_shutdown_database` routine, cleared the GlobalObject, and left without any trace...

@TreeHunter9 TreeHunter9 marked this pull request as draft July 17, 2025 14:19
@TreeHunter9
Contributor Author

Found an issue with the current fix, which uses a bool:
Thread 1: Disconnecting from db1; AutoSetRestore saves false as the old value and sets g_shuttingDown to true;
Thread 2: Disconnecting from db2; AutoSetRestore saves true as the old value and sets g_shuttingDown to true;
Thread 1: Restores the old value, setting g_shuttingDown to false;
Thread 2: Restores the old value, setting g_shuttingDown to true;
Now g_shuttingDown stays true forever, and no one can connect to any database.

If we instead set g_shuttingDown manually to true on acquire and to false on release, Thread 1 can set g_shuttingDown to false while Thread 2 is still in the middle of its shutdown. The flag then reads false during an active shutdown, and we get the same race condition.
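
Both failure modes can be replayed deterministically on a single thread. This is a minimal sketch: `SaveSetRestore` is a hypothetical stand-in for an AutoSetRestore-style guard, not the Firebird class, and the two "threads" are simply guard objects whose lifetimes overlap in the order described above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical save/set/restore guard: remembers the old value, sets the
// new one, and restores the remembered value on destruction.
class SaveSetRestore
{
public:
    SaveSetRestore(bool& target, bool newValue)
        : m_target(target), m_oldValue(target)
    {
        m_target = newValue;
    }
    ~SaveSetRestore() { m_target = m_oldValue; }

private:
    bool& m_target;
    bool m_oldValue;
};

// Scenario 1: two overlapping guards on one shared flag.
// Returns the flag value after both guards are gone.
bool replayOverlappingGuards()
{
    bool shuttingDown = false;
    auto thread1 = std::make_unique<SaveSetRestore>(shuttingDown, true); // saves false
    auto thread2 = std::make_unique<SaveSetRestore>(shuttingDown, true); // saves true
    assert(shuttingDown);
    thread1.reset(); // restores false
    thread2.reset(); // restores true: the flag is now stuck
    return shuttingDown;
}

// Scenario 2: plain "set on acquire, clear on release".
// Returns the flag value seen while the second shutdown is still running.
bool replayManualSetClear()
{
    bool shuttingDown = false;
    shuttingDown = true;  // thread 1 acquires
    shuttingDown = true;  // thread 2 acquires
    shuttingDown = false; // thread 1 releases
    return shuttingDown;  // thread 2 has not released yet, but the flag reads false
}
```

`replayOverlappingGuards()` ends with the flag stuck at true, and `replayManualSetClear()` shows the flag reading false while one shutdown is still active. A single bool cannot represent two independent shutdowns.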

@hvlad
Member

hvlad commented Jul 17, 2025

It looks like we need a per-database flag instead of a global flag shared by all databases.

Raw idea: remove the GlobalObjectHolder instance from g_hashTable in two stages: first replace it with some special fixed constant, then delete the instance, and finally remove the entry from the hash table.

When a concurrent creator finds that constant, it should wait until there is no entry in the hash table before attempting to create a new instance.

Instead of a fixed constant, consider using some sync object (a mutex? a special instance of GlobalObjectHolder?) that can be waited on, rather than polling and sleeping in a loop.

@asfernandes
Member

asfernandes commented Jul 18, 2025

> Instead of fixed constant, consider to use some sync object (mutex? special instance of GlobalObjectHolder ?) that could be used to wait for, instead of poll + sleep in a loop.

This looks like the pattern of a mutex plus a condition variable: waiters block while g_shuttingDown is true and are notified when it becomes false.
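
The mutex plus condition variable pattern mentioned here could look roughly like the sketch below. It uses the standard library rather than Firebird's own Mutex and Condition classes, and `ShutdownGate` with its member names is a hypothetical helper for illustration, not code from this PR.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical gate: attachers wait while a shutdown is in progress.
class ShutdownGate
{
public:
    void beginShutdown()
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_shuttingDown = true;
    }

    void endShutdown()
    {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            assert(m_shuttingDown);
            m_shuttingDown = false;
        }
        m_cond.notify_all(); // wake every waiting attacher
    }

    // Blocks while a shutdown is in progress; no polling or sleeping.
    void waitUntilNotShuttingDown()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_shuttingDown; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_shuttingDown = false;
};

// Returns true if the attacher proceeded only after cleanup had finished.
bool attacherWaitsForShutdown()
{
    ShutdownGate gate;
    std::atomic<bool> cleanupDone{false};
    bool attacherSawCleanup = false;

    gate.beginShutdown();
    std::thread attacher([&] {
        gate.waitUntilNotShuttingDown();
        attacherSawCleanup = cleanupDone.load();
    });

    cleanupDone.store(true); // stands in for destroying the global objects
    gate.endShutdown();
    attacher.join();
    return attacherSawCleanup;
}
```

A gate like this still holds a single bool, so it would presumably need to exist once per database rather than once globally. Otherwise two overlapping shutdowns would interfere with each other in the way described earlier in this thread.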

@TreeHunter9
Contributor Author

I reimplemented the fix by adding a mutex to DbId, so there is no longer a global flag, as Vlad suggested.
I couldn't find any issues with the new implementation, but I may have missed something. I added comments in the code to describe the underlying logic.

@TreeHunter9 TreeHunter9 marked this pull request as ready for review July 18, 2025 11:29
}
// Now we are the one who owns the DbId object.
// It was also removed from the hash table, so simply delete it and recreate it next.
fb_assert(entry->getRefCount() == 1);
Member

@hvlad hvlad Jul 19, 2025

If there are many concurrent initializers, this assert could be violated.
I don't think it is needed.
Instead, you could set entry->holder to nullptr in ~GlobalObjectHolder() and check it for nullptr here.

// Steal the object from the hash table without incrementing the ref counter, so we will be the one who deletes the object
// at the end of this function.
RefPtr<Database::GlobalObjectHolder::DbId> entry(REF_NO_INCR, g_hashTable->lookup(m_id));
fb_assert(entry);
Member

Add also fb_assert(entry->holder == this) ?

@@ -616,7 +649,8 @@ namespace Jrd
m_eventMgr = nullptr;
Member

On entry, g_mutex should be locked.
At line 639 the shutdownMutex is locked.
So the mutex order is: g_mutex, then shutdownMutex.

At line 643 g_mutex is unlocked while shutdownMutex is still locked.
At line 646 g_mutex is locked again, which gives the inverted order: shutdownMutex, then g_mutex.
This is a recipe for deadlock.

Am I wrong?

Contributor Author

As far as I can see, every instance of DbId is linked to a specific GlobalObjectHolder, so shutdownMutex is a different object every time ~GlobalObjectHolder() is called. Therefore, a deadlock could only occur if the destructor were called twice on the same GlobalObjectHolder, which is not possible, so we are safe here.
shutdownMutex can be locked by another thread in GlobalObjectHolder::init, but only when g_mutex is unlocked, so a deadlock is not possible there either.
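
The argument in this reply can be illustrated with a small sketch of the two paths. It models only the locking order discussed in this review thread, not the actual Firebird code: `DbEntry`, `destroySketch`, `initSketch`, and `runBothPaths` are hypothetical names, and `std::mutex` replaces the engine's own mutex type.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex g_mutex; // stands in for the global mutex

// Hypothetical stand-in for a per-database entry such as DbId.
struct DbEntry
{
    std::mutex shutdownMutex;
    bool destroyed = false;
};

// Destruction path with the lock order questioned in the review.
void destroySketch(DbEntry& entry)
{
    std::unique_lock<std::mutex> global(g_mutex);                   // held on entry
    std::unique_lock<std::mutex> shutdownLock(entry.shutdownMutex); // g_mutex, then shutdownMutex
    global.unlock();                                                // cleanup runs without g_mutex
    assert(!global.owns_lock());
    entry.destroyed = true;                                         // stands in for destroying shared objects
    global.lock();                                                  // inverted: shutdownMutex, then g_mutex
    // ... the entry would be removed from the hash table here ...
}

// Init path: it never waits on shutdownMutex while it holds g_mutex.
void initSketch(DbEntry& entry)
{
    {
        std::lock_guard<std::mutex> global(g_mutex); // look up the entry, then release
    }
    std::lock_guard<std::mutex> shutdownLock(entry.shutdownMutex); // may wait for the destroyer
}

// Runs both paths concurrently; returns true once both have completed.
bool runBothPaths()
{
    DbEntry entry;
    std::thread destroyer([&] { destroySketch(entry); });
    std::thread initializer([&] { initSketch(entry); });
    destroyer.join();
    initializer.join();
    return entry.destroyed;
}
```

In this model no thread waits for shutdownMutex while holding g_mutex, so the inverted order in the destroyer cannot close a cycle. Whether the real init path satisfies the same condition has to be checked against the actual patch.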

@@ -616,7 +649,8 @@ namespace Jrd
m_eventMgr = nullptr;
m_replMgr = nullptr;

delete entry;
if (!g_hashTable->remove(m_id))
fb_assert(false);

Member

Add entry->holder = nullptr ?
