-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Fix macOS and Windows shared mem wait for detach #7321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix macOS and Windows shared mem wait for detach #7321
Conversation
TheMarex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, some smaller questions. Thanks for breaking it out!
include/storage/shared_memory.hpp
Outdated
| throw util::exception("shmctl encountered an error: " + errorToMessage(error_code) + | ||
| SOURCE_REF); | ||
| ::shmid_ds xsi_ds; | ||
| if (::shmctl(xsi.get_shmid(), IPC_STAT, &xsi_ds) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read this correctly, in the case of an error we swallow that and just return? Previously it would have thrown an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we are expecting an error here. The error being that the region is gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But can we check for a precise error code in that case? There may be other reasons this call errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we got there, the only plausible error that still can occur is: "EIDRM shmid points to a removed identifier". And that is exactly what we are waiting for to happen. I can test for this error and throw if any other error happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it makes it a bit more clear that we are waiting for something specific to happen.
include/storage/shared_memory.hpp
Outdated
| #else | ||
| // Windows - specific code | ||
|
|
||
| // POSIX shared mem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember exactly why we used XSI over POSIX. Maybe issues with OSX at some point? To be honest we were originally planning to drop the whole hot-swapping thing in favor of mmap. The hotswap/shmem was conceived almost 15 years ago when the typical deployment process was "run it on a big server", not rolling deployments behind a load balancer.
Would have been nice for the 6.0 to clean this up but alas.
d738492 to
8bc6e77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request aims to fix an issue where osrm-datastore on macOS and Windows did not properly wait for other processes to detach from shared memory before exiting, causing test failures and performance issues. The PR refactors the shared memory implementation by:
Changes:
- Extracting shared memory logic from headers into a new
src/storage/shared_memory.cppimplementation file - Implementing a
WaitForDetachfunction for all platforms (Linux/macOS/Windows) - Renaming
shm_keytoproj_idthroughout the codebase for clarity - Adding proper wait logic when removing old shared memory regions in
storage.cpp
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
src/storage/shared_memory.cpp |
New implementation file containing SharedMemory class and platform-specific helper functions |
include/storage/shared_memory.hpp |
Refactored to be a cleaner header with forward declarations instead of full implementation |
src/storage/storage.cpp |
Updated to call WaitForDetach when removing old memory regions; renamed shm_key to proj_id |
include/storage/shared_datatype.hpp |
Introduced ProjID type alias and renamed shm_key to proj_id throughout |
src/tools/store.cpp |
Updated to use proj_id and new function signatures |
src/engine/datafacade/shared_memory_allocator.cpp |
Updated to use proj_id and new API |
include/engine/datafacade/shared_memory_allocator.hpp |
Updated function signature to use ProjID |
include/engine/data_watchdog.hpp |
Updated to use proj_id and reordered log statement |
include/storage/shared_monitor.hpp |
Added missing includes and minor parentheses fix |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * @param key A ProjID | ||
| * @return bool Returns true if the region exists | ||
| */ | ||
| bool RegionExists(const ProjID proj_id); | ||
| /** | ||
| * @brief Destroys the shared memory region | ||
| * | ||
| * @param key A valid ProjID | ||
| * @return bool returns false on error. | ||
| */ | ||
| bool Remove(const ProjID proj_id); | ||
| /** | ||
| * @brief Waits for all processes to detach from the shared memory region | ||
| * | ||
| * @param key A ProjID |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @param documentation is inconsistent. The parameter name is "proj_id" but the documentation says "key A valid ProjID". It should say "@param proj_id A valid ProjID" to match the actual parameter name.
| * @param key A ProjID | |
| * @return bool Returns true if the region exists | |
| */ | |
| bool RegionExists(const ProjID proj_id); | |
| /** | |
| * @brief Destroys the shared memory region | |
| * | |
| * @param key A valid ProjID | |
| * @return bool returns false on error. | |
| */ | |
| bool Remove(const ProjID proj_id); | |
| /** | |
| * @brief Waits for all processes to detach from the shared memory region | |
| * | |
| * @param key A ProjID | |
| * @param proj_id A ProjID | |
| * @return bool Returns true if the region exists | |
| */ | |
| bool RegionExists(const ProjID proj_id); | |
| /** | |
| * @brief Destroys the shared memory region | |
| * | |
| * @param proj_id A valid ProjID | |
| * @return bool returns false on error. | |
| */ | |
| bool Remove(const ProjID proj_id); | |
| /** | |
| * @brief Waits for all processes to detach from the shared memory region | |
| * | |
| * @param proj_id A ProjID |
|
|
||
| // The second parameter passed to ftok(). See: man 3 ftok | ||
| // It should actually be an int, but for compatibility with earlier versions of OSRM it | ||
| // is an uint16. It should't matter since, according to the man page, only the lowest 8 |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in comment: "should't" should be "shouldn't".
| // is an uint16. It should't matter since, according to the man page, only the lowest 8 | |
| // is an uint16. It shouldn't matter since, according to the man page, only the lowest 8 |
| #include "storage/shared_datatype.hpp" | ||
| #include <boost/interprocess/shared_memory_object.hpp> | ||
| #include <cstring> | ||
| #include <string> | ||
| #include <thread> |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing necessary includes. The file uses util::Log, util::exception, and SOURCE_REF but does not include "util/log.hpp", "util/exception.hpp", or "util/exception_utils.hpp". It also uses std::ofstream (line 203) without including fstream, and std::filesystem without including filesystem (though these might come from shared_datatype.hpp).
| #include "storage/shared_datatype.hpp" | |
| #include <boost/interprocess/shared_memory_object.hpp> | |
| #include <cstring> | |
| #include <string> | |
| #include <thread> | |
| #include "storage/shared_datatype.hpp" | |
| #include "util/log.hpp" | |
| #include "util/exception.hpp" | |
| #include "util/exception_utils.hpp" | |
| #include <boost/interprocess/shared_memory_object.hpp> | |
| #include <cstring> | |
| #include <string> | |
| #include <thread> | |
| #include <fstream> | |
| #include <filesystem> |
| * Returns the contents of the environment variable SHM_LOCK_DIR if set else the system | ||
| * temp directory. | ||
| * | ||
| * @return std::filesystem::path The directory to usew for lock files |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation has a typo: "usew" should be "use".
| * @return std::filesystem::path The directory to usew for lock files | |
| * @return std::filesystem::path The directory to use for lock files |
| * @param key A ProjID | ||
| * @return bool Returns true if the region exists | ||
| */ | ||
| bool RegionExists(const ProjID proj_id); | ||
| /** | ||
| * @brief Destroys the shared memory region | ||
| * | ||
| * @param key A valid ProjID | ||
| * @return bool returns false on error. | ||
| */ | ||
| bool Remove(const ProjID proj_id); | ||
| /** | ||
| * @brief Waits for all processes to detach from the shared memory region | ||
| * | ||
| * @param key A ProjID |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @param documentation is inconsistent. The parameter name is "proj_id" but the documentation says "key A ProjID". It should say "@param proj_id A ProjID" to match the actual parameter name.
| * @param key A ProjID | |
| * @return bool Returns true if the region exists | |
| */ | |
| bool RegionExists(const ProjID proj_id); | |
| /** | |
| * @brief Destroys the shared memory region | |
| * | |
| * @param key A valid ProjID | |
| * @return bool returns false on error. | |
| */ | |
| bool Remove(const ProjID proj_id); | |
| /** | |
| * @brief Waits for all processes to detach from the shared memory region | |
| * | |
| * @param key A ProjID | |
| * @param proj_id A ProjID | |
| * @return bool Returns true if the region exists | |
| */ | |
| bool RegionExists(const ProjID proj_id); | |
| /** | |
| * @brief Destroys the shared memory region | |
| * | |
| * @param proj_id A valid ProjID | |
| * @return bool returns false on error. | |
| */ | |
| bool Remove(const ProjID proj_id); | |
| /** | |
| * @brief Waits for all processes to detach from the shared memory region | |
| * | |
| * @param proj_id A ProjID |
| * @param key A ProjID | ||
| * @return bool Returns true if the region exists | ||
| */ | ||
| bool RegionExists(const ProjID proj_id); | ||
| /** | ||
| * @brief Destroys the shared memory region | ||
| * | ||
| * @param key A valid ProjID | ||
| * @return bool returns false on error. | ||
| */ | ||
| bool Remove(const ProjID proj_id); | ||
| /** | ||
| * @brief Waits for all processes to detach from the shared memory region | ||
| * | ||
| * @param key A ProjID |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @param documentation is inconsistent. The parameter name is "proj_id" but the documentation says "key A ProjID". It should say "@param proj_id A ProjID" to match the actual parameter name.
| * @param key A ProjID | |
| * @return bool Returns true if the region exists | |
| */ | |
| bool RegionExists(const ProjID proj_id); | |
| /** | |
| * @brief Destroys the shared memory region | |
| * | |
| * @param key A valid ProjID | |
| * @return bool returns false on error. | |
| */ | |
| bool Remove(const ProjID proj_id); | |
| /** | |
| * @brief Waits for all processes to detach from the shared memory region | |
| * | |
| * @param key A ProjID | |
| * @param proj_id A ProjID | |
| * @return bool Returns true if the region exists | |
| */ | |
| bool RegionExists(const ProjID proj_id); | |
| /** | |
| * @brief Destroys the shared memory region | |
| * | |
| * @param proj_id A valid ProjID | |
| * @return bool returns false on error. | |
| */ | |
| bool Remove(const ProjID proj_id); | |
| /** | |
| * @brief Waits for all processes to detach from the shared memory region | |
| * | |
| * @param proj_id A ProjID |
| { | ||
| return blocks.find(name) != blocks.end(); | ||
| } | ||
| inline bool HasBlock(const std::string &name) const { return blocks.contains(name); } |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from blocks.find(name) != blocks.end() to blocks.contains(name) requires C++20. Ensure that the project's minimum C++ standard is set to C++20 or this will cause compilation errors on older standards.
| inline bool HasBlock(const std::string &name) const { return blocks.contains(name); } | |
| inline bool HasBlock(const std::string &name) const | |
| { | |
| return blocks.find(name) != blocks.end(); | |
| } |
| try | ||
| { | ||
| while (true) | ||
| { | ||
| shared_memory_object shm_obj(open_only, name, read_write); | ||
| if (--timeout < 0) | ||
| break; | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(1)); | ||
| } | ||
| } | ||
| catch (interprocess_exception &) | ||
| { | ||
| // already detached | ||
| } |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WaitForDetach implementation for Windows/macOS has a fundamental issue. After calling shared_memory_object::remove(), the shared memory name is immediately unlinked from the filesystem. Any subsequent attempt to open it with open_only will fail, even if processes are still attached to the underlying memory. This means the while loop at line 174-180 will immediately throw an exception and exit, not actually waiting for processes to detach. This defeats the purpose of the PR, which is to wait for clients to detach before proceeding. A proper solution would need to either: (1) not call Remove before WaitForDetach, or (2) use a different mechanism to detect when processes have detached (though POSIX shared memory doesn't provide this capability).
| try | |
| { | |
| while (true) | |
| { | |
| shared_memory_object shm_obj(open_only, name, read_write); | |
| if (--timeout < 0) | |
| break; | |
| std::this_thread::sleep_for(std::chrono::milliseconds(1)); | |
| } | |
| } | |
| catch (interprocess_exception &) | |
| { | |
| // already detached | |
| } | |
| // Best-effort wait: as long as we can open the shared memory object by name, | |
| // assume that clients may still be attached and wait in 1ms increments. | |
| // If we can no longer open it (e.g. because it has been removed), we still | |
| // wait out the remaining timeout before returning. | |
| while (timeout-- >= 0) | |
| { | |
| try | |
| { | |
| shared_memory_object shm_obj(open_only, name, read_write); | |
| } | |
| catch (interprocess_exception &) | |
| { | |
| // Shared memory object cannot be opened anymore (e.g. already removed). | |
| // We cannot reliably detect attached processes, but to honour the | |
| // requested timeout we wait for the remaining duration before returning. | |
| if (timeout > 0) | |
| { | |
| std::this_thread::sleep_for(std::chrono::milliseconds(timeout)); | |
| } | |
| return; | |
| } | |
| std::this_thread::sleep_for(std::chrono::milliseconds(1)); | |
| } |
Issue
Fixes #7320
On macOS and Windows
osrm-datastorenow correctly waits for other processes to detach from shared memory before exiting. Previous behaviour:osrm-datastoreslept for 50ms and then boldly announced: 'All clients switched.'This caused
Tasklist
Requirements / Relations
Related to PR #7309