-
Notifications
You must be signed in to change notification settings - Fork 199
Make populating the internal symbol table thread-safe #835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
/ok to test 3fe6a3c |
This comment has been minimized.
This comment has been minimized.
/ok to test 67e543c |
cdef int err, driver_ver | ||
with gil, __symbol_lock: | ||
# Load driver to check version | ||
handle = dlopen('libcuda.so.1', RTLD_NOW | RTLD_GLOBAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR, but... this handle is used to get the driver version, which is fed into the load_library
call which doesn't use the driver version. This is likely a codegen issue, but we should probably just remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, wanted to call it out here but we can defer it to a future PR
/ok to test 095999a |
Stopped CI because they all deadlocked at the start of testing. Likely caused a lock ordering issue here that we need to triage. |
…rocAddress in the init function...
/ok to test |
@kkraus14, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
/ok to test 47c1c52 |
/ok to test 5f4125e |
/ok to test 8e79272 |
if __cuPythonInit: | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this outside of the lock if you have a bunch of threads trying to run this initially at the same time, you can end up in the situation where they all don't hit this early exit and all wait for the symbol lock and the reinitialize all of the symbols. I had done a quick and dirty local benchmark with a single thread acquiring a lock and doing nothing to understand the overhead in the early exit case and it was ~50ns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I thought about this too. Good to know it's only ~50 ns. We should fix the codegen so that we check if symbol is null to avoid re-initialization, but it can be done later.
/ok to test 1672f40 |
Merging as per #836 (comment). |
|
* protect cuPythonInit in driver * add lock for all modules * fixes * fix identation, make consistent * relocate setting __cuPythonInit to avoid deadlock since we use cuGetProcAddress in the init function... * move init check inside lock * make cuPythonInit reentrant + ensure GIL is released when calling underlying C APIs * fix indentation --------- Co-authored-by: Keith Kraus <[email protected]>
Description
TODO
[ ] runtimeChecklist