Make populating the internal symbol table thread-safe #835

Merged
merged 8 commits into from
Aug 18, 2025

Conversation

leofang
Member

@leofang leofang commented Aug 13, 2025

Description

TODO

  • driver
  • [ ] runtime
  • nvrtc
  • nvjitlink
  • nvvm
  • cufile
  • backport the changes to internal codegen
  • backport the changes to 12.9.x

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Contributor

copy-pr-bot bot commented Aug 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang
Member Author

leofang commented Aug 13, 2025

/ok to test 3fe6a3c

@leofang leofang requested a review from kkraus14 August 13, 2025 14:55
@leofang leofang added this to the cuda-python parking lot milestone Aug 13, 2025
@leofang leofang added enhancement Any code-related improvements triage Needs the team's attention cuda.bindings Everything related to the cuda.bindings module labels Aug 13, 2025
@leofang leofang changed the title WIP: Make internal symbol table thread-safe WIP: Make populating the internal symbol table thread-safe Aug 13, 2025
@kkraus14
Collaborator

/ok to test 67e543c

cdef int err, driver_ver
with gil, __symbol_lock:
    # Load driver to check version
    handle = dlopen('libcuda.so.1', RTLD_NOW | RTLD_GLOBAL)
Collaborator

Unrelated to this PR, but... this handle is used to get the driver version, which is fed into the load_library call which doesn't use the driver version. This is likely a codegen issue, but we should probably just remove this?

Collaborator

Either way, wanted to call it out here but we can defer it to a future PR

@kkraus14
Collaborator

/ok to test 095999a

@kkraus14
Collaborator

Stopped CI because all the jobs deadlocked at the start of testing. We likely introduced a lock ordering issue here that we need to triage.

@kkraus14
Collaborator

/ok to test

Contributor

copy-pr-bot bot commented Aug 14, 2025

/ok to test

@kkraus14, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@kkraus14
Collaborator

/ok to test 47c1c52

@kkraus14 kkraus14 changed the title WIP: Make populating the internal symbol table thread-safe Make populating the internal symbol table thread-safe Aug 14, 2025
@kkraus14
Collaborator

/ok to test 5f4125e

@leofang leofang added P0 High priority - Must do! and removed triage Needs the team's attention labels Aug 15, 2025
@leofang
Member Author

leofang commented Aug 15, 2025

/ok to test 8e79272

Comment on lines 495 to 496
if __cuPythonInit:
    return 0
Collaborator

With this check outside of the lock, if you have a bunch of threads trying to run this for the first time simultaneously, you can end up in a situation where they all miss this early exit, all wait for the symbol lock, and then each reinitialize all of the symbols. I did a quick and dirty local benchmark with a single thread acquiring a lock and doing nothing, to understand the overhead in the early-exit case, and it was ~50 ns.
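
The overhead measurement described above can be approximated in pure Python. This is a rough sketch and not the benchmark that was actually run: `locked_early_exit` and `measure_ns_per_call` are hypothetical names, and the result includes Python function-call overhead, so it should not be expected to match the ~50 ns quoted for the compiled code.

```python
import threading
import timeit

lock = threading.Lock()


def locked_early_exit():
    # Acquire the lock, do nothing, release it: the cost paid by every
    # caller that only needs the early exit once initialization is done.
    with lock:
        pass


def measure_ns_per_call(func, number=200_000, repeat=5):
    # Take the best of several runs and convert to nanoseconds per call.
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    ns = measure_ns_per_call(locked_early_exit)
    print(f"{ns:.0f} ns per uncontended acquire/release")
```

Only a single thread touches the lock here, so the number reflects the uncontended case, which is the common path after the first initialization.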

Member Author

Yeah, I thought about this too. Good to know it's only ~50 ns. We should fix the codegen so that we check whether each symbol is null to avoid re-initialization, but that can be done later.

kkraus14
kkraus14 previously approved these changes Aug 15, 2025
@github-project-automation github-project-automation bot moved this from Todo to In Review in CCCL Aug 15, 2025
@leofang
Member Author

leofang commented Aug 15, 2025

/ok to test 1672f40

@leofang
Member Author

leofang commented Aug 18, 2025

Merging as per #836 (comment).

@leofang leofang merged commit db85867 into NVIDIA:main Aug 18, 2025
48 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Aug 18, 2025
@leofang leofang deleted the symbol_lock branch August 18, 2025 11:29
@leofang leofang linked an issue Aug 18, 2025 that may be closed by this pull request
1 task
Andy-Jost pushed a commit to Andy-Jost/cuda-python that referenced this pull request Aug 18, 2025
* protect cuPythonInit in driver

* add lock for all modules

* fixes

* fix indentation, make consistent

* relocate setting __cuPythonInit to avoid deadlock since we use cuGetProcAddress in the init function...

* move init check inside lock

* make cuPythonInit reentrant + ensure GIL is released when calling underlying C APIs

* fix indentation

---------

Co-authored-by: Keith Kraus <[email protected]>
Labels
cuda.bindings Everything related to the cuda.bindings module enhancement Any code-related improvements P0 High priority - Must do!
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG]: The symbol table requires protection for thread-safety
2 participants