Make populating the internal symbol table thread-safe #835

Merged

leofang merged 8 commits into NVIDIA:main from leofang:symbol_lock

Aug 18, 2025

Member

leofang commented Aug 13, 2025 (edited by kkraus14)

Description

TODO

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

github-project-automation bot added this to CCCL

github-project-automation bot moved this to Todo in CCCL

Contributor

copy-pr-bot bot commented Aug 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

leofang force-pushed the symbol_lock branch from 29be4b1 to 9031970

August 13, 2025 14:05


          protect cuPythonInit in driver

3fe6a3c

leofang force-pushed the symbol_lock branch from 9031970 to 3fe6a3c

August 13, 2025 14:31

Member Author

leofang commented Aug 13, 2025

/ok to test 3fe6a3c

leofang requested a review from kkraus14

August 13, 2025 14:55

leofang added this to the cuda-python parking lot milestone

leofang added enhancement triage cuda.bindings labels

leofang changed the title ~~WIP: Make internal symbol table thread-safe~~ WIP: Make populating the internal symbol table thread-safe

leofang mentioned this pull request

[Backport] Make populating the internal symbol table thread-safe #836

Merged

2 tasks

pentschev mentioned this pull request

RuntimeError: Function "cuCtxGetCurrent" not found rapidsai/rapidsmpf#434

Closed


          add lock for all modules

67e543c

Collaborator

kkraus14 commented Aug 13, 2025

/ok to test 67e543c

kkraus14 added 2 commits

August 13, 2025 15:37


          fixes

cca86c0


          fix identation, make consistent

095999a

kkraus14 reviewed

cuda_bindings/cuda/bindings/_internal/cufile_linux.pyx

+                  cdef int err, driver_ver
+                  with gil, __symbol_lock:
+                      # Load driver to check version
+                      handle = dlopen('libcuda.so.1', RTLD_NOW | RTLD_GLOBAL)

Collaborator

kkraus14 Aug 13, 2025

Unrelated to this PR, but... this handle is used to get the driver version, which is then fed into the load_library call, which doesn't use it. This is likely a codegen issue, but we should probably just remove this?

Collaborator

kkraus14 Aug 13, 2025

Either way, I wanted to call it out here, but we can defer it to a future PR.

Collaborator

kkraus14 commented Aug 13, 2025

/ok to test 095999a

Collaborator

kkraus14 commented Aug 13, 2025

Stopped CI because all jobs deadlocked at the start of testing. We likely introduced a lock-ordering issue here that we need to triage.


          relocate setting __cuPythonInit to avoid deadlock since we use cuGetProcAddress in the init function...

47c1c52
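The deadlock described in this commit title can be illustrated with a minimal Python sketch. It is not the code from this PR: `make_init`, `get_proc_address`, and the `state` dictionary are hypothetical stand-ins, and the sketch assumes the hazard is an init function that, while holding a lock, calls a wrapper which itself calls back into the init function. A timeout on `acquire` stands in for "blocks forever" so the example terminates.

```python
import threading


def make_init(lock):
    """Build a toy init() guarded by `lock` (hypothetical, not the PR's code)."""
    state = {"initialized": False, "symbols": {}, "would_deadlock": False}

    def get_proc_address(name):
        # Hypothetical wrapper: like other entry points, it calls init() first.
        init()
        return f"<address of {name}>"

    def init():
        # The timeout stands in for blocking forever, so the sketch terminates.
        if not lock.acquire(timeout=0.1):
            state["would_deadlock"] = True
            return -1
        try:
            if state["initialized"]:
                return 0
            # Set the flag before the nested call so that re-entry exits early.
            state["initialized"] = True
            state["symbols"]["cuCtxGetCurrent"] = get_proc_address("cuCtxGetCurrent")
            return 0
        finally:
            lock.release()

    return init, state


# Non-reentrant lock: the nested init() cannot take the lock its caller holds.
plain_init, plain_state = make_init(threading.Lock())
plain_init()

# Reentrant lock: the nested init() acquires it again and sees the flag.
reentrant_init, reentrant_state = make_init(threading.RLock())
reentrant_init()

print(plain_state["would_deadlock"], reentrant_state["would_deadlock"])
```

In this sketch, other threads block on the lock until the table is populated, so setting the flag early is only visible to the thread that already holds the lock. Whether the actual fix follows this exact shape is an assumption.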

Collaborator

kkraus14 commented Aug 14, 2025

/ok to test

Contributor

copy-pr-bot bot commented Aug 14, 2025

/ok to test

@kkraus14, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

Collaborator

kkraus14 commented Aug 14, 2025

/ok to test 47c1c52

kkraus14 reviewed

cuda_bindings/cuda/bindings/_bindings/cydriver.pyx.in (outdated, resolved)


          move init check inside lock

5f4125e

kkraus14 changed the title ~~WIP: Make populating the internal symbol table thread-safe~~ Make populating the internal symbol table thread-safe

Collaborator

kkraus14 commented Aug 14, 2025

/ok to test 5f4125e

leofang assigned kkraus14 and leofang

leofang added P0 and removed triage labels

leofang commented

cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in (outdated, resolved)

cuda_bindings/cuda/bindings/_bindings/cydriver.pyx.in (resolved)

cuda_bindings/cuda/bindings/_bindings/cynvrtc.pyx.in (resolved)

cuda_bindings/cuda/bindings/_bindings/cydriver.pyx.in (outdated, resolved)


          make cuPythonInit reentrant + ensure GIL is released when calling underlying C APIs

8e79272

Member Author

leofang commented Aug 15, 2025

/ok to test 8e79272

kkraus14 reviewed

cuda_bindings/cuda/bindings/_bindings/cydriver.pyx.in Outdated

Comment on lines 495 to 496

    
                  if __cuPythonInit:
                      return 0

Collaborator

kkraus14 Aug 15, 2025

With this outside of the lock, if a bunch of threads try to run this at the same time initially, they can all miss this early exit, all wait for the symbol lock, and then each reinitialize all of the symbols. I did a quick and dirty local benchmark with a single thread acquiring a lock and doing nothing to understand the overhead in the early-exit case, and it was ~50 ns.

Member Author

leofang Aug 15, 2025

Yeah I thought about this too. Good to know it's only ~50 ns. We should fix the codegen so that we check if symbol is null to avoid re-initialization, but it can be done later.
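The trade-off discussed in this thread can be sketched in plain Python. This is an illustration, not the generated Cython code: `run`, `state`, and `locked_noop` are hypothetical names, and a `threading.Barrier` is used only to force every thread past the unlocked check at the same moment so the outcome is deterministic.

```python
import threading
import timeit


def run(check_inside_lock, n_threads=8):
    """Return how many times the toy symbol table gets populated."""
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)
    state = {"initialized": False, "populate_count": 0}

    def init():
        if not check_inside_lock and state["initialized"]:
            return  # early exit taken without holding the lock
        barrier.wait()  # force all threads past the unlocked check together
        with lock:
            if check_inside_lock and state["initialized"]:
                return  # early exit taken while holding the lock
            state["populate_count"] += 1  # stands in for loading all symbols
            state["initialized"] = True

    threads = [threading.Thread(target=init) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["populate_count"]


outside_count = run(check_inside_lock=False)  # every thread repopulates
inside_count = run(check_inside_lock=True)    # only the first thread populates

# Rough cost of an uncontended acquire/release, measured at the Python level.
_bench_lock = threading.Lock()


def locked_noop():
    with _bench_lock:
        pass


calls = 100_000
per_call_seconds = timeit.timeit(locked_noop, number=calls) / calls
print(outside_count, inside_count, f"{per_call_seconds * 1e9:.0f} ns per call")
```

The timing here includes Python function-call overhead and depends on the machine, so it is not expected to reproduce the ~50 ns figure quoted above, which came from a separate local benchmark.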

kkraus14 previously approved these changes

github-project-automation bot moved this from Todo to In Review in CCCL

kkraus14 reviewed

cuda_bindings/cuda/bindings/_bindings/cydriver.pyx.in (outdated, resolved)


          fix indentation

1672f40

leofang dismissed kkraus14’s stale review via 1672f40

August 15, 2025 17:40

Member Author

leofang commented Aug 15, 2025

/ok to test 1672f40

kkraus14 approved these changes

leofang modified the milestones: cuda-python parking lot, cuda-python 13.0.1 & 12.9.2

leofang mentioned this pull request

Bump version and add release notes #841

Merged

2 tasks

Member Author

leofang commented Aug 18, 2025

Merging as per #836 (comment).

leofang merged commit db85867 into NVIDIA:main

48 checks passed

github-project-automation bot moved this from In Review to Done in CCCL

leofang deleted the symbol_lock branch

August 18, 2025 11:29

github-actions bot commented Aug 18, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

pentschev mentioned this pull request

[BUG]: The symbol table requires protection for thread-safety #852

Closed

1 task

leofang linked an issue that may be closed by this pull request

[BUG]: The symbol table requires protection for thread-safety #852

Closed

1 task

Andy-Jost pushed a commit to Andy-Jost/cuda-python that referenced this pull request


          Make populating the internal symbol table thread-safe (NVIDIA#835)

bfae18b

* protect cuPythonInit in driver

* add lock for all modules

* fixes

* fix identation, make consistent

* relocate setting __cuPythonInit to avoid deadlock since we use cuGetProcAddress in the init function...

* move init check inside lock

* make cuPythonInit reentrant + ensure GIL is released when calling underlying C APIs

* fix indentation

---------

Co-authored-by: Keith Kraus <[email protected]>

leofang mentioned this pull request

Improve #789: Remove cyclical dependency between cuda.bindings.{driver|runtime} and c.b.utils #840

Merged

Labels

cuda.bindings enhancement P0