Commit d916308

Merge branch 'main' into ipc_suppressed_errors
2 parents bbdbbcd + 84f5aae commit d916308

27 files changed: 292 additions and 65 deletions

cuda_bindings/DESCRIPTION.rst

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@
 cuda-bindings: Low-level CUDA interfaces
 ****************************************

+.. image:: https://img.shields.io/badge/NVIDIA-black?logo=nvidia
+   :target: https://www.nvidia.com/
+   :alt: NVIDIA
+
 `cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/>`_ is a standard set of low-level interfaces, providing full coverage of and 1:1 access to the CUDA host APIs from Python. Checkout the `Overview <https://nvidia.github.io/cuda-python/cuda-bindings/latest/overview.html>`_ for the workflow and performance results.

 * `Repository <https://github.com/NVIDIA/cuda-python/tree/main/cuda_bindings>`_

cuda_bindings/docs/source/install.rst

Lines changed: 16 additions & 1 deletion
@@ -10,7 +10,7 @@ Runtime Requirements
 ``cuda.bindings`` supports the same platforms as CUDA. Runtime dependencies are:

 * Linux (x86-64, arm64) and Windows (x86-64)
-* Python 3.9 - 3.13
+* Python 3.9 - 3.14
 * Driver: Linux (580.65.06 or later) Windows (580.88 or later)
 * Optionally, NVRTC, nvJitLink, NVVM, and cuFile from CUDA Toolkit 13.x

@@ -20,6 +20,21 @@ Runtime Requirements

 Starting from v12.8.0, ``cuda-python`` becomes a meta package which currently depends only on ``cuda-bindings``; in the future more sub-packages will be added to ``cuda-python``. In the instructions below, we still use ``cuda-python`` as example to serve existing users, but everything is applicable to ``cuda-bindings`` as well.

+
+Free-threading Build Support
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As of cuda-bindings 13.0.2 and 12.9.3, **experimental** packages for the `free-threaded interpreter`_ are shipped.
+
+1. Support for these builds is best effort, due to heavy use of `built-in
+   modules that are known to be thread-unsafe`_, such as ``ctypes``.
+2. For now, you are responsible for making sure that calls into the ``cuda-bindings``
+   library are thread-safe. This is subject to change.
+
+.. _built-in modules that are known to be thread-unsafe: https://github.com/python/cpython/issues/116738
+.. _free-threaded interpreter: https://docs.python.org/3/howto/free-threading-python.html
+
+
 Installing from PyPI
 --------------------
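
Reviewer note on the free-threading section added to install.rst: the text leaves thread safety to the caller but does not show what that might look like. One conservative option is to funnel every call through a single process-wide lock. The sketch below is only an illustration under that assumption; ``is_free_threaded_build``, ``gil_is_enabled``, ``call_serialized`` and ``_cuda_call_lock`` are hypothetical helper names, not part of ``cuda-bindings``, and a plain Python callable stands in for a real binding call.

```python
import sys
import sysconfig
import threading


def is_free_threaded_build() -> bool:
    """Return True if this interpreter was built with free-threading support."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_is_enabled() -> bool:
    """Return True if the GIL is active in the running interpreter."""
    # sys._is_gil_enabled() only exists on Python 3.13+; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


# Hypothetical: one lock guarding every call into the bindings (an assumption,
# chosen for simplicity rather than throughput).
_cuda_call_lock = threading.Lock()


def call_serialized(func, *args, **kwargs):
    """Run func(*args, **kwargs) while holding the process-wide lock."""
    with _cuda_call_lock:
        return func(*args, **kwargs)
```

A caller would wrap each binding call, for example ``call_serialized(some_binding_function, arg)``. A single lock trades parallelism for safety; finer-grained locking may be possible once the project documents which calls are safe to run concurrently.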

cuda_bindings/docs/source/release.rst

Lines changed: 2 additions & 2 deletions
@@ -7,10 +7,10 @@ Release Notes
 .. toctree::
    :maxdepth: 3

-   13.X.Y <release/13.X.Y-notes.rst>
+   13.0.2 <release/13.0.2-notes.rst>
    13.0.1 <release/13.0.1-notes.rst>
    13.0.0 <release/13.0.0-notes.rst>
-   12.9.X <release/12.9.X-notes.rst>
+   12.9.3 <release/12.9.3-notes.rst>
    12.9.2 <release/12.9.2-notes.rst>
    12.9.1 <release/12.9.1-notes.rst>
    12.9.0 <release/12.9.0-notes.rst>

cuda_bindings/docs/source/release/12.9.3-notes.rst

Lines changed: 7 additions & 3 deletions
@@ -3,20 +3,24 @@

 .. module:: cuda.bindings

-``cuda-bindings`` 12.9.X Release notes
+``cuda-bindings`` 12.9.3 Release notes
 ======================================

-Released on TBD
+Released on Oct 9, 2025


 Highlights
 ----------

+* This is the last release that officially supports Python 3.9.
+* Python 3.14 is supported.
+* **Experimental** free-threaded builds for Python 3.13/3.14 are made available. Any bugs can be reported to `our GitHub repo <https://github.com/NVIDIA/cuda-python>`_. More details are available in our :ref:`support` docs.
 * Automatic CUDA library path detection based on ``CUDA_HOME``, eliminating the need to manually set ``LIBRARY_PATH`` environment variables for installation.
 * The Python overhead of calling functions in CUDA bindings in ``driver``, ``runtime`` and ``nvrtc`` has been reduced by approximately 30%.
+* On Windows, the ``pywin32`` dependency has been removed. The necessary Windows API functions are now accessed directly.
 * Updated the ``cuda.bindings.runtime`` module to statically link against the CUDA Runtime library from CUDA Toolkit 12.9.1.
 * ``cyruntime.getLocalRuntimeVersion`` now uses pathfinder to find the CUDA runtime.
-* Experimental free-threaded builds are available on PyPI. More details are available in our :ref:`support` docs.
+

 Known issues
 ------------

cuda_bindings/docs/source/release/13.0.2-notes.rst

Lines changed: 5 additions & 3 deletions
@@ -3,23 +3,25 @@

 .. module:: cuda.bindings

-``cuda-bindings`` 13.X.Y Release notes
+``cuda-bindings`` 13.0.2 Release notes
 ======================================

-Released on TBD
+Released on Oct 9, 2025


 Highlights
 ----------

+* This is the last release that officially supports Python 3.9.
+* Python 3.14 is supported.
+* **Experimental** free-threaded builds for Python 3.13/3.14 are made available. Any bugs can be reported to `our GitHub repo <https://github.com/NVIDIA/cuda-python>`_. More details are available in our :ref:`support` docs.
 * Migrated wheel dependencies from individual NVIDIA packages to the ``cuda-toolkit`` metapackage for improved dependency resolution and version constraints.
 * Automatic CUDA library path detection based on ``CUDA_HOME``, eliminating the need to manually set ``LIBRARY_PATH`` environment variables for installation.
 * The ``[all]`` optional dependencies now use ``cuda-toolkit`` with appropriate extras instead of individual packages. The NVCC compiler is no longer automatically installed with ``pip install cuda-python[all]`` as it was previously included only to access the NVVM library, which now has its own dedicated wheel. Users who need the NVCC compiler should explicitly install it with ``pip install cuda-toolkit[nvcc]==X.Y`` with the appropriate version for their needs.
 * The Python overhead of calling functions in CUDA bindings in ``driver``, ``runtime`` and ``nvrtc`` has been reduced by approximately 30%.
 * On Windows, the ``pywin32`` dependency has been removed. The necessary Windows API functions are now accessed directly.
 * Updated the ``cuda.bindings.runtime`` module to statically link against the CUDA Runtime library from CUDA Toolkit 13.0.1.
 * ``cyruntime.getLocalRuntimeVersion`` now uses pathfinder to find the CUDA runtime.
-* Experimental free-threaded builds are available on PyPI. More details are available in our :ref:`support` docs.


 Bug fixes
cuda_core/DESCRIPTION.rst

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@
 cuda-core: Pythonic access to CUDA core functionalities
 *******************************************************

+.. image:: https://img.shields.io/badge/NVIDIA-black?logo=nvidia
+   :target: https://www.nvidia.com/
+   :alt: NVIDIA
+
 `cuda.core <https://nvidia.github.io/cuda-python/cuda-core/>`_ bridges Python's productivity with CUDA's performance through intuitive and pythonic APIs. The mission is to provide users full access to all of the core CUDA features in Python, such as runtime control, compiler and linker.

 * `Repository <https://github.com/NVIDIA/cuda-python/tree/main/cuda_core>`_

cuda_core/cuda/core/_version.py

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 #
 # SPDX-License-Identifier: Apache-2.0

-__version__ = "0.3.3a0"
+__version__ = "0.4.0"

cuda_core/cuda/core/experimental/_device.pyx

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,7 @@ from cuda.core.experimental._context import Context, ContextOptions
 from cuda.core.experimental._event import Event, EventOptions
 from cuda.core.experimental._graph import GraphBuilder
 from cuda.core.experimental._memory import Buffer, DeviceMemoryResource, MemoryResource, _SynchronousMemoryResource
-from cuda.core.experimental._stream import IsStreamT, Stream, StreamOptions, default_stream
+from cuda.core.experimental._stream import IsStreamT, Stream, StreamOptions
 from cuda.core.experimental._utils.clear_error_support import assert_type
 from cuda.core.experimental._utils.cuda_utils import (
     ComputeCapability,
@@ -25,6 +25,7 @@ from cuda.core.experimental._utils.cuda_utils import (
     handle_return,
     runtime,
 )
+from cuda.core.experimental._stream cimport default_stream


 # TODO: I prefer to type these as "cdef object" and avoid accessing them from within Python,

cuda_core/cuda/core/experimental/_memory.pyx

Lines changed: 2 additions & 1 deletion
@@ -12,6 +12,7 @@ from libc.string cimport memset, memcpy
 from cuda.bindings cimport cydriver

 from cuda.core.experimental._stream cimport Stream as cyStream
+from cuda.core.experimental._stream cimport default_stream
 from cuda.core.experimental._utils.cuda_utils cimport (
     _check_driver_error as raise_if_driver_error,
     check_or_create_options,
@@ -30,7 +31,7 @@ import platform
 import weakref

 from cuda.core.experimental._dlpack import DLDeviceType, make_py_capsule
-from cuda.core.experimental._stream import Stream, default_stream
+from cuda.core.experimental._stream import Stream
 from cuda.core.experimental._utils.cuda_utils import ( driver, Transaction, get_binding_version )

 if platform.system() == "Linux":

cuda_core/cuda/core/experimental/_stream.pxd

Lines changed: 3 additions & 0 deletions
@@ -22,3 +22,6 @@ cdef class Stream:
     cpdef close(self)
     cdef int _get_context(self) except?-1 nogil
     cdef int _get_device_and_context(self) except?-1
+
+
+cdef Stream default_stream()
