[BUG]: The symbol table requires protection for thread-safety #852

@pentschev

Description

Type of Bug

Runtime Error

Component

cuda.bindings

Describe the bug

Multi-threaded applications may access the symbol table concurrently before it has been fully populated, causing concurrency issues such as the one previously observed.
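To illustrate the failure mode, here is a minimal hypothetical sketch of an *unprotected* lazily populated symbol table; the names and structure are illustrative only and are not the actual cuda.bindings internals:

```python
# Hypothetical sketch of an unsynchronized lazy symbol table, illustrating
# the race described above -- NOT the real cuda.bindings implementation.
_symbol_table = {}


def _populate():
    # Stands in for resolving function pointers from the CUDA driver
    # library; in the real library this fills in many entries.
    _symbol_table["cuCtxGetCurrent"] = lambda: 0


def get_symbol_unsafe(name):
    # RACE: with no lock, thread B can call this while thread A is still
    # inside _populate(), observe an empty or partially filled table, and
    # raise the 'Function "..." not found' error seen in the stack trace.
    if not _symbol_table:
        _populate()
    if name not in _symbol_table:
        raise RuntimeError(f'Function "{name}" not found')
    return _symbol_table[name]
```

Single-threaded, the lookup works; the error only surfaces when two threads hit the empty-table check at the same time.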

Sample failure stack
/opt/conda/envs/test/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/devices.py:125: in ensure_context
    with driver.get_active_context():
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
/opt/conda/envs/test/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py:539: in __enter__
    hctx = driver.cuCtxGetCurrent()
           ^^^^^^^^^^^^^^^^^^^^^^^^
/opt/conda/envs/test/lib/python3.12/site-packages/numba_cuda/numba/cuda/cudadrv/driver.py:396: in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
                                                ^^^^^^^^^^^^
cuda/bindings/driver.pyx:20135: in cuda.bindings.driver.cuCtxGetCurrent
    ???
cuda/bindings/cydriver.pyx:107: in cuda.bindings.cydriver.cuCtxGetCurrent
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   RuntimeError: Function "cuCtxGetCurrent" not found

This was expected to have already been addressed by #835.

How to Reproduce

There is no known trivial reproducer; the failure is consistently observed with this test from RapidsMPF and numba-cuda=0.18.1 (required to ensure numba-cuda uses cuda-bindings instead of its own ctypes implementation):

Reproducer
@pytest.mark.parametrize("partition_count", [None, 3])
@pytest.mark.parametrize("sort", [True, False])
@pytest.mark.parametrize("cluster_kind", ["auto", "single"])
def test_dask_cudf_integration_single(
    partition_count: int,
    sort: bool,  # noqa: FBT001
    cluster_kind: Literal["distributed", "single", "auto"],
) -> None:
    # Test single-worker cuDF integration with Dask-cuDF
    pytest.importorskip("dask_cudf")

    df = (
        dask.datasets.timeseries(
            freq="3600s",
            partition_freq="2D",
        )
        .reset_index(drop=True)
        .to_backend("cudf")
    )
    partition_count_in = df.npartitions
    expect = df.compute().sort_values(["id", "name", "x", "y"])
    shuffled = dask_cudf_shuffle(
        df,
        ["id", "name"],
        sort=sort,
        partition_count=partition_count,
        cluster_kind=cluster_kind,
        config_options=Options({"single_spill_device": "0.1"}),
    )
    assert shuffled.npartitions == (partition_count or partition_count_in)
    got = shuffled.compute()
    if sort:
        assert got["id"].is_monotonic_increasing
    got = got.sort_values(["id", "name", "x", "y"])

    dd.assert_eq(expect, got, check_index=False)

Expected behavior

Attempting to create a CUDA context from multiple threads should not cause an error.
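One common way to make such lazy initialization thread-safe is double-checked locking around table population. The sketch below is a hypothetical illustration of that pattern under assumed names; it is not the fix actually applied in cuda.bindings:

```python
import threading

# Hypothetical fix sketch: serialize the first population of the table with
# a lock, so concurrent first accesses cannot observe a partial table.
# Names (_load, get_symbol) are illustrative, not cuda.bindings internals.
_lock = threading.Lock()
_table = None


def _load():
    # Stands in for resolving all driver symbols in one pass.
    return {"cuCtxGetCurrent": lambda: 0}


def get_symbol(name):
    global _table
    if _table is None:            # fast path: no lock once populated
        with _lock:
            if _table is None:    # re-check under the lock
                _table = _load()  # publish only a fully built table
    try:
        return _table[name]
    except KeyError:
        raise RuntimeError(f'Function "{name}" not found') from None
```

Because `_table` is only assigned after `_load()` returns a complete dict, a thread that skips the lock on the fast path can never see a partially populated table.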

Operating System

No response

nvidia-smi output

No response

Metadata

Labels

P0 (High priority - Must do!)
bug (Something isn't working)
cuda.bindings (Everything related to the cuda.bindings module)

Status

Done
