[epic] Use custom CUDA stream for the entire codebase. #12122

@trivialfis

This will be a long refactoring task. The objective is to use a custom CUDA stream throughout the codebase, which gives us better control over asynchronous memory allocation and lets each stream be tied to a specific device.

We support selecting a device by ordinal, for example cuda:1. It's a widely used feature, yet it has been a pain point for XGBoost. In CUDA 13, streams are implicitly attached to the device during creation. As a result, if we use a custom stream everywhere, we can remove the C API guard and avoid initializing the CUDA context.

Plan:

  • Provide an optional context parameter in all storage classes, including:
    • DeviceUVector
    • HostDeviceVector
    • Tensor
    • TemporaryArray
  • Use stream-oriented memory allocation in the booster class.
  • Use stream-oriented memory allocation in the DMatrix classes.
  • Provide synchronization between the DMatrix and the booster.
  • Remove the set-device call when a device ordinal is not provided.
  • Remove the C API guard. Verify that XGBoost doesn't initialize the CUDA context when CUDA is not used.
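
To make the first and fourth items more concrete, here is a minimal CPU-only sketch of what an optional context parameter on a storage class could look like. Everything in it is hypothetical and only for illustration: `StreamHandle` stands in for `cudaStream_t` so the code compiles without CUDA, and `Context`, `DefaultContext`, `ToyVector`, and `NeedsSync` are made-up names, not the actual XGBoost classes or their signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cudaStream_t so the sketch builds without CUDA.
using StreamHandle = std::uintptr_t;
constexpr StreamHandle kDefaultStream = 0;

// Hypothetical context: a device ordinal plus the stream that work is ordered on.
struct Context {
  int device_ordinal{-1};  // -1 means no device ordinal was provided
  StreamHandle stream{kDefaultStream};
};

// Process-wide fallback used when the caller passes no context.
inline Context const* DefaultContext() {
  static Context const ctx{};
  return &ctx;
}

// Toy storage class with an optional context parameter. A real device
// container would pass ctx->stream to a stream-ordered allocator; this
// mock only remembers which context it was given.
template <typename T>
class ToyVector {
 public:
  explicit ToyVector(std::size_t n, Context const* ctx = nullptr)
      : ctx_{ctx ? ctx : DefaultContext()}, data_(n) {
    // Ordinals below -1 have no meaning in this sketch.
    assert(ctx_->device_ordinal >= -1);
  }

  StreamHandle Stream() const { return ctx_->stream; }
  int Device() const { return ctx_->device_ordinal; }
  std::size_t Size() const { return data_.size(); }

 private:
  Context const* ctx_;   // not owned; must outlive the container
  std::vector<T> data_;  // placeholder for device memory
};

// Two containers ordered on different streams need explicit synchronization
// before one can safely consume the other's data.
template <typename T, typename U>
bool NeedsSync(ToyVector<T> const& a, ToyVector<U> const& b) {
  return a.Stream() != b.Stream();
}
```

Keeping the parameter optional means existing call sites keep compiling and fall back to the default context, while new call sites can opt in to a custom stream one class at a time. The `NeedsSync` check is where the DMatrix and the booster would have to coordinate if they end up on different streams.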

PRs:

Related:
