[epic] Use custom CUDA stream for the entire codebase.

This will be a long refactoring task. The objective is to enable the use of a custom CUDA stream, which gives us better control over asynchronous memory allocation and enables stream-specific device selection.

We support selecting a device by ordinal, such as `cuda:1`. This has been a pain point for XGBoost, yet it's a widely used feature. In CUDA 13, streams are implicitly attached to the device during creation. As a result, if we can use a custom stream, we can remove the C API guard and avoid initializing the CUDA context when CUDA is not used.

Plan:
- [ ] Provide an optional context parameter in all storage classes, including:
  + [ ] DeviceUVector
  + [ ] HostDeviceVector
  + [ ] Tensor
  + [ ] TemporaryArray
- [ ] Use stream-oriented memory allocation in the booster class.
- [ ] Use stream-oriented memory allocation in the DMatrix classes.
- [ ] Provide synchronization between the DMatrix and the booster.
- [ ] Remove the set-device call when a device ordinal is not provided.
- [ ] Remove C API guard. Verify that XGBoost doesn't initialize the CUDA context when CUDA is not used.
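
A rough sketch of what the first item in the plan could look like, to make the idea concrete. Everything here is hypothetical: `Stream`, `Context`, `DefaultContext`, and `DeviceBuffer` are illustrative names and not the existing XGBoost classes, and the sketch uses host memory in place of the CUDA runtime so that it compiles without a GPU. In a real implementation, the constructor would presumably issue a stream-ordered allocation (for example `cudaMallocAsync`) on the stream carried by the context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cudaStream_t. It records the device ordinal the
// stream was created on, mirroring the idea that a stream implies a device.
struct Stream {
  int device;  // device ordinal, e.g. 1 for `cuda:1`
  int id;      // illustrative stream identifier
};

// Hypothetical context object; NOT XGBoost's actual Context class.
struct Context {
  Stream stream;
};

// Fallback used when the caller passes no context: device 0, stream 0.
inline Context const* DefaultContext() {
  static Context const ctx{Stream{0, 0}};
  return &ctx;
}

// Hypothetical storage class with an optional context parameter.
template <typename T>
class DeviceBuffer {
 public:
  explicit DeviceBuffer(std::size_t n, Context const* ctx = nullptr)
      : ctx_{ctx != nullptr ? ctx : DefaultContext()}, data_(n) {
    // Assumption: a CUDA build would allocate on ctx_->stream here instead
    // of calling a global set-device function first.
  }

  int Device() const { return ctx_->stream.device; }
  int StreamId() const { return ctx_->stream.id; }
  std::size_t Size() const { return data_.size(); }

 private:
  Context const* ctx_;   // non-owning; must outlive the buffer
  std::vector<T> data_;  // host memory standing in for device memory
};
```

With this shape, `DeviceBuffer<float> buf(16, &ctx)` picks up both the stream and the device from `ctx`, while `DeviceBuffer<float> buf(16)` keeps the current default behavior, so existing call sites can be migrated incrementally.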

PRs:
- https://github.com/dmlc/xgboost/pull/12121

Related:
- https://github.com/dmlc/xgboost/issues/12116
- https://github.com/dmlc/xgboost/issues/11884

[epic] Use custom CUDA stream for the entire codebase. #12122
