This will be a long refactoring task. The objective is to enable the use of a custom CUDA stream, both to improve control over asynchronous memory allocation and to allow device selection to follow from the stream.
We support selecting a device by ordinal, for example cuda:1. This has been a pain point for XGBoost, yet it is a widely used feature. In CUDA 13, streams are implicitly attached to the device during creation. As a result, if we can use a custom stream, we can remove the C API guard and avoid initializing the CUDA context.
Plan:
PRs:
Related: