Implement cg_streaming via USM #69

breyerml · 2024-10-09T14:04:44Z

Implements the cg_streaming solver_type using USM (unified shared memory).
The cg_explicit kernels are reused, i.e., no special performance tuning has been performed.

The logic of the OpenCL backend had to be changed to allow multi-GPU support with OpenCL's SVM: instead of one context containing all devices, we now create a separate context for each device.

…tially initializes all values to zero. Instead, use a std::unique_ptr together with a C++17-conformant make_unique_for_overwrite implementation, followed by an OpenMP-parallel zero initialization of all values, drastically reducing the overhead.
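A minimal sketch of the allocation pattern described in this commit body, assuming trivially constructible value types such as `double`. `make_unique_for_overwrite` mirrors the C++20 standard function of the same name (array overload only); `make_zeroed_buffer` is a hypothetical helper, not PLSSVM code. Compiled without OpenMP support, the pragma is ignored and the zeroing simply runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// C++17 stand-in for C++20's std::make_unique_for_overwrite (unbounded array overload):
// `new T[n]` default-initializes the elements, so for trivial types like double no
// zeroing happens here (unlike std::vector<T>(n) or std::make_unique<T[]>(n)).
template <typename T>
std::enable_if_t<std::is_array_v<T> && std::extent_v<T> == 0, std::unique_ptr<T>>
make_unique_for_overwrite(const std::size_t n) {
    return std::unique_ptr<T>(new std::remove_extent_t<T>[n]);
}

// Hypothetical helper: allocate n uninitialized values, then zero them with an
// OpenMP parallel loop instead of a single-threaded value initialization.
template <typename T>
std::unique_ptr<T[]> make_zeroed_buffer(const std::size_t n) {
    auto data = make_unique_for_overwrite<T[]>(n);
    T *ptr = data.get();
    // ignored (sequential loop) if the code is compiled without OpenMP
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i) {
        ptr[i] = T{};
    }
    return data;
}
```

The split into an uninitialized allocation plus an explicit fill is the point: the expensive zeroing is no longer tied to the allocation call and can therefore be parallelized.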

# Conflicts:
#   include/plssvm/backends/CUDA/csvm.hpp
#   include/plssvm/backends/gpu_device_ptr.hpp
#   include/plssvm/csvm.hpp
#   include/plssvm/detail/data_distribution.hpp
#   include/plssvm/detail/type_traits.hpp
#   src/plssvm/backends/OpenCL/csvm.cpp
#   src/plssvm/backends/OpenCL/detail/context.cpp
#   src/plssvm/backends/OpenCL/detail/device_ptr.cpp
#   src/plssvm/backends/OpenCL/detail/utility.cpp
#   src/plssvm/backends/OpenMP/csvm.cpp
#   src/plssvm/backends/stdpar/csvm.cpp
#   src/plssvm/detail/data_distribution.cpp
#   tests/backends/CUDA/detail/device_ptr.cpp
#   tests/backends/HIP/detail/device_ptr.hip
#   tests/backends/OpenCL/detail/device_ptr.cpp
#   tests/backends/generic_csvm_tests.hpp
#   tests/backends/generic_device_ptr_tests.hpp
#   tests/types_to_test.hpp

breyerml added 30 commits October 2, 2024 12:09

Add cg_streaming enum class case.

235cdca

Add device_ptr flag to enable shared/managed memory allocations.

0006b9a

Allocate kernel matrix using shared memory for cg_streaming.

3508e15

Use USM allocations in BLAS kernel and slightly change API.

bf19526

Remove USM related if in copy functions.

e403c62

Use variable to specify whether USM allocations should be used.

7663860

Add solver_type::automatic handling for cg_streaming.

cd6deea

Only use USM for the kernel matrix.

2dc7881

Improve automatic solver_type handling.

55ad721
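The commits above only name the `solver_type::automatic` handling for cg_streaming. The following is a hypothetical selection heuristic, not PLSSVM's actual logic: it assumes the kernel matrix is stored as its upper triangle (padding ignored) and that a USM/managed allocation may fall back to host memory once the device memory is exhausted. The enum and function names are illustrative, and per-allocation size limits are ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names, not the PLSSVM API.
enum class solver_type { automatic, cg_explicit, cg_streaming, cg_implicit };

// Bytes for an n x n symmetric kernel matrix if only the upper triangle
// (including the diagonal) is stored: n * (n + 1) / 2 entries (assumption).
constexpr std::size_t kernel_matrix_bytes(const std::size_t n, const std::size_t value_size) {
    return n * (n + 1) / 2 * value_size;
}

// Sketch of a selection heuristic:
//  1. matrix fits into device memory        -> cg_explicit (matrix fully on the device)
//  2. matrix fits into host memory via USM  -> cg_streaming (pages migrate on demand)
//  3. otherwise                             -> cg_implicit (entries are recomputed)
constexpr solver_type select_solver(const std::size_t n, const std::size_t value_size,
                                    const std::size_t device_mem, const std::size_t host_mem) {
    const std::size_t needed = kernel_matrix_bytes(n, value_size);
    if (needed <= device_mem) {
        return solver_type::cg_explicit;
    }
    if (needed <= host_mem) {
        return solver_type::cg_streaming;
    }
    return solver_type::cg_implicit;
}
```

Ordering the checks from the fastest to the most memory-frugal solver means cg_explicit is chosen whenever it fits, and cg_implicit only when even the streamed matrix would not.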

Implement cg_streaming via USM allocations in SYCL.

dad3561

Implement cg_streaming via USM allocations in HIP.

f29c792

For OpenMP and stdpar, cg_streaming is equal to cg_explicit.

c53ea42

Implement cg_streaming via USM allocations in OpenCL (using some ugly workarounds).

f41aa35

Only call get_variant() where necessary.

b5894e0

Add and improve error check.

ed9b633

Use cg_explicit as maximum allocation size constraint.

d850275

Improve output by mentioning the maximum guaranteed allocation size.

ed2e2a8

Throw an exception if clSVMAlloc failed.

a34b620

Rewrite OpenCL context logic to also support cg_streaming with multiple GPUs.

9fcdd7f

Use the correct OpenCL functions to perform SVM pointer operations and improve simplicity of implementation by using a std::variant<cl_mem, T*> as device_pointer_type.

1dd509c
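To illustrate the `std::variant<cl_mem, T*>` idea, here is a self-contained sketch. So that it compiles without the OpenCL headers, it substitutes a hypothetical `fake_cl_mem` for the real opaque `cl_mem` handle; `device_ptr_sketch` and `copy_function_name` are illustrative names. `clEnqueueWriteBuffer` (buffer objects) and `clEnqueueSVMMemcpy` (SVM pointers) are the real OpenCL calls each branch would dispatch to in an actual backend.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>
#include <variant>

// Stand-in for OpenCL's opaque buffer handle (the real cl_mem is a pointer to an
// implementation-defined struct); hypothetical, only used to keep the sketch self-contained.
struct fake_cl_mem_object { std::size_t id; };
using fake_cl_mem = fake_cl_mem_object *;

template <typename T>
class device_ptr_sketch {
  public:
    // either a classic buffer object or a raw SVM pointer (e.g., obtained from clSVMAlloc)
    using device_pointer_type = std::variant<fake_cl_mem, T *>;

    explicit device_ptr_sketch(device_pointer_type ptr) : ptr_{ ptr } { }

    [[nodiscard]] bool is_svm() const noexcept { return std::holds_alternative<T *>(ptr_); }

    // Dispatch on the active alternative: buffer objects and SVM pointers need
    // different OpenCL copy functions.
    [[nodiscard]] std::string_view copy_function_name() const {
        return std::visit([](auto p) -> std::string_view {
            if constexpr (std::is_same_v<decltype(p), fake_cl_mem>) {
                return "clEnqueueWriteBuffer";
            } else {
                return "clEnqueueSVMMemcpy";
            }
        }, ptr_);
    }

  private:
    device_pointer_type ptr_;
};
```

A single variant member keeps one `device_ptr` type for both allocation kinds, with `std::visit` selecting the matching API call instead of scattered runtime flags.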

Fix usage of undefined type alias in assertion message.

570ba77

Update tests to support USM device_ptr and the cg_streaming solver.

38c27fe

Add missing data set size contribution.

91b75b3

Improve performance (mainly on AMD GPUs) and change implementations slightly such that the backends are more similar.

18691a5

Additional performance improvement tests.

6cddbb6

Preliminary changes.

a185caf

Update CUDA implementation and update comments.

c74aca8

Improve the performance of the OpenMP cg_explicit kernel matrix assembly and BLAS implementation. Align names more to the ones used in the other backends.

10d303e

Improve the performance of the OpenMP cg_implicit kernel matrix assembly + BLAS implementation. Align names more to the ones used in the other backends.

2e64193

breyerml added 30 commits July 5, 2025 16:31

Make blocking sizes constexpr instead of only const.

23d6350

Update comments.

3eea873

Update formatting for better consistency with the other backends.

1c4e479

Change THREAD_BLOCK_SIZE to THREAD_BLOCK_SIZE_uz.

a029f23

Fix documentation error using q vector instead of w vector.

efda49d

Update the stdpar backend kernels. Now: some parts of the kernels are specialized for the CPU for better performance.

8ecd618

Fix stdpar tests after changing the kernel function interface (from free function to function object).

ee08eaa

Correctly trim the device name in the stdpar Intel LLVM backend.

04afa46

Improve stdpar NVHPC output if the CPU target platform is used.

795da8b

Use omp_set_max_active_levels instead of the deprecated omp_set_nested.

c28ba90

If Kokkos::Experimental::HPX is used, we explicitly have to initialize the HPX runtime before a call to Kokkos::initialize, otherwise the HPX specific command line options are ignored.

9d35cc8

Explicitly use the full namespace to prevent problems if the macros are used inside another namespace.

49658c9

Use simple ifdef instead of the PLSSVM_KOKKOS_BACKEND_INVOKE_IF_HPX macro.

1490b7e

Refactor some parser functionality into utility functions to reduce code duplication. Add the possibility to filter out some command line options (mainly from third party libraries HPX and Kokkos).

d912ce5
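A minimal sketch of filtering third-party command line options before handing the rest to an application's own parser. `split_args` and `filter_third_party_args` are hypothetical names, not PLSSVM's functions. The prefixes in the usage below are an assumption based on the usual spellings (e.g., `--hpx:threads=4` for HPX, `--kokkos-num-threads=4` for Kokkos), and options whose value is passed as a separate token are not handled.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type: arguments for our own parser vs. for third-party runtimes.
struct split_args {
    std::vector<std::string> own;
    std::vector<std::string> third_party;
};

// Sort every argv entry into one of the two groups depending on whether it
// starts with one of the given prefixes (argv[0] always lands in `own`
// unless it happens to match a prefix).
inline split_args filter_third_party_args(const int argc, const char *const *argv,
                                          const std::vector<std::string_view> &prefixes) {
    split_args result;
    for (int i = 0; i < argc; ++i) {
        const std::string_view arg{ argv[i] };
        bool matched = false;
        for (const std::string_view prefix : prefixes) {
            if (arg.substr(0, prefix.size()) == prefix) {
                matched = true;
                break;
            }
        }
        (matched ? result.third_party : result.own).emplace_back(arg);
    }
    return result;
}
```

The `third_party` group could then be forwarded to the respective runtime initialization, while `own` goes to the application parser so that it does not reject unknown options.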

Improve the README file (grammar related stuff).

8adade4

State that we support the HPX and Kokkos specific command line options by forwarding them to the respective initialization functions.

39ecc5a

Update description of the PLSSVM_THREAD_BLOCK_SIZE behavior CMake option.

321ac81

Undo align center changes.

2b3e11f

Update include documentation.

dd15efb

Update includes.

cb5c485

Reimplement the OpenCL device_ptr memset and fill functions using custom kernels since the previous version using clEnqueueFillBuffer failed for SOME data sets on NVIDIA GPUs.

25b2922

Fix decltype error.

88c69a3

Update documentation to reflect the new solver type.

9cb953f

Add support for the new solver type to the Python bindings.

9944b7d

Improve AdaptiveCpp CMake warning for hierarchical and scoped kernels.

90f3f9b

Add missing cg_streaming case to the HPX switches.

c64c9b4

Fix wrong PLSSVM_ASSERT and update test case.

d2b5af8

Implement basic Kokkos backend API changes for the cg_streaming solver type. Note: functionality currently not implemented!

e7558c6

Add USM (and therefore cg_streaming) support to the Kokkos backend.

e70a42b

2 participants