Open
94 commits
235cdca
Add cg_streaming enum class case.
breyerml Oct 2, 2024
0006b9a
Add device_ptr flag to enable shared/managed memory allocations.
breyerml Oct 2, 2024
3508e15
Allocate kernel matrix using shared memory for cg_streaming.
breyerml Oct 2, 2024
bf19526
Use USM allocations in BLAS kernel and slightly change API.
breyerml Oct 7, 2024
e403c62
Remove USM related if in copy functions.
breyerml Oct 7, 2024
7663860
Use variable to specify whether USM allocations should be used.
breyerml Oct 7, 2024
cd6deea
Add solver_type::automatic handling for cg_streaming.
breyerml Oct 7, 2024
2dc7881
Only use USM for the kernel matrix.
breyerml Oct 7, 2024
55ad721
Improve automatic solver_type handling.
breyerml Oct 7, 2024
dad3561
Implement cg_streaming via USM allocations in SYCL.
breyerml Oct 7, 2024
f29c792
Implement cg_streaming via USM allocations in HIP.
breyerml Oct 7, 2024
c53ea42
For OpenMP and stdpar, cg_streaming is equal to cg_explicit.
breyerml Oct 7, 2024
f41aa35
Implement cg_streaming via USM allocations in OpenCL (using some ugly…
breyerml Oct 7, 2024
b5894e0
Only call get_variant() where necessary.
breyerml Oct 7, 2024
ed9b633
Add and improve error check.
breyerml Oct 7, 2024
d850275
Use cg_explicit as maximum allocation size constraint.
breyerml Oct 8, 2024
ed2e2a8
Improve output by mentioning the maximum guaranteed allocation size.
breyerml Oct 8, 2024
a34b620
Throw an exception if clSVMAlloc failed.
breyerml Oct 8, 2024
9fcdd7f
Rewrite OpenCL context logic to also support cg_streaming with multip…
breyerml Oct 8, 2024
1dd509c
Use the correct OpenCL functions to perform SVM pointer operations an…
breyerml Oct 9, 2024
570ba77
Fix usage of undefined type alias in assertion message.
breyerml Oct 9, 2024
38c27fe
Update tests to support USM device_ptr and the cg_streaming solver.
breyerml Oct 9, 2024
91b75b3
Add missing data set size contribution.
breyerml Oct 14, 2024
18691a5
Improve performance (mainly on AMD GPUs) and change implementations s…
breyerml May 24, 2025
6cddbb6
Additional performance improvement tests.
breyerml May 26, 2025
a185caf
Preliminary changes.
breyerml May 26, 2025
c74aca8
Update CUDA implementation and update comments.
breyerml May 28, 2025
dbc00ae
Do not use std::vector directly for the kernel matrix since it sequen…
breyerml May 28, 2025
10d303e
Improve the performance of the OpenMP cg_explicit kernel matrix assem…
breyerml May 29, 2025
2e64193
Improve the performance of the OpenMP cg_implicit kernel matrix assem…
breyerml May 30, 2025
8aa1c93
Improve the performance of the OpenMP predict implementation. Align n…
breyerml May 30, 2025
51b75b6
Improve variable names and remove some implicit conversions.
breyerml May 30, 2025
8a570a8
Fix tests after slight API changes.
breyerml May 30, 2025
3025c76
Remove unnecessary conditions. Improve variable naming.
breyerml May 30, 2025
46a9558
Update variable names.
breyerml May 30, 2025
0c68206
Update documentation and add missing headers.
breyerml May 31, 2025
a9e8271
Update the HIP backend kernels.
breyerml May 31, 2025
45832e7
Fix Doxygen documentation.
breyerml May 31, 2025
10ff3c2
Add additional assert.
breyerml May 31, 2025
dad55f2
Fix variable names.
breyerml May 31, 2025
e6b76f2
Use typename instead of class.
breyerml May 31, 2025
5913b50
Move parallel zero memset to header function (used in multiple places).
breyerml May 31, 2025
a67751b
Add documentation and rearrange constant declarations.
breyerml May 31, 2025
54741ff
Inverse all temp indices for better consistency.
breyerml May 31, 2025
46891d9
Add missing doxygen documentation.
breyerml May 31, 2025
fa5cea3
Update the HPX backend kernels.
breyerml May 31, 2025
ff89212
Some small changes: where possible change remaining const to constexp…
breyerml Jun 14, 2025
b4d553a
Update comments.
breyerml Jun 14, 2025
4020339
Rename sv to support_vectors for better readability and consistency.
breyerml Jun 14, 2025
39513f8
Update some comments.
breyerml Jun 20, 2025
3ef281d
Also use trimmed names in performance tracking output.
breyerml Jun 20, 2025
a20d76d
Always use a loop for the custom powi function.
breyerml Jun 20, 2025
1c55fb1
The get_default_queue now honors the default target platform.
breyerml Jun 20, 2025
b6b98fc
Improve the AdaptiveCpp device pointer creation performance on CPUs w…
breyerml Jun 21, 2025
88e5d80
Based on the provided CPU target architectures, set the correct prefe…
breyerml Jun 22, 2025
56a0f7d
Update the SYCL backend kernels.
breyerml Jun 22, 2025
8f390d0
Renamed all mentions of SYCL kernel invocation type to SYCL data para…
breyerml Jun 22, 2025
63f8b6b
Add missing include.
breyerml Jun 30, 2025
ed5d124
Update the Kokkos backend kernels.
breyerml Jun 30, 2025
7f8acb5
Update power function implementation.
breyerml Jun 30, 2025
cfc3d68
Fix Kokkos related compiler error.
breyerml Jul 1, 2025
31b4f99
Update the OpenCL backend kernels.
breyerml Jul 1, 2025
b3f8b22
Fix OpenCL kernel.
breyerml Jul 1, 2025
dc6c267
Rename variable for better consistency with other backends.
breyerml Jul 5, 2025
23d6350
Make blocking sizes constexpr instead of only const.
breyerml Jul 5, 2025
3eea873
Update comments.
breyerml Jul 5, 2025
1c4e479
Update formatting for better consistency with the other backends.
breyerml Jul 5, 2025
a029f23
Change THREAD_BLOCK_SIZE to THREAD_BLOCK_SIZE_uz.
breyerml Jul 5, 2025
efda49d
Fix documentation error using q vector instead of w vector.
breyerml Jul 5, 2025
8ecd618
Update the stdpar backend kernels.
breyerml Jul 5, 2025
ee08eaa
Fix stdpar tests after changing the kernel function interface (from f…
breyerml Jul 5, 2025
04afa46
Correctly trim the device name in the stdpar Intel LLVM backend.
breyerml Jul 7, 2025
795da8b
Improve stdpar NVHPC output if the CPU target platform is used.
breyerml Jul 7, 2025
c28ba90
Use omp_set_max_active_levels instead of the deprecated omp_set_nested.
breyerml Jul 7, 2025
9d35cc8
If Kokkos::Experimental::HPX is used, we explicitly have to initializ…
breyerml Jul 7, 2025
49658c9
Explicitly use the full namespace to prevent problems if the macros a…
breyerml Jul 10, 2025
1490b7e
Use simple ifdef instead of the PLSSVM_KOKKOS_BACKEND_INVOKE_IF_HPX m…
breyerml Jul 10, 2025
d912ce5
Refactor some parser functionality into utility functions to reduce c…
breyerml Jul 10, 2025
8adade4
Improve the README file (grammar related stuff).
breyerml Jul 10, 2025
39ecc5a
State that we support the HPX and Kokkos specific command line option…
breyerml Jul 10, 2025
321ac81
Update description of the PLSSVM_THREAD_BLOCK_SIZE behavior CMake opt…
breyerml Jul 10, 2025
2b3e11f
Undo align center changes.
breyerml Jul 10, 2025
dd15efb
Update include documentation.
breyerml Jul 12, 2025
cb5c485
Update includes.
breyerml Jul 12, 2025
25b2922
Reimplement the OpenCL device_ptr memset and fill functions using cus…
breyerml Jul 12, 2025
88c69a3
Fix decltype error.
breyerml Jul 13, 2025
a1627e6
Merge branch 'optimizations' into cg_streaming_via_usm
breyerml Jul 19, 2025
9cb953f
Update documentation to reflect the new solver type.
breyerml Jul 20, 2025
9944b7d
Add support for the new solver type to the Python bindings.
breyerml Jul 20, 2025
90f3f9b
Improve AdaptiveCpp CMake warning for hierarchical and scoped kernels.
breyerml Jul 20, 2025
c64c9b4
Add missing cg_streaming case to the HPX switches.
breyerml Jul 20, 2025
d2b5af8
Fix wrong PLSSVM_ASSERT and update test case.
breyerml Jul 20, 2025
e7558c6
Implement basic Kokkos backend API changes for the cg_streaming solve…
breyerml Jul 20, 2025
e70a42b
Add USM (and therefore cg_streaming) support to the Kokkos backend.
breyerml Jul 21, 2025
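Commit a20d76d ("Always use a loop for the custom powi function.") describes an integer power helper that is evaluated with a plain loop. The PR page does not show that function, so the following is only a minimal sketch of what such a helper might look like; the name `powi_loop`, its signature, and the unsigned exponent are assumptions made for illustration and are not taken from PLSSVM.

```cpp
#include <cassert>

// Hypothetical helper (NOT PLSSVM's actual code): computes base^exponent
// with a simple loop instead of recursion or a call to std::pow.
template <typename T>
constexpr T powi_loop(T base, unsigned exponent) {
    T result{ 1 };
    // multiply 'exponent' times; an exponent of 0 leaves result at 1
    for (unsigned i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}
```

A fixed-form loop like this is easy for a compiler to unroll when the exponent is known at compile time, which may be one reason to prefer it in device kernels.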
44 changes: 38 additions & 6 deletions CMakeLists.txt
@@ -78,14 +78,15 @@ endif ()
# set base sources
set(PLSSVM_BASE_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/backends/Kokkos/execution_space.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/backends/SYCL/data_parallel_kernels.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/backends/SYCL/implementation_types.cpp
- ${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/backends/SYCL/kernel_invocation_types.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/backends/stdpar/implementation_types.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/backends/execution_range.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/data_set/min_max_scaler.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/detail/cmd/parser_predict.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/detail/cmd/parser_scale.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/detail/cmd/parser_train.cpp
+ ${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/detail/cmd/utility.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/detail/io/file_reader.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/detail/data_distribution.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/plssvm/detail/memory_size.cpp
@@ -638,6 +639,37 @@ if (PLSSVM_ENABLE_LTO)
endif ()
endif ()

+ ########################################################################################################################
+ # enable the requested vectorization widths for the auto-vectorizers #
+ ########################################################################################################################
+ # GCC and clang both do not automatically auto-vectorize for AVX-512 (only AVX2)
+ # -> enable it if "cpu:avx512" was passed as PLSSVM_TARGET_PLATFORMS
+ if (PLSSVM_NUM_CPU_TARGET_ARCHS EQUAL 1)
+     if (${PLSSVM_CPU_TARGET_ARCHS} STREQUAL "avx512")
+         message(STATUS "Enabling AVX512 support for the auto-vectorizers (-mprefer-vector-width=512).")
+         target_compile_options(
+             ${PLSSVM_BASE_LIBRARY_NAME}
+             PUBLIC $<$<COMPILE_LANGUAGE:CXX>:$<$<CXX_COMPILER_ID:GNU,Clang,IntelLLVM>:-mprefer-vector-width=512>>
+         )
+     elseif (${PLSSVM_CPU_TARGET_ARCHS} STREQUAL "avx2" OR ${PLSSVM_CPU_TARGET_ARCHS} STREQUAL "avx")
+         message(STATUS "Enabling AVX/AVX2 support for the auto-vectorizers (-mprefer-vector-width=256).")
+         target_compile_options(
+             ${PLSSVM_BASE_LIBRARY_NAME}
+             PUBLIC $<$<COMPILE_LANGUAGE:CXX>:$<$<CXX_COMPILER_ID:GNU,Clang,IntelLLVM>:-mprefer-vector-width=256>>
+         )
+     elseif (${PLSSVM_CPU_TARGET_ARCHS} MATCHES "^sse")
+         message(STATUS "Enabling SSE for the auto-vectorizers (-mprefer-vector-width=128).")
+         target_compile_options(
+             ${PLSSVM_BASE_LIBRARY_NAME}
+             PUBLIC $<$<COMPILE_LANGUAGE:CXX>:$<$<CXX_COMPILER_ID:GNU,Clang,IntelLLVM>:-mprefer-vector-width=128>>
+         )
+     else ()
+         message(FATAL_ERROR "Unrecognized CPU target architecture \"${PLSSVM_CPU_TARGET_ARCHS}\". Allowed values are: avx512, avx2, avx, sse.")
+     endif ()
+ else ()
+     # automatically use the "optimal" auto-vectorizer width
+ endif ()
+
########################################################################################################################
# check for optional and necessary dependencies #
########################################################################################################################
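Review note on the hunk above: the new branch maps a single CPU target architecture to a `-mprefer-vector-width` value, and only does so when exactly one CPU architecture was given. The mapping itself can be sketched in C++ as follows. The function name `preferred_vector_width` is hypothetical and exists only for illustration; the actual selection happens in CMake at configure time, not in library code.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical mirror of the CMake branch (illustration only):
// maps a CPU target architecture string to the value that the diff
// passes to -mprefer-vector-width.
inline std::optional<int> preferred_vector_width(std::string_view arch) {
    if (arch == "avx512") {
        return 512;
    }
    if (arch == "avx2" || arch == "avx") {
        return 256;
    }
    // the CMake code uses MATCHES "^sse", i.e. any string starting with "sse"
    if (arch.substr(0, 3) == "sse") {
        return 128;
    }
    // the CMake code raises a FATAL_ERROR for every other value
    return std::nullopt;
}
```

Because the CMake check is a prefix match, a value such as `sse4.2` would also select a width of 128, even though the error message only lists `sse` as an allowed value.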
@@ -914,16 +946,16 @@ if (TARGET ${PLSSVM_SYCL_BACKEND_LIBRARY_NAME})
choose the SYCL implementation to be used in the SYCL backend: ${PLSSVM_SYCL_BACKEND_NAME_LIST} (default: automatic)
"
)
- string(REPLACE ";" "|" PLSSVM_SYCL_KERNEL_INVOCATION_TYPE_NAME_LIST "${PLSSVM_SYCL_KERNEL_INVOCATION_TYPE_NAME_LIST}")
- set(PLSSVM_SYCL_KERNEL_INVOCATION_TYPE_MANPAGE_ENTRY
+ string(REPLACE ";" "|" PLSSVM_SYCL_DATA_PARALLEL_KERNEL_NAME_LIST "${PLSSVM_SYCL_DATA_PARALLEL_KERNEL_NAME_LIST}")
+ set(PLSSVM_SYCL_DATA_PARALLEL_KERNEL_MANPAGE_ENTRY
"
.TP
- .B --sycl_kernel_invocation_type
- choose the kernel invocation type when using SYCL as backend: ${PLSSVM_SYCL_KERNEL_INVOCATION_TYPE_NAME_LIST} (default: automatic)
+ .B --sycl_data_parallel_kernel
+ choose the data parallel kernel when using SYCL as backend: ${PLSSVM_SYCL_DATA_PARALLEL_KERNEL_NAME_LIST} (default: automatic)
"
)
endif ()
- set(PLSSVM_SYCL_MANPAGE_ENTRY "${PLSSVM_SYCL_KERNEL_INVOCATION_TYPE_MANPAGE_ENTRY}${PLSSVM_SYCL_IMPLEMENTATION_TYPE_MANPAGE_ENTRY}")
+ set(PLSSVM_SYCL_MANPAGE_ENTRY "${PLSSVM_SYCL_DATA_PARALLEL_KERNEL_MANPAGE_ENTRY}${PLSSVM_SYCL_IMPLEMENTATION_TYPE_MANPAGE_ENTRY}")
# assemble the Kokkos manpage entry
if (TARGET ${PLSSVM_KOKKOS_BACKEND_LIBRARY_NAME})
string(REPLACE ";" "|" PLSSVM_KOKKOS_BACKEND_AVAILABLE_EXECUTION_SPACES "${PLSSVM_KOKKOS_BACKEND_AVAILABLE_EXECUTION_SPACES}")