Skip to content
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 9ad04fb

Browse files
committedDec 6, 2024·
Integrated to the main unified specification and other updates.
* Moved the functionality to clCreateBufferWithProperties, thus now requiring 3.0+. * Single memobj query for fetching the address(es). * Also other smaller improvements pointed by Kevin. * Candidate for 1.0.0.
1 parent e9f1bbc commit 9ad04fb

5 files changed

+232
-337
lines changed
 
+96
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
// Copyright 2024 The Khronos Group Inc.
2+
// SPDX-License-Identifier: CC-BY-4.0
3+
4+
include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[]
5+
6+
=== Other Extension Metadata
7+
8+
*Last Modified Date*::
9+
2024-12-06
10+
*IP Status*::
11+
No known IP claims.
12+
*Contributors*::
13+
- Pekka Jääskeläinen, Intel +
14+
- Karol Herbst, Red Hat +
15+
- Henry Linjamäki, Intel +
16+
- Kevin Petit, Arm +
17+
18+
=== Description
19+
20+
This extension provides access to raw device pointers for cl_mem buffers
21+
without requiring a shared virtual address space between the host and
22+
the device.
23+
24+
==== Background
25+
26+
Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
27+
that enables raw pointers in the OpenCL standard. Its coarse-grain
28+
variant is relatively simple to implement on various platforms in terms of
29+
coherency requirements, but it requires mapping the buffer's address range
30+
to the host virtual address space.
31+
However, various higher-level heterogeneous APIs present a memory allocation
32+
routine which can allocate device-only memory and provide raw addresses to
33+
it without guarentees of system-wide uniqueness. For example, minimal
34+
implementations of OpenMP's omp_target_alloc() and CUDA/HIP's
35+
cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device.
36+
37+
Host-device unified addressing might not be a major implementation issue in
38+
systems which can provide virtual memory across the platform, but might
39+
bring challenges in cases where the device presents a global memory with
40+
a disjoint address space (that can also be a physical memory address space) or,
41+
for example, when a barebone embedded system lacks virtual memory support altogether.
42+
This extension is targeted to complement the OpenCL SVM extension by providing
43+
an additional lower-end step in the spectrum of type of pointers/buffers OpenCL
44+
can allocate.
45+
46+
=== New Command
47+
48+
* {clSetKernelArgDevicePointerEXT}
49+
50+
=== New Types
51+
52+
* {cl_mem_device_address_EXT}
53+
54+
=== New Enums
55+
56+
* {cl_mem_properties_TYPE}
57+
** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
58+
** {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
59+
* {cl_mem_info_TYPE}
60+
** {CL_MEM_DEVICE_ADDRESS_EXT}
61+
* {cl_kernel_exec_info_TYPE}
62+
** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT}
63+
64+
=== Version History
65+
66+
[cols="5,15,15,70"]
67+
[grid="rows"]
68+
[options="header"]
69+
|====
70+
| *Version* | *Date* | *Author* | *Changes*
71+
| 0.9.0 | 2024-12-06 | Pekka Jääskeläinen, Kevin Petit |
72+
Integrated to the main unified specification.
73+
Moved the functionality to clCreateBufferWithProperties,
74+
thus requiring 3.0+. Single memobj query for fetching the
75+
address(es). Also other smaller improvements pointed by Kevin.
76+
Candidate for final 1.0.0.
77+
| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
78+
Made the allocation flags independent from each other and
79+
renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
80+
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
81+
same address across all devices in the context, whereas the latter
82+
allows per-device addresses.
83+
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
84+
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
85+
cases "all", not "any", covering a case where not all devices
86+
can ensure the same address across the context. In that case
87+
CL_INVALID_VALUE can be returned. Defined sub-buffer address
88+
computation to be 'base_addr + origin'. Added error conditions
89+
for clSetKernelExecInfo when the device doesn't support
90+
device pointers.
91+
| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback.
92+
This version describes the first API version that was prototyped
93+
in PoCL and RustiCL using temporary placeholder flag/enum values.
94+
The PoCL implementation and initial discussion on the extension
95+
can be found https://github.com/pocl/pocl/pull/1441[in this PR].
96+
|====

‎api/opencl_runtime_layer.asciidoc

+125-1
Original file line numberDiff line numberDiff line change
@@ -594,6 +594,39 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[]
594594
{CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the
595595
external memory handle.
596596
endif::cl_khr_external_memory[]
597+
598+
ifdef::cl_ext_buffer_device_address[]
599+
600+
| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}
601+
602+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
603+
| {cl_bool_TYPE}
604+
| When set to CL_TRUE, specifies that the buffer must have a single fixed
605+
device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}.
606+
607+
Each device in the context can have their own (fixed) device-side address and
608+
a copy of the created buffer which are synchronized
609+
implicitly by the runtime.
610+
611+
The flag might imply that the buffer will be "pinned" permanently to
612+
a device's memory, but might not be necessarily so, as long as the address
613+
range of the buffer remains constant.
614+
615+
The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
616+
allocated buffers can be computed by adding the sub-buffer origin to the
617+
device-specific start address.
618+
619+
| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor}
620+
621+
include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[]
622+
| {cl_bool_TYPE}
623+
| When set to CL_TRUE, the buffer has otherwise the same properties as
624+
when allocated using the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} flag,
625+
but with an additional property that the buffer's address is the same across
626+
all the devices in the context.
627+
628+
endif::cl_ext_buffer_device_address[]
629+
597630
|====
598631

599632
ifdef::cl_khr_external_memory[]
@@ -660,6 +693,15 @@ ifdef::cl_khr_external_memory[]
660693
** if _properties_ does not include a supported external memory handle and
661694
{CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_.
662695
endif::cl_khr_external_memory[]
696+
ifdef::cl_ext_buffer_device_address[]
697+
* {CL_INVALID_DEVICE}
698+
** If _properties_ includes either {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} or
699+
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there is at least one device in
700+
the context that doesn't support such allocation.
701+
* {CL_INVALID_VALUE}
702+
** If _properties_ includes both {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} and
703+
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} at the same time.
704+
endif::cl_ext_buffer_device_address[]
663705

664706
[[memory-flags-table]]
665707
.List of supported memory flag values
@@ -6149,6 +6191,21 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[]
61496191
returns the _resource_ argument specified when _memobj_ was created.
61506192
endif::cl_khr_d3d11_sharing[]
61516193

6194+
ifdef::cl_ext_buffer_device_address[]
6195+
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}
6196+
6197+
include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
6198+
| {cl_mem_device_address_EXT_TYPE}[]
6199+
| If _memobj_ was created using {clCreateBufferWithProperties} with
6200+
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to CL_TRUE,
6201+
returns a list of device addresses for the buffer, one for each
6202+
device in the context. If the buffer was allocated
6203+
with the {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} property,
6204+
only one device address is returned.
6205+
6206+
endif::cl_ext_buffer_device_address[]
6207+
6208+
61526209
|====
61536210

61546211
// refError
@@ -6158,6 +6215,12 @@ successfully.
61586215
Otherwise, it returns one of the following errors:
61596216

61606217
* {CL_INVALID_MEM_OBJECT} if _memobj_ is a not a valid memory object.
6218+
ifdef::cl_ext_buffer_device_address[]
6219+
** Returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if
6220+
the {cl_ext_buffer_device_address_EXT} is not supported or if the
6221+
buffer was not allocated with neither {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or
6222+
{CL_MEM_DEVICE_SHARED_ADDRESS_EXT}.
6223+
endif::cl_ext_buffer_device_address[]
61616224
* {CL_INVALID_VALUE} if _param_name_ is not valid, or if size in bytes
61626225
specified by _param_value_size_ is < size of return type as described in
61636226
the <<mem-info-table,Memory Object Info>> table and _param_value_ is not
@@ -10454,6 +10517,48 @@ Otherwise, it returns one of the following errors:
1045410517
required by the OpenCL implementation on the host.
1045510518
--
1045610519

10520+
ifdef::cl_ext_buffer_device_address[]
10521+
[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos']
10522+
--
10523+
To set a device pointer as the argument value for a specific argument of a
10524+
kernel, call the function
10525+
10526+
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
10527+
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
10528+
10529+
* _kernel_ is a valid kernel object.
10530+
* _arg_index_ is the argument index.
10531+
Arguments to the kernel are referred by indices that go from 0 for the
10532+
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
10533+
declared by a kernel.
10534+
* _arg_value_ is the device pointer that should be used as the argument value for
10535+
argument specified by _arg_index_.
10536+
The device pointer specified is the value used by all API calls that enqueue
10537+
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
10538+
value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_.
10539+
The device pointer can only be used for arguments that are declared to be a
10540+
pointer to `global` memory allocated with {clCreateBufferWithProperties} with
10541+
either the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
10542+
property. The pointer value specified as the argument value
10543+
can be the pointer to the beginning of the buffer or any offset into
10544+
the buffer region. The device pointer value must be naturally aligned according to
10545+
the argument's type.
10546+
10547+
{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set
10548+
successfully. Otherwise, it returns one of the following errors:
10549+
10550+
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
10551+
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
10552+
the device pointer.
10553+
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
10554+
* {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value.
10555+
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
10556+
by the OpenCL implementation on the device.
10557+
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
10558+
required by the OpenCL implementation on the host.
10559+
--
10560+
endif::cl_ext_buffer_device_address[]
10561+
1045710562
[open,refpage='clSetKernelExecInfo',desc='Pass additional information other than argument values to a kernel.',type='protos']
1045810563
--
1045910564
To pass additional information other than argument values to a kernel, call
@@ -10497,6 +10602,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
1049710602
grain system SVM allocations.
1049810603
These fine grain system SVM pointers may be passed as arguments or
1049910604
defined in SVM buffers that are passed as arguments to _kernel_.
10605+
10606+
ifdef::cl_ext_buffer_device_address[]
10607+
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}
10608+
10609+
include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
10610+
| {cl_mem_device_address_EXT_TYPE}[]
10611+
| Device pointers must reference locations contained entirely within
10612+
buffers that are passed to kernel as arguments, or that are passed
10613+
through the execution information. Non-argument device pointers accessed
10614+
by the kernel must be specified by passing pointers to those buffers
10615+
via this {clSetKernelExecInfo} option.
10616+
endif::cl_ext_buffer_device_address[]
10617+
1050010618
|====
1050110619

1050210620
// refError
@@ -10506,7 +10624,13 @@ successfully.
1050610624
Otherwise, it returns one of the following errors:
1050710625

1050810626
* {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object.
10509-
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM.
10627+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} if no devices in
10628+
the context associated with _kernel_ support SVM.
10629+
ifdef::cl_ext_buffer_device_address[]
10630+
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no
10631+
device in the context associated with _kernel_ support device pointers.
10632+
endif::cl_ext_buffer_device_address[]
10633+
1051010634
* {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is
1051110635
`NULL` or if the size specified by _param_value_size_ is not valid.
1051210636
* {CL_INVALID_OPERATION} if _param_name_ is

0 commit comments

Comments
 (0)
Please sign in to comment.