Skip to content

Commit 1931416

Browse files
committed
BDA: Made the allocation flags independent from each other
...and renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the same address across all devices in the context, whereas the latter allows per-device addresses.
1 parent 6ec48c4 commit 1931416

File tree

2 files changed

+32
-27
lines changed

2 files changed

+32
-27
lines changed

extensions/cl_ext_buffer_device_address.asciidoc

+30-25
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@ Draft.
4646
== Version
4747

4848
Built On: {docdate} +
49-
Revision: 0.2.0
49+
Revision: 0.3.0
5050

5151
== Dependencies
5252

@@ -59,8 +59,6 @@ This extension requires OpenCL 1.0 or later.
5959
The basic cl_mem buffer API doesn't enable access to the underlying raw
6060
pointers in the device memory, preventing its use in host side
6161
data structures that need pointer references to objects.
62-
This API adds a minimal increment on top of cl_mem that provides such
63-
capabilities.
6462

6563
Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
6664
that enables raw device side pointers in the OpenCL standard. Its coarse-grain
@@ -69,17 +67,18 @@ coherency requirements, but it requires mapping the buffer's address range
6967
to the host virtual address space although it might not be needed by the
7068
application. This is not an issue in systems which can provide virtual memory
7169
across the platform, but might provide implementation challenges in cases
72-
where the device presents a global memory with its disjoint address space
70+
where the device presents a global memory with a disjoint address space
7371
(that can also be a physical memory address space) or, for example, when
7472
a barebone embedded system lacks virtual memory support altogether.
7573

7674
Various higher-level APIs present a memory allocation routine which can
7775
allocate device-only memory and provide raw pointers to it without guarentees
78-
of system-wide uniqueness: Minimal implementations of OpenMP's omp_target_alloc() and
79-
CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
76+
of system-wide uniqueness: For example, minimal implementations of OpenMP's
77+
omp_target_alloc() and CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
8078
address space between the host and the device. This extension is meant to
81-
provide a minimal set of features to implement such APIs without requiring
82-
a shared virtual address space between the host and the device.
79+
provide a minimal set of features to implement such APIs using the cl_mem
80+
buffers without requiring a shared virtual address space between the host and
81+
the device.
8382

8483
=== New API Function
8584

@@ -92,8 +91,8 @@ Enums for enabling device pointer properties when creating a buffer
9291

9392
[source]
9493
----
95-
#define CL_MEM_DEVICE_ADDRESS_EXT (1ul << 31)
96-
#define CL_MEM_DEVICE_PRIVATE_EXT (1ul << 30)
94+
#define CL_MEM_DEVICE_SHARED_ADDRESS_EXT (1ul << 31)
95+
#define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (1ul << 30)
9796
----
9897

9998
Enums for querying the device pointer from the cl_mem <<clGetMemObjectInfo, the list of supported param_names table>>:
@@ -142,9 +141,9 @@ Add new allocation flags <<clCreateBuffer, List of supported memory flag values
142141
[width="100%",cols="<50%,<50%",options="header"]
143142
|====
144143
| Memory Flags | Description
145-
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}
144+
| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor}
146145

147-
include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
146+
include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[]
148147
| This flag specifies that the buffer must have a single fixed address
149148
for its lifetime and the address should be unique at least across the devices
150149
of the context, but not necessarily within the host (virtual) memory.
@@ -161,22 +160,22 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
161160
client code. If all of the devices in the context do not support
162161
this type of allocations, an error (CL_INVALID_VALUE) is returned.
163162

164-
The device addresses of sub-buffers derived from CL_MEM_DEVICE_ADDRESS_EXT
163+
The device addresses of sub-buffers derived from CL_MEM_DEVICE_SHARED_ADDRESS_EXT
165164
allocated buffers can be computed by adding the sub-buffer origin to the
166165
start address.
167166

168-
| {CL_MEM_DEVICE_PRIVATE_EXT_anchor}
167+
| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}
169168

170-
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_EXT.asciidoc[]
171-
| If this flag is combined with CL_MEM_DEVICE_ADDRESS_EXT, each device in
172-
the context can have their own (fixed) device-side address and a copy of
173-
the created buffer which are synchronized implicitly by the runtime.
174-
The main difference to a default cl_mem allocation in that case is then
175-
that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT and the
176-
per-device address is guaranteed to be the same for the entire lifetime
169+
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
170+
| This flag specifies that the buffer must have a single fixed address
171+
for its lifetime. Each device in the context can have their own (fixed)
172+
device-side address and a copy of the created buffer which are synchronized
173+
implicitly by the runtime. The main difference to a default cl_mem allocation
174+
in that case is that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT
175+
and the per-device address is guaranteed to be the same for the entire lifetime
177176
of the cl_mem.
178177

179-
The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_EXT
178+
The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT
180179
allocated buffers can be computed by adding the sub-buffer origin to the
181180
device-specific start address.
182181

@@ -196,7 +195,7 @@ Add a new information type <<clGetMemObjectInfo, List of supported param_names t
196195
include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[]
197196
| {cl_mem_device_address_EXT_TYPE}
198197
| Returns the device address for a buffer allocated with
199-
CL_MEM_DEVICE_ADDRESS_EXT. If the buffer was not created with the flag
198+
CL_MEM_DEVICE_SHARED_ADDRESS_EXT. If the buffer was not created with the flag
200199
or there are multiple devices in the context and the buffer address is
201200
not the same for all of them, it returns CL_INVALID_MEM_OBJECT.
202201

@@ -205,7 +204,7 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[]
205204
| {cl_mem_device_address_pair_EXT_TYPE}
206205
| Returns the device-address pairs for all devices in the context.
207206
The per-device addresses might differ when the buffer was allocated
208-
with the CL_MEM_DEVICE_PRIVATE_EXT enabled.
207+
with the CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT enabled.
209208
|====
210209

211210

@@ -231,7 +230,7 @@ include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
231230
value is changed by a call to {clSetKernelArgDevicePointer} for _kernel_.
232231
The device pointer can only be used for arguments that are declared to be a
233232
pointer to `global` memory allocated with clCreateBuffer() with the
234-
CL_MEM_DEVICE_ADDRESS_EXT flag. The pointer value specified as the argument value
233+
CL_MEM_DEVICE_SHARED_ADDRESS_EXT flag. The pointer value specified as the argument value
235234
can be the pointer to the beginning of the buffer or be a pointer offset into
236235
the buffer region. The device pointer value must be naturally aligned according to
237236
the argument's type.
@@ -299,6 +298,12 @@ None.
299298
[options="header"]
300299
|====
301300
| *Version* | *Date* | *Author* | *Changes*
301+
| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
302+
Made the allocation flags independent from each other and
303+
renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
304+
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
305+
same address across all devices in the context, whereas the latter
306+
allows per-device addresses.
302307
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
303308
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
304309
cases "all", not "any", covering a case where not all devices

xml/cl.xml

+2-2
Original file line numberDiff line numberDiff line change
@@ -7207,8 +7207,8 @@ server's OpenCL/api-docs repository.
72077207
<command name="clSetKernelArgDevicePointerEXT"/>
72087208
</require>
72097209
<require comment="cl_mem_flags">
7210-
<enum name="CL_MEM_DEVICE_ADDRESS_EXT"/>
7211-
<enum name="CL_MEM_DEVICE_PRIVATE_EXT"/>
7210+
<enum name="CL_MEM_DEVICE_SHARED_ADDRESS_EXT"/>
7211+
<enum name="CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT"/>
72127212
</require>
72137213
<require comment="cl_mem_info">
72147214
<enum name="CL_MEM_DEVICE_PTR_EXT"/>

0 commit comments

Comments
 (0)