== Version

Built On: {docdate} +
- Revision: 0.2.0
+ Revision: 0.3.0

== Dependencies

@@ -59,8 +59,6 @@ This extension requires OpenCL 1.0 or later.
The basic cl_mem buffer API doesn't enable access to the underlying raw
pointers in the device memory, preventing its use in host side
data structures that need pointer references to objects.
- This API adds a minimal increment on top of cl_mem that provides such
- capabilities.

Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
that enables raw device side pointers in the OpenCL standard. Its coarse-grain
@@ -69,17 +67,18 @@ coherency requirements, but it requires mapping the buffer's address range
to the host virtual address space although it might not be needed by the
application. This is not an issue in systems which can provide virtual memory
across the platform, but might provide implementation challenges in cases
- where the device presents a global memory with its disjoint address space
+ where the device presents a global memory with a disjoint address space
(that can also be a physical memory address space) or, for example, when
a barebone embedded system lacks virtual memory support altogether.

Various higher-level APIs present a memory allocation routine which can
allocate device-only memory and provide raw pointers to it without guarantees
- of system-wide uniqueness: Minimal implementations of OpenMP's omp_target_alloc() and
- CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
+ of system-wide uniqueness: For example, minimal implementations of OpenMP's
+ omp_target_alloc() and CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
address space between the host and the device. This extension is meant to
- provide a minimal set of features to implement such APIs without requiring
- a shared virtual address space between the host and the device.
+ provide a minimal set of features to implement such APIs using the cl_mem
+ buffers without requiring a shared virtual address space between the host and
+ the device.

=== New API Function

@@ -92,8 +91,8 @@ Enums for enabling device pointer properties when creating a buffer
[source]
----
- #define CL_MEM_DEVICE_ADDRESS_EXT (1ul << 31)
- #define CL_MEM_DEVICE_PRIVATE_EXT (1ul << 30)
+ #define CL_MEM_DEVICE_SHARED_ADDRESS_EXT (1ul << 31)
+ #define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (1ul << 30)
----

Enums for querying the device pointer from the cl_mem <<clGetMemObjectInfo, the list of supported param_names table>>:
@@ -142,9 +141,9 @@ Add new allocation flags <<clCreateBuffer, List of supported memory flag values
[width="100%",cols="<50%,<50%",options="header"]
|====
| Memory Flags | Description
- | {CL_MEM_DEVICE_ADDRESS_EXT_anchor}
+ | {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor}

- include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
+ include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[]
| This flag specifies that the buffer must have a single fixed address
for its lifetime and the address should be unique at least across the devices
of the context, but not necessarily within the host (virtual) memory.
@@ -161,22 +160,22 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
client code. If all of the devices in the context do not support
this type of allocations, an error (CL_INVALID_VALUE) is returned.

- The device addresses of sub-buffers derived from CL_MEM_DEVICE_ADDRESS_EXT
+ The device addresses of sub-buffers derived from CL_MEM_DEVICE_SHARED_ADDRESS_EXT
allocated buffers can be computed by adding the sub-buffer origin to the
start address.

- | {CL_MEM_DEVICE_PRIVATE_EXT_anchor}
+ | {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}

- include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_EXT.asciidoc[]
- | If this flag is combined with CL_MEM_DEVICE_ADDRESS_EXT, each device in
- the context can have their own (fixed) device-side address and a copy of
- the created buffer which are synchronized implicitly by the runtime.
- The main difference to a default cl_mem allocation in that case is then
- that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT and the
- per-device address is guaranteed to be the same for the entire lifetime
+ include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
+ | This flag specifies that the buffer must have a single fixed address
+ for its lifetime. Each device in the context can have their own (fixed)
+ device-side address and a copy of the created buffer which are synchronized
+ implicitly by the runtime. The main difference to a default cl_mem allocation
+ in that case is that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT
+ and the per-device address is guaranteed to be the same for the entire lifetime
of the cl_mem.

- The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_EXT
+ The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT
allocated buffers can be computed by adding the sub-buffer origin to the
device-specific start address.

@@ -196,7 +195,7 @@ Add a new information type <<clGetMemObjectInfo, List of supported param_names t
include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[]
| {cl_mem_device_address_EXT_TYPE}
| Returns the device address for a buffer allocated with
- CL_MEM_DEVICE_ADDRESS_EXT. If the buffer was not created with the flag
+ CL_MEM_DEVICE_SHARED_ADDRESS_EXT. If the buffer was not created with the flag
or there are multiple devices in the context and the buffer address is
not the same for all of them, it returns CL_INVALID_MEM_OBJECT.

@@ -205,7 +204,7 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[]
| {cl_mem_device_address_pair_EXT_TYPE}
| Returns the device-address pairs for all devices in the context.
The per-device addresses might differ when the buffer was allocated
- with the CL_MEM_DEVICE_PRIVATE_EXT enabled.
+ with the CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT enabled.
|====
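
The sub-buffer rule stated for both allocation flags (device address of a sub-buffer equals the parent's start address plus the sub-buffer origin) is plain address arithmetic. The following is a minimal sketch and not part of the extension: `device_addr_t` and `sub_buffer_device_address()` are hypothetical names, and the sketch assumes that a device address fits in an unsigned 64-bit integer and that the origin is expressed in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a device address fits in an unsigned 64-bit integer. */
typedef uint64_t device_addr_t;

/*
 * Hypothetical helper (not defined by the extension): computes the device
 * address of a sub-buffer from the parent buffer's start address and the
 * sub-buffer origin in bytes. For a buffer with per-device addresses, pass
 * the device-specific start address of the device of interest.
 */
static device_addr_t sub_buffer_device_address(device_addr_t parent_start,
                                               size_t origin_bytes)
{
    /* Guard against wrapping around the 64-bit address range. */
    assert((device_addr_t)origin_bytes <= UINT64_MAX - parent_start);
    return parent_start + (device_addr_t)origin_bytes;
}
```

With a parent start address of `0x10000000` and an origin of 4096 bytes, the helper yields `0x10001000`.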
@@ -231,7 +230,7 @@ include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]
value is changed by a call to {clSetKernelArgDevicePointer} for _kernel_.
The device pointer can only be used for arguments that are declared to be a
pointer to `global` memory allocated with clCreateBuffer() with the
- CL_MEM_DEVICE_ADDRESS_EXT flag. The pointer value specified as the argument value
+ CL_MEM_DEVICE_SHARED_ADDRESS_EXT flag. The pointer value specified as the argument value
can be the pointer to the beginning of the buffer or be a pointer offset into
the buffer region. The device pointer value must be naturally aligned according to
the argument's type.
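
The natural-alignment requirement for an offset device pointer can be checked on the host before the pointer is passed as a kernel argument. The following is a minimal sketch under stated assumptions, not an API of the extension: `device_addr_t` and `device_ptr_is_aligned()` are hypothetical names, device addresses are assumed to fit in an unsigned 64-bit integer, and the caller is assumed to supply the alignment of the argument's pointee type as a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a device address fits in an unsigned 64-bit integer. */
typedef uint64_t device_addr_t;

/*
 * Hypothetical helper (not defined by the extension): returns true when
 * base + offset_bytes is a multiple of `alignment`, which must be a
 * nonzero power of two (for example 4 for float, 16 for float4).
 */
static bool device_ptr_is_aligned(device_addr_t base, size_t offset_bytes,
                                  size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    device_addr_t addr = base + (device_addr_t)offset_bytes;
    return (addr & (device_addr_t)(alignment - 1)) == 0;
}
```

For instance, with a buffer starting at `0x1000`, an offset of 32 bytes satisfies a 16-byte alignment, while an offset of 8 bytes does not.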
@@ -299,6 +298,12 @@ None.
[options="header"]
|====
| *Version* | *Date* | *Author* | *Changes*
+ | 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
+ Made the allocation flags independent from each other and
+ renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
+ CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
+ same address across all devices in the context, whereas the latter
+ allows per-device addresses.
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
cases "all", not "any", covering a case where not all devices