Name Strings

cl_intel_unified_shared_memory

Contact

Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com)

Contributors

Ben Ashbaugh, Intel
James Brodman, Intel
Maciej Dziuban, Intel
Krzysztof Gibala, Intel
Wenju He, Intel
Kris Kang, Intel
Michael Kinsner, Intel
Michal Mrozek, Intel
Lukasz Towarek, Intel

Notice

Copyright (c) 2021-2023 Intel Corporation. All rights reserved.

Status

Shipping

Version

Built On: 2023-06-12
Revision: 1.0.0

Dependencies

This extension is written against the OpenCL API Specification Version 3.0.9. This extension extends the clSetKernelExecInfo API from OpenCL 2.0 and hence requires an OpenCL 2.0 platform; however, it is intended to be implementable by devices supporting a wide range of OpenCL versions.

Overview

This extension adds "Unified Shared Memory" (USM) to OpenCL. Unified Shared Memory provides:

  • Easier integration into existing code bases by representing OpenCL allocations as pointers rather than handles (cl_mems), with full support for pointer arithmetic into allocations.

  • Fine-grain control over ownership and accessibility of OpenCL allocations, to optimally choose between performance and programmer convenience.

  • A simpler programming model, by automatically migrating some allocations between OpenCL devices and the host.

While Unified Shared Memory (USM) shares many features with Shared Virtual Memory (SVM), Unified Shared Memory provides a different mix of capabilities and control. Specifically:

  • The matrix of USM capabilities supports combinations of features beyond the SVM capability queries.

  • USM provides explicit control over memory placement and migration by supporting host allocations with wide visibility, device allocations for best performance, and shared allocations that may migrate between devices and the host.

  • USM allocations may be associated with both a device and a context. The USM allocation APIs support additional memory flags and optional properties to affect how memory is allocated and migrated.

  • There is no need for APIs to map or unmap USM allocations, because the contents of a host-accessible USM allocation can be accessed on the host without mapping or unmapping.

  • An application may indicate that a kernel may access categories of USM allocations indirectly, without passing a set of all indirectly accessed USM allocations to the kernel, improving usability and reducing driver overhead for kernels that access many USM allocations.

  • USM adds API functions to query properties of a USM allocation and to provide memory advice for an allocation.

Unified Shared Memory and Shared Virtual Memory can and will coexist for many implementations. All implementations that support Shared Virtual Memory may support at least some types of Unified Shared Memory.

New API Functions

void*   clHostMemAllocINTEL(
            cl_context context,
            const cl_mem_properties_intel* properties,
            size_t size,
            cl_uint alignment,
            cl_int* errcode_ret);

void*   clDeviceMemAllocINTEL(
            cl_context context,
            cl_device_id device,
            const cl_mem_properties_intel* properties,
            size_t size,
            cl_uint alignment,
            cl_int* errcode_ret);

void*   clSharedMemAllocINTEL(
            cl_context context,
            cl_device_id device,
            const cl_mem_properties_intel* properties,
            size_t size,
            cl_uint alignment,
            cl_int* errcode_ret);

cl_int  clMemFreeINTEL(
            cl_context context,
            void* ptr);

cl_int  clMemBlockingFreeINTEL(
            cl_context context,
            void* ptr);

cl_int  clGetMemAllocInfoINTEL(
            cl_context context,
            const void* ptr,
            cl_mem_info_intel param_name,
            size_t param_value_size,
            void* param_value,
            size_t* param_value_size_ret);

cl_int  clSetKernelArgMemPointerINTEL(
            cl_kernel kernel,
            cl_uint arg_index,
            const void* arg_value);

cl_int  clEnqueueMemFillINTEL(
            cl_command_queue command_queue,
            void* dst_ptr,
            const void* pattern,
            size_t pattern_size,
            size_t size,
            cl_uint num_events_in_wait_list,
            const cl_event* event_wait_list,
            cl_event* event);

cl_int  clEnqueueMemcpyINTEL(
            cl_command_queue command_queue,
            cl_bool blocking,
            void* dst_ptr,
            const void* src_ptr,
            size_t size,
            cl_uint num_events_in_wait_list,
            const cl_event* event_wait_list,
            cl_event* event);

cl_int  clEnqueueMigrateMemINTEL(
            cl_command_queue command_queue,
            const void* ptr,
            size_t size,
            cl_mem_migration_flags flags,
            cl_uint num_events_in_wait_list,
            const cl_event* event_wait_list,
            cl_event* event);

cl_int  clEnqueueMemAdviseINTEL(
            cl_command_queue command_queue,
            const void* ptr,
            size_t size,
            cl_mem_advice_intel advice,
            cl_uint num_events_in_wait_list,
            const cl_event* event_wait_list,
            cl_event* event);

New API Enums

Accepted values for the param_name parameter to clGetDeviceInfo to query the Unified Shared Memory capabilities of an OpenCL device:

#define CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL                   0x4190
#define CL_DEVICE_DEVICE_MEM_CAPABILITIES_INTEL                 0x4191
#define CL_DEVICE_SINGLE_DEVICE_SHARED_MEM_CAPABILITIES_INTEL   0x4192
#define CL_DEVICE_CROSS_DEVICE_SHARED_MEM_CAPABILITIES_INTEL    0x4193
#define CL_DEVICE_SHARED_SYSTEM_MEM_CAPABILITIES_INTEL          0x4194

Bitfield type and bits describing the Unified Shared Memory capabilities of an OpenCL device:

typedef cl_bitfield cl_device_unified_shared_memory_capabilities_intel;

#define CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL                   (1 << 0)
#define CL_UNIFIED_SHARED_MEMORY_ATOMIC_ACCESS_INTEL            (1 << 1)
#define CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ACCESS_INTEL        (1 << 2)
#define CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ATOMIC_ACCESS_INTEL (1 << 3)

Type to describe optional Unified Shared Memory allocation properties:

typedef cl_bitfield cl_mem_properties_intel;

Enumerant value requesting optional allocation properties for a Unified Shared Memory allocation:

#define CL_MEM_ALLOC_FLAGS_INTEL        0x4195

Bitfield type and bits describing optional allocation properties for a Unified Shared Memory allocation:

typedef cl_bitfield cl_mem_alloc_flags_intel;

#define CL_MEM_ALLOC_WRITE_COMBINED_INTEL               (1 << 0)
#define CL_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE_INTEL     (1 << 1)
#define CL_MEM_ALLOC_INITIAL_PLACEMENT_HOST_INTEL       (1 << 2)

Enumeration type and values for the param_name parameter to clGetMemAllocInfoINTEL to query information about a Unified Shared Memory allocation. Optional allocation properties may also be queried using clGetMemAllocInfoINTEL:

typedef cl_uint cl_mem_info_intel;

#define CL_MEM_ALLOC_TYPE_INTEL         0x419A
#define CL_MEM_ALLOC_BASE_PTR_INTEL     0x419B
#define CL_MEM_ALLOC_SIZE_INTEL         0x419C
#define CL_MEM_ALLOC_DEVICE_INTEL       0x419D
/* CL_MEM_ALLOC_FLAGS_INTEL - defined above */

Enumeration type and values describing the type of Unified Shared Memory allocation. Returned by clGetMemAllocInfoINTEL when param_name is CL_MEM_ALLOC_TYPE_INTEL:

typedef cl_uint cl_unified_shared_memory_type_intel;

#define CL_MEM_TYPE_UNKNOWN_INTEL       0x4196
#define CL_MEM_TYPE_HOST_INTEL          0x4197
#define CL_MEM_TYPE_DEVICE_INTEL        0x4198
#define CL_MEM_TYPE_SHARED_INTEL        0x4199

Enumeration type and values for the advice parameter to clEnqueueMemAdviseINTEL to provide memory advice for a Unified Shared Memory allocation:

typedef cl_uint cl_mem_advice_intel;
/* Enum values 0x4208-0x420F are reserved for future memory advice values. */

Accepted values for the param_name parameter to clSetKernelExecInfo to specify that the kernel may indirectly access Unified Shared Memory allocations of the specified type:

#define CL_KERNEL_EXEC_INFO_INDIRECT_HOST_ACCESS_INTEL      0x4200
#define CL_KERNEL_EXEC_INFO_INDIRECT_DEVICE_ACCESS_INTEL    0x4201
#define CL_KERNEL_EXEC_INFO_INDIRECT_SHARED_ACCESS_INTEL    0x4202

Accepted value for the param_name parameter to clSetKernelExecInfo to specify a set of Unified Shared Memory allocations that the kernel may indirectly access:

#define CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL                  0x4203

New return values from clGetEventInfo when param_name is CL_EVENT_COMMAND_TYPE:

#define CL_COMMAND_MEMFILL_INTEL        0x4204
#define CL_COMMAND_MEMCPY_INTEL         0x4205
#define CL_COMMAND_MIGRATEMEM_INTEL     0x4206
#define CL_COMMAND_MEMADVISE_INTEL      0x4207

Modifications to the OpenCL API Specification

Section 4.2 - Querying Devices:

Add to Table 5 - List of supported param_names by clGetDeviceInfo:

Table 5. List of supported param_names by clGetDeviceInfo

Device Info:

  CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL
  CL_DEVICE_DEVICE_MEM_CAPABILITIES_INTEL
  CL_DEVICE_SINGLE_DEVICE_SHARED_MEM_CAPABILITIES_INTEL
  CL_DEVICE_CROSS_DEVICE_SHARED_MEM_CAPABILITIES_INTEL
  CL_DEVICE_SHARED_SYSTEM_MEM_CAPABILITIES_INTEL

Return Type:

  cl_device_unified_shared_memory_capabilities_intel

Description:

Describes the ability for a device to access Unified Shared Memory allocations of the specified type.

The host memory access capabilities apply to any host allocation.

The device memory access capabilities apply to any device allocation associated with this device.

The single device shared memory access capabilities apply to any shared allocation associated with this device.

The cross-device shared memory access capabilities apply to any shared allocation associated with this device, or to any shared memory allocation on another device that also supports the same cross-device shared memory access capability.

The shared system memory access capabilities apply to any allocations made by a system allocator, such as malloc or new.

The access capabilities are encoded as bits in a bitfield. Supported capabilities are:

CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL: The device may access (read or write) Unified Shared Memory allocations of this type.

CL_UNIFIED_SHARED_MEMORY_ATOMIC_ACCESS_INTEL: The device may perform atomic operations on Unified Shared Memory allocations of this type.

CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ACCESS_INTEL: The device supports concurrent access to Unified Shared Memory allocations of this type. Concurrent access may be from the host, or from other OpenCL devices, where applicable.

CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ATOMIC_ACCESS_INTEL: The device supports concurrent atomic access to Unified Shared Memory allocations of this type.
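
As an illustration of how an application might interpret the capability bitfield described above, the following is a minimal sketch. It assumes the value has already been obtained from clGetDeviceInfo, and it declares local stand-ins for the type and bit values so that it builds without the OpenCL headers. The helper names usm_type_supported and usm_describe_caps are hypothetical and are not part of this extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in (assumption) for
   cl_device_unified_shared_memory_capabilities_intel, which is a
   cl_bitfield (a 64-bit unsigned integer). */
typedef uint64_t usm_caps_t;

/* Bit values copied from the New API Enums section. */
#define CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL                   (1 << 0)
#define CL_UNIFIED_SHARED_MEMORY_ATOMIC_ACCESS_INTEL            (1 << 1)
#define CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ACCESS_INTEL        (1 << 2)
#define CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ATOMIC_ACCESS_INTEL (1 << 3)

/* A capability value of zero indicates that the device does not support
   allocations of the queried type. */
int usm_type_supported(usm_caps_t caps)
{
    return caps != 0;
}

/* Writes a space-separated list of the capabilities set in caps into buf.
   Returns the number of capability names written. */
int usm_describe_caps(usm_caps_t caps, char *buf, size_t len)
{
    static const struct { usm_caps_t bit; const char *name; } names[] = {
        { CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL,                   "access" },
        { CL_UNIFIED_SHARED_MEMORY_ATOMIC_ACCESS_INTEL,            "atomic" },
        { CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ACCESS_INTEL,        "concurrent" },
        { CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ATOMIC_ACCESS_INTEL, "concurrent-atomic" },
    };
    size_t used = 0;
    size_t i;
    int count = 0;

    if (len == 0) return 0;
    buf[0] = '\0';
    for (i = 0; i < sizeof names / sizeof names[0]; ++i) {
        int n;
        if ((caps & names[i].bit) == 0) continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';  /* drop a partially written name */
            break;
        }
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

An application would typically call such a helper once per capability query (host, device, single device shared, cross-device shared, and shared system) for each device of interest.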

New Section 5.X - Unified Shared Memory

This section describes Unified Shared Memory, abbreviated USM. Unified Shared Memory allocations are represented as pointers in the host application, rather than as handles (specifically, cl_mems). Unified Shared Memory additionally provides fine-grain control over placement and accessibility of an allocation, allowing many tradeoffs between programmer convenience and performance.

Three types of Unified Shared Memory allocations are supported. The type describes the ownership of the allocation:

  1. Host allocations are owned by the host and are intended to be allocated out of system memory. Host allocations are accessible by the host and one or more devices. The same pointer to a host allocation may be used on the host and all supported devices; host allocations have address equivalence. Host allocations are not expected to migrate between system memory and device local memory. Host allocations trade off wide accessibility and transfer benefits for potentially higher per-access costs, such as over PCI Express.

  2. Device allocations are owned by a specific device and are intended to be allocated out of device local memory, if present. Device allocations generally trade off access limitations for higher performance. With very few exceptions, device allocations may only be accessed by the specific device they are allocated on, or copied to a host or another device allocation. The same pointer to a device allocation may be used on any supported device.

  3. Shared allocations share ownership and are intended to migrate between the host and one or more devices. Shared allocations are accessible by at least the host and an associated device. Shared allocations may be accessed by other devices in some cases. Shared allocations trade off transfer costs for per-access benefits. The same pointer to a shared allocation may be used on the host and all supported devices.

A Shared System allocation is a sub-class of a Shared allocation, where the memory is allocated by a system allocator - such as malloc or new - rather than by a USM allocation API. Shared system allocations have no associated device - they are inherently cross-device. Like other shared allocations, shared system allocations are intended to migrate between the host and supported devices, and the same pointer to a shared system allocation may be used on the host and all supported devices.

Table 1. Summary of Unified Shared Memory Capabilities

Name           Initial Location        Accessible By              Migratable To

Host           Host                    Host             Yes       Host            N/A
                                       Any Device       Yes (*)   Device          No

Device         Specific Device         Host             No        Host            No
                                       Specific Device  Yes       Device          N/A
                                       Another Device   Optional  Another Device  No

Shared         Host, Specific Device,  Host             Yes       Host            Yes
               or Unspecified          Specific Device  Yes       Device          Yes
                                       Another Device   Optional  Another Device  Optional

Shared System  Host                    Host             Yes       Host            Yes
                                       Device           Yes       Device          Yes

(*) Perhaps over a bus, such as PCIe.
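
The "Accessible By" columns of Table 1 can be read as a small lookup table. The sketch below encodes them that way, purely as an illustration. The enum and function names are hypothetical, and the sketch assumes that "Any Device" for host allocations and "Device" for shared system allocations cover both an associated device and another device. Table 1 is only a summary; whether a given device can actually access an allocation is still determined by the capability queries.

```c
/* Hypothetical names; none of these are defined by this extension. */
typedef enum {
    USM_KIND_HOST,
    USM_KIND_DEVICE,
    USM_KIND_SHARED,
    USM_KIND_SHARED_SYSTEM
} usm_kind;

typedef enum {
    USM_BY_HOST,
    USM_BY_ASSOCIATED_DEVICE,
    USM_BY_ANOTHER_DEVICE
} usm_accessor;

typedef enum { USM_ACCESS_NO, USM_ACCESS_YES, USM_ACCESS_OPTIONAL } usm_access;

/* Returns the "Accessible By" entry of Table 1 for the given allocation
   kind and accessor. */
usm_access usm_accessible_by(usm_kind kind, usm_accessor who)
{
    /* Columns: host, associated device, another device. */
    static const usm_access table[4][3] = {
        [USM_KIND_HOST]          = { USM_ACCESS_YES, USM_ACCESS_YES, USM_ACCESS_YES },
        [USM_KIND_DEVICE]        = { USM_ACCESS_NO,  USM_ACCESS_YES, USM_ACCESS_OPTIONAL },
        [USM_KIND_SHARED]        = { USM_ACCESS_YES, USM_ACCESS_YES, USM_ACCESS_OPTIONAL },
        [USM_KIND_SHARED_SYSTEM] = { USM_ACCESS_YES, USM_ACCESS_YES, USM_ACCESS_YES },
    };
    return table[kind][who];
}
```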

OpenCL devices may support different capabilities for each type of Unified Shared Memory allocation. Supported capabilities are:

  • CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL: The device may access (read or write) Unified Shared Memory allocations of this type.

  • CL_UNIFIED_SHARED_MEMORY_ATOMIC_ACCESS_INTEL: The device may perform atomic operations on Unified Shared Memory allocations of this type.

  • CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ACCESS_INTEL: The device supports concurrent access to Unified Shared Memory allocations of this type. Concurrent access may be from the host, or from other OpenCL devices, where applicable.

  • CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ATOMIC_ACCESS_INTEL: The device supports concurrent atomic access to Unified Shared Memory allocations of this type.

Some devices may oversubscribe some shared allocations. When and how such oversubscription occurs, including which allocations are evicted when the working set changes, are considered implementation details.

The minimum set of capabilities is:

Table 2. Minimum Unified Shared Memory Capabilities

Allocation Type               Access    Atomic Access  Concurrent Access  Concurrent Atomic Access

Host                          Optional  Optional       Optional           Optional
Device                        Required  Optional       Optional           Optional
Shared                        Optional  Optional       Optional           Optional
Shared (Cross-Device)         Optional  Optional       Optional           Optional
Shared System (Cross-Device)  Optional  Optional       Optional           Optional
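
Because only access to device allocations is required, portable applications generally need to query capabilities before relying on host or shared allocations. The following sketch shows one possible selection policy. The preference order (shared, then host, then device) is an assumption made for illustration and is not mandated by this extension, and the function name usm_pick_alloc_type is hypothetical. The enum values are copied from the New API Enums section so that the sketch builds without the OpenCL headers.

```c
#include <stdint.h>

/* Values copied from the New API Enums section. */
#define CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL (1 << 0)
#define CL_MEM_TYPE_UNKNOWN_INTEL 0x4196
#define CL_MEM_TYPE_HOST_INTEL    0x4197
#define CL_MEM_TYPE_DEVICE_INTEL  0x4198
#define CL_MEM_TYPE_SHARED_INTEL  0x4199

/* Picks an allocation type from the capability bitfields reported for one
   device. Returns CL_MEM_TYPE_UNKNOWN_INTEL if no type is accessible. */
uint32_t usm_pick_alloc_type(uint64_t host_caps,
                             uint64_t device_caps,
                             uint64_t single_device_shared_caps)
{
    if (single_device_shared_caps & CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL)
        return CL_MEM_TYPE_SHARED_INTEL;   /* most convenient: may migrate */
    if (host_caps & CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL)
        return CL_MEM_TYPE_HOST_INTEL;     /* widely visible, no migration */
    if (device_caps & CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL)
        return CL_MEM_TYPE_DEVICE_INTEL;   /* requires explicit copies */
    return CL_MEM_TYPE_UNKNOWN_INTEL;
}
```

An application tuned for performance might reverse this order and prefer device allocations with explicit copies.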

Allocating and Freeing Unified Shared Memory

Host Allocations

The function

void*   clHostMemAllocINTEL(
            cl_context context,
            const cl_mem_properties_intel* properties,
            size_t size,
            cl_uint alignment,
            cl_int* errcode_ret);

allocates host Unified Shared Memory.

context is a valid OpenCL context used to allocate the host memory.

properties is an optional list of allocation properties and their corresponding values. The list is terminated with the special property 0. If no allocation properties are required, properties may be NULL. Please refer to Table 3 (List of Supported cl_mem_properties_intel Properties) for valid property values and their descriptions.

size is the size in bytes of the requested host allocation.

alignment is the minimum alignment in bytes for the requested host allocation. It must be a power of two and must be equal to or smaller than the size of the largest data type supported by any OpenCL device in context. If alignment is 0, a default alignment will be used that is equal to the size of the largest data type supported by any OpenCL device in context.

errcode_ret may return an appropriate error code. If errcode_ret is NULL then no error code will be returned.

clHostMemAllocINTEL will return a valid non-NULL address and CL_SUCCESS will be returned in errcode_ret if the host Unified Shared Memory is allocated successfully. Otherwise, NULL will be returned, and errcode_ret will be set to one of the following error values:

  • CL_INVALID_CONTEXT if context is not a valid context.

  • CL_INVALID_OPERATION if CL_DEVICE_HOST_MEM_CAPABILITIES_INTEL is zero for all devices in context, indicating that no devices in context support host Unified Shared Memory allocations.

  • CL_INVALID_VALUE if alignment is neither zero nor a power of two.

  • CL_INVALID_VALUE if alignment is greater than the size of the largest data type supported by any OpenCL device in context that supports host Unified Shared Memory allocations.

  • CL_INVALID_PROPERTY if a memory property name in properties is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once.

  • CL_INVALID_PROPERTY if either the CL_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE_INTEL flag or the CL_MEM_ALLOC_INITIAL_PLACEMENT_HOST_INTEL flag is specified.

  • CL_INVALID_BUFFER_SIZE if size is zero or greater than CL_DEVICE_MAX_MEM_ALLOC_SIZE for any OpenCL device in context that supports host Unified Shared Memory allocations.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
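
The size and alignment rules above can be checked by an application before calling the allocation function. The following is a minimal sketch of such a check. It assumes the caller has already computed the two limits: largest_type_size, the size in bytes of the largest data type supported by the relevant devices, and max_alloc_size, the smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE among them. The order in which the errors are checked is also an assumption, since this extension does not specify one. The function names are hypothetical, and the error code values are those of the OpenCL API headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Error code values as defined by the OpenCL API headers. */
#define CL_SUCCESS              0
#define CL_INVALID_VALUE        -30
#define CL_INVALID_BUFFER_SIZE  -61

static int is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Mirrors the size and alignment error conditions for a USM allocation. */
int usm_check_size_and_alignment(size_t size, uint32_t alignment,
                                 uint32_t largest_type_size,
                                 uint64_t max_alloc_size)
{
    if (alignment != 0 && !is_power_of_two(alignment))
        return CL_INVALID_VALUE;        /* neither zero nor a power of two */
    if (alignment > largest_type_size)
        return CL_INVALID_VALUE;        /* larger than the largest data type */
    if (size == 0 || size > max_alloc_size)
        return CL_INVALID_BUFFER_SIZE;
    return CL_SUCCESS;
}

/* An alignment of zero selects the default alignment. */
uint32_t usm_effective_alignment(uint32_t alignment, uint32_t largest_type_size)
{
    return alignment == 0 ? largest_type_size : alignment;
}
```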

Device Allocations

The function

void*   clDeviceMemAllocINTEL(
            cl_context context,
            cl_device_id device,
            const cl_mem_properties_intel* properties,
            size_t size,
            cl_uint alignment,
            cl_int* errcode_ret);

allocates Unified Shared Memory specific to an OpenCL device.

context is a valid OpenCL context used to allocate the device memory.

device is a valid OpenCL device ID to associate with the allocation.

properties is an optional list of allocation properties and their corresponding values. The list is terminated with the special property 0. If no allocation properties are required, properties may be NULL. Please refer to Table 3 (List of Supported cl_mem_properties_intel Properties) for valid property values and their descriptions.

size is the size in bytes of the requested device allocation.

alignment is the minimum alignment in bytes for the requested device allocation. It must be a power of two and must be equal to or smaller than the size of the largest data type supported by device. If alignment is 0, a default alignment will be used that is equal to the size of the largest data type supported by device.

errcode_ret may return an appropriate error code. If errcode_ret is NULL then no error code will be returned.

clDeviceMemAllocINTEL will return a valid non-NULL address and CL_SUCCESS will be returned in errcode_ret if the device Unified Shared Memory is allocated successfully. Otherwise, NULL will be returned, and errcode_ret will be set to one of the following error values:

  • CL_INVALID_CONTEXT if context is not a valid context.

  • CL_INVALID_DEVICE if device is not a valid device or is not associated with context.

  • CL_INVALID_OPERATION if CL_DEVICE_DEVICE_MEM_CAPABILITIES_INTEL is zero for device, indicating that device does not support device Unified Shared Memory allocations.

  • CL_INVALID_VALUE if alignment is neither zero nor a power of two.

  • CL_INVALID_VALUE if alignment is greater than the size of the largest data type supported by device.

  • CL_INVALID_PROPERTY if a memory property name in properties is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once.

  • CL_INVALID_PROPERTY if either the CL_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE_INTEL flag or the CL_MEM_ALLOC_INITIAL_PLACEMENT_HOST_INTEL flag is specified.

  • CL_INVALID_BUFFER_SIZE if size is zero or greater than CL_DEVICE_MAX_MEM_ALLOC_SIZE for device.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.

Shared Allocations

The function

void*   clSharedMemAllocINTEL(
            cl_context context,
            cl_device_id device,
            const cl_mem_properties_intel* properties,
            size_t size,
            cl_uint alignment,
            cl_int* errcode_ret);

allocates Unified Shared Memory with shared ownership between the host and the specified OpenCL device. If the specified OpenCL device supports cross-device access capabilities, the allocation is also accessible by other OpenCL devices in the context that have cross-device access capabilities.

context is a valid OpenCL context used to allocate the Unified Shared Memory.

device is an optional OpenCL device ID to associate with the allocation. If device is NULL then the allocation is not associated with any device. Allocations with no associated device are accessible by the host and OpenCL devices in the context that have cross-device access capabilities.

properties is an optional list of allocation properties and their corresponding values. The list is terminated with the special property 0. If no allocation properties are required, properties may be NULL. Please refer to Table 3 (List of Supported cl_mem_properties_intel Properties) for valid property values and their descriptions.

size is the size in bytes of the requested shared allocation.

alignment is the minimum alignment in bytes for the requested shared allocation. It must be a power of two and must be equal to or smaller than the size of the largest data type supported by device. If alignment is 0, a default alignment will be used that is equal to the size of the largest data type supported by device. If device is NULL, alignment must be a power of two equal to or smaller than the size of the largest data type supported by any OpenCL device in context, and the default alignment will be equal to the size of the largest data type supported by any OpenCL device in context.

errcode_ret may return an appropriate error code. If errcode_ret is NULL then no error code will be returned.

clSharedMemAllocINTEL will return a valid non-NULL address and CL_SUCCESS will be returned in errcode_ret if the shared Unified Shared Memory is allocated successfully. Otherwise, NULL will be returned, and errcode_ret will be set to one of the following error values:

  • CL_INVALID_CONTEXT if context is not a valid context.

  • CL_INVALID_DEVICE if device is not NULL and is either not a valid device or is not associated with context.

  • CL_INVALID_OPERATION if device is not NULL and CL_DEVICE_SINGLE_DEVICE_SHARED_MEM_CAPABILITIES_INTEL and CL_DEVICE_CROSS_DEVICE_SHARED_MEM_CAPABILITIES_INTEL are both zero, indicating that device does not support shared Unified Shared Memory allocations, or if device is NULL and no devices in context support shared Unified Shared Memory allocations.

  • CL_INVALID_VALUE if alignment is neither zero nor a power of two.

  • CL_INVALID_VALUE if device is not NULL and alignment is greater than the size of the largest data type supported by device, or if device is NULL and alignment is greater than the size of the largest data type supported by any OpenCL device in context that supports shared Unified Shared Memory allocations.

  • CL_INVALID_PROPERTY if a memory property name in properties is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once.

  • CL_INVALID_PROPERTY if both CL_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE_INTEL and CL_MEM_ALLOC_INITIAL_PLACEMENT_HOST_INTEL flags are specified.

  • CL_INVALID_BUFFER_SIZE if size is zero, or if device is not NULL and size is greater than CL_DEVICE_MAX_MEM_ALLOC_SIZE for device, or if device is NULL and size is greater than CL_DEVICE_MAX_MEM_ALLOC_SIZE for any device in context that supports shared Unified Shared Memory allocations.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
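
The initial placement flags are treated differently by the three allocation functions: host and device allocations reject either placement flag, while shared allocations reject only the combination of both. The sketch below illustrates that difference. The function name usm_check_alloc_flags is hypothetical, and rejecting flag bits that this extension does not define is an assumption about what counts as an invalid property value. The enum values are copied from the New API Enums section and the error codes from the OpenCL API headers.

```c
#include <stdint.h>

/* Values copied from the New API Enums section and the OpenCL API headers. */
#define CL_SUCCESS           0
#define CL_INVALID_PROPERTY  -64
#define CL_MEM_ALLOC_WRITE_COMBINED_INTEL            (1 << 0)
#define CL_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE_INTEL  (1 << 1)
#define CL_MEM_ALLOC_INITIAL_PLACEMENT_HOST_INTEL    (1 << 2)
#define CL_MEM_TYPE_HOST_INTEL    0x4197
#define CL_MEM_TYPE_DEVICE_INTEL  0x4198
#define CL_MEM_TYPE_SHARED_INTEL  0x4199

/* Checks a CL_MEM_ALLOC_FLAGS_INTEL value against the allocation type it
   will be used with. */
int usm_check_alloc_flags(uint32_t alloc_type, uint64_t flags)
{
    const uint64_t placement = CL_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE_INTEL |
                               CL_MEM_ALLOC_INITIAL_PLACEMENT_HOST_INTEL;
    const uint64_t known = CL_MEM_ALLOC_WRITE_COMBINED_INTEL | placement;

    if (flags & ~known)
        return CL_INVALID_PROPERTY;   /* assumption: unknown bits are invalid */

    if (alloc_type == CL_MEM_TYPE_SHARED_INTEL) {
        /* Shared allocations accept one placement hint, but not both. */
        return (flags & placement) == placement ? CL_INVALID_PROPERTY
                                                : CL_SUCCESS;
    }
    /* Host and device allocations accept no placement hint at all. */
    return (flags & placement) ? CL_INVALID_PROPERTY : CL_SUCCESS;
}
```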

Freeing Allocations

The functions

cl_int  clMemFreeINTEL(
            cl_context context,
            void* ptr);

cl_int  clMemBlockingFreeINTEL(
            cl_context context,
            void* ptr);

free a Unified Shared Memory allocation.

context is a valid OpenCL context used to free the Unified Shared Memory allocation.

ptr is the Unified Shared Memory allocation to free. It must be a value returned by clHostMemAllocINTEL, clDeviceMemAllocINTEL, or clSharedMemAllocINTEL, or a NULL pointer. If ptr is NULL then no action occurs.

Note that clMemFreeINTEL may not wait for previously enqueued commands that may be using ptr to finish before freeing ptr. It is the responsibility of the application to make sure enqueued commands that use ptr are complete before freeing ptr. Applications should take particular care when freeing memory allocations if kernels may access memory indirectly, since a kernel with indirect memory access counts as using all memory allocations of the specified type or types. To wait for previously enqueued commands that may be using ptr to finish before freeing ptr, use the clMemBlockingFreeINTEL function instead.

clMemFreeINTEL and clMemBlockingFreeINTEL will return CL_SUCCESS if the function executes successfully. Otherwise, they will return one of the following error values:

  • CL_INVALID_CONTEXT if context is not a valid context.

  • CL_INVALID_VALUE if ptr is neither a value returned by clHostMemAllocINTEL, clDeviceMemAllocINTEL, or clSharedMemAllocINTEL, nor a NULL pointer.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.

Controlling Allocations

The table below describes allocation properties that may be passed to control allocation behavior.

Table 3. List of Supported cl_mem_properties_intel Properties
Property Property Type Description

CL_MEM_ALLOC_FLAGS_INTEL

cl_mem_alloc_flags_intel

Flags specifying allocation and usage information. This is a bitfield type that may be set to a combination of the following values:

CL_MEM_ALLOC_WRITE_COMBINED_INTEL: Request write combined (WC) memory. Write combined memory may improve performance in some cases, however write combined memory must be used with care since it may hurt performance in other cases or use different coherency protocols than non-write combined memory.

CL_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE_INTEL: Request the implementation to optimize for first access being done by the device. This flag is valid only for clSharedMemAllocINTEL. This flag does not affect functionality and is purely a performance hint.

CL_MEM_ALLOC_INITIAL_PLACEMENT_HOST_INTEL: Request the implementation to optimize for first access being done by the host. This flag is valid only for clSharedMemAllocINTEL. This flag does not affect functionality and is purely a performance hint.

CL_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE_INTEL and CL_MEM_ALLOC_INITIAL_PLACEMENT_HOST_INTEL are mutually exclusive.
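
A properties list is a sequence of name and value pairs terminated by a single 0. The sketch below shows how such a list might be laid out and walked, and how the CL_INVALID_PROPERTY conditions for unsupported and repeated property names could be detected. It is an illustration only: usm_mem_properties is a local stand-in for cl_mem_properties_intel, and usm_parse_properties is a hypothetical helper, not a function defined by this extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in (assumption) for cl_mem_properties_intel. */
typedef uint64_t usm_mem_properties;

/* Values copied from the New API Enums section and the OpenCL API headers. */
#define CL_SUCCESS                0
#define CL_INVALID_PROPERTY       -64
#define CL_MEM_ALLOC_FLAGS_INTEL  0x4195
#define CL_MEM_ALLOC_WRITE_COMBINED_INTEL (1 << 0)

/* Walks a zero-terminated list of (name, value) pairs and extracts the
   allocation flags. A NULL list is valid and yields no flags. */
int usm_parse_properties(const usm_mem_properties *props, uint64_t *flags_out)
{
    int seen_flags = 0;

    *flags_out = 0;
    if (props == NULL)
        return CL_SUCCESS;

    for (; props[0] != 0; props += 2) {
        if (props[0] != CL_MEM_ALLOC_FLAGS_INTEL)
            return CL_INVALID_PROPERTY;   /* unsupported property name */
        if (seen_flags)
            return CL_INVALID_PROPERTY;   /* same name specified twice */
        seen_flags = 1;
        *flags_out = props[1];
    }
    return CL_SUCCESS;
}
```

For example, a list requesting write combined memory would be laid out as { CL_MEM_ALLOC_FLAGS_INTEL, CL_MEM_ALLOC_WRITE_COMBINED_INTEL, 0 }.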

Unified Shared Memory Queries

The function

cl_int  clGetMemAllocInfoINTEL(
            cl_context context,
            const void* ptr,
            cl_mem_info_intel param_name,
            size_t param_value_size,
            void* param_value,
            size_t* param_value_size_ret);

queries information about a Unified Shared Memory allocation.

context is a valid OpenCL context to query for information about the Unified Shared Memory allocation.

ptr is a pointer into a Unified Shared Memory allocation to query. ptr need not be a value returned by clHostMemAllocINTEL, clDeviceMemAllocINTEL, or clSharedMemAllocINTEL, but the query may be faster if it is.

param_name specifies the information to query. The list of supported param_name values and the information returned in param_value is described in Table 4 (List of supported param_names by clGetMemAllocInfoINTEL).

param_value is a pointer to memory where the appropriate result being queried is returned. If param_value is NULL, it is ignored.

param_value_size is used to specify the size in bytes of memory pointed to by param_value. This size must be greater than or equal to the size of the return type as described in Table 4 (List of supported param_names by clGetMemAllocInfoINTEL). If param_value is NULL, it is ignored.

param_value_size_ret returns the actual size in bytes of data being queried by param_name. If param_value_size_ret is NULL, it is ignored.

clGetMemAllocInfoINTEL returns CL_SUCCESS if the function is executed successfully. Otherwise, it will return one of the following error values:

  • CL_INVALID_CONTEXT if context is not a valid context.

  • CL_INVALID_VALUE if param_name is not a valid Unified Shared Memory allocation query.

  • CL_INVALID_VALUE if param_value is not NULL and param_value_size is smaller than the size of the query return type.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.

Table 4. List of supported param_names by clGetMemAllocInfoINTEL
cl_mem_info_intel Return type Info. returned in param_value

CL_MEM_ALLOC_TYPE_INTEL

cl_unified_shared_memory_type_intel

Returns the type of the Unified Shared Memory allocation.

Returns CL_MEM_TYPE_HOST_INTEL for allocations made by clHostMemAllocINTEL. Returns CL_MEM_TYPE_DEVICE_INTEL for allocations made by clDeviceMemAllocINTEL. Returns CL_MEM_TYPE_SHARED_INTEL for allocations made by clSharedMemAllocINTEL. Returns CL_MEM_TYPE_UNKNOWN_INTEL if the type of the Unified Shared Memory allocation cannot be determined or if ptr does not point into a Unified Shared Memory allocation.

CL_MEM_ALLOC_BASE_PTR_INTEL

void*

Returns the base address of the Unified Shared Memory allocation.

Returns NULL for CL_MEM_TYPE_UNKNOWN_INTEL allocations.

CL_MEM_ALLOC_SIZE_INTEL

size_t

Returns the size in bytes of the Unified Shared Memory allocation.

Returns 0 for CL_MEM_TYPE_UNKNOWN_INTEL allocations.

CL_MEM_ALLOC_DEVICE_INTEL

cl_device_id

Returns the device associated with the Unified Shared Memory allocation.

Returns NULL for CL_MEM_TYPE_HOST_INTEL allocations, for CL_MEM_TYPE_SHARED_INTEL allocations with no associated device, and for CL_MEM_TYPE_UNKNOWN_INTEL allocations.

CL_MEM_ALLOC_FLAGS_INTEL

cl_mem_alloc_flags_intel

Returns allocation flags for the Unified Shared Memory allocation.

Returns 0 if no allocation flags were specified for the Unified Shared Memory allocation and for CL_MEM_TYPE_UNKNOWN_INTEL allocations.
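
Each query in Table 4 has a fixed return type, so the param_value_size requirement can be expressed as a small lookup. The following sketch mirrors the two CL_INVALID_VALUE conditions of clGetMemAllocInfoINTEL. The function names are hypothetical, and the sketch uses fixed-width stand-ins for the OpenCL types (cl_uint as uint32_t, cl_bitfield as uint64_t, and cl_device_id as a pointer-sized handle) so that it builds without the OpenCL headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the New API Enums section and the OpenCL API headers. */
#define CL_SUCCESS                    0
#define CL_INVALID_VALUE              -30
#define CL_MEM_ALLOC_FLAGS_INTEL      0x4195
#define CL_MEM_ALLOC_TYPE_INTEL       0x419A
#define CL_MEM_ALLOC_BASE_PTR_INTEL   0x419B
#define CL_MEM_ALLOC_SIZE_INTEL       0x419C
#define CL_MEM_ALLOC_DEVICE_INTEL     0x419D

/* Returns the size in bytes of the return type for a query, or 0 if
   param_name is not a valid Unified Shared Memory allocation query. */
size_t usm_query_return_size(uint32_t param_name)
{
    switch (param_name) {
    case CL_MEM_ALLOC_TYPE_INTEL:     return sizeof(uint32_t); /* cl_unified_shared_memory_type_intel */
    case CL_MEM_ALLOC_BASE_PTR_INTEL: return sizeof(void *);
    case CL_MEM_ALLOC_SIZE_INTEL:     return sizeof(size_t);
    case CL_MEM_ALLOC_DEVICE_INTEL:   return sizeof(void *);   /* cl_device_id handle */
    case CL_MEM_ALLOC_FLAGS_INTEL:    return sizeof(uint64_t); /* cl_mem_alloc_flags_intel */
    default:                          return 0;
    }
}

/* Mirrors the CL_INVALID_VALUE conditions of clGetMemAllocInfoINTEL. */
int usm_check_query_args(uint32_t param_name, size_t param_value_size,
                         const void *param_value)
{
    size_t needed = usm_query_return_size(param_name);

    if (needed == 0)
        return CL_INVALID_VALUE;   /* unknown query */
    if (param_value != NULL && param_value_size < needed)
        return CL_INVALID_VALUE;   /* buffer too small */
    return CL_SUCCESS;
}
```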

Using Unified Shared Memory with Kernels

The function

cl_int  clSetKernelArgMemPointerINTEL(
            cl_kernel kernel,
            cl_uint arg_index,
            const void* arg_value);

is used to set a pointer into a Unified Shared Memory allocation as an argument to a kernel.

kernel is a valid kernel object.

arg_index is the argument index to set. Arguments to the kernel are referred to by indices that go from 0 for the leftmost argument to n - 1, where n is the total number of arguments declared by a kernel.

arg_value is the pointer value that should be used as the argument specified by arg_index. The pointer value will be used as the argument by all API calls that enqueue a kernel until the argument value is set to a different pointer value by a subsequent call. A pointer into a Unified Shared Memory allocation may only be set as an argument value for an argument declared to be a pointer to global or constant memory. For devices supporting shared system allocations, any pointer value is valid. Otherwise, the pointer value must be NULL or must point into a Unified Shared Memory allocation returned by clHostMemAllocINTEL, clDeviceMemAllocINTEL, or clSharedMemAllocINTEL.

clSetKernelArgMemPointerINTEL returns CL_SUCCESS if the function is executed successfully. Otherwise, it will return one of the following errors:

  • CL_INVALID_KERNEL if kernel is not a valid kernel object.

  • CL_INVALID_ARG_INDEX if arg_index is not a valid argument index.

  • CL_INVALID_ARG_VALUE if arg_value is not a valid argument value.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
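The rule for which arg_value pointers are acceptable can be modelled on the host as a small predicate. This is only a sketch of the rule as worded above, not how any implementation checks it: the usm_range record, the list of known allocations, and the function names are all hypothetical, and the model treats a one-past-the-end pointer as not pointing into an allocation, which the text does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one USM allocation known to the application. */
typedef struct {
    const void *base;
    size_t size;
} usm_range;

/* True if ptr lies inside [base, base + size). */
static bool usm_in_range(const void *ptr, const usm_range *r)
{
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t b = (uintptr_t)r->base;
    return p >= b && p - b < r->size;
}

/* Models the arg_value rule: any pointer is acceptable when shared system
 * allocations are supported; otherwise the pointer must be NULL or point
 * into one of the known USM allocations. */
bool usm_arg_value_allowed(const void *ptr,
                           const usm_range *allocs, size_t count,
                           bool shared_system_supported)
{
    if (shared_system_supported || ptr == NULL) {
        return true;
    }
    for (size_t i = 0; i < count; ++i) {
        if (usm_in_range(ptr, &allocs[i])) {
            return true;
        }
    }
    return false;
}
```

In this model a pointer for which the predicate returns false corresponds to the CL_INVALID_ARG_VALUE case.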

In addition to direct use of a Unified Shared Memory allocation as a kernel argument, Unified Shared Memory allocations may be accessed by kernels indirectly. The new param_name values described below may be used with the existing clSetKernelExecInfo function to describe how Unified Shared Memory allocations are accessed indirectly by a kernel:

Table 28. List of supported param_names by clSetKernelExecInfo
cl_kernel_exec_info Type Description

CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL

void*[]

Specifies an explicit set of Unified Shared Memory allocations accessed indirectly by the kernel. The new set replaces any previously specified set of Unified Shared Memory allocations.

Initially, the set of Unified Shared Memory allocations accessed indirectly by the kernel is the empty set.

CL_KERNEL_EXEC_INFO_INDIRECT_HOST_ACCESS_INTEL

cl_bool

Specifies that the kernel may access any host Unified Shared Memory allocation indirectly.

By default, the value for this flag is CL_FALSE, indicating that the kernel will only access explicitly specified host Unified Shared Memory allocations.

CL_KERNEL_EXEC_INFO_INDIRECT_DEVICE_ACCESS_INTEL

cl_bool

Specifies that the kernel may access any device Unified Shared Memory allocation indirectly.

By default, the value for this flag is CL_FALSE, indicating that the kernel will only access explicitly specified device Unified Shared Memory allocations.

CL_KERNEL_EXEC_INFO_INDIRECT_SHARED_ACCESS_INTEL

cl_bool

Specifies that the kernel may access any shared Unified Shared Memory allocation indirectly.

By default, the value for this flag is CL_FALSE, indicating that the kernel will only access explicitly specified shared Unified Shared Memory allocations.
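How the explicit pointer set and the three indirect access flags combine can be illustrated with a small host-side model. Everything in it is a stand-in for illustration: the usm_kind enum, the kernel_indirect_state struct, and the choice to match explicit entries by allocation base address are assumptions, not part of the extension. A zero-initialized state matches the documented defaults (empty set, all flags CL_FALSE).

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the three USM allocation types. */
typedef enum { USM_HOST, USM_DEVICE, USM_SHARED } usm_kind;

/* Hypothetical model of the per-kernel indirect access state. */
typedef struct {
    const void *const *explicit_ptrs; /* models CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL */
    size_t explicit_count;
    bool indirect_host;   /* models ..._INDIRECT_HOST_ACCESS_INTEL */
    bool indirect_device; /* models ..._INDIRECT_DEVICE_ACCESS_INTEL */
    bool indirect_shared; /* models ..._INDIRECT_SHARED_ACCESS_INTEL */
} kernel_indirect_state;

/* True if the kernel has been told it may access the allocation indirectly,
 * either through the flag for its type or through the explicit set. */
bool indirect_access_declared(const kernel_indirect_state *s,
                              const void *alloc_base, usm_kind kind)
{
    if ((kind == USM_HOST && s->indirect_host) ||
        (kind == USM_DEVICE && s->indirect_device) ||
        (kind == USM_SHARED && s->indirect_shared)) {
        return true;
    }
    for (size_t i = 0; i < s->explicit_count; ++i) {
        if (s->explicit_ptrs[i] == alloc_base) {
            return true;
        }
    }
    return false;
}
```

Because a new explicit set replaces the previous one, the model simply overwrites explicit_ptrs and explicit_count rather than appending to them.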

Filling and Copying Unified Shared Memory

The function

cl_int  clEnqueueMemFillINTEL(
            cl_command_queue command_queue,
            void* dst_ptr,
            const void* pattern,
            size_t pattern_size,
            size_t size,
            cl_uint num_events_in_wait_list,
            const cl_event* event_wait_list,
            cl_event* event);

fills a region of memory with the specified pattern.

command_queue is a valid host command-queue. The memory fill command will be queued for execution on the device associated with command_queue.

dst_ptr is a pointer to the start of the memory region to fill. The Unified Shared Memory allocation pointed to by dst_ptr must be valid for the context associated with command_queue, must be accessible by the device associated with command_queue, and must be aligned to pattern_size bytes.

pattern is a pointer to the value to write to the Unified Shared Memory region. The memory associated with pattern can be reused or freed after the function returns.

pattern_size describes the size of the value to write to the Unified Shared Memory region, in bytes. pattern_size must be a power of two and must be less than or equal to the size of the largest integer or floating-point vector data type supported by the device.

size describes the size of the memory region to set, in bytes. size must be a multiple of pattern_size.

event_wait_list and num_events_in_wait_list specify events that need to complete before this command can be executed. If event_wait_list is NULL, then this command does not wait on any event to complete. If event_wait_list is NULL, num_events_in_wait_list must be 0. If event_wait_list is not NULL, the list of events pointed to by event_wait_list must be valid and num_events_in_wait_list must be greater than 0. The events specified in event_wait_list act as synchronization points. The context associated with events in event_wait_list and command_queue must be the same. The memory associated with event_wait_list can be reused or freed after the function returns.

event returns a unique event object that identifies this command. If event is NULL, no event will be created and therefore it will not be possible to query or wait for this command. If the event_wait_list and the event arguments are not NULL, the event argument must not refer to an element of the event_wait_list array.

clEnqueueMemFillINTEL returns CL_SUCCESS if the command is queued successfully. Otherwise, it will return one of the following errors:

  • CL_INVALID_COMMAND_QUEUE if command_queue is not a valid host command-queue.

  • CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list are not the same.

  • CL_INVALID_VALUE if dst_ptr is NULL, or if dst_ptr is not aligned to pattern_size bytes.

  • CL_INVALID_VALUE if pattern is NULL.

  • CL_INVALID_VALUE if pattern_size is not a power of two or is greater than the size of the largest integer or floating-point vector data type supported by the device associated with command_queue.

  • CL_INVALID_VALUE if size is not a multiple of pattern_size.

  • CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list is greater than zero, or if event_wait_list is not NULL and num_events_in_wait_list is zero, or if event objects in event_wait_list are not valid events.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
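The argument constraints for clEnqueueMemFillINTEL can be sketched as a host-side pre-check, together with a plain C reference for the effect of a fill. Both functions are hypothetical helpers, not part of the extension. The max_pattern_size parameter stands in for the size of the largest vector data type supported by the device, which the caller is assumed to know, and a size of zero is accepted here only because its validity is left open in the Issues section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static bool is_power_of_two(size_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Mirrors the CL_INVALID_VALUE conditions listed for the fill command. */
bool fill_args_valid(const void *dst_ptr, const void *pattern,
                     size_t pattern_size, size_t size,
                     size_t max_pattern_size /* assumed device limit */)
{
    if (dst_ptr == NULL || pattern == NULL) {
        return false;
    }
    if (!is_power_of_two(pattern_size) || pattern_size > max_pattern_size) {
        return false;
    }
    if ((uintptr_t)dst_ptr % pattern_size != 0) {
        return false; /* dst_ptr must be aligned to pattern_size bytes */
    }
    return size % pattern_size == 0;
}

/* Host reference for the result of a fill: the pattern repeated
 * size / pattern_size times. Requires size to be a multiple of pattern_size. */
void reference_fill(void *dst, const void *pattern,
                    size_t pattern_size, size_t size)
{
    unsigned char *out = dst;

    assert(pattern_size != 0 && size % pattern_size == 0);
    for (size_t off = 0; off < size; off += pattern_size) {
        memcpy(out + off, pattern, pattern_size);
    }
}
```

Running the pre-check before enqueueing gives an application an early, device-independent diagnosis of the most common argument mistakes.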

The function

cl_int  clEnqueueMemcpyINTEL(
            cl_command_queue command_queue,
            cl_bool blocking,
            void* dst_ptr,
            const void* src_ptr,
            size_t size,
            cl_uint num_events_in_wait_list,
            const cl_event* event_wait_list,
            cl_event* event);

copies a region of memory from one location to another.

command_queue is a valid host command-queue. The memory copy command will be queued for execution on the device associated with command_queue.

blocking indicates if the copy operation is blocking or non-blocking. If blocking is CL_TRUE, the copy command is blocking, and the function will not return until the copy command is complete. Otherwise, if blocking is CL_FALSE, the copy command is non-blocking, and the contents of the dst_ptr cannot be used nor can the contents of the src_ptr be overwritten until the copy command is complete.

dst_ptr is a pointer to the start of the memory region to copy to. If dst_ptr is a pointer into a Unified Shared Memory allocation it must be valid for the context associated with command_queue.

src_ptr is a pointer to the start of the memory region to copy from. If src_ptr is a pointer into a Unified Shared Memory allocation it must be valid for the context associated with command_queue.

size describes the size of the memory region to copy, in bytes.

event_wait_list and num_events_in_wait_list specify events that need to complete before this command can be executed. If event_wait_list is NULL, then this command does not wait on any event to complete. If event_wait_list is NULL, num_events_in_wait_list must be 0. If event_wait_list is not NULL, the list of events pointed to by event_wait_list must be valid and num_events_in_wait_list must be greater than 0. The events specified in event_wait_list act as synchronization points. The context associated with events in event_wait_list and command_queue must be the same. The memory associated with event_wait_list can be reused or freed after the function returns.

event returns a unique event object that identifies this command. If event is NULL, no event will be created and therefore it will not be possible to query or wait for this command. If the event_wait_list and the event arguments are not NULL, the event argument must not refer to an element of the event_wait_list array.

clEnqueueMemcpyINTEL returns CL_SUCCESS if the command is queued successfully. Otherwise, it will return one of the following errors:

  • CL_INVALID_COMMAND_QUEUE if command_queue is not a valid host command-queue.

  • CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list are not the same.

  • CL_INVALID_VALUE if either dst_ptr or src_ptr is NULL.

  • CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list is greater than zero, or if event_wait_list is not NULL and num_events_in_wait_list is zero, or if event objects in event_wait_list are not valid events.

  • CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the copy operation is blocking and the execution status of any of the events in event_wait_list is a negative integer value.

  • CL_MEM_COPY_OVERLAP if the values specified for dst_ptr, src_ptr and size result in an overlapping copy.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
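The CL_MEM_COPY_OVERLAP condition amounts to asking whether the half-open byte ranges starting at dst_ptr and src_ptr, each size bytes long, intersect. A minimal sketch of that test follows. The function name is hypothetical, and treating a zero-length copy as non-overlapping is an assumption, since the validity of a zero size is left open in the Issues section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [dst_ptr, dst_ptr + size) and [src_ptr, src_ptr + size) share at
 * least one byte. Works on the numeric addresses, so the two pointers do not
 * need to belong to the same allocation. */
bool copy_regions_overlap(const void *dst_ptr, const void *src_ptr, size_t size)
{
    uintptr_t d = (uintptr_t)dst_ptr;
    uintptr_t s = (uintptr_t)src_ptr;

    if (size == 0) {
        return false; /* assumption: an empty copy cannot overlap */
    }
    /* Two equal-length ranges overlap when their starts are closer than size. */
    return (d <= s) ? (s - d < size) : (d - s < size);
}
```

Comparing the distance between the two start addresses avoids computing dst_ptr + size, which could wrap around for ranges near the top of the address space.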

Unified Shared Memory Hints

The function

cl_int  clEnqueueMigrateMemINTEL(
            cl_command_queue command_queue,
            const void* ptr,
            size_t size,
            cl_mem_migration_flags flags,
            cl_uint num_events_in_wait_list,
            const cl_event* event_wait_list,
            cl_event* event);

explicitly migrates a region of a shared Unified Shared Memory allocation to the device associated with command_queue. This is a hint that may improve performance and is not required for correctness. Memory migration may not be supported for all allocation types for all devices. If memory migration is not supported for the specified memory range then the migration hint may be ignored. Memory migration may only be supported at a device-specific granularity, such as a page boundary. In this case, the memory range may be expanded such that the start and end of the range satisfy the granularity requirements.

command_queue is a valid host command-queue. The memory migration command will be queued for execution on the device associated with command_queue.

ptr is a pointer to the start of the shared Unified Shared Memory allocation to migrate.

size describes the size of the memory region to migrate.

flags is a bit-field that is used to specify memory migration options.

event_wait_list and num_events_in_wait_list specify events that need to complete before this command can be executed. If event_wait_list is NULL, then this command does not wait on any event to complete. If event_wait_list is NULL, num_events_in_wait_list must be 0. If event_wait_list is not NULL, the list of events pointed to by event_wait_list must be valid and num_events_in_wait_list must be greater than 0. The events specified in event_wait_list act as synchronization points. The context associated with events in event_wait_list and command_queue must be the same. The memory associated with event_wait_list can be reused or freed after the function returns.

event returns a unique event object that identifies this command. If event is NULL, no event will be created and therefore it will not be possible to query or wait for this command. If the event_wait_list and the event arguments are not NULL, the event argument must not refer to an element of the event_wait_list array.

clEnqueueMigrateMemINTEL returns CL_SUCCESS if the command is queued successfully. Otherwise, it will return one of the following errors:

  • CL_INVALID_COMMAND_QUEUE if command_queue is not a valid host command-queue.

  • CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list are not the same.

  • CL_INVALID_VALUE TODO, are any values of ptr and size considered invalid?

  • CL_INVALID_VALUE if flags is zero or is not a supported combination of memory migration flags.

  • CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list is greater than zero, or if event_wait_list is not NULL and num_events_in_wait_list is zero, or if event objects in event_wait_list are not valid events.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
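The expansion of a migration range to a device-specific granularity can be illustrated with ordinary address arithmetic. This is a sketch only: the extension does not define how an implementation rounds the range or provide a way to query the granularity, so the 4 KiB page size used in the usage note, the expanded_range type, and the function name are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the half-open range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} expanded_range;

/* Rounds the start of the range down and the end of the range up to a
 * multiple of granularity. granularity must be a power of two, and the
 * sketch ignores wrap-around at the top of the address space. */
expanded_range expand_to_granularity(uintptr_t addr, size_t size,
                                     uintptr_t granularity)
{
    expanded_range r;
    uintptr_t mask;

    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    mask = granularity - 1;
    r.start = addr & ~mask;
    r.end = (addr + size + mask) & ~mask;
    return r;
}
```

For example, with an assumed granularity of 0x1000 bytes, a request covering 0x100 bytes at address 0x1234 expands to the range 0x1000 to 0x2000, so neighbouring data in the same page may migrate as well.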

The function

cl_int  clEnqueueMemAdviseINTEL(
            cl_command_queue command_queue,
            const void* ptr,
            size_t size,
            cl_mem_advice_intel advice,
            cl_uint num_events_in_wait_list,
            const cl_event* event_wait_list,
            cl_event* event);

provides advice about a region of a shared Unified Shared Memory allocation. Memory advice is a performance hint only and is not required for correctness. Providing memory advice hints may override driver heuristics that control shared memory behavior. Not all memory advice hints may be supported for all allocation types for all devices. If a memory advice hint is not supported by the device it will be ignored. Memory advice hints may only be supported at a device-specific granularity, such as at a page boundary. In this case, the memory range may be expanded such that the start and end of the range satisfy the granularity requirements.

command_queue is a valid host command-queue. The memory advice hints will be queued for the device associated with command_queue.

ptr is a pointer to the start of the shared Unified Shared Memory allocation.

size describes the size of the memory region.

advice is a bit-field describing the memory advice hints for the region.

event_wait_list and num_events_in_wait_list specify events that need to complete before this command can be executed. If event_wait_list is NULL, then this command does not wait on any event to complete. If event_wait_list is NULL, num_events_in_wait_list must be 0. If event_wait_list is not NULL, the list of events pointed to by event_wait_list must be valid and num_events_in_wait_list must be greater than 0. The events specified in event_wait_list act as synchronization points. The context associated with events in event_wait_list and command_queue must be the same. The memory associated with event_wait_list can be reused or freed after the function returns.

event returns a unique event object that identifies this command. If event is NULL, no event will be created and therefore it will not be possible to query or wait for this command. If the event_wait_list and the event arguments are not NULL, the event argument must not refer to an element of the event_wait_list array.

clEnqueueMemAdviseINTEL returns CL_SUCCESS if the command is queued successfully. Otherwise, it will return one of the following errors:

  • CL_INVALID_COMMAND_QUEUE if command_queue is not a valid host command-queue.

  • CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list are not the same.

  • CL_INVALID_VALUE TODO, are any values of ptr and size considered invalid?

  • CL_INVALID_VALUE if advice is not a supported memory advice value for the device associated with command_queue.

  • CL_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list is greater than zero, or if event_wait_list is not NULL and num_events_in_wait_list is zero, or if event objects in event_wait_list are not valid events.

  • CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

  • CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL implementation on the host.
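The event wait list rules shared by the enqueue functions in this section can be summarised as a single consistency check. The sketch below uses sketch_event as a stand-in for cl_event so that it compiles without the OpenCL headers; that type and the function name are hypothetical. It covers the NULL and count pairing behind CL_INVALID_EVENT_WAIT_LIST and the requirement that event must not refer to an element of event_wait_list, but it does not check whether the individual events are valid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the opaque cl_event handle type. */
typedef struct sketch_event_object *sketch_event;

/* True if the wait list arguments are mutually consistent:
 *  - event_wait_list is NULL exactly when num_events_in_wait_list is 0;
 *  - event, when provided, does not point at an element of the wait list. */
bool wait_list_args_ok(unsigned int num_events_in_wait_list,
                       const sketch_event *event_wait_list,
                       const sketch_event *event)
{
    if ((event_wait_list == NULL) != (num_events_in_wait_list == 0)) {
        return false;
    }
    for (unsigned int i = 0; i < num_events_in_wait_list; ++i) {
        if (event == &event_wait_list[i]) {
            return false;
        }
    }
    return true;
}
```

Note that the text above lists no specific error code for an event argument that aliases the wait list, so the second check models a stated requirement rather than a documented error.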

Interactions with Other Extensions

If cl_intel_mem_alloc_buffer_location is supported, then clDeviceMemAllocINTEL, clSharedMemAllocINTEL, clHostMemAllocINTEL, and clGetMemAllocInfoINTEL also accept CL_MEM_ALLOC_BUFFER_LOCATION_INTEL for cl_mem_properties_intel.

Issues

  1. Is there a minimum supported granularity for concurrent access? For example, might it be possible to concurrently access different pages of an allocation, but not different bytes within the same page?

    UNRESOLVED:

  2. What other Unified Shared Memory allocation properties should we support?

    UNRESOLVED: The proposed Unified Shared Memory allocation APIs accept cl_mem_alloc_flags_intel. We could also accept (some? all?) cl_mem_flags, for example.

  3. Do we need separate "concurrent access" capabilities for host access vs. device access?

    UNRESOLVED: We don’t differentiate right now, but we could differentiate between concurrent host access vs. concurrent access from another device.

  4. What would we need to add to support system allocations?

    RESOLVED: Added CL_DEVICE_SHARED_SYSTEM_MEM_CAPABILITIES_INTEL.

  5. Do we need the ability to "register" or "use" an existing host allocation?

    UNRESOLVED: Currently, only the ability to "allocate" host memory is supported. If we did support this then there may be alignment and size granularity requirements for "registering" a host allocation.

  6. Do we want to support both a flags argument and a properties argument to the USM allocation APIs?

    RESOLVED: The flags argument was folded into the properties in revision C.

  7. What should behavior be for clGetMemAllocInfoINTEL if the passed-in ptr is NULL or doesn’t point into a USM allocation?

    RESOLVED: The behavior was defined for all queries for this case in revision G.

  8. Do we want separate "memset" APIs to set values of different sizes, such as 8 bits, 16 bits, 32 bits, or others? Do we want to go back to a "fill" API?

    RESOLVED: Switched to a "fill" API in revision I.

    Discussion: The host "memset" only sets to an 8-bit value. Switching back to a "fill" API is very flexible, but perhaps overkill, since it supports any supported integer or floating-point scalar or vector type.

  9. What are the restrictions for the dst_ptr values that can be passed to the "fill" API?

    UNRESOLVED: Need to close on:

    • Can a device "fill" another device’s allocation? (Recommendation: Yes, if accessible.)

    • Can a device "fill" arbitrary host memory? (Recommendation: Maybe?)

    • Can a device "fill" a USM allocation from another context? (Recommendation: No.)

  10. What are the restrictions for the src_ptr and dst_ptr values that can be passed to the "memcpy" API?

    UNRESOLVED: Need to close on:

    • Can a device "memcpy" from another device’s allocation?

    • Can a device "memcpy" to another device’s allocation?

    • Can a device "memcpy" to or from a USM allocation in another context? (Recommendation: No?)

    • Can a device "memcpy" to arbitrary host memory? (Recommendation: Yes.)

    • Can a device "memcpy" from arbitrary host memory? (Recommendation: Yes.)

    • Can a device "memcpy" from arbitrary host memory to arbitrary host memory? (Recommendation: Yes.)

    • Can the memory region to copy to overlap the memory region to copy from? (Recommendation: No?)

  11. Do we want to support migrating to devices other than the device associated with command_queue?

    UNRESOLVED: We could add an explicit dst_device argument if desired, which could be NULL when migrating to the device associated with the command_queue. We could also add a mechanism to allow migrating to the host.

  12. Should clEnqueueMigrateMemINTEL support migrating an array of pointers with one API call, similar to clEnqueueSVMMigrateMem?

    UNRESOLVED: This depends on how frequently the migrate APIs are called.

  13. Could the device argument to clSharedMemAllocINTEL be NULL if there is no need to associate the shared allocation to a specific device?

    RESOLVED: Yes, this case is documented in revision G.

  14. Should we allow querying the associated device for a USM allocation using clGetMemAllocInfoINTEL?

    RESOLVED: This query was added in revision G.

  15. Should we add explicit mem alloc flags for CACHED and UNCACHED?

    UNRESOLVED:

  16. At least for HOST and SHARED allocations, should we have separate mem alloc flags for the host and the device?

    UNRESOLVED:

  17. What are invalid values for ptr and size for clEnqueueMigrateMemINTEL and clEnqueueMemAdviseINTEL? How about clEnqueueMemFillINTEL and clEnqueueMemcpyINTEL? Specifically, is NULL a valid value for ptr? Is size equal to zero valid?

    UNRESOLVED:

  18. Should we add a device query for a maximum supported USM alignment, or should the maximum supported alignment implicitly be defined by the size of the largest data type supported by the device? Should we allow implementation-defined behavior for alignments larger than the size of the largest data type supported by the device?

    UNRESOLVED: A device query would allow for larger supported alignments, such as page alignment. Note that supported alignments should always be a power of two.

    Note that there are no maximum supported alignments defined for posix_memalign or _aligned_malloc, and supported alignments for the standard aligned_alloc and std::aligned_alloc are implementation-defined.

  19. Should we add a device query for a maximum supported USM fill pattern size, or should the maximum supported fill pattern size implicitly be defined by the size of the largest data type supported by the device?

    UNRESOLVED: A device query would allow for larger fill patterns. Note that the fill pattern size should always be a power of two.

  20. Can a pointer to a device, host, or shared USM allocation be used to create a cl_mem using CL_MEM_USE_HOST_PTR?

    UNRESOLVED: Trending "no" in all cases. If the USM allocation is from the same context this could be an error, such as CL_INVALID_HOST_PTR. If the USM allocation is from a different context then behavior could be undefined.

  21. Can a pointer to a device, host, or shared USM allocation be used to create a cl_mem buffer using CL_MEM_COPY_HOST_PTR?

    UNRESOLVED: Trending "no" for device and shared USM allocations. If the USM allocation is from the same context this could be an error, such as CL_INVALID_HOST_PTR. If the USM allocation is from a different context then behavior could be undefined.

    Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context.

  22. Can a pointer to a device, host, or shared USM allocation be passed to API functions to read from or write to cl_mem objects, such as clEnqueueReadBuffer or clEnqueueWriteImage?

    UNRESOLVED: Trending "yes" for device USM allocations, so long as the device USM allocation is accessible by the device associated with the command-queue, and the device allocation was made against the context associated with the command-queue.

    Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context.

    Trending "no" for shared USM allocations. If the shared USM allocation is from the same context this could be an error, such as CL_INVALID_HOST_PTR. If the shared USM allocation is from a different context then behavior could be undefined.

  23. Can a pointer to a device, host, or shared USM allocation be passed to API functions to fill a cl_mem, SVM allocation, or USM allocation, such as clEnqueueFillBuffer?

    UNRESOLVED: Trending "no" for device and shared allocations. If the USM allocation is from the same context this could be an error, such as CL_INVALID_HOST_PTR. If the USM allocation is from a different context then behavior could be undefined.

    Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context.

  24. Should we support passing traditional cl_mem_flags via the USM allocation properties?

    UNRESOLVED: Trending "yes", by allowing CL_MEM_FLAGS as a property and cl_mem_flags as the property value.

    Note that some flags will not be valid, such as CL_MEM_USE_HOST_PTR.

  25. Exactly how does Unified Shared Memory affect the memory model?

    UNRESOLVED:

  26. Should it be an error to set an unknown pointer as a kernel argument using clSetKernelArgMemPointerINTEL if no devices support shared system allocations?

    UNRESOLVED: Returning an error for an unknown pointer is helpful to identify and diagnose possible programming errors sooner, but passing a pointer to arbitrary memory to a function on the host is not an error until the pointer is dereferenced.

    If we relax the error condition for clSetKernelArgMemPointerINTEL then we could also consider relaxing the error condition for clSetKernelExecInfo(CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL) similarly.

    Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the USMChecking functionality in the OpenCL Intercept Layer.

  27. Should we support a "rect" memcpy similar to clEnqueueCopyBufferRect?

    UNRESOLVED: This would be a fairly straightforward addition if it is useful.

    Note that there is no similar SVM "rect" memcpy.

Revision History

Rev    Date        Author            Changes
A      2019-01-18  Ben Ashbaugh      Initial revision
B      2019-03-25  Ben Ashbaugh      Minor name changes.
C      2019-06-18  Ben Ashbaugh      Moved flags argument into properties.
D      2019-07-19  Ben Ashbaugh      Editorial fixes.
E      2019-07-22  Ben Ashbaugh      Allocation properties should be const.
F      2019-07-26  Ben Ashbaugh      Removed DEFAULT mem alloc flag.
G      2019-08-23  Ben Ashbaugh      Added mem alloc query for associated device.
H      2019-10-11  Ben Ashbaugh      Added initial list and description of error codes.
I      2019-11-14  Ben Ashbaugh      Switched from a memset to a memfill API.
J      2019-11-18  Ben Ashbaugh      Updated a few more error conditions.
K      2019-12-18  Krzysztof Gibala  Updated write combine description.
L      2020-01-15  Ben Ashbaugh      Added invalid arg case to setkernelarg API.
M      2020-01-17  Ben Ashbaugh      Minor name changes, removed const from memfree API.
N      2020-01-22  Ben Ashbaugh      Updated write combine description.
O      2020-01-23  Ben Ashbaugh      Added aliases for USM migration flags.
P      2020-02-28  Ben Ashbaugh      Added blocking memfree API.
Q      2020-03-12  Ben Ashbaugh      Name tweak for blocking memfree API, added comparison to SVM, allow zero memory advice.
R      2020-08-21  Ben Ashbaugh      Fixed enum name typo in table.
S      2020-08-26  Maciej Dziuban    Added initial placement flags for shared allocations.
1.0.0  2021-11-07  Ben Ashbaugh      Added version and other minor updates prior to posting on the OpenCL registry.
1.0.0  2022-11-08  Ben Ashbaugh      Added new issues regarding error behavior for clSetKernelArgMemPointerINTEL and rect copies.