Description
The OpenCL C programming language implements the following functions that provide asynchronous copies between global and
local memory and a prefetch from global memory.
The async copy and wait group events functions are performed by all work-items in a work-group and therefore must be encountered by all work-items in a work-group executing the kernel with the same argument values, otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups.
If an async copy or wait group events function is inside a conditional statement then all work-items in the work-group must enter the conditional if any work-item in the work-group enters the conditional statement and executes the async copy or wait group events function.
If an async copy or wait group events function is inside a loop then all work-items in the work-group must execute the async copy or wait group events function on each iteration of the loop if any work-item executes the async copy or wait group events function on that iteration.
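The conditional and loop rules above can be sketched as a kernel whose control flow around the async copy depends only on values that are identical for every work-item in the work-group. The kernel name `sum_tiles`, its parameters, and the host-side shim guarded by `__OPENCL_VERSION__` are assumptions made for illustration. The shim only models a work-group of a single work-item so that the sketch can also be compiled as plain C.

```c
/* Host-side shim (assumption, not part of OpenCL C): models a work-group of
   one work-item so this sketch also compiles as plain C. An OpenCL C compiler
   defines __OPENCL_VERSION__ and skips this section. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 1
typedef int event_t;
static size_t get_local_id(unsigned dim) { (void)dim; return 0; }
static size_t get_local_size(unsigned dim) { (void)dim; return 1; }
static void barrier(int flags) { (void)flags; }
static event_t async_work_group_copy(float *dst, const float *src,
                                     size_t num_gentypes, event_t event)
{
    for (size_t i = 0; i < num_gentypes; ++i)
        dst[i] = src[i];
    return event ? event : 1;
}
static void wait_group_events(int num_events, event_t *event_list)
{
    (void)num_events;
    (void)event_list;
}
#endif

/* Hypothetical kernel: adds num_tiles tiles of tile_len floats element-wise
   into out[0 .. tile_len-1]. */
__kernel void sum_tiles(const __global float *in, __global float *out,
                        __local float *tile, int num_tiles, int tile_len,
                        int enabled)
{
    /* 'enabled' is a kernel argument, so every work-item in the work-group
       takes the same branch. */
    if (enabled) {
        /* The loop bounds are kernel arguments, so every work-item executes
           the async copy on every iteration with the same argument values. */
        for (int t = 0; t < num_tiles; ++t) {
            event_t ev = async_work_group_copy(tile, in + (size_t)t * tile_len,
                                               (size_t)tile_len, 0);
            wait_group_events(1, &ev);

            for (size_t i = get_local_id(0); i < (size_t)tile_len;
                 i += get_local_size(0))
                out[i] += tile[i];

            /* All work-items finish reading the tile before the next
               iteration's copy overwrites it. */
            barrier(CLK_LOCAL_MEM_FENCE);
        }
    }
    /* By contrast, guarding the copy with a per-work-item condition such as
       get_local_id(0) == 0 would give undefined results. */
}
```

On a device, the `tile` buffer would be local memory supplied by the host, and the work-items of the group would split the inner loop between them.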
The generic type name gentype indicates that the function can take any of the supported built-in data types, scalar or n-component vector, as the type for the arguments unless otherwise stated. n is 2, 3 [4], 4, 8, or 16.
All functions taking or returning half types are supported only when the cl_khr_ extension macro is supported.
| Function | Description |
|---|---|
| event_t async_work_group_copy(__local gentype *dst, const __global gentype *src, size_t num_gentypes, event_t event) | Perform an async copy of num_gentypes gentype elements from src to dst. Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_copy with a previous async copy, allowing an event to be shared by multiple async copies; otherwise event should be zero. 0 can be implicitly and explicitly cast to event_t type. If the event argument is non-zero, the event object supplied in the event argument will be returned. This function does not perform any implicit synchronization of source data, such as using a barrier before performing the copy. |
| event_t async_work_group_strided_copy(__local gentype *dst, const __global gentype *src, size_t num_gentypes, size_t src_stride, event_t event) | Perform an async gather of num_gentypes gentype elements from src to dst. The src_stride is the stride in elements for each gentype element read from src. Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_strided_copy with a previous async copy, allowing an event to be shared by multiple async copies; otherwise event should be zero. 0 can be implicitly and explicitly cast to event_t type. If the event argument is non-zero, the event object supplied in the event argument will be returned. This function does not perform any implicit synchronization of source data, such as using a barrier before performing the copy. The behavior of async_work_group_strided_copy is undefined if src_stride or dst_stride is 0, or if the src_stride or dst_stride values cause the src or dst pointers to exceed the upper bounds of the address space during the copy. Requires support for OpenCL C 1.1 or newer. |
| void wait_group_events(int num_events, event_t *event_list) | Wait for events that identify the async_work_group_copy operations to complete. The event objects specified in event_list will be released after the wait is performed. |
| void prefetch(const __global gentype *p, size_t num_gentypes) | Prefetch num_gentypes * sizeof(gentype) bytes into the global cache. |
| void async_work_group_copy_fence(cl_mem_fence_flags flags) | Orders async copies produced by the work-items of a work-group executing a kernel. Async copies preceding the async_work_group_copy_fence must complete their access to the designated memory or memories, including both reads-from and writes-to it, before async copies following the fence are allowed to start accessing these memories. In other words, every async copy preceding the async_work_group_copy_fence must happen-before every async copy following the fence, with respect to the designated memory or memories. The flags argument specifies the memory address space and can be set to a combination of the following literal values: The async fence is performed by all work-items in a work-group and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups. Requires support for the |

The kernel must wait for the completion of all async copies using the wait_group_events built-in function before exiting; otherwise the behavior is undefined.
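As a rough illustration of the strided gather and of waiting on its event before the kernel exits, the sketch below pulls one column of a row-major matrix into local memory. The kernel name `extract_column`, its parameters, the single work-group launch, and the host-side shim guarded by `__OPENCL_VERSION__` are assumptions for illustration. The shim models the gather as `dst[i] = src[i * src_stride]` for a work-group of one work-item so the sketch can also be compiled as plain C.

```c
/* Host-side shim (assumption, not part of OpenCL C): models a work-group of
   one work-item so this sketch also compiles as plain C. An OpenCL C compiler
   defines __OPENCL_VERSION__ and skips this section. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __local
typedef int event_t;
static size_t get_local_id(unsigned dim) { (void)dim; return 0; }
static size_t get_local_size(unsigned dim) { (void)dim; return 1; }
static event_t async_work_group_strided_copy(float *dst, const float *src,
                                             size_t num_gentypes,
                                             size_t src_stride, event_t event)
{
    /* Gather: element i of dst comes from element i * src_stride of src. */
    for (size_t i = 0; i < num_gentypes; ++i)
        dst[i] = src[i * src_stride];
    return event ? event : 1;
}
static void wait_group_events(int num_events, event_t *event_list)
{
    (void)num_events;
    (void)event_list;
}
#endif

/* Hypothetical kernel, assumed to be launched with a single work-group:
   copies column 'col' of a rows x cols row-major matrix into 'column'. */
__kernel void extract_column(const __global float *matrix,
                             __global float *column,
                             __local float *scratch,
                             int rows, int cols, int col)
{
    /* Consecutive elements of a column are 'cols' elements apart, so 'cols'
       is the source stride. The event argument is 0: no event is shared. */
    event_t ev = async_work_group_strided_copy(scratch, matrix + col,
                                               (size_t)rows, (size_t)cols, 0);

    /* Wait before reading scratch; this also satisfies the requirement to
       wait for all async copies before the kernel exits. */
    wait_group_events(1, &ev);

    for (size_t i = get_local_id(0); i < (size_t)rows; i += get_local_size(0))
        column[i] = scratch[i];
}
```

The stride must be non-zero here, since the behavior is undefined when src_stride is 0.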
Document Notes
For more information, see the OpenCL C Specification.
This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.