asyncCopyFunctions(3)

Description

The OpenCL C programming language implements the following functions that provide asynchronous copies between global and local memory and a prefetch from global memory.

The async copy and wait group events functions are performed by all work-items in a work-group and therefore must be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups.

If an async copy or wait group events function is inside a conditional statement then all work-items in the work-group must enter the conditional if any work-item in the work-group enters the conditional statement and executes the async copy or wait group events function.
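The conditional rule can be checked mechanically. The following is a minimal host-side C sketch, not OpenCL C: it models one work-group as a loop over local IDs, evaluates a branch condition for each work-item, and reports whether the branch is work-group-uniform (entered by all work-items or by none). The names branch_cond, cond_by_group, cond_first_item, and branch_is_work_group_uniform are hypothetical; group_id and local_id stand in for the values a kernel would obtain from get_group_id(0) and get_local_id(0).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical branch condition as evaluated by one work-item. */
typedef bool (*branch_cond)(size_t group_id, size_t local_id);

/* Depends only on the group id: every work-item of a work-group agrees. */
static bool cond_by_group(size_t group_id, size_t local_id) {
    (void)local_id;
    return group_id < 4;
}

/* Depends on the local id: only work-item 0 enters the branch, so an
   async copy inside this branch would make the results undefined. */
static bool cond_first_item(size_t group_id, size_t local_id) {
    (void)group_id;
    return local_id == 0;
}

/* Returns true if all local_size work-items of work-group group_id make the
   same decision, i.e. all or none enter the branch. Assumes local_size >= 1. */
static bool branch_is_work_group_uniform(branch_cond cond, size_t group_id,
                                         size_t local_size) {
    bool first = cond(group_id, 0);
    for (size_t lid = 1; lid < local_size; ++lid)
        if (cond(group_id, lid) != first) return false;
    return true;
}
```

Under this model, a condition that depends only on the group ID (or on kernel arguments shared by the whole work-group) passes the check, while a condition on the local ID, such as local_id == 0, does not.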

If an async copy or wait group events function is inside a loop then all work-items in the work-group must execute the async copy or wait group events function on each iteration of the loop if any work-item executes the async copy or wait group events function on that iteration.
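A common mistake is to place an async copy inside a work-item-strided loop such as for (i = lid; i < n; i += local_size). The hypothetical host-side C sketch below (trip_count and strided_loop_is_uniform are illustrative names, not OpenCL functions) counts the iterations each work-item would execute and shows that the trip count diverges whenever n is not a multiple of local_size. Even when the trip counts match, the loop variable i differs per work-item, so it must not feed the copy's arguments. A tiling loop whose start, bound, and step are the same for every work-item, for example for (t = 0; t < n; t += tile), avoids both problems.

```c
#include <stdbool.h>
#include <stddef.h>

/* Iterations executed by `for (i = start; i < n; i += step)`; assumes step > 0. */
static size_t trip_count(size_t start, size_t n, size_t step) {
    return start >= n ? 0 : (n - start + step - 1) / step;
}

/* Models `for (i = lid; i < n; i += local_size)` for every local id lid of one
   work-group. Returns true if all work-items run the same number of
   iterations, which is necessary (not sufficient) for an async copy in the
   loop body. Assumes local_size >= 1. */
static bool strided_loop_is_uniform(size_t n, size_t local_size) {
    size_t first = trip_count(0, n, local_size);
    for (size_t lid = 1; lid < local_size; ++lid)
        if (trip_count(lid, n, local_size) != first) return false;
    return true;
}
```

For n = 10 and a local size of 4, work-items 0 and 1 run three iterations while work-items 2 and 3 run two, so an async copy in that loop body would break the rule on the last iteration.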

The generic type name gentype indicates that the function can take any of the following as the type for the arguments unless otherwise stated:

char, charn, uchar, or ucharn
short, shortn, ushort, or ushortn
int, intn, uint, or uintn
long ^[1], longn, ulong, or ulongn
float or floatn
double ^[2] or doublen
half ^[3] or halfn

n is 2, 3 ^[4], 4, 8, or 16.

All functions taking or returning half types are supported only when the cl_khr_fp16 extension macro is supported.
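Because num_gentypes counts gentype elements rather than bytes or scalar components, the amount of memory a copy covers depends on the vector width. The following C sketch computes that footprint, assuming (as footnote 4 describes for async copies) that a 3-component vector is treated as a 4-component vector. gentype_footprint_bytes is a hypothetical helper for illustration, not part of OpenCL.

```c
#include <stddef.h>

/* Bytes covered by num_gentypes elements of a gentype with n components of
   component_size bytes each (n == 1 for scalars). Per footnote 4, async
   copies of 3-component vectors behave as 4-component copies, so n == 3 is
   counted as 4. Hypothetical helper, illustration only. */
static size_t gentype_footprint_bytes(size_t num_gentypes,
                                      size_t component_size, unsigned n) {
    size_t stored = (n == 3u) ? 4u : n;
    return num_gentypes * component_size * stored;
}
```

For example, copying 64 float4 elements covers the same 1024 bytes as copying 256 float elements, and 10 float3 elements cover 160 bytes, the same as 10 float4 elements.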

Table 1. Built-in Async Copy and Prefetch Functions
Function	Description
event_t async_work_group_copy(__local gentype *dst, const __global gentype *src, size_t num_gentypes, event_t event)
event_t async_work_group_copy(__global gentype *dst, const __local gentype *src, size_t num_gentypes, event_t event)
	Perform an async copy of num_gentypes gentype elements from src to dst. Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_copy with a previous async copy, allowing an event to be shared by multiple async copies; otherwise event should be zero. 0 can be implicitly and explicitly cast to event_t type. If the event argument is non-zero, the event object supplied in the event argument will be returned. This function does not perform any implicit synchronization of source data, such as using a barrier, before performing the copy.

event_t async_work_group_strided_copy(__local gentype *dst, const __global gentype *src, size_t num_gentypes, size_t src_stride, event_t event)
event_t async_work_group_strided_copy(__global gentype *dst, const __local gentype *src, size_t num_gentypes, size_t dst_stride, event_t event)
	Perform an async strided copy of num_gentypes gentype elements from src to dst: the first form gathers elements read from src at intervals of src_stride into contiguous dst, and the second scatters contiguous elements of src to dst at intervals of dst_stride. The src_stride is the stride in elements for each gentype element read from src. The dst_stride is the stride in elements for each gentype element written to dst. Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_strided_copy with a previous async copy, allowing an event to be shared by multiple async copies; otherwise event should be zero. 0 can be implicitly and explicitly cast to event_t type. If the event argument is non-zero, the event object supplied in the event argument will be returned. This function does not perform any implicit synchronization of source data, such as using a barrier, before performing the copy. The behavior of async_work_group_strided_copy is undefined if src_stride or dst_stride is 0, or if the src_stride or dst_stride values cause the src or dst pointers to exceed the upper bounds of the address space during the copy. Requires support for OpenCL C 1.1 or newer.

void wait_group_events(int num_events, event_t *event_list)
	Wait for the events that identify async copy operations (async_work_group_copy or async_work_group_strided_copy) to complete. The event objects specified in event_list will be released after the wait is performed.

void prefetch(const __global gentype *p, size_t num_gentypes)
	Prefetch `num_gentypes * sizeof(gentype)` bytes into the global cache. The prefetch instruction is applied to a work-item in a work-group and does not affect the functional behavior of the kernel.

void async_work_group_copy_fence(cl_mem_fence_flags flags)
	Orders async copies produced by the work-items of a work-group executing a kernel. Async copies preceding the async_work_group_copy_fence must complete their access to the designated memory or memories, including both reads-from and writes-to it, before async copies following the fence are allowed to start accessing these memories. In other words, every async copy preceding the async_work_group_copy_fence must happen-before every async copy following the fence, with respect to the designated memory or memories. The flags argument specifies the memory address space and can be set to a combination of the following literal values: CLK_LOCAL_MEM_FENCE, CLK_GLOBAL_MEM_FENCE. The async fence is performed by all work-items in a work-group, and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups. Requires support for the cl_khr_async_work_group_copy_fence extension macro.
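The element-index semantics of async_work_group_strided_copy can be written as a sequential reference model. The C sketch below is not an implementation of the built-in (the real copy is asynchronous and performed cooperatively by the work-group); model_strided_gather and model_strided_scatter are hypothetical names, float stands in for gentype, and returning -1 for a zero stride is only this model's choice, since the specification leaves that case undefined. With a stride of 1, both functions reduce to the contiguous copy performed by async_work_group_copy.

```c
#include <stddef.h>

/* Models the global-to-local form: strided src, contiguous dst.
   dst[i] = src[i * src_stride]. Returns -1 for a zero stride (undefined in
   the specification), 0 otherwise. */
static int model_strided_gather(float *dst, const float *src,
                                size_t num_gentypes, size_t src_stride) {
    if (src_stride == 0) return -1;
    for (size_t i = 0; i < num_gentypes; ++i)
        dst[i] = src[i * src_stride];
    return 0;
}

/* Models the local-to-global form: contiguous src, strided dst.
   dst[i * dst_stride] = src[i]. Same return convention as above. */
static int model_strided_scatter(float *dst, const float *src,
                                 size_t num_gentypes, size_t dst_stride) {
    if (dst_stride == 0) return -1;
    for (size_t i = 0; i < num_gentypes; ++i)
        dst[i * dst_stride] = src[i];
    return 0;
}
```

For instance, with a row-major 3-by-4 matrix, passing src + 1 and a src_stride of 4 gathers column 1 into three contiguous elements, and scattering them with a dst_stride of 4 writes them back down a column.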

The kernel must wait for the completion of all async copies using the wait_group_events built-in function before exiting; otherwise the behavior is undefined.
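The event rules for these functions (a zero event yields a new event, a non-zero event is returned unchanged so several copies can share it, and every copy must be waited on before the kernel exits) can be summarized as a small bookkeeping model. This is a hypothetical host-side C sketch: model_event_t, model_group, model_async_copy, model_wait_group_events, and model_safe_to_exit are illustrative names, integer IDs stand in for event_t, and the fixed capacity is an assumption of the model only.

```c
#include <stdbool.h>

/* Capacity of this model only; not an OpenCL limit. */
#define MODEL_MAX_EVENTS 16

/* Integer stand-in for event_t; 0 means "no previous event". */
typedef int model_event_t;

typedef struct {
    int next_id;                       /* last event id handed out */
    int outstanding[MODEL_MAX_EVENTS]; /* unfinished copies per event id */
} model_group;

/* Models issuing one async copy. A zero event allocates a fresh id; a
   non-zero event is returned unchanged and now also covers this copy.
   Returns 0 if the model runs out of ids or the event is out of range. */
static model_event_t model_async_copy(model_group *g, model_event_t event) {
    if (event == 0) {
        if (g->next_id + 1 >= MODEL_MAX_EVENTS) return 0;
        event = ++g->next_id;
    }
    if (event < 0 || event >= MODEL_MAX_EVENTS) return 0;
    g->outstanding[event]++;
    return event;
}

/* Models wait_group_events: every copy covered by a listed event completes
   and the event is released (its outstanding count returns to zero). */
static void model_wait_group_events(model_group *g, int num_events,
                                    const model_event_t *event_list) {
    for (int i = 0; i < num_events; ++i) {
        model_event_t e = event_list[i];
        if (e > 0 && e < MODEL_MAX_EVENTS) g->outstanding[e] = 0;
    }
}

/* True once no modeled copy is outstanding, i.e. the kernel may exit. */
static bool model_safe_to_exit(const model_group *g) {
    for (int id = 1; id < MODEL_MAX_EVENTS; ++id)
        if (g->outstanding[id] != 0) return false;
    return true;
}
```

Under this model, two copies issued with the same event need only one entry in event_list, which mirrors how a kernel can chain async copies through a shared event and then call wait_group_events once before exiting.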

Document Notes

For more information, see the OpenCL C Specification

This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.

Copyright

SPDX-License-Identifier: CC-BY-4.0

1. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_int64 feature macro.

2. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_fp64 feature macro.

3. Only if the cl_khr_fp16 extension is supported and has been enabled.

4. async_work_group_copy and async_work_group_strided_copy for 3-component vector types behave as async_work_group_copy and async_work_group_strided_copy respectively for 4-component vector types.
