Description

The OpenCL C programming language implements the following functions that provide asynchronous copies between global and local memory and a prefetch from global memory.

The async copy and wait group events functions are performed by all work-items in a work-group and therefore must be encountered by all work-items in a work-group executing the kernel with the same argument values, otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups.

If an async copy or wait group events function is inside a conditional statement then all work-items in the work-group must enter the conditional if any work-item in the work-group enters the conditional statement and executes the async copy or wait group events function.

If an async copy or wait group events function is inside a loop then all work-items in the work-group must execute the async copy or wait group events function on each iteration of the loop if any work-item executes the async copy or wait group events function on that iteration.

We use the generic type name gentype to indicate the built-in data types char, charn, uchar, ucharn, short, shortn, ushort, ushortn, int, intn, uint, uintn, long [1], longn, ulong, ulongn, float, floatn, double [2], and doublen as the type for the arguments unless otherwise stated. n is 2, 3 [3], 4, 8, or 16.

Table 1. Built-in Async Copy and Prefetch Functions

Function

Description

event_t async_work_group_copy(__local gentype *dst, const __global gentype *src, size_t num_gentypes, event_t event)
event_t async_work_group_copy(__global gentype *dst, const __local gentype *src, size_t num_gentypes, event_t event)

Perform an async copy of num_gentypes gentype elements from src to dst. Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero.

0 can be implicitly and explicitly cast to event_t type.

If event argument is non-zero, the event object supplied in event argument will be returned.

This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy.

event_t async_work_group_strided_copy(__local gentype *dst, const __global gentype *src, size_t num_gentypes, size_t src_stride, event_t event)
event_t async_work_group_strided_copy(__global gentype *dst, const __local gentype *src, size_t num_gentypes, size_t dst_stride, event_t event)

Perform an async gather of num_gentypes gentype elements from src to dst. The src_stride is the stride in elements for each gentype element read from src. The dst_stride is the stride in elements for each gentype element written to dst.

Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_strided_copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero.

0 can be implicitly and explicitly cast to event_t type.

If event argument is non-zero, the event object supplied in event argument will be returned.

This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy.

The behavior of async_work_group_strided_copy is undefined if src_stride or dst_stride is 0, or if the src_stride or dst_stride values cause the src or dst pointers to exceed the upper bounds of the address space during the copy.

Requires support for OpenCL C 1.1 or newer.

void wait_group_events(int num_events, event_t *event_list)

Wait for events that identify the async_work_group_copy operations to complete. The event objects specified in event_list will be released after the wait is performed.

void prefetch(const __global gentype *p, size_t num_gentypes)

Prefetch num_gentypes * sizeof(gentype) bytes into the global cache. The prefetch instruction is applied to a work-item in a work-group and does not affect the functional behavior of the kernel.

The kernel must wait for the completion of all async copies using the wait_group_events built-in function before exiting; otherwise the behavior is undefined.

See Also

No cross-references are available

Document Notes

For more information, see the OpenCL C Specification

This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.

Copyright 2014-2023 The Khronos Group Inc.

SPDX-License-Identifier: CC-BY-4.0


1. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_int64 feature macro.
2. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_fp64 feature macro.
3. async_work_group_copy and async_work_group_strided_copy for 3-component vector types behave as async_work_group_copy and async_work_group_strided_copy respectively for 4-component vector types.