Description
If the cl_khr_ extension macro is supported,
additional Built-in Extended Async Copy
Functions are provided which interpret the source and destination as 2D or
3D data.
Note: async_work_group_strided_copy is a special case of async_work_group_copy_2D2D, namely one which copies a single column to a single line or vice versa.
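As an illustration of this special case, the following plain C sketch models the 2D addressing on the host with memcpy, then uses it to gather one column of a row-major matrix into a contiguous line. Both `model_copy_2d2d` and `gather_column` are hypothetical helpers written for this page, not OpenCL built-ins. The model is synchronous and ignores address spaces and events.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side model of the 2D addressing (not the OpenCL built-in).
 * Offsets and line lengths are in elements; pointer arithmetic is in bytes. */
void model_copy_2d2d(void *dst, size_t dst_offset,
                     const void *src, size_t src_offset,
                     size_t num_bytes_per_element,
                     size_t num_elements_per_line,
                     size_t num_lines,
                     size_t src_total_line_length,
                     size_t dst_total_line_length)
{
    const char *s = (const char *)src + src_offset * num_bytes_per_element;
    char *d = (char *)dst + dst_offset * num_bytes_per_element;

    for (size_t line = 0; line < num_lines; ++line) {
        memcpy(d + line * dst_total_line_length * num_bytes_per_element,
               s + line * src_total_line_length * num_bytes_per_element,
               num_elements_per_line * num_bytes_per_element);
    }
}

/* Gather column `col` of a rows x cols row-major int matrix into `out`:
 * one element per line, source lines `cols` elements apart,
 * destination lines one element apart. */
void gather_column(int *out, const int *matrix,
                   size_t rows, size_t cols, size_t col)
{
    model_copy_2d2d(out, 0, matrix, col, sizeof(int),
                    1, rows, cols, 1);
}
```

With one element per line, the source line length acts as the stride. Swapping the two line lengths would scatter a contiguous line into a column instead.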
The functions described in this section support arbitrary gentype-based
buffers by casting pointers to void*.
These functions do not perform any implicit synchronization of source data such as using a barrier before performing the copy.
These functions are performed by all work-items in a work-group and must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined.
The src_offset, dst_offset, src_total_line_length, dst_total_line_length, src_total_plane_area and dst_total_plane_area function arguments are expressed in elements.
Both src_total_line_length and dst_total_line_length describe the number of elements between the beginning of the current line and the beginning of the next line.
Both src_total_plane_area and dst_total_plane_area describe the number of elements between the beginning of the current plane and the beginning of the next plane.
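To make the element-based units concrete, the C sketch below computes where a given (plane, line, column) coordinate lands in a buffer. The helper names `element_index` and `byte_offset` are hypothetical and exist only for this illustration. For 2D data the plane term is simply zero.

```c
#include <stddef.h>

/* Element index of (plane, line, column) relative to the buffer start.
 * All arguments are in elements, as for the copy functions. */
size_t element_index(size_t offset,
                     size_t total_line_length,
                     size_t total_plane_area,
                     size_t plane, size_t line, size_t column)
{
    return offset
         + plane * total_plane_area
         + line * total_line_length
         + column;
}

/* Byte offset of the same element: the element index scaled by the element size. */
size_t byte_offset(size_t element_idx, size_t num_bytes_per_element)
{
    return element_idx * num_bytes_per_element;
}
```

For example, with an offset of 5, a total line length of 10 and a total plane area of 40, the element at plane 1, line 2, column 3 has index 5 + 40 + 20 + 3 = 68. For 4-byte elements that is byte 272.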
These functions return an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero. If the event argument is non-zero, the event object supplied as the event argument will be returned.
| Function | Description |
|---|---|
| async_work_group_copy_2D2D | Perform an async copy of (num_elements_per_line * num_lines) elements of size num_bytes_per_element from (src + (src_offset * num_bytes_per_element)) to (dst + (dst_offset * num_bytes_per_element)). All pointer arithmetic is performed with implicit casting to char*. The behavior of async_work_group_copy_2D2D is undefined if the source or destination addresses exceed the upper bounds of the address space during the copy. The behavior of async_work_group_copy_2D2D is also undefined if the src_total_line_length or dst_total_line_length values are smaller than num_elements_per_line, i.e. overlapping of lines is undefined. |
| async_work_group_copy_3D3D | Perform an async copy of ((num_elements_per_line * num_lines) * num_planes) elements of size num_bytes_per_element from (src + (src_offset * num_bytes_per_element)) to (dst + (dst_offset * num_bytes_per_element)), arranged in num_planes planes. All pointer arithmetic is performed with implicit casting to char*. The behavior of async_work_group_copy_3D3D is undefined if the source or destination addresses exceed the upper bounds of the address space during the copy. The behavior of async_work_group_copy_3D3D is also undefined if the src_total_line_length or dst_total_line_length values are smaller than num_elements_per_line, i.e. overlapping of lines is undefined. The behavior of async_work_group_copy_3D3D is also undefined if src_total_plane_area is smaller than (num_lines * src_total_line_length), or dst_total_plane_area is smaller than (num_lines * dst_total_line_length), i.e. overlapping of planes is undefined. |
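
The 3D addressing and the overlap conditions in the table can be sketched as a synchronous host-side model in plain C. `layout_is_defined` and `model_copy_3d3d` are hypothetical names, not part of OpenCL. The model refuses layouts the table calls undefined instead of exhibiting undefined behavior, and it cannot check the address-space upper bounds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when neither lines nor planes overlap, per the conditions in the table. */
bool layout_is_defined(size_t num_elements_per_line, size_t num_lines,
                       size_t src_total_line_length, size_t dst_total_line_length,
                       size_t src_total_plane_area, size_t dst_total_plane_area)
{
    return src_total_line_length >= num_elements_per_line
        && dst_total_line_length >= num_elements_per_line
        && src_total_plane_area >= num_lines * src_total_line_length
        && dst_total_plane_area >= num_lines * dst_total_line_length;
}

/* Hypothetical model of the 3D copy. Returns false without copying
 * when the layout would overlap lines or planes. */
bool model_copy_3d3d(void *dst, size_t dst_offset,
                     const void *src, size_t src_offset,
                     size_t num_bytes_per_element,
                     size_t num_elements_per_line,
                     size_t num_lines, size_t num_planes,
                     size_t src_total_line_length, size_t src_total_plane_area,
                     size_t dst_total_line_length, size_t dst_total_plane_area)
{
    if (!layout_is_defined(num_elements_per_line, num_lines,
                           src_total_line_length, dst_total_line_length,
                           src_total_plane_area, dst_total_plane_area)) {
        return false;
    }

    /* Byte-wise pointer arithmetic: element offsets are scaled by the element size. */
    const char *s = (const char *)src + src_offset * num_bytes_per_element;
    char *d = (char *)dst + dst_offset * num_bytes_per_element;

    for (size_t plane = 0; plane < num_planes; ++plane) {
        for (size_t line = 0; line < num_lines; ++line) {
            size_t src_elem = plane * src_total_plane_area
                            + line * src_total_line_length;
            size_t dst_elem = plane * dst_total_plane_area
                            + line * dst_total_line_length;
            memcpy(d + dst_elem * num_bytes_per_element,
                   s + src_elem * num_bytes_per_element,
                   num_elements_per_line * num_bytes_per_element);
        }
    }
    return true;
}
```

Setting num_planes to 1 reduces this model to the 2D case, since the plane areas then no longer affect any address.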
Document Notes
For more information, see the OpenCL C Specification.
This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.