Description

The functionality described in this section requires support for OpenCL C 3.0 or newer and the __opencl_c_subgroups feature.

The table below describes OpenCL C programming language built-in functions that operate on a sub-group level. These built-in functions must be encountered by all work-items in the sub-group executing the kernel. For the functions below, the generic type name gentype may be one of the supported built-in scalar data types int, uint, long [1], ulong, half [2], float, and double [3].

Table 1. Built-in Sub-group Collective Functions
Function Description

int sub_group_all (int predicate)

Evaluates predicate for all work-items in the sub-group and returns a non-zero value if predicate evaluates to non-zero for all work-items in the sub-group.

int sub_group_any (int predicate)

Evaluates predicate for all work-items in the sub-group and returns a non-zero value if predicate evaluates to non-zero for any work-items in the sub-group.

gentype sub_group_broadcast (
gentype x, uint sub_group_local_id)

Broadcasts the value of x from the work-item identified by sub_group_local_id (the value returned by get_sub_group_local_id) to all work-items in the sub-group.

Behavior is undefined when the value of sub_group_local_id is not the same for all work-items in the sub-group.

Behavior is undefined when sub_group_local_id is greater than or equal to the sub-group size.

gentype sub_group_reduce_<op> (
gentype x)

Returns the result of the reduction operation specified by <op> for all values of x specified by work-items in the sub-group.

gentype sub_group_scan_exclusive_<op> (
gentype x)

Performs an exclusive scan operation specified by <op> over all values of x specified by work-items in the sub-group. The scan results are returned for each work-item.

The scan order is defined by increasing sub-group local ID within the sub-group.

gentype sub_group_scan_inclusive_<op> (
gentype x)

Performs an inclusive scan operation specified by <op> over all values of x specified by work-items in the sub-group. The scan results are returned for each work-item.

The scan order is defined by increasing sub-group local ID within the sub-group.
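The predicate and broadcast built-ins in Table 1 can be illustrated with a small host-side model. The sketch below is plain C, not OpenCL C device code: it treats a sub-group as an array holding one value per work-item, indexed by sub-group local ID. The model_* function names and the -1 error return are assumptions made for this illustration only; on a device, an out-of-range sub_group_local_id gives undefined behavior rather than an error code.

```c
#include <stddef.h>

/* Model of sub_group_all: non-zero only if every work-item's
 * predicate is non-zero. predicate[i] is the value supplied by the
 * work-item whose sub-group local ID is i. */
int model_sub_group_all(const int *predicate, size_t sub_group_size) {
    for (size_t i = 0; i < sub_group_size; ++i) {
        if (predicate[i] == 0) {
            return 0;
        }
    }
    return 1;
}

/* Model of sub_group_any: non-zero if at least one work-item's
 * predicate is non-zero. */
int model_sub_group_any(const int *predicate, size_t sub_group_size) {
    for (size_t i = 0; i < sub_group_size; ++i) {
        if (predicate[i] != 0) {
            return 1;
        }
    }
    return 0;
}

/* Model of sub_group_broadcast: every work-item receives the value of x
 * held by the work-item with local ID source_id.
 * Returns 0 on success. Returns -1 (an assumption of this model) when
 * source_id >= sub_group_size, the case the built-in leaves undefined. */
int model_sub_group_broadcast(const int *x, size_t sub_group_size,
                              unsigned int source_id, int *out) {
    if (source_id >= sub_group_size) {
        return -1;
    }
    for (size_t i = 0; i < sub_group_size; ++i) {
        out[i] = x[source_id];
    }
    return 0;
}
```

In this model, a sub-group of four work-items holding x = {10, 20, 30, 40} and broadcasting from local ID 2 leaves 30 in every element of out.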

The <op> in sub_group_reduce_<op>, sub_group_scan_inclusive_<op> and sub_group_scan_exclusive_<op> defines the operator and can be add, min or max.

The exclusive scan operation takes a binary operator op with an identity I and n (where n is the size of the sub-group) elements [a_0, a_1, ..., a_{n-1}] and returns [I, a_0, (a_0 op a_1), ..., (a_0 op a_1 op ... op a_{n-2})].

The inclusive scan operation takes a binary operator op with an identity I and n (where n is the size of the sub-group) elements [a_0, a_1, ..., a_{n-1}] and returns [a_0, (a_0 op a_1), ..., (a_0 op a_1 op ... op a_{n-1})].

If op = add, the identity I is 0. If op = min, the identity I is INT_MAX, UINT_MAX, LONG_MAX, and ULONG_MAX for the int, uint, long, and ulong types respectively, and +INF for floating-point types. Similarly, if op = max, the identity I is INT_MIN, 0, LONG_MIN, and 0 for the int, uint, long, and ulong types respectively, and -INF for floating-point types.
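The reduction and scan definitions above can be modeled on the host for the int type. The sketch below is plain C, not OpenCL C device code: element i of the input array stands for the value of x supplied by the work-item with sub-group local ID i. The model_op enum and all model_* function names are assumptions introduced for this illustration.

```c
#include <limits.h>
#include <stddef.h>

/* The three operators that <op> may name. */
typedef enum { MODEL_OP_ADD, MODEL_OP_MIN, MODEL_OP_MAX } model_op;

/* Identity I for the int type: 0 for add, INT_MAX for min, INT_MIN for max. */
static int model_identity(model_op op) {
    switch (op) {
    case MODEL_OP_MIN: return INT_MAX;
    case MODEL_OP_MAX: return INT_MIN;
    default:           return 0;
    }
}

/* Applies the binary operator to two values. */
static int model_apply(model_op op, int a, int b) {
    switch (op) {
    case MODEL_OP_MIN: return a < b ? a : b;
    case MODEL_OP_MAX: return a > b ? a : b;
    default:           return a + b;
    }
}

/* Exclusive scan: out[i] combines a[0..i-1]; out[0] is the identity. */
void model_scan_exclusive(model_op op, const int *a, int *out, size_t n) {
    int acc = model_identity(op);
    for (size_t i = 0; i < n; ++i) {
        out[i] = acc;
        acc = model_apply(op, acc, a[i]);
    }
}

/* Inclusive scan: out[i] combines a[0..i]. */
void model_scan_inclusive(model_op op, const int *a, int *out, size_t n) {
    int acc = model_identity(op);
    for (size_t i = 0; i < n; ++i) {
        acc = model_apply(op, acc, a[i]);
        out[i] = acc;
    }
}

/* Reduction: a single value combining a[0..n-1]. */
int model_reduce(model_op op, const int *a, size_t n) {
    int acc = model_identity(op);
    for (size_t i = 0; i < n; ++i) {
        acc = model_apply(op, acc, a[i]);
    }
    return acc;
}
```

For the input {3, 1, 4, 1} with op = add, the exclusive scan in this model yields {0, 3, 4, 8}, the inclusive scan yields {3, 4, 8, 9}, and the reduction yields 9.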

The order of floating-point operations is not guaranteed for the sub_group_reduce_<op>, sub_group_scan_inclusive_<op> and sub_group_scan_exclusive_<op> built-in functions that operate on half, float and double data types. The order of these floating-point operations is also non-deterministic for a given sub-group.
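Evaluation order matters because floating-point addition is not associative. The host-side C sketch below (not OpenCL C device code) sums the same three float values in two different orders. The model_sum_* names are assumptions for this illustration, and the two orders shown are only examples: an implementation may combine the values in any order.

```c
#include <stddef.h>

/* Sums left to right: ((0 + a[0]) + a[1]) + ... + a[n-1]. */
float model_sum_forward(const float *a, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i];
    }
    return acc;
}

/* Sums right to left: ((0 + a[n-1]) + a[n-2]) + ... + a[0]. */
float model_sum_backward(const float *a, size_t n) {
    float acc = 0.0f;
    for (size_t i = n; i > 0; --i) {
        acc += a[i - 1];
    }
    return acc;
}
```

With the input {1e8f, -1e8f, 1.0f} on a platform that evaluates float arithmetic in IEEE 754 single precision, the forward sum is 1.0f while the backward sum is 0.0f, because 1.0f is lost to rounding when it is added to -1e8f first. Kernels that compare floating-point reduction or scan results against a reference may therefore need a tolerance rather than an exact match.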

The functionality described in the following table requires support for OpenCL C 3.0 or newer and the __opencl_c_subgroups and __opencl_c_pipes features.

The following table describes built-in pipe functions that operate at a sub-group level. These built-in functions must be encountered by all work-items in a sub-group executing the kernel with the same argument values, otherwise the behavior is undefined. The generic type name gentype indicates that any built-in OpenCL C scalar or vector integer or floating-point data type, or any user-defined type built from these scalar and vector data types, can be used as the type for the arguments to the pipe functions listed in Table 2.

Table 2. Built-in Sub-group Pipe Functions
Function Description

reserve_id_t sub_group_reserve_read_pipe (
read_only pipe gentype pipe,
uint num_packets)

reserve_id_t sub_group_reserve_write_pipe (
write_only pipe gentype pipe,
uint num_packets)

Reserve num_packets entries for reading from or writing to pipe. Returns a valid non-zero reservation ID if the reservation is successful and 0 otherwise.

The reserved pipe entries are referred to by indices from 0 to num_packets - 1.

void sub_group_commit_read_pipe (
read_only pipe gentype pipe,
reserve_id_t reserve_id)

void sub_group_commit_write_pipe (
write_only pipe gentype pipe,
reserve_id_t reserve_id)

Indicates that all reads and writes to the num_packets entries associated with reservation reserve_id are completed.

Note: Reservations made by a sub-group are ordered in the pipe as they are ordered in the program. Reservations made by different sub-groups that belong to the same work-group can be ordered using sub-group synchronization. The order of sub-group based reservations that belong to different work-groups is implementation-defined.

The functionality described in the following table requires support for OpenCL C 3.0 or newer and the __opencl_c_subgroups and __opencl_c_device_enqueue features.

The following table describes built-in functions to query sub-group information for a block to be enqueued.

Table 3. Built-in Sub-group Kernel Query Functions
Built-in Function Description

uint get_kernel_sub_group_count_for_ndrange (
const ndrange_t ndrange,
void (^block)(void));

uint get_kernel_sub_group_count_for_ndrange (
const ndrange_t ndrange,
void (^block)(local void *, …​));

Returns the number of sub-groups in each work-group of the dispatch (except for the last in cases where the global size does not divide cleanly into work-groups) given the combination of the passed ndrange and block.

block specifies the block to be enqueued.

uint get_kernel_max_sub_group_size_for_ndrange (
const ndrange_t ndrange,
void (^block)(void));

uint get_kernel_max_sub_group_size_for_ndrange (
const ndrange_t ndrange,
void (^block)(local void *, …​));

Returns the maximum sub-group size for a block.

Document Notes

For more information, see the OpenCL C Specification

This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.

Copyright 2014-2023 The Khronos Group Inc.

SPDX-License-Identifier: CC-BY-4.0


1. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_int64 feature macro.
2. Only if the cl_khr_fp16 extension is supported and has been enabled.
3. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_fp64 feature macro.