Dependencies
Support for OpenCL 2.1, cl_khr_subgroups
, or cl_intel_subgroups
is required.
This extension is written against revision 23 of the OpenCL 2.1 API specification, against revision 30 of the OpenCL 2.0 OpenCL C specification, against version 31 of the OpenCL 2.0 Extensions specification, and against version 3 of the cl_intel_subgroups
specification.
Overview
The goal of this extension is to allow programmers to optionally specify the required sub-group size for a kernel function. This information is important for the correctness of many sub-group algorithms, and in some cases may be used by the compiler to generate more optimal code.
New API Enums
Accepted as the param_name parameter of clGetDeviceInfo:
CL_DEVICE_SUB_GROUP_SIZES_INTEL 0x4108
Accepted as the param_name parameter of clGetKernelWorkGroupInfo:
CL_KERNEL_SPILL_MEM_SIZE_INTEL 0x4109
Accepted as the param_name parameter of clGetKernelSubGroupInfo and/or clGetKernelSubGroupInfoKHR:
CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL 0x410A
New OpenCL C Optional Attribute Qualifiers
Optional __kernel
qualifier:
__attribute__((intel_reqd_sub_group_size(<int>)))
Modifications to the OpenCL API Specification
Additions to Table 4.3 - "OpenCL Device Queries"
cl_device_info | Return Type | Description |
---|---|---|
|
|
Returns the set of sub-group sizes supported by the device. |
Additions to Table 5.21 - "clGetKernelWorkGroupInfo parameter queries":
cl_kernel_work_group_info | Return Type | Info. returned in param_value |
---|---|---|
|
|
Returns the amount of spill memory used by a kernel. The meaning of this value will vary from implementation-to-implementation, however a return value of 0 will always indicate that compiler was able to compile the kernel to fit into the device’s register file without spilling registers to memory. |
Additions to "clGetKernelSubGroupInfo parameter queries":
This is Table 5.22 - "clGetKernelSubGroupInfo parameter queries" in the OpenCL 2.1 API spec, in Section 9.17.2.1 for clGetKernelSubGroupInfoKHR in the OpenCL 2.0 Extensions spec, and in the section describing the changes to Section 5.9.3 for clGetKernelSubGroupInfoKHR in the cl_intel_subgroups
spec:
cl_kernel_sub_group_info | Input Type | Return Type | Info. returned in param_value |
---|---|---|---|
|
|
|
Returns the sub-group size specified by the If the sub-group size is not specified using the above attribute qualifier then 0 is returned. |
Modifications to the OpenCL C Specification
Additions to Section 6.7.2 - "Optional Attribute Qualifiers"
The optional __attribute__((intel_reqd_sub_group_size(<int>)))
can be used to indicate that the kernel must be compiled and executed with the specified sub-group size.
When this attribute is present, get_max_sub_group_size() is guaranteed to return the specified integer value.
This is important for the correctness of many sub-group algorithms, and in some cases may be used by the compiler to generate more optimal code.
Note that there is no guarantee for the value of get_sub_group_size() even when this attribute is present, particularly when the work-group size is not evenly divisible by the required sub-group size.
Note as well that some devices may support a limited number of sub-group sizes, and that some devices may not support all language constructs with all sub-group sizes. This means that some kernels may fail compilation with one required sub-group size and succeed with another required sub-group size, even if both sub-group sizes are supported by the device.
Finally, note that requiring one sub-group size (particularly, a larger sub-group size) may require more spill memory than another sub-group size, and may negatively impact application performance."