Contact
Please see the Issues list in the Khronos OpenCL-Docs repository:
https://github.com/KhronosGroup/OpenCL-Docs
Contributors
Stuart Brady, ARM
Sven van Haastregt, ARM
Ben Ashbaugh, Intel
Alex Paige, Intel
Lukasz Towarek, Intel
Ruihao Zhang, Qualcomm
Dependencies
This extension is written against the OpenCL API Specification, OpenCL C Specification, and OpenCL SPIR-V Environment Specification Versions 3.0.8.
The functionality added by this extension uses the OpenCL C 2.0 atomic syntax and hence requires OpenCL 2.0 or newer.
This extension interacts with cl_khr_fp16 by optionally adding the ability to atomically operate on 16-bit floating point values in memory.
This extension depends on SPV_EXT_shader_atomic_float_add and SPV_EXT_shader_atomic_float_min_max for implementations that support SPIR-V and floating-point atomic add, min, or max operations.
Overview
This extension enables programmers to perform atomic operations on floating-point numbers in memory.
An OpenCL device supporting this extension may support atomic operations on 16-bit half-precision floating-point values (fp16), 32-bit single-precision floating-point values (fp32), or 64-bit double-precision floating-point values (fp64).
For these types, an OpenCL device may support basic atomic operations (load, store, and exchange), atomic addition and subtraction, and atomic min and max.
The floating-point numbers may be in global or local memory.
New API Enums
Accepted values for the param_name parameter to clGetDeviceInfo to query the floating-point atomic capabilities of an OpenCL device:
#define CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT 0x4231
#define CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT 0x4232
#define CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT 0x4233
Bitfield type describing the floating-point atomic capabilities of an OpenCL device. Subsequent versions of this extension may add additional floating-point atomic capabilities:
typedef cl_bitfield cl_device_fp_atomic_capabilities_ext;
#define CL_DEVICE_GLOBAL_FP_ATOMIC_LOAD_STORE_EXT (1 << 0)
#define CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT (1 << 1)
#define CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT (1 << 2)
/* bits 3 - 15 are currently unused */
#define CL_DEVICE_LOCAL_FP_ATOMIC_LOAD_STORE_EXT (1 << 16)
#define CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT (1 << 17)
#define CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT (1 << 18)
/* bits 19 and beyond are currently unused */
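As an illustration of how these bits might be tested, the following is a minimal sketch in plain C. It assumes the capability value has already been obtained (in a real host program it would come from clGetDeviceInfo with one of the three queries above). The names fp_atomic_caps, fp_atomic_caps_has, and fp_atomic_caps_print are hypothetical helpers, not part of the extension, and the typedef stands in for cl_bitfield so that the sketch builds without the OpenCL headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for cl_bitfield, which is a 64-bit unsigned
   integer in the OpenCL headers. */
typedef uint64_t fp_atomic_caps;

/* Bit values copied from the definitions above; guarded so that the real
   header definitions win when they are available. */
#ifndef CL_DEVICE_GLOBAL_FP_ATOMIC_LOAD_STORE_EXT
#define CL_DEVICE_GLOBAL_FP_ATOMIC_LOAD_STORE_EXT (1 << 0)
#define CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT        (1 << 1)
#define CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT    (1 << 2)
#define CL_DEVICE_LOCAL_FP_ATOMIC_LOAD_STORE_EXT  (1 << 16)
#define CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT         (1 << 17)
#define CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT     (1 << 18)
#endif

/* Returns nonzero when every bit in `required` is set in `caps`. */
static int fp_atomic_caps_has(fp_atomic_caps caps, fp_atomic_caps required)
{
    return (caps & required) == required;
}

/* Prints one line per capability bit that is set in `caps` and returns
   the number of bits that were set. */
static int fp_atomic_caps_print(const char *label, fp_atomic_caps caps)
{
    static const struct {
        fp_atomic_caps bit;
        const char *name;
    } table[] = {
        { CL_DEVICE_GLOBAL_FP_ATOMIC_LOAD_STORE_EXT, "global load/store/exchange" },
        { CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT,        "global add/sub" },
        { CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT,    "global min/max" },
        { CL_DEVICE_LOCAL_FP_ATOMIC_LOAD_STORE_EXT,  "local load/store/exchange" },
        { CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT,         "local add/sub" },
        { CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT,     "local min/max" },
    };
    int count = 0;

    assert(label != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (fp_atomic_caps_has(caps, table[i].bit)) {
            printf("%s: %s\n", label, table[i].name);
            ++count;
        }
    }
    return count;
}
```

Because later versions of the extension may define additional bits, a helper of this kind should test only the bits it knows about and ignore the rest.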
New OpenCL C Feature Names
__opencl_c_ext_fp16_global_atomic_load_store
__opencl_c_ext_fp16_local_atomic_load_store
__opencl_c_ext_fp16_global_atomic_add
__opencl_c_ext_fp32_global_atomic_add
__opencl_c_ext_fp64_global_atomic_add
__opencl_c_ext_fp16_local_atomic_add
__opencl_c_ext_fp32_local_atomic_add
__opencl_c_ext_fp64_local_atomic_add
__opencl_c_ext_fp16_global_atomic_min_max
__opencl_c_ext_fp32_global_atomic_min_max
__opencl_c_ext_fp64_global_atomic_min_max
__opencl_c_ext_fp16_local_atomic_min_max
__opencl_c_ext_fp32_local_atomic_min_max
__opencl_c_ext_fp64_local_atomic_min_max
New OpenCL C Functions
Add support for atomic_half for the following functions:
// atomic_store:
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature.
void atomic_store(volatile __global A *object, C desired)
void atomic_store_explicit(volatile __global A *object, C desired, memory_order order)
void atomic_store_explicit(volatile __global A *object, C desired,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_load_store feature.
void atomic_store(volatile __local A *object, C desired)
void atomic_store_explicit(volatile __local A *object, C desired, memory_order order)
void atomic_store_explicit(volatile __local A *object, C desired,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature
// and the __opencl_c_ext_fp16_local_atomic_load_store feature.
void atomic_store(volatile A *object, C desired)
void atomic_store_explicit(volatile A *object, C desired, memory_order order)
void atomic_store_explicit(volatile A *object, C desired,
memory_order order, memory_scope scope)
// atomic_load:
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature.
C atomic_load(volatile __global A *object)
C atomic_load_explicit(volatile __global A *object, memory_order order)
C atomic_load_explicit(volatile __global A *object,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_load_store feature.
C atomic_load(volatile __local A *object)
C atomic_load_explicit(volatile __local A *object, memory_order order)
C atomic_load_explicit(volatile __local A *object,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature
// and the __opencl_c_ext_fp16_local_atomic_load_store feature.
C atomic_load(volatile A *object)
C atomic_load_explicit(volatile A *object, memory_order order)
C atomic_load_explicit(volatile A *object,
memory_order order, memory_scope scope)
// atomic_exchange:
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature.
C atomic_exchange(volatile __global A *object, C desired)
C atomic_exchange_explicit(volatile __global A *object, C desired, memory_order order)
C atomic_exchange_explicit(volatile __global A *object, C desired,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_load_store feature.
C atomic_exchange(volatile __local A *object, C desired)
C atomic_exchange_explicit(volatile __local A *object, C desired, memory_order order)
C atomic_exchange_explicit(volatile __local A *object, C desired,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature
// and the __opencl_c_ext_fp16_local_atomic_load_store feature.
C atomic_exchange(volatile A *object, C desired)
C atomic_exchange_explicit(volatile A *object, C desired, memory_order order)
C atomic_exchange_explicit(volatile A *object, C desired,
memory_order order, memory_scope scope)
Add support for atomic_half, atomic_float, and atomic_double for the following functions:
// atomic_fetch_add / atomic_fetch_sub:
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_add feature (for atomic_half),
// requires the __opencl_c_ext_fp32_global_atomic_add feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_global_atomic_add feature (for atomic_double).
C atomic_fetch_add(volatile __global A *object, M operand)
C atomic_fetch_sub(volatile __global A *object, M operand)
C atomic_fetch_add_explicit(volatile __global A *object, M operand, memory_order order)
C atomic_fetch_sub_explicit(volatile __global A *object, M operand, memory_order order)
C atomic_fetch_add_explicit(volatile __global A *object, M operand,
memory_order order, memory_scope scope)
C atomic_fetch_sub_explicit(volatile __global A *object, M operand,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_add feature (for atomic_half),
// requires the __opencl_c_ext_fp32_local_atomic_add feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_local_atomic_add feature (for atomic_double).
C atomic_fetch_add(volatile __local A *object, M operand)
C atomic_fetch_sub(volatile __local A *object, M operand)
C atomic_fetch_add_explicit(volatile __local A *object, M operand, memory_order order)
C atomic_fetch_sub_explicit(volatile __local A *object, M operand, memory_order order)
C atomic_fetch_add_explicit(volatile __local A *object, M operand,
memory_order order, memory_scope scope)
C atomic_fetch_sub_explicit(volatile __local A *object, M operand,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_add feature
// and the __opencl_c_ext_fp16_local_atomic_add feature (for atomic_half),
// requires the __opencl_c_ext_fp32_global_atomic_add feature
// and the __opencl_c_ext_fp32_local_atomic_add feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_global_atomic_add feature
// and the __opencl_c_ext_fp64_local_atomic_add feature (for atomic_double).
C atomic_fetch_add(volatile A *object, M operand)
C atomic_fetch_sub(volatile A *object, M operand)
C atomic_fetch_add_explicit(volatile A *object, M operand, memory_order order)
C atomic_fetch_sub_explicit(volatile A *object, M operand, memory_order order)
C atomic_fetch_add_explicit(volatile A *object, M operand,
memory_order order, memory_scope scope)
C atomic_fetch_sub_explicit(volatile A *object, M operand,
memory_order order, memory_scope scope)
// atomic_fetch_min / atomic_fetch_max:
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_min_max feature (for atomic_half),
// requires the __opencl_c_ext_fp32_global_atomic_min_max feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_global_atomic_min_max feature (for atomic_double).
C atomic_fetch_min(volatile __global A *object, M operand)
C atomic_fetch_max(volatile __global A *object, M operand)
C atomic_fetch_min_explicit(volatile __global A *object, M operand, memory_order order)
C atomic_fetch_max_explicit(volatile __global A *object, M operand, memory_order order)
C atomic_fetch_min_explicit(volatile __global A *object, M operand,
memory_order order, memory_scope scope)
C atomic_fetch_max_explicit(volatile __global A *object, M operand,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_min_max feature (for atomic_half),
// requires the __opencl_c_ext_fp32_local_atomic_min_max feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_local_atomic_min_max feature (for atomic_double).
C atomic_fetch_min(volatile __local A *object, M operand)
C atomic_fetch_max(volatile __local A *object, M operand)
C atomic_fetch_min_explicit(volatile __local A *object, M operand, memory_order order)
C atomic_fetch_max_explicit(volatile __local A *object, M operand, memory_order order)
C atomic_fetch_min_explicit(volatile __local A *object, M operand,
memory_order order, memory_scope scope)
C atomic_fetch_max_explicit(volatile __local A *object, M operand,
memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_min_max feature
// and the __opencl_c_ext_fp16_local_atomic_min_max feature (for atomic_half),
// requires the __opencl_c_ext_fp32_global_atomic_min_max feature
// and the __opencl_c_ext_fp32_local_atomic_min_max feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_global_atomic_min_max feature
// and the __opencl_c_ext_fp64_local_atomic_min_max feature (for atomic_double).
C atomic_fetch_min(volatile A *object, M operand)
C atomic_fetch_max(volatile A *object, M operand)
C atomic_fetch_min_explicit(volatile A *object, M operand, memory_order order)
C atomic_fetch_max_explicit(volatile A *object, M operand, memory_order order)
C atomic_fetch_min_explicit(volatile A *object, M operand,
memory_order order, memory_scope scope)
C atomic_fetch_max_explicit(volatile A *object, M operand,
memory_order order, memory_scope scope)
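To make the fetch-and-modify semantics concrete, here is a host-side model in C11 of what a 32-bit atomic_fetch_add does: the stored value is replaced by the sum, and the previous value is returned. This is only a sketch under stated assumptions. It emulates the operation with a compare-and-swap loop on the bit pattern, which is not necessarily how any OpenCL implementation provides it, and the names model_atomic_fetch_add_f32, float_to_bits, and bits_to_float are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back without aliasing issues. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Model of atomic_fetch_add on a 32-bit float held as raw bits.
   Returns the value that was stored before the addition. */
static float model_atomic_fetch_add_f32(_Atomic uint32_t *object, float operand)
{
    uint32_t old_bits;

    assert(object != NULL);
    old_bits = atomic_load(object);
    for (;;) {
        float old_val = bits_to_float(old_bits);
        uint32_t new_bits = float_to_bits(old_val + operand);
        /* On failure, old_bits is refreshed with the current contents
           and the loop retries with the new value. */
        if (atomic_compare_exchange_weak(object, &old_bits, new_bits))
            return old_val;
    }
}
```

In this model, atomic_fetch_sub corresponds to calling the same helper with the negated operand.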
Modifications to the OpenCL API Specification
- Add to Table 5 - OpenCL Device Queries in Section 4.2 - Querying Devices:

Table 5. List of supported param_names by clGetDeviceInfo

Device Info:
CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT
CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT
CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT

Return Type:
cl_device_fp_atomic_capabilities_ext

Description:
Describes the floating-point atomic operations supported by the device. This is a bit-field that describes a combination of the following values:
CL_DEVICE_GLOBAL_FP_ATOMIC_LOAD_STORE_EXT - Can perform floating-point load, store, and exchange atomic operations in global memory.
CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT - Can perform floating-point addition and subtraction atomic operations in global memory.
CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT - Can perform floating-point min and max atomic operations in global memory.
CL_DEVICE_LOCAL_FP_ATOMIC_LOAD_STORE_EXT - Can perform floating-point load, store, and exchange atomic operations in local memory.
CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT - Can perform floating-point addition and subtraction atomic operations in local memory.
CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT - Can perform floating-point min and max atomic operations in local memory.
There is no mandated minimum capability.
Modifications to the OpenCL C Specification
- Add to Table 1 - Optional features in OpenCL C 3.0 or newer and their predefined macros:

Table 1. Optional features in OpenCL C 3.0 or newer and their predefined macros

Feature Macro/Name:
__opencl_c_ext_fp16_global_atomic_load_store,
__opencl_c_ext_fp16_local_atomic_load_store
Brief Description:
The OpenCL C compiler supports built-in functions to atomically load, store, or exchange 16-bit floating-point values in __global or __local memory.
OpenCL C compilers that define the feature macros __opencl_c_ext_fp16_global_atomic_load_store or __opencl_c_ext_fp16_local_atomic_load_store must also support the OpenCL extension cl_khr_fp16.
Note: built-in functions to atomically load, store, or exchange 32-bit and 64-bit floating-point values are already in OpenCL C 2.0 and newer.

Feature Macro/Name:
__opencl_c_ext_fp16_global_atomic_add,
__opencl_c_ext_fp32_global_atomic_add,
__opencl_c_ext_fp64_global_atomic_add,
__opencl_c_ext_fp16_local_atomic_add,
__opencl_c_ext_fp32_local_atomic_add,
__opencl_c_ext_fp64_local_atomic_add
Brief Description:
The OpenCL C compiler supports built-in functions to atomically add to or subtract from 16-bit, 32-bit, or 64-bit floating-point values in __global or __local memory.
OpenCL C compilers that define the feature macros __opencl_c_ext_fp16_global_atomic_add or __opencl_c_ext_fp16_local_atomic_add must also support the OpenCL extension cl_khr_fp16.
OpenCL C compilers that define the feature macros __opencl_c_ext_fp64_global_atomic_add or __opencl_c_ext_fp64_local_atomic_add must also define the feature macro __opencl_c_fp64.

Feature Macro/Name:
__opencl_c_ext_fp16_global_atomic_min_max,
__opencl_c_ext_fp32_global_atomic_min_max,
__opencl_c_ext_fp64_global_atomic_min_max,
__opencl_c_ext_fp16_local_atomic_min_max,
__opencl_c_ext_fp32_local_atomic_min_max,
__opencl_c_ext_fp64_local_atomic_min_max
Brief Description:
The OpenCL C compiler supports built-in functions to atomically compute the minimum or maximum of a 16-bit, 32-bit, or 64-bit floating-point operand and a value in __global or __local memory.
OpenCL C compilers that define the feature macros __opencl_c_ext_fp16_global_atomic_min_max or __opencl_c_ext_fp16_local_atomic_min_max must also support the OpenCL extension cl_khr_fp16.
OpenCL C compilers that define the feature macros __opencl_c_ext_fp64_global_atomic_min_max or __opencl_c_ext_fp64_local_atomic_min_max must also define the feature macro __opencl_c_fp64.

- Add to the list of atomic type names in Section 6.15.12.6 Atomic integer and floating-point types:
  - atomic_half *

  * Only if the cl_khr_fp16 extension is supported and has been enabled.
- Add atomic_half to the list of atomic types supported by the atomic_store functions in section 6.15.12.7.1:

// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature.
void atomic_store(volatile __global A *object, C desired)
void atomic_store_explicit(volatile __global A *object, C desired, memory_order order)
void atomic_store_explicit(volatile __global A *object, C desired,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_load_store feature.
void atomic_store(volatile __local A *object, C desired)
void atomic_store_explicit(volatile __local A *object, C desired, memory_order order)
void atomic_store_explicit(volatile __local A *object, C desired,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature
// and the __opencl_c_ext_fp16_local_atomic_load_store feature.
void atomic_store(volatile A *object, C desired)
void atomic_store_explicit(volatile A *object, C desired, memory_order order)
void atomic_store_explicit(volatile A *object, C desired,
    memory_order order, memory_scope scope)
- Add atomic_half to the list of atomic types supported by the atomic_load functions in section 6.15.12.7.2:

// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature.
C atomic_load(volatile __global A *object)
C atomic_load_explicit(volatile __global A *object, memory_order order)
C atomic_load_explicit(volatile __global A *object,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_load_store feature.
C atomic_load(volatile __local A *object)
C atomic_load_explicit(volatile __local A *object, memory_order order)
C atomic_load_explicit(volatile __local A *object,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature
// and the __opencl_c_ext_fp16_local_atomic_load_store feature.
C atomic_load(volatile A *object)
C atomic_load_explicit(volatile A *object, memory_order order)
C atomic_load_explicit(volatile A *object,
    memory_order order, memory_scope scope)
- Add atomic_half to the list of atomic types supported by the atomic_exchange functions in section 6.15.12.7.3:

// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature.
C atomic_exchange(volatile __global A *object, C desired)
C atomic_exchange_explicit(volatile __global A *object, C desired, memory_order order)
C atomic_exchange_explicit(volatile __global A *object, C desired,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_load_store feature.
C atomic_exchange(volatile __local A *object, C desired)
C atomic_exchange_explicit(volatile __local A *object, C desired, memory_order order)
C atomic_exchange_explicit(volatile __local A *object, C desired,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_load_store feature
// and the __opencl_c_ext_fp16_local_atomic_load_store feature.
C atomic_exchange(volatile A *object, C desired)
C atomic_exchange_explicit(volatile A *object, C desired, memory_order order)
C atomic_exchange_explicit(volatile A *object, C desired,
    memory_order order, memory_scope scope)
- Add new floating-point atomic fetch and modify functions for the atomic operations add and sub for the atomic types atomic_half, atomic_float, and atomic_double:

// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_add feature (for atomic_half),
// requires the __opencl_c_ext_fp32_global_atomic_add feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_global_atomic_add feature (for atomic_double).
C atomic_fetch_add(volatile __global A *object, M operand)
C atomic_fetch_sub(volatile __global A *object, M operand)
C atomic_fetch_add_explicit(volatile __global A *object, M operand, memory_order order)
C atomic_fetch_sub_explicit(volatile __global A *object, M operand, memory_order order)
C atomic_fetch_add_explicit(volatile __global A *object, M operand,
    memory_order order, memory_scope scope)
C atomic_fetch_sub_explicit(volatile __global A *object, M operand,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_add feature (for atomic_half),
// requires the __opencl_c_ext_fp32_local_atomic_add feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_local_atomic_add feature (for atomic_double).
C atomic_fetch_add(volatile __local A *object, M operand)
C atomic_fetch_sub(volatile __local A *object, M operand)
C atomic_fetch_add_explicit(volatile __local A *object, M operand, memory_order order)
C atomic_fetch_sub_explicit(volatile __local A *object, M operand, memory_order order)
C atomic_fetch_add_explicit(volatile __local A *object, M operand,
    memory_order order, memory_scope scope)
C atomic_fetch_sub_explicit(volatile __local A *object, M operand,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_add feature
// and the __opencl_c_ext_fp16_local_atomic_add feature (for atomic_half),
// requires the __opencl_c_ext_fp32_global_atomic_add feature
// and the __opencl_c_ext_fp32_local_atomic_add feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_global_atomic_add feature
// and the __opencl_c_ext_fp64_local_atomic_add feature (for atomic_double).
C atomic_fetch_add(volatile A *object, M operand)
C atomic_fetch_sub(volatile A *object, M operand)
C atomic_fetch_add_explicit(volatile A *object, M operand, memory_order order)
C atomic_fetch_sub_explicit(volatile A *object, M operand, memory_order order)
C atomic_fetch_add_explicit(volatile A *object, M operand,
    memory_order order, memory_scope scope)
C atomic_fetch_sub_explicit(volatile A *object, M operand,
    memory_order order, memory_scope scope)

The floating-point atomic add and sub operations may be affected by compiler options affecting floating-point behavior, such as -cl-no-signed-zeros, -cl-denorms-are-zero, and -cl-finite-math-only.

- Also add new floating-point atomic fetch and modify functions for the atomic operations min and max for the atomic types atomic_half, atomic_float, and atomic_double:
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_min_max feature (for atomic_half),
// requires the __opencl_c_ext_fp32_global_atomic_min_max feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_global_atomic_min_max feature (for atomic_double).
C atomic_fetch_min(volatile __global A *object, M operand)
C atomic_fetch_max(volatile __global A *object, M operand)
C atomic_fetch_min_explicit(volatile __global A *object, M operand, memory_order order)
C atomic_fetch_max_explicit(volatile __global A *object, M operand, memory_order order)
C atomic_fetch_min_explicit(volatile __global A *object, M operand,
    memory_order order, memory_scope scope)
C atomic_fetch_max_explicit(volatile __global A *object, M operand,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_local_atomic_min_max feature (for atomic_half),
// requires the __opencl_c_ext_fp32_local_atomic_min_max feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_local_atomic_min_max feature (for atomic_double).
C atomic_fetch_min(volatile __local A *object, M operand)
C atomic_fetch_max(volatile __local A *object, M operand)
C atomic_fetch_min_explicit(volatile __local A *object, M operand, memory_order order)
C atomic_fetch_max_explicit(volatile __local A *object, M operand, memory_order order)
C atomic_fetch_min_explicit(volatile __local A *object, M operand,
    memory_order order, memory_scope scope)
C atomic_fetch_max_explicit(volatile __local A *object, M operand,
    memory_order order, memory_scope scope)
// In addition to the requirements described in the OpenCL C 3.0 specification,
// requires the __opencl_c_ext_fp16_global_atomic_min_max feature
// and the __opencl_c_ext_fp16_local_atomic_min_max feature (for atomic_half),
// requires the __opencl_c_ext_fp32_global_atomic_min_max feature
// and the __opencl_c_ext_fp32_local_atomic_min_max feature (for atomic_float), or
// requires the __opencl_c_ext_fp64_global_atomic_min_max feature
// and the __opencl_c_ext_fp64_local_atomic_min_max feature (for atomic_double).
C atomic_fetch_min(volatile A *object, M operand)
C atomic_fetch_max(volatile A *object, M operand)
C atomic_fetch_min_explicit(volatile A *object, M operand, memory_order order)
C atomic_fetch_max_explicit(volatile A *object, M operand, memory_order order)
C atomic_fetch_min_explicit(volatile A *object, M operand,
    memory_order order, memory_scope scope)
C atomic_fetch_max_explicit(volatile A *object, M operand,
    memory_order order, memory_scope scope)

The floating-point atomic min and max operations may be affected by compiler options affecting floating-point behavior, such as -cl-no-signed-zeros, -cl-denorms-are-zero, and -cl-finite-math-only.

Additionally, the floating-point atomic min and max operations may behave differently than the fmin and fmax built-in functions in some cases.

For the floating-point atomic min operation:
- min(x, y) = x if x < y and y otherwise,
- min(-0, +0) = min(+0, -0) = +0 or -0,
- min(x, qNaN) = min(qNaN, x) = x,
- min(qNaN, qNaN) = qNaN,
- min(x, sNaN) = min(sNaN, x) = NaN or x, and
- min(NaN, sNaN) = min(sNaN, NaN) = NaN

For the floating-point atomic max operation:
- max(x, y) = y if x < y and x otherwise,
- max(-0, +0) = max(+0, -0) = +0 or -0,
- max(x, qNaN) = max(qNaN, x) = x,
- max(qNaN, qNaN) = qNaN,
- max(x, sNaN) = max(sNaN, x) = NaN or x, and
- max(NaN, sNaN) = max(sNaN, NaN) = NaN
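The min and max rules above can be modeled on the host with a small reference function. The following is a sketch only: it computes the value that would be written to memory, whereas the atomic_fetch_min and atomic_fetch_max built-ins return the value held before the operation. The names model_atomic_fmin, model_atomic_fmax, and model_fminmax_self_check are hypothetical. Portable C cannot distinguish signaling from quiet NaNs, so the model treats every NaN as quiet, which is one of the outcomes the rules permit for sNaN operands. For signed zeros it returns whichever zero the comparison selects, which is also permitted.

```c
#include <assert.h>
#include <math.h>

/* Value written by a floating-point atomic min of `stored` and `operand`. */
static float model_atomic_fmin(float stored, float operand)
{
    if (isnan(stored))
        return operand;  /* min(NaN, x) = x; the result is NaN only if both are NaN */
    if (isnan(operand))
        return stored;   /* min(x, NaN) = x */
    return stored < operand ? stored : operand;
}

/* Value written by a floating-point atomic max of `stored` and `operand`. */
static float model_atomic_fmax(float stored, float operand)
{
    if (isnan(stored))
        return operand;  /* max(NaN, x) = x; the result is NaN only if both are NaN */
    if (isnan(operand))
        return stored;   /* max(x, NaN) = x */
    return stored < operand ? operand : stored;
}

/* Spot checks for the ordinary and NaN cases listed above. */
static void model_fminmax_self_check(void)
{
    assert(model_atomic_fmin(1.0f, 2.0f) == 1.0f);
    assert(model_atomic_fmax(1.0f, 2.0f) == 2.0f);
    assert(model_atomic_fmin(NAN, 3.0f) == 3.0f);
    assert(model_atomic_fmax(3.0f, NAN) == 3.0f);
    assert(isnan(model_atomic_fmin(NAN, NAN)));
    assert(isnan(model_atomic_fmax(NAN, NAN)));
}
```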
Modifications to the OpenCL SPIR-V Environment Specification
- (Add a new section 5.2.X - cl_ext_float_atomics)

If the OpenCL environment supports the extension cl_ext_float_atomics and the CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT bitfield includes CL_DEVICE_GLOBAL_FP_ATOMIC_LOAD_STORE_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_LOAD_STORE_EXT, then for the Atomic Instructions OpAtomicLoad, OpAtomicStore, and OpAtomicExchange:
- 16-bit floating-point types are supported for the Result Type and type of Value.
- When CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_LOAD_STORE_EXT, the Pointer operand may be a pointer to the CrossWorkGroup Storage Class.
- When CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_LOCAL_FP_ATOMIC_LOAD_STORE_EXT, the Pointer operand may be a pointer to the Workgroup Storage Class.
- When CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_LOAD_STORE_EXT and CL_DEVICE_LOCAL_FP_ATOMIC_LOAD_STORE_EXT, and the GenericPointer capability is supported and declared, the Pointer operand may be a pointer to the Generic Storage Class.
If the OpenCL environment supports the extension cl_ext_float_atomics and the CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT or CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT bitfields include CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT, then the environment must accept modules that declare use of the extension SPV_EXT_shader_atomic_float_add. If the OpenCL environment supports the extension cl_ext_float_atomics and the CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT bitfield includes CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT, then the environment must accept modules that declare use of the extensions SPV_EXT_shader_atomic_float_add and SPV_EXT_shader_atomic_float16_add. Additionally:
- When CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT, the AtomicFloat32AddEXT capability must be supported.
- When CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT, the AtomicFloat64AddEXT capability must be supported.
- When CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT, the AtomicFloat16AddEXT capability must be supported.
- For the Atomic Instruction OpAtomicFAddEXT added by these extensions:
  - The instruction may be affected by compiler options affecting floating-point behavior, such as -cl-no-signed-zeros, -cl-denorms-are-zero, and -cl-finite-math-only.
  - When CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT, CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT, or CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT, the Pointer operand may be a pointer to the CrossWorkGroup Storage Class.
  - When CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT, CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT, or CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT, the Pointer operand may be a pointer to the Workgroup Storage Class.
  - When CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT, CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT, or CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_ADD_EXT and CL_DEVICE_LOCAL_FP_ATOMIC_ADD_EXT, and the GenericPointer capability is supported and declared, the Pointer operand may be a pointer to the Generic Storage Class.
If the OpenCL environment supports the extension cl_ext_float_atomics and the CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT, CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT, or CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT bitfields include CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT, then the environment must accept modules that declare use of the extension SPV_EXT_shader_atomic_float_min_max. Additionally:
- When CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT, the AtomicFloat32MinMaxEXT capability must be supported.
- When CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT, the AtomicFloat64MinMaxEXT capability must be supported.
- When CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT or CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT, the AtomicFloat16MinMaxEXT capability must be supported.
- For the Atomic Instructions OpAtomicFMinEXT and OpAtomicFMaxEXT added by this extension:
  - These instructions may be affected by compiler options affecting floating-point behavior, such as -cl-no-signed-zeros, -cl-denorms-are-zero, and -cl-finite-math-only.
  - When CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT, CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT, or CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT, the Pointer operand may be a pointer to the CrossWorkGroup Storage Class.
  - When CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT, CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT, or CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT, the Pointer operand may be a pointer to the Workgroup Storage Class.
  - When CL_DEVICE_SINGLE_FP_ATOMIC_CAPABILITIES_EXT, CL_DEVICE_DOUBLE_FP_ATOMIC_CAPABILITIES_EXT, or CL_DEVICE_HALF_FP_ATOMIC_CAPABILITIES_EXT includes CL_DEVICE_GLOBAL_FP_ATOMIC_MIN_MAX_EXT and CL_DEVICE_LOCAL_FP_ATOMIC_MIN_MAX_EXT, and the GenericPointer capability is supported and declared, the Pointer operand may be a pointer to the Generic Storage Class.
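The storage class rules in this section follow one pattern for load/store, add, and min/max: the global bit permits CrossWorkGroup pointers, the local bit permits Workgroup pointers, and Generic pointers need both bits plus the GenericPointer capability. A minimal sketch of that decision in plain C follows. It assumes the capability bitfield for the relevant type has already been queried. The enum, the typedef (a stand-in for cl_bitfield), and the function name fp_atomic_pointer_allowed are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cl_bitfield so the sketch builds without
   the OpenCL headers. */
typedef uint64_t fp_atomic_caps;

/* Illustrative names for the three SPIR-V storage classes discussed above. */
enum storage_class {
    STORAGE_CROSS_WORKGROUP,
    STORAGE_WORKGROUP,
    STORAGE_GENERIC
};

/* Returns nonzero when the Pointer operand of a floating-point atomic
   instruction may point into storage class `sc`.
   `global_bit` and `local_bit` select the operation family, for example
   the ADD bits (1 << 1) and (1 << 17) for OpAtomicFAddEXT.
   `generic_pointer` is nonzero when the GenericPointer capability is
   supported and declared by the module. */
static int fp_atomic_pointer_allowed(fp_atomic_caps caps,
                                     fp_atomic_caps global_bit,
                                     fp_atomic_caps local_bit,
                                     int generic_pointer,
                                     enum storage_class sc)
{
    assert(global_bit != 0 && local_bit != 0);

    const int has_global = (caps & global_bit) != 0;
    const int has_local = (caps & local_bit) != 0;

    switch (sc) {
    case STORAGE_CROSS_WORKGROUP:
        return has_global;
    case STORAGE_WORKGROUP:
        return has_local;
    case STORAGE_GENERIC:
        return has_global && has_local && generic_pointer;
    }
    return 0;
}
```

Passing the bit pair as parameters keeps one helper usable for all three operation families instead of repeating the same logic per family.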
Issues
- Do the enums added by this extension need an EXT suffix?

  RESOLVED: Yes, as per the extension template, enums and APIs added by EXT extensions need an EXT suffix.

- Do the OpenCL C built-in functions or types added by this extension need an ext prefix or suffix?

  RESOLVED: No prefix is required for built-in functions added by EXT extensions if the functionality is unlikely to change if it becomes a KHR or core feature.

- Do we need to establish a naming convention for OpenCL C feature and feature test macro names added by extensions?

  RESOLVED: Yes, we will include a prefix in the name of the feature and feature test macro names for EXT and vendor extensions. This gives us the ability to change functionality if it becomes a KHR or core feature. Because this is an EXT extension it will use __opencl_c_ext_feature_name for the OpenCL C feature names it adds.

- Do we need to support the legacy OpenCL C 1.x atomic syntax, or is it sufficient to only support the newer OpenCL C 2.0 atomic syntax?

  RESOLVED: We will only support the newer OpenCL C 2.0 atomic syntax in the initial version of this extension.

- Do we need to document any special floating-point behavior for floating-point atomic add?

  RESOLVED: Floating-point atomic add may be affected by compiler options affecting floating-point behavior, such as -cl-no-signed-zeros, -cl-denorms-are-zero, and -cl-finite-math-only. Otherwise, there is no special behavior.

- Do we need to document any special floating-point behavior for floating-point atomic min and max?

  RESOLVED: This spec inherits all of the special-case NaN behavior from the SPIR-V atomic min and max spec. Additionally, floating-point atomic min and max may be affected by compiler options affecting floating-point behavior, such as -cl-no-signed-zeros, -cl-denorms-are-zero, and -cl-finite-math-only. Otherwise, there is no special behavior.