Copyright 2008-2023 The Khronos Group Inc.
This Specification is protected by copyright laws and contains material proprietary to Khronos. Except as described by these terms, it or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast or otherwise exploited in any manner without the express prior written permission of Khronos.
This Specification has been created under the Khronos Intellectual Property Rights Policy, which is Attachment A of the Khronos Group Membership Agreement available at www.khronos.org/files/member_agreement.pdf and defines the terms 'Scope', 'Compliant Portion', and 'Necessary Patent Claims'.
Khronos grants a conditional copyright license to use and reproduce the unmodified Specification for any purpose, without fee or royalty, EXCEPT no licenses to any patent, trademark or other intellectual property rights are granted under these terms. Parties desiring to implement the Specification and make use of Khronos trademarks in relation to that implementation, and receive reciprocal patent license protection under the Khronos Intellectual Property Rights Policy must become Adopters and confirm the implementation as conformant under the process defined by Khronos for this Specification; see https://www.khronos.org/adopters.
Khronos makes no, and expressly disclaims any, representations or warranties, express or implied, regarding this Specification, including, without limitation: merchantability, fitness for a particular purpose, noninfringement of any intellectual property, correctness, accuracy, completeness, timeliness, and reliability. Under no circumstances will Khronos, or any of its Promoters, Contributors or Members, or their respective partners, officers, directors, employees, agents or representatives be liable for any damages, whether direct, indirect, special or consequential damages for lost revenues, lost profits, or otherwise, arising from or in connection with these materials.
Where this Specification identifies specific sections of external references, only those specifically identified sections define normative functionality. The Khronos Intellectual Property Rights Policy excludes external references to materials and associated enabling technology not created by Khronos from the Scope of this specification, and any licenses that may be required to implement such referenced materials and associated technologies must be obtained separately and may involve royalty payments.
Khronos® and Vulkan® are registered trademarks, and SPIR™, SPIR-V™, and SYCL™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used under license by Khronos. OpenGL® is a registered trademark and the OpenGL ES™ and OpenGL SC™ logos are trademarks of Hewlett Packard Enterprise used under license by Khronos. All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.
1. Introduction
OpenCL is an open, royalty-free standard for general-purpose parallel programming across CPUs, GPUs, and other processors, giving software developers portable and efficient access to the power of heterogeneous processing platforms.
SPIR-V is an open, royalty-free, standard intermediate language capable of representing parallel compute kernels that are executed by implementations of the OpenCL standard.
SPIR-V is adaptable to multiple execution environments: a SPIR-V module is consumed by an execution environment, as specified by a client API. This document describes the SPIR-V execution environment for implementations of the OpenCL standard. The SPIR-V execution environment describes required support for some SPIR-V capabilities, additional semantics for some SPIR-V instructions, and additional validation rules that a SPIR-V binary module must adhere to in order to be considered valid.
This document is written for compiler developers who are generating SPIR-V modules intended to be consumed by the OpenCL API, for implementors of the OpenCL API who are consuming SPIR-V modules, and for software developers who are using SPIR-V modules with the OpenCL API.
2. Common Properties
This section describes common properties of all OpenCL environments that consume SPIR-V modules.
A SPIR-V module passed to an OpenCL environment is interpreted as a series of 32-bit words in host endianness, with literal strings packed as described in the SPIR-V specification. The first few words of the SPIR-V module must be a magic number and a SPIR-V version number, as described in the SPIR-V specification.
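As a rough illustration of the header rule above, the following Python sketch reads the first two words of a module in host endianness and checks the magic number. The helper name read_spirv_header is hypothetical. The magic number 0x07230203 and the version word layout (0, major, minor, 0 from MSB to LSB) are taken from the SPIR-V specification rather than from this document.

```python
import struct

SPIRV_MAGIC = 0x07230203  # magic number defined by the SPIR-V specification


def read_spirv_header(blob: bytes):
    """Return (major, minor) for a module stored in host endianness."""
    if len(blob) < 8:
        raise ValueError("module is too short to hold a magic number and version")
    # "=" selects native byte order with standard sizes, i.e. host endianness.
    magic, version = struct.unpack_from("=II", blob, 0)
    if magic != SPIRV_MAGIC:
        raise ValueError("not a SPIR-V module in host endianness")
    major = (version >> 16) & 0xFF
    minor = (version >> 8) & 0xFF
    return major, minor


# Build a two-word header for SPIR-V 1.2 in host endianness and parse it back.
header = struct.pack("=II", SPIRV_MAGIC, (1 << 16) | (2 << 8))
print(read_spirv_header(header))  # (1, 2)
```

A module whose words are byte-swapped relative to the host fails the magic number check in this sketch, which matches the requirement that words be interpreted in host endianness.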
2.1. Supported SPIR-V Versions
An OpenCL environment describes the versions of SPIR-V modules that it supports using the CL_DEVICE_ query in OpenCL 2.1 or newer, the CL_DEVICE_ query in OpenCL 3.0 or newer, or the CL_DEVICE_ query in the cl_khr_il_program extension.
OpenCL environments that support the cl_khr_il_program extension or OpenCL 2.1 must support SPIR-V 1.0 modules. OpenCL environments that support OpenCL 2.2 must support SPIR-V 1.0, 1.1, and 1.2 modules.
Use the CL_DEVICE_ or CL_DEVICE_ query to determine the versions of SPIR-V modules that are supported by OpenCL environments that support OpenCL 3.0.
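The version requirements above can be summarised in a small lookup. This is only a sketch: required_spirv_versions is a hypothetical helper, OpenCL versions are modelled as (major, minor) tuples, and for OpenCL 3.0 the function returns nothing beyond what the text guarantees, since the supported versions have to be queried from the device.

```python
def required_spirv_versions(opencl_version, has_il_program_ext=False):
    """Return the SPIR-V versions an environment must accept per the rules above."""
    if opencl_version == (2, 2):
        return {"1.0", "1.1", "1.2"}
    if opencl_version == (2, 1) or has_il_program_ext:
        return {"1.0"}
    # OpenCL 3.0 (and anything else): no fixed set, query the device instead.
    return set()


print(sorted(required_spirv_versions((2, 2))))
```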
2.2. Extended Instruction Sets
OpenCL environments supporting SPIR-V must support SPIR-V modules that import the OpenCL.std extended instruction set for OpenCL using OpExtInstImport. For example:
... = OpExtInstImport "OpenCL.std"
2.3. Source Language Encoding
If a SPIR-V module represents a program written in OpenCL C, then the Source Language operand for the OpSource instruction should be OpenCL_C, and the 32-bit literal language Version should describe the version of OpenCL C, encoded MSB to LSB as:
0  Major Number  Minor Number  Revision Number (optional)
If a SPIR-V module represents a program written in OpenCL C++, then the Source Language operand for the OpSource instruction should be OpenCL_CPP, and the 32-bit literal language Version should describe the version of OpenCL C++, encoded similarly.
The source language version is purely informational and has no semantic meaning.
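The encoding above can be sketched as a pair of pack and unpack helpers. This assumes each of the four fields occupies one byte of the 32-bit literal, and the function names are hypothetical.

```python
def encode_source_version(major: int, minor: int, revision: int = 0) -> int:
    """Pack a version as 0 | major | minor | revision, MSB to LSB."""
    for part in (major, minor, revision):
        if not 0 <= part <= 0xFF:
            raise ValueError("each version component must fit in one byte")
    return (major << 16) | (minor << 8) | revision


def decode_source_version(word: int):
    """Unpack the three version bytes from a 32-bit Version literal."""
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF


word = encode_source_version(1, 2)  # OpenCL C 1.2 under the stated assumption
print(hex(word))                    # 0x10200
```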
2.4. Numerical Type Formats
For all OpenCL environments, floating-point types are represented and stored using IEEE-754 semantics. All integer formats are represented and stored using 2’s-complement format.
2.5. Supported Types
The following types are supported by OpenCL environments. Note that some types may require additional capabilities, and may not be supported by all OpenCL environments.
OpenCL environments support arrays declared using OpTypeArray, structs declared using OpTypeStruct, functions declared using OpTypeFunction, and pointers declared using OpTypePointer.
2.5.1. Basic Scalar and Vector Types
OpTypeVoid is supported.
The following scalar types are supported by OpenCL environments:

OpTypeBool

OpTypeInt, with Width equal to 8, 16, 32, or 64, and with Signedness equal to zero, indicating no signedness semantics.

OpTypeFloat, with Width equal to 16, 32, or 64.
OpenCL environments support vector types declared using OpTypeVector. The vector Component Type may be any of the scalar types described above. Supported vector Component Counts are 2, 3, 4, 8, or 16.
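A minimal sketch of the scalar and vector rules above follows. Types are modelled as plain dictionaries, which is purely an illustrative choice, and the check ignores any additional capabilities a type may require.

```python
SUPPORTED_INT_WIDTHS = {8, 16, 32, 64}
SUPPORTED_FLOAT_WIDTHS = {16, 32, 64}
SUPPORTED_COMPONENT_COUNTS = {2, 3, 4, 8, 16}


def is_supported_scalar(ty: dict) -> bool:
    """Check a scalar type description against the lists above."""
    op = ty["op"]
    if op == "OpTypeBool":
        return True
    if op == "OpTypeInt":
        # Signedness must be zero: no signedness semantics.
        return ty["width"] in SUPPORTED_INT_WIDTHS and ty["signedness"] == 0
    if op == "OpTypeFloat":
        return ty["width"] in SUPPORTED_FLOAT_WIDTHS
    return False


def is_supported_vector(component: dict, count: int) -> bool:
    """Check an OpTypeVector by its Component Type and Component Count."""
    return is_supported_scalar(component) and count in SUPPORTED_COMPONENT_COUNTS


uint32 = {"op": "OpTypeInt", "width": 32, "signedness": 0}
print(is_supported_vector(uint32, 3))  # True
```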
2.5.2. Image-Related Data Types
The following table describes the OpTypeImage image types supported by OpenCL environments:
Dim     Depth  Arrayed  Description
1D      0      0        A 1D image.
1D      0      1        A 1D image array.
2D      0      0        A 2D image.
2D      1      0        A 2D depth image.
2D      0      1        A 2D image array.
2D      1      1        A 2D depth image array.
3D      0      0        A 3D image.
Buffer  0      0        A 1D buffer image.
OpTypeSampler may be used to declare sampler types in OpenCL environments.
OpTypeSampledImage may be used to declare combined image and sampler types in OpenCL environments.
2.5.3. Other Data Types
The following table describes other data types that may be used in an OpenCL environment:
Type               Description
OpTypeEvent        OpenCL event representing async copies from global to local memory and vice-versa.
OpTypeDeviceEvent  OpenCL device-side event representing commands enqueued to device command queues.
OpTypePipe         OpenCL pipe.
OpTypeReserveId    OpenCL pipe reservation identifier.
OpTypeQueue        OpenCL device-side command queue.
2.6. Image Channel Order Mapping
The following table describes how the results of the SPIR-V OpImageQueryOrder instruction correspond to the OpenCL host API image channel orders.
SPIR-V Image Channel Order  OpenCL Image Channel Order







































2.7. Image Channel Data Type Mapping
The following table describes how the results of the SPIR-V OpImageQueryFormat instruction correspond to the OpenCL host API image channel data types.
SPIR-V Image Channel Data Type  OpenCL Image Channel Data Type



































2.8. Kernels
An OpFunction in a SPIR-V module that is identified with OpEntryPoint defines an OpenCL kernel that may be invoked using the OpenCL host API enqueue kernel interfaces.
2.8.1. Kernel Return Types
The Result Type for an OpFunction identified with OpEntryPoint must be OpTypeVoid.
2.8.2. Kernel Arguments
An OpFunctionParameter for an OpFunction that is identified with OpEntryPoint defines an OpenCL kernel argument. Allowed types for OpenCL kernel arguments are:

OpTypeInt

OpTypeFloat

OpTypeStruct

OpTypeVector

OpTypePointer

OpTypeSampler

OpTypeImage

OpTypePipe

OpTypeQueue
For OpTypeInt parameters, supported Widths are 8, 16, 32, and 64, and must have no signedness semantics.
For OpTypeFloat parameters, Width must be 32.
For OpTypeStruct parameters, supported structure Member Types are:

OpTypeInt

OpTypeFloat

OpTypeStruct

OpTypeVector

OpTypePointer
For OpTypePointer parameters, supported Storage Classes are:

CrossWorkgroup

Workgroup

UniformConstant
OpenCL kernel argument types must have a representation in the OpenCL host API.
Environments that support extensions or optional features may allow additional types in an entry point’s parameter list.
2.9. Built-in Variables
An OpVariable in a SPIR-V module with the BuiltIn decoration represents a built-in variable. All built-in variables must be in the Input storage class.
The following table describes the required SPIR-V type for built-in variables.
In this table, size_t is used as a generic type to represent:

OpTypeInt with Width equal to 32 if the Addressing Model declared in OpMemoryModel is Physical32.

OpTypeInt with Width equal to 64 if the Addressing Model declared in OpMemoryModel is Physical64.
The mapping from an OpenCL C built-in function to the SPIR-V BuiltIn is informational and non-normative.
OpenCL C Function  SPIR-V BuiltIn             Required SPIR-V Type
                   WorkDim                    OpTypeInt with Width equal to 32
                   GlobalSize                 OpTypeVector of 3 components of size_t
                   GlobalInvocationId         OpTypeVector of 3 components of size_t
                   WorkgroupSize              OpTypeVector of 3 components of size_t
                   EnqueuedWorkgroupSize      OpTypeVector of 3 components of size_t
                   LocalInvocationId          OpTypeVector of 3 components of size_t
                   NumWorkgroups              OpTypeVector of 3 components of size_t
                   WorkgroupId                OpTypeVector of 3 components of size_t
                   GlobalOffset               OpTypeVector of 3 components of size_t
                   GlobalLinearId             size_t
                   LocalInvocationIndex       size_t
                   SubgroupSize               OpTypeInt with Width equal to 32
                   SubgroupMaxSize            OpTypeInt with Width equal to 32
                   NumSubgroups               OpTypeInt with Width equal to 32
                   NumEnqueuedSubgroups       OpTypeInt with Width equal to 32
                   SubgroupId                 OpTypeInt with Width equal to 32
                   SubgroupLocalInvocationId  OpTypeInt with Width equal to 32
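The dependence of the built-in variable types on the Addressing Model can be sketched as follows. The helper names and the tuple representation of types are hypothetical, and the sketch assumes that the 3-component vector rows of the table use the generic size_t type introduced before it. Built-ins whose type is not stated explicitly in the table are left out.

```python
UINT32_BUILTINS = {
    "WorkDim", "SubgroupSize", "SubgroupMaxSize", "NumSubgroups",
    "NumEnqueuedSubgroups", "SubgroupId", "SubgroupLocalInvocationId",
}
# Assumed to be 3-component vectors of size_t (see the lead-in).
SIZE_T_VECTOR_BUILTINS = {
    "GlobalSize", "GlobalInvocationId", "WorkgroupSize", "EnqueuedWorkgroupSize",
    "LocalInvocationId", "NumWorkgroups", "WorkgroupId", "GlobalOffset",
}


def size_t_width(addressing_model: str) -> int:
    """Width in bits of the generic size_t type for an Addressing Model."""
    return {"Physical32": 32, "Physical64": 64}[addressing_model]


def required_builtin_type(builtin: str, addressing_model: str):
    """Return a tuple describing the required type of a built-in variable."""
    if builtin in UINT32_BUILTINS:
        return ("OpTypeInt", 32)
    if builtin in SIZE_T_VECTOR_BUILTINS:
        return ("OpTypeVector", 3, ("OpTypeInt", size_t_width(addressing_model)))
    raise KeyError(builtin)


print(required_builtin_type("GlobalInvocationId", "Physical64"))
```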
2.10. Alignment of Types
Objects of type OpTypeInt, OpTypeFloat, and OpTypePointer must be aligned in memory to the size of the type in bytes. Objects of type OpTypeVector with these component types must be aligned in memory to the size of the vector type in bytes. For 3-component vector types, the size of the vector type is four times the size of the component type.
The compiler is responsible for aligning objects allocated by OpVariable to the appropriate alignment as required by the Result Type.
For OpTypePointer arguments to a function, the compiler may assume that the pointer is appropriately aligned as required by the Type that the pointer points to.
Behavior of an unaligned load or store is undefined.
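The alignment rules above reduce to a short calculation. The sketch below uses hypothetical helper names and covers only vectors of scalar components with a known bit width.

```python
def scalar_size(width_bits: int) -> int:
    """Size in bytes of a scalar with the given Width."""
    return width_bits // 8


def vector_alignment(component_width_bits: int, component_count: int) -> int:
    """Required alignment in bytes for an OpTypeVector of scalars."""
    if component_count not in (2, 3, 4, 8, 16):
        raise ValueError("unsupported Component Count")
    # A 3-component vector is sized, and therefore aligned, like a 4-component one.
    effective = 4 if component_count == 3 else component_count
    return scalar_size(component_width_bits) * effective


def is_aligned(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment."""
    return address % alignment == 0


print(vector_alignment(32, 3))  # 16
```

For example, a 3-component vector of 32-bit floats needs 16-byte alignment, so a load from address 24 would be unaligned and its behavior undefined.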
3. Required Capabilities
3.1. SPIR-V 1.0
An OpenCL environment that supports SPIR-V 1.0 must support SPIR-V 1.0 modules that declare the following capabilities:

Addresses

Float16Buffer

Int64
    For Full Profile devices.

Int16

Int8

Kernel

Linkage

Vector16

DeviceEnqueue
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Device-Side Enqueue (where CL_DEVICE_DEVICE_ENQUEUE_CAPABILITIES is not 0).

GenericPointer
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting the Generic Address Space (where CL_DEVICE_GENERIC_ADDRESS_SPACE_SUPPORT is CL_TRUE).

Groups
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Subgroups (where CL_DEVICE_MAX_NUM_SUB_GROUPS is not 0) or Work Group Collective Functions (where CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT is CL_TRUE).

Pipes
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Pipes (where CL_DEVICE_PIPE_SUPPORT is CL_TRUE).

ImageBasic
    For devices supporting Images (where CL_DEVICE_IMAGE_SUPPORT is CL_TRUE).

Float64
    For devices supporting Double Precision Floating-Point (where CL_DEVICE_DOUBLE_FP_CONFIG is not 0).

If the OpenCL environment supports the ImageBasic capability, then the following capabilities must also be supported:

LiteralSampler

Sampled1D

Image1D

SampledBuffer

ImageBuffer

ImageReadWrite
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Read-Write Images (where CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS is not 0).
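The conditions in this section can be collected into one function that derives the required SPIR-V 1.0 capability set from device properties. This is a sketch under stated assumptions: the dictionary keys are hypothetical stand-ins for the results of the corresponding device queries, and the OpenCL version is modelled as a (major, minor) tuple.

```python
def required_capabilities_spirv_1_0(dev: dict) -> set:
    """Derive the capabilities a SPIR-V 1.0 consumer must accept."""
    caps = {"Addresses", "Float16Buffer", "Int16", "Int8",
            "Kernel", "Linkage", "Vector16"}
    is_2_0_or_newer = dev.get("opencl_version", (1, 2)) >= (2, 0)

    if dev.get("full_profile", False):
        caps.add("Int64")
    if is_2_0_or_newer:
        if dev.get("device_enqueue_capabilities", 0) != 0:
            caps.add("DeviceEnqueue")
        if dev.get("generic_address_space_support", False):
            caps.add("GenericPointer")
        if (dev.get("max_num_sub_groups", 0) != 0
                or dev.get("work_group_collective_functions_support", False)):
            caps.add("Groups")
        if dev.get("pipe_support", False):
            caps.add("Pipes")
    if dev.get("double_fp_config", 0) != 0:
        caps.add("Float64")
    if dev.get("image_support", False):
        # ImageBasic pulls in the dependent image capabilities.
        caps |= {"ImageBasic", "LiteralSampler", "Sampled1D", "Image1D",
                 "SampledBuffer", "ImageBuffer"}
        if is_2_0_or_newer and dev.get("max_read_write_image_args", 0) != 0:
            caps.add("ImageReadWrite")
    return caps


device = {"opencl_version": (3, 0), "full_profile": True,
          "image_support": True, "max_read_write_image_args": 0}
print(sorted(required_capabilities_spirv_1_0(device)))
```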

3.2. SPIR-V 1.1
An OpenCL environment supporting SPIR-V 1.1 must support SPIR-V 1.1 modules that declare the capabilities required for SPIR-V 1.0 modules, above.
In addition, an OpenCL environment consuming SPIR-V 1.1 must support SPIR-V 1.1 modules that declare the following capabilities:

SubgroupDispatch
    For OpenCL 2.2 devices, or OpenCL 3.0 devices supporting Subgroups (where CL_DEVICE_MAX_NUM_SUB_GROUPS is not 0).

PipeStorage
    For OpenCL 2.2 devices.

4. Validation Rules
The following is a list of validation rules that apply to SPIR-V modules executing in all OpenCL environments:
The Execution Model declared in OpEntryPoint must be Kernel.
The Addressing Model declared in OpMemoryModel must be either:

Physical32 (for OpenCL devices reporting 32 for CL_DEVICE_ADDRESS_BITS)

Physical64 (for OpenCL devices reporting 64 for CL_DEVICE_ADDRESS_BITS)
The Memory Model declared in OpMemoryModel must be OpenCL.
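The three module-level rules above lend themselves to a simple check. The sketch below assumes the relevant operands have already been extracted from the module as strings. The function name and the error messages are hypothetical.

```python
def check_module_basics(execution_models, addressing_model, memory_model, address_bits):
    """Return a list of rule violations for the module-level declarations."""
    errors = []
    if any(model != "Kernel" for model in execution_models):
        errors.append("every OpEntryPoint must use the Kernel Execution Model")
    # address_bits stands for the device's reported address width (32 or 64).
    expected = {32: "Physical32", 64: "Physical64"}.get(address_bits)
    if expected is None or addressing_model != expected:
        errors.append("Addressing Model does not match the device address bits")
    if memory_model != "OpenCL":
        errors.append("Memory Model must be OpenCL")
    return errors


print(check_module_basics(["Kernel"], "Physical64", "OpenCL", 64))  # []
```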
For all OpTypeInt integer type-declaration instructions:

Signedness must be 0, indicating no signedness semantics.
For all OpTypeImage type-declaration instructions:

Sampled Type must be OpTypeVoid.

Sampled must be 0, indicating that the image usage will be known at run time, not at compile time.

MS must be 0, indicating single-sampled content.

Arrayed may only be set to 1, indicating arrayed content, when Dim is set to 1D or 2D.

Image Format must be Unknown, indicating that the image does not have a specified format.

The optional image Access Qualifier must be present.
The image write instruction OpImageWrite must not include any optional Image Operands.
The image read instructions OpImageRead and OpImageSampleExplicitLod must not include the optional Image Operand ConstOffset.
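The OpTypeImage rules listed above can be sketched as a checker over an already-decoded type declaration. The dictionary layout and field names are hypothetical, and the sketch covers only the base rules, not relaxations introduced by extensions.

```python
def check_image_type(img: dict) -> list:
    """Return the base-environment rule violations for one OpTypeImage."""
    errors = []
    if img["sampled_type"] != "OpTypeVoid":
        errors.append("Sampled Type must be OpTypeVoid")
    if img["sampled"] != 0:
        errors.append("Sampled must be 0")
    if img["ms"] != 0:
        errors.append("MS must be 0")
    if img["arrayed"] == 1 and img["dim"] not in ("1D", "2D"):
        errors.append("Arrayed may only be 1 when Dim is 1D or 2D")
    if img["format"] != "Unknown":
        errors.append("Image Format must be Unknown")
    if img.get("access_qualifier") is None:
        errors.append("the Access Qualifier must be present")
    return errors


image_2d_array = {"sampled_type": "OpTypeVoid", "dim": "2D", "depth": 0,
                  "arrayed": 1, "ms": 0, "sampled": 0, "format": "Unknown",
                  "access_qualifier": "ReadOnly"}
print(check_image_type(image_2d_array))  # []
```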
For all Atomic Instructions:

Only 32-bit integer types are supported for the Result Type and/or type of Value.

The Pointer operand must be a pointer to the Function, Workgroup, or CrossWorkgroup Storage Classes. Note that an Atomic Instruction on a pointer to the Function Storage Class is valid, but does not have defined behavior.

For OpenCL environments that support and declare the GenericPointer capability, the Pointer operand may be a pointer to the Generic Storage Class, however behavior is still undefined if the Generic pointer represents a pointer to the Function Storage Class.
Recursion is not supported. The static function call graph for an entry point must not contain cycles.
Whether irreducible control flow is legal is implementation defined.
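The recursion rule above amounts to cycle detection on the static call graph reachable from an entry point. The following is a minimal depth-first sketch, assuming the call graph has already been extracted as a dictionary from function name to callees. The name has_recursion is hypothetical, and the traversal itself is written recursively, so very deep call graphs could exceed Python's recursion limit.

```python
def has_recursion(call_graph: dict, entry_point: str) -> bool:
    """Return True if a cycle is reachable from entry_point."""
    unvisited, in_progress, done = 0, 1, 2
    state = {}

    def visit(function: str) -> bool:
        state[function] = in_progress
        for callee in call_graph.get(function, ()):
            callee_state = state.get(callee, unvisited)
            if callee_state == in_progress:
                return True  # back edge: callee is already on the call stack
            if callee_state == unvisited and visit(callee):
                return True
        state[function] = done
        return False

    return visit(entry_point)


graph = {"kernel_main": ["helper_a"], "helper_a": ["helper_b"], "helper_b": ["helper_a"]}
print(has_recursion(graph, "kernel_main"))  # True
```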
For the instructions OpGroupAsyncCopy and OpGroupWaitEvents, Scope for Execution must be:

Workgroup
For the Group and Subgroup Instructions, Scope for Execution must be one of:

Workgroup
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Work Group Collective Functions (where CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT is CL_TRUE).

Subgroup
    For OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Subgroups (where CL_DEVICE_MAX_NUM_SUB_GROUPS is not 0).

For all other instructions, Scope for Execution must be one of:

Workgroup

Subgroup
    For OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Subgroups (where CL_DEVICE_MAX_NUM_SUB_GROUPS is not 0).

In an OpenCL 1.2 environment, for the Barrier Instructions OpControlBarrier and OpMemoryBarrier, the Scope for Memory must be Workgroup, and the memory-order constraint in Memory Semantics must be SequentiallyConsistent. Otherwise, Scope for Memory must be one of:

CrossDevice
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_SCOPE_ALL_DEVICES in CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.

Device
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_SCOPE_DEVICE in CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.

Workgroup
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_SCOPE_WORK_GROUP in CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.

Subgroup
    For OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Subgroups (where CL_DEVICE_MAX_NUM_SUB_GROUPS is not 0).

Invocation
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_SCOPE_WORK_ITEM in CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.

And, the memory-order constraint in Memory Semantics must be one of:

None (Relaxed)
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_ORDER_RELAXED in CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.

Acquire, Release, or AcquireRelease
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_ORDER_ACQ_REL in CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.

SequentiallyConsistent
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_ORDER_SEQ_CST in CL_DEVICE_ATOMIC_FENCE_CAPABILITIES.

In all OpenCL environments, for the Barrier Instruction OpControlBarrier, when the Scope for Execution is Subgroup, behavior is undefined unless all invocations in the subgroup execute the same dynamic instance of the instruction.
In an OpenCL 1.2 environment, for the Atomic Instructions, the Scope for Memory must be Device, and the memory-order constraint in Memory Semantics must be Relaxed. Otherwise, Scope for Memory must be one of:

CrossDevice
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_SCOPE_ALL_DEVICES in CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.

Device
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_SCOPE_DEVICE in CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.

Workgroup
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_SCOPE_WORK_GROUP in CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.

Subgroup
    For OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting Subgroups (where CL_DEVICE_MAX_NUM_SUB_GROUPS is not 0).

And, the memory-order constraint in Memory Semantics must be one of:

None (Relaxed)
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_ORDER_RELAXED in CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.

Acquire, Release, or AcquireRelease
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_ORDER_ACQ_REL in CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.

SequentiallyConsistent
    For OpenCL 2.0, OpenCL 2.1, OpenCL 2.2, or OpenCL 3.0 devices supporting CL_DEVICE_ATOMIC_ORDER_SEQ_CST in CL_DEVICE_ATOMIC_MEMORY_CAPABILITIES.
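For Atomic Instructions outside an OpenCL 1.2 environment, the allowed memory scopes and orderings follow from the device's atomic memory capabilities bitfield. The sketch below decodes such a bitfield. The bit positions are assumed to mirror the OpenCL 3.0 headers and should be treated as illustrative, and the function name is hypothetical.

```python
# Assumed bit positions for the atomic capability flags (illustrative only).
ORDER_RELAXED = 1 << 0
ORDER_ACQ_REL = 1 << 1
ORDER_SEQ_CST = 1 << 2
SCOPE_WORK_ITEM = 1 << 3
SCOPE_WORK_GROUP = 1 << 4
SCOPE_DEVICE = 1 << 5
SCOPE_ALL_DEVICES = 1 << 6


def allowed_atomic_semantics(memory_caps: int, max_num_sub_groups: int = 0):
    """Return (scopes, orders) permitted for Atomic Instructions."""
    scopes = set()
    if memory_caps & SCOPE_ALL_DEVICES:
        scopes.add("CrossDevice")
    if memory_caps & SCOPE_DEVICE:
        scopes.add("Device")
    if memory_caps & SCOPE_WORK_GROUP:
        scopes.add("Workgroup")
    if max_num_sub_groups != 0:
        scopes.add("Subgroup")  # gated on subgroup support, not on a scope bit

    orders = set()
    if memory_caps & ORDER_RELAXED:
        orders.add("Relaxed")
    if memory_caps & ORDER_ACQ_REL:
        orders.update({"Acquire", "Release", "AcquireRelease"})
    if memory_caps & ORDER_SEQ_CST:
        orders.add("SequentiallyConsistent")
    return scopes, orders


caps = ORDER_RELAXED | SCOPE_WORK_GROUP
print(allowed_atomic_semantics(caps))
```

The same decoding applies to the Barrier Instructions with the fence capabilities bitfield, with the addition of the Invocation scope for the work-item scope bit.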

5. OpenCL Extensions
An OpenCL environment may be modified by OpenCL extensions. For example, some OpenCL extensions may require support for additional SPIR-V capabilities or instructions, or relax SPIR-V restrictions.
Some OpenCL extensions may modify the OpenCL environment by requiring consumption of a SPIR-V module that uses a SPIR-V extension. In this case, the implementation will include the OpenCL extension in the host API CL_PLATFORM_EXTENSIONS or CL_DEVICE_EXTENSIONS string, but not the corresponding SPIR-V extension.
This section describes how the OpenCL environment is modified by Khronos (khr) OpenCL extensions. Other OpenCL extensions, such as multi-vendor (ext) extensions or vendor-specific extensions, describe how they modify the OpenCL environment in their individual extension specifications.
5.1. Declaring SPIR-V Extensions
A SPIR-V module declares use of a SPIR-V extension using OpExtension and the name of the SPIR-V extension. For example:
OpExtension "SPV_KHR_extension_name"
Only use of SPIR-V extensions may be declared in a SPIR-V module using OpExtension; there is never a need to declare use of an OpenCL extension in a SPIR-V module using OpExtension.
5.2. Full and Embedded Profile Extensions
5.2.1. cl_khr_3d_image_writes
If the OpenCL environment supports the extension cl_khr_3d_image_writes, then the environment must accept Image operands to OpImageWrite that are declared with dimensionality Dim equal to 3D.
5.2.2. cl_khr_depth_images
If the OpenCL environment supports the extension cl_khr_depth_images, then the environment must accept modules that declare 2D depth image types using OpTypeImage with dimensionality Dim equal to 2D and Depth equal to 1, indicating a depth image. 2D depth images may optionally be Arrayed, if supported.
Additionally, the following Image Channel Orders may be returned by OpImageQueryOrder:

Depth
5.2.3. cl_khr_device_enqueue_local_arg_types
If the OpenCL environment supports the extension cl_khr_device_enqueue_local_arg_types, then the environment will allow Invoke functions to be passed to OpEnqueueKernel with Workgroup memory pointer parameters of any type.
5.2.4. cl_khr_fp16
If the OpenCL environment supports the extension cl_khr_fp16, then the environment must accept modules that declare the following SPIR-V capabilities:

Float16
5.2.5. cl_khr_fp64
If the OpenCL environment supports the extension cl_khr_fp64, then the environment must accept modules that declare the following SPIR-V capabilities:

Float64
5.2.6. cl_khr_gl_depth_images
If the OpenCL environment supports the extension cl_khr_gl_depth_images, then the following Image Channel Orders may additionally be returned by OpImageQueryOrder:

DepthStencil
Also, the following Image Channel Data Types may additionally be returned by OpImageQueryFormat:

UnormInt24
5.2.7. cl_khr_gl_msaa_sharing
If the OpenCL environment supports the extension cl_khr_gl_msaa_sharing, then the environment must accept modules that declare 2D multi-sampled image types using OpTypeImage with dimensionality Dim equal to 2D and MS equal to 1, indicating multi-sampled content. 2D multi-sampled images may optionally be Arrayed or Depth images, if supported.
The 2D multi-sampled images may be used with the following instructions:

OpImageRead

OpImageQuerySizeLod

OpImageQueryFormat

OpImageQueryOrder

OpImageQuerySamples
5.2.8. cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics
If the OpenCL environment supports the extension cl_khr_int64_base_atomics or cl_khr_int64_extended_atomics, then the environment must accept modules that declare the following SPIR-V capabilities:

Int64Atomics
When the Int64Atomics capability is declared, 64-bit integer types are valid for the Result Type and type of Value for all Atomic Instructions.
Note: OpenCL environments that consume SPIR-V must support both cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics or neither of these extensions.
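Taken together with the base validation rule that restricts Atomic Instructions to 32-bit integer types, the effect of Int64Atomics can be sketched as a single width check. The function name is hypothetical, and the set of declared capabilities is assumed to have been collected from the module's OpCapability instructions.

```python
def atomic_int_width_is_valid(width: int, declared_capabilities: set) -> bool:
    """Check the integer width used by an Atomic Instruction."""
    if width == 32:
        return True  # always valid in the base environment
    # 64-bit atomics become valid only when the module declares Int64Atomics.
    return width == 64 and "Int64Atomics" in declared_capabilities


print(atomic_int_width_is_valid(64, {"Kernel", "Int64Atomics"}))  # True
```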
5.2.9. cl_khr_mipmap_image
If the OpenCL environment supports the extension cl_khr_mipmap_image, then the environment must accept non-zero optional Lod Image Operands for the following instructions:

OpImageSampleExplicitLod

OpImageRead

OpImageQuerySizeLod
Note: Implementations that support cl_khr_mipmap_image are not guaranteed to support the ImageMipmap capability, since this extension does not require non-zero optional Lod Image Operands for OpImageWrite.
5.2.10. cl_khr_mipmap_image_writes
If the OpenCL environment supports the extension cl_khr_mipmap_image_writes, then the environment must accept non-zero optional Lod Image Operands for the following instructions:

OpImageWrite
Note: An implementation that supports cl_khr_mipmap_image_writes must also support cl_khr_mipmap_image, and support for both extensions does guarantee support for the ImageMipmap capability.
5.2.11. cl_khr_subgroups
If the OpenCL environment supports the extension cl_khr_subgroups, then for all instructions except OpGroupAsyncCopy and OpGroupWaitEvents the Scope for Execution may be:

Subgroup
Additionally, for all instructions except Atomic Instructions in an OpenCL 1.2 environment, the Scope for Memory may be:

Subgroup
5.2.12. cl_khr_subgroup_named_barrier
If the OpenCL environment supports the extension cl_khr_subgroup_named_barrier, then the environment must accept modules that declare the following SPIR-V capabilities:

NamedBarrier
5.2.13. cl_khr_spirv_no_integer_wrap_decoration
If the OpenCL environment supports the extension cl_khr_spirv_no_integer_wrap_decoration, then the environment must accept modules that declare use of the extension SPV_KHR_no_integer_wrap_decoration via OpExtension.
If the OpenCL environment supports the extension cl_khr_spirv_no_integer_wrap_decoration and use of the SPIR-V extension SPV_KHR_no_integer_wrap_decoration is declared in the module via OpExtension, then the environment must accept modules that include the NoSignedWrap or NoUnsignedWrap decorations.
5.2.14. cl_khr_subgroup_extended_types
If the OpenCL environment supports the extension cl_khr_subgroup_extended_types, then additional types are valid for the following Groups instructions with Scope for Execution equal to Subgroup:

OpGroupBroadcast

OpGroupIAdd, OpGroupFAdd

OpGroupSMin, OpGroupUMin, OpGroupFMin

OpGroupSMax, OpGroupUMax, OpGroupFMax
For these instructions, valid types for Value are:

Scalars of supported types:
    OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
    OpTypeFloat (equivalent to half, float, and double)

Additionally, for OpGroupBroadcast, valid types for Value are:

OpTypeVectors with 2, 3, 4, 8, or 16 Component Count components of supported types:
    OpTypeInt (equivalent to charn, ucharn, shortn, ushortn, intn, uintn, longn, and ulongn)
    OpTypeFloat (equivalent to halfn, floatn, and doublen)

5.2.15. cl_khr_subgroup_non_uniform_vote
If the OpenCL environment supports the extension cl_khr_subgroup_non_uniform_vote, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:

GroupNonUniform

GroupNonUniformVote
For instructions requiring these capabilities, Scope for Execution may be:

Subgroup
For the instruction OpGroupNonUniformAllEqual, valid types for Value are:

Scalars of supported types:
    OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
    OpTypeFloat (equivalent to half, float, and double)

5.2.16. cl_khr_subgroup_ballot
If the OpenCL environment supports the extension cl_khr_subgroup_ballot, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:

GroupNonUniformBallot
For instructions requiring these capabilities, Scope for Execution may be:

Subgroup
For the non-uniform broadcast instruction OpGroupNonUniformBroadcast, valid types for Value are:

Scalars of supported types:
    OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
    OpTypeFloat (equivalent to half, float, and double)

OpTypeVectors with 2, 3, 4, 8, or 16 Component Count components of supported types:
    OpTypeInt (equivalent to charn, ucharn, shortn, ushortn, intn, uintn, longn, and ulongn)
    OpTypeFloat (equivalent to halfn, floatn, and doublen)

For the instruction OpGroupNonUniformBroadcastFirst, valid types for Value are:

Scalars of supported types:
    OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
    OpTypeFloat (equivalent to half, float, and double)

For the instruction OpGroupNonUniformBallot, the valid Result Type is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).
For the instructions OpGroupNonUniformInverseBallot, OpGroupNonUniformBallotBitExtract, OpGroupNonUniformBallotBitCount, OpGroupNonUniformBallotFindLSB, and OpGroupNonUniformBallotFindMSB, the valid type for Value is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).
For built-in variables decorated with SubgroupEqMask, SubgroupGeMask, SubgroupGtMask, SubgroupLeMask, or SubgroupLtMask, the supported variable type is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).
5.2.17. cl_khr_subgroup_non_uniform_arithmetic
If the OpenCL environment supports the extension cl_khr_subgroup_non_uniform_arithmetic, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:

GroupNonUniformArithmetic
For instructions requiring these capabilities, Scope for Execution may be:

Subgroup
For the instructions OpGroupNonUniformLogicalAnd, OpGroupNonUniformLogicalOr, and OpGroupNonUniformLogicalXor, the valid type for Value is OpTypeBool.
Otherwise, for the GroupNonUniformArithmetic scan and reduction instructions, valid types for Value are:

Scalars of supported types:
    OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
    OpTypeFloat (equivalent to half, float, and double)

For the GroupNonUniformArithmetic scan and reduction instructions, the optional ClusterSize operand must not be present.
5.2.18. cl_khr_subgroup_shuffle
If the OpenCL environment supports the extension cl_khr_subgroup_shuffle, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:

GroupNonUniformShuffle
For instructions requiring these capabilities, Scope for Execution may be:

Subgroup
For the instructions OpGroupNonUniformShuffle and OpGroupNonUniformShuffleXor requiring these capabilities, valid types for Value are:

Scalars of supported types:
    OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
    OpTypeFloat (equivalent to half, float, and double)

5.2.19. cl_khr_subgroup_shuffle_relative
If the OpenCL environment supports the extension cl_khr_subgroup_shuffle_relative, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:

GroupNonUniformShuffleRelative
For instructions requiring these capabilities, Scope for Execution may be:

Subgroup
For the GroupNonUniformShuffleRelative instructions, valid types for Value are:

Scalars of supported types:
    OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
    OpTypeFloat (equivalent to half, float, and double)

5.2.20. cl_khr_subgroup_clustered_reduce
If the OpenCL environment supports the extension cl_khr_subgroup_clustered_reduce, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:

GroupNonUniformClustered
For instructions requiring these capabilities, Scope for Execution may be:

Subgroup
When the GroupNonUniformClustered capability is declared, the GroupNonUniformArithmetic scan and reduction instructions may include the optional ClusterSize operand.
5.2.21. cl_khr_spirv_extended_debug_info
If the OpenCL environment supports the extension cl_khr_spirv_extended_debug_info, then the environment must accept modules that import the OpenCL.DebugInfo.100 extended instruction set via OpExtInstImport.
5.2.22. cl_khr_spirv_linkonce_odr
If the OpenCL environment supports the extension cl_khr_spirv_linkonce_odr, then the environment must accept modules that declare use of the extension SPV_KHR_linkonce_odr via OpExtension.
If the OpenCL environment supports the extension cl_khr_spirv_linkonce_odr and use of the SPIR-V extension SPV_KHR_linkonce_odr is declared in the module via OpExtension, then the environment must accept modules that include the LinkOnceODR linkage type.
5.2.23. cl_khr_extended_bit_ops
If the OpenCL environment supports the extension cl_khr_extended_bit_ops, then the environment must accept modules that declare use of the extension SPV_KHR_bit_instructions via OpExtension.
If the OpenCL environment supports the extension cl_khr_extended_bit_ops and use of the SPIR-V extension SPV_KHR_bit_instructions is declared in the module via OpExtension, then the environment must accept modules that declare the BitInstructions capability.
5.2.24. cl_khr_integer_dot_product
If the OpenCL environment supports the extension cl_khr_integer_dot_product, then the environment must accept modules that require SPV_KHR_integer_dot_product and declare the following SPIR-V capabilities:

DotProductKHR

DotProductInput4x8BitKHR if CL_DEVICE_INTEGER_DOT_PRODUCT_INPUT_4x8BIT_KHR is supported

DotProductInput4x8BitPackedKHR
5.2.25. cl_khr_expect_assume
If the OpenCL environment supports the extension cl_khr_expect_assume, then the environment must accept modules that declare use of the extension SPV_KHR_expect_assume via OpExtension.
If the OpenCL environment supports the extension cl_khr_expect_assume and use of the SPIR-V extension SPV_KHR_expect_assume is declared in the module via OpExtension, then the environment must accept modules that declare the following SPIR-V capabilities:

ExpectAssumeKHR
5.2.26. cl_khr_subgroup_rotate
If the OpenCL environment supports the extension cl_khr_subgroup_rotate, then the environment must accept modules that require SPV_KHR_subgroup_rotate and declare the following SPIR-V capabilities:

GroupNonUniformRotateKHR
5.2.27. cl_khr_work_group_uniform_arithmetic
If the OpenCL environment supports the extension cl_khr_work_group_uniform_arithmetic, then the environment must accept modules that declare use of the extension SPV_KHR_uniform_group_instructions via OpExtension.
If the OpenCL environment supports the extension cl_khr_work_group_uniform_arithmetic and use of the SPIR-V extension SPV_KHR_uniform_group_instructions is declared in the module via OpExtension, then the environment must accept modules that declare the following SPIR-V capabilities:

GroupUniformArithmeticKHR
For instructions requiring these capabilities, Scope for Execution may be:

Workgroup
For the instructions OpGroupLogicalAndKHR, OpGroupLogicalOrKHR, and OpGroupLogicalXorKHR, the valid type for X is OpTypeBool.
Otherwise, for the GroupUniformArithmeticKHR scan and reduction instructions, valid types for X are:

Scalars of supported types:

OpTypeInt with Width equal to 32 or 64 (equivalent to int, uint, long, and ulong)
OpTypeFloat (equivalent to half, float, and double)

6. OpenCL Numerical Compliance
This section describes features of the C++14 and IEEE 754 standards that must be supported by all OpenCL-compliant devices.
This section describes the functionality that must be supported by all OpenCL devices for single precision floating-point numbers. Currently, only single precision floating-point is a requirement. Half precision floating-point is an optional feature indicated by the Float16 capability. Double precision floating-point is also an optional feature indicated by the Float64 capability.
6.1. Rounding Modes
Floating-point calculations may be carried out internally with extra precision and then rounded to fit into the destination type. IEEE 754 defines four possible rounding modes:

Round to nearest even

Round toward +infinity

Round toward -infinity

Round toward zero
The complete set of rounding modes supported by the device is described by the CL_DEVICE_SINGLE_FP_CONFIG, CL_DEVICE_HALF_FP_CONFIG, and CL_DEVICE_DOUBLE_FP_CONFIG device queries.
For double precision operations, Round to nearest even is a required rounding mode, and is therefore the default rounding mode for double precision operations.
For single precision operations, devices supporting the full profile must support Round to nearest even, therefore for full profile devices this is the default rounding mode for single precision operations. Devices supporting the embedded profile may support either Round to nearest even or Round toward zero as the default rounding mode for single precision operations.
For half precision operations, devices may support either Round to nearest even or Round toward zero as the default rounding mode for half precision operations.
Only static selection of rounding mode is supported. Dynamically reconfiguring the rounding mode as specified by the IEEE 754 spec is not supported.
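As an illustration of how a host application might interpret the result of these device queries, the sketch below picks a default single precision rounding mode from a configuration bitfield. It is not part of this specification: the `sketch_` names are hypothetical, the bit positions are assumed to mirror the `cl_device_fp_config` bits in `CL/cl.h`, and the choice to prefer Round to nearest even whenever it is reported is an assumption based on the text above.

```c
#include <assert.h> /* only needed by callers that check the result */
#include <stdint.h>

/* Assumption: same bit positions as CL_FP_ROUND_TO_NEAREST,
   CL_FP_ROUND_TO_ZERO and CL_FP_ROUND_TO_INF in CL/cl.h. */
#define SKETCH_FP_ROUND_TO_NEAREST (UINT64_C(1) << 2)
#define SKETCH_FP_ROUND_TO_ZERO    (UINT64_C(1) << 3)
#define SKETCH_FP_ROUND_TO_INF     (UINT64_C(1) << 4)

typedef enum { SKETCH_RTE, SKETCH_RTZ, SKETCH_UNKNOWN } sketch_rounding;

/* Returns the assumed default rounding mode for a config bitfield:
   Round to nearest even if reported, otherwise Round toward zero. */
sketch_rounding sketch_default_rounding(uint64_t fp_config) {
    if (fp_config & SKETCH_FP_ROUND_TO_NEAREST) return SKETCH_RTE;
    if (fp_config & SKETCH_FP_ROUND_TO_ZERO) return SKETCH_RTZ;
    return SKETCH_UNKNOWN; /* neither mode reported */
}
```

In a real application the bitfield would come from a device query such as CL_DEVICE_SINGLE_FP_CONFIG; here it is passed in directly so the sketch has no dependency on OpenCL headers.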
6.2. Rounding Modes for Conversions
Results of the following conversion instructions may include an optional FPRoundingMode decoration:

OpConvertFToU

OpConvertFToS

OpConvertSToF

OpConvertUToF

OpFConvert
The FPRoundingMode decoration may not be added to results of any other instruction.
If no rounding mode is specified explicitly via an FPRoundingMode decoration, then the default rounding mode for conversion operations is:

Round to nearest even, for conversions to floating-point types.

Round toward zero, for conversions from floating-point to integer types.
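The two defaults above happen to match what plain C casts do in the default floating-point environment, so they can be illustrated on the host. The following is a minimal sketch under that assumption; the function names are hypothetical and the code does not model the FPRoundingMode decoration itself.

```c
#include <assert.h>
#include <stdint.h>

/* Integer to floating-point: the cast rounds to nearest even, assuming the
   C floating-point environment has not been changed from its default. */
float sketch_convert_int_to_float(int32_t x) {
    volatile float r = (float)x; /* volatile discards any excess precision */
    return r;
}

/* Floating-point to integer: the cast truncates, i.e. rounds toward zero.
   The value must be in range; out-of-range inputs are a separate topic. */
int32_t sketch_convert_float_to_int(float x) {
    assert(x > -2147483648.0f && x < 2147483648.0f);
    return (int32_t)x;
}
```

For example, 16777217 (2^24 + 1) lies halfway between two representable floats and converts to 16777216.0f, while 2.7f and -2.7f convert to 2 and -2.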
6.3. Out-of-Range Conversions
When a conversion operand is either greater than the greatest representable destination value or less than the least representable destination value, it is said to be out-of-range.
Converting an out-of-range integer to an integer type without a SaturatedConversion decoration follows C99/C++14 conversion rules.
Converting an out-of-range floating-point number to an integer type without a SaturatedConversion decoration is implementation-defined.
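To make the difference between the two integer behaviors concrete, the sketch below contrasts a C99 conversion to an unsigned 8-bit type, which reduces the value modulo 2^8, with a clamping conversion of the kind a SaturatedConversion decoration requests. The helper names are hypothetical and the code only illustrates the arithmetic, not how an implementation lowers the decoration.

```c
#include <assert.h> /* only needed by callers that check the result */
#include <stdint.h>

/* C99 rule for unsigned destinations: the result is x modulo 2^8. */
uint8_t sketch_convert_uchar(int32_t x) {
    return (uint8_t)x;
}

/* Saturating variant: out-of-range inputs clamp to the nearest bound. */
uint8_t sketch_convert_uchar_sat(int32_t x) {
    if (x < 0) return 0;
    if (x > UINT8_MAX) return UINT8_MAX;
    return (uint8_t)x;
}
```

With these helpers, 300 wraps to 44 but saturates to 255, and -1 wraps to 255 but saturates to 0.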
6.4. INF, NaN, and Denormalized Numbers
INFs and NaNs must be supported. Support for signaling NaNs is not required.
Support for denormalized numbers with single precision and half precision floating-point is optional. Denormalized single precision or half precision floating-point numbers passed as the input or produced as the output of single precision or half precision floating-point operations may be flushed to zero. Support for denormalized numbers is required for double precision floating-point.
Support for INFs, NaNs, and denormalized numbers is described by the CL_FP_DENORM and CL_FP_INF_NAN bits in the CL_DEVICE_SINGLE_FP_CONFIG, CL_DEVICE_HALF_FP_CONFIG, and CL_DEVICE_DOUBLE_FP_CONFIG device queries.
6.5. Floating-Point Exceptions
Floating-point exceptions are disabled in OpenCL. The result of a floating-point exception must match the IEEE 754 spec for the exceptions-not-enabled case. Whether and when the implementation sets floating-point flags or raises floating-point exceptions is implementation-defined.
This standard provides no method for querying, clearing or setting floating-point flags or trapping raised exceptions. Due to non-performance, non-portability of trap mechanisms, and the impracticality of servicing precise exceptions in a vector context (especially on heterogeneous hardware), such features are discouraged.
Implementations that nevertheless support such operations through an extension to the standard shall initialize with all exception flags cleared and the exception masks set so that exceptions raised by arithmetic operations do not trigger a trap to be taken. If the underlying work-item is reused by the implementation, the implementation is however not responsible for re-clearing the flags or resetting exception masks to default values before entering the kernel. That is to say that kernels that do not inspect flags or enable traps are licensed to expect that their arithmetic will not trigger a trap. Those kernels that do examine flags or enable traps are responsible for clearing flag state and disabling all traps before returning control to the implementation. Whether or when the underlying work-item (and accompanying global floating-point state if any) is reused is implementation-defined.
6.6. Relative Error as ULPs
In this section we discuss the maximum relative error defined as ulp (units in the last place). Addition, subtraction, multiplication, fused multiply-add, and conversion between integer and a single precision floating-point format are IEEE 754 compliant and are therefore correctly rounded. Conversion between floating-point formats and explicit conversions must be correctly rounded.
The ULP is defined as follows:
If x is a real number that lies between two finite consecutive floating-point numbers a and b, without being equal to one of them, then ulp(x) = |b - a|, otherwise ulp(x) is the distance between the two non-equal finite floating-point numbers nearest x. Moreover, ulp(NaN) is NaN.
Attribution: This definition was taken with consent from Jean-Michel Muller with slight clarification for behavior at zero. Refer to: On the definition of ulp(x).
0 ULP is used for math functions that do not require rounding. The reference value used to compute the ULP value is the infinitely precise result.
Result overflow within the specified ULP error is permitted. Math instructions are allowed to return infinity for a finite reference value when the next floating-point number that would be representable after the finite maximum, if there was sufficient range, meets ULP error tolerance.
6.6.1. ULP Values for Math Instructions - Full Profile
The ULP Values for Math Instructions table below describes the minimum accuracy of floating-point math arithmetic instructions for full profile devices given as ULP values.
SPIR-V Instruction | Minimum Accuracy - Float64 | Minimum Accuracy - Float32 | Minimum Accuracy - Float16

OpFAdd 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpFSub 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpFMul 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpFDiv 
Correctly rounded 
<= 2.5 ulp 
Correctly rounded 
OpExtInst acos 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst acosh 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst acospi 
<= 5 ulp 
<= 5 ulp 
<= 2 ulp 
OpExtInst asin 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst asinh 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst asinpi 
<= 5 ulp 
<= 5 ulp 
<= 2 ulp 
OpExtInst atan 
<= 5 ulp 
<= 5 ulp 
<= 2 ulp 
OpExtInst atanh 
<= 5 ulp 
<= 5 ulp 
<= 2 ulp 
OpExtInst atanpi 
<= 5 ulp 
<= 5 ulp 
<= 2 ulp 
OpExtInst atan2 
<= 6 ulp 
<= 6 ulp 
<= 2 ulp 
OpExtInst atan2pi 
<= 6 ulp 
<= 6 ulp 
<= 2 ulp 
OpExtInst cbrt 
<= 2 ulp 
<= 2 ulp 
<= 2 ulp 
OpExtInst ceil 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst copysign 
0 ulp 
0 ulp 
0 ulp 
OpExtInst cos 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst cosh 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst cospi 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst cross 
absolute error tolerance of 'max * max * (3 * HALF_EPSILON)' per vector component, where max is the maximum input operand magnitude 
absolute error tolerance of 'max * max * (3 * FLT_EPSILON)' per vector component, where max is the maximum input operand magnitude 
absolute error tolerance of 'max * max * (3 * FLT_EPSILON)' per vector component, where max is the maximum input operand magnitude 
OpExtInst degrees 
<= 2 ulp 
<= 2 ulp 
<= 2 ulp 
OpExtInst distance 
<= 2n ulp, for gentype with vector width n 
<= 2.5 + 2n ulp, for gentype with vector width n 
<= 5.5 + 2n ulp, for gentype with vector width n 
OpExtInst dot 
absolute error tolerance of 'max * max * (2n - 1) * HALF_EPSILON', for vector width n and maximum input operand magnitude max across all vector components
absolute error tolerance of 'max * max * (2n - 1) * FLT_EPSILON', for vector width n and maximum input operand magnitude max across all vector components
absolute error tolerance of 'max * max * (2n - 1) * FLT_EPSILON', for vector width n and maximum input operand magnitude max across all vector components
OpExtInst erfc 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst erf 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst exp 
<= 3 ulp 
<= 3 ulp 
<= 2 ulp 
OpExtInst exp2 
<= 3 ulp 
<= 3 ulp 
<= 2 ulp 
OpExtInst exp10 
<= 3 ulp 
<= 3 ulp 
<= 2 ulp 
OpExtInst expm1 
<= 3 ulp 
<= 3 ulp 
<= 2 ulp 
OpExtInst fabs 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fclamp 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fdim 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst floor 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst fma 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst fmax 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fmax_common 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fmin 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fmin_common 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fmod 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fract 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst frexp 
0 ulp 
0 ulp 
0 ulp 
OpExtInst hypot 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst ilogb 
0 ulp 
0 ulp 
0 ulp 
OpExtInst ldexp 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst length 
<= 0.25 + 0.5n ulp, for gentype with vector width n 
<= 2.75 + 0.5n ulp, for gentype with vector width n 
<= 5.5 + n ulp, for gentype with vector width n 
OpExtInst lgamma 
Implementation-defined
Implementation-defined
Implementation-defined
OpExtInst lgamma_r
Implementation-defined
Implementation-defined
Implementation-defined
OpExtInst log 
<= 3 ulp 
<= 3 ulp 
<= 2 ulp 
OpExtInst log2 
<= 3 ulp 
<= 3 ulp 
<= 2 ulp 
OpExtInst log10 
<= 3 ulp 
<= 3 ulp 
<= 2 ulp 
OpExtInst log1p 
<= 2 ulp 
<= 2 ulp 
<= 2 ulp 
OpExtInst logb 
0 ulp 
0 ulp 
0 ulp 
OpExtInst mad 
Implemented either as a correctly rounded fma, or as a multiply followed by an add, both of which are correctly rounded 
Implemented either as a correctly rounded fma, or as a multiply followed by an add, both of which are correctly rounded 
Implemented either as a correctly rounded fma, or as a multiply followed by an add, both of which are correctly rounded 
OpExtInst maxmag 
0 ulp 
0 ulp 
0 ulp 
OpExtInst minmag 
0 ulp 
0 ulp 
0 ulp 
OpExtInst mix 
Implementation-defined
absolute error tolerance of 1e-3
Implementation-defined
OpExtInst modf 
0 ulp 
0 ulp 
0 ulp 
OpExtInst nan 
0 ulp 
0 ulp 
0 ulp 
OpExtInst nextafter 
0 ulp 
0 ulp 
0 ulp 
OpExtInst normalize 
<= 1 + n ulp, for gentype with vector width n 
<= 2 + n ulp, for gentype with vector width n 
<= 4.5 + n ulp, for gentype with vector width n 
OpExtInst pow 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst pown 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst powr 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst radians 
<= 2 ulp 
<= 2 ulp 
<= 2 ulp 
OpExtInst remainder 
0 ulp 
0 ulp 
0 ulp 
OpExtInst remquo 
0 ulp for the remainder, at least the lower 7 bits of the integral quotient 
0 ulp for the remainder, at least the lower 7 bits of the integral quotient 
0 ulp for the remainder, at least the lower 7 bits of the integral quotient 
OpExtInst rint 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst rootn 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst round 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst rsqrt 
<= 2 ulp 
<= 2 ulp 
<= 1 ulp 
OpExtInst sign 
0 ulp 
0 ulp 
0 ulp 
OpExtInst sin 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst sincos 
<= 4 ulp for sine and cosine values 
<= 4 ulp for sine and cosine values 
<= 2 ulp for sine and cosine values 
OpExtInst sinh 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst sinpi 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst smoothstep 
Implementation-defined
absolute error tolerance of 1e-5
Implementation-defined
OpExtInst sqrt 
Correctly rounded 
<= 3 ulp 
Correctly rounded 
OpExtInst step 
0 ulp 
0 ulp 
0 ulp 
OpExtInst tan 
<= 5 ulp 
<= 5 ulp 
<= 2 ulp 
OpExtInst tanh 
<= 5 ulp 
<= 5 ulp 
<= 2 ulp 
OpExtInst tanpi 
<= 6 ulp 
<= 6 ulp 
<= 2 ulp 
OpExtInst tgamma 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst trunc 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst half_cos 
<= 8192 ulp 

OpExtInst half_divide 
<= 8192 ulp 

OpExtInst half_exp 
<= 8192 ulp 

OpExtInst half_exp2 
<= 8192 ulp 

OpExtInst half_exp10 
<= 8192 ulp 

OpExtInst half_log 
<= 8192 ulp 

OpExtInst half_log2 
<= 8192 ulp 

OpExtInst half_log10 
<= 8192 ulp 

OpExtInst half_powr 
<= 8192 ulp 

OpExtInst half_recip 
<= 8192 ulp 

OpExtInst half_rsqrt 
<= 8192 ulp 

OpExtInst half_sin 
<= 8192 ulp 

OpExtInst half_sqrt 
<= 8192 ulp 

OpExtInst half_tan 
<= 8192 ulp 

OpExtInst fast_distance 
<= 8191.5 + 2n ulp, for gentype with vector width n 

OpExtInst fast_length 
<= 8191.5 + n ulp, for gentype with vector width n 

OpExtInst fast_normalize 
<= 8192 + n ulp, for gentype with vector width n 

OpExtInst native_cos
Implementation-defined

OpExtInst native_divide
Implementation-defined

OpExtInst native_exp
Implementation-defined

OpExtInst native_exp2
Implementation-defined

OpExtInst native_exp10
Implementation-defined

OpExtInst native_log
Implementation-defined

OpExtInst native_log2
Implementation-defined

OpExtInst native_log10
Implementation-defined

OpExtInst native_powr
Implementation-defined

OpExtInst native_recip
Implementation-defined

OpExtInst native_rsqrt
Implementation-defined

OpExtInst native_sin
Implementation-defined

OpExtInst native_sqrt
Implementation-defined

OpExtInst native_tan
Implementation-defined
6.6.2. ULP Values for Math Instructions - Embedded Profile
The ULP Values for Math Instructions for Embedded Profile table below describes the minimum accuracy of floating-point math arithmetic operations given as ULP values for the embedded profile.
SPIR-V Instruction | Minimum Accuracy - Float64 | Minimum Accuracy - Float32 | Minimum Accuracy - Float16

OpFAdd 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpFSub 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpFMul 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpFDiv 
<= 3 ulp 
<= 3 ulp 
<= 1 ulp 
OpExtInst acos 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst acosh 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst acospi 
<= 5 ulp 
<= 5 ulp 
<= 3 ulp 
OpExtInst asin 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst asinh 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst asinpi 
<= 5 ulp 
<= 5 ulp 
<= 3 ulp 
OpExtInst atan 
<= 5 ulp 
<= 5 ulp 
<= 3 ulp 
OpExtInst atanh 
<= 5 ulp 
<= 5 ulp 
<= 3 ulp 
OpExtInst atanpi 
<= 5 ulp 
<= 5 ulp 
<= 3 ulp 
OpExtInst atan2 
<= 6 ulp 
<= 6 ulp 
<= 3 ulp 
OpExtInst atan2pi 
<= 6 ulp 
<= 6 ulp 
<= 3 ulp 
OpExtInst cbrt 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst ceil 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst copysign 
0 ulp 
0 ulp 
0 ulp 
OpExtInst cos 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst cosh 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst cospi 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst degrees 
<= 2 ulp 
<= 2 ulp 
<= 2 ulp 
OpExtInst erfc 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst erf 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst exp 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst exp2 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst exp10 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst expm1 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst fabs 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fclamp 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fdim 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst floor 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst fma 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst fmax 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fmax_common 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fmin 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fmin_common 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fmod 
0 ulp 
0 ulp 
0 ulp 
OpExtInst fract 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst frexp 
0 ulp 
0 ulp 
0 ulp 
OpExtInst hypot 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst ilogb 
0 ulp 
0 ulp 
0 ulp 
OpExtInst ldexp 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst lgamma 
Implementation-defined
Implementation-defined
Implementation-defined
OpExtInst lgamma_r
Implementation-defined
Implementation-defined
Implementation-defined
OpExtInst log 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst log2 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst log10 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst log1p 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst logb 
0 ulp 
0 ulp 
0 ulp 
OpExtInst mad 
Implementation-defined
Implementation-defined
Implementation-defined
OpExtInst maxmag 
0 ulp 
0 ulp 
0 ulp 
OpExtInst minmag 
0 ulp 
0 ulp 
0 ulp 
OpExtInst mix 
Implementation-defined
Implementation-defined
Implementation-defined
OpExtInst modf 
0 ulp 
0 ulp 
0 ulp 
OpExtInst nan 
0 ulp 
0 ulp 
0 ulp 
OpExtInst nextafter 
0 ulp 
0 ulp 
0 ulp 
OpExtInst pow 
<= 16 ulp 
<= 16 ulp 
<= 5 ulp 
OpExtInst pown 
<= 16 ulp 
<= 16 ulp 
<= 5 ulp 
OpExtInst powr 
<= 16 ulp 
<= 16 ulp 
<= 5 ulp 
OpExtInst radians 
<= 2 ulp 
<= 2 ulp 
<= 2 ulp 
OpExtInst remainder 
0 ulp 
0 ulp 
0 ulp 
OpExtInst remquo 
0 ulp for the remainder, at least the lower 7 bits of the integral quotient 
0 ulp for the remainder, at least the lower 7 bits of the integral quotient 
0 ulp for the remainder, at least the lower 7 bits of the integral quotient 
OpExtInst rint 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst rootn 
<= 16 ulp 
<= 16 ulp 
<= 5 ulp 
OpExtInst round 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst rsqrt 
<= 4 ulp 
<= 4 ulp 
<= 1 ulp 
OpExtInst sign 
0 ulp 
0 ulp 
0 ulp 
OpExtInst sin 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst sincos 
<= 4 ulp for sine and cosine values 
<= 4 ulp for sine and cosine values 
<= 2 ulp for sine and cosine values 
OpExtInst sinh 
<= 4 ulp 
<= 4 ulp 
<= 3 ulp 
OpExtInst sinpi 
<= 4 ulp 
<= 4 ulp 
<= 2 ulp 
OpExtInst smoothstep 
Implementation-defined
Implementation-defined
Implementation-defined
OpExtInst sqrt 
<= 4 ulp 
<= 4 ulp 
<= 1 ulp 
OpExtInst step 
0 ulp 
0 ulp 
0 ulp 
OpExtInst tan 
<= 5 ulp 
<= 5 ulp 
<= 3 ulp 
OpExtInst tanh 
<= 5 ulp 
<= 5 ulp 
<= 3 ulp 
OpExtInst tanpi 
<= 6 ulp 
<= 6 ulp 
<= 3 ulp 
OpExtInst tgamma 
<= 16 ulp 
<= 16 ulp 
<= 4 ulp 
OpExtInst trunc 
Correctly rounded 
Correctly rounded 
Correctly rounded 
OpExtInst half_cos 
<= 8192 ulp 

OpExtInst half_divide 
<= 8192 ulp 

OpExtInst half_exp 
<= 8192 ulp 

OpExtInst half_exp2 
<= 8192 ulp 

OpExtInst half_exp10 
<= 8192 ulp 

OpExtInst half_log 
<= 8192 ulp 

OpExtInst half_log2 
<= 8192 ulp 

OpExtInst half_log10 
<= 8192 ulp 

OpExtInst half_powr 
<= 8192 ulp 

OpExtInst half_recip 
<= 8192 ulp 

OpExtInst half_rsqrt 
<= 8192 ulp 

OpExtInst half_sin 
<= 8192 ulp 

OpExtInst half_sqrt 
<= 8192 ulp 

OpExtInst half_tan 
<= 8192 ulp 

OpExtInst native_cos
Implementation-defined

OpExtInst native_divide
Implementation-defined

OpExtInst native_exp
Implementation-defined

OpExtInst native_exp2
Implementation-defined

OpExtInst native_exp10
Implementation-defined

OpExtInst native_log
Implementation-defined

OpExtInst native_log2
Implementation-defined

OpExtInst native_log10
Implementation-defined

OpExtInst native_powr
Implementation-defined

OpExtInst native_recip
Implementation-defined

OpExtInst native_rsqrt
Implementation-defined

OpExtInst native_sin
Implementation-defined

OpExtInst native_sqrt
Implementation-defined

OpExtInst native_tan
Implementation-defined
6.6.3. ULP Values for Math Instructions - Unsafe Math Optimizations Enabled
The ULP Values for Math Instructions with Unsafe Math Optimizations table below describes the minimum accuracy of commonly used single precision floating-point math arithmetic instructions given as ULP values if the -cl-unsafe-math-optimizations compiler option is specified when compiling or building the OpenCL program.
For derived implementations, the operations used in the derivation may themselves be relaxed according to the ULP Values for Math Instructions with Unsafe Math Optimizations table.
The minimum accuracy of math functions not defined in the ULP Values for Math Instructions with Unsafe Math Optimizations table when the -cl-unsafe-math-optimizations compiler option is specified is as defined in the ULP Values for Math Instructions for Full Profile table when operating in the full profile, and as defined in the ULP Values for Math Instructions for Embedded Profile table when operating in the embedded profile.
Function | Minimum Accuracy

OpFDiv for 1.0 / x
≤ 2.5 ulp for x in the domain of 2^{-126} to 2^{126} for the full profile, and ≤ 3 ulp for the embedded profile.
OpFDiv for x / y
≤ 2.5 ulp for x in the domain of 2^{-62} to 2^{62} and y in the domain of 2^{-62} to 2^{62} for the full profile, and ≤ 3 ulp for the embedded profile.
OpExtInst acos
≤ 4096 ulp
OpExtInst acosh
Derived implementations may implement as log(x + sqrt(x * x - 1)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst acospi
Derived implementations may implement as acos(x) * 
OpExtInst asin
≤ 4096 ulp
OpExtInst asinh
Derived implementations may implement as log(x + sqrt(x * x + 1)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst asinpi
Derived implementations may implement as asin(x) * 
OpExtInst atan
≤ 4096 ulp
OpExtInst atanh
Defined for x in the domain (-1, 1). For x in [-2^{-10}, 2^{-10}], derived implementations may implement as x. For x outside of [-2^{-10}, 2^{-10}], derived implementations may implement as 0.5f * log((1.0f + x) / (1.0f - x)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst atanpi
Derived implementations may implement as atan(x) * 
OpExtInst atan2
Derived implementations may implement as atan(y / x) for x > 0, atan(y / x) + 
OpExtInst atan2pi
Derived implementations may implement as atan2(y, x) * 
OpExtInst cbrt
Derived implementations may implement as rootn(x, 3). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst cos
For x in the domain [-π, π], the maximum absolute error is ≤ 2^{-11} and larger otherwise.
OpExtInst cosh
Defined for x in the domain [-88, 88]. Derived implementations may implement as 0.5f * (exp(x) + exp(-x)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst cospi
For x in the domain [-1, 1], the maximum absolute error is ≤ 2^{-11} and larger otherwise.
OpExtInst exp
≤ 3 + floor(fabs(2 * x)) ulp for the full profile, and ≤ 4 ulp for the embedded profile.
OpExtInst exp2
≤ 3 + floor(fabs(2 * x)) ulp for the full profile, and ≤ 4 ulp for the embedded profile.
OpExtInst exp10
Derived implementations may implement as exp2(x * log2(10)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst expm1
Derived implementations may implement as exp(x) - 1. For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst log
For x in the domain [0.5, 2] the maximum absolute error is ≤ 2^{-21}; otherwise the maximum error is ≤ 3 ulp for the full profile and ≤ 4 ulp for the embedded profile.
OpExtInst log2
For x in the domain [0.5, 2] the maximum absolute error is ≤ 2^{-21}; otherwise the maximum error is ≤ 3 ulp for the full profile and ≤ 4 ulp for the embedded profile.
OpExtInst log10
For x in the domain [0.5, 2] the maximum absolute error is ≤ 2^{-21}; otherwise the maximum error is ≤ 3 ulp for the full profile and ≤ 4 ulp for the embedded profile.
OpExtInst log1p
Derived implementations may implement as log(x + 1). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst pow
Undefined for x = 0 and y = 0. Undefined for x < 0 and non-integer y. Undefined for x < 0 and y outside the domain [-2^{24}, 2^{24}]. For x > 0 or x < 0 and even y, derived implementations may implement as exp2(y * log2(fabs(x))). For x < 0 and odd y, derived implementations may implement as -exp2(y * log2(fabs(x))). For x == 0 and non-zero y, derived implementations may return zero. For non-derived implementations, the error is ≤ 8192 ulp. On some implementations, powr() or pown() may perform faster than pow(). If x is known to be >= 0, consider using powr() in place of pow(), or if y is known to be an integer, consider using pown() in place of pow().
OpExtInst pown
Defined only for integer values of y. Undefined for x = 0 and y = 0. For x >= 0 or x < 0 and even y, derived implementations may implement as exp2(y * log2(fabs(x))). For x < 0 and odd y, derived implementations may implement as -exp2(y * log2(fabs(x))). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst powr
Defined only for x >= 0. Undefined for x = 0 and y = 0. Derived implementations may implement as exp2(y * log2(x)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst rootn
Defined for x > 0 when y is non-zero, derived implementations may implement this case as exp2(log2(x) / y). Defined for x < 0 when y is odd, derived implementations may implement this case as -exp2(log2(-x) / y). Defined for x = +/-0 when y > 0, derived implementations may return +0 in this case. For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst sin
For x in the domain [-π, π], the maximum absolute error is ≤ 2^{-11} and larger otherwise.
OpExtInst sincos
ulp values as defined for sin(x) and cos(x).
OpExtInst sinh
Defined for x in the domain [-88, 88]. For x in [-2^{-10}, 2^{-10}], derived implementations may implement as x. For x outside of [-2^{-10}, 2^{-10}], derived implementations may implement as 0.5f * (exp(x) - exp(-x)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst sinpi
For x in the domain [-1, 1], the maximum absolute error is ≤ 2^{-11} and larger otherwise.
OpExtInst tan
Derived implementations may implement as sin(x) * (1.0f / cos(x)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst tanh
Defined for x in the domain [-∞, ∞]. For x in [-2^{-10}, 2^{-10}], derived implementations may implement as x. For x outside of [-2^{-10}, 2^{-10}], derived implementations may implement as (exp(x) - exp(-x)) / (exp(x) + exp(-x)). For non-derived implementations, the error is ≤ 8192 ulp.
OpExtInst tanpi
Derived implementations may implement as tan(x * 
OpFMul and OpFAdd, 
Implemented either as a correctly rounded fma or as a multiply and an add, both of which are correctly rounded.
6.7. Edge Case Behavior
The edge case behavior of the math functions shall conform to sections F.9 and G.6 of ISO/IEC 9899:TC2, except where noted below in the Additional Requirements Beyond ISO/IEC 9899:TC2 section.
6.7.1. Additional Requirements Beyond ISO/IEC 9899:TC2
All functions that return a NaN should return a quiet NaN.
The usual allowances for rounding error (Relative Error as ULPs section) or flushing behavior (Edge Case Behavior in Flush To Zero Mode section) shall not apply for those values for which section F.9 of ISO/IEC 9899:TC2, or the Additional Requirements Beyond ISO/IEC 9899:TC2 and Edge Case Behavior in Flush To Zero Mode sections below (and similar sections for other floating-point precisions) prescribe a result (e.g. ceil( -1 < x < 0 ) returns -0). Those values shall produce exactly the prescribed answers, and no other. Where the ± symbol is used, the sign shall be preserved. For example, sin(±0) = ±0 shall be interpreted to mean sin(+0) is +0 and sin(-0) is -0.
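Because +0 and -0 compare equal, checking these sign-preserving edge cases requires looking at the sign bit rather than using ==. The helper below is a hypothetical sketch of such a check, using the host C math library as a stand-in for the device functions.

```c
#include <assert.h> /* only needed by callers that check the result */
#include <math.h>

/* Returns nonzero when got and want are both zero and carry the same sign. */
int sketch_same_signed_zero(float got, float want) {
    return got == 0.0f && want == 0.0f &&
           (signbit(got) != 0) == (signbit(want) != 0);
}
```

A test for sin(-0) = -0 would then compare the result against -0.0f with this helper instead of with a plain equality.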

OpExtInst acospi:

acospi( 1 ) = +0.

acospi( x ) returns a NaN for | x | > 1.


OpExtInst asinpi:

asinpi( ±0 ) = ±0.

asinpi( x ) returns a NaN for | x | > 1.


OpExtInst atanpi:

atanpi( ±0 ) = ±0.

atanpi( ±∞ ) = ±0.5.


OpExtInst atan2pi:

atan2pi( ±0, -0 ) = ±1.

atan2pi( ±0, +0 ) = ±0.

atan2pi( ±0, x ) returns ±1 for x < 0.

atan2pi( ±0, x ) returns ±0 for x > 0.

atan2pi( y, ±0 ) returns -0.5 for y < 0.

atan2pi( y, ±0 ) returns 0.5 for y > 0.

atan2pi( ±y, -∞ ) returns ±1 for finite y > 0.

atan2pi( ±y, +∞ ) returns ±0 for finite y > 0.

atan2pi( ±∞, x ) returns ±0.5 for finite x.

atan2pi( ±∞, -∞ ) returns ±0.75.

atan2pi( ±∞, +∞ ) returns ±0.25.


OpExtInst ceil:

ceil( -1 < x < 0 ) returns -0.


OpExtInst cospi:

cospi( ±0 ) returns 1.

cospi( n + 0.5 ) is +0 for any integer n where n + 0.5 is representable.

cospi( ±∞ ) returns a NaN.


OpExtInst exp10:

exp10( ±0 ) returns 1.

exp10( -∞ ) returns +0.

exp10( +∞ ) returns +∞.


OpExtInst distance:

distance(x, y) calculates the distance from x to y without overflow or extraordinary precision loss due to underflow.


OpExtInst fdim:

fdim( any, NaN ) returns NaN.

fdim( NaN, any ) returns NaN.


OpExtInst fmod:

fmod( ±0, NaN ) returns NaN.


OpExtInst fract:

fract( x, iptr) shall not return a value greater than or equal to 1.0, and shall not return a value less than 0.

fract( +0, iptr ) returns +0 and +0 in iptr.

fract( -0, iptr ) returns -0 and -0 in iptr.

fract( +inf, iptr ) returns +0 and +inf in iptr.

fract( -inf, iptr ) returns -0 and -inf in iptr.

fract( NaN, iptr ) returns the NaN and NaN in iptr.


OpExtInst frexp:

frexp( ±∞, exp ) returns ±∞ and stores 0 in exp.

frexp( NaN, exp ) returns the NaN and stores 0 in exp.


OpExtInst length:

length calculates the length of a vector without overflow or extraordinary precision loss due to underflow.


OpExtInst lgamma_r:

lgamma_r( x, signp ) returns 0 in signp if x is zero or a negative integer.


OpExtInst nextafter:

nextafter( -0, y > 0 ) returns smallest positive denormal value.

nextafter( +0, y < 0 ) returns smallest negative denormal value.


OpExtInst normalize:

normalize shall reduce the vector to unit length, pointing in the same direction without overflow or extraordinary precision loss due to underflow.

normalize( v ) returns v if all elements of v are zero.

normalize( v ) returns a vector full of NaNs if any element is a NaN.

normalize( v ) for which any element in v is infinite shall proceed as if the elements in v were replaced as follows:
for( i = 0; i < sizeof(v) / sizeof(v[0]); i++ )
    v[i] = isinf(v[i]) ? copysign(1.0, v[i]) : 0.0 * v[i];


OpExtInst pow:

pow( ±0, -∞ ) returns +∞.


OpExtInst pown:

pown( x, 0 ) is 1 for any x, even zero, NaN or infinity.

pown( ±0, n ) is ±∞ for odd n < 0.

pown( ±0, n ) is +∞ for even n < 0.

pown( ±0, n ) is +0 for even n > 0.

pown( ±0, n ) is ±0 for odd n > 0.


OpExtInst powr:

powr( x, ±0 ) is 1 for finite x > 0.

powr( ±0, y ) is +∞ for finite y < 0.

powr( ±0, -∞ ) is +∞.

powr( ±0, y ) is +0 for y > 0.

powr( +1, y ) is 1 for finite y.

powr( x, y ) returns NaN for x < 0.

powr( ±0, ±0 ) returns NaN.

powr( +∞, ±0 ) returns NaN.

powr( +1, ±∞ ) returns NaN.

powr( x, NaN ) returns the NaN for x >= 0.

powr( NaN, y ) returns the NaN.


OpExtInst rint:

rint( -0.5 <= x < 0 ) returns -0.


OpExtInst remquo:

remquo(x, y, &quo) returns a NaN and 0 in quo if x is ±∞, or if y is 0 and the other argument is non-NaN, or if either argument is a NaN.


OpExtInst rootn:

rootn( ±0, n ) is ±∞ for odd n < 0.

rootn( ±0, n ) is +∞ for even n < 0.

rootn( ±0, n ) is +0 for even n > 0.

rootn( ±0, n ) is ±0 for odd n > 0.

rootn( x, n ) returns a NaN for x < 0 and n is even.

rootn( x, 0 ) returns a NaN.


OpExtInst round:

round( -0.5 < x < 0 ) returns -0.


OpExtInst sinpi:

sinpi( ±0 ) returns ±0.

sinpi( +n ) returns +0 for positive integers n.

sinpi( n ) returns -0 for negative integers n.

sinpi( ±∞ ) returns a NaN.


OpExtInst tanpi:

tanpi( ±0 ) returns ±0.

tanpi( ±∞ ) returns a NaN.

tanpi( n ) is copysign( 0.0, n ) for even integers n.

tanpi( n ) is copysign( 0.0, -n ) for odd integers n.

tanpi( n + 0.5 ) for even integer n is +∞ where n + 0.5 is representable.

tanpi( n + 0.5 ) for odd integer n is -∞ where n + 0.5 is representable.


OpExtInst trunc:

trunc( -1 < x < 0 ) returns -0.

6.7.2. Changes to ISO/IEC 9899:TC2 Behavior
OpExtInst modf behaves as though implemented by:
gentype modf( gentype value, gentype *iptr )
{
    *iptr = trunc( value );
    return copysign( isinf( value ) ? 0.0 : value - *iptr, value );
}
OpExtInst rint always rounds according to round to nearest even rounding mode even if the caller is in some other rounding mode.
6.7.3. Edge Case Behavior in Flush To Zero Mode
If denormals are flushed to zero, then a function may return one of four results:

1. Any conforming result for non-flush-to-zero mode.

2. If the result given by 1 is a subnormal before rounding, it may be flushed to zero.

3. Any non-flushed conforming result for the function if one or more of its subnormal operands are flushed to zero.

4. If the result of 3 is a subnormal before rounding, the result may be flushed to zero.
In each of the above cases, if an operand or result is flushed to zero, the sign of the zero is undefined.
If subnormals are flushed to zero, a device may choose to conform to the following edge cases for OpExtInst nextafter instead of those listed in Additional Requirements Beyond ISO/IEC 9899:TC2 section:

nextafter ( +smallest normal, y < +smallest normal ) = +0.

nextafter ( -smallest normal, y > -smallest normal ) = -0.

nextafter ( -0, y > 0 ) returns smallest positive normal value.

nextafter ( +0, y < 0 ) returns smallest negative normal value.
For clarity, subnormals or denormals are defined to be the set of representable numbers in the range 0 < x < TYPE_MIN and -TYPE_MIN < x < -0. They do not include ±0. A non-zero number is said to be subnormal before rounding if, after normalization, its radix-2 exponent is less than (TYPE_MIN_EXP - 1). ^{[1]}
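As a minimal sketch of the definition above for single precision, taking TYPE_MIN to be `FLT_MIN`: the helper names are hypothetical, and preserving the sign of the flushed zero is only one permitted choice, since the sign of a flushed zero is undefined.

```c
#include <float.h>
#include <math.h>

/* Returns 1 when x lies in (0, FLT_MIN) or (-FLT_MIN, -0); zeros, normals,
 * infinities and NaNs all return 0 (comparisons with NaN are false). */
static int is_subnormal_f32(float x)
{
    return x != 0.0f && fabsf(x) < FLT_MIN;
}

/* Illustrative flush-to-zero of an operand or result. This sketch keeps the
 * sign of x on the resulting zero; an implementation may return either sign. */
static float flush_to_zero_f32(float x)
{
    return is_subnormal_f32(x) ? copysignf(0.0f, x) : x;
}
```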
7. Image Addressing and Filtering
This section describes how image operations behave in an OpenCL environment.
7.1. Image Coordinates
Let w_{t}, h_{t} and d_{t} be the width, height (or image array size for a 1D image array) and depth (or image array size for a 2D image array) of the image in pixels.
Let coord.xy (also referred to as (s,t)) or coord.xyz (also referred to as (s,t,r)) be the coordinates specified to an image read instruction (such as OpImageRead) or an image write instruction (such as OpImageWrite).
If image coordinates specified to an image read instruction are normalized (as specified in the sampler), the s, t, and r coordinate values are multiplied by w_{t}, h_{t} and d_{t} respectively to generate the unnormalized coordinate values.
For image arrays, the image array coordinate (i.e. t if it is a 1D image array or r if it is a 2D image array) specified to the image read instruction must always be the unnormalized image coordinate value.
Image coordinates specified to an image write instruction are always unnormalized image coordinate values.
Let (u,v,w) represent the unnormalized image coordinate values.
If values in (s,t,r) or (u,v,w) are INF or NaN, the behavior of the image read instruction or image write instruction is undefined.
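The unnormalization step can be illustrated with a short C sketch. The struct and function names are hypothetical, and the finiteness check is only a convenience for callers, because the specification leaves INF and NaN coordinates undefined rather than requiring any check.

```c
#include <math.h>

typedef struct { float u, v, w; } uvw_coords;   /* hypothetical container */

/* Multiplies normalized (s, t, r) by the image dimensions (wt, ht, dt). */
static uvw_coords unnormalize_coords(float s, float t, float r,
                                     int wt, int ht, int dt)
{
    uvw_coords c = { s * (float)wt, t * (float)ht, r * (float)dt };
    return c;
}

/* Returns 1 when no component is INF or NaN. */
static int coords_are_finite(uvw_coords c)
{
    return isfinite(c.u) && isfinite(c.v) && isfinite(c.w);
}
```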
7.2. Addressing and Filter Modes
After generating the image coordinate (u,v,w) we apply the appropriate addressing and filter mode to generate the appropriate sample locations to read from the image.
7.2.1. Clamp and None Addressing Modes
We first describe how the addressing and filter modes are applied to generate the appropriate sample locations to read from the image if the addressing mode is CL_ADDRESS_CLAMP, CL_ADDRESS_CLAMP_TO_EDGE, or CL_ADDRESS_NONE.
7.2.1.1. Nearest Filtering
When the filter mode is CL_FILTER_NEAREST, the result of the image read instruction is the image element that is nearest (in Manhattan distance) to the image element location (i,j,k).
The image element location (i,j,k) is computed as:
For a 3D image, the image element at location (i,j,k) becomes the color value.
For a 2D image, the image element at location (i,j) becomes the color value.
The below table describes the address_mode function.
Addressing Mode | Result of address_mode(coord)
CL_ADDRESS_CLAMP | clamp (coord, -1, size)
CL_ADDRESS_CLAMP_TO_EDGE | clamp (coord, 0, size - 1)
CL_ADDRESS_NONE | coord
The size term in the table above is w_{t} for u, h_{t} for v and d_{t} for w.
The clamp function used in the table above is defined as:
If the addressing mode is CL_ADDRESS_CLAMP or CL_ADDRESS_CLAMP_TO_EDGE, and the selected texel location (i,j,k) refers to a location outside the image, the border color is used as the color value for the texel.
Otherwise, if the addressing mode is CL_ADDRESS_NONE and the selected texel location (i,j,k) refers to a location outside the image, the color value for the texel is undefined.
7.2.1.2. Linear Filtering
When the filter mode is CL_FILTER_LINEAR, a 2 x 2 square of image elements (for a 2D image) or a 2 x 2 x 2 cube of image elements (for a 3D image) is selected.
This 2 x 2 square or 2 x 2 x 2 cube is obtained as follows.
Let:
The frac function determines the fractional part of x and is computed as:
For a 3D image, the color value is computed as:
where T_{ijk} is the image element at location (i,j,k) in the 3D image.
For a 2D image, the color value is computed as:
where T_{ij} is the image element at location (i,j) in the 2D image.
If the addressing mode is CL_ADDRESS_CLAMP or CL_ADDRESS_CLAMP_TO_EDGE, and any of the selected T_{ijk} or T_{ij} refers to a location outside the image, the border color is used as the image element.
Otherwise, if the addressing mode is CL_ADDRESS_NONE, and any of the selected T_{ijk} or T_{ij} refers to a location outside the image, the color value is undefined.
If the image channel type is CL_FLOAT or CL_HALF_FLOAT, and any of the image elements T_{ijk} or T_{ij} is INF or NaN, the color value is undefined.
7.2.2. Repeat Addressing Mode
We now discuss how the addressing and filter modes are applied to generate the appropriate sample locations to read from the image if the addressing mode is CL_ADDRESS_REPEAT.
7.2.2.1. Nearest Filtering
When filter mode is CL_FILTER_NEAREST, the result of the image read instruction is the image element that is nearest (in Manhattan distance) to the image element location (i,j,k).
The image element location (i,j,k) is computed as:
For a 3D image, the image element at location (i, j, k) becomes the color value. For a 2D image, the image element at location (i, j) becomes the color value.
7.2.2.2. Linear Filtering
When filter mode is CL_FILTER_LINEAR, a 2 x 2 square of image elements for a 2D image or a 2 x 2 x 2 cube of image elements for a 3D image is selected.
This 2 x 2 square or 2 x 2 x 2 cube is obtained as follows.
Let
For a 3D image, the color value is computed as:
where T_{ijk} is the image element at location (i,j,k) in the 3D image.
For a 2D image, the color value is computed as:
where T_{ij} is the image element at location (i,j) in the 2D image.
If the image channel type is CL_FLOAT or CL_HALF_FLOAT, and any of the image elements T_{ijk} or T_{ij} is INF or NaN, the color value is undefined.
7.2.3. Mirrored Repeat Addressing Mode
We now discuss how the addressing and filter modes are applied to generate the appropriate sample locations to read from the image if the addressing mode is CL_ADDRESS_MIRRORED_REPEAT.
The CL_ADDRESS_MIRRORED_REPEAT addressing mode causes the image to be read as if it is tiled at every integer seam, with the interpretation of the image data flipped at each integer crossing.
7.2.3.1. Nearest Filtering
When filter mode is CL_FILTER_NEAREST, the result of the image read instruction is the image element that is nearest (in Manhattan distance) to the image element location (i,j,k).
The image element location (i,j,k) is computed as:
For a 3D image, the image element at location (i, j, k) becomes the color value. For a 2D image, the image element at location (i, j) becomes the color value.
7.2.3.2. Linear Filtering
When filter mode is CL_FILTER_LINEAR, a 2 x 2 square of image elements for a 2D image or a 2 x 2 x 2 cube of image elements for a 3D image is selected.
This 2 x 2 square or 2 x 2 x 2 cube is obtained as follows.
Let
For a 3D image, the color value is computed as:
where T_{ijk} is the image element at location (i,j,k) in the 3D image.
For a 2D image, the color value is computed as:
where T_{ij} is the image element at location (i,j) in the 2D image.
For a 1D image, the color value is computed as:
where T_{i} is the image element at location (i) in the 1D image.
If the image channel type is CL_FLOAT or CL_HALF_FLOAT, and any of the image elements T_{ijk} or T_{ij} is INF or NaN, the color value is undefined.
7.3. Precision of Addressing and Filter Modes
If the sampler is specified as using unnormalized coordinates (floating-point or integer coordinates), filter mode set to CL_FILTER_NEAREST and addressing mode set to one of the following modes - CL_ADDRESS_CLAMP, CL_ADDRESS_CLAMP_TO_EDGE or CL_ADDRESS_NONE - the location of the image element in the image given by (i,j,k) will be computed without any loss of precision.
For all other sampler combinations of normalized or unnormalized coordinates, filter modes, and addressing modes, the relative error or precision of the addressing mode calculations and the image filter operation are not defined.
To ensure precision of image addressing and filter calculations across any OpenCL device for these sampler combinations, developers may unnormalize the image coordinate in the kernel, implement the linear filter in the kernel with appropriate read image instructions using a sampler that has unnormalized coordinates, filter mode set to CL_FILTER_NEAREST, and addressing mode set to CL_ADDRESS_CLAMP, CL_ADDRESS_CLAMP_TO_EDGE or CL_ADDRESS_NONE, and finally perform the interpolation of the color values read from the image to generate the filtered color value.
7.4. Conversion Rules
In this section we discuss conversion rules that are applied when reading and writing images in a kernel.
7.4.1. Conversion Rules for Normalized Integer Channel Data Types
In this section we discuss converting normalized integer channel data types to half-precision and single-precision floating-point values and vice-versa.
7.4.1.1. Converting Normalized Integer Channel Data Types to Half Precision Floating-point Values
For images created with image channel data type of CL_UNORM_INT8 and CL_UNORM_INT16, image read instructions will convert the channel values from an 8-bit or 16-bit unsigned integer to normalized half precision floating-point values in the range [0.0h … 1.0h].
For images created with image channel data type of CL_SNORM_INT8 and CL_SNORM_INT16, image read instructions will convert the channel values from an 8-bit or 16-bit signed integer to normalized half precision floating-point values in the range [-1.0h … 1.0h].
These conversions are performed as follows:

CL_UNORM_INT8 (8-bit unsigned integer) → half
\[normalized\_half\_value(x)=round\_to\_half(\frac{x}{255})\]

CL_UNORM_INT_101010 (10-bit unsigned integer) → half
\[normalized\_half\_value(x)=round\_to\_half(\frac{x}{1023})\]

CL_UNORM_INT16 (16-bit unsigned integer) → half
\[normalized\_half\_value(x)=round\_to\_half(\frac{x}{65535})\]

CL_SNORM_INT8 (8-bit signed integer) → half
\[normalized\_half\_value(x)=max(-1.0h, round\_to\_half(\frac{x}{127}))\]

CL_SNORM_INT16 (16-bit signed integer) → half
\[normalized\_half\_value(x)=max(-1.0h, round\_to\_half(\frac{x}{32767}))\]
The precision of the above conversions is <= 1.5 ulp except for the following cases:
For CL_UNORM_INT8:

0 must convert to 0.0h, and

255 must convert to 1.0h
For CL_UNORM_INT_101010:

0 must convert to 0.0h, and

1023 must convert to 1.0h
For CL_UNORM_INT16:

0 must convert to 0.0h, and

65535 must convert to 1.0h
For CL_SNORM_INT8:

-128 and -127 must convert to -1.0h,

0 must convert to 0.0h, and

127 must convert to 1.0h
For CL_SNORM_INT16:

-32768 and -32767 must convert to -1.0h,

0 must convert to 0.0h, and

32767 must convert to 1.0h
7.4.1.2. Converting Half Precision Floating-point Values to Normalized Integer Channel Data Types
For images created with image channel data type of CL_UNORM_INT8 and CL_UNORM_INT16, image write instructions will convert the half precision floating-point color value to an 8-bit or 16-bit unsigned integer.
For images created with image channel data type of CL_SNORM_INT8 and CL_SNORM_INT16, image write instructions will convert the half precision floating-point color value to an 8-bit or 16-bit signed integer.
OpenCL implementations may choose to approximate the rounding mode used in the conversions described below. When approximate rounding is used instead of the preferred rounding, the result of the conversion must satisfy the bound given below.
The conversions from half precision floating-point values to normalized integer values are performed as follows:

half → CL_UNORM_INT8 (8-bit unsigned integer)
\[\begin{aligned} & f(x)=max(0,min(255,255 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_uint8(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_uint8(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]

half → CL_UNORM_INT16 (16-bit unsigned integer)
\[\begin{aligned} & f(x)=max(0,min(65535,65535 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_uint16(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_uint16(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]

half → CL_SNORM_INT8 (8-bit signed integer)
\[\begin{aligned} & f(x)=max(-128,min(127,127 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_int8(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_int8(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]

half → CL_SNORM_INT16 (16-bit signed integer)
\[\begin{aligned} & f(x)=max(-32768,min(32767,32767 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_int16(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_int16(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]
7.4.1.3. Converting Normalized Integer Channel Data Types to Floating-point Values
For images created with image channel data type of CL_UNORM_INT8 and CL_UNORM_INT16, image read instructions will convert the channel values from an 8-bit or 16-bit unsigned integer to normalized floating-point values in the range [0.0f … 1.0f].
For images created with image channel data type of CL_SNORM_INT8 and CL_SNORM_INT16, image read instructions will convert the channel values from an 8-bit or 16-bit signed integer to normalized floating-point values in the range [-1.0f … 1.0f].
These conversions are performed as follows:

CL_UNORM_INT8 (8-bit unsigned integer) → float
\[normalized\_float\_value(x)=round\_to\_float(\frac{x}{255})\]

CL_UNORM_INT_101010 (10-bit unsigned integer) → float
\[normalized\_float\_value(x)=round\_to\_float(\frac{x}{1023})\]

CL_UNORM_INT16 (16-bit unsigned integer) → float
\[normalized\_float\_value(x)=round\_to\_float(\frac{x}{65535})\]

CL_SNORM_INT8 (8-bit signed integer) → float
\[normalized\_float\_value(x)=max(-1.0f, round\_to\_float(\frac{x}{127}))\]

CL_SNORM_INT16 (16-bit signed integer) → float
\[normalized\_float\_value(x)=max(-1.0f, round\_to\_float(\frac{x}{32767}))\]
The precision of the above conversions is <= 1.5 ulp except for the following cases:
For CL_UNORM_INT8:

0 must convert to 0.0f, and

255 must convert to 1.0f
For CL_UNORM_INT_101010:

0 must convert to 0.0f, and

1023 must convert to 1.0f
For CL_UNORM_INT16:

0 must convert to 0.0f, and

65535 must convert to 1.0f
For CL_SNORM_INT8:

-128 and -127 must convert to -1.0f,

0 must convert to 0.0f, and

127 must convert to 1.0f
For CL_SNORM_INT16:

-32768 and -32767 must convert to -1.0f,

0 must convert to 0.0f, and

32767 must convert to 1.0f
7.4.1.4. Converting Floating-point Values to Normalized Integer Channel Data Types
For images created with image channel data type of CL_UNORM_INT8 and CL_UNORM_INT16, image write instructions will convert the floating-point color value to an 8-bit or 16-bit unsigned integer.
For images created with image channel data type of CL_SNORM_INT8 and CL_SNORM_INT16, image write instructions will convert the floating-point color value to an 8-bit or 16-bit signed integer.
OpenCL implementations may choose to approximate the rounding mode used in the conversions described below. When approximate rounding is used instead of the preferred rounding, the result of the conversion must satisfy the bound given below.
The conversions from floating-point values to normalized integer values are performed as follows:

float → CL_UNORM_INT8 (8-bit unsigned integer)
\[\begin{aligned} & f(x)=max(0,min(255,255 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_uint8(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_uint8(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]

float → CL_UNORM_INT_101010 (10-bit unsigned integer)
\[\begin{aligned} & f(x)=max(0,min(1023,1023 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_uint10(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_uint10(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]

float → CL_UNORM_INT16 (16-bit unsigned integer)
\[\begin{aligned} & f(x)=max(0,min(65535,65535 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_uint16(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_uint16(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]

float → CL_SNORM_INT8 (8-bit signed integer)
\[\begin{aligned} & f(x)=max(-128,min(127,127 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_int8(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_int8(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]

float → CL_SNORM_INT16 (16-bit signed integer)
\[\begin{aligned} & f(x)=max(-32768,min(32767,32767 \times x))\\ \\ & f_{preferred}(x) = \begin{cases} round\_to\_nearest\_even\_int16(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ & f_{approx}(x) = \begin{cases} round\_to\_impl\_int16(f(x)) & \quad x \neq \infty \text{ and } x \neq NaN\\ \text{implementation-defined} & \quad x = \infty \text{ or } x = NaN \end{cases}\\ \\ & \lvert f(x) - f_{approx}(x) \rvert \leq 0.6, \quad x \neq \infty \text{ and } x \neq NaN \end{aligned}\]
7.4.2. Conversion Rules for Half Precision Floating-point Channel Data Type
For images created with a channel data type of CL_HALF_FLOAT, the conversions of half to float and half to half are lossless.
Conversions from float to half round the mantissa using the round to nearest even or round to zero rounding mode.
Denormalized numbers for the half data type which may be generated when converting a float to a half may be flushed to zero.
A float NaN must be converted to an appropriate NaN in the half type.
A float INF must be converted to an appropriate INF in the half type.
7.4.3. Conversion Rules for Floating-point Channel Data Type
The following rules apply for reading and writing images created with channel data type of CL_FLOAT.

NaNs may be converted to a NaN value(s) supported by the device.

Denorms can be flushed to zero.

All other values must be preserved.
7.4.4. Conversion Rules for Signed and Unsigned 8-bit, 16-bit and 32-bit Integer Channel Data Types
For images created with image channel data type of CL_SIGNED_INT8, CL_SIGNED_INT16 and CL_SIGNED_INT32, image read instructions will return the unmodified integer values stored in the image at the specified location.
Likewise, for images created with image channel data type of CL_UNSIGNED_INT8, CL_UNSIGNED_INT16 and CL_UNSIGNED_INT32, image read instructions will return the unmodified unsigned integer values stored in the image at the specified location.
Image write instructions will perform one of the following conversions:

32-bit signed integer → CL_SIGNED_INT8 (8-bit signed integer):
\[int8\_value(x) = clamp(x, -128, 127)\]

32-bit signed integer → CL_SIGNED_INT16 (16-bit signed integer):
\[int16\_value(x) = clamp(x, -32768, 32767)\]

32-bit signed integer → CL_SIGNED_INT32 (32-bit signed integer):
\[int32\_value(x) = x \quad \text{(no conversion)}\]

32-bit unsigned integer → CL_UNSIGNED_INT8 (8-bit unsigned integer):
\[uint8\_value(x) = clamp(x, 0, 255)\]

32-bit unsigned integer → CL_UNSIGNED_INT16 (16-bit unsigned integer):
\[uint16\_value(x) = clamp(x, 0, 65535)\]

32-bit unsigned integer → CL_UNSIGNED_INT32 (32-bit unsigned integer):
\[uint32\_value(x) = x \quad \text{(no conversion)}\]
The conversions described in this section must be correctly saturated.
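The saturating write conversions amount to a clamp before narrowing, as in this C sketch (function names are hypothetical):

```c
#include <stdint.h>

/* 32-bit signed source values saturated to the 8-bit and 16-bit signed ranges. */
static int8_t sat_int8(int32_t x)
{
    return (int8_t)(x < -128 ? -128 : (x > 127 ? 127 : x));
}

static int16_t sat_int16(int32_t x)
{
    return (int16_t)(x < -32768 ? -32768 : (x > 32767 ? 32767 : x));
}

/* 32-bit unsigned source values saturated to the 8-bit and 16-bit unsigned ranges. */
static uint8_t sat_uint8(uint32_t x)
{
    return (uint8_t)(x > 255u ? 255u : x);
}

static uint16_t sat_uint16(uint32_t x)
{
    return (uint16_t)(x > 65535u ? 65535u : x);
}
```

Clamping first matters: a plain narrowing cast of 300 to an 8-bit unsigned value would wrap to 44 instead of saturating to 255.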
7.4.5. Conversion Rules for sRGBA and sBGRA Images
Standard RGB (sRGB) data roughly displays colors in a linear ramp of luminosity levels such that an average observer, under average viewing conditions, can view them as perceptually equal steps on an average display. All 0s maps to 0.0f, and all 1s maps to 1.0f. The sequence of unsigned integer encodings between all 0s and all 1s represents a nonlinear progression in the floating-point interpretation of the numbers between 0.0f and 1.0f. For more detail, see the sRGB color standard.
Conversion from sRGB space is automatically done by the image read instruction if the image channel order is one of the sRGB values described above. When reading from an sRGB image, the conversion from sRGB to linear RGB is performed before filtering is applied. If the format has an alpha channel, the alpha data is stored in linear color space. Conversion to sRGB space is automatically done by the image write instruction if the image channel order is one of the sRGB values described above and the device supports writing to sRGB images.

The following process is used by image read instructions to convert a normalized 8-bit unsigned integer sRGB color value x to a floating-point linear RGB color value y:

Convert a normalized 8-bit unsigned integer sRGB value x to a floating-point sRGB value r as per rules described in Converting Normalized Integer Channel Data Types to Floating-point Values section.
\[r=normalized\_float\_value(x)\]
Convert a floating-point sRGB value r to a floating-point linear RGB color value y:
\[\begin{aligned} & c_{linear}(r) = \begin{cases} \frac{r}{12.92} & \quad r \geq 0 \text{ and } r \leq 0.04045\\ (\frac{r + 0.055}{1.055})^{2.4} & \quad r > 0.04045 \text{ and } r \leq 1 \end{cases}\\ \\ & y = c_{linear}(r) \end{aligned}\]


The following process is used by image write instructions to convert a linear RGB floating-point color value y to a normalized 8-bit unsigned integer sRGB value x:

Convert a floating-point linear RGB value y to a normalized floating-point sRGB value r:
\[\begin{aligned} & c_{sRGB}(y) = \begin{cases} 0 & \quad y = NaN \text{ or } y < 0\\ 12.92 \times y & \quad y \geq 0 \text{ and } y < 0.0031308\\ 1.055 \times y^{(\frac{1}{2.4})} - 0.055 & \quad y \geq 0.0031308 \text{ and } y \leq 1\\ 1 & \quad y > 1 \end{cases}\\ \\ & r = c_{sRGB}(y) \end{aligned}\]
Convert a normalized floating-point sRGB value r to a normalized 8-bit unsigned integer sRGB value x as per rules described in Converting Floating-point Values to Normalized Integer Channel Data Types section.
\[\begin{aligned} & g(r) = \begin{cases} f_{preferred}(r) & \quad \text{if rounding mode is round to even}\\ f_{approx}(r) & \quad \text{if implementation-defined rounding mode} \end{cases}\\ \\ & x = g(r) \end{aligned}\]

The accuracy required when converting a normalized 8bit unsigned integer sRGB color value x to a floatingpoint linear RGB color value y is given by:
The accuracy required when converting a linear RGB floatingpoint color value y to a normalized 8bit unsigned integer sRGB value x is given by:
7.5. Selecting an Image from an Image Array
Let (u,v,w) represent the unnormalized image coordinate values for reading from and/or writing to a 2D image in a 2D image array.
When read using a sampler, the 2D image layer selected is computed as:
otherwise the layer selected is computed as:
(since w is already an integer) and the result is undefined if w is not one of the integers 0, 1, … d_{t} - 1.
Let (u,v) represent the unnormalized image coordinate values for reading from and/or writing to a 1D image in a 1D image array.
When read using a sampler, the 1D image layer selected is computed as:
otherwise the layer selected is computed as:
(since v is already an integer) and the result is undefined if v is not one of the integers 0, 1, … h_{t} - 1.
7.6. Data Format for Reading and Writing Images
This section describes how image element data is returned by an image read instruction or passed as the Texel data that is written by an image write instruction:
For the following image channel orders, the data is a four component vector type:
Image Channel Order  Components 


(R, 0, 0, 1) 

(0, 0, 0, A) 

(R, G, 0, 1) 

(R, G, B, 1) 

(R, G, B, A) 

(I, I, I, I) 

(L, L, L, 1) 
For the following image channel orders, the data is a scalar type:
Image Channel Order  Scalar Value 


D 

D 
The following table describes the mapping from image channel data type to the data vector component type or scalar type:
Image Channel Order  Data Type 


OpTypeFloat, with Width equal to 16 or 32. 

OpTypeInt, with Width equal to 32. 
7.7. Sampled and Samplerless Reads
SPIR-V instructions that read from an image without a sampler (such as OpImageRead) behave exactly the same as the corresponding image read instruction with a sampler that has Sampler Filter Mode set to Nearest, Non-Normalized coordinates, and Sampler Addressing Mode set to None.
There is one exception for cases where the image being read has Image Format equal to a floating-point type (such as R32f). In this exceptional case, when channel data values are denormalized, the non-sampler image read instruction may return the denormalized data, while the sampler image read instruction may flush denormalized channel data values to zero. The coordinates must be between 0 and image size in that dimension, non-inclusive.
8. Normative References

IEEE Standard for Floating-Point Arithmetic, IEEE Std 754-2008, http://dx.doi.org/10.1109/IEEESTD.2008.4610935 , August, 2008.

“ISO/IEC 9899:1999 - Programming Languages - C”, with technical corrigenda TC1 and TC2, https://www.iso.org/standard/29237.html .

“ISO/IEC 14882:2014 - Information technology - Programming languages - C++”, https://www.iso.org/standard/64029.html .

“The OpenCL Specification, Version 3.0, Unified”, https://www.khronos.org/registry/OpenCL/ .

“The OpenCL C Specification, Version 3.0”, https://www.khronos.org/registry/OpenCL/ .

“The OpenCL C++ 1.0 Specification”, https://www.khronos.org/registry/OpenCL/ .

“The OpenCL Extension Specification, Version 3.0, Unified”, https://www.khronos.org/registry/OpenCL/ .

“SPIR-V Specification, Version 1.5, Unified”, https://www.khronos.org/registry/spirv/ .

“OpenCL Extended Instruction Set Specification”, https://www.khronos.org/registry/spirv/ .

Jean-Michel Muller. On the definition of ulp(x). RR-5504, INRIA. 2005, pp.16. <inria-00070503> Currently hosted at https://hal.inria.fr/inria-00070503/document.

“IEC 61966-2-1:1999 Multimedia systems and equipment - Colour measurement and management - Part 2-1: Colour management - Default RGB colour space - sRGB”, https://webstore.iec.ch/publication/6169 .
Appendix A: Changes to OpenCL
Changes to the OpenCL SPIR-V Environment specifications between successive versions are summarized below.
Summary of changes from OpenCL 3.0
The first non-provisional version of the OpenCL 3.0 specifications was v3.0.5.
Changes from v3.0.5:

Clarified subgroup barrier behavior in non-uniform control flow.

Added required alignment of types.

Added new extensions:

cl_khr_subgroup_extended_types

cl_khr_subgroup_non_uniform_vote

cl_khr_subgroup_ballot

cl_khr_subgroup_non_uniform_arithmetic

cl_khr_subgroup_shuffle

cl_khr_subgroup_shuffle_relative

cl_khr_subgroup_clustered_reduce

Changes from v3.0.6:

Explicitly say that OpTypeSampledImage may be used in an OpenCL environment.

Added the required type for SPIR-V builtin variables.

Fixed several bugs and formatting in the fast math ULP tables.

Added new extensions:

cl_khr_extended_bit_ops

cl_khr_spirv_extended_debug_info

cl_khr_spirv_linkonce_odr

Changes from v3.0.8:

Clarified that some OpenCL khr extensions also require SPIR-V extensions.