Copyright 2014-2023 The Khronos Group Inc.
This Specification is protected by copyright laws and contains material proprietary to Khronos. Except as described by these terms, it or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast or otherwise exploited in any manner without the express prior written permission of Khronos.
This Specification has been created under the Khronos Intellectual Property Rights Policy, which is Attachment A of the Khronos Group Membership Agreement available at www.khronos.org/files/member_agreement.pdf.
Khronos grants a conditional copyright license to use and reproduce the unmodified Specification for any purpose, without fee or royalty, EXCEPT no licenses to any patent, trademark or other intellectual property rights are granted under these terms. Parties desiring to implement the Specification and make use of Khronos trademarks in relation to that implementation, and receive reciprocal patent license protection under the Khronos Intellectual Property Rights Policy must become Adopters and confirm the implementation as conformant under the process defined by Khronos for this Specification; see https://www.khronos.org/adopters.
Khronos makes no, and expressly disclaims any, representations or warranties, express or implied, regarding this Specification, including, without limitation: merchantability, fitness for a particular purpose, non-infringement of any intellectual property, correctness, accuracy, completeness, timeliness, and reliability. Under no circumstances will Khronos, or any of its Promoters, Contributors or Members, or their respective partners, officers, directors, employees, agents or representatives be liable for any damages, whether direct, indirect, special or consequential damages for lost revenues, lost profits, or otherwise, arising from or in connection with these materials.
This Specification contains substantially unmodified functionality from, and is a successor to, Khronos specifications including all versions of "The SPIR Specification", "The OpenGL Shading Language", "The OpenGL ES Shading Language", as well as all Khronos OpenCL API and OpenCL programming language specifications.
The Khronos Intellectual Property Rights Policy defines the terms Scope, Compliant Portion, and Necessary Patent Claims.
Where this Specification uses technical terminology, defined in the Glossary or otherwise, that refer to enabling technologies that are not expressly set forth in this Specification, those enabling technologies are EXCLUDED from the Scope of this Specification. For clarity, enabling technologies not disclosed with particularity in this Specification (e.g. semiconductor manufacturing technology, hardware architecture, processor architecture or microarchitecture, memory architecture, compiler technology, object oriented technology, basic operating system technology, compression technology, algorithms, and so on) are NOT to be considered expressly set forth; only those application program interfaces and data structures disclosed with particularity are included in the Scope of this Specification.
For purposes of the Khronos Intellectual Property Rights Policy as it relates to the definition of Necessary Patent Claims, all recommended or optional features, behaviors and functionality set forth in this Specification, if implemented, are considered to be included as Compliant Portions.
Khronos® and Vulkan® are registered trademarks, and ANARI™, WebGL™, glTF™, NNEF™, OpenVX™, SPIR™, SPIR-V™, SYCL™, OpenVG™, Vulkan SC™, 3D Commerce™ and Kamaros™ are trademarks of The Khronos Group Inc. OpenXR™ is a trademark owned by The Khronos Group Inc. and is registered as a trademark in China, the European Union, Japan and the United Kingdom. OpenCL™ is a trademark of Apple Inc. used under license by Khronos. OpenGL® is a registered trademark and the OpenGL ES™ and OpenGL SC™ logos are trademarks of Hewlett Packard Enterprise used under license by Khronos. ASTC is a trademark of ARM Holdings PLC. All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.
Contributors and Acknowledgments
-
Yaxun Liu, AMD
-
Brian Sumner, AMD
-
Marty Johnson, AMD
-
Mandana Baregheh, AMD
-
Andrew Richards, Codeplay
-
Ben Ashbaugh, Intel
-
Alexey Bader, Intel
-
Guy Benyei, Intel
-
Raun Krisch, Intel
-
Boaz Ouriel, Intel
-
Yuan Lin, NVIDIA
-
Lee Howes, Qualcomm
-
Chihong Zang, Qualcomm
-
Ben Gaster, Qualcomm
-
Jack Liu, Qualcomm
-
Ronan Keryell, Xilinx
1. Introduction
This is the specification of the OpenCL.std extended instruction set.
The library is imported into a SPIR-V module in the following manner:
<ext-inst-id> OpExtInstImport "OpenCL.std"
The library can only be imported if the Memory Model is set to OpenCL.
2. Binary Form
This section contains the semantics and exact form of execution of OpenCL extended instructions using the OpExtInst instruction.
In this section we use the following naming conventions:
-
void denotes an OpTypeVoid.
-
half, float and double denote an OpTypeFloat with a width of 16, 32 and 64 bits respectively.
-
i8, i16, i32 and i64 denote an OpTypeInt with a width of 8, 16, 32 and 64 bits respectively.
-
bool denotes an OpTypeBool.
-
size_t denotes an i32 if the Addressing Model is Physical32 and i64 if the Addressing Model is Physical64.
-
vector(n) denotes an OpTypeVector where n indicates the component count.
-
vector(n1, n2, …, ni) abbreviates vector(n1), vector(n2), … or vector(ni).
-
integer denotes i8, i16, i32 or i64.
-
floating-point denotes half, float, double.
-
pointer(storage) denotes an OpTypePointer which points to storage Storage Class.
-
pointer(constant) denotes an OpTypePointer with UniformConstant Storage Class.
-
pointer(generic) denotes an OpTypePointer with Generic Storage Class.
-
pointer(global) denotes an OpTypePointer with CrossWorkgroup Storage Class.
-
pointer(local) denotes an OpTypePointer with Workgroup Storage Class.
-
pointer(private) denotes an OpTypePointer with Function Storage Class.
-
pointer(s1, s2, …, si) abbreviates pointer(s1), pointer(s2), … or pointer(si).
-
image defines all types of image memory objects (See image encoding section).
-
sampler denotes a SPIR-V sampler object (See sampler encoding section).
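The naming conventions above can be restated as a few lookup tables. This is a minimal sketch for illustration only: the names STORAGE_CLASS, FLOAT_WIDTH, INT_WIDTH and size_t_type are hypothetical helpers, not part of this specification or of any SPIR-V tool.

```python
# Hypothetical helper tables restating the naming conventions listed above.
# None of these names are defined by the specification.

# pointer(<name>) -> SPIR-V Storage Class
STORAGE_CLASS = {
    "constant": "UniformConstant",
    "generic": "Generic",
    "global": "CrossWorkgroup",
    "local": "Workgroup",
    "private": "Function",
}

# OpTypeFloat and OpTypeInt widths, in bits
FLOAT_WIDTH = {"half": 16, "float": 32, "double": 64}
INT_WIDTH = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}


def size_t_type(addressing_model: str) -> str:
    """Return the integer type that size_t denotes for an Addressing Model."""
    if addressing_model == "Physical32":
        return "i32"
    if addressing_model == "Physical64":
        return "i64"
    raise ValueError(f"size_t is not defined here for {addressing_model!r}")
```

For example, size_t_type("Physical64") yields "i64", so an offset operand described as size_t would be a 64-bit OpTypeInt under that Addressing Model.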
2.1. Math extended instructions
This section describes the list of external math instructions. The external math instructions are categorized into the following:
-
A list of instructions that have scalar or vector argument versions, and,
-
A list of instructions that only take scalar float arguments.
The vector versions of the math instructions operate component-wise. The description is per-component.
The math instructions are not affected by the prevailing rounding mode in the calling environment, and always return the same value as they would if called with the round to nearest even rounding mode.
For environments that allow use of FPFastMathMode decorations on OpExtInst instructions, FPFastMathMode decorations may be applied to the math instructions.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
| 6 | 12 | <id> | Result <id> | extended instructions set <id> | 22 | <id> |
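The word sequence listed above (6, 12, an <id>, Result <id>, the extended instructions set <id>, 22, and one operand <id>) follows the general SPIR-V layout for OpExtInst: the first word packs the word count into the high 16 bits and the opcode (12) into the low 16 bits. The sketch below assembles such a word stream. The function name encode_ext_inst and the numeric ids in the example are assumptions chosen for illustration.

```python
OP_EXT_INST = 12  # SPIR-V core opcode for OpExtInst


def encode_ext_inst(result_type_id, result_id, set_id, instruction, operand_ids):
    """Build the 32-bit word list for one OpExtInst (hypothetical helper)."""
    body = [result_type_id, result_id, set_id, instruction, *operand_ids]
    word_count = 1 + len(body)  # the leading word counts itself
    first_word = (word_count << 16) | OP_EXT_INST
    return [first_word, *body]


# Example with made-up ids: extended instruction number 22, one operand.
words = encode_ext_inst(
    result_type_id=2, result_id=7, set_id=1, instruction=22, operand_ids=[5]
)
```

With one operand the word count comes out as 6, matching the first entry in the layout above; instructions with two operands produce a word count of 7.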
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
| 6 | 12 | <id> | Result <id> | extended instructions set <id> | 40 | <id> |
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
| 7 | 12 | <id> | Result <id> | extended instructions set <id> | 48 | <id> | <id> |
Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
| 7 | 12 | <id> | Result <id> | extended instructions set <id> | 68 | <id> | <id> |
2.2. Integer instructions
This section describes the list of integer instructions that take scalar or vector arguments. The vector versions of the integer instructions operate component-wise. The description is per-component.
2.3. Common instructions
This section describes the list of common instructions that take scalar or vector arguments. The vector versions of the common instructions operate component-wise. The description is per-component. The common instructions are implemented using the round to nearest even rounding mode.
For environments that allow use of FPFastMathMode decorations on OpExtInst instructions, FPFastMathMode decorations may be applied to the common instructions.
2.4. Geometric instructions
This section describes the list of geometric instructions. In this section x, y, z and w denote the first, second, third and fourth components, respectively, of vectors with three or four components. The geometric instructions are implemented using the round to nearest even rounding mode.
Note: The geometric instructions can be implemented using contractions such as mad or fma.
For environments that allow use of FPFastMathMode decorations on OpExtInst instructions, FPFastMathMode decorations may be applied to the geometric instructions.
2.5. Relational instructions
This section describes the list of relational instructions that take scalar or vector arguments. The vector versions of the relational instructions operate component-wise. The description is per-component.
2.6. Vector Data Load and Store instructions
This section describes the list of instructions that allow reading and writing of vector types from a pointer to memory.
For environments that allow use of FPFastMathMode decorations on OpExtInst instructions, FPFastMathMode decorations may be applied to vector data load and store instructions that convert to or from half values.
vloadn
Behavior is undefined if the computed address is not 8-bit aligned when p points to an i8 value; 16-bit aligned when p points to an i16 or half value; 32-bit aligned when p points to an i32 or float value; 64-bit aligned when p points to an i64 or double value.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to a floating-point or integer type. Result Type must be vector(2,3,4,8,16) of floating-point or integer values. Result Type component count must be equal to n and its component type must be equal to the type pointed to by p. n must be 2, 3, 4, 8 or 16.
| 8 | 12 | <id> | Result <id> | extended instructions set <id> | 171 | <id> | <id> | Literal |
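The alignment rule stated for vloadn amounts to requiring that the computed address be a multiple of the element size. A minimal sketch of that check follows, assuming addresses are expressed in bytes; ELEMENT_BITS and vloadn_address_ok are hypothetical names introduced only for this illustration.

```python
# Element widths in bits, following the naming conventions of this section.
ELEMENT_BITS = {
    "i8": 8,
    "i16": 16, "half": 16,
    "i32": 32, "float": 32,
    "i64": 64, "double": 64,
}


def vloadn_address_ok(byte_address: int, element_type: str) -> bool:
    """Return True if byte_address meets the vloadn alignment requirement
    for a pointer to element_type (hypothetical helper)."""
    align_bytes = ELEMENT_BITS[element_type] // 8
    return byte_address % align_bytes == 0
```

Note that the requirement is on the element type, not on the whole vector: a vector of i32 read through vloadn only needs a 32-bit aligned address, regardless of n.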
vload_halfn
Behavior is undefined if the computed address is not 16-bit aligned.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to half. Result Type must be vector(2,3,4,8,16) of float values. Result Type component count must be equal to n. n must be 2, 3, 4, 8 or 16.
| 8 | 12 | <id> | Result <id> | extended instructions set <id> | 174 | <id> | <id> | Literal |
vstore_half_r
Behavior is undefined if the computed address is not 16-bit aligned.
data must be float or double. offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half.
| 9 | 12 | <id> | Result <id> | extended instructions set <id> | 176 | <id> | <id> | <id> | FP Rounding Mode |
vstore_halfn_r
Let n be the component count of the vector data. The n components from the converted vector of half values are written to the address computed as (p + (offset * n)). Behavior is undefined if the computed address is not 16-bit aligned.
offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half. data must be vector(2,3,4,8,16) of float or double values.
| 9 | 12 | <id> | Result <id> | extended instructions set <id> | 178 | <id> | <id> | <id> | FP Rounding Mode |
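The address computation for vstore_halfn_r, (p + (offset * n)), is pointer arithmetic in units of half. The sketch below models memory as a Python bytearray and p as a byte offset into it, both of which are assumptions made for illustration. It handles only round to nearest even, relying on the IEEE 754 binary16 conversion performed by the struct module's "e" format; the other FP Rounding Mode values are not modeled.

```python
import struct

HALF_BYTES = 2  # size of one half value in bytes


def vstore_halfn_rte(data, offset, buf, p_bytes=0):
    """Convert each component of data to half and store the vector at
    (p + offset * n), where p is modeled as byte offset p_bytes into buf.
    Hypothetical helper; round to nearest even only."""
    n = len(data)
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("data must have 2, 3, 4, 8 or 16 components")
    start = p_bytes + offset * n * HALF_BYTES
    if start % HALF_BYTES != 0:
        raise ValueError("computed address is not 16-bit aligned")
    for i, value in enumerate(data):
        # "<e" packs a little-endian IEEE 754 binary16 value.
        struct.pack_into("<e", buf, start + i * HALF_BYTES, value)


buf = bytearray(16)
vstore_halfn_rte([1.0, 2.0], offset=1, buf=buf)  # writes bytes 4..7
```

With n = 2 and offset = 1 the vector lands at half index 2, that is, byte offset 4, leaving the first four bytes of the buffer untouched.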
vloada_halfn
For n equal to 2, 4, 8, and 16, the vector of n half values is read from the address computed as (p + (offset * n)). Behavior is undefined if the computed address is not aligned to (sizeof(half) * n) bytes. For n equal to 3, the vector of n half values is read from the address computed as (p + (offset * 4)). Behavior is undefined if the computed address is not aligned to (sizeof(half) * 4) bytes.
offset must be size_t. p must be a pointer(global, local, private, constant, generic) to half. Result Type must be vector(2,3,4,8,16) of float values. Result Type component count must be equal to n. n must be 2, 3, 4, 8 or 16.
| 8 | 12 | <id> | Result <id> | extended instructions set <id> | 179 | <id> | <id> | Literal |
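The special case for n equal to 3 in vloada_halfn means a 3-component vector is addressed and aligned as if it had 4 components. A minimal sketch of the address and alignment computation follows, again modeling memory as a bytes object and p as a byte offset into it (assumptions for illustration; vloada_halfn below is a hypothetical Python function, not the instruction itself).

```python
import struct

HALF_BYTES = 2  # sizeof(half)


def vloada_halfn(buf, offset, n, p_bytes=0):
    """Read n half values from (p + offset * stride) and return them as
    Python floats, where stride is 4 when n == 3 and n otherwise."""
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("n must be 2, 3, 4, 8 or 16")
    stride = 4 if n == 3 else n
    start = p_bytes + offset * stride * HALF_BYTES
    if start % (HALF_BYTES * stride) != 0:
        raise ValueError("computed address is not sufficiently aligned")
    return list(struct.unpack_from(f"<{n}e", buf, start))


# Eight half values 0.0 .. 7.0 stored contiguously.
memory = struct.pack("<8e", *[float(i) for i in range(8)])
second_vec3 = vloada_halfn(memory, offset=1, n=3)  # starts at half index 4
```

The same stride and alignment rule is stated for vstorea_halfn_r, so a store-side sketch would compute its start address in the same way.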
vstorea_halfn_r
Let n be the component count of the vector data. For n equal to 2, 4, 8, and 16, the converted vector of half values is written to the address computed as (p + (offset * n)). Behavior is undefined if the computed address is not aligned to (sizeof(half) * n) bytes. For n equal to 3, the converted vector of half values is written to the address computed as (p + (offset * 4)). Behavior is undefined if the computed address is not aligned to (sizeof(half) * 4) bytes.
offset must be size_t. Result Type must be void. p must be a pointer(global, local, private, generic) to half. data must be vector(2,3,4,8,16) of float or double values.
| 9 | 12 | <id> | Result <id> | extended instructions set <id> | 181 | <id> | <id> | <id> | FP Rounding Mode |
2.7. Miscellaneous Vector instructions
This section describes additional vector instructions.
2.8. Misc instructions
This section describes additional miscellaneous instructions.
3. Appendix A: Changes and TBD
-
Fork the revision stream, changes section, TBD, etc. from the core specification, so that this specification has its own, with numbering starting at revision 1. This document now lives independently.
3.1. Changes from Version 0.99, Revision 1
-
Move to use the updated image/texturing/sampling, instead of extended instructions. Also, see changes in core specification related to this.
-
14241 Implement OpenCL Extended Instructions for images/samplers with core OpImageSample instructions
-
Fixed internal bugs
-
13455 Merged the OpenCL 1.2, 2.0, and 2.1 extended-instruction set into a single OpenCL extended-instruction set.
-
Fixed public bugs
3.2. Changes from Version 0.99, Revision 2
-
14679 moved precision information to the OpenCL environment spec
-
14636 clarified trig functions to accept and return radians
3.3. Changes from Version 0.99, Revision 3
-
Fixed internal bugs:
-
14862 removed remaining image instructions as core versions are sufficient
-
14636 Fixed typos in several trig functions accepting radian inputs and/or producing radian results
-
Flattened opcode numbers
3.4. Changes from Version 1.0, Revision 1
-
Fixed internal bugs:
-
Issue 8 - order of parameters for prefetch was reversed; pointer operand should be first.
-
Issue 103 - typo: singp should be signp
-
Fixed public bugs
-
1469 - incorrect specification of pow and pown
3.5. Changes from Version 1.0, Revision 2
-
Fixed internal bugs:
-
Issue 261 - clarified that s_mad24 and u_mad24 only support 32-bit integers
-
Issue 262 - added scalars to the types supported by length
-
Issue 266 - fixed shuffle and shuffle2 description
-
Issue 267 - fixed description of ldexp operands
3.6. Changes from Version 1.0, Revision 3
-
Moved image and sampler encoding to the OpenCL environment specification
-
Editorial fixes and improvements
-
Fixed internal bugs:
-
Issue 271 - storage class inconsistency between vloadn/vstoren and vload_half/vstore_half
-
Issue 312 - bad wording for vstorea_halfn
3.7. Changes from Version 1.0, Revision 4
Support SPV_KHR_no_integer_wrap_decoration in the s_abs instruction.
3.8. Changes from Version 1.0, Revision 5
-
Fixed internal bugs:
-
Issue 497 - fixed description for s_upsample
3.9. Changes from Version 1.0, Revision 6
-
Fixed internal bugs:
-
Issue 515 - permit use of FPFastMathMode decorations with math, common, geometric, and vector data load/store instructions for environments that allow it.
3.10. Changes from Version 1.0, Revision 7
-
Fixed internal bugs:
-
Corrected the description of u_upsample and s_upsample.