Name

    NV_gpu_shader5

Name Strings

    GL_NV_gpu_shader5

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Barthold Lichtenbelt, NVIDIA
    Chris Dodd, NVIDIA
    Eric Werness, NVIDIA
    Greg Roth, NVIDIA
    Jeff Bolz, NVIDIA
    Piers Daniell, NVIDIA
    Daniel Rakos, AMD
    Mathias Heyer, NVIDIA

Status

    Shipping.

Version

    Last Modified Date:         03/07/2017
    NVIDIA Revision:            11

Number

    OpenGL Extension #389
    OpenGL ES Extension #260

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile) Specification.

    This extension is written against version 1.50 (revision 09) of the OpenGL Shading Language Specification.

    If implemented in OpenGL, OpenGL 3.2 and GLSL 1.50 are required.

    If implemented in OpenGL, ARB_gpu_shader5 is required.

    This extension interacts with ARB_gpu_shader5.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with ARB_tessellation_shader.

    This extension interacts with NV_shader_buffer_load.

    This extension interacts with EXT_direct_state_access.

    This extension interacts with EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit.

    This extension interacts with OpenGL ES 3.1 (dated October 29th 2014).

    This extension interacts with OpenGL ES Shading Language 3.1 (revision 3).

    If implemented in OpenGL ES, OpenGL ES 3.1 and GLSL ES 3.10 are required.

    If implemented in OpenGL ES, OES/EXT_gpu_shader5 and EXT_shader_implicit_conversions are required.

    This extension interacts with OES/EXT_tessellation_shader.

    This extension interacts with OES/EXT_geometry_shader.

Overview

    This extension provides a set of new features to the OpenGL Shading Language and related APIs to support capabilities of new GPUs.  Shaders using the new functionality provided by this extension should enable this functionality via the construct

      #extension GL_NV_gpu_shader5 : require     (or enable)

    This extension was developed concurrently with the ARB_gpu_shader5 extension, and provides a superset of the features provided there.
    The features common to both extensions are documented in the ARB_gpu_shader5 specification; this document describes only the additional language features not available via ARB_gpu_shader5.  A shader that enables this extension via an #extension directive also implicitly enables the common capabilities provided by ARB_gpu_shader5.

    In addition to the capabilities of ARB_gpu_shader5, this extension provides a variety of new features for all shader types, including:

      * support for a full set of 8-, 16-, 32-, and 64-bit scalar and vector data types, including uniform API, uniform buffer object, and shader input and output support;

      * the ability to aggregate samplers into arrays, index these arrays with arbitrary expressions, and not require that non-constant indices be uniform across all shader invocations;

      * new built-in functions to pack and unpack 64-bit integer types into a two-component 32-bit integer vector;

      * new built-in functions to pack and unpack 32-bit unsigned integer types into a two-component 16-bit floating-point vector;

      * new built-in functions to convert double-precision floating-point values to or from their 64-bit integer bit encodings;

      * new built-in functions to compute the composite of a set of boolean conditions across a group of shader threads;

      * vector relational functions supporting comparisons of vectors of 8-, 16-, and 64-bit integer types or 16-bit floating-point types; and

      * extending texel offset support to allow loading texel offsets from regular integer operands computed at run-time, except for lookups with gradients (textureGrad*).

    This extension also provides additional support for processing patch primitives (introduced by ARB_tessellation_shader).  ARB_tessellation_shader requires the use of a tessellation evaluation shader when processing patches, which means that patches will never survive past the tessellation pipeline stage.
This extension lifts that restriction, and allows patches to proceed further in the pipeline and be used * as input to a geometry shader, using a new "patches" layout qualifier; * as input to transform feedback; * by fixed-function rasterization stages, in which case the patches are drawn as independent points. Additionally, it allows geometry shaders to read per-patch attributes written by a tessellation control shader using input variables declared with "patch in". New Procedures and Functions void Uniform1i64NV(int location, int64EXT x); void Uniform2i64NV(int location, int64EXT x, int64EXT y); void Uniform3i64NV(int location, int64EXT x, int64EXT y, int64EXT z); void Uniform4i64NV(int location, int64EXT x, int64EXT y, int64EXT z, int64EXT w); void Uniform1i64vNV(int location, sizei count, const int64EXT *value); void Uniform2i64vNV(int location, sizei count, const int64EXT *value); void Uniform3i64vNV(int location, sizei count, const int64EXT *value); void Uniform4i64vNV(int location, sizei count, const int64EXT *value); void Uniform1ui64NV(int location, uint64EXT x); void Uniform2ui64NV(int location, uint64EXT x, uint64EXT y); void Uniform3ui64NV(int location, uint64EXT x, uint64EXT y, uint64EXT z); void Uniform4ui64NV(int location, uint64EXT x, uint64EXT y, uint64EXT z, uint64EXT w); void Uniform1ui64vNV(int location, sizei count, const uint64EXT *value); void Uniform2ui64vNV(int location, sizei count, const uint64EXT *value); void Uniform3ui64vNV(int location, sizei count, const uint64EXT *value); void Uniform4ui64vNV(int location, sizei count, const uint64EXT *value); void GetUniformi64vNV(uint program, int location, int64EXT *params); (The following function is also provided by NV_shader_buffer_load.) void GetUniformui64vNV(uint program, int location, uint64EXT *params); (All of the following ProgramUniform* functions are supported if and only if implemented in OpenGL ES or EXT_direct_state_access is supported.) 
    void ProgramUniform1i64NV(uint program, int location, int64EXT x);
    void ProgramUniform2i64NV(uint program, int location, int64EXT x, int64EXT y);
    void ProgramUniform3i64NV(uint program, int location, int64EXT x, int64EXT y, int64EXT z);
    void ProgramUniform4i64NV(uint program, int location, int64EXT x, int64EXT y, int64EXT z, int64EXT w);
    void ProgramUniform1i64vNV(uint program, int location, sizei count, const int64EXT *value);
    void ProgramUniform2i64vNV(uint program, int location, sizei count, const int64EXT *value);
    void ProgramUniform3i64vNV(uint program, int location, sizei count, const int64EXT *value);
    void ProgramUniform4i64vNV(uint program, int location, sizei count, const int64EXT *value);

    void ProgramUniform1ui64NV(uint program, int location, uint64EXT x);
    void ProgramUniform2ui64NV(uint program, int location, uint64EXT x, uint64EXT y);
    void ProgramUniform3ui64NV(uint program, int location, uint64EXT x, uint64EXT y, uint64EXT z);
    void ProgramUniform4ui64NV(uint program, int location, uint64EXT x, uint64EXT y, uint64EXT z, uint64EXT w);
    void ProgramUniform1ui64vNV(uint program, int location, sizei count, const uint64EXT *value);
    void ProgramUniform2ui64vNV(uint program, int location, sizei count, const uint64EXT *value);
    void ProgramUniform3ui64vNV(uint program, int location, sizei count, const uint64EXT *value);
    void ProgramUniform4ui64vNV(uint program, int location, sizei count, const uint64EXT *value);

New Tokens

    Returned by the <type> parameter of GetActiveAttrib, GetActiveUniform, and GetTransformFeedbackVarying:

        INT64_NV                                        0x140E
        UNSIGNED_INT64_NV                               0x140F

        INT8_NV                                         0x8FE0
        INT8_VEC2_NV                                    0x8FE1
        INT8_VEC3_NV                                    0x8FE2
        INT8_VEC4_NV                                    0x8FE3
        INT16_NV                                        0x8FE4
        INT16_VEC2_NV                                   0x8FE5
        INT16_VEC3_NV                                   0x8FE6
        INT16_VEC4_NV                                   0x8FE7
        INT64_VEC2_NV                                   0x8FE9
        INT64_VEC3_NV                                   0x8FEA
        INT64_VEC4_NV                                   0x8FEB
        UNSIGNED_INT8_NV                                0x8FEC
        UNSIGNED_INT8_VEC2_NV                           0x8FED
        UNSIGNED_INT8_VEC3_NV                           0x8FEE
        UNSIGNED_INT8_VEC4_NV                           0x8FEF
        UNSIGNED_INT16_NV                               0x8FF0
        UNSIGNED_INT16_VEC2_NV                          0x8FF1
        UNSIGNED_INT16_VEC3_NV                          0x8FF2
        UNSIGNED_INT16_VEC4_NV                          0x8FF3
        UNSIGNED_INT64_VEC2_NV                          0x8FF5
        UNSIGNED_INT64_VEC3_NV                          0x8FF6
        UNSIGNED_INT64_VEC4_NV                          0x8FF7
        FLOAT16_NV                                      0x8FF8
        FLOAT16_VEC2_NV                                 0x8FF9
        FLOAT16_VEC3_NV                                 0x8FFA
        FLOAT16_VEC4_NV                                 0x8FFB

    (If ARB_tessellation_shader is supported, the following enum is accepted by a new primitive.)

    Accepted by the <primitiveMode> parameter of BeginTransformFeedback:

        PATCHES

Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification (OpenGL Operation)

    Modify Section 2.6.1, Begin and End, p. 22

    (Extend language describing PATCHES introduced by ARB_tessellation_shader.  In particular, add the following to the end of the description of the primitive type.)

    If a patch primitive is drawn, each patch is drawn separately as a collection of points, with each patch vertex defining a separate point.  Extra vertices from an incomplete patch are never drawn.

    Modify Section 2.14.3, Vertex Attributes, p. 86

    (modify the second paragraph, p. 87) ... exceeds MAX_VERTEX_ATTRIBS.  For the purposes of this comparison, attribute variables of the type i64vec3, u64vec3, i64vec4, and u64vec4 count as consuming twice as many attributes as equivalent single-precision types.

    (extend the list of types in the first paragraph, p. 88) ... UNSIGNED_INT_VEC3, UNSIGNED_INT_VEC4, INT8_NV, INT8_VEC2_NV, INT8_VEC3_NV, INT8_VEC4_NV, INT16_NV, INT16_VEC2_NV, INT16_VEC3_NV, INT16_VEC4_NV, INT64_NV, INT64_VEC2_NV, INT64_VEC3_NV, INT64_VEC4_NV, UNSIGNED_INT8_NV, UNSIGNED_INT8_VEC2_NV, UNSIGNED_INT8_VEC3_NV, UNSIGNED_INT8_VEC4_NV, UNSIGNED_INT16_NV, UNSIGNED_INT16_VEC2_NV, UNSIGNED_INT16_VEC3_NV, UNSIGNED_INT16_VEC4_NV, UNSIGNED_INT64_NV, UNSIGNED_INT64_VEC2_NV, UNSIGNED_INT64_VEC3_NV, UNSIGNED_INT64_VEC4_NV, FLOAT16_NV, FLOAT16_VEC2_NV, FLOAT16_VEC3_NV, or FLOAT16_VEC4_NV.

    Modify Section 2.14.4, Uniform Variables, p. 89

    (modify third paragraph, p. 90) ... uniform variable storage for a vertex shader.
    A scalar or vector uniform with 64-bit integer components will consume no more than 2<n> components, where <n> is 1 for scalars, and the component count for vectors.  A link error is generated ...

    (add to Table 2.13, p. 96)

      Type Name Token             Keyword
      --------------------        ----------------
      INT8_NV                     int8_t
      INT8_VEC2_NV                i8vec2
      INT8_VEC3_NV                i8vec3
      INT8_VEC4_NV                i8vec4
      INT16_NV                    int16_t
      INT16_VEC2_NV               i16vec2
      INT16_VEC3_NV               i16vec3
      INT16_VEC4_NV               i16vec4
      INT64_NV                    int64_t
      INT64_VEC2_NV               i64vec2
      INT64_VEC3_NV               i64vec3
      INT64_VEC4_NV               i64vec4
      UNSIGNED_INT8_NV            uint8_t
      UNSIGNED_INT8_VEC2_NV       u8vec2
      UNSIGNED_INT8_VEC3_NV       u8vec3
      UNSIGNED_INT8_VEC4_NV       u8vec4
      UNSIGNED_INT16_NV           uint16_t
      UNSIGNED_INT16_VEC2_NV      u16vec2
      UNSIGNED_INT16_VEC3_NV      u16vec3
      UNSIGNED_INT16_VEC4_NV      u16vec4
      UNSIGNED_INT64_NV           uint64_t
      UNSIGNED_INT64_VEC2_NV      u64vec2
      UNSIGNED_INT64_VEC3_NV      u64vec3
      UNSIGNED_INT64_VEC4_NV      u64vec4
      FLOAT16_NV                  float16_t
      FLOAT16_VEC2_NV             f16vec2
      FLOAT16_VEC3_NV             f16vec3
      FLOAT16_VEC4_NV             f16vec4

    (modify list of commands at the bottom of p. 99)

      void Uniform{1,2,3,4}{i64,ui64}NV(int location, T value);
      void Uniform{1,2,3,4}{i64,ui64}vNV(int location, sizei count, const T *value);

    (insert after fourth paragraph, p. 100)

    The Uniform*i64{v}NV and Uniform*ui64{v}NV commands will load sets of one to four 64-bit signed or unsigned integer values into a uniform location defined as a 64-bit signed or unsigned integer scalar or vector type.

    (modify "Uniform Buffer Object Storage", p. 102, adding two bullets after the last "Members of type", and modifying the subsequent bullet)

      * Members of type int8_t, int16_t, and int64_t are extracted from a buffer object by reading a single byte, short, or int64-typed value at the specified offset.

      * Members of type uint8_t, uint16_t, and uint64_t are extracted from a buffer object by reading a single ubyte, ushort, or uint64-typed value at the specified offset.

      * Members of type float16_t are extracted from a buffer object by reading a single half-typed value at the specified offset.
* Vectors with N elements with basic data types of bool, int, uint, float, double, int8_t, int16_t, int64_t, uint8_t, uint16_t, uint64_t, or float16_t are extracted as N values in consecutive memory locations beginning at the specified offset, with components stored in order with the first (X) component at the lowest offset. The GL data type used for component extraction is derived according to the rules for scalar members above. Modify Section 2.14.6, Varying Variables, p. 106 (modify third paragraph, p. 107) ... For the purposes of counting input and output components consumed by a shader, variables declared as vectors, matrices, and arrays will all consume multiple components. Each component of variables declared as 64-bit integer scalars or vectors, will be counted as consuming two components. (add after the bulleted list, p. 108) For the purposes of counting the total number of components to capture, each component of outputs declared as 64-bit integer scalars or vectors will be counted as consuming two components. Modify Section 2.15.1, Geometry Shader Input Primitives, p. 118 (add new qualifier at the end of the section, p. 120) Patches (patches) Geometry shaders that operate on patches are valid for the PATCHES primitive type. The number of vertices available to each program invocation is equal to the vertex count of the variable-size patch, with vertices presented to the geometry shader in the order specified in the patch. Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121 (add to the end of "Geometry Shader Inputs", p. 123) Geometry shaders also support built-in and user-defined per-primitive inputs. The following built-in inputs, not replicated per-vertex and not contained in gl_in[], are supported: * The variable gl_PatchVerticesIn is filled with the number of the vertices in the input primitive. * The variables gl_TessLevelOuter[] and gl_TessLevelInner[] are arrays holding outer and inner tessellation levels of an input patch. 
If a tessellation control shader is active, the tessellation levels will be taken from the corresponding outputs of the tessellation control shader. Otherwise, the default levels provided as patch parameters are used. Tessellation level values loaded in these variables will be prior to the clamping and rounding operations performed by the primitive generator as described in Section 2.X.2 of ARB_tessellation_shader. For triangular tessellation, gl_TessLevelOuter[3] and gl_TessLevelInner[1] will be undefined. For isoline tessellation, gl_TessLevelOuter[2], gl_TessLevelOuter[3], and both values in gl_TessLevelInner[] are undefined. Additionally, a geometry shader with an input primitive type of "patches" may declare per-patch input variables using the qualifier "patch in". Unlike per-vertex inputs, per-patch inputs do not correspond to any specific vertex in the input primitive, and are not indexed by vertex number. Per-patch inputs declared as arrays have multiple values for the input patch; similarly declared per-vertex inputs would indicate a single value for each vertex in the output patch. User-defined per-patch input variables are filled with corresponding per-patch output values written by the tessellation control shader. If no tessellation control shader is active, all such variables are undefined. Per-patch input variables and the built-in inputs "gl_PatchVerticesIn", "gl_TessLevelOuter[]", and "gl_TessLevelInner[]" are supported only for geometry shaders with an input primitive type of "patches". A program will fail to link if any such variable is used in a geometry shader with a input primitive type other than "patches". Modify Section 2.19, Transform Feedback, p. 130 (add to Table 2.14, p. 131) Transform Feedback primitiveMode allowed render primitive modes ---------------------- --------------------------------- PATCHES PATCHES (modify first paragraph, p. 131) ... 
is one of TRIANGLES, LINES, POINTS, or PATCHES and specifies the type of primitives that will be recorded into the buffer objects bound for transform feedback (see below). ... (modify last paragraph, p. 131 and first paragraph, p. 132, adding patch support, and dealing with capture of 8- and 16-bit components) When an individual point, line, triangle, or patch primitive reaches the transform feedback stage ... When capturing line, triangle, and patch primitives, all attributes ... For multi-component varying variables or varying array elements, the individual components are written in order. For variables with 8- or 16-bit fixed- or floating-point components, individual components will be converted to and stored as equivalent values of type "int", "uint", or "float". The value for any attribute specified ... (modify next-to-last paragraph, p. 132) ... is not incremented. If transform feedback receives a primitive that fits in the remaining space after such an overflow occurs, that primitive may or may not be recorded. Primitives that fail to fit in the remaining space are never recorded. Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification (Rasterization) None. Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification (Per-Fragment Operations and the Frame Buffer) None. Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification (Special Functions) None. Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification (State and State Requests) Modify Section 6.1.15, Shader and Program Queries, p. 332 (add to the first list of commands, p. 337) void GetUniformi64vNV(uint program, int location, int64EXT *params); void GetUniformui64vNV(uint program, int location, uint64EXT *params); Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile) Specification (Invariance) None. Additions to the AGL/GLX/WGL Specifications None. 
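The component-counting rules added to Chapter 2 above (Sections 2.14.3, 2.14.4, 2.14.6, and the transform feedback capture limit) can be summarized in a small sketch. The helper names below are hypothetical and are not part of any OpenGL API. The sketch also assumes that 8-, 16-, and 32-bit components each count as one component, since the text only calls out the 64-bit case.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not OpenGL entry points. */

/* Components counted for a scalar (n == 1) or vector (n == 2..4) whose
 * components are component_bits wide.  Per the edits above, each 64-bit
 * integer component counts as two components.  Assumption: narrower
 * components count as one each. */
static int counted_components(int component_bits, int n)
{
    assert(n >= 1 && n <= 4);
    assert(component_bits == 8 || component_bits == 16 ||
           component_bits == 32 || component_bits == 64);
    return (component_bits == 64 ? 2 : 1) * n;
}

/* Attribute-slot multiplier relative to an equivalent single-precision
 * type: only i64vec3, u64vec3, i64vec4, and u64vec4 count double. */
static int attrib_slot_multiplier(int component_bits, int n)
{
    assert(n >= 1 && n <= 4);
    return (component_bits == 64 && n >= 3) ? 2 : 1;
}
```

For example, under these assumptions an i64vec3 output counts as six components, while an i16vec4 output counts as four.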
Modifications to The OpenGL Shading Language Specification, Version 1.50 (Revision 09)

    Including the following line in a shader can be used to control the language features described in this extension:

      #extension GL_NV_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_gpu_shader5         1

    If the features of this extension are enabled by an #extension directive, shading language features documented in the ARB_gpu_shader5 extension will also be provided.

    Modify Section 3.6, Keywords, p. 15

    (add the following to the list of reserved keywords)

      int8_t              i8vec2          i8vec3          i8vec4
      int16_t             i16vec2         i16vec3         i16vec4
      int32_t             i32vec2         i32vec3         i32vec4
      int64_t             i64vec2         i64vec3         i64vec4
      uint8_t             u8vec2          u8vec3          u8vec4
      uint16_t            u16vec2         u16vec3         u16vec4
      uint32_t            u32vec2         u32vec3         u32vec4
      uint64_t            u64vec2         u64vec3         u64vec4
      float16_t           f16vec2         f16vec3         f16vec4
      float32_t           f32vec2         f32vec3         f32vec4
      float64_t           f64vec2         f64vec3         f64vec4

    (note:  the "float64_t" and "f64vec*" types are available if and only if ARB_gpu_shader_fp64 is also supported)

    Modify Section 4.1, Basic Types, p. 18

    (add to the basic "Transparent Types" table, p.
18) Types Meaning -------- ---------------------------------------------------------- int8_t an 8-bit signed integer i8vec2 a two-component signed integer vector (8-bit components) i8vec3 a three-component signed integer vector (8-bit components) i8vec4 a four-component signed integer vector (8-bit components) int16_t a 16-bit signed integer i16vec2 a two-component signed integer vector (16-bit components) i16vec3 a three-component signed integer vector (16-bit components) i16vec4 a four-component signed integer vector (16-bit components) int32_t a 32-bit signed integer i32vec2 a two-component signed integer vector (32-bit components) i32vec3 a three-component signed integer vector (32-bit components) i32vec4 a four-component signed integer vector (32-bit components) int64_t a 64-bit signed integer i64vec2 a two-component signed integer vector (64-bit components) i64vec3 a three-component signed integer vector (64-bit components) i64vec4 a four-component signed integer vector (64-bit components) uint8_t a 8-bit unsigned integer u8vec2 a two-component unsigned integer vector (8-bit components) u8vec3 a three-component unsigned integer vector (8-bit components) u8vec4 a four-component unsigned integer vector (8-bit components) uint16_t a 16-bit unsigned integer u16vec2 a two-component unsigned integer vector (16-bit components) u16vec3 a three-component unsigned integer vector (16-bit components) u16vec4 a four-component unsigned integer vector (16-bit components) uint32_t a 32-bit unsigned integer u32vec2 a two-component unsigned integer vector (32-bit components) u32vec3 a three-component unsigned integer vector (32-bit components) u32vec4 a four-component unsigned integer vector (32-bit components) uint64_t a 64-bit unsigned integer u64vec2 a two-component unsigned integer vector (64-bit components) u64vec3 a three-component unsigned integer vector (64-bit components) u64vec4 a four-component unsigned integer vector (64-bit components) float16_t a single 16-bit 
floating-point value
      f16vec2     a two-component floating-point vector (16-bit components)
      f16vec3     a three-component floating-point vector (16-bit components)
      f16vec4     a four-component floating-point vector (16-bit components)

      float32_t   a single 32-bit floating-point value
      f32vec2     a two-component floating-point vector (32-bit components)
      f32vec3     a three-component floating-point vector (32-bit components)
      f32vec4     a four-component floating-point vector (32-bit components)

      float64_t   a single 64-bit floating-point value
      f64vec2     a two-component floating-point vector (64-bit components)
      f64vec3     a three-component floating-point vector (64-bit components)
      f64vec4     a four-component floating-point vector (64-bit components)

    Modify Section 4.1.3, Integers, p. 20

    (add after the first paragraph of the section, p. 20)

    Variables with the types "int8_t", "int16_t", and "int64_t" represent signed integer values with exactly 8, 16, or 64 bits, respectively.  Variables with the types "uint8_t", "uint16_t", and "uint64_t" represent unsigned integer values with exactly 8, 16, or 64 bits, respectively.  Variables with the types "int32_t" and "uint32_t" represent signed and unsigned integer values with 32 bits, and are equivalent to "int" and "uint" types, respectively.

    (modify the grammar, p. 21, adding "L" and "UL" suffixes)

      integer-suffix: one of

        u U l L ul UL

    (modify next-to-last paragraph, p. 21) ... When the suffix "u" or "U" is present, the literal has type <uint>.  When the suffix "l" or "L" is present, the literal has type <int64_t>.  When the suffix "ul" or "UL" is present, the literal has type <uint64_t>.  Otherwise, the type is <int>. ...

    Modify Section 4.1.4, Floats, p. 22

    (insert after second paragraph, p. 22)

    Variables of type "float16_t" represent floating-point values using exactly 16 bits and are stored using the 16-bit floating-point representation described in the OpenGL Specification.
    Variables of type "float32_t" and "float64_t" represent floating-point values with 32 or 64 bits, and are equivalent to "float" and "double" types, respectively.

    Modify Section 4.1.7, Samplers, p. 23

    (modify 1st paragraph of the section, deleting the restriction requiring constant indexing of sampler arrays)

    ... Samplers may be aggregated into arrays within a shader (using square brackets [ ]) and can be indexed with general integer expressions.  The results of accessing a sampler array with an out-of-bounds index are undefined. ...

    (remove the additional restriction added by ARB_gpu_shader5 making a similar edit requiring uniform indexing across shader invocations for defined results.  NV_gpu_shader5 has no such limitation.)

    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression      converted to
        --------------------    -----------------------------------------
        int                     uint, int64_t, uint64_t, float, double(*)
        ivec2                   uvec2, i64vec2, u64vec2, vec2, dvec2(*)
        ivec3                   uvec3, i64vec3, u64vec3, vec3, dvec3(*)
        ivec4                   uvec4, i64vec4, u64vec4, vec4, dvec4(*)

        int8_t   int16_t        int, int64_t, uint, uint64_t, float, double(*)
        i8vec2   i16vec2        ivec2, i64vec2, uvec2, u64vec2, vec2, dvec2(*)
        i8vec3   i16vec3        ivec3, i64vec3, uvec3, u64vec3, vec3, dvec3(*)
        i8vec4   i16vec4        ivec4, i64vec4, uvec4, u64vec4, vec4, dvec4(*)

        int64_t                 uint64_t, double(*)
        i64vec2                 u64vec2, dvec2(*)
        i64vec3                 u64vec3, dvec3(*)
        i64vec4                 u64vec4, dvec4(*)

        uint                    uint64_t, float, double(*)
        uvec2                   u64vec2, vec2, dvec2(*)
        uvec3                   u64vec3, vec3, dvec3(*)
        uvec4                   u64vec4, vec4, dvec4(*)

        uint8_t  uint16_t       uint, uint64_t, float, double(*)
        u8vec2   u16vec2        uvec2, u64vec2, vec2, dvec2(*)
        u8vec3   u16vec3        uvec3, u64vec3, vec3, dvec3(*)
        u8vec4   u16vec4        uvec4, u64vec4, vec4, dvec4(*)

        uint64_t                double(*)
        u64vec2                 dvec2(*)
        u64vec3                 dvec3(*)
        u64vec4                 dvec4(*)

        float                   double(*)
        vec2                    dvec2(*)
        vec3                    dvec3(*)
        vec4                    dvec4(*)

        float16_t               float, double(*)
        f16vec2                 vec2, dvec2(*)
        f16vec3                 vec3, dvec3(*)
        f16vec4                 vec4, dvec4(*)
        (*) if ARB_gpu_shader_fp64 is supported

    (Note:  Expressions of type "int32_t", "uint32_t", "float32_t", and "float64_t" are treated as identical to those of type "int", "uint", "float", and "double", respectively.  Implicit conversions to and from these explicitly-sized types are allowed whenever conversions involving the equivalent base type are allowed.)

    (modify second paragraph of the section) No implicit conversions are provided to convert from unsigned to signed integer types, from floating-point to integer types, from higher-precision to lower-precision types, from 8-bit to 16-bit types, or between matrix types.  There are no implicit array or structure conversions.

    (insert before the final paragraph of the section, p. 27)

    When performing implicit conversion for binary operators, there may be multiple data types to which the two operands can be converted.  For example, when adding an int8_t value to a uint16_t value, both values can be implicitly converted to uint, uint64_t, float, and double.  In such cases, a floating-point type is chosen if either operand has a floating-point type.  Otherwise, an unsigned integer type is chosen if either operand has an unsigned integer type.  Otherwise, a signed integer type is chosen.  If operands can be converted to both 32- and 64-bit versions of the chosen base data type, the 32-bit version is used.

    Modify Section 4.3.4, Inputs, p. 31

    (modify third paragraph of section, p. 31, allowing explicitly-sized types) ... Vertex shader input variables can only be signed and unsigned integers, floats, doubles, explicitly-sized integers and floating-point values, vectors of any of these types, and matrices. ...

    (modify edits done in ARB_tessellation_shader adding support for "patch in", allowing for geometry shaders as well) Additionally, tessellation evaluation and geometry shaders support per-patch input variables declared with the "patch in" qualifier.  Per-patch input ...
(modify third paragraph, p. 32) ... Fragment inputs can only be signed and unsigned integers, floats, doubles, explicitly-sized integers and floating-point values, vectors of any of these types, matrices, or arrays or structures of these. Fragment inputs declared as signed or unsigned integers, doubles, 64-bit floating-point values, including vectors, matrices, or arrays derived from those types, must be qualified as "flat". Modify Section 4.3.6, Outputs, p. 33 (modify third paragraph of the section, p. 33) ... They can only be signed and unsigned integers, floats, doubles, explicitly-sized integers and floating-point values, vectors of any of these types, matrices, or arrays or structures of these. (modify last paragraph, p. 33) ... Fragment outputs can only be signed and unsigned integers, floats, explicitly-sized integers and floating-point values with 32 or fewer bits, vectors of any of these types, or arrays of these. Doubles, 64-bit integers or floating-point values, vectors or arrays of those types, matrices, and structures cannot be output. ... Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37 (add to the list of qualifiers for geometry shaders, p. 37) layout-qualifier-id: ... triangles_adjacency patches (modify the "size of input arrays" table, p. 38) Layout Size of Input Arrays ------------ -------------------- patches gl_MaxPatchVertices (add paragraph below that table, p. 38) When using the input primitive type "patches", the geometry shader is used to process a set of patches with vertex counts that may vary from patch to patch. For the purposes of input array sizing, patches are treated as having a vertex count fixed at the implementation-dependent maximum patch size, gl_MaxPatchVertices. If a shader reads an input corresponding to a vertex not found in the patch being processed, the values read are undefined. Modify Section 5.4.1, Conversion and Scalar Constructors, p. 
49 (add after first list of constructor examples) Similar constructors are provided to convert to and from explicitly-sized scalar data types, as well: float(uint8_t) // converts an 8-bit uint value to a float int64_t(double) // converts a double value to a 64-bit int float64_t(int16_t) // converts a 16-bit int value to a 64-bit float uint16_t(bool) // converts a Boolean value to a 16-bit uint (replace final two paragraphs, p. 49, and the first paragraph, p. 50, using more general language) When constructors are used to convert any floating-point type to any integer type, the fractional part of the floating-point value is dropped. It is undefined to convert a negative floating point value to an unsigned integer type. When a constructor is used to convert any integer or floating-point type to bool, 0 and 0.0 are converted to false, and non-zero values are converted to true. When a constructor is used to convert a bool to any integer or floating-point type, false is converted to 0 or 0.0, and true is converted to 1 or 1.0. Constructors converting between signed and unsigned integers with the same bit count always preserve the bit pattern of the input. This will change the value of the argument if its most significant bit is set, converting a negative signed integer to a large unsigned integer, or vice versa. Modify Section 5.9, Expressions, p. 57 (modify bulleted list as follows, adding support for expressions with 64-bit integer types) Expressions in the shading language are built from the following: * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector types, and all matrix types. ... * The arithmetic binary operators add (+), subtract (-), multiply (*), and divide (/) operate on 32-bit integer, 64-bit integer, and floating-point scalars, vectors, and matrices. If the fundamental types of the operands do not match, the conversions from Section 4.1.10 "Implicit Conversions" are applied to produce matching types. ... 
      * The operator modulus (%) operates on 32- and 64-bit integer scalars or vectors.  If the fundamental types of the operands do not match, the conversions from Section 4.1.10 "Implicit Conversions" are applied to produce matching types. ...

      * The arithmetic unary operators negate (-), post- and pre-increment and decrement (-- and ++) operate on 32-bit integer, 64-bit integer, and floating-point values (including vectors and matrices). ...

      * The relational operators greater than (>), less than (<), greater than or equal (>=), and less than or equal (<=) operate only on scalar 32-bit integer, 64-bit integer, and floating-point expressions.  The result is scalar Boolean.  The fundamental type of the two operands must match, either as specified, or after one of the implicit type conversions specified in Section 4.1.10. ...

      * The equality operators equal (==), and not equal (!=) operate only on scalar 32-bit integer, 64-bit integer, and floating-point expressions.  The result is scalar Boolean.  The fundamental type of the two operands must match, either as specified, or after one of the implicit type conversions specified in Section 4.1.10. ...

    Modify Section 6.1, Function Definitions, p. 63

    (ARB_gpu_shader5 adds a set of rules for defining whether implicit conversions for one matching function definition are better or worse than those for another.  These comparisons are done argument by argument.  Extend the edits made by ARB_gpu_shader5 to add several new rules for comparing implicit conversions for a single argument, corresponding to the new data types introduced by this extension.)

    To determine whether the conversion for a single argument in one match is better than that for another match, the following rules are applied, in order:

      1. An exact match is better than a match involving any implicit conversion.

      2. A match involving a conversion from a signed integer, unsigned integer, or floating-point type to a similar type having a larger number of bits is better than a match involving any other implicit conversion.
The set of conversions qualifying under this rule is:

    source types          destination types
    -----------------     -----------------
    int8_t, int16_t       int, int64_t
    int                   int64_t
    uint8_t, uint16_t     uint, uint64_t
    uint                  uint64_t
    float16_t             float
    float                 double

3. A match involving one conversion in rule 2 is better than a match involving another conversion in rule 2 if:

  (a) both conversions start with the same type and the first conversion is to a type with a smaller number of bits (e.g., converting from int16_t to int is preferred to converting int16_t to int64_t), or

  (b) both conversions end with the same type and the first conversion is from a type with a larger number of bits (e.g., converting an "out" parameter from int16_t to int is preferred to converting from int8_t to int).

4. A match involving an implicit conversion from any integer type to float is better than a match involving an implicit conversion from any integer type to double.

Modify Section 7.1, Vertex and Geometry Shader Special Variables, p. 69

(NOTE: These edits are written against the re-organized section in the ARB_tessellation_shader specification.)

(add to the list of built-in inputs for geometry shaders)

In the geometry language, built-in input and output variables are intrinsically declared as:

    in int gl_PatchVerticesIn;
    patch in float gl_TessLevelOuter[4];
    patch in float gl_TessLevelInner[2];
    ...

The input variable gl_PatchVerticesIn behaves as in the identically-named tessellation control and evaluation shader inputs. The input variables gl_TessLevelOuter[] and gl_TessLevelInner[] behave as in the identically-named tessellation evaluation shader inputs.

Modify Chapter 8, Built-in Functions, p. 81

(add to description of generic types, last paragraph of p. 69)

... Where the input arguments (and corresponding output) can be int64_t, i64vec2, i64vec3, or i64vec4, <genI64Type> is used as the argument. Where the input arguments (and corresponding output) can be uint64_t, u64vec2, u64vec3, or u64vec4, <genU64Type> is used as the argument.
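The argument-by-argument preference rules for Section 6.1 above can be modeled on the CPU. The following C sketch is illustrative only and is not part of the extension: the Type and Conv structures, the T_* macros, and the function names are hypothetical, and the model ranks two conversions without checking that either is actually permitted by Section 4.1.10.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a scalar type is a kind plus a width in bits. */
typedef enum { K_INT, K_UINT, K_FLOAT } Kind;
typedef struct { Kind kind; int bits; } Type;
typedef struct { Type src, dst; } Conv;   /* conversion applied to one argument */

#define T_INT8   ((Type){ K_INT,    8 })
#define T_INT16  ((Type){ K_INT,   16 })
#define T_INT    ((Type){ K_INT,   32 })
#define T_INT64  ((Type){ K_INT,   64 })
#define T_FLOAT  ((Type){ K_FLOAT, 32 })
#define T_DOUBLE ((Type){ K_FLOAT, 64 })

static bool same_type(Type a, Type b) {
    return a.kind == b.kind && a.bits == b.bits;
}

/* Rule 2 table: widening within one kind, to the listed destinations only. */
static bool is_promotion(Type s, Type d) {
    if (s.kind != d.kind || d.bits <= s.bits) return false;
    if (s.kind == K_FLOAT)                 /* float16_t->float, float->double */
        return d.bits == 2 * s.bits;
    return d.bits == 32 || d.bits == 64;   /* integers widen to 32 or 64 bits */
}

/* Returns true if conversion a is better than conversion b under rules 1-4. */
bool conv_better(Conv a, Conv b) {
    bool a_exact = same_type(a.src, a.dst), b_exact = same_type(b.src, b.dst);
    if (a_exact || b_exact)                        /* rule 1 */
        return a_exact && !b_exact;

    bool a_prom = is_promotion(a.src, a.dst), b_prom = is_promotion(b.src, b.dst);
    if (a_prom != b_prom)                          /* rule 2 */
        return a_prom;
    if (a_prom)                                    /* rules 3(a) and 3(b) */
        return (same_type(a.src, b.src) && a.dst.bits < b.dst.bits) ||
               (same_type(a.dst, b.dst) && a.src.bits > b.src.bits);

    /* rule 4: integer-to-float beats integer-to-double */
    return a.src.kind != K_FLOAT && b.src.kind != K_FLOAT &&
           a.dst.kind == K_FLOAT && b.dst.kind == K_FLOAT &&
           a.dst.bits == 32 && b.dst.bits == 64;
}

/* Usage: one check per rule, using examples taken from the rule text. */
void conv_self_check(void) {
    assert( conv_better((Conv){ T_INT,   T_INT    }, (Conv){ T_INT,   T_INT64  })); /* rule 1 */
    assert( conv_better((Conv){ T_INT,   T_INT64  }, (Conv){ T_INT,   T_FLOAT  })); /* rule 2 */
    assert( conv_better((Conv){ T_INT16, T_INT    }, (Conv){ T_INT16, T_INT64  })); /* rule 3(a) */
    assert( conv_better((Conv){ T_INT16, T_INT    }, (Conv){ T_INT8,  T_INT    })); /* rule 3(b) */
    assert( conv_better((Conv){ T_INT,   T_FLOAT  }, (Conv){ T_INT,   T_DOUBLE })); /* rule 4 */
    assert(!conv_better((Conv){ T_INT,   T_DOUBLE }, (Conv){ T_INT,   T_FLOAT  }));
}
```

The function returns false when neither conversion is preferred, which is the case the overload rules treat as a tie for that argument.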
Modify Section 8.3, Common Functions, p. 84

(add support for 64-bit integer packing and unpacking functions)

Syntax:

    int64_t packInt2x32(ivec2 v);
    uint64_t packUint2x32(uvec2 v);

    ivec2 unpackInt2x32(int64_t v);
    uvec2 unpackUint2x32(uint64_t v);

The functions packInt2x32() and packUint2x32() return a signed or unsigned 64-bit integer obtained by packing the components of a two-component signed or unsigned integer vector, respectively. The first vector component specifies the 32 least significant bits; the second component specifies the 32 most significant bits.

The functions unpackInt2x32() and unpackUint2x32() return a signed or unsigned integer vector built from a 64-bit signed or unsigned integer scalar, respectively. The first component of the vector contains the 32 least significant bits of the input; the second component contains the 32 most significant bits.

(add support for 16-bit floating-point packing and unpacking functions)

Syntax:

    uint packFloat2x16(f16vec2 v);
    f16vec2 unpackFloat2x16(uint v);

The function packFloat2x16() returns an unsigned integer obtained by interpreting the components of a two-component 16-bit floating-point vector as integers according to the OpenGL Specification, and then packing the two 16-bit integers into a 32-bit unsigned integer. The first vector component specifies the 16 least significant bits of the result; the second component specifies the 16 most significant bits.

The function unpackFloat2x16() returns a two-component vector with 16-bit floating-point components obtained by unpacking a 32-bit unsigned integer into a pair of 16-bit values, and interpreting those values as 16-bit floating-point numbers according to the OpenGL Specification. The first component of the vector is obtained from the 16 least significant bits of the input; the second component is obtained from the 16 most significant bits.
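The bit layout described above for the packing functions can be checked with a small CPU-side model. This C sketch is illustrative and not part of the extension: the ivec2 and uvec2 structs and the snake_case function names are hypothetical stand-ins for the GLSL types and built-ins, and the 16-bit helper works on raw half-float encodings because standard C has no portable 16-bit floating-point type.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t  x, y; } ivec2;   /* stand-ins for the GLSL vector types */
typedef struct { uint32_t x, y; } uvec2;

/* Model of packUint2x32: .x supplies bits 0..31, .y supplies bits 32..63. */
uint64_t pack_uint2x32(uvec2 v) {
    return (uint64_t)v.x | ((uint64_t)v.y << 32);
}

uvec2 unpack_uint2x32(uint64_t v) {
    uvec2 r = { (uint32_t)(v & 0xFFFFFFFFu), (uint32_t)(v >> 32) };
    return r;
}

/* Signed variants reuse the unsigned bit layout. The unsigned-to-signed casts
 * assume a two's complement target, as mainstream compilers provide. */
int64_t pack_int2x32(ivec2 v) {
    uvec2 u = { (uint32_t)v.x, (uint32_t)v.y };
    return (int64_t)pack_uint2x32(u);
}

ivec2 unpack_int2x32(int64_t v) {
    uvec2 u = unpack_uint2x32((uint64_t)v);
    ivec2 r = { (int32_t)u.x, (int32_t)u.y };
    return r;
}

/* Model of packFloat2x16 on raw encodings: lo is the first vector component. */
uint32_t pack_half2x16_bits(uint16_t lo, uint16_t hi) {
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Usage: the first component always lands in the least significant bits. */
void pack_self_check(void) {
    uvec2 u = { 0x89ABCDEFu, 0x01234567u };
    assert(pack_uint2x32(u) == 0x0123456789ABCDEFull);
    assert(unpack_uint2x32(0x0123456789ABCDEFull).y == 0x01234567u);

    ivec2 i = { -1, 0 };                   /* low word all ones, high word zero */
    assert(pack_int2x32(i) == 4294967295ll);
    assert(unpack_int2x32(-4294967296ll).y == -1);   /* high word all ones */

    /* 0x3C00 is the binary16 encoding of 1.0; 0x0000 encodes +0.0. */
    assert(pack_half2x16_bits(0x3C00u, 0x0000u) == 0x00003C00u);
}
```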
(add functions to get/set the bit encoding for floating-point values)

64-bit floating-point data types in the OpenGL shading language are specified to be encoded according to the IEEE specification for double-precision floating-point values. The functions below allow shaders to convert double-precision floating-point values to and from 64-bit signed or unsigned integers representing their encoding.

To obtain signed or unsigned integer values holding the encoding of a floating-point value, use:

    genI64Type doubleBitsToInt64(genDType value);
    genU64Type doubleBitsToUint64(genDType value);

Conversions are done on a component-by-component basis.

To obtain a floating-point value corresponding to a signed or unsigned integer encoding, use:

    genDType int64BitsToDouble(genI64Type value);
    genDType uint64BitsToDouble(genU64Type value);

(add functions to evaluate predicates over groups of threads)

Syntax:

    bool anyThreadNV(bool value);
    bool allThreadsNV(bool value);
    bool allThreadsEqualNV(bool value);

Implementations of the OpenGL Shading Language may, but are not required to, run multiple shader threads for a single stage as a SIMD thread group, where individual execution threads are assigned to thread groups in an undefined, implementation-dependent order. Algorithms may benefit from being able to evaluate a composite of boolean values over all active threads in the thread group.

The function anyThreadNV() returns true if and only if <value> is true for at least one active thread in the group. The function allThreadsNV() returns true if and only if <value> is true for all active threads in the group. The function allThreadsEqualNV() returns true if <value> is the same for all active threads in the group; the result of allThreadsEqualNV() will be true if and only if anyThreadNV() and allThreadsNV() would return the same value.

Since these functions depend on the values of <value> in an undefined group of threads, the value returned by these functions is largely undefined.
However, anyThreadNV() is guaranteed to return true if <value> is true, and allThreadsNV() is guaranteed to return false if <value> is false. Since implementations are generally not required to combine threads into groups, simply returning <value> for anyThreadNV() and allThreadsNV() and returning true for allThreadsEqualNV() is a legal implementation of these functions.

Modify Section 8.6, Vector Relational Functions, p. 90

(modify the first paragraph, p. 90, adding support for relational functions operating on explicitly-sized types)

Relational and equality operators (<, <=, >, >=, ==, !=) are defined (or reserved) to operate on scalars and produce scalar Boolean results. For vector results, use the following built-in functions. In the definitions below, the following terms are used as placeholders for all vector types for a given fundamental data type:

    placeholder   fundamental types
    -----------   ------------------------------------------------
    bvec          bvec2, bvec3, bvec4

    ivec          ivec2, ivec3, ivec4, i8vec2, i8vec3, i8vec4,
                  i16vec2, i16vec3, i16vec4, i64vec2, i64vec3, i64vec4

    uvec          uvec2, uvec3, uvec4, u8vec2, u8vec3, u8vec4,
                  u16vec2, u16vec3, u16vec4, u64vec2, u64vec3, u64vec4

    vec           vec2, vec3, vec4, dvec2(*), dvec3(*), dvec4(*),
                  f16vec2, f16vec3, f16vec4

    (*) only if ARB_gpu_shader_fp64 is supported

In all cases, the sizes of the input and return vectors for any particular call must match.

Modify Section 8.7, Texture Lookup Functions, p. 91

(modify text for textureOffset() functions, p. 94, allowing non-constant offsets)

Do a texture lookup as in texture but with offset added to the (u,v,w) texel coordinates before looking up each texel. The value <offset> need not be constant; however, a limited range of offset values are supported. If any component of <offset> is less than MIN_PROGRAM_TEXEL_OFFSET_EXT or greater than MAX_PROGRAM_TEXEL_OFFSET_EXT, the offset applied to the texture coordinates is undefined. Note that offset does not apply to the layer coordinate for texture arrays.
This is explained in detail in section 3.9.9 of the OpenGL Specification (Version 3.2, Compatibility Profile), where offset is (delta_u, delta_v, delta_w). Note that texel offsets are also not supported for cube maps.

(Note: This lifting of the constant offset restriction also applies to texelFetchOffset, p. 95, textureProjOffset, p. 95, textureLodOffset, p. 96, textureProjLodOffset, p. 96.)

(modify the description of the textureGradOffset() functions, p. 97, preserving the restriction on constant offsets)

Do a texture lookup with both explicit gradient and offset, as described in textureGrad and textureOffset. For these functions, the offset value must be a constant expression. A limited range of offset values are supported; the minimum and maximum offset values are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET and MAX_PROGRAM_TEXEL_OFFSET, respectively.

(modify the description of the textureProjGradOffset() functions, p. 98, preserving the restriction on constant offsets)

Do a texture lookup projectively and with explicit gradient as described in textureProjGrad, as well as with offset, as described in textureOffset. For these functions, the offset value must be a constant expression. A limited range of offset values are supported; the minimum and maximum offset values are implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET and MAX_PROGRAM_TEXEL_OFFSET, respectively.

(modify the description of the textureGatherOffsets() functions, added in ARB_gpu_shader5, to remove the restriction on constant offsets)

The textureGatherOffsets() functions operate identically ... selecting the texel T_i0_j0 of that footprint. The specified values in <offsets> need not be constant. A limited range of ...

Modify Section 9, Shading Language Grammar, p. 92

!!! TBD !!!

GLX Protocol

TBD

Interactions with OpenGL ES 3.1

If implemented in OpenGL ES, NV_gpu_shader5 acts as a superset of functionality provided by OES_gpu_shader5.
A shader that enables this extension via an #extension directive also implicitly enables the common capabilities provided by OES_gpu_shader5.

Replace references to ARB_gpu_shader5 with OES_gpu_shader5 and EXT_shader_implicit_conversions (as appropriate). Replace references to ARB_geometry_shader with OES/EXT_geometry_shader. Replace references to ARB_tessellation_shader with OES/EXT_tessellation_shader. Replace references to int64EXT and uint64EXT with int64 and uint64, respectively.

The specification should be edited as follows to include new ProgramUniform* functions.

(modify the ProgramUniform* language)

The following commands:

    ....
    void ProgramUniform{1,2,3,4}{i64,ui64}NV
        (uint program, int location, T value);
    void ProgramUniform{1,2,3,4}{i64,ui64}vNV
        (uint program, int location, const T *value);

operate identically to the corresponding command where "Program" is deleted from the name (and extension suffixes are dropped or updated appropriately) except, rather than updating the currently active program object, these "Program" commands update the program object named by the <program> parameter. ...

Changes to Section 2.6.1 "Begin and End" don't apply. Disregard the introduction of 64-bit integer or floating-point vertex attribute types.

Interactions with OpenGL ES Shading Language 3.10, revision 3

If implemented in GLSL ES, NV_gpu_shader5 acts as a superset of functionality provided by OES_gpu_shader5 and EXT_shader_implicit_conversions. A shader that enables this extension via an #extension directive also implicitly enables the common capabilities provided by OES_gpu_shader5 and EXT_shader_implicit_conversions.

Replace references to ARB_tessellation_shader with OES/EXT_tessellation_shader.

Implicit conversions between GLSL ES types are introduced by EXT_shader_implicit_conversions instead of ARB_gpu_shader5.

Disregard the notion of 'double' types as vertex shader inputs.
Section 4.1.7.2 "Images"

Remove the third sentence, which restricts access to arrays of images to constant integral expressions. This essentially leaves it to the 'dynamically uniform integral expressions' default that OES_gpu_shader5 introduced.

Modify Section 4.3.9 "Interface Blocks", as modified by OES_gpu_shader5

NV_gpu_shader5 also lifts OES_gpu_shader5 restrictions with regard to indexing into arrays of uniform blocks and shader storage blocks.

Change sentence "All indices used to index a shader storage block array must be constant integral expressions. A uniform block array can only be indexed with a dynamically uniform integral expression, otherwise results are undefined." into "Arbitrary indices may be used to index a uniform block array; integral constant expressions are not required. If the index used to access an array of uniform blocks is out-of-bounds, the results of the access are undefined."

Indexing into arrays of shader storage blocks defaults to 'dynamically uniform integral expressions'.

Changes to Section 4.3.9, p.48 "Interface Blocks"

Replace the sentence "All indices used to index a shader storage block array must be constant integral expressions. A uniform block array can only be indexed with a dynamically uniform integral expression, otherwise results are undefined." with "Arbitrary indices may be used to index a uniform block array; integral constant expressions are not required. If the index used to access an array of uniform blocks is out-of-bounds, the results of the access are undefined."

Section 4.4.1.1 "Compute Shader Inputs"

change "layout-qualifier-id: local_size_x = integer-constant local_size_y = integer-constant local_size_z = integer-constant" into "layout-qualifier-id: local_size_x = integer-constant-expression local_size_y = integer-constant-expression local_size_z = integer-constant-expression"

Section 4.4.1.gs "Geometry Shader Inputs"

change " ... invocations = integer-constant" into " ...
invocations = integer-constant-expression"

Section 4.4.2 "Output Layout Qualifiers"

change "layout-qualifier-id: location = integer-constant" into "layout-qualifier-id: location = integer-constant-expression"

Section 4.4.2.ts "Tessellation Control Outputs"

change "layout-qualifier-id vertices = integer-constant" into "layout-qualifier-id: vertices = integer-constant-expression"

Section 4.4.3 "Uniform Variable Layout Qualifiers"

change "layout-qualifier-id: location = integer-constant" into "layout-qualifier-id: location = integer-constant-expression"

Section 4.4.4 "Uniform and Shader Storage Block Layout Qualifiers"

change "layout-qualifier-id: ... binding = integer-constant" into "layout-qualifier-id: ... binding = integer-constant-expression"

Section 4.4.5 "Opaque Uniform Layout Qualifiers"

change "layout-qualifier-id: binding = integer-constant" into "layout-qualifier-id: binding = integer-constant-expression"

Change sentence "A link-time error will result if two shaders in a program specify different integer-constant bindings for the same opaque-uniform name." into "A link-time error will result if two shaders in a program specify different bindings for the same opaque-uniform name."

Section 4.4.6 "Atomic Counter Layout Qualifiers"

change "layout-qualifier-id: binding = integer-constant offset = integer-constant" into "layout-qualifier-id: binding = integer-constant-expression offset = integer-constant-expression"

Section 4.4.7 "Format Layout Qualifiers"

change "layout-qualifier-id: ... binding = integer-constant" into "layout-qualifier-id: ... binding = integer-constant-expression"

Section 4.7.3 "Precision Qualifiers"

After "Literal constants do not have precision qualifiers." add "Neither do explicitly sized types such as int8_t, uint32_t, float16_t etc."

Dependencies on OES_gpu_shader5

In addition to allowing arbitrary indexing into arrays of samplers, this extension also lifts OES_gpu_shader5 restrictions for indexing arrays of images and shader storage blocks.
Additionally, it allows usage of 'integer-constant-expressions' for layout qualifiers that formerly took 'integer-constant'. In Section 'Overview': change the bullet point "* the ability to aggregate samplers into arrays...." to "* the ability to index into arrays of samplers, uniforms and shader storage blocks with arbitrary expressions, and not require that non-constant indices be uniform across all shader invocations." "* the ability to index into arrays of images using dynamically uniform integers." "* the ability to use 'integer-constant-expressions' in place of 'integer-constant' for layout qualifiers." Dependencies on OES/EXT_tessellation_shader and OpenGL ES 3.2 If implemented in OpenGL ES 3.1 or earlier and OES/EXT_tessellation_shader is not supported, language introduced by this extension describing processing patches in geometry shaders, transform feedback, and rasterization should be removed. If implemented in OpenGL ES 3.2 or implemented in OpenGL ES 3.1 and OES/EXT_tessellation_shader is supported: It is legal to send patches past the tessellation stage -- the following language from OES/EXT_tessellation_shader is removed: Patch primitives are not supported by pipeline stages below the tessellation evaluation shader. It is legal to use a tessellation control shader without a tessellation evaluation shader. Remove from the bullet list describing reasons for link failure below the LinkProgram command on p. 70 (as modified by OES/EXT_tessellation_shader): * the program is not separable and contains no object to form a tessellation evaluation shader; or Modify section 11.1.2.1, "Output Variables" on p. 262 (as modified by the OES/EXT_geometry_shader extension): Into the paragraph starting with "Each program object can specify a set of output variables from one shader to be recorded in transform feedback mode..." 
Insert after the tessellation evaluation shader bullet point:

    * tessellation control shader

Modify section 11.1.3.11, "Validation" to replace the bullet point starting with "One but not both of the tessellation..." on p. 271

    * the tessellation evaluation but not tessellation control stage has an active program with corresponding executable shader.

Modify section 11.1ts, "Tessellation"

Replace "Tessellation is considered active if and only if the active program object or program pipeline object includes both a tessellation control shader and a tessellation evaluation shader." with "Tessellation is considered active if and only if the active program object or program pipeline object includes a tessellation control shader."

Replace "An INVALID_OPERATION error is generated by any command that transfers vertices to the GL if the current program state has one but not both of a tessellation control shader and tessellation evaluation shader." with "An INVALID_OPERATION error is generated by any command that transfers vertices to the GL if the current program state has a tessellation evaluation shader but not a tessellation control shader."

Modify section 12.1.2 "Transform Feedback Primitive Capture"

Replace the second paragraph of the section on p. 274 (as modified by OES/EXT_tessellation_shader):

The data captured in transform feedback mode depends on the active programs on each of the shader stages. If a program is active for the geometry shader stage, transform feedback captures the vertices of each primitive emitted by the geometry shader. Otherwise, if a program is active for the tessellation evaluation shader stage, transform feedback captures each primitive produced by the tessellation primitive generator, whose vertices are processed by the tessellation evaluation shader. Otherwise, if a program is active for the tessellation control shader stage, transform feedback captures each output patch of that stage.
Otherwise, transform feedback captures each primitive processed by the vertex shader. Modify the second paragraph following ResumeTransformFeedback on p. 277 (as modified by OES/EXT_tessellation_shader): When transform feedback is active and not paused ... If a tessellation or geometry shader is active, the type of primitive emitted by that shader is used instead of the parameter passed to drawing commands for the purposes of this error check. If tessellation and geometry shaders are both active, the output primitive type of the geometry shader will be used for the purposes of this error. Any primitive type may be used while transform feedback is paused. Modify section 13.3, "Points" After "The point size is determined by the last active stage before the rasterizer:" Add a new bullet point to the list, between the tessellation evaluation shader and the vertex shader: * the tessellation control shader, if active and no tessellation evaluation shader is active; Dependencies on OES/EXT_geometry_shader If implemented in GLSL ES and OES/EXT_geometry_shader is not supported, disregard all changes to geometry shader related functionality. Dependencies on ARB_gpu_shader5 This extension also incorporates all the changes to the OpenGL Shading Language made by ARB_gpu_shader5; enabling this extension by a #extension directive in shader code also enables all features of ARB_gpu_shader5 as though the shader code has also declared #extension GL_ARB_gpu_shader5 : enable The converse is not true; implementations supporting both extensions should not provide the shading language features in this extension if shader code #extension directives enable only ARB_gpu_shader5. This specification and ARB_gpu_shader5 both lift the restriction in GLSL 1.50 requiring that indexing in arrays of samplers must be done with constant expressions. However, ARB_gpu_shader5 specifies that results are undefined if the indices would diverge if multiple shader invocations are run in lockstep. 
This extension does not impose the non-divergent indexing requirement. Dependencies on ARB_gpu_shader_fp64 This extension and ARB_gpu_shader_fp64 both provide support for shading language variables with 64-bit components. If both extensions are supported, the various edits describing this new support should be combined. If ARB_gpu_shader_fp64 is not supported, the following edits should be removed: * language adding the data types "float64_t", "f64vec2", "f64vec3", and "f64vec4"; * language allowing implicit conversions of various types to double, dvec2, dvec3, or dvec4; and * the built-in functions doubleBitsToInt64(), doubleBitsToUint64(), int64BitsToDouble(), and uint64BitsToDouble(). Dependencies on ARB_tessellation_shader If ARB_tessellation_shader is not supported, language introduced by this extension describing processing patches in geometry shaders, transform feedback, and rasterization should be removed. If this extension and ARB_tessellation_shader are supported, it is legal to send patches past the tessellation stage -- the following language from ARB_tessellation_shader is removed: Patch primitives are not supported by pipeline stages below the tessellation evaluation shader. If there is no active program object or the active program object does not contain a tessellation evaluation shader, the error INVALID_OPERATION is generated by Begin (or vertex array commands that implicitly call Begin) if the primitive mode is PATCHES. Dependencies on NV_shader_buffer_load If NV_shader_buffer_load is supported, that specification should be edited as follows, to allow pointers to dereference the new data types added by this extension. Modify "Section 2.20.X, Shader Memory Access" from NV_shader_buffer_load. 
(add rules for loads of variables having the new data types from this extension to the list of bullets following "When a shader dereferences a pointer variable")

- Data of type "int8_t", "int16_t", "int32_t", and "int64_t" are read from or written to memory as a single 8-, 16-, 32-, or 64-bit signed integer value at the specified GPU address.

- Data of type "uint8_t", "uint16_t", "uint32_t", and "uint64_t" are read from or written to memory as a single 8-, 16-, 32-, or 64-bit unsigned integer value at the specified GPU address.

- Data of type "float16_t", "float32_t", and "float64_t" are read from or written to memory as a single 16-, 32-, or 64-bit floating-point value at the specified GPU address.

Dependencies on EXT_direct_state_access

If EXT_direct_state_access is supported, that specification should be edited as follows to include new ProgramUniform* functions.

(modify the ProgramUniform* language)

The following commands:

    ....
    void ProgramUniform{1,2,3,4}{i64,ui64}NV
        (uint program, int location, T value);
    void ProgramUniform{1,2,3,4}{i64,ui64}vNV
        (uint program, int location, const T *value);

operate identically to the corresponding command where "Program" is deleted from the name (and extension suffixes are dropped or updated appropriately) except, rather than updating the currently active program object, these "Program" commands update the program object named by the <program> parameter. ...

Dependencies on EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit

The EXT_vertex_attrib_64bit extension provides the ability to specify 64-bit floating-point vertex attributes in a GLSL vertex shader and to specify the values of these attributes via the OpenGL API. To successfully compile vertex shaders with fp64 input variables, it is necessary to include

    #extension GL_EXT_vertex_attrib_64bit : enable

in the shader text. However, this extension is considered to enable 64-bit floating-point and integer inputs.
Provided EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit are supported, including the following code in a vertex shader #extension GL_NV_gpu_shader5 : enable will enable 64-bit floating-point or integer input variables whose values would be specified using the OpenGL API mechanisms found in the EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit extensions. Errors None. New State None. New Implementation Dependent State None. Issues (1) What implicit conversions are supported by this extension on top of those provided by related extensions? RESOLVED: ARB_gpu_shader5 and ARB_gpu_shader_fp64 provide new implicit conversions from "int" to "uint", and from "int", "uint", and "float" to "double". This extension provides integer types of multiple sizes and supports implicit conversions from small integer types to 32- or 64-bit integer types of the same signedness, as well as float and double. It also provides floating-point types of multiple sizes and supports implicit conversions from smaller to larger types. Additionally, it supports conversion from 64-bit integer types to double. (2) How do these implicit conversions impact binary operators? RESOLVED: For binary operators, we prefer converting to a common type that is as close as possible in size and type to the original expression. (3) How do these implicit conversions impact function overloading rules? RESOLVED: We extend the preference rules in ARB_gpu_shader5 to account for the new data types, adding rules to: * favor new "promotions" in integer/floating point types (previously, the only promotion was float-to-double) * for promotions, favor conversion to the type closer in size (e.g., prefer converting from int16_t to int over converting to int64_t) (4) What should be done to distinguish between 32- and 64-bit integer constants? 
RESOLVED: We will use "L" and "UL" to identify signed and unsigned 64-bit integer constants; the use of "L" matches a similar ("long") suffix in the C programming language. C leaves the size of integer types implementation-dependent, and many implementations require an "LL" suffix to declare 64-bit integer constants. With our size definitions, "L" will be considered sufficient to make an integer constant 64-bit.

(5) Should we provide support for vertex attributes with 64-bit components, and if so, how should the support be provided in the OpenGL API?

RESOLVED: Yes, this seems like useful functionality, particularly for applications wanting to provide double-precision or 64-bit integer data to shaders performing computations on such types. We provide VertexAttribL* entry points for 64-bit components in the separate EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit extensions, which should be supported on all implementations supporting this extension.

(6) Should we allow vertex attributes with 8- or 16-bit components in the shading language, and if so, how does it interact with the OpenGL API?

RESOLVED: Yes, but we will use existing APIs to specify such attributes, which already typically allow 8- and 16-bit components on the API side. Vertex attribute components (other than 64-bit ones) specified by the API will be converted from the type specified in the vertex attribute commands to the component type of the attribute. For floating-point values, that may involve 16-to-32 bit conversion or vice versa. For integer types, that may involve dropping all but the least significant bits of attribute components.

(7) Should we support uniforms with double or 64-bit attribute types, and if so, how? Should we support uniforms with <32-bit components, and if so, how?

RESOLVED: We will support uniforms of all component types, either in a buffer object (via OpenGL 3.1 or ARB_uniform_buffer_object) or in storage associated with the program.
When uniforms are stored in a buffer object, they are stored using their native data types according to the pre-existing packing and layout rules. Those rules were already written to be able to accommodate both the larger and smaller new data types.

Uniforms stored in program objects are loaded with Uniform* APIs. There are no pre-existing uniform APIs accepting doubles or other "long" types, so there was no clear need to add an extra "L" to the name to distinguish from other APIs like we do with VertexAttribL* APIs. Uniforms with 8- and 16-bit components are loaded with the "larger" Uniform*{i,ui,f} APIs; it didn't seem worth it to add numerous entry points to the APIs to handle all those new types.

(8) How do the uniform loading commands introduced by this extension interact with similar commands added by NV_shader_buffer_load?

RESOLVED: NV_shader_buffer_load provided the command Uniformui64NV to load pointer uniforms with a single 64-bit unsigned integer. This extension provides vectors of 64-bit unsigned integers, so we needed Uniform{2,3,4}ui64NV commands. We chose to provide a Uniform1ui64NV command, which will be functionally equivalent to Uniformui64NV.

(9) How will transform feedback work for capturing variables with double or 64-bit components? Should we support transform feedback on variables with components with fewer than 32 bits?

RESOLVED: Transform feedback will support variables with any component size. Components with fewer than 32 bits are converted to their equivalent 32-bit types.

For doubles and variables with 64-bit components, each component captured will count as a 64-bit value and occupy two components for the purpose of component counting rules. This could be a problem for the SEPARATE_ATTRIBS mode, since the minimum component limit is four, which would not be sufficient to capture a dvec3 or dvec4.
However, implementations supporting this extension should also be able to support ARB_transform_feedback3, which extends INTERLEAVED_ATTRIBS mode to capture vertex attribute values interleaved into multiple buffers. That functionality effectively obsoletes the SEPARATE_ATTRIBS mode, since it is a functional superset.

We considered support for capturing 8- and 16-bit values directly, which had a number of problems. First, full byte addressing might impose both alignment issues (e.g., capturing a uint8_t followed by a float might misalign the float) and additional hardware implementation burdens. One other option would be to pack multiple values into a 32-bit integer (e.g., f16vec2 would be packed with .x in the LSBs and .y in the MSBs). This could work, even with word addressing, but would require padding for odd sizes (e.g., f16vec3 padded to two words, with the second word holding only .z). It would also have endianness issues; packed values would look like arrays of the corresponding smaller type on little-endian systems, but not on big-endian ones.

(10) What precision will be used for computation, storage, and inter-stage transfer of 8- and 16-bit component data types?

RESOLVED: The components may be considered to occupy a full 32 bits for the purposes of input/output component count limits. 8- and 16-bit values should, however, be passed at that precision.

(11) Is the new support for non-constant texel offsets completely orthogonal?

RESOLVED: No. Non-constant offsets are not supported for the existing functions textureGradOffset() and textureProjGradOffset().

(12) Should we provide functions like intBitsToFloat() that operate on 16-bit floating-point values?

RESOLVED: Not in this extension.
Such conversions can be performed using the following code:

    // v occupies the low 16 bits of the packed word; the zero fills the
    // high 16 bits, which the uint16_t constructor then discards.
    uint16_t float16BitsToUint16(float16_t v)
    {
      return uint16_t(packFloat2x16(f16vec2(v, 0)));
    }

    // Widen v to 32 bits, unpack both halves, and keep the low half.
    float16_t uint16BitsToFloat16(uint16_t v)
    {
      return unpackFloat2x16(uint(v)).x;
    }

(13) Should we provide distinct sized types for 32-bit integers and floats, and 64-bit floats? Should we provide those types as aliases for existing unsized types? Or should we provide no such types at all?

RESOLVED: We will provide sized versions of these types, which are defined as completely equivalent to unsized types according to the following table:

    unsized type    sized types
    -------------   ---------------
    int             int32_t
    uint            uint32_t
    float           float32_t
    double          float64_t

Vector types with sized and unsized components have equivalent relationships.

Note that the nominally "unsized" data types in the GLSL 1.30 spec are actually sized. The specification explicitly defines signed and unsigned integers (int, uint) to be 32-bit values. It also defines floating-point values to "match the IEEE single precision floating-point definition for precision and dynamic range", which are also 32-bit values.

This type equivalence has minor implications on function overloading:

  * You can't declare separate versions of a function with an "int" argument in one version and an "int32_t" argument in another.

  * Because there is no implicit conversion between equivalent types, we will get an exact match if an argument is declared with one type (e.g., "int") in the caller and a textually different but equivalent type ("int32_t") in the function.

Note that the type equivalence also applies to API data type queries. For example, the type INT will be returned for a variable declared as "int32_t".

(14) What are functions like anyThreadNV() and allThreadsNV() good for?

RESOLVED: If an implementation performs SIMD thread execution, divergent branching may result in reduced performance if the "if" and "else" blocks of an "if" statement are executed sequentially.
For example, an algorithm may have both a "fast path" that performs a computation quickly for a subset of all cases and a "slow path" that performs a computation more slowly but correctly for all cases. When performing SIMD execution, code like the following:

    if (condition) {
      result = do_fast_path(...);
    } else {
      result = do_slow_path(...);
    }

may end up executing *both* the fast and slow paths for a SIMD thread group if "condition" diverges, and may execute more slowly than simply executing the slow path unconditionally. These functions allow code like:

    if (allThreadsNV(condition)) {
      result = do_fast_path(...);
    } else {
      result = do_slow_path(...);
    }

that executes the fast path if and only if it can be used for *all* threads in the group. For thread groups where "condition" diverges, this algorithm would unconditionally run the slow path, but would never run both in sequence.

There may be other cases where "voting" across shader invocations may be useful. Note that we provide no control over how shader invocations may be packed within a SIMD thread group, unlike various "compute" APIs (CUDA, OpenCL).

(15) Can the 64-bit uniform APIs be used to load values for uniforms of type "bool", "bvec2", "bvec3", or "bvec4"?

RESOLVED: No. OpenGL 2.0 and beyond did allow "bool" variables to be set with Uniform*i* and Uniform*f APIs, and OpenGL 3.0 extended that support to Uniform*ui* for orthogonality. But it seems pointless to extend this capability forward to 64-bit Uniform APIs as well.

(19) The ARB_tessellation_shader extension adds support for patch primitives that might survive to the transform feedback stage. How are such primitives captured?

RESOLVED: If patch primitives survive to the transform feedback stage, they are recorded on a patch-by-patch basis. Incomplete patches are not recorded. As with other primitive types, if the transform feedback buffers do not contain enough space to capture an entire patch, no vertices are recorded.
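As an illustration only, the all-or-nothing rule for a single patch can be modeled in C as below. The tf_buffer_model type and the tf_record_patch function are hypothetical names, not part of any API, and buffer space is measured in whole vertices for simplicity. The model decides one patch at a time and makes no claim about what an implementation does with later patches once one has failed to fit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of transform feedback storage, measured in vertices. */
typedef struct {
    size_t capacity_vertices;  /* total vertices the buffers can hold */
    size_t used_vertices;      /* vertices recorded so far */
} tf_buffer_model;

/* Hypothetical helper: try to record one patch. Returns the number of
 * vertices recorded, which is either patch_vertices or zero. */
static size_t tf_record_patch(tf_buffer_model *buf,
                              size_t vertices_received,
                              size_t patch_vertices)
{
    assert(buf != NULL);
    assert(buf->used_vertices <= buf->capacity_vertices);

    /* Incomplete patches are not recorded. */
    if (vertices_received < patch_vertices)
        return 0;

    /* If the entire patch does not fit, no vertices are recorded. */
    if (buf->capacity_vertices - buf->used_vertices < patch_vertices)
        return 0;

    buf->used_vertices += patch_vertices;
    return patch_vertices;
}
```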
Note that the only way to get patch primitives all the way to transform feedback is to have tessellation evaluation and geometry shaders disabled; the output streams from both of those shader stages are collections of points, lines, or triangles.

(20) Previous transform feedback allowed capturing only fixed-size primitives; this extension supports variable-sized patches. What interactions does this functionality have with transform feedback buffer overflow?

RESOLVED: With fixed-size point, line, or triangle primitives, once any primitive fails to be recorded due to insufficient space, all subsequent primitives would also fail. With variable-size patch primitives, the transform feedback stage might first receive a large patch that doesn't fit, followed by a smaller patch that could squeeze into the remaining space. To allow for different types of implementation of this extension without requiring special-case handling of this corner case, we've chosen to leave this behavior undefined -- the smaller patch may or may not be recorded.

Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     11   03/07/17  mheyer    Update OpenGL ES interactions to clarify that
                              using a tessellation control shader without a
                              tessellation evaluation shader is legal, and
                              PATCHES can be sent past the tessellation
                              stage.

     10   04/16/16  mheyer    Add OpenGL ES interactions (written before
                              revision 9, but not published)

      9   02/19/16  pbrown    Clarify that non-constant offset vectors are
                              supported in textureGatherOffsets().

      8   09/11/14  pbrown    Fix incorrect implicit conversions, which
                              follow the general pattern of little->big and
                              int->uint->float. Thanks to Daniel Rakos,
                              author of similar functionality in the
                              AMD_gpu_shader_int64 spec.

      7   11/08/10  pbrown    Fix typos in description of packFloat2x16 and
                              unpackFloat2x16.

      6   03/23/10  pbrown    Update overview, dependencies, remove
                              references to old extension names.
                              Extend the function overloading prioritization
                              rules from ARB_gpu_shader5 to account for new
                              data types. Major overhaul of the issues
                              section to match the refactoring done to
                              produce ARB specs.

      5   03/08/10  pbrown    Add interaction with EXT_vertex_attrib_64bit
                              and NV_vertex_attrib_integer_64bit; enabling
                              this extension automatically enables 64-bit
                              floating-point and integer vertex inputs.

      4   03/01/10  pbrown    Fix prototype for GetUniformui64vNV.

      3   01/14/10  pbrown    Fix with updated enum assignments.

      2   12/08/09  pbrown    Add explicit component counting rules for
                              64-bit integer attributes similar to those in
                              the ARB_gpu_shader_fp64 spec.

      1             pbrown    Internal revisions.