Name

    QCOM_shading_rate

Name Strings

    GL_QCOM_shading_rate

Contributors

    Jeff Leger
    Robert VanReenen

Contact

    Jeff Leger - jleger 'at' qti.qualcomm.com

Status

    Complete

Version

    Last Modified Date: April 22, 2020
    Revision: #2

Number

    OpenGL ES Extension #279

Dependencies

    OpenGL ES 2.0 is required. This extension is written against OpenGL ES
    3.2.

    This extension interacts with OVR_Multiview.

    This extension interacts with QCOM_framebuffer_foveated and
    QCOM_texture_foveated.

    When this extension is advertised, the implementation must also
    advertise GLSL extension "GL_EXT_fragment_invocation_density"
    (documented separately), which provides new built-in variables that
    allow fragment shaders to determine the effective shading rate used for
    fragment invocations.

Overview

    By default, OpenGL runs a fragment shader once for each pixel covered by
    a primitive being rasterized. When using multisampling, the outputs of
    that fragment shader are broadcast to each covered sample of the
    fragment's pixel. When using multisampling, applications can optionally
    request that the fragment shader be run once per color sample (e.g., by
    using the "sample" qualifier on one or more active fragment shader
    inputs), or run a minimum number of times per pixel using SAMPLE_SHADING
    enable and the MinSampleShading frequency value.

    This extension allows applications to specify fragment shading rates of
    less than 1 invocation per pixel. Instead of invoking the fragment
    shader once for each covered pixel, the fragment shader can be run once
    for a group of adjacent pixels in the framebuffer. The outputs of that
    fragment shader invocation are broadcast to each covered sample for all
    of the pixels in the group. The initial version of this extension allows
    for groups of 1, 2, 4, 8, and 16 pixels.

    This can be useful for effects like motion volumetric rendering where a
    portion of the scene is processed at full shading rate and a portion can
    be processed at a reduced shading rate, saving power and processing
    resources.
    The requested rate can vary from (finest and default) one fragment
    shader invocation per pixel to (coarsest) one fragment shader invocation
    for each 4x4 block of pixels. Implementations are given wide latitude to
    rasterize at the requested rate or any other rate that is less coarse.

New Tokens

    Accepted by the <pname> parameter of GetIntegerv, GetInteger64v and
    GetFloatv:

        SHADING_RATE_QCOM                            0x96A4

    Accepted by the <cap> parameter of Enable, Disable, IsEnabled:

        SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM      0x96A5

    Allowed in the <rate> parameter in ShadingRateQCOM:

        SHADING_RATE_1X1_PIXELS_QCOM                 0x96A6
        SHADING_RATE_1X2_PIXELS_QCOM                 0x96A7
        SHADING_RATE_2X1_PIXELS_QCOM                 0x96A8
        SHADING_RATE_2X2_PIXELS_QCOM                 0x96A9
        SHADING_RATE_4X2_PIXELS_QCOM                 0x96AC
        SHADING_RATE_4X4_PIXELS_QCOM                 0x96AE

New Procedures and Functions

    void ShadingRateQCOM(enum rate);

Modifications to the OpenGL ES 3.2 Specification

    Modify Section 8.14.1, Scale Factor and Level of Detail, p. 196

    (Modify the function approximating Scale Factor (P), to allow
    implementations to scale implicit derivatives based on the shading rate.
    The scale occurs before the LOD bias and before LOD clamping.)

    Modify the definitions of (mu, mv, mw):

                   |  du      du   |
        mu = max   | ----- , ----- |
                   |  dx      dy   |

                   |  dv      dv   |
        mv = max   | ----- , ----- |
                   |  dx      dy   |

                   |  dw      dw   |
        mw = max   | ----- , ----- |
                   |  dx      dy   |

    to:

                   |  du          du       |
        mu = max   | ---- * sx , ---- * sy |
                   |  dx          dy       |

                   |  dv          dv       |
        mv = max   | ---- * sx , ---- * sy |
                   |  dx          dy       |

                   |  dw          dw       |
        mw = max   | ---- * sx , ---- * sy |
                   |  dx          dy       |

    where (sx, sy) refer to the _effective shading rate_ (w', h') specified
    in section 13.X.2.

    Modify Section 13.4, Multisampling, p. 353

    (add to the end of the section)

    When SHADING_RATE_QCOM is set to a value other than
    SHADING_RATE_1X1_PIXELS_QCOM, the rasterization will occur at the
    _effective shading rate_ (Section 13.X) and will result in fragments
    covering a <W> x <H> group of pixels. When multisample rasterization is
    enabled, the samples of the fragment will consist of the samples for
    each of the pixels in the group.
    The fragment center will be the center of this group of pixels. Each
    fragment will include a coverage value with (W x H x SAMPLES) bits. For
    example, if GL_SHADING_RATE_QCOM is 2X2 and the currently bound
    framebuffer object has SAMPLES equal to 4 (4xMSAA), then the fragment
    will consist of 4 pixels and 16 samples. Similarly, each fragment will
    have (W * H * SAMPLES) depth values and associated data.

    The contents of Section 13.4.1, Sample Shading, p. 355 are moved to the
    new Section 13.X.3, "Sample Shading".

    Add new section 13.X before Section 13.5, Points, p. 355

    Section 13.X, Shading Rate

    By default, each fragment processed by programmable fragment processing
    corresponds to a single pixel with a single (x,y) coordinate. When using
    multisampling, implementations are permitted to run separate fragment
    shader invocations for each sample, but often only run a single
    invocation for all samples of the fragment. We will refer to the density
    of fragment shader invocations as the _shading rate_. Applications can
    use the shading rate to increase the size of fragments to cover multiple
    pixels and reduce the amount of fragment shader work. Applications can
    also use the shading rate to explicitly control the minimum number of
    fragment shader invocations when multisampling.

    Section 13.X.1, Shading Rate Control

    The shading rate can be controlled with the command

        void ShadingRateQCOM(enum rate);

    <rate> specifies the value of SHADING_RATE_QCOM, and defines the
    _shading rate_. Valid values for <rate> are described in table X.1.

        Shading Rate                    Size
        ----------------------------    -----
        SHADING_RATE_1X1_PIXELS_QCOM    1x1
        SHADING_RATE_1X2_PIXELS_QCOM    1x2
        SHADING_RATE_2X1_PIXELS_QCOM    2x1
        SHADING_RATE_2X2_PIXELS_QCOM    2x2
        SHADING_RATE_4X2_PIXELS_QCOM    4x2
        SHADING_RATE_4X4_PIXELS_QCOM    4x4

        Table X.1: Shading rates accepted by ShadingRateQCOM. An entry of
        "<W>x<H>" in the "Size" column indicates that the shading rate
        requests fragments with a width and height (in pixels) of <W> and
        <H>, respectively.
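
    As an illustration of table X.1, the following C sketch maps each rate
    token to its requested fragment size. The token values are the ones
    listed under New Tokens; the helper name shading_rate_size is
    hypothetical and is not part of this extension. Any value outside the
    table is rejected, mirroring the INVALID_ENUM condition for
    ShadingRateQCOM.

```c
#include <assert.h>
#include <stddef.h>

/* Token values copied from the New Tokens section of this extension. */
#define GL_SHADING_RATE_1X1_PIXELS_QCOM 0x96A6
#define GL_SHADING_RATE_1X2_PIXELS_QCOM 0x96A7
#define GL_SHADING_RATE_2X1_PIXELS_QCOM 0x96A8
#define GL_SHADING_RATE_2X2_PIXELS_QCOM 0x96A9
#define GL_SHADING_RATE_4X2_PIXELS_QCOM 0x96AC
#define GL_SHADING_RATE_4X4_PIXELS_QCOM 0x96AE

/*
 * Hypothetical helper (not defined by the extension): writes the requested
 * fragment width and height for a rate token from table X.1.
 * Returns 1 on success, or 0 if the token is not in table X.1.
 */
int shading_rate_size(unsigned int rate, int *w, int *h)
{
    assert(w != NULL && h != NULL);
    switch (rate) {
    case GL_SHADING_RATE_1X1_PIXELS_QCOM: *w = 1; *h = 1; return 1;
    case GL_SHADING_RATE_1X2_PIXELS_QCOM: *w = 1; *h = 2; return 1;
    case GL_SHADING_RATE_2X1_PIXELS_QCOM: *w = 2; *h = 1; return 1;
    case GL_SHADING_RATE_2X2_PIXELS_QCOM: *w = 2; *h = 2; return 1;
    case GL_SHADING_RATE_4X2_PIXELS_QCOM: *w = 4; *h = 2; return 1;
    case GL_SHADING_RATE_4X4_PIXELS_QCOM: *w = 4; *h = 4; return 1;
    default:
        return 0; /* not a rate accepted by ShadingRateQCOM */
    }
}
```

    The size returned here is only the requested rate; the rate actually
    used may differ, as the effective shading rate rules describe.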

    If the shading rate is specified with ShadingRateQCOM, it will apply to
    all draw buffers. If the shading rate has not been set, the shading rate
    will be SHADING_RATE_1X1_PIXELS_QCOM. In either case, the shading rate
    will be further adjusted as described in the following sections.

    Section 13.X.2, Effective Shading Rate

    The value of SHADING_RATE_QCOM, in combination with other GL state, is
    used to derive an adjusted rate or _effective shading rate_, as
    described in this section. Where possible, implementations should
    provide an _effective shading rate_ equal to the SHADING_RATE_QCOM. When
    this is not possible, an adjusted _effective shading rate_ may be used
    as described in this section. While there is no API for querying the
    _effective shading rate_, the value of this parameter exists, can be
    queried from the fragment shader built-in gl_FragSizeEXT, and is
    referred to in a number of places in the specification. Implementations
    may also adjust the shading rate for other reasons not listed here.

    Implementations derive the _effective shading rate_ in an
    implementation-dependent manner.

    When rendering to the default framebuffer, the rate may be adjusted to
    1x1.

    When sample shading (section 13.X.3 Sample Shading) is enabled, the rate
    may be adjusted to 1x1.

    When the fragment shader uses GLSL built-in input variables
    gl_SampleMaskIn[], gl_SampleMask[], or uses variables declared with
    "centroid in", the rate may be adjusted to 1x1.

    When sample coverage or sample mask operations are enabled (Section
    13.8.3 Multisample Fragment Operations), the rate may be adjusted to
    1x1.

    The shading rate may be adjusted to limit the number of samples covered
    by a fragment. For example, if the implementation supports a maximum of
    16 samples per fragment and if GL_SHADING_RATE_QCOM is 4X4 and the
    currently bound framebuffer object has SAMPLES equal to 4 (4xMSAA), then
    the number of samples per coarse fragment would be 64.
    In such an example, an implementation may adjust the shading rate to a
    rate with 16 or fewer samples (e.g., 2x2).

    If the active fragment shader uses any inputs that are qualified with
    "sample" (unique values per sample), including the built-ins
    "gl_SampleID" and "gl_SamplePosition", or the built-in function
    "interpolateAtSample", the shader code is written to expect a separate
    shader invocation for each shaded sample. For such fragment shaders, the
    shading rate is adjusted to 1x1.

    If the <W>x<H> value of SHADING_RATE_QCOM is expressed as (w, h), then
    the adjusted rate may be any (w', h') as long as (w' * h') <= (w * h).
    If SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM is enabled, then the
    implementation further guarantees that (w'/h') equals (w/h) or that
    w'=1 and h'=1.

    Section 13.X.3 Sample Shading

    [[The contents from Section 13.4.1, Sample Shading, p. 355 are copied
    here]]

    Modifications to Section 13.8.2, Scissor Test (p. 367)

    (add to the end of the section)

    When the _effective shading rate_ results in fragments covering more
    than one pixel, the scissor tests are performed separately for each
    pixel in the fragment. If a pixel covered by a fragment fails the
    scissor test, that pixel is treated as though it was not covered by the
    primitive. If all pixels covered by a fragment are either not covered by
    the primitive being rasterized or fail the scissor test, the fragment is
    discarded.

    Modifications to Section 13.8.3, Multisample Fragment Operations
    (p. 368)

    (modify the last sentence of the first paragraph to indicate that sample
    mask operations are performed when shading rate is used, even if
    multisampling is not enabled, because shading rate can produce fragments
    covering more than one pixel where each pixel is considered a "sample")

    Change the following sentence from:

    "If the value of SAMPLE_BUFFERS is not one, this step is skipped."

    to:

    "This step is skipped if SAMPLE_BUFFERS is not one, unless
    SHADING_RATE_QCOM is set to a value other than
    SHADING_RATE_1X1_PIXELS_QCOM."

    (add to the end of the section)

    When the _effective shading rate_ results in fragments covering more
    than one pixel, each fragment will generate a composite coverage mask
    that includes separate coverage bits for each sample in each pixel
    covered by the fragment. This composite coverage mask will be used by
    the GLSL built-in input variable gl_SampleMaskIn[] and updated according
    to the built-in output variable gl_SampleMask[]. The number of composite
    coverage mask bits in the built-in variables and their mapping to a
    specific pixel and sample number within that pixel is
    implementation-defined.

    Modify Section 14.1, Fragment Shader Variables (p. 370)

    (modify sixth paragraph, p. 371, specifying that the "centroid" location
    for multi-pixel fragments is implementation-dependent, and is allowed to
    be outside the primitive)

    After the following sentence:

    "When interpolating variables declared using "centroid in", the variable
    is sampled at a location within the pixel covered by the primitive
    generating the fragment."

    Add the following sentence:

    "When the _effective shading rate_ results in fragments covering more
    than one pixel, variables declared using "centroid in" are sampled from
    an implementation-dependent location within any one of the covered
    pixels."

    Modify Section 15.1, Per-Fragment Operations (p. 378)

    (insert a new paragraph after the first paragraph of the section)

    When the _effective shading rate_ results in fragments covering multiple
    pixels, the operations described in the section are performed
    independently for each pixel covered by the fragment. The set of samples
    covered by each pixel is determined by extracting the portion of the
    fragment's composite coverage that applies to that pixel, as described
    in section 13.8.3.
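
    The composite coverage arithmetic above can be sketched in C. The bit
    count follows the (W x H x SAMPLES) rule from Section 13.4. The
    per-pixel extraction is only an illustration: the mapping of bits to
    pixels and samples is implementation-defined, so the pixel-major layout
    used here (pixel p occupies bits [p*SAMPLES, (p+1)*SAMPLES)) is an
    assumption, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of coverage bits in one fragment: W x H x SAMPLES. */
unsigned int composite_coverage_bits(unsigned int w, unsigned int h,
                                     unsigned int samples)
{
    return w * h * samples;
}

/*
 * Extracts the coverage bits of one pixel from a composite mask.
 * ASSUMPTION: pixel-major bit layout. The extension leaves the real
 * layout implementation-defined, so this is not portable behavior.
 * Only masks of up to 64 bits are handled by this sketch.
 */
uint64_t pixel_coverage(uint64_t composite, unsigned int pixel,
                        unsigned int samples)
{
    assert(samples >= 1 && samples <= 64);
    assert((pixel + 1) * samples <= 64);
    uint64_t mask = (samples == 64) ? ~UINT64_C(0)
                                    : ((UINT64_C(1) << samples) - 1);
    return (composite >> (pixel * samples)) & mask;
}
```

    With a 2x2 rate and 4xMSAA, composite_coverage_bits(2, 2, 4) gives the
    16 samples cited in the Section 13.4 example, and a 4x4 rate with 4xMSAA
    gives the 64 samples cited in Section 13.X.2.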

Errors

    INVALID_ENUM is generated by ShadingRateQCOM if <rate> is not a valid
    shading rate from table X.1.

New State

    Add to table 21.7, Rasterization

    Get Value                                Type  Get Command  Initial Value                 Description      Sec
    ---------------------------------------  ----  -----------  ----------------------------  ---------------  ------
    SHADING_RATE_QCOM                        E     GetIntegerv  SHADING_RATE_1X1_PIXELS_QCOM  shading rate     13.X.1
    SHADING_RATE_PRESERVE_ASPECT_RATIO_QCOM  B     IsEnabled    FALSE                         maintain aspect  13.X.2

Interactions with OVR_Multiview

    If OVR_Multiview is supported, SHADING_RATE_QCOM applies to all views.

Interactions with QCOM_framebuffer_foveated and QCOM_texture_foveated

    QCOM_framebuffer_foveated and QCOM_texture_foveated specify a pixel
    density which is exposed as a fragment size via the fragment shader
    built-in gl_FragSizeEXT. This extension defines an effective shading
    rate which is also exposed as a fragment size via the same built-in. If
    either foveation extension is enabled in conjunction with this
    extension, then the value of gl_FragSizeEXT is the component-wise
    product of both fragment sizes.

Issues

    (1) Should the application-specified rate in ShadingRateQCOM() be a
        "hint" that can be ignored by the driver, or is the driver required
        to honor the requested rate?

    RESOLVED: The driver should honor the application-specified rate where
    possible, but is allowed to use an adjusted rate due to
    implementation-dependent reasons. The specific rates supported in the
    hardware and the specific conditions when the rates need to be adjusted
    can differ across different Adreno GPU families. This extension gives
    drivers the flexibility to expose this extension on early hardware that
    may have restrictions and oddities while providing applications some
    (admittedly limited) control over the adjusted rate that will be
    selected. The actual rate is always exposed via the fragment shader
    built-in.
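
    The limited control mentioned in this resolution comes from the bounds
    in Section 13.X.2. The following C sketch checks whether an adjusted
    rate (w', h') is one that those bounds allow for a requested rate
    (w, h). The function name is hypothetical, and the sketch covers only
    the area bound and the aspect-ratio guarantee, not the other
    implementation-dependent adjustments.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical checker (not part of the extension).
 * (w, h)   : requested rate, from SHADING_RATE_QCOM
 * (aw, ah) : adjusted rate (w', h') chosen by the implementation
 * preserve : whether the preserve-aspect-ratio enable is set
 */
bool adjusted_rate_permitted(int w, int h, int aw, int ah, bool preserve)
{
    assert(w >= 1 && h >= 1 && aw >= 1 && ah >= 1);

    /* Area bound: (w' * h') <= (w * h). */
    if (aw * ah > w * h)
        return false;
    if (!preserve)
        return true;

    /* With the enable set, 1x1 is always an allowed fallback. */
    if (aw == 1 && ah == 1)
        return true;

    /* w'/h' == w/h, cross-multiplied to stay in integer arithmetic. */
    return aw * h == w * ah;
}
```

    For a 4x2 request, this accepts 2x2 when the aspect ratio need not be
    preserved (4 <= 8) but rejects it when the enable is set; 2x1 and 1x1
    are accepted in both cases.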

    (2) If the application-specified rate is only a hint, can developers
        expect that all the shading rates exposed by this extension are
        supported natively by the HW?

    RESOLVED: The initial version of this extension exposes token values for
    shading rates of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4. Most Adreno GPUs
    supporting this extension are expected to support all those rates,
    although some early HW may support fewer rates. Note that this extension
    does not include shading rates of 1x4, 4x1, or 2x4 because Adreno GPUs
    may never support those rates. Because a future version of this
    extension could support those rates, we have reserved the token values
    (0x96AA, 0x96AB, and 0x96AD) for those rates.

    (3) How does this feature work with per-sample shading?

    RESOLVED: When using per-sample shading, an application is expecting a
    fragment shader to run with a separate invocation per sample. The
    shading rate might allow for a "coarsening" that would break such
    shaders. Furthermore, some Adreno families may not support this
    combination. We've chosen not to explicitly disallow this combination,
    while giving implementations the flexibility to use an adjusted 1x1
    shading rate.

    (4) How do centroid-sampled variables work with fragments larger than
        one pixel?

    RESOLVED: For single-pixel fragments, attributes declared with
    "centroid" are sampled at an implementation-dependent location in the
    intersection of the area of the primitive being rasterized and the area
    of the pixel that corresponds to the fragment. With multi-pixel
    fragments, attributes declared with "centroid" are sampled from an
    implementation-dependent location within any of the covered pixels. This
    wide allowance for implementation-dependent behavior enables the
    extension to be exposed on early Adreno hardware.

    (5) How do built-in variables gl_SampleMask[] and gl_SampleMaskIn[] work
        with fragments larger than one pixel?

    RESOLVED: For single-pixel fragments, gl_SampleMaskIn[] and
    gl_SampleMask[] specify the input and output coverage bits for a single
    pixel, where bit 'B' corresponds to SampleID 'B'. With this extension
    enabled, these built-ins would specify the coverage bits for all the
    samples in all the pixels covered by the fragment. In this extension,
    the exact behavior of gl_SampleMaskIn[] and gl_SampleMask[] is
    implementation-dependent. For some Adreno GPUs, use of these built-in
    variables will cause the driver to use an adjusted 1x1 shading rate. In
    other cases, the exact mapping of bits to samples/pixels is
    implementation-defined. This wide allowance for implementation-dependent
    behavior enables the extension to be exposed on early Adreno hardware.

    (6) Are there any restrictions on framebuffer formats used with this
        feature? For example, are EGLImages that may contain multi-plane YUV
        formats supported?

    RESOLVED: It is implementation-dependent whether shading rate is
    supported for all formats, or only certain formats. Implementations are
    allowed to adjust the _effective shading rate_ based on the format.

    (7) Does the value of SHADING_RATE_QCOM affect the built-in variable
        gl_FragCoord?

    RESOLVED: Yes, when the shading rate results in fragments covering
    multiple pixels, gl_FragCoord will be the window-relative coordinates
    (x,y,z,1/w) of the center of the fragment. For non-multisample cases
    this may not be at a pixel center. This may break shaders that assume
    pixel center (0.5, 0.5) values for gl_FragCoord.

    (8) Does the shading rate affect the value of gl_SamplePosition or
        gl_NumSamples?

    RESOLVED: No, neither built-in is affected. If the shader uses
    gl_SamplePosition, the shader runs at sample-rate, causing the shading
    rate to be ignored. gl_NumSamples is the number of samples in the
    framebuffer object, which is unaffected by the value of shading rate.

    (9) Should shading rate affect screen-space derivatives?

    RESOLVED: This extension scales the gradients between adjacent fragments
    by the effective shading rate (w', h'). The resulting increase in
    computed LOD aligns well with the reduced fragment shader invocations in
    most use cases; in other cases the shader author may want to bias the
    LOD to compensate. Shader built-in instructions that return gradient
    values (dFdx, dFdy, and fwidth) are similarly scaled for the same
    reason.

Revision History

    Rev.  Date      Author    Changes
    ----  --------  --------  ----------------------------------------------
    1     03/17/20  jleger    Initial draft.
    2     04/22/20  jleger    Relaxed the guarantee from "w'<=w and h'<=h" to
                              "w'*h' <= w*h".
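
    As a worked illustration of Issue (9) and the modified Section 8.14.1
    definitions, the C sketch below evaluates mu with the derivatives scaled
    by (sx, sy). It reads the vertical bars in the mu definition as absolute
    values, following the core specification's formula; the function names
    are hypothetical.

```c
#include <assert.h>

/* Local absolute value, to keep the sketch free of libm dependencies. */
static float abs_f(float v)
{
    return v < 0.0f ? -v : v;
}

/*
 * mu = max(|du/dx * sx|, |du/dy * sy|), where (sx, sy) is the effective
 * shading rate (w', h'). With sx = sy = 1 this reduces to the unmodified
 * definition.
 */
float scaled_mu(float du_dx, float du_dy, float sx, float sy)
{
    assert(sx >= 1.0f && sy >= 1.0f);
    float a = abs_f(du_dx * sx);
    float b = abs_f(du_dy * sy);
    return a > b ? a : b;
}
```

    With du/dx = 0.5 and du/dy = 0.25, a 1x1 rate gives mu = 0.5 and a 2x2
    rate gives mu = 1.0. Because the level of detail is derived from the
    base-2 logarithm of the scale factor, a doubling like this corresponds
    to roughly one additional LOD level when mu is the dominant term.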