Dependencies
This extension is written against the OpenCL SPIR-V Environment Specification Version 2.2, Revision v2.2-3.
This extension requires OpenCL support for SPIR-V, either via OpenCL 2.1 or via the cl_khr_il_program extension, and support for the cl_intel_media_block_io extension.
This extension interacts with the cl_intel_packed_yuv extension.
This extension interacts with the cl_intel_planar_yuv extension.
Overview
This extension defines how modules using the SPIR-V extension SPV_INTEL_media_block_io may behave in an OpenCL environment.
This extension is a companion to the cl_intel_media_block_io OpenCL extension, and the functionality described in this extension and SPV_INTEL_media_block_io is sufficient to implement the built-in functions defined in the cl_intel_media_block_io extension.
Modifications to the OpenCL SPIR-V Environment Specification
Add a new Section 7.1.X - cl_intel_spirv_media_block_io
If the OpenCL environment supports the extension cl_intel_spirv_media_block_io, then the environment must accept SPIR-V modules that declare use of the SPV_INTEL_media_block_io extension via OpExtension.
If the OpenCL environment supports the extension cl_intel_spirv_media_block_io and use of the SPV_INTEL_media_block_io extension is declared in the module via OpExtension, then the environment must accept modules that declare the following SPIR-V capabilities:
- SubgroupImageMediaBlockIOINTEL
Additionally, the environment must accept the following types for 'Result Type' for OpSubgroupImageMediaBlockReadINTEL, and for the type of 'Data' for OpSubgroupImageMediaBlockWriteINTEL:
- Scalars and OpTypeVectors with 2, 4, 8, or 16 Component Count components of the following Component Type types:
  - OpTypeInt with a Width of 8 bits and Signedness of 0 (equivalent to char and uchar)
  - OpTypeInt with a Width of 16 bits and Signedness of 0 (equivalent to short and ushort)
  - OpTypeInt with a Width of 32 bits and Signedness of 0 (equivalent to int and uint)
For 'Image':
- Dim must be 2D
- Depth must be 0 (not a depth image)
- Arrayed must be 0 (non-arrayed content)
- MS must be 0 (single-sampled content)
(equivalent to image2d_t)
For 'Coordinate', the following types are supported:
- OpTypeVectors with 2 Component Count components of Component Type OpTypeInt with a Width of 32 bits and Signedness of 0 (equivalent to int2)
For 'Width' and 'Height', the following type is supported:
- Scalars of OpTypeInt with a Width of 32 bits and a Signedness of 0 (equivalent to int)
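The operand type rules above can be summarized as a small checker. This is a minimal sketch in Python, not part of any OpenCL or SPIR-V tooling: the function and constant names are hypothetical, and a component count of 1 is used here to stand for a scalar.

```python
# Hypothetical helpers; the names are illustrative and not part of any API.
SUPPORTED_INT_WIDTHS = (8, 16, 32)              # OpTypeInt Width, in bits
SUPPORTED_COMPONENT_COUNTS = (1, 2, 4, 8, 16)   # 1 stands for a scalar


def is_supported_data_type(int_width_bits: int, component_count: int = 1) -> bool:
    """Check a 'Result Type' (read) or 'Data' type (write)."""
    return (int_width_bits in SUPPORTED_INT_WIDTHS
            and component_count in SUPPORTED_COMPONENT_COUNTS)


def is_supported_coordinate_type(int_width_bits: int, component_count: int) -> bool:
    """'Coordinate' must be a 2-component vector of 32-bit integers (int2)."""
    return int_width_bits == 32 and component_count == 2


def is_supported_extent_type(int_width_bits: int, component_count: int = 1) -> bool:
    """'Width' and 'Height' must be 32-bit integer scalars (int)."""
    return int_width_bits == 32 and component_count == 1
```

Note that 3-component vectors are rejected, since only 2, 4, 8, and 16 components are listed for vector types.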
Add a new Section 7.1.X.1 - Notes and Restrictions
Both OpSubgroupImageMediaBlockReadINTEL and OpSubgroupImageMediaBlockWriteINTEL must be encountered by all work items in the sub-group executing the kernel, otherwise the behavior is undefined (i.e. they can only be used in convergent control flow where all the work items in the sub-group are enabled).
The block 'Width' determines the maximum 'Height' for OpSubgroupImageMediaBlockReadINTEL and OpSubgroupImageMediaBlockWriteINTEL, and is summarized in the following table:
Width (bytes) | Maximum Height (rows) |
---|---|
4 | 64 |
8 | 32 |
12, 16 | 16 |
20, 24, 28, 32 | 8 |
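The table above can be expressed as a lookup. The following Python sketch is illustrative only: the names are hypothetical, and it assumes that widths not listed in the table are unsupported and that a block has at least one row.

```python
# Hypothetical helper; names are illustrative and not part of any OpenCL API.
# Maps each listed block width (bytes) to its maximum height (rows).
MAX_HEIGHT_FOR_WIDTH = {
    4: 64,
    8: 32,
    12: 16, 16: 16,
    20: 8, 24: 8, 28: 8, 32: 8,
}


def max_block_height(width_bytes: int) -> int:
    """Return the maximum block height in rows for a block width in bytes."""
    if width_bytes not in MAX_HEIGHT_FOR_WIDTH:
        # Assumption: widths absent from the table are treated as unsupported.
        raise ValueError(f"unsupported block width: {width_bytes} bytes")
    return MAX_HEIGHT_FOR_WIDTH[width_bytes]


def is_valid_block(width_bytes: int, height_rows: int) -> bool:
    """Check a (width, height) pair against the table."""
    limit = MAX_HEIGHT_FOR_WIDTH.get(width_bytes)
    return limit is not None and 1 <= height_rows <= limit
```

For example, `is_valid_block(16, 16)` holds, while `is_valid_block(16, 17)` does not.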
Both the first component of 'Coordinate', which represents the byte offset into the 'Image', and the block 'Width' must be four byte aligned.
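As a sketch of this alignment rule, the check below tests both values against a four byte boundary. The function names are hypothetical and the code is written in Python purely for illustration.

```python
# Hypothetical helpers; names are illustrative and not part of any API.
def is_four_byte_aligned(value: int) -> bool:
    """True if value is a multiple of four (also holds for negative offsets)."""
    return value % 4 == 0


def is_block_aligned(x_byte_offset: int, width_bytes: int) -> bool:
    """Both the horizontal byte offset and the block width must be aligned."""
    return is_four_byte_aligned(x_byte_offset) and is_four_byte_aligned(width_bytes)
```

Because the horizontal offset is expressed in bytes rather than in image elements, an offset of 6 is misaligned even for an image with two byte elements.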
Both the block 'Width' and 'Height' must be compile time constants.
The 'Image' operand must only be used by other SubgroupImageMediaBlockIOINTEL instructions or image query instructions. It may not be used by any other instructions that read texels from or write texels to the 'Image'.
Behavior is undefined if 'Image' is a planar YUV image, however 'Image' may represent an individual plane of a planar YUV image.
The 'Image' operand must be created such that the image byte width, defined as the image width multiplied by the 'Image Format' size, is a multiple of four bytes.
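The image byte width rule can be checked before creating the image. This Python sketch uses hypothetical names, and the element sizes are passed in by the caller rather than derived from an OpenCL image format.

```python
# Hypothetical helpers; names are illustrative and not part of any API.
def image_byte_width(image_width: int, format_size_bytes: int) -> int:
    """Image width in elements multiplied by the 'Image Format' size in bytes."""
    return image_width * format_size_bytes


def has_valid_byte_width(image_width: int, format_size_bytes: int) -> bool:
    """The image byte width must be a multiple of four bytes."""
    return image_byte_width(image_width, format_size_bytes) % 4 == 0
```

For example, an image 30 elements wide with a one byte format has a byte width of 30 and does not qualify, while the same width with a two byte format gives 60 bytes and does.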
For OpSubgroupImageMediaBlockReadINTEL, if the 'Image Format' size is smaller than the block read 'Component Type', then an out-of-bounds read will return data replicated from the nearest edge element, otherwise out-of-bounds read behavior is undefined. For example:
- For an image with an 'Image Format' size equal to a single byte (for example R8), and a 32-bit boundary value B0B1B2B3, replicating off the left edge may result in the 32-bit value B0B0B0B0, and replicating off the right edge may result in the 32-bit value B3B3B3B3.
- For an image with an 'Image Format' size equal to two bytes (for example R16), replicating off the left edge may result in the 32-bit value B0B1B0B1, and replicating off the right edge may result in the 32-bit value B2B3B2B3.
- For an image with an 'Image Format' size equal to four bytes (for example Rgba8), the entire boundary value is replicated, for both the left and right edges.
- Because the maximum 'Component Type' is a four byte component type, there is no defined out-of-bounds behavior for images with an 'Image Format' size greater than four bytes.
- As a special case, an image with a packed YUV 'Image Format' (and hence an 'Image Format' size equal to two bytes) behaves as follows:
  - Replicating off of the left edge replicates the UV components and the first Y component, so, for example, replicating the 32-bit boundary value Y0U0Y1V0 will result in the 32-bit value Y0U0Y0V0.
  - Replicating off the right edge replicates the UV components and the second Y component, so, for example, replicating the 32-bit boundary value Y0U0Y1V0 will result in the 32-bit value Y1U0Y1V0.
For OpSubgroupImageMediaBlockWriteINTEL, if the 'Image Format' size is smaller than the block write 'Component Type', then out-of-bounds writes will be dropped, otherwise out-of-bounds write behavior is undefined.
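The out-of-bounds read replication examples given for OpSubgroupImageMediaBlockReadINTEL can be modeled on a four byte boundary value. The Python sketch below is a model of those examples only, not a description of any implementation: the function names are hypothetical, and the packed YUV helper assumes the Y0 U0 Y1 V0 byte order used in the example text.

```python
# Hypothetical model of the replication examples; not an implementation.
def replicate_edge(boundary: bytes, format_size: int, edge: str) -> bytes:
    """Replicate the nearest edge element across a 32-bit boundary value."""
    if len(boundary) != 4:
        raise ValueError("boundary value must be exactly four bytes")
    if format_size not in (1, 2, 4):
        # Sizes above four bytes have no defined out-of-bounds behavior.
        raise ValueError("format size must be 1, 2, or 4 bytes")
    if edge == "left":
        element = boundary[:format_size]    # first element of the value
    elif edge == "right":
        element = boundary[-format_size:]   # last element of the value
    else:
        raise ValueError("edge must be 'left' or 'right'")
    return element * (4 // format_size)


def replicate_edge_packed_yuv(boundary: bytes, edge: str) -> bytes:
    """Packed YUV special case, assuming Y0 U0 Y1 V0 byte order."""
    if len(boundary) != 4:
        raise ValueError("boundary value must be exactly four bytes")
    if edge not in ("left", "right"):
        raise ValueError("edge must be 'left' or 'right'")
    y0, u, y1, v = boundary                 # unpacks into four integers
    y = y0 if edge == "left" else y1        # U and V are kept as they are
    return bytes([y, u, y, v])
```

With a boundary value of bytes 0, 1, 2, 3, a one byte format replicated off the left edge yields 0, 0, 0, 0, and a two byte format replicated off the right edge yields 2, 3, 2, 3.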
When reading or writing a 2D 'Image' created from a buffer:
- The 'image row pitch' is required to be a multiple of 64 bytes, in addition to the CL_DEVICE_IMAGE_PITCH_ALIGNMENT requirements.
- If the buffer is a cl_mem that was created with CL_MEM_USE_HOST_PTR, then the host_ptr must be 256-bit (32-byte) aligned.
- If the buffer is a cl_mem that is a sub-buffer, then the origin must be a multiple of 32 bytes. Additionally, if the buffer that the sub-buffer is created from was created with CL_MEM_USE_HOST_PTR, then the host_ptr for the buffer must be 256-bit (32-byte) aligned.
- The maximum 'Height' is further restricted to 16 rows or fewer.
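The restrictions for a 2D image created from a buffer can be collected into a single pre-flight check. This is a minimal Python sketch under stated assumptions: the `BufferImageInfo` record and its field names are hypothetical, the caller supplies the host pointer address and sub-buffer origin, and the device-specific CL_DEVICE_IMAGE_PITCH_ALIGNMENT requirement is not checked here because it must be queried from the device.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BufferImageInfo:
    """Hypothetical description of a 2D image created from a buffer."""
    row_pitch_bytes: int
    # Address of the host_ptr of the buffer (or of the parent buffer for a
    # sub-buffer); None if CL_MEM_USE_HOST_PTR was not used.
    host_ptr_address: Optional[int] = None
    # Origin of the sub-buffer in bytes; None if the buffer is not a sub-buffer.
    sub_buffer_origin_bytes: Optional[int] = None


def buffer_image_violations(info: BufferImageInfo, height_rows: int) -> List[str]:
    """Return a list of violated restrictions (empty if none are violated)."""
    problems = []
    if info.row_pitch_bytes % 64 != 0:
        problems.append("row pitch is not a multiple of 64 bytes")
    if info.host_ptr_address is not None and info.host_ptr_address % 32 != 0:
        problems.append("host_ptr is not 32-byte aligned")
    if (info.sub_buffer_origin_bytes is not None
            and info.sub_buffer_origin_bytes % 32 != 0):
        problems.append("sub-buffer origin is not a multiple of 32 bytes")
    if height_rows > 16:
        problems.append("block height exceeds 16 rows")
    return problems
```

Returning a list of messages rather than a single boolean makes it easier to report every violated restriction at once.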
Behavior is undefined if the size of the 2D source region (defined by the type of 'Data' and SubgroupMaxSize) is smaller than the size of the 2D region to write (defined by 'Width', 'Height', and block write 'Component Type').