Name String

    cl_intel_advanced_motion_estimation

Contributors

    Biju George
    James Holland
    RaghuKrishnan Embar
    Adam Herr
    Tomasz Olejniczak
    Scott Pillow
    Ben Ashbaugh

Contact

    Biju George (biju.george 'at' intel.com)

IP Status

    TBD

Version

    Version 2, February 15, 2016

Number

    OpenCL Extension TBD

Status

    Final Draft

Extension Type

    OpenCL platform extension

Dependencies

    OpenCL 1.2
    cl_intel_accelerator version 1
    cl_intel_motion_estimation version 1

Overview

    This document presents the advanced motion estimation extension for OpenCL. This extension builds upon the cl_intel_motion_estimation extension by providing block-based estimation and greater control over the estimation algorithm.

    This extension reuses the set of host-callable functions and "motion estimation accelerator objects" defined in the cl_intel_motion_estimation extension version 1.

    This extension depends on the OpenCL 1.2 built-in kernel infrastructure and on the cl_intel_accelerator extension version 1, which provides an abstraction for domain-specific acceleration in the OpenCL runtime.

New Procedures and Functions

    None

New Tokens

    Accepted as arguments to clGetDeviceInfo:

        CL_DEVICE_ME_VERSION_INTEL                            0x407E

    Accepted as flags passed to the kernel:

        CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL              0x1
        CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL                0x2

        CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL                     0x0
        CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL                       0x4

        CL_ME_COST_PENALTY_NONE_INTEL                         0x0
        CL_ME_COST_PENALTY_LOW_INTEL                          0x1
        CL_ME_COST_PENALTY_NORMAL_INTEL                       0x2
        CL_ME_COST_PENALTY_HIGH_INTEL                         0x3

        CL_ME_COST_PRECISION_QPEL_INTEL                       0x0
        CL_ME_COST_PRECISION_HPEL_INTEL                       0x1
        CL_ME_COST_PRECISION_PEL_INTEL                        0x2
        CL_ME_COST_PRECISION_DPEL_INTEL                       0x3

    Valid intra-search predictor mode constants:

        CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL              0x0
        CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL            0x1
        CL_ME_LUMA_PREDICTOR_MODE_DC_INTEL                    0x2
        CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL    0x3

        CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL   0x4
        CL_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL                 0x4
        CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL        0x5
        CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL       0x6
        CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL         0x7
        CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL         0x8

        CL_ME_CHROMA_PREDICTOR_MODE_DC_INTEL                  0x0
        CL_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL          0x1
        CL_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL            0x2
        CL_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL               0x3

    Valid constant values returned by clGetDeviceInfo:

        CL_ME_VERSION_ADVANCED_VER_1_INTEL                    0x1
        CL_ME_VERSION_ADVANCED_VER_2_INTEL                    0x2

    Valid macroblock type constants:

        CL_ME_MB_TYPE_16x16_INTEL                             0x0
        CL_ME_MB_TYPE_8x8_INTEL                               0x1
        CL_ME_MB_TYPE_4x4_INTEL                               0x2

    Valid skip mode constants:

        CL_ME_FORWARD_INPUT_MODE_INTEL                        0x1
        CL_ME_BACKWARD_INPUT_MODE_INTEL                       0x2
        CL_ME_BIDIRECTION_INPUT_MODE_INTEL                    0x3

    Valid bidirectional weight constants:

        CL_ME_BIDIR_WEIGHT_QUARTER_INTEL                      0x10
        CL_ME_BIDIR_WEIGHT_THIRD_INTEL                        0x15
        CL_ME_BIDIR_WEIGHT_HALF_INTEL                         0x20
        CL_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL                    0x2B
        CL_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL                0x30

New Types

    None

Additions to Chapter 4 of the OpenCL Specification:

    Modify the description of function clGetDeviceInfo

    Table 4.3 must be extended to include the following enumeration constants:

        cl_device_info              Return Type  Description
        --------------              -----------  ---------------
        CL_DEVICE_ME_VERSION_INTEL  cl_uint      The motion estimation API version
                                                 number supported by the device and
                                                 driver. This extension requires a
                                                 minimum motion estimation device
                                                 version number of
                                                 CL_ME_VERSION_ADVANCED_VER_1_INTEL.

    The cl_intel_motion_estimation extension defines a motion estimation accelerator object. This object is used without modification in this extension. The tokens defined in the "New Tokens" section of the cl_intel_motion_estimation extension are used by this extension.

    This extension includes two new built-in kernels for block-based motion estimation. The second built-in kernel is supported only for devices that report a motion estimation device version number of CL_ME_VERSION_ADVANCED_VER_2_INTEL. A program object for one or both kernels is obtained via clCreateProgramWithBuiltInKernels, passing the kernel name as a string to the kernel_names argument. A kernel object is obtained from this program by calling the clCreateKernel function, passing the kernel name as a string to the kernel_name argument.

    Each kernel operates on 16x16 pixel blocks (macroblocks) on the source and reference images. The number of macroblocks (MBs) in a given image is determined by the number of 16x16 regions that evenly divide the global_work_size[0] (width) and global_work_size[1] (height) arguments passed to the clEnqueueNDRangeKernel function. If the image dimensions are not evenly divisible by 16, a partial MB is defined for the remaining pixels. The kernel references macroblocks sequentially using contiguous row-major ordering.
    For example, a 128x128 source image would have the following macroblock ordering:

        -------------------------
        | 0| 1| 2| 3| 4| 5| 6| 7|
        +--+--+--+--+--+--+--+--+
        | 8| 9|10|11|12|13|14|15|
        +--+--+--+--+--+--+--+--+
        |16|17|18|19|20|21|22|23|
        +--+--+--+--+--+--+--+--+
        |24|25|26|27|28|29|30|31|
        +--+--+--+--+--+--+--+--+
        |32|33|34|35|36|37|38|39|
        +--+--+--+--+--+--+--+--+
        |40|41|42|43|44|45|46|47|
        +--+--+--+--+--+--+--+--+
        |48|49|50|51|52|53|54|55|
        +--+--+--+--+--+--+--+--+
        |56|57|58|59|60|61|62|63|
        -------------------------

    The data layout of the kernel's input and output arrays is based on this ordering and requires a specific data layout per macroblock, as described below.

    Inter-prediction is the process of determining the best inter-frame motion vectors that describe the transform from a 2D reference image to another 2D source image. This is done by searching for temporal patterns, usually in adjacent frames in a video sequence. The estimation algorithm operates on 16x16 macroblocks, with either 4x4, 8x8 or 16x16 sub-block sizes. Each of these sub-block sizes has a corresponding number of motion vectors within a given macroblock:

        Sub-block Size    MVs per MB
        --------------    --------------
        4x4               16
        8x8               4
        16x16             1

    The algorithm searches for the best match of each pixel block in the source image by searching an image region in the reference image, centered on the coordinates of that pixel block in the source image. This center coordinate can be offset by a set of prediction motion vectors (MVs). The predictor_motion_vector_buffer argument is used to define up to eight prediction MVs per macroblock. The count_motion_vector_buffer argument is used to configure the number of actual prediction motion vectors used within each macroblock.

    A cost function scheme can be specified for motion search. Distortion for an MV is computed as the sum of the SAD (sum of absolute differences) and the MV cost penalty. The cost penalty is computed based on the distance between the computed MV and a specific cost-center.
    This cost-center is specified as the first predictor motion vector configured for a given MB. The search_cost_penalty argument specifies the cost penalty function and can be configured for low, normal or high penalty. The search_cost_precision argument is used to configure the range of the cost function by specifying the precision of control points at which the cost penalties are applied to quarter, half, full, or double pixel precision. The cost penalties at in-between control points are linearly interpolated. Generally, a low penalty can be used when using low quantization parameter values during encoding and a high penalty can be used when using high quantization parameter values during encoding.

    Search results are populated in the search_motion_vector_buffer array. This array contains a set of best-search motion vectors per MB; the number of MVs per MB is determined by the sub-block size. It is also possible to obtain the SAD-adjusted residual values corresponding to the best search MVs via the array specified via the search_residuals argument.

    The kernel can perform skip-checks to produce distortion values based on the skip-check MVs specified for each macroblock. Skip-checks may be configured with either 8x8 or 16x16 sub-block sizes, via the skip_block_type argument. The skip_motion_vector_buffer is used to configure multiple sets of skip-check MVs per MB. The number of vectors in each set is determined by the sub-block size:

        Sub-block Size    MVs per MB
        --------------    --------------
        8x8               4
        16x16             1

    Results are obtained via the skip_residuals argument as SAD-adjusted distortion values corresponding to each skip-check MV defined for each macroblock.

    Intra-prediction describes the transform from previous adjacent macroblocks to subsequent macroblocks within the same 2D source frame by searching for spatial patterns and produces the predictor modes from previous adjacent macroblocks within the same frame.
    The kernel may be configured to report the intra-prediction modes via the intra_search_prediction_modes_buffer argument. This array contains, for each macroblock, a record containing the predictor mode constants for 1 16x16, 4 8x8 and 16 4x4 luma blocks. There is also a predictor mode entry reserved for an 8x8 chroma block. Residual values derived during the intra-prediction process are accessible via the intra_search_residuals argument.

    The first kernel

        block_advanced_motion_estimate_check_intel(
            accelerator_intel_t         accelerator,
            __read_only image2d_t       src_image,
            __read_only image2d_t       ref_image,
            uint                        flags,
            uint                        skip_block_type,
            uint                        search_cost_penalty,
            uchar                       search_cost_precision,
            __global short2            *count_motion_vector_buffer,
            __global short2            *predictor_motion_vector_buffer,
            __global short2            *skip_motion_vector_buffer,
            __global short2            *search_motion_vector_buffer,
            __global char              *intra_search_predictor_modes,
            __global ushort            *search_residuals,
            __global ushort            *skip_residuals,
            __global ushort            *intra_residuals );

    defines a kernel that provides various block-based motion estimation computations. There are three basic use cases for this kernel:

        1.) Perform inter-prediction motion estimation on the source and reference images to obtain the best search motion vectors and their associated distortion values.

        2.) Perform skip-checks on the source and reference images by providing a set of motion vectors, then obtain the corresponding distortion values.

        3.) Perform intra-prediction computations to obtain the best-search prediction modes between adjacent macroblocks and associated residual values.

    This kernel can be set up to do some or all of these operations in a single enqueue.

    block_advanced_motion_estimate_check_intel arguments:

    accelerator is a valid accelerator object created by clCreateAcceleratorINTEL, where the type of the accelerator must be CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL.
    Refer to the cl_intel_motion_estimation extension for a detailed description of configuring the accelerator object with the cl_motion_estimation_desc_intel structure.

    src_image is the input source image, typically representing 8-bit luminance information. Currently, the image_channel_order and the image_data_type of src_image are restricted as follows:

        Channel Order     Src Channel Data Type
        --------------    ---------------------
        CL_R              CL_UNORM_INT8

    Additional formats will be supported by future extensions. The host program is responsible for populating the tiled image using clEnqueueWriteImage or another appropriate API function.

    ref_image is the input reference image, representing 8-bit luminance information. The image_channel_order and the image_data_type must match src_image, as follows:

        Channel Order     Src Channel Data Type
        --------------    ---------------------
        CL_R              CL_UNORM_INT8

    Additional formats will be supported by future extensions. The host program is responsible for populating the tiled image using clEnqueueWriteImage or another appropriate API function.

    flags defines any optional modes or behaviors used in computing motion estimation, skip check and/or intra-prediction algorithms. Currently supported are:

        Type                                        Description
        --------------------------------------      -----------
        CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL      Enables luma-based intra-prediction.

    The following additional token is reserved for future support:

        Type                                        Description
        --------------------------------------      -----------
        CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL    Enables chroma-based intra-prediction.

    skip_block_type specifies the sub-block size used in evaluating skip checks.
    The specified sub-block size will determine the data layout of the skip_motion_vector_buffer array:

        Type                         Sub-block size     MVs per MB entry
        ---------------------        ---------------    ----------------
        CL_ME_MB_TYPE_16x16_INTEL    16x16              1
        CL_ME_MB_TYPE_8x8_INTEL      8x8                4

    search_cost_penalty defines the cost function scheme used in computing cost penalties.

        Type                                      Description
        --------------------------------------    -----------
        CL_ME_COST_PENALTY_NONE_INTEL             penalty is zero
        CL_ME_COST_PENALTY_LOW_INTEL              penalty for low motion
        CL_ME_COST_PENALTY_NORMAL_INTEL           penalty for normal motion
        CL_ME_COST_PENALTY_HIGH_INTEL             penalty for high motion

    search_cost_precision defines the pixel precision of the cost penalty calculations. If the search_cost_penalty flag is set to CL_ME_COST_PENALTY_NONE_INTEL, this argument is ignored. Possible values are:

        Type                                      Description
        --------------------------------------    -----------
        CL_ME_COST_PRECISION_QPEL_INTEL           quarter pixel
        CL_ME_COST_PRECISION_HPEL_INTEL           half pixel
        CL_ME_COST_PRECISION_PEL_INTEL            full pixel
        CL_ME_COST_PRECISION_DPEL_INTEL           double pixel

    count_motion_vector_buffer defines the number of predictor motion vectors and skip-check motion vectors defined for each macroblock. The buffer contains an array of short integer pairs, one pair per MB. The indices of the array correspond to the contiguous row-major block layout of the input frame. The first value in each pair defines the number of predictor motion vectors for a given MB; this value defines the range of valid entries for the MB contained within the predictor_motion_vector_buffer array. The second value in each pair defines the number of skip-check motion vectors for the MB; this value defines the range of valid entries in the skip_motion_vector_buffer array. All size values must be between 0 and 8 inclusive; size values greater than 8 result in undefined behavior.

    predictor_motion_vector_buffer defines an input array of signed short integer predictor MVs with quarter-pixel resolution.
    The array is partitioned into clusters of 8 motion vectors per MB in contiguous row-major ordering. The buffer layout assumes the maximum size of 8 predictor MVs per MB even if the count_motion_vector_buffer array specifies a smaller predictor count. If the value of the search_cost_penalty argument does not equal CL_ME_COST_PENALTY_NONE_INTEL, the first predictor MV for each MB is used as the cost center for cost penalty calculations. If the array passed to the count_motion_vector_buffer argument specifies a predictor size of zero for all macroblocks, this argument can be NULL.

    skip_motion_vector_buffer defines an input array of signed short integer skip-check MVs. The array is partitioned into clusters of 8 sets of motion vectors per MB, in contiguous row-major ordering. The value of skip_block_type determines the number of MVs for each of the 8 entries:

        Value of skip_block_type      Number of MVs in each entry
        -------------------------     ---------------------------
        CL_ME_MB_TYPE_16x16_INTEL     1 MV per entry
        CL_ME_MB_TYPE_8x8_INTEL       4 MVs per entry

    The buffer layout assumes the maximum size of 8 MV entries per MB, even if the count_motion_vector_buffer array specifies a smaller skip-check count. If the array passed to count_motion_vector_buffer specifies a skip-check size of zero for all macroblocks, no skip check computation is performed and this argument can be NULL.

    search_motion_vector_buffer defines an output array of signed short integer pairs defining the best search motion vectors per macroblock. The array contains 1, 4 or 16 motion vectors per MB in contiguous row-major ordering.
    The number of vectors per MB is determined by the value of mb_block_type specified during the creation of the accelerator object:

        Value of mb_block_type        Number of MVs
        ----------------------        --------------
        CL_ME_MB_TYPE_16x16_INTEL     1 MV per MB
        CL_ME_MB_TYPE_8x8_INTEL       4 MVs per MB
        CL_ME_MB_TYPE_4x4_INTEL       16 MVs per MB

    intra_search_prediction_modes_buffer specifies an output buffer containing a sequence of signed chars describing the predictor modes used during motion estimation. The array is divided into a sequence of 22 bytes per MB in contiguous row-major ordering. Each entry in the array has the following form:

        struct search_predictor_modes {
            char luma_16x16_block;
            char luma_8x8_block[4];
            char luma_4x4_block[16];
            char chroma_8x8_block;
        };

    The luma_16x16_block, luma_8x8_block and luma_4x4_block fields contain valid values only when the CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL flag is set. Each value in the luma_8x8_block and luma_4x4_block arrays contains one of the following constants:

        CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_DC_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL

    The value of luma_16x16_block contains one of the following constants:

        CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_DC_INTEL
        CL_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL

    The chroma_8x8_block field contains a valid value only if the CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL flag is set.
    If enabled, the chroma_8x8_block contains one of the following constants:

        CL_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL
        CL_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL
        CL_ME_CHROMA_PREDICTOR_MODE_DC_INTEL
        CL_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL

    This argument can be NULL.

    search_residuals defines an output buffer containing vectors of unsigned short SAD-adjusted values corresponding to the best search motion vectors populated in the search_motion_vector_buffer array. The array is divided into one vector per MB in contiguous row-major block ordering. Each vector contains 1, 4, or 16 components depending on the value of mb_block_type specified during the creation of the accelerator object:

        Value of mb_block_type        Vector Size
        -------------------------     -------------
        CL_ME_MB_TYPE_16x16_INTEL     1
        CL_ME_MB_TYPE_8x8_INTEL       4
        CL_ME_MB_TYPE_4x4_INTEL       16

    This argument can be NULL.

    skip_residuals defines an output buffer containing vectors of unsigned short SAD-adjusted values corresponding to the skip-check MVs defined by skip_motion_vector_buffer. The array is partitioned into clusters of 8 sets of residual values per MB, in contiguous row-major ordering. The value of skip_block_type determines the number of values in each of the 8 entries:

        Value of skip_block_type      Number of residuals in each entry
        ------------------------      ---------------------------------
        CL_ME_MB_TYPE_16x16_INTEL     1 residual per entry
        CL_ME_MB_TYPE_8x8_INTEL       4 residuals per entry

    The buffer layout assumes the maximum size of 8 residual entries per MB; however, the number of valid residual entries corresponds to the skip-check MV count specified in count_motion_vector_buffer for each MB. This argument can be NULL.

    intra_search_residuals defines an output buffer of unsigned short SAD-adjusted vectors that correspond to the residual values used during intra-prediction.
    The buffer contains 4 values per MB in contiguous row-major ordering using the following layout:

        struct intra_search_residuals {
            short luma_16x16_block_residual;
            short luma_8x8_block_residual;
            short luma_4x4_block_residual;
            short chroma_8x8_block_residual;
        };

    The chroma_8x8_block_residual value is only valid if the CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL flag is set. This argument can be NULL.

    The second built-in kernel extends the functionality provided by the first kernel by additionally supporting bidirectional skip checks and minor additional control over the estimation algorithm. The additional control includes the ability to perform skip checks on a different set of source and reference frames than those used for motion search, and the option to specify a pair of scalar counts for input search predictors and skip motion vectors globally for all macroblocks in the frame instead of having to specify such counts on a per-macroblock basis. This kernel is only supported for devices that report a motion estimation device version number of CL_ME_VERSION_ADVANCED_VER_2_INTEL.

    The second kernel

        block_advanced_motion_estimate_bidirectional_check_intel(
            accelerator_intel_t         accelerator,
            __read_only image2d_t       src_search_image,
            __read_only image2d_t       ref_search_image,
            __read_only image2d_t       src_check_image,
            __read_only image2d_t       ref0_check_image,
            __read_only image2d_t       ref1_check_image,
            uint                        flags,
            uint                        search_cost_penalty,
            uint                        search_cost_precision,
            short2                      count_global,
            uchar                       bidir_weight,
            __global short2            *count_motion_vector_buffer,
            __global short2            *prediction_motion_vector_buffer,
            __global char              *skip_input_mode_buffer,
            __global short2            *skip_motion_vector_buffer,
            __global short2            *search_motion_vector_buffer,
            __global char              *intra_search_predictor_modes,
            __global ushort            *search_residuals,
            __global ushort            *skip_residuals,
            __global ushort            *intra_residuals );

    defines a kernel that provides various block-based motion estimation computations.
    There are three basic use cases for this kernel:

        1.) Perform unidirectional inter-prediction motion estimation on the source and reference images to obtain the best search motion vectors and their associated distortion values.

        2.) Perform unidirectional or bidirectional skip-checks on the source and reference images by providing a set of motion vectors, then obtain the corresponding distortion values.

        3.) Perform intra-prediction computations to obtain the best-search prediction modes between adjacent macroblocks and associated residual values.

    This kernel can be set up to do some or all of these operations in a single enqueue.

    The kernel can perform unidirectional or bidirectional skip-checks to produce distortion values based on the input skip-check motion vectors specified for each sub-block in the macroblock. Skip-checks for each motion vector for a sub-block may be independently configured for either unidirectional or bidirectional skip-checks by means of the skip_input_mode_buffer argument. A sub-block configured for bidirectional skip-check will have two component input motion vectors: one specifying a rectangular region in the forward reference frame and the other in the backward reference frame. A bidir_weight argument is additionally specified for sub-blocks configured for bidirectional skip checks. The effective reference region is a weighted blend of the forward and backward reference regions, as specified by the bidirectional skip motion vector and bidir_weight arguments. If a motion vector is configured for bidirectional skip check, then the ref0_check_image argument is taken as the forward reference image and ref1_check_image is taken as the backward reference image; otherwise only ref0_check_image is taken as the single reference image.

    Skip-checks may be configured with either 8x8 or 16x16 sub-block sizes, via the flags argument. The skip_motion_vector_buffer is used to configure up to four sets of bidirectional skip-check MV pairs per MB.
    The number of vectors in each set is determined by the sub-block size:

        Sub-block Size    Bidirectional MV pairs per MB
        --------------    -----------------------------
        8x8               4
        16x16             1

    Results are obtained via the skip_residuals argument as SAD-adjusted distortion values corresponding to each skip-check MV defined for each macroblock.

    block_advanced_motion_estimate_bidirectional_check_intel arguments:

    accelerator is a valid accelerator object created by clCreateAcceleratorINTEL, where the type of the accelerator must be CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL. Refer to the cl_intel_motion_estimation extension for a detailed description of configuring the accelerator object with the cl_motion_estimation_desc_intel structure.

    src_search_image is the input source image for motion search operations, typically representing 8-bit luminance information. Currently, the image_channel_order and the image_data_type of src_search_image are restricted as follows:

        Channel Order     Src Channel Data Type
        -------------     ---------------------
        CL_R              CL_UNORM_INT8

    Additional formats will be supported by future extensions. The host program is responsible for populating the tiled image using the clEnqueueWriteImage function or another appropriate API function.

    ref_search_image is the input reference image for motion search operations, representing 8-bit luminance information. The image_channel_order and the image_data_type must match src_search_image. The host program is responsible for populating the tiled image using the clEnqueueWriteImage function or another appropriate API function.

    src_check_image is the input source image for skip check operations, typically representing 8-bit luminance information. It has the same restrictions as src_search_image. The host program is responsible for populating the tiled image using the clEnqueueWriteImage function or another appropriate API function.

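    The format restrictions on the source and reference images can be checked on the host before the kernel is enqueued. The following is a minimal sketch, not part of the extension: the sketch_ names are hypothetical, the struct is a stand-in for cl_image_format so that the code builds without the OpenCL headers, and the two constants carry the values that CL/cl.h assigns to CL_R and CL_UNORM_INT8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values of CL_R and CL_UNORM_INT8 as defined in the standard CL/cl.h header. */
#define SKETCH_CL_R          0x10B0u
#define SKETCH_CL_UNORM_INT8 0x10D2u

/* Hypothetical stand-in for cl_image_format (assumption of this sketch). */
typedef struct {
    uint32_t channel_order;
    uint32_t channel_data_type;
} sketch_image_format;

/* True if the format is the single combination the text lists as supported. */
static bool sketch_format_supported(const sketch_image_format *fmt)
{
    assert(fmt != NULL);
    return fmt->channel_order == SKETCH_CL_R &&
           fmt->channel_data_type == SKETCH_CL_UNORM_INT8;
}

/* True if the source format is supported and the reference format matches it. */
static bool sketch_image_pair_ok(const sketch_image_format *src,
                                 const sketch_image_format *ref)
{
    assert(ref != NULL);
    return sketch_format_supported(src) &&
           ref->channel_order == src->channel_order &&
           ref->channel_data_type == src->channel_data_type;
}
```

    In a real host program the same comparison would presumably be made on the cl_image_format used to create each image; the check applies equally to the search pair and to the check pair.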
    ref0_check_image is the input forward reference image for unidirectional and bidirectional skip check operations, representing 8-bit luminance information. The image_channel_order and the image_data_type must match src_check_image. The host program is responsible for populating the tiled image using the clEnqueueWriteImage function or another appropriate API function.

    ref1_check_image is the input backward reference image for bidirectional skip check operations, representing 8-bit luminance information. The image_channel_order and the image_data_type must match src_check_image. If bidirectional skip checks are not used, this must be set to the same image as ref0_check_image. The host program is responsible for populating the tiled image using the clEnqueueWriteImage function or another appropriate API function.

    flags defines any optional modes or behaviors used in computing motion estimation, skip check and/or intra-prediction algorithms. Currently supported are:

        Type                                        Description
        ----                                        -----------
        CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL           Specifies a 16x16 skip check sub-block type.
        CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL             Specifies an 8x8 skip check sub-block type.
        CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL      Enables luma-based intra-prediction.

    The following additional token is reserved for future support:

        Type                                        Description
        ----                                        -----------
        CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL    Enables chroma-based intra-prediction.

    The CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL flag cannot be set along with CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL, as skip checks can be configured for either a 16x16 sub-block size or an 8x8 sub-block size. The behavior is undefined if both flags are set.

    search_cost_penalty defines the cost function scheme used in computing cost penalties. Refer to the description of argument search_cost_penalty in the first built-in kernel block_advanced_motion_estimate_check_intel for details.

    search_cost_precision defines the pixel precision of the cost penalty calculations.
    Refer to the description of argument search_cost_precision in the first built-in kernel block_advanced_motion_estimate_check_intel for details.

    count_global can be used to specify the scalar counts of predictor motion vectors and skip-check motion vectors globally for all macroblocks. This can be used in lieu of setting the count_motion_vector_buffer if the numbers of predictor motion vectors and skip-check motion vectors are uniform for all macroblocks. It is specified as a pair of short integers. The first value in the pair is the count of predictor MVs and the second value is the count of skip-check MVs. If either value is -1, then the corresponding count is taken from the per-macroblock entry for each macroblock as specified in count_motion_vector_buffer.

    count_motion_vector_buffer defines the number of predictor motion vectors and skip-check motion vectors defined for each macroblock. The buffer contains an array of short integer pairs, one pair per MB. The indices of the array correspond to the contiguous row-major block layout of the input frame. The first value in each pair defines the number of predictor motion vectors for a given MB; this value defines the range of valid entries for the MB contained within the predictor_motion_vector_buffer array. This value is used only if the predictor count in count_global is -1. The second value in each pair defines the number of skip-check motion vectors for the MB; this value defines the range of valid entries in the skip_motion_vector_buffer array. This value is used only if the skip-check MV count in count_global is -1. All size values must be between 0 and 4 inclusive; size values greater than 4 result in undefined behavior. If neither value of the count_global pair is -1, then this argument can be NULL.

    bidir_weight defines the implicit bidirectional weight to be used when performing bidirectional skip checks. This is used to obtain the weighted reference pixels from the forward and backward blocks.
    There are 5 possible weights:

        Weight                                    Description
        ------                                    -----------
        CL_ME_BIDIR_WEIGHT_QUARTER_INTEL          quarter distance from forward and
                                                  three-quarters from backward
        CL_ME_BIDIR_WEIGHT_THIRD_INTEL            one-third distance from forward and
                                                  two-thirds from backward
        CL_ME_BIDIR_WEIGHT_HALF_INTEL             half distance from forward and
                                                  backward
        CL_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL        two-thirds distance from forward and
                                                  one-third from backward
        CL_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL    three-quarters distance from forward
                                                  and quarter from backward

    predictor_motion_vector_buffer defines an input array of signed short integer predictor MVs with quarter-pixel resolution. The array is partitioned into clusters of 4 motion vectors per MB in contiguous row-major ordering. The buffer layout assumes the maximum size of 4 predictor MVs per MB even if the count_motion_vector_buffer array specifies a smaller predictor count. If the value of the search_cost_penalty argument does not equal CL_ME_COST_PENALTY_NONE_INTEL, the first predictor MV for each MB is used as the cost center for cost penalty calculations. If the array passed to the count_motion_vector_buffer argument specifies a predictor size of zero for all macroblocks, this argument can be NULL.

    skip_input_mode_buffer defines an input array of unsigned char integers defining the skip modes for each macroblock. The indices of the array correspond to the contiguous row-major block layout of the input frame. There is one unsigned char integer per macroblock.
    The following 2-bit skip mode enumeration values are defined for each macroblock sub-block:

        Skip Mode Enumeration                 Description
        ---------------------                 -----------
        CL_ME_FORWARD_INPUT_MODE_INTEL        Unidirectional skip MV from forward frame
        CL_ME_BACKWARD_INPUT_MODE_INTEL       Unidirectional skip MV from backward frame
        CL_ME_BIDIRECTION_INPUT_MODE_INTEL    Bidirectional skip MV from forward and
                                              backward frames

    The format of each macroblock unsigned char entry depends on the skip block type specified in the flags parameter.

        Skip Block Type                       Format
        ---------------                       ------
        CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL     1 2-bit skip mode enumeration value for
                                              one sub-block
        CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL       4 2-bit skip mode enumeration values
                                              (one for each sub-block component MV)
                                              packed into successive two bits of char

    skip_motion_vector_buffer defines an input array of pairs of signed short integer skip-check MVs: one component MV for the forward reference image and the other component MV for the backward reference image. The buffer layout assumes the MVs for both the forward and backward reference images are specified. If a unidirectional forward or backward check is specified for a skip-check sub-block, then the corresponding backward or forward component of the MV pair will be ignored. The array is partitioned into clusters of 4 sets of bidirectional motion vector pairs per MB, in contiguous row-major ordering. The value of skip_block_type determines the number of pairs of MVs for each of the 4 entries:

        Value of skip_block_type      Number of MV pairs in each entry
        ------------------------      --------------------------------
        CL_ME_MB_TYPE_16x16_INTEL     1 MV forward/backward pair per entry
        CL_ME_MB_TYPE_8x8_INTEL       4 MV forward/backward pairs per entry

    The buffer layout assumes the maximum size of 4 MV pair entries per MB, even if the count_motion_vector_buffer array specifies a smaller skip-check count.
If the array passed to count_motion_vector_buffer specifies a skip-check size of zero for all macroblocks, no skip-check computation is performed and this argument can be NULL.

search_motion_vector_buffer defines an output array of signed short integer pairs defining the best search motion vectors per macroblock. Refer to the description of argument search_motion_vector_buffer in the first built-in kernel block_advanced_motion_estimate_check_intel for details.

intra_search_prediction_modes_buffer specifies an output buffer containing a sequence of signed chars describing the predictor modes used during motion estimation. Refer to the description of argument intra_search_prediction_modes in the first built-in kernel block_advanced_motion_estimate_check_intel for details.

search_residuals defines an output buffer containing vectors of unsigned short SAD-adjusted values corresponding to the best search motion vectors populated in the search_motion_vector_buffer array. Refer to the description of argument search_residuals in the first built-in kernel block_advanced_motion_estimate_check_intel for details.

skip_residuals defines an output buffer containing vectors of unsigned short SAD-adjusted values corresponding to the skip-check MVs defined by skip_motion_vector_buffer. The array is partitioned into clusters of 4 sets of residual values per MB, in contiguous row-major ordering. The value of skip_block_type determines the number of values in each of the 4 entries:

    Value of skip_block_type   Number of residuals in each entry
    ------------------------   ---------------------------------
    CL_ME_MB_TYPE_16x16_INTEL  1 residual per entry
    CL_ME_MB_TYPE_8x8_INTEL    4 residuals per entry

The buffer layout assumes the maximum size of 4 residual values per MB; however, the number of valid residual entries corresponds to the skip-check MV count specified in count_motion_vector_buffer for each MB. This argument can be NULL.
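The skip_input_mode_buffer entry format described earlier (one 2-bit skip mode for a 16x16 skip block, four 2-bit skip modes packed into one unsigned char for 8x8 skip blocks) can be sketched on the host side as follows. This is a minimal sketch, not part of the extension: the helper names are hypothetical, and the bit order (sub-block 0 in the two least significant bits) is an assumption, since the text only says the values are packed into successive two bits of the char. The token values are copied from the New Tokens section.

```c
#include <assert.h>
#include <stdint.h>

/* Token values as listed in the New Tokens section of this extension. */
#define CL_ME_FORWARD_INPUT_MODE_INTEL     0x1
#define CL_ME_BACKWARD_INPUT_MODE_INTEL    0x2
#define CL_ME_BIDIRECTION_INPUT_MODE_INTEL 0x3

/* Hypothetical helper: build the entry used with
 * CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL (a single 2-bit skip mode). */
static uint8_t pack_skip_mode_16x16(unsigned mode)
{
    assert(mode >= CL_ME_FORWARD_INPUT_MODE_INTEL &&
           mode <= CL_ME_BIDIRECTION_INPUT_MODE_INTEL);
    return (uint8_t)(mode & 0x3u);
}

/* Hypothetical helper: build the entry used with
 * CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL (four 2-bit skip modes).
 * ASSUMPTION: sub-block i occupies bits (2*i + 1):(2*i); the
 * specification text does not state the bit order explicitly. */
static uint8_t pack_skip_modes_8x8(const unsigned modes[4])
{
    uint8_t packed = 0;
    for (int i = 0; i < 4; ++i) {
        assert(modes[i] >= CL_ME_FORWARD_INPUT_MODE_INTEL &&
               modes[i] <= CL_ME_BIDIRECTION_INPUT_MODE_INTEL);
        packed = (uint8_t)(packed | ((modes[i] & 0x3u) << (2 * i)));
    }
    return packed;
}

/* Hypothetical helper: read back the skip mode of one 8x8 sub-block,
 * using the same assumed bit order as pack_skip_modes_8x8(). */
static unsigned unpack_skip_mode_8x8(uint8_t packed, int sub_block)
{
    assert(sub_block >= 0 && sub_block < 4);
    return (unsigned)(packed >> (2 * sub_block)) & 0x3u;
}
```

One such byte would be written per macroblock, in the same row-major macroblock order as the other per-macroblock buffers.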
intra_search_residuals defines an output buffer of unsigned short SAD-adjusted vectors that correspond to the residual values used during intra-prediction. Refer to the description of argument intra_search_residuals in the first built-in kernel block_advanced_motion_estimate_check_intel for details.

This kernel is queued for execution using clEnqueueNDRangeKernel(). Several arguments passed to this function are specific to this kernel.

For both built-in kernels:

    work_dim must be 2.

    global_work_size represents the height and width of the area of interest to be processed.

    global_work_offset specifies the top-left point of the area of interest.

    local_work_size must be NULL.

The count and layout of macroblocks processed in the frame are based on the values passed to the global_work_size and global_work_offset arguments, and not on the dimensions of the input and reference images. The client must ensure that the data layout of all arrays passed as arguments defines the correct number of macroblocks. The width and height of the area of interest specified by global_work_size and global_work_offset must be less than or equal to the width and height of the source image.

The clEnqueueNDRangeKernel function returns the usual error codes, augmented with the following specific error codes for this kernel:

    - CL_INVALID_WORK_DIMENSION if work_dim is not 2. This built-in kernel requires a 2D ND-range.

    - CL_INVALID_WORK_GROUP_SIZE if local_work_size is not NULL.

    - CL_INVALID_WORK_GROUP_SIZE if the respective values of global_work_size[0] and global_work_size[1] exceed the width and/or height of input images.

    - CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if an image object passed as an argument does not have a supported format, as listed above.

    - CL_INVALID_IMAGE_FORMAT if the image objects passed to the src_image and ref_image arguments do not have matching formats and sizes.
    - CL_INVALID_GLOBAL_OFFSET if the respective values of global_work_offset[0] and global_work_offset[1] exceed the width and/or height of input images.

    - CL_INVALID_KERNEL_ARGS if predictor_motion_vector_buffer is NULL and one or more predictor MV sizes passed to count_motion_vector_buffer are greater than 0.

    - CL_INVALID_KERNEL_ARGS if skip_motion_vector_buffer is NULL and one or more skip-check MV sizes passed to count_motion_vector_buffer are greater than 0.

    - CL_INVALID_BUFFER_SIZE if any of the cl_mem objects passed as arguments has a size less than the expected size.

Interactions with Other Extensions

    The advanced motion estimation extension is based on the cl_intel_accelerator and cl_intel_motion_estimation extensions, and is defined in terms of additions to the base accelerator and motion extension documents.
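As an illustration of the clEnqueueNDRangeKernel constraints listed for the built-in kernels, a client might pre-check its ND-range arguments and size its per-macroblock buffers before enqueueing. The sketch below is illustrative only and makes several assumptions that the text does not confirm: the function names and status enumeration are hypothetical stand-ins (not OpenCL API), index 0 of global_work_size is taken to be the width and index 1 the height, macroblocks are taken to be 16x16 with partial macroblocks rounded up, and each predictor MV is taken to be an (x, y) pair of signed shorts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for the CL error codes named
 * in the text; they are not the real OpenCL enumerants. */
typedef enum {
    ME_RANGE_OK = 0,
    ME_RANGE_INVALID_WORK_DIMENSION,  /* cf. CL_INVALID_WORK_DIMENSION  */
    ME_RANGE_INVALID_WORK_GROUP_SIZE, /* cf. CL_INVALID_WORK_GROUP_SIZE */
    ME_RANGE_INVALID_GLOBAL_OFFSET    /* cf. CL_INVALID_GLOBAL_OFFSET   */
} me_range_status;

/* Mirror the ND-range rules stated for both built-in kernels.
 * ASSUMPTION: index 0 is the width and index 1 is the height. */
static me_range_status check_me_range(unsigned work_dim,
                                      const size_t *global_work_offset,
                                      const size_t *global_work_size,
                                      const size_t *local_work_size,
                                      size_t image_width,
                                      size_t image_height)
{
    if (work_dim != 2)
        return ME_RANGE_INVALID_WORK_DIMENSION;
    if (local_work_size != NULL)
        return ME_RANGE_INVALID_WORK_GROUP_SIZE;
    if (global_work_size[0] > image_width ||
        global_work_size[1] > image_height)
        return ME_RANGE_INVALID_WORK_GROUP_SIZE;
    /* A NULL offset means (0, 0) in clEnqueueNDRangeKernel. */
    if (global_work_offset != NULL &&
        (global_work_offset[0] > image_width ||
         global_work_offset[1] > image_height))
        return ME_RANGE_INVALID_GLOBAL_OFFSET;
    return ME_RANGE_OK;
}

/* ASSUMPTION: 16x16 macroblocks, partial macroblocks rounded up. */
static size_t macroblock_count(size_t width, size_t height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Minimum predictor_motion_vector_buffer size in bytes: the layout
 * always reserves 4 predictor MVs per macroblock.
 * ASSUMPTION: each MV is an (x, y) pair of signed shorts. */
static size_t predictor_buffer_bytes(size_t mb_count)
{
    return mb_count * 4 * 2 * sizeof(int16_t);
}
```

Under these assumptions, a 1920x1080 area of interest covers 120 x 68 macroblocks. The same macroblock count would drive the sizes of the other per-macroblock arrays, which reserve their maximum entry counts regardless of the values in count_motion_vector_buffer.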