Name String cl_intel_motion_estimation Contributors Nico Galoppo, Intel Craig Hansen-Sturm, Intel Yejun Guo, Intel Ben Ashbaugh, Intel Contact Ben Ashbaugh, Intel (ben.ashbaugh 'at' intel.com) IP Status TBD Version Version 2, March 18, 2016 Number OpenCL Extension #23 Status Final draft Dependencies OpenCL 1.2 and support for the cl_intel_accelerator extension is required. This extension is written against revision 19 of the OpenCL 1.2 specification and against version 2 of the cl_intel_accelerator extension specification. Overview This document describes an OpenCL extension to expose basic motion estimation capabilities. There are two key parts of this extension: The first is a built-in kernel for frame-based motion estimation that is exposed using the built-in kernel infrastructure added to OpenCL 1.2. The second is the ability to create motion estimation accelerators, which can be used to control behavior of the motion estimation engine. The motion estimation accelerators use the framework described in the cl_intel_accelerator extension. New Procedures and Functions None New Tokens Accepted as the parameter of clCreateAcceleratorINTEL: CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL 0x0 Accepted as values for fields of the cl_motion_estimation_desc_intel structure: Accepted values for : CL_ME_MB_TYPE_16x16_INTEL 0x0 CL_ME_MB_TYPE_8x8_INTEL 0x1 CL_ME_MB_TYPE_4x4_INTEL 0x2 Accepted values for : CL_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0 CL_ME_SUBPIXEL_MODE_HPEL_INTEL 0x1 CL_ME_SUBPIXEL_MODE_QPEL_INTEL 0x2 Accepted values for : CL_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0 CL_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x1 Accepted values for : CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL 0x0 CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL 0x1 CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL 0x5 New Types Accepted as a type pointed to by for clCreateAcceleratorINTEL, and pointed to by for clGetAcceleratorInfoINTEL: typedef struct _cl_motion_estimation_desc_intel { cl_uint mb_block_type; cl_uint subpixel_mode; cl_uint sad_adjust_mode; cl_uint search_path_type; } cl_motion_estimation_desc_intel; Modifications to Section 5.XX.1 - "Creating Accelerator Objects" in the cl_intel_accelerator specification: Add a table of accelerator types, immediately following the description of : "---------------------------------------------------------------------------- cl_accelerator_type_intel Description ------------------------- ----------- CL_ACCELERATOR_TYPE_ Create a basic full-frame motion estimation MOTION_ESTIMATION_INTEL accelerator ----------------------------------------------------------------------------" Add a table of accelerator descriptors, immediately following the description of : "---------------------------------------------------------------------------- Descriptor Type Description --------------- ----------- cl_motion_estimation_desc_intel Used to represent the configuration state of basic full-frame motion estimation accelerators ----------------------------------------------------------------------------" Add a new sub-section "Motion Estimation Accelerators" to Section 5.XX.3 - "Accelerator Descriptors" in the cl_intel_accelerator specification: "The cl_motion_estimation_desc_intel descriptor structure is used to create an accelerator to control behavior of the motion estimation engine. This motion estimation descriptor structure is defined as: typedef struct _cl_motion_estimation_desc_intel { cl_uint mb_block_type; cl_uint subpixel_mode; cl_uint sad_adjust_mode; cl_uint search_path_type; } cl_motion_estimation_desc_intel; describes the size of the blocks (partitioning mode) considered by the motion estimator. A frame is first divided into macroblocks of size 16x16, and then may be sub-divided into blocks of size 8x8 or 4x4, or retained as 16x16 blocks. This field influences the size of the output image(s) and buffer(s) of the motion estimation algorithms defined in this extension. Valid values are described in the table below. ---------------------------------------------------------------------------- Type Description ---- ----------- CL_ME_MB_TYPE_16x16_INTEL Each block is of size 16x16 pixels CL_ME_MB_TYPE_8x8_INTEL Each block is of size 8x8 pixels CL_ME_MB_TYPE_4x4_INTEL Each block is of size 4x4 pixels ---------------------------------------------------------------------------- defines the search precision (and hence, the precision of the returned motion vectors). Valid values are described in the table below. ---------------------------------------------------------------------------- Type Description ---- ----------- CL_ME_SUBPIXEL_MODE_INTEGER_INTEL Integer pixel mode searching CL_ME_SUBPIXEL_MODE_HPEL_INTEL Half-pixel mode searching CL_ME_SUBPIXEL_MODE_QPEL_INTEL Quarter-pixel mode searching ---------------------------------------------------------------------------- specifies distortion measure adjustment used for the motion search SAD (Sum of Absolute Difference) comparison. Valid values are described in the table below. ---------------------------------------------------------------------------- Type Description ---- ----------- CL_ME_SAD_ADJUST_MODE_NONE_INTEL Non-adjusted SAD CL_ME_SAD_ADJUST_MODE_HAAR_INTEL Haar transformed SATD (frequency space) ---------------------------------------------------------------------------- specifies the search path and search radius when matching blocks in the neighborhood of each pixel block. Currently, all search algorithms match the source block with pixel blocks in the reference area exhaustively within a [Rx, Ry] radius from the center of the co-located source macroblock in the reference frame that the block belongs to, optionally offset by a predicted motion vector. ---------------------------------------------------------------------------- Flag Description ---- ----------- CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL Exhaustive search within [+/-2, +/-2] radius CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL Exhaustive search within [+/-4, +/-4] radius CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL Exhaustive search within [+/-16, +/-12] radius ----------------------------------------------------------------------------" Add a new sub-section "Using Motion Estimation Accelerators" to Section 5.XX.4 - "Using Accelerators" in the cl_intel_accelerator specification: "Motion estimation accelerators control behavior of the motion estimation engine and are used by motion estimation kernels. The cl_intel_motion_estimation extension adds a built-in kernel for basic motion estimation. This section describes the motion estimation built-in kernel and how to set the motion estimation accelerator as an argument to the motion estimation built-in kernel. The motion estimation built-in kernel uses the built-in kernel functionality added in OpenCL 1.2. After creating the motion estimation built-in kernel, the motion estimation accelerator can be set as a kernel argument to control the motion estimation engine. There is currently no OpenCL C type for motion estimation accelerators, and motion estimation accelerators can only be used with the motion estimation built-in kernel. The kernel __kernel void block_motion_estimate_intel( accelerator_intel_t accelerator, __read_only image2d_t src_image, __read_only image2d_t ref_image, __global short2 * prediction_motion_vector_buffer, __global short2 * motion_vector_buffer, __global ushort * residuals); computes motion vectors by comparing a 2d source image with a 2d reference image, producing an output field of motion vectors. The algorithm searches for the best match of each pixel block in the source image by searching a region of the reference image, centered on the coordinates of that pixel block in the source image. The starting search coordinate may optionally be offset by a prediction motion vector. Additionally, the kernel may optionally return a vector field of per-pixel-block best-match distortion (SAD) values. When enqueuing this kernel, the and determine the region of interest (ROI) of the input frame in pixels. The determines the offset to the start of the region of interest. The determines the width and height of the region of interest. The amount of data written to the is dependent on the size of the region of interest and the partitioning mode specified by the accelerator. must be a valid accelerator object created by clCreateAcceleratorINTEL. For the block_motion_estimate_intel kernel, the type of the accelerator must be CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL. is the input source image, typically representing 8-bit luminance information. The and of are restricted as follows: ---------------------------------------------------------------------------- Channel Order Src Channel Data Type ------------- --------------------- CL_R CL_UNORM_INT8 ---------------------------------------------------------------------------- is the input reference image, typically representing 8-bit luminance information. The and of must match src_image, as follows: ---------------------------------------------------------------------------- Channel Order Ref Channel Data Type ------------- --------------------- CL_R CL_UNORM_INT8 ---------------------------------------------------------------------------- is the output motion vector buffer, representing a vector field of pixel block motion vectors, stored linearly in row-major order. The elements of this buffer describe a motion vector for the corresponding pixel block, with its x/y components packed as two 16-bit integer values. Each component is encoded as a S13.2 fixed point value (two's complement). The precision is specified by the of the . The size of the returned data is determined by the of the . The needs to be sized that it fits the results of all pixel blocks of the source image, i.e. the number of full or partial 16x16 source pixel macroblocks times the number of returned motion vectors per source block (1, 4, or 16). is an optional input buffer representing a vector field of prediction motion vectors, stored linearly in row-major order. When provided, this buffer must contain one prediction motion vector for each full or partial 16x16 pixel macroblock in the region of interest. The prediction motion vector is used to offset the default search center for each 16x16 pixel block search window. The default search center is the center of the co-located source macroblock in the reference frame. Note that in feedback algorithms, where the output of one frame is used as the input for the next frame, the output buffer may need to be downsampled. The argument is optional and may be set to NULL, in which case the prediction motion vectors are implied to be (0,0). is an optional output buffer representing a field of residuals or "distortion values", one for each returned motion vector, stored linearly in row-major order. These "residuals of the compensated image" represent the sum-of-absolute-differences (SAD) between the source frame pixel block and the best-match reference frame pixel block that produced the returned motion vector. The of the determines whether plain SAD or SATD (Haar-adjusted) values are returned. The argument is optional and may be set to NULL, in which case residual information is not returned. The motion estimation built-in kernels are queued for execution by the host application using clEnqueueNDRangeKernel(). This function will return the usual error codes, augmented with the following specific error codes for this kernel: * CL_INVALID_WORK_DIMENSION if is not 2. This built-in kernel requires a 2D ND-range. * CL_INVALID_WORK_GROUP_SIZE if is not NULL. This built-in kernel requires the work-group size to be set by the runtime. * CL_INVALID_IMAGE_SIZE if the region of interest defined by the and exceed the dimensions of the or . * CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if the or kernel arguments do not meet the image format restrictions described above." Revision History Version 1 - Initial Revision Version 2 - Formatting and minor bug fixes.