integerFunctions(3)

Description

The following table describes the built-in integer functions that take scalar or vector arguments. The vector versions of the integer functions operate component-wise. The description is per-component.

We use the generic type name gentype to indicate that the function can take char, charn, uchar, ucharn, short, shortn, ushort, ushortn, int, intn, uint, uintn, long ^[1], longn, ulong, or ulongn as the type for the arguments. We use the generic type name ugentype to refer to unsigned versions of gentype. For example, if gentype is char4, ugentype is uchar4. We also use the generic type name sgentype to indicate that the function can take a scalar data type, i.e. char, uchar, short, ushort, int, uint, long, or ulong, as the type for the arguments. For built-in integer functions that take gentype and sgentype arguments, the gentype argument must be a vector or scalar version of the sgentype argument. For example, if sgentype is uchar, gentype must be uchar or ucharn. For vector versions, sgentype is implicitly widened to gentype as described for arithmetic operators. n is 2, 3, 4, 8, or 16.

For any specific use of a function with gentype* arguments the actual type has to be the same for all arguments and the return type, unless they are explicitly specified as an actual type.

Table 1. Built-in Scalar and Vector Integer Argument Functions
Function	Description
ugentype abs(gentype x)	Returns |x|.
ugentype abs_diff(gentype x, gentype y)	Returns |x - y| without modulo overflow.
gentype add_sat(gentype x, gentype y)	Returns x + y and saturates the result.
gentype hadd(gentype x, gentype y)	Returns (x + y) >> 1. The intermediate sum does not modulo overflow.
gentype rhadd(gentype x, gentype y)	Returns (x + y + 1) >> 1. The intermediate sum does not modulo overflow. ^[2]
gentype clamp(gentype x, gentype minval, gentype maxval) gentype clamp(gentype x, sgentype minval, sgentype maxval)	Returns min(max(x, minval), maxval). Results are undefined if minval > maxval. Requires support for OpenCL C 1.1 or newer.
gentype clz(gentype x)	Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, returns the size in bits of the type of x or component type of x, if x is a vector.
gentype ctz(gentype x)	Returns the count of trailing 0-bits in x. If x is 0, returns the size in bits of the type of x or component type of x, if x is a vector. Requires support for OpenCL C 2.0 or newer.
uint dot(uchar4 a, uchar4 b) int dot(char4 a, char4 b) int dot(uchar4 a, char4 b) int dot(char4 a, uchar4 b)	`dot` returns the dot product of the two input vectors `a` and `b`. The components of `a` and `b` are sign- or zero-extended to the width of the destination type and the vectors with extended components are multiplied component-wise. All the components of the resulting vectors are added together to form the final result. Requires that the `__opencl_c_integer_dot_product_input_4x8bit` feature macro is defined.
uint dot_acc_sat(uchar4 a, uchar4 b, uint acc) int dot_acc_sat(char4 a, char4 b, int acc) int dot_acc_sat(uchar4 a, char4 b, int acc) int dot_acc_sat(char4 a, uchar4 b, int acc)	`dot_acc_sat` returns the saturating addition of the dot product of the two input vectors `a` and `b` and the accumulator `acc`: product = dot(a, b); result = add_sat(product, acc). Requires that the `__opencl_c_integer_dot_product_input_4x8bit` feature macro is defined.
uint dot_4x8packed_uu_uint(uint a, uint b) int dot_4x8packed_ss_int(uint a, uint b) int dot_4x8packed_us_int(uint a, uint b) int dot_4x8packed_su_int(uint a, uint b)	Returns `dot` for 4x8-bit input vectors packed into a 32-bit word. Requires that the `__opencl_c_integer_dot_product_input_4x8bit_packed` feature macro is defined.
uint dot_acc_sat_4x8packed_uu_uint(uint a, uint b, uint acc) int dot_acc_sat_4x8packed_ss_int(uint a, uint b, int acc) int dot_acc_sat_4x8packed_us_int(uint a, uint b, int acc) int dot_acc_sat_4x8packed_su_int(uint a, uint b, int acc)	Returns `dot_acc_sat` for 4x8-bit input vectors packed into a 32-bit word. Requires that the `__opencl_c_integer_dot_product_input_4x8bit_packed` feature macro is defined.
gentype mad_hi(gentype a, gentype b, gentype c)	Returns mul_hi(a, b) + c.
gentype mad_sat(gentype a, gentype b, gentype c)	Returns a * b + c and saturates the result.
gentype max(gentype x, gentype y) For OpenCL C 1.1 or newer: gentype max(gentype x, sgentype y)	Returns y if x < y, otherwise it returns x.
gentype min(gentype x, gentype y) For OpenCL C 1.1 or newer: gentype min(gentype x, sgentype y)	Returns y if y < x, otherwise it returns x.
gentype mul_hi(gentype x, gentype y)	Computes x * y and returns the high half of the product of x and y.
gentype rotate(gentype v, gentype i)	For each element in v, the bits are shifted left by the number of bits given by the corresponding element in i (subject to the usual shift modulo rules). Bits shifted off the left side of the element are shifted back in from the right.
gentype sub_sat(gentype x, gentype y)	Returns x - y and saturates the result.
short upsample(char hi, uchar lo) ushort upsample(uchar hi, uchar lo) shortn upsample(charn hi, ucharn lo) ushortn upsample(ucharn hi, ucharn lo)	Signed hi: result[i] = ((short)hi[i] << 8) | lo[i]. Unsigned hi: result[i] = ((ushort)hi[i] << 8) | lo[i].
int upsample(short hi, ushort lo) uint upsample(ushort hi, ushort lo) intn upsample(shortn hi, ushortn lo) uintn upsample(ushortn hi, ushortn lo)	Signed hi: result[i] = ((int)hi[i] << 16) | lo[i]. Unsigned hi: result[i] = ((uint)hi[i] << 16) | lo[i].
long upsample(int hi, uint lo) ulong upsample(uint hi, uint lo) longn upsample(intn hi, uintn lo) ulongn upsample(uintn hi, uintn lo)	Signed hi: result[i] = ((long)hi[i] << 32) | lo[i]. Unsigned hi: result[i] = ((ulong)hi[i] << 32) | lo[i].
gentype popcount(gentype x)	Returns the number of non-zero bits in x. Requires support for OpenCL C 1.2 or newer.
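
The per-component behavior of several Table 1 functions can be modeled on the host in plain C. The following is a minimal sketch for a single component, not the specification's implementation: the `ref_*` names are hypothetical helpers introduced here, only a few type combinations are covered, and ordinary C is used in place of OpenCL C so that the model can be compiled and checked anywhere.

```c
#include <stdint.h>

/* Model of add_sat for int: compute the exact sum in 64 bits,
   then clamp it to the range of the 32-bit result type. */
int32_t ref_add_sat_i32(int32_t x, int32_t y) {
    int64_t sum = (int64_t)x + (int64_t)y;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Model of abs_diff for int: the result type is unsigned, so the
   full distance between INT32_MIN and INT32_MAX is representable. */
uint32_t ref_abs_diff_i32(int32_t x, int32_t y) {
    return x > y ? (uint32_t)x - (uint32_t)y
                 : (uint32_t)y - (uint32_t)x;
}

/* Model of mul_hi for uint: high half of the 64-bit product. */
uint32_t ref_mul_hi_u32(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * (uint64_t)y) >> 32);
}

/* Model of mad_hi for uint: mul_hi(a, b) + c, with the usual
   unsigned wrap-around on the final addition. */
uint32_t ref_mad_hi_u32(uint32_t a, uint32_t b, uint32_t c) {
    return ref_mul_hi_u32(a, b) + c;
}

/* Model of rotate for uint: the shift count is reduced modulo the
   bit width, and bits shifted off the left re-enter on the right. */
uint32_t ref_rotate_u32(uint32_t v, uint32_t i) {
    uint32_t s = i & 31u;
    return s ? (v << s) | (v >> (32u - s)) : v;
}

/* Model of clz for uint: returns 32 (the size in bits) when x is 0. */
uint32_t ref_clz_u32(uint32_t x) {
    uint32_t n = 0;
    if (x == 0) return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Model of ushort upsample(uchar hi, uchar lo). */
uint16_t ref_upsample_u8(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

For example, `ref_add_sat_i32(INT32_MAX, 1)` stays at `INT32_MAX` instead of wrapping, and `ref_rotate_u32(0x80000001u, 1)` yields `0x00000003u` because the top bit re-enters at bit 0. The vector forms apply the same rule to each component independently.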

The following table describes fast integer functions that can be used for optimizing performance of kernels. We use the generic type name gentype to indicate that the function can take int, int2, int3, int4, int8, int16, uint, uint2, uint3, uint4, uint8 or uint16 as the type for the arguments.

Table 2. Built-in 24-bit Integer Functions
Function	Description
gentype mad24(gentype x, gentype y, gentype z)	Multiply two 24-bit integer values x and y and add the 32-bit integer result to the 32-bit integer z. Refer to the definition of mul24 to see how the 24-bit integer multiplication is performed.
gentype mul24(gentype x, gentype y)	Multiply two 24-bit integer values x and y. x and y are 32-bit integers but only the low 24 bits are used to perform the multiplication. mul24 should only be used when values in x and y are in the range [-2²³, 2²³-1] if x and y are signed integers and in the range [0, 2²⁴-1] if x and y are unsigned integers. If x and y are not in this range, the multiplication result is implementation-defined.
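
The range rules for `mul24` and `mad24` can be illustrated with a small host-side model in plain C. This is a sketch under stated assumptions, not the specification's implementation: the `ref_*` names are hypothetical, only the unsigned arithmetic is modeled, and masking the inputs to 24 bits is just one possible behavior for out-of-range values, which the table leaves implementation-defined.

```c
#include <stdint.h>

/* Returns 1 if v is a valid signed mul24 operand, i.e. lies in
   [-2^23, 2^23 - 1]; returns 0 otherwise. */
int ref_in_mul24_range_i32(int32_t v) {
    return v >= -8388608 && v <= 8388607;
}

/* Returns 1 if v is a valid unsigned mul24 operand, i.e. lies in
   [0, 2^24 - 1]; returns 0 otherwise. */
int ref_in_mul24_range_u32(uint32_t v) {
    return v <= 0xFFFFFFu;
}

/* Model of mul24 for uint: only the low 24 bits of each operand take
   part in the multiplication, and the low 32 bits of the (up to
   48-bit) product are returned.
   Assumption: out-of-range inputs are simply masked here; a real
   device may behave differently for such inputs. */
uint32_t ref_mul24_u32(uint32_t x, uint32_t y) {
    uint64_t product = (uint64_t)(x & 0xFFFFFFu) * (uint64_t)(y & 0xFFFFFFu);
    return (uint32_t)product;
}

/* Model of mad24 for uint: mul24(x, y) + z with unsigned wrap-around. */
uint32_t ref_mad24_u32(uint32_t x, uint32_t y, uint32_t z) {
    return ref_mul24_u32(x, y) + z;
}
```

A caller that cannot guarantee the operand range might test it with the `ref_in_mul24_range_*` helpers and fall back to an ordinary 32-bit multiply when the test fails, since only in-range inputs give a portable result.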

Document Notes

For more information, see the OpenCL C Specification.

This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.

Copyright

SPDX-License-Identifier: CC-BY-4.0

1. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_int64 feature macro.

2. Frequently vector operations need n + 1 bits temporarily to calculate a result. The rhadd instruction gives you an extra bit without needing to upsample and downsample. This can be a profound performance win.
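
To illustrate the extra-bit point, the sketch below models `hadd` and `rhadd` for one 8-bit component in plain C, once with an explicitly widened intermediate and once with bitwise identities that never leave 8 bits of range. It is an illustration only: the function names are hypothetical, and the bitwise identities are a well-known technique chosen here, not something the specification prescribes for implementations.

```c
#include <stdint.h>

/* Widened models: the sum is formed in a wider type, so it cannot
   overflow, and is then narrowed back to 8 bits. */
uint8_t wide_hadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)(((unsigned)x + (unsigned)y) >> 1);
}

uint8_t wide_rhadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)(((unsigned)x + (unsigned)y + 1u) >> 1);
}

/* Narrow models: every intermediate value fits in 8 bits.
   They rely on x + y == 2 * (x & y) + (x ^ y)
            and x + y == 2 * (x | y) - (x ^ y). */
uint8_t ref_hadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)((x & y) + ((x ^ y) >> 1));
}

uint8_t ref_rhadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)((x | y) - ((x ^ y) >> 1));
}

/* Exhaustively compares both models over all 65536 input pairs.
   Returns 1 if they agree everywhere, 0 otherwise. */
int hadd_models_agree(void) {
    unsigned x, y;
    for (x = 0; x < 256; x++) {
        for (y = 0; y < 256; y++) {
            uint8_t a = (uint8_t)x, b = (uint8_t)y;
            if (ref_hadd_u8(a, b) != wide_hadd_u8(a, b)) return 0;
            if (ref_rhadd_u8(a, b) != wide_rhadd_u8(a, b)) return 0;
        }
    }
    return 1;
}
```

With `x = 255` and `y = 254`, the true sum 509 does not fit in 8 bits, yet both models return 254 for `hadd` and 255 for `rhadd`.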
