vectorDataLoadandStoreFunctions(3)

Description

The Built-in Vector Data Load and Store Functions table describes the list of supported functions that allow you to read and write vector types from a pointer to memory.

The generic type name gentype indicates that the function can take any of the following as the type for the arguments:

char, uchar, short, ushort, int, uint, long ^[1] or ulong
float or double ^[2]
half ^[3]

All functions taking or returning half types are supported only when the cl_khr_fp16 extension is supported.

The generic type name gentypen indicates an n-element vector of gentype elements.

The generic type name halfn indicates an n-element vector of half elements.

The suffix n is also used in the function names (i.e. vloadn, vstoren etc.), where n = 2, 3 ^[4], 4, 8 or 16.

Table 1. Built-in Vector Data Load and Store Functions
Function	Description
gentypen vloadn(size_t offset, const __global gentype *p) gentypen vloadn(size_t offset, const __local gentype *p) gentypen vloadn(size_t offset, const __constant gentype *p) gentypen vloadn(size_t offset, const __private gentype *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: gentypen vloadn(size_t offset, const gentype *p)	Return `sizeof(gentypen)` bytes of data, where the first `(n * sizeof(gentype))` bytes are read from the address computed as `(p + (offset * n))`. The computed address must be 8-bit aligned if `gentype` is `char` or `uchar`; 16-bit aligned if `gentype` is `half`, `short` or `ushort`; 32-bit aligned if `gentype` is `int`, `uint`, or `float`; and 64-bit aligned if `gentype` is `long` or `ulong`.
void vstoren(gentypen data, size_t offset, __global gentype *p) void vstoren(gentypen data, size_t offset, __local gentype *p) void vstoren(gentypen data, size_t offset, __private gentype *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: void vstoren(gentypen data, size_t offset, gentype *p)	Write `n * sizeof(gentype)` bytes given by data to the address computed as `(p + (offset * n))`. The computed address must be 8-bit aligned if `gentype` is `char` or `uchar`; 16-bit aligned if `gentype` is `half`, `short` or `ushort`; 32-bit aligned if `gentype` is `int`, `uint`, or `float`; and 64-bit aligned if `gentype` is `long` or `ulong`.
float vload_half(size_t offset, const __global half *p) float vload_half(size_t offset, const __local half *p) float vload_half(size_t offset, const __constant half *p) float vload_half(size_t offset, const __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: float vload_half(size_t offset, const half *p)	Read `sizeof(half)` bytes of data from the address computed as `(p + offset)`. The data read is interpreted as a `half` value. The `half` value is converted to a `float` value and the `float` value is returned. The computed read address must be 16-bit aligned.
floatn vload_halfn(size_t offset, const __global half *p) floatn vload_halfn(size_t offset, const __local half *p) floatn vload_halfn(size_t offset, const __constant half *p) floatn vload_halfn(size_t offset, const __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: floatn vload_halfn(size_t offset, const half *p)	Read `(n * sizeof(half))` bytes of data from the address computed as `(p + (offset * n))`. The data read is interpreted as a `halfn` value. The `halfn` value read is converted to a `floatn` value and the `floatn` value is returned. The computed read address must be 16-bit aligned.
void vstore_half(float data, size_t offset, __global half *p) void vstore_half_rte(float data, size_t offset, __global half *p) void vstore_half_rtz(float data, size_t offset, __global half *p) void vstore_half_rtp(float data, size_t offset, __global half *p) void vstore_half_rtn(float data, size_t offset, __global half *p) void vstore_half(float data, size_t offset, __local half *p) void vstore_half_rte(float data, size_t offset, __local half *p) void vstore_half_rtz(float data, size_t offset, __local half *p) void vstore_half_rtp(float data, size_t offset, __local half *p) void vstore_half_rtn(float data, size_t offset, __local half *p) void vstore_half(float data, size_t offset, __private half *p) void vstore_half_rte(float data, size_t offset, __private half *p) void vstore_half_rtz(float data, size_t offset, __private half *p) void vstore_half_rtp(float data, size_t offset, __private half *p) void vstore_half_rtn(float data, size_t offset, __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: void vstore_half(float data, size_t offset, half *p) void vstore_half_rte(float data, size_t offset, half *p) void vstore_half_rtz(float data, size_t offset, half *p) void vstore_half_rtp(float data, size_t offset, half *p) void vstore_half_rtn(float data, size_t offset, half *p)	The `float` value given by data is first converted to a `half` value using the appropriate rounding mode. The `half` value is then written to the address computed as `(p + offset)`. The computed address must be 16-bit aligned. vstore_half uses the default rounding mode. The default rounding mode is round to nearest even.
void vstore_halfn(floatn data, size_t offset, __global half *p) void vstore_halfn_rte(floatn data, size_t offset, __global half *p) void vstore_halfn_rtz(floatn data, size_t offset, __global half *p) void vstore_halfn_rtp(floatn data, size_t offset, __global half *p) void vstore_halfn_rtn(floatn data, size_t offset, __global half *p) void vstore_halfn(floatn data, size_t offset, __local half *p) void vstore_halfn_rte(floatn data, size_t offset, __local half *p) void vstore_halfn_rtz(floatn data, size_t offset, __local half *p) void vstore_halfn_rtp(floatn data, size_t offset, __local half *p) void vstore_halfn_rtn(floatn data, size_t offset, __local half *p) void vstore_halfn(floatn data, size_t offset, __private half *p) void vstore_halfn_rte(floatn data, size_t offset, __private half *p) void vstore_halfn_rtz(floatn data, size_t offset, __private half *p) void vstore_halfn_rtp(floatn data, size_t offset, __private half *p) void vstore_halfn_rtn(floatn data, size_t offset, __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: void vstore_halfn(floatn data, size_t offset, half *p) void vstore_halfn_rte(floatn data, size_t offset, half *p) void vstore_halfn_rtz(floatn data, size_t offset, half *p) void vstore_halfn_rtp(floatn data, size_t offset, half *p) void vstore_halfn_rtn(floatn data, size_t offset, half *p)	The `floatn` value given by data is converted to a `halfn` value using the appropriate rounding mode. `n * sizeof(half)` bytes from the `halfn` value are then written to the address computed as `(p + (offset * n))`. The computed address must be 16-bit aligned. vstore_halfn uses the default rounding mode. The default rounding mode is round to nearest even.
void vstore_half(double data, size_t offset, __global half *p) void vstore_half_rte(double data, size_t offset, __global half *p) void vstore_half_rtz(double data, size_t offset, __global half *p) void vstore_half_rtp(double data, size_t offset, __global half *p) void vstore_half_rtn(double data, size_t offset, __global half *p) void vstore_half(double data, size_t offset, __local half *p) void vstore_half_rte(double data, size_t offset, __local half *p) void vstore_half_rtz(double data, size_t offset, __local half *p) void vstore_half_rtp(double data, size_t offset, __local half *p) void vstore_half_rtn(double data, size_t offset, __local half *p) void vstore_half(double data, size_t offset, __private half *p) void vstore_half_rte(double data, size_t offset, __private half *p) void vstore_half_rtz(double data, size_t offset, __private half *p) void vstore_half_rtp(double data, size_t offset, __private half *p) void vstore_half_rtn(double data, size_t offset, __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: void vstore_half(double data, size_t offset, half *p) void vstore_half_rte(double data, size_t offset, half *p) void vstore_half_rtz(double data, size_t offset, half *p) void vstore_half_rtp(double data, size_t offset, half *p) void vstore_half_rtn(double data, size_t offset, half *p)	The `double` value given by data is first converted to a `half` value using the appropriate rounding mode. The `half` value is then written to the address computed as `(p + offset)`. The computed address must be 16-bit aligned. vstore_half uses the default rounding mode. The default rounding mode is round to nearest even.
void vstore_halfn(doublen data, size_t offset, __global half *p) void vstore_halfn_rte(doublen data, size_t offset, __global half *p) void vstore_halfn_rtz(doublen data, size_t offset, __global half *p) void vstore_halfn_rtp(doublen data, size_t offset, __global half *p) void vstore_halfn_rtn(doublen data, size_t offset, __global half *p) void vstore_halfn(doublen data, size_t offset, __local half *p) void vstore_halfn_rte(doublen data, size_t offset, __local half *p) void vstore_halfn_rtz(doublen data, size_t offset, __local half *p) void vstore_halfn_rtp(doublen data, size_t offset, __local half *p) void vstore_halfn_rtn(doublen data, size_t offset, __local half *p) void vstore_halfn(doublen data, size_t offset, __private half *p) void vstore_halfn_rte(doublen data, size_t offset, __private half *p) void vstore_halfn_rtz(doublen data, size_t offset, __private half *p) void vstore_halfn_rtp(doublen data, size_t offset, __private half *p) void vstore_halfn_rtn(doublen data, size_t offset, __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: void vstore_halfn(doublen data, size_t offset, half *p) void vstore_halfn_rte(doublen data, size_t offset, half *p) void vstore_halfn_rtz(doublen data, size_t offset, half *p) void vstore_halfn_rtp(doublen data, size_t offset, half *p) void vstore_halfn_rtn(doublen data, size_t offset, half *p)	The `doublen` value given by data is converted to a `halfn` value using the appropriate rounding mode. `n * sizeof(half)` bytes from the `halfn` value are then written to the address computed as `(p + (offset * n))`. The computed address must be 16-bit aligned. vstore_halfn uses the default rounding mode. The default rounding mode is round to nearest even.
floatn vloada_halfn(size_t offset, const __global half *p) floatn vloada_halfn(size_t offset, const __local half *p) floatn vloada_halfn(size_t offset, const __constant half *p) floatn vloada_halfn(size_t offset, const __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: floatn vloada_halfn(size_t offset, const half *p)	For n = 2, 4, 8 and 16, read `sizeof(halfn)` bytes of data from the address computed as `(p + (offset * n))`. The data read is interpreted as a `halfn` value. The `halfn` value read is converted to a `floatn` value and the `floatn` value is returned. The computed address must be aligned to `sizeof(halfn)` bytes. For n = 3, vloada_half3 reads a `half3` from the address computed as `(p + (offset * 4))` and returns a `float3`. The computed address must be aligned to `sizeof(half) * 4` bytes.
void vstorea_halfn(floatn data, size_t offset, __global half *p) void vstorea_halfn_rte(floatn data, size_t offset, __global half *p) void vstorea_halfn_rtz(floatn data, size_t offset, __global half *p) void vstorea_halfn_rtp(floatn data, size_t offset, __global half *p) void vstorea_halfn_rtn(floatn data, size_t offset, __global half *p) void vstorea_halfn(floatn data, size_t offset, __local half *p) void vstorea_halfn_rte(floatn data, size_t offset, __local half *p) void vstorea_halfn_rtz(floatn data, size_t offset, __local half *p) void vstorea_halfn_rtp(floatn data, size_t offset, __local half *p) void vstorea_halfn_rtn(floatn data, size_t offset, __local half *p) void vstorea_halfn(floatn data, size_t offset, __private half *p) void vstorea_halfn_rte(floatn data, size_t offset, __private half *p) void vstorea_halfn_rtz(floatn data, size_t offset, __private half *p) void vstorea_halfn_rtp(floatn data, size_t offset, __private half *p) void vstorea_halfn_rtn(floatn data, size_t offset, __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: void vstorea_halfn(floatn data, size_t offset, half *p) void vstorea_halfn_rte(floatn data, size_t offset, half *p) void vstorea_halfn_rtz(floatn data, size_t offset, half *p) void vstorea_halfn_rtp(floatn data, size_t offset, half *p) void vstorea_halfn_rtn(floatn data, size_t offset, half *p)	The `floatn` value given by data is converted to a `halfn` value using the appropriate rounding mode. For n = 2, 4, 8 and 16, the `halfn` value is written to the address computed as `(p + (offset * n))`. The computed address must be aligned to `sizeof(halfn)` bytes. For n = 3, the `half3` value is written to the address computed as `(p + (offset * 4))`. The computed address must be aligned to `sizeof(half) * 4` bytes. vstorea_halfn uses the default rounding mode. The default rounding mode is round to nearest even.
void vstorea_halfn(doublen data, size_t offset, __global half *p) void vstorea_halfn_rte(doublen data, size_t offset, __global half *p) void vstorea_halfn_rtz(doublen data, size_t offset, __global half *p) void vstorea_halfn_rtp(doublen data, size_t offset, __global half *p) void vstorea_halfn_rtn(doublen data, size_t offset, __global half *p) void vstorea_halfn(doublen data, size_t offset, __local half *p) void vstorea_halfn_rte(doublen data, size_t offset, __local half *p) void vstorea_halfn_rtz(doublen data, size_t offset, __local half *p) void vstorea_halfn_rtp(doublen data, size_t offset, __local half *p) void vstorea_halfn_rtn(doublen data, size_t offset, __local half *p) void vstorea_halfn(doublen data, size_t offset, __private half *p) void vstorea_halfn_rte(doublen data, size_t offset, __private half *p) void vstorea_halfn_rtz(doublen data, size_t offset, __private half *p) void vstorea_halfn_rtp(doublen data, size_t offset, __private half *p) void vstorea_halfn_rtn(doublen data, size_t offset, __private half *p) For OpenCL C 2.0, or OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature: void vstorea_halfn(doublen data, size_t offset, half *p) void vstorea_halfn_rte(doublen data, size_t offset, half *p) void vstorea_halfn_rtz(doublen data, size_t offset, half *p) void vstorea_halfn_rtp(doublen data, size_t offset, half *p) void vstorea_halfn_rtn(doublen data, size_t offset, half *p)	The `doublen` value given by data is converted to a `halfn` value using the appropriate rounding mode. For n = 2, 4, 8 and 16, the `halfn` value is written to the address computed as `(p + (offset * n))`. The computed address must be aligned to `sizeof(halfn)` bytes. For n = 3, the `half3` value is written to the address computed as `(p + (offset * 4))`. The computed address must be aligned to `sizeof(half) * 4` bytes. vstorea_halfn uses the default rounding mode. The default rounding mode is round to nearest even.
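
The address arithmetic used by vloadn and vstoren, `(p + (offset * n))`, can be illustrated with a small host-side model. The sketch below is plain C, not OpenCL C: `model_float4`, `model_vload4` and `model_vstore4` are hypothetical names introduced only for illustration, and the model fixes gentype to float and n to 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the OpenCL C float4 type. */
typedef struct { float s[4]; } model_float4;

/* Models vload4(offset, p): reads 4 floats starting at p + offset * 4. */
model_float4 model_vload4(size_t offset, const float *p) {
    model_float4 v;
    assert(p != NULL);
    memcpy(v.s, p + offset * 4, 4 * sizeof(float));
    return v;
}

/* Models vstore4(data, offset, p): writes 4 floats starting at p + offset * 4. */
void model_vstore4(model_float4 data, size_t offset, float *p) {
    assert(p != NULL);
    memcpy(p + offset * 4, data.s, 4 * sizeof(float));
}
```

With a 16-element array holding the values 0 to 15, `model_vload4(2, buf)` returns elements 8 to 11, because offset counts whole vectors rather than scalar elements.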

The results of vector data load and store functions are undefined if the address being read from or written to is not correctly aligned as described in the Built-in Vector Data Load and Store Functions table. For the store functions in that table, the pointer argument p can point to global, local, or private memory. For the load functions, p can point to global, local, constant, or private memory.
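
As a rough illustration of the alignment rules for the vloada_halfn and vstorea_halfn rows, the following plain C sketch computes the required alignment in bytes and checks a numeric address against it. The helper names `vloada_half_alignment` and `address_is_aligned` are hypothetical, and the sketch assumes a half occupies 2 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of an OpenCL C half in bytes. */
#define MODEL_HALF_SIZE 2u

/* Required alignment in bytes for vloada_halfn / vstorea_halfn:
 * sizeof(halfn) for n = 2, 4, 8, 16 and sizeof(half) * 4 for n = 3. */
size_t vloada_half_alignment(unsigned n) {
    assert(n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    return (size_t)(n == 3 ? 4u : n) * MODEL_HALF_SIZE;
}

/* Returns 1 if addr is a multiple of alignment, which must be a power of two. */
int address_is_aligned(uintptr_t addr, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

For example, an address ending in 0x2 satisfies the 16-bit alignment that vload_halfn needs, but not the 8-byte alignment that vloada_half4 needs.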

Variants of the vector data load and store functions that take pointer arguments pointing to the generic address space are also supported in OpenCL C 2.0, or in OpenCL C 3.0 or newer with the `__opencl_c_generic_address_space` feature.
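
The vload_half row of the table reads a 16-bit half from `(p + offset)` and returns it widened to float. The plain C sketch below models that widening on the host, assuming the half is stored in the IEEE 754 binary16 format and that float is IEEE 754 binary32. `half_bits_to_float` and `model_vload_half` are hypothetical names, and the half values are held as raw `uint16_t` bit patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widens a binary16 bit pattern to a binary32 float. The conversion is exact. */
float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0 && man == 0) {
        bits = sign;                          /* signed zero */
    } else if (exp == 0) {
        /* Subnormal half: shift until the implicit bit appears. */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            ++shift;
        }
        man &= 0x3FFu;
        bits = sign | ((127u - 15u + 1u - shift) << 23) | (man << 13);
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + (127u - 15u)) << 23) | (man << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Models vload_half(offset, p): reads one half at p + offset. */
float model_vload_half(size_t offset, const uint16_t *p) {
    assert(p != NULL);
    return half_bits_to_float(p[offset]);
}
```

The bit pattern 0x3C00 decodes to 1.0f and 0xC000 decodes to -2.0f. Because every half value is exactly representable as a float, the load direction needs no rounding mode, unlike the vstore_half variants.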

Document Notes

For more information, see the OpenCL C Specification.

This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.

Copyright

SPDX-License-Identifier: CC-BY-4.0

1. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_int64 feature macro.

2. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the __opencl_c_fp64 feature macro.

3. Only if the cl_khr_fp16 extension is supported and has been enabled.

4. vload3 and vload_half3 read (x, y, z) components from address (p + (offset * 3)) into a 3-component vector. vstore3 and vstore_half3 write (x, y, z) components from a 3-component vector to address (p + (offset * 3)). In addition, vloada_half3 reads (x, y, z) components from address (p + (offset * 4)) into a 3-component vector and vstorea_half3 writes (x, y, z) components from a 3-component vector to address (p + (offset * 4)). Whether vloada_half3 and vstorea_half3 read/write padding data between the third vector element and the next alignment boundary is implementation-defined. The vloada_ and vstorea_ variants are provided to access data that is aligned to the size of the vector, and are intended to enable performance on hardware that can take advantage of the increased alignment.
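
The difference between the packed and aligned 3-component variants described in note 4 comes down to the stride applied to offset. A minimal plain C sketch, using the hypothetical helpers `packed_start_index` and `aligned_start_index`, computes the index of the first element accessed:

```c
#include <assert.h>
#include <stddef.h>

/* First element accessed by vloadn, vstoren, vload_halfn and vstore_halfn:
 * offset * n, so consecutive 3-component vectors are tightly packed. */
size_t packed_start_index(size_t offset, unsigned n) {
    assert(n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    return offset * n;
}

/* First element accessed by vloada_halfn and vstorea_halfn:
 * offset * 4 when n is 3, offset * n otherwise. */
size_t aligned_start_index(size_t offset, unsigned n) {
    assert(n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    return offset * (n == 3 ? 4u : n);
}
```

With offset 2, vload_half3 starts at element 6 while vloada_half3 starts at element 8, leaving one padding element after each aligned 3-component vector.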
