Description

The following table lists the built-in functions that read vector types from, and write vector types to, memory through a pointer. The generic type gentype stands for the built-in data types char, uchar, short, ushort, int, uint, long [1], ulong, float or double [2]. The generic type name gentypen stands for n-element vectors of gentype elements, and the type name halfn stands for n-element vectors of half elements. The suffix n also appears in the function names (e.g. vloadn, vstoren), where n = 2, 3 [3], 4, 8 or 16.

Table 1. Built-in Vector Data Load and Store Functions

Function

Description

gentypen vloadn(size_t offset, const __global gentype *p)
gentypen vloadn(size_t offset, const __local gentype *p)
gentypen vloadn(size_t offset, const __constant gentype *p)
gentypen vloadn(size_t offset, const __private gentype *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

gentypen vloadn(size_t offset, const gentype *p)

Return sizeof(gentypen) bytes of data, where the first (n * sizeof(gentype)) bytes are read from the address computed as (p + (offset * n)). The computed address must be 8-bit aligned if gentype is char or uchar; 16-bit aligned if gentype is short or ushort; 32-bit aligned if gentype is int, uint, or float; and 64-bit aligned if gentype is long, ulong, or double.

void vstoren(gentypen data, size_t offset, __global gentype *p)
void vstoren(gentypen data, size_t offset, __local gentype *p)
void vstoren(gentypen data, size_t offset, __private gentype *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

void vstoren(gentypen data, size_t offset, gentype *p)

Write n * sizeof(gentype) bytes given by data to the address computed as (p + (offset * n)). The computed address must be 8-bit aligned if gentype is char or uchar; 16-bit aligned if gentype is short or ushort; 32-bit aligned if gentype is int, uint, or float; and 64-bit aligned if gentype is long, ulong, or double.
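
The address arithmetic shared by vloadn and vstoren can be modelled on the host in plain C. The sketch below is an illustration only, not OpenCL C: float4_model, model_vload4 and model_vstore4 are hypothetical names standing in for float4, vload4 and vstore4. It assumes n = 4 and gentype = float, and shows that offset counts whole n-element vectors while only element alignment is required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for the OpenCL C float4 type. */
typedef struct { float s[4]; } float4_model;

/* Models vload4(offset, p): reads 4 floats starting at p + offset * 4. */
float4_model model_vload4(size_t offset, const float *p)
{
    float4_model v;
    const float *src = p + offset * 4;
    /* Only element (32-bit) alignment is required, not vector alignment. */
    assert((uintptr_t)src % sizeof(float) == 0);
    memcpy(v.s, src, sizeof v.s);
    return v;
}

/* Models vstore4(data, offset, p): writes 4 floats starting at p + offset * 4. */
void model_vstore4(float4_model data, size_t offset, float *p)
{
    float *dst = p + offset * 4;
    assert((uintptr_t)dst % sizeof(float) == 0);
    memcpy(dst, data.s, sizeof data.s);
}
```

With a 12-element float buffer buf, model_vload4(1, buf) returns elements 4 to 7, while model_vload4(0, buf + 1) returns elements 1 to 4, a start address that is float-aligned but not aligned to the size of the vector.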

float vload_half(size_t offset, const __global half *p)
float vload_half(size_t offset, const __local half *p)
float vload_half(size_t offset, const __constant half *p)
float vload_half(size_t offset, const __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

float vload_half(size_t offset, const half *p)

Read sizeof(half) bytes of data from the address computed as (p + offset). The data read is interpreted as a half value. The half value is converted to a float value and the float value is returned. The computed read address must be 16-bit aligned.

floatn vload_halfn(size_t offset, const __global half *p)
floatn vload_halfn(size_t offset, const __local half *p)
floatn vload_halfn(size_t offset, const __constant half *p)
floatn vload_halfn(size_t offset, const __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

floatn vload_halfn(size_t offset, const half *p)

Read (n * sizeof(half)) bytes of data from the address computed as (p + (offset * n)). The data read is interpreted as a halfn value. The halfn value read is converted to a floatn value and the floatn value is returned. The computed read address must be 16-bit aligned.
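
The read-then-convert behaviour of vload_half and vload_halfn can be sketched on the host in plain C. This is a model under stated assumptions, not OpenCL C: half storage is represented as uint16_t holding IEEE 754 binary16 bit patterns, and half_bits_to_float, model_vload_half and model_vload_half4 are hypothetical names. The conversion is exact because every half value is representable as a float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts an IEEE 754 binary16 bit pattern to float (always exact). */
float half_bits_to_float(uint16_t h)
{
    uint32_t exp = (h >> 10) & 0x1Fu;   /* 5-bit biased exponent */
    uint32_t man = h & 0x3FFu;          /* 10-bit mantissa */
    uint32_t bits;
    float mag;

    if (exp == 0) {
        /* Zero or subnormal: the magnitude is man * 2^-24. */
        mag = (float)man * (1.0f / 16777216.0f);
    } else {
        if (exp == 0x1Fu)
            bits = 0x7F800000u | (man << 13);           /* infinity or NaN */
        else
            bits = ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
        memcpy(&mag, &bits, sizeof mag);
    }
    return (h & 0x8000u) ? -mag : mag;
}

/* Models vload_half(offset, p): one half read from p + offset. */
float model_vload_half(size_t offset, const uint16_t *p)
{
    return half_bits_to_float(p[offset]);
}

/* Models vload_half4(offset, p): four halfs read from p + offset * 4. */
void model_vload_half4(size_t offset, const uint16_t *p, float out[4])
{
    for (size_t i = 0; i < 4; ++i)
        out[i] = half_bits_to_float(p[offset * 4 + i]);
}
```

In this model the scalar form indexes by offset directly, while the vector form multiplies offset by n, matching the two address formulas given in the table.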

void vstore_half(float data, size_t offset, __global half *p)
void vstore_half_rte(float data, size_t offset, __global half *p)
void vstore_half_rtz(float data, size_t offset, __global half *p)
void vstore_half_rtp(float data, size_t offset, __global half *p)
void vstore_half_rtn(float data, size_t offset, __global half *p)

void vstore_half(float data, size_t offset, __local half *p)
void vstore_half_rte(float data, size_t offset, __local half *p)
void vstore_half_rtz(float data, size_t offset, __local half *p)
void vstore_half_rtp(float data, size_t offset, __local half *p)
void vstore_half_rtn(float data, size_t offset, __local half *p)

void vstore_half(float data, size_t offset, __private half *p)
void vstore_half_rte(float data, size_t offset, __private half *p)
void vstore_half_rtz(float data, size_t offset, __private half *p)
void vstore_half_rtp(float data, size_t offset, __private half *p)
void vstore_half_rtn(float data, size_t offset, __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

void vstore_half(float data, size_t offset, half *p)
void vstore_half_rte(float data, size_t offset, half *p)
void vstore_half_rtz(float data, size_t offset, half *p)
void vstore_half_rtp(float data, size_t offset, half *p)
void vstore_half_rtn(float data, size_t offset, half *p)

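The float value given by data is first converted to a half value using the rounding mode selected by the function-name suffix: _rte (round to nearest even), _rtz (round toward zero), _rtp (round toward positive infinity), or _rtn (round toward negative infinity). The half value is then written to the address computed as (p + offset). The computed address must be 16-bit aligned.

vstore_half, with no suffix, uses the default rounding mode, which is round to nearest even.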
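
The effect of the rounding-mode suffixes can be illustrated with a host-side C model of the float-to-half conversion. This is a sketch, not OpenCL C: half storage is assumed to be uint16_t holding IEEE 754 binary16 bit patterns, the names round_mode, float_to_half_bits and model_vstore_half are hypothetical, and overflow is assumed to follow the usual IEEE 754 convention (infinity when rounding away from zero or to nearest, largest finite value otherwise).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical enum mirroring the _rte, _rtz, _rtp and _rtn suffixes. */
typedef enum { RTE, RTZ, RTP, RTN } round_mode;

/* Converts a float to an IEEE 754 binary16 bit pattern. */
uint16_t float_to_half_bits(float f, round_mode mode)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t mag = x & 0x7FFFFFFFu;
    int neg = sign != 0;
    /* Directed modes in which an inexact result moves away from zero. */
    int away = (mode == RTP && !neg) || (mode == RTN && neg);

    if (mag > 0x7F800000u) return (uint16_t)(sign | 0x7E00u);   /* NaN */
    if (mag == 0x7F800000u) return (uint16_t)(sign | 0x7C00u);  /* infinity */
    if (mag == 0) return sign;                                  /* signed zero */

    int e = (int)(mag >> 23) - 127;     /* unbiased float exponent */
    if (e > 15)                         /* magnitude beyond the half range */
        return (uint16_t)(sign | ((mode == RTE || away) ? 0x7C00u : 0x7BFFu));

    uint32_t sig = (mag & 0x7FFFFFu) | 0x800000u;  /* 24-bit significand */
    /* Normal halfs drop 13 bits; subnormal halfs drop more as e decreases. */
    int shift = (e >= -14) ? 13 : -1 - e;
    uint32_t base = (e >= -14) ? (uint32_t)(e + 14) << 10 : 0u;
    if (shift > 25) shift = 25;         /* tiny input: below half a quantum */

    uint32_t bits = base + (sig >> shift);          /* truncated result */
    uint32_t rem = sig & ((1u << shift) - 1u);      /* discarded bits */
    uint32_t half = 1u << (shift - 1);              /* tie threshold */
    int up = 0;
    if (mode == RTE)
        up = rem > half || (rem == half && (bits & 1u));
    else if (away)
        up = rem != 0;
    /* A carry out of the mantissa correctly bumps the exponent field. */
    return (uint16_t)(sign | (bits + (uint32_t)up));
}

/* Models vstore_half_<mode>(data, offset, p): one half written to p + offset. */
void model_vstore_half(float data, size_t offset, uint16_t *p, round_mode mode)
{
    p[offset] = float_to_half_bits(data, mode);
}
```

For the input 0.1f, which is not exactly representable as a half, this model yields 0x2E66 under RTE, RTZ and RTN, and 0x2E67 under RTP; for -0.1f the directed modes swap roles.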

void vstore_halfn(floatn data, size_t offset, __global half *p)
void vstore_halfn_rte(floatn data, size_t offset, __global half *p)
void vstore_halfn_rtz(floatn data, size_t offset, __global half *p)
void vstore_halfn_rtp(floatn data, size_t offset, __global half *p)
void vstore_halfn_rtn(floatn data, size_t offset, __global half *p)

void vstore_halfn(floatn data, size_t offset, __local half *p)
void vstore_halfn_rte(floatn data, size_t offset, __local half *p)
void vstore_halfn_rtz(floatn data, size_t offset, __local half *p)
void vstore_halfn_rtp(floatn data, size_t offset, __local half *p)
void vstore_halfn_rtn(floatn data, size_t offset, __local half *p)

void vstore_halfn(floatn data, size_t offset, __private half *p)
void vstore_halfn_rte(floatn data, size_t offset, __private half *p)
void vstore_halfn_rtz(floatn data, size_t offset, __private half *p)
void vstore_halfn_rtp(floatn data, size_t offset, __private half *p)
void vstore_halfn_rtn(floatn data, size_t offset, __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

void vstore_halfn(floatn data, size_t offset, half *p)
void vstore_halfn_rte(floatn data, size_t offset, half *p)
void vstore_halfn_rtz(floatn data, size_t offset, half *p)
void vstore_halfn_rtp(floatn data, size_t offset, half *p)
void vstore_halfn_rtn(floatn data, size_t offset, half *p)

The floatn value given by data is converted to a halfn value using the appropriate rounding mode. n * sizeof(half) bytes from the halfn value are then written to the address computed as (p + (offset * n)). The computed address must be 16-bit aligned.

vstore_halfn uses the default rounding mode. The default rounding mode is round to nearest even.

void vstore_half(double data, size_t offset, __global half *p)
void vstore_half_rte(double data, size_t offset, __global half *p)
void vstore_half_rtz(double data, size_t offset, __global half *p)
void vstore_half_rtp(double data, size_t offset, __global half *p)
void vstore_half_rtn(double data, size_t offset, __global half *p)

void vstore_half(double data, size_t offset, __local half *p)
void vstore_half_rte(double data, size_t offset, __local half *p)
void vstore_half_rtz(double data, size_t offset, __local half *p)
void vstore_half_rtp(double data, size_t offset, __local half *p)
void vstore_half_rtn(double data, size_t offset, __local half *p)

void vstore_half(double data, size_t offset, __private half *p)
void vstore_half_rte(double data, size_t offset, __private half *p)
void vstore_half_rtz(double data, size_t offset, __private half *p)
void vstore_half_rtp(double data, size_t offset, __private half *p)
void vstore_half_rtn(double data, size_t offset, __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

void vstore_half(double data, size_t offset, half *p)
void vstore_half_rte(double data, size_t offset, half *p)
void vstore_half_rtz(double data, size_t offset, half *p)
void vstore_half_rtp(double data, size_t offset, half *p)
void vstore_half_rtn(double data, size_t offset, half *p)

The double value given by data is first converted to a half value using the appropriate rounding mode. The half value is then written to the address computed as (p + offset). The computed address must be 16-bit aligned.

vstore_half uses the default rounding mode. The default rounding mode is round to nearest even.

void vstore_halfn(doublen data, size_t offset, __global half *p)
void vstore_halfn_rte(doublen data, size_t offset, __global half *p)
void vstore_halfn_rtz(doublen data, size_t offset, __global half *p)
void vstore_halfn_rtp(doublen data, size_t offset, __global half *p)
void vstore_halfn_rtn(doublen data, size_t offset, __global half *p)

void vstore_halfn(doublen data, size_t offset, __local half *p)
void vstore_halfn_rte(doublen data, size_t offset, __local half *p)
void vstore_halfn_rtz(doublen data, size_t offset, __local half *p)
void vstore_halfn_rtp(doublen data, size_t offset, __local half *p)
void vstore_halfn_rtn(doublen data, size_t offset, __local half *p)

void vstore_halfn(doublen data, size_t offset, __private half *p)
void vstore_halfn_rte(doublen data, size_t offset, __private half *p)
void vstore_halfn_rtz(doublen data, size_t offset, __private half *p)
void vstore_halfn_rtp(doublen data, size_t offset, __private half *p)
void vstore_halfn_rtn(doublen data, size_t offset, __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

void vstore_halfn(doublen data, size_t offset, half *p)
void vstore_halfn_rte(doublen data, size_t offset, half *p)
void vstore_halfn_rtz(doublen data, size_t offset, half *p)
void vstore_halfn_rtp(doublen data, size_t offset, half *p)
void vstore_halfn_rtn(doublen data, size_t offset, half *p)

The doublen value given by data is converted to a halfn value using the appropriate rounding mode. n * sizeof(half) bytes from the halfn value are then written to the address computed as (p + (offset * n)). The computed address must be 16-bit aligned.

vstore_halfn uses the default rounding mode. The default rounding mode is round to nearest even.

floatn vloada_halfn(size_t offset, const __global half *p)
floatn vloada_halfn(size_t offset, const __local half *p)
floatn vloada_halfn(size_t offset, const __constant half *p)
floatn vloada_halfn(size_t offset, const __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

floatn vloada_halfn(size_t offset, const half *p)

For n = 2, 4, 8 and 16, read sizeof(halfn) bytes of data from the address computed as (p + (offset * n)). The data read is interpreted as a halfn value. The halfn value read is converted to a floatn value and the floatn value is returned. The computed address must be aligned to sizeof(halfn) bytes.

For n = 3, vloada_half3 reads a half3 from the address computed as (p + (offset * 4)) and returns a float3. The computed address must be aligned to sizeof(half) * 4 bytes.
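
The difference between the packed and aligned addressing rules, including the n = 3 special case, can be summarised with a few host-side C helpers. This is a sketch with hypothetical names (vload_half_index, vloada_half_index, vloada_half_alignment); it assumes sizeof(half) is 2 bytes and expresses addresses as element indices relative to p.

```c
#include <assert.h>
#include <stddef.h>

#define HALF_BYTES 2u  /* assumed sizeof(half) */

/* Element index of the first half accessed by vload_halfn(offset, p). */
size_t vload_half_index(size_t offset, unsigned n)
{
    assert(n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    return offset * n;
}

/* Element index of the first half accessed by vloada_halfn(offset, p).
   For n = 3 the stride is 4 elements, as for a half4. */
size_t vloada_half_index(size_t offset, unsigned n)
{
    assert(n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    return offset * (n == 3 ? 4u : n);
}

/* Required byte alignment of the address computed by vloada_halfn. */
size_t vloada_half_alignment(unsigned n)
{
    assert(n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    return (size_t)HALF_BYTES * (n == 3 ? 4u : n);
}
```

With offset = 2, a packed half3 access starts at element 6, whereas the aligned variant starts at element 8 and requires 8-byte alignment. The same index and alignment rules apply to the vstorea_halfn functions.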

void vstorea_halfn(floatn data, size_t offset, __global half *p)
void vstorea_halfn_rte(floatn data, size_t offset, __global half *p)
void vstorea_halfn_rtz(floatn data, size_t offset, __global half *p)
void vstorea_halfn_rtp(floatn data, size_t offset, __global half *p)
void vstorea_halfn_rtn(floatn data, size_t offset, __global half *p)

void vstorea_halfn(floatn data, size_t offset, __local half *p)
void vstorea_halfn_rte(floatn data, size_t offset, __local half *p)
void vstorea_halfn_rtz(floatn data, size_t offset, __local half *p)
void vstorea_halfn_rtp(floatn data, size_t offset, __local half *p)
void vstorea_halfn_rtn(floatn data, size_t offset, __local half *p)

void vstorea_halfn(floatn data, size_t offset, __private half *p)
void vstorea_halfn_rte(floatn data, size_t offset, __private half *p)
void vstorea_halfn_rtz(floatn data, size_t offset, __private half *p)
void vstorea_halfn_rtp(floatn data, size_t offset, __private half *p)
void vstorea_halfn_rtn(floatn data, size_t offset, __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

void vstorea_halfn(floatn data, size_t offset, half *p)
void vstorea_halfn_rte(floatn data, size_t offset, half *p)
void vstorea_halfn_rtz(floatn data, size_t offset, half *p)
void vstorea_halfn_rtp(floatn data, size_t offset, half *p)
void vstorea_halfn_rtn(floatn data, size_t offset, half *p)

The floatn value given by data is converted to a halfn value using the appropriate rounding mode.

For n = 2, 4, 8 and 16, the halfn value is written to the address computed as (p + (offset * n)). The computed address must be aligned to sizeof(halfn) bytes.

For n = 3, the half3 value is written to the address computed as (p + (offset * 4)). The computed address must be aligned to sizeof(half) * 4 bytes.

vstorea_halfn uses the default rounding mode. The default rounding mode is round to nearest even.

void vstorea_halfn(doublen data, size_t offset, __global half *p)
void vstorea_halfn_rte(doublen data, size_t offset, __global half *p)
void vstorea_halfn_rtz(doublen data, size_t offset, __global half *p)
void vstorea_halfn_rtp(doublen data, size_t offset, __global half *p)
void vstorea_halfn_rtn(doublen data, size_t offset, __global half *p)

void vstorea_halfn(doublen data, size_t offset, __local half *p)
void vstorea_halfn_rte(doublen data, size_t offset, __local half *p)
void vstorea_halfn_rtz(doublen data, size_t offset, __local half *p)
void vstorea_halfn_rtp(doublen data, size_t offset, __local half *p)
void vstorea_halfn_rtn(doublen data, size_t offset, __local half *p)

void vstorea_halfn(doublen data, size_t offset, __private half *p)
void vstorea_halfn_rte(doublen data, size_t offset, __private half *p)
void vstorea_halfn_rtz(doublen data, size_t offset, __private half *p)
void vstorea_halfn_rtp(doublen data, size_t offset, __private half *p)
void vstorea_halfn_rtn(doublen data, size_t offset, __private half *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

void vstorea_halfn(doublen data, size_t offset, half *p)
void vstorea_halfn_rte(doublen data, size_t offset, half *p)
void vstorea_halfn_rtz(doublen data, size_t offset, half *p)
void vstorea_halfn_rtp(doublen data, size_t offset, half *p)
void vstorea_halfn_rtn(doublen data, size_t offset, half *p)

The doublen value given by data is converted to a halfn value using the appropriate rounding mode.

For n = 2, 4, 8 or 16, the halfn value is written to the address computed as (p + (offset * n)). The computed address must be aligned to sizeof(halfn) bytes.

For n = 3, the half3 value is written to the address computed as (p + (offset * 4)). The computed address must be aligned to sizeof(half) * 4 bytes.

vstorea_halfn uses the default rounding mode. The default rounding mode is round to nearest even.

The results of the vector data load and store functions are undefined if the address being read from or written to is not aligned as required by Table 1 (Built-in Vector Data Load and Store Functions). For the store functions in Table 1, the pointer argument p can point to global, local, or private memory. For the load functions, p can point to global, local, constant, or private memory.

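Variants of the vector data load and store functions that take pointers to the generic address space are also supported, subject to the requirement stated in Table 1: OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature.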

Document Notes

For more information, see the OpenCL C Specification

This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.

Copyright 2014-2023 The Khronos Group Inc.

SPDX-License-Identifier: CC-BY-4.0


1. Only if 64-bit integers are supported. In OpenCL C 3.0, this is indicated by the presence of the __opencl_c_int64 feature macro.
2. Only if double precision is supported. In OpenCL C 3.0, this is indicated by the presence of the __opencl_c_fp64 feature macro.
3. vload3 and vload_half3 read (x,y,z) components from address (p + (offset * 3)) into a 3-component vector. vstore3 and vstore_half3 write (x,y,z) components from a 3-component vector to address (p + (offset * 3)). In addition, vloada_half3 reads (x,y,z) components from address (p + (offset * 4)) into a 3-component vector, and vstorea_half3 writes (x,y,z) components from a 3-component vector to address (p + (offset * 4)). Whether vloada_half3 and vstorea_half3 read or write padding data between the third vector element and the next alignment boundary is implementation-defined. The vloada_ and vstorea_ variants are provided to access data that is aligned to the size of the vector, and are intended to improve performance on hardware that can take advantage of the increased alignment.