Name Strings

    cl_amd_bus_addressable_memory

Contributors

    Pierre Boudier
    Graham Sellers
    Benedikt Kessler
    Ofer Rosenberg

Contact

    Brian Sumner, AMD (Brian.Sumner 'at' amd.com)

IP Status

    No known IP issues.

Version

    Version 1.2, Nov 3, 2011

Number

    OpenCL Extension #25

Status

    Draft

Extension Type

    OpenCL platform extension

Dependencies

    OpenCL 1.1 is required.

Overview

    This extension defines an API that gives applications improved control
    over the physical memory used by the graphics device.

    It allows memory allocated by the graphics driver to be shared with
    other devices on the bus by exposing a write-only bus address. One
    example application is a video capture device that DMAs directly into
    GPU memory.

    It also offers the reverse operation: a buffer allocated on another
    device can be specified for write access by the GPU.

    Table and chapter numbers mentioned below match OpenCL 1.1 Rev 44.

New Procedures and Functions

    cl_int clEnqueueWaitSignalAMD(cl_command_queue command_queue,
                                  cl_mem mem_object,
                                  cl_uint value,
                                  cl_uint num_events,
                                  const cl_event *event_wait_list,
                                  cl_event *event);

    cl_int clEnqueueWriteSignalAMD(cl_command_queue command_queue,
                                   cl_mem mem_object,
                                   cl_uint value,
                                   cl_ulong offset,
                                   cl_uint num_events,
                                   const cl_event *event_wait_list,
                                   cl_event *event);

    cl_int clEnqueueMakeBuffersResidentAMD(cl_command_queue command_queue,
                                           cl_uint num_mem_objects,
                                           cl_mem *mem_objects,
                                           cl_bool blocking_make_resident,
                                           cl_bus_address_amd *bus_addresses,
                                           cl_uint num_events,
                                           const cl_event *event_wait_list,
                                           cl_event *event);

New Types

    typedef struct _cl_bus_address_amd cl_bus_address_amd;

New Tokens

    Accepted by the <flags> parameter of clCreateBuffer:
        CL_MEM_BUS_ADDRESSABLE_AMD              (1<<30)
        CL_MEM_EXTERNAL_PHYSICAL_AMD            (1<<31)

    New command types for the events returned by the above functions:

        CL_COMMAND_WAIT_SIGNAL_AMD              0x4080
        CL_COMMAND_WRITE_SIGNAL_AMD             0x4081
        CL_COMMAND_MAKE_BUFFERS_RESIDENT_AMD    0x4082

Additions to Table 5.4 (List of supported cl_mem_flags values)

    cl_mem_flags                 | Description
    ----------------------------------------------------------------------
    CL_MEM_BUS_ADDRESSABLE_AMD   | This flag specifies that the application
                                 | wants the OpenCL implementation to
                                 | create a buffer that can be accessed by
                                 | remote device DMA.
                                 |
                                 | CL_MEM_BUS_ADDRESSABLE_AMD,
                                 | CL_MEM_ALLOC_HOST_PTR and
                                 | CL_MEM_USE_HOST_PTR
                                 | are mutually exclusive.
                                 |
    CL_MEM_EXTERNAL_PHYSICAL_AMD | This flag specifies that the application
                                 | wants the OpenCL implementation to
                                 | create a buffer from memory already
                                 | allocated on a remote device.
                                 |
                                 | CL_MEM_EXTERNAL_PHYSICAL_AMD,
                                 | CL_MEM_ALLOC_HOST_PTR,
                                 | CL_MEM_COPY_HOST_PTR and
                                 | CL_MEM_USE_HOST_PTR
                                 | are mutually exclusive.
                                 |
                                 | CL_MEM_EXTERNAL_PHYSICAL_AMD,
                                 | CL_MEM_READ_WRITE and CL_MEM_READ_ONLY
                                 | are mutually exclusive.

Additions to section 5.2.1 (Creating Buffer Objects)

    This extension defines two new flags passed to clCreateBuffer,
    CL_MEM_BUS_ADDRESSABLE_AMD and CL_MEM_EXTERNAL_PHYSICAL_AMD, which are
    used to create OpenCL buffers for communicating with a remote device.

    In addition, the extension defines the following structure, which
    provides the bus address information:

        typedef struct _cl_bus_address_amd
        {
            cl_long surfbusaddress;
            cl_long signalbusaddress;
        } cl_bus_address_amd;

    There are two types of buffer objects that can be created:

    1. A buffer object that represents a buffer created in the device's
       memory and that can be shared with a remote device. This buffer is
       created using CL_MEM_BUS_ADDRESSABLE_AMD. The application may
       initialize this memory by adding the mem flag CL_MEM_COPY_HOST_PTR
       and providing data in host_ptr.
       Otherwise, the buffer contents are undefined. The application is
       required to make buffers resident before accessing them from a
       remote device. Using these buffers without making them resident
       leads to undefined behavior (in particular, the addresses of other
       buffers may become invalid). See new section 5.4.4 for making
       buffers resident.

    2. A write-only buffer object that represents a remote buffer located
       on the remote device. There is no actual memory allocation on the
       device in this case. This buffer is created using
       CL_MEM_EXTERNAL_PHYSICAL_AMD. A kernel running on the device can
       only write to this buffer.

    When creating a buffer using CL_MEM_EXTERNAL_PHYSICAL_AMD, the
    application is required to pass the cl_bus_address_amd struct in the
    host_ptr argument of clCreateBuffer. <surfbusaddress> must contain the
    page-aligned physical starting address of the backing store
    preallocated by the application on a remote device. <signalbusaddress>
    must contain the page-aligned physical starting address of the
    preallocated signaling surface. Both bus addresses must have been
    allocated from the same device and memory pool. Failure will occur if
    multiple buffers are created for the same bus address.

    Map/unmap and read operations are not supported for external physical
    memory.

    Sub-buffers are not supported for either bus addressable memory or
    external physical memory.

    The following errors are added to clCreateBuffer:

    CL_MEM_OBJECT_ALLOCATION_FAILURE is generated if the <flags>
        parameter of clCreateBuffer is CL_MEM_EXTERNAL_PHYSICAL_AMD and
        the remote bus address cannot be mapped to the device address
        space.
    CL_INVALID_HOST_PTR is generated if the <flags> parameter of
        clCreateBuffer is CL_MEM_EXTERNAL_PHYSICAL_AMD and the
        <surfbusaddress> or <signalbusaddress> passed to clCreateBuffer
        is 0 or not page aligned.
    CL_OUT_OF_HOST_MEMORY is generated if the <flags> parameter of
        clCreateBuffer is CL_MEM_BUS_ADDRESSABLE_AMD and no memory can be
        allocated with a valid bus address.
    CL_OUT_OF_RESOURCES is generated if bus addressable memory is already
        used by another application or context.

New section 5.4.4 (Making buffers resident)

    The application requires the bus address in order to access the
    buffers from a remote device. Because the OS may rearrange buffers to
    make space for other memory allocations, the buffers must be made
    resident before a remote device tries to access them.

    The following API is used to make buffers resident:

        cl_int clEnqueueMakeBuffersResidentAMD(
                   cl_command_queue command_queue,
                   cl_uint num_mem_objects,
                   cl_mem *mem_objects,
                   cl_bool blocking_make_resident,
                   cl_bus_address_amd *bus_addresses,
                   cl_uint num_events_in_wait_list,
                   const cl_event *event_wait_list,
                   cl_event *event);

    The memory objects passed must be buffers created with the
    CL_MEM_BUS_ADDRESSABLE_AMD flag.

    clEnqueueMakeBuffersResidentAMD returns CL_SUCCESS if the function is
    executed successfully. Otherwise, it returns one of the following
    errors:

    CL_INVALID_OPERATION is generated if any of the pointer parameters of
        clEnqueueMakeBuffersResidentAMD is NULL (and <num_mem_objects> is
        greater than 0).
    CL_INVALID_OPERATION is generated if any of the <mem_objects> passed
        to clEnqueueMakeBuffersResidentAMD is not a valid cl_mem object
        created with the CL_MEM_BUS_ADDRESSABLE_AMD flag.
    CL_OUT_OF_HOST_MEMORY is generated if any of the <mem_objects> passed
        to clEnqueueMakeBuffersResidentAMD could not be made resident; in
        that case the buffer or signal bus addresses are returned as 0.

New section 5.4.5 (Memory Object Synchronization)

    The following API is used to synchronize with the remote device. This
    synchronization lets the device know when the other device has
    finished writing to the buffer(s).

        cl_int clEnqueueWaitSignalAMD(cl_command_queue command_queue,
                                      cl_mem buffer,
                                      cl_uint value,
                                      cl_uint num_events,
                                      const cl_event *event_wait_list,
                                      cl_event *event);

    This command instructs the OpenCL implementation to wait until <value>
    is written to <buffer> before issuing the next command.
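Section 5.4.4 states that when a buffer cannot be made resident, its buffer or signal bus address is returned as 0. A host program might check for that before handing addresses to the remote device or enqueuing a wait. The following is a minimal sketch in plain C, not part of the extension: the names bus_address_sketch and first_non_resident are hypothetical, and int64_t stands in for cl_long so the example builds without the OpenCL headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of cl_bus_address_amd from section 5.2.1.
 * int64_t replaces cl_long so no OpenCL header is needed (assumption). */
typedef struct {
    int64_t surfbusaddress;
    int64_t signalbusaddress;
} bus_address_sketch;

/* Scan the addresses filled in by a make-resident call.
 * Returns the index of the first entry whose surface or signal address
 * is 0 (meaning that buffer was not made resident), or -1 if every
 * entry has both addresses populated. */
int first_non_resident(const bus_address_sketch *addrs, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (addrs[i].surfbusaddress == 0 || addrs[i].signalbusaddress == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

In a real program the array would be the bus_addresses output of clEnqueueMakeBuffersResidentAMD, read only after the command has completed (for example with blocking_make_resident set to CL_TRUE).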
        cl_int clEnqueueWriteSignalAMD(cl_command_queue command_queue,
                                       cl_mem buffer,
                                       cl_uint value,
                                       cl_ulong offset,
                                       cl_uint num_events,
                                       const cl_event *event_wait_list,
                                       cl_event *event);

    This command instructs the OpenCL implementation to write <value> to
    the signal address + <offset> of <buffer> (which must be a buffer
    created with CL_MEM_EXTERNAL_PHYSICAL_AMD). This should be done after
    a write operation by the device into that buffer is complete.
    Consecutive marker values must keep increasing.

    These commands return CL_SUCCESS if the function is executed
    successfully. Otherwise, they return one of the following errors:

    CL_INVALID_MEM_OBJECT is generated if the <buffer> parameter of
        clEnqueueWaitSignalAMD or clEnqueueWriteSignalAMD is not a valid
        buffer.
    CL_INVALID_COMMAND_QUEUE is generated if the <command_queue>
        parameter of clEnqueueWaitSignalAMD or clEnqueueWriteSignalAMD is
        not a valid command queue.
    CL_INVALID_MEM_OBJECT is generated if the <buffer> parameter of
        clEnqueueWaitSignalAMD does not represent a buffer allocated with
        CL_MEM_BUS_ADDRESSABLE_AMD.
    CL_INVALID_MEM_OBJECT is generated if the <buffer> parameter of
        clEnqueueWriteSignalAMD does not represent a buffer created with
        CL_MEM_EXTERNAL_PHYSICAL_AMD.
    CL_INVALID_BUFFER_SIZE is generated if the <offset> parameter of
        clEnqueueWriteSignalAMD would lead to a write beyond the size of
        <buffer>.
    CL_INVALID_VALUE is generated if the signal address of <buffer> used
        by clEnqueueWriteSignalAMD or clEnqueueWaitSignalAMD is invalid
        (for example 0).
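The flag exclusivity rules in Table 5.4 and the CL_INVALID_HOST_PTR condition on clCreateBuffer can be pre-checked on the host before a buffer is created. The following is a minimal sketch in plain C under stated assumptions, not the implementation's actual validation: the SK_ names and both functions are hypothetical, the standard flag bit values are those of the OpenCL 1.1 cl.h, and the 4096-byte page size is an assumption that may not hold on a given platform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard cl_mem_flags bits (OpenCL 1.1) and the two extension bits,
 * renamed with an SK_ prefix so they do not clash with real headers. */
#define SK_MEM_READ_WRITE        (1u << 0)
#define SK_MEM_WRITE_ONLY        (1u << 1)
#define SK_MEM_READ_ONLY         (1u << 2)
#define SK_MEM_USE_HOST_PTR      (1u << 3)
#define SK_MEM_ALLOC_HOST_PTR    (1u << 4)
#define SK_MEM_COPY_HOST_PTR     (1u << 5)
#define SK_MEM_BUS_ADDRESSABLE   (1u << 30)
#define SK_MEM_EXTERNAL_PHYSICAL (1u << 31)

/* Assumed page size; query the real value on the target system. */
#define SK_PAGE_SIZE 4096u

/* Apply the "mutually exclusive" rules listed in Table 5.4. */
bool sk_flags_valid(uint64_t flags)
{
    if ((flags & SK_MEM_BUS_ADDRESSABLE) &&
        (flags & (SK_MEM_ALLOC_HOST_PTR | SK_MEM_USE_HOST_PTR))) {
        return false;
    }
    if ((flags & SK_MEM_EXTERNAL_PHYSICAL) &&
        (flags & (SK_MEM_ALLOC_HOST_PTR | SK_MEM_COPY_HOST_PTR |
                  SK_MEM_USE_HOST_PTR | SK_MEM_READ_WRITE |
                  SK_MEM_READ_ONLY))) {
        return false;
    }
    return true;
}

/* Both addresses must be non-zero and page aligned, mirroring the
 * CL_INVALID_HOST_PTR condition for CL_MEM_EXTERNAL_PHYSICAL_AMD. */
bool sk_external_addresses_valid(int64_t surf, int64_t sig)
{
    const uint64_t mask = (uint64_t)SK_PAGE_SIZE - 1u;
    return surf != 0 && sig != 0 &&
           ((uint64_t)surf & mask) == 0 &&
           ((uint64_t)sig & mask) == 0;
}
```

A caller would run both checks before passing the flags and a cl_bus_address_amd struct to clCreateBuffer; the runtime still performs its own validation and may reject a request for other reasons.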