Name Strings cl_altera_compiler_mode Contributors David Neto, Altera Corporation Contact Michael Kinsner, mkinsner 'at' altera 'dot' com IP Status No known IP claims. Version Version 1, 2014-02-06 Number OpenCL Extension #30 Status Complete. Shipping with the Altera SDK for OpenCL, version 14.0 An earlier version shipped with the Altera SDK for OpenCL, version 13.0 Extension Type OpenCL platform extension Dependencies OpenCL 1.0 is required. This document is written against revision 48 of the OpenCL 1.0 specification. Overview This extension specifies alternative device code compilation flows for OpenCL. In standard OpenCL, there are two methods for preparing code for execution on an OpenCL device. The first is to compile OpenCL C code from source text using an online compiler. The second is to load a precompiled device binary. This extension specifies mechanisms to enable alternative workflows for compiling and loading device code. They enable simplified or optimized support for the following development and deployment scenarios: 1. The target OpenCL device is not available in the development environment. 2. The platform does not provide an online compiler for the target OpenCL device, or full online compilation is not desirable during host application development. 3. The application only requires one device program, and does not need read access to the device binary. The development and deployment of a particular application may involve more than one of these scenarios. The mechanisms specified by this extension are: 1. An "offline device" feature: This is the ability to specify that the OpenCL runtime should partially emulate the presence a device. From the perspective of the host program, emulation should be complete, except that kernel code may have no effect. For example, kernels are enqueued, buffers are copied or mapped as required, and event profiling information is updated to reflect command progression. However side effects produced by running the code inside kernels may not be visible. 2. A "create program executable library" compiler mode: The clBuildProgram API method does not fully compile device code. Instead, it performs an online stub compilation, and saves enough information in a data store to perform a deferred full (offline) compilation. 2a. The stub compilation produces a binary which includes enough kernel interface information to satisfy further host activities such as creating cl_kernel objects, setting kernel arguments, and enqueueing kernels for execution. But the stub binary may not have executable code, so the execution of kernels from such a binary may not produce valid data. 2b. The data store is used to perform offline compilation of kernel programs. Each logically distinct call of clBuildProgram generates an entry in the data store. Each entry is a tuple consisting of: - The target device - The kernel program source - The options supplied to clBuildProgram - A set of instructions (such as a script) for offline compilation of the kernel program. The result of executing these instructions is to associate a device binary (including executable device code) with this tuple in the data store. 3. A "use program executable library" mode: We assume that an offline compilation has been performed for all entries in the data store generated by use of the "create program executable library" mode. In the "use program exectuable library" mode, the clBuildProgram API method performs a lookup in the data store based on the following attributes: - The target device - The kernel program source - The options supplied to clBuildProgram If such an entry exists in the data store, then the associated binary is loaded into the cl_program object. If no such entry exists, or if the entry is not associated with a device binary, then an error is returned. 4. A "preloaded binary only" compiler mode: In this mode, the runtime does not compile any programs, does not load the device with new code at runtime, and does not necessarily produce a valid device binary when requested by the CL_PROGRAM_BINARIES query to the clGetProgramInfo API method. Instead: - The device is assumed to already have a program loaded before the cl_context is created. - All cl_program objects behave as if they refer to the preloaded binary - The clBuildProgram API method is a no-op except for setting a successful build status. The following paragraphs describe the benefits provided by the mechanisms specified in this extension. Scenario 1: Developing the host part of an OpenCL application without a device being present: The offline device feature enables the execution of a host program even while a device is not present in the system. This is useful for developing an application before a device has been manufactured, is otherwise scarce, or unavailable. Although the data coming back from the emulated device is invalid, host code interaction with the runtime may be developed and tested. Scenario 2: Porting an existing application to a platform without an online compiler: Many OpenCL platforms provide an online OpenCL C compiler for each device. Therefore host applications often use clCreateProgramWithSource and clBuildProgram to create executable device code. The standard method of porting such a program to an environment without an online compiler is to replace the compile-from-source sequence of API calls with a find-and-load-from-binary code sequence. Transforming the host application in this way introduces platform dependencies (such as APIs to access a filesystem) and adds bookkeeping complexity to map the original OpenCL C source and compile options with an associated device binary. The porting and verification effort can be quite onerous if many portions of the host code must be modified, or if the device source code is parameterized at runtime. We can reduce porting effort in this scenario by using a three-phase approach. In the first phase, we run the host program with the runtime configured to use offline device mode, and also the "create program executable libary" compiler mode. This generates a data store (the program executable library) containing enough information to compile the required device programs in an offline manner, i.e. outside the control of the host program. In the second phase, we perform an offline compilation of all entries in the data store. This is done by enumerating the entries in the data store and following the compilation instructions for each entry. In the third phase, the application is fully functional. The host is configured to use "use program executable library" compiler mode. Any API request to compile a device program from source is translated into a lookup of the (fully functional) device binary in the data store. This approach works when: - The device has a stable name. That is, the device name does not change from one run to another. - The host program calls clBuildProgram with only a finite set of combinations of device, kernel source, and build options. - Those combinations are consistent from one run to the next. That is, the kernel source and build options do not depend on the data results from executing a previous kernel. - Other context upon which the compilation depends is stable. For example, the location and contents of files included from the OpenCL C source remains the same. Scenario 3: An application needs only one cl_program, runs in a constrained environment, and may need "instant on" behaviour: The "preloaded binary only" compiler mode is desiged to be used by embedded applications. Such applications typically have the following constraints: - The application is simple enough that only one cl_program is required. - The target system has tight memory and time constraints. It may be infeasible or undesirable to process a full device binary through the standard OpenCL runtime APIs. For example, the device binary might be too large to map into host memory, or the application cannot tolerate the delay to load the device binary from external storage. - The application has no requirement to recover the device binary via the CL_PROGRAM_BINARIES query to clGetProgramInfo. Even moreso, it may be undesirable for intellectual property reasons or otherwise to allow the host program to recover the device binary. - It is desirable to develop the application in the normal way, but to minimize host program code changes before deployment. That is, application development uses the standard clCreateProgramWithSource, or clCreateProgramWithBinary APIs. But the transition to a test or production environment is limited to a configuration change, as opposed to a code change. Header File Interface constants are defined in cl_ext.h New Tokens New context properties to be used in the array supplied to to clCreateContext or clCreateContextFromType: CL_CONTEXT_COMPILER_MODE_ALTERA 0x40F0 CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA 0x40F1 CL_CONTEXT_OFFLINE_DEVICE_ALTERA 0x40F2 Values to be supplied for context property CL_CONTEXT_COMPILER_MODE_ALTERA: CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA 0 CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA 1 CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA 2 CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA 3 Additions to Chapter 4 of the OpenCL 1.0 (v48) Specification Add the following to Table 4.4, List of supported properties by clCreateContext: cl_context_properties enum: CL_CONTEXT_OFFLINE_DEVICE_ALTERA Property value: const char* Description: Specifies that the runtime should partly emulate the presence of the named device. The device should behave normally except that kernels executed on such an "offline" device might not produce any side effects. The specified name value should be the first word in the full name returned by the CL_DEVICE_NAME, in C-style form. An implementation may restrict the use of this property as follows: - A platform may require external initialization when using this context property. For example, Altera's platform implementations require that environment variable CL_CONTEXT_OFFLINE_DEVICE_ALTERA be set to the same device name string as supplied to this property. - When this property is specified, the specified device may be the only available device in the platform. This property may be useful for developing or porting applications when no online compiler is available, and when the CL_CONTEXT_COMPILER_MODE_ALTERA property is set to CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA. cl_context_properties enum: CL_CONTEXT_COMPILER_MODE_ALTERA Property value: cl_ulong Description: For devices without an online compiler, this property specifies alternative behaviour for building, loading, and querying device programs. (Platforms implementing the embedded profile are the only ones that may omit providing an online compiler.) The value should be one of the following: CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA, CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA, CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA, CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA. CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA - The clBuildProgram and clCreateProgramWithBinary, and clGetProgramInfo API methods behave in the standard way. In particular, clBuildProgram may fail with error CL_COMPILER_NOT_AVAILABLE. This value is the default, and is the only value which specifies conformant behaviour. CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA - The device is assumed to already have a program loaded before the first OpenCL APIs are invoked. All cl_program objects behave as if they refer to the preloaded device binary. The clBuildProgram API method always succeeds when compiling for this device, but does not produce an executable device binary. The clCreateProgramWithBinary does not validate the provided binary. It always succeeds provided its arguments are well-formed. The data returned by the CL_PROGRAM_BINARIES query for clGetProgramInfo is unspecified. It may not be a valid device binary. CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA - When compiling a program created with clCreateProgramWithSource, the clBuildProgram method performs a "stub" compilation, and populates a data store with enough information to perform a program compilation offline (i.e. outside the control of the host program). The online stub compilation produces a binary which includes enough information to satsify further runtime API activities such as creating cl_kernel objects, setting kernel arguments, and enqueing kernels for execution. The stub binary may not have executable code, so the execution of kernels from such a binary may not produce valid data. This value for CL_CONTEXT_COMPILER_MODE_ALTERA is designed to be used with the CL_CONTEXT_OFFLINE_DEVICE_ALTERA context property. Additionally, each logically distinct call of clBuildProgram generates an entry in an external data store which persists beyond the lifetime of the host program. Each entry is a tuple consisting of: - The target device - The kernel program source - The options supplied to clBuildProgram - A set of instructions (such as a script) for offline compilation of the kernel program. The result of executing these instructions is to associate a device binary (including executable device code) with this tuple in the data store. In Altera's platform implementation, the data store is a tree of directories and files. Each data store entry is a leaf directory in this tree containing: - a file named "kernels.cl" containing the kernel program source - a file named "build.cmd" containing the list of operating system commands to be used to compile the program to a device binary, and to store that device binary in a file named "kernels.aocx" in the same directory. CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA - When compiling a program created with clCreateProgramWithSoruce, the clBuildProgram API method performs a lookup in the data store based on the following attributes: - The target device - The kernel program source - The options supplied to clBuildProgram If such an entry exists in the data store, then the associated binary is loaded into the cl_program. If no such entry exists, or if the entry is not associated with a device binary, then an error is returned. cl_context_properties enum: CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA Property value: const char* Description: Specifies the filesystem root directory for the data store used when either value CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA or value CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA is specified for context property CL_CONTEXT_COMPILER_MODE_ALTERA. The specified value may be a relative directory name, but will be resolved to an absolute path at context creation time. If this property is left unspecified, then data store root is implementation-defined. In Altera's platform implementation, the default is the "aocl_program_library" subdirectory in the current directory in effect at context creation time. Additions to Chapter 5 of the OpenCL 1.0 (v48) Specification Additions to Section 5.4.1 Creating Program Objects When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is specified with value CL_CONTEXT_COMPILER_PRELOADED_BINARY_ONLY_ALTERA, the clCreateProgramWithBinary API method behaves in a non-standard way. See the description of this property value in Table 4.4 for more information. Additions to Section 5.4.2 Building Program Exectuables When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is specified, the clBuildProgram API method behaves in a non-standard way when compiling for devices without an online compiler. See the description of this property value in Table 4.4 for more information. Additions to Section 5.4.5 Program Object Queries The following is added to the description of the CL_PROGRAM_BINARIES query in Table 5.11: When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is specified with either value CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA or value CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA, then the binary returned by this query may not be a valid device binary. See the description of the CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA property value in Table 4.4 for more information. Additions to Section 5.6 Executing Kernels When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is specified with value CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA, or if context property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is specified, then a kernel execution may behave as if the kernel code produces no side effects, i.e. as if the body of the kernel contains no statements. (All other runtime activites must occur. For example, the kernels are still enqueued, buffers are copied or mapped as required to satisfy kernel arguments, and event profiling information is updated to reflect command progression.) See the description of the CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA property value in Table 4.4 for more information. Implementation Notes For each context property defined in this extension specification, Altera's platform implementation allows the property to be specified by setting an environment variable of the same name. For CL_CONTEXT_OFFLINE_DEVICE_ALTERA, the environment variable of *must* be set before any platform APIs are called. For CL_CONTEXT_COMPILER_MODE_ALTERA, value of the environment variable is the numerical value for the corresponding value enum as defined in the CL/cl_ext.h header file. Precedence: For context properties CL_CONTEXT_COMPILER_MODE_ALTERA and CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA, any value explicitly provided to the clCreateContext or clCreateContextFromType API methods will override an environment variable setting. If property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is explicitly provided to the clCreateCOTNEXT or clCreateContextFromType methods, then it must match the value set for environment variable CL_CONTEXT_OFFLINE_DEVICE_ALTERA. Issues 1. The use of context property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is rather restricted: - It must be set at the "beginning of time" for a host application, e.g. via an environment variable setting. - If an "offline device" is used in a context, then online devices cannot be used in any context for the platform. In this light, it might be more natural to call it a "platform" property. But in OpenCL platforms are stateless, so this would not be appropriate. Instead the most basic runtime controls are applied to contexts, not platforms. These restrictions reflect limitations of Altera's original implementation. However, the feature in its current form is still quite useful for application development and porting. Sample Code Example for using an "offline device", with creation and initialization of a data store for offline compilation of program binaries. #include #include #include #define CHECK(X) assert(CL_SUCCESS == (X)) int main(...) { cl_platform platform = 0; cl_device_id device = 0; cl_context context = 0; cl_int status = 0; // Specify an offline device via environment variable here, or externally // before program startup. // This must occur befor the first OpenCL API method call. setenv("CL_CONTEXT_OFFLINE_DEVICE_ALTERA","mydevice"); CHECK( clGetPlatformIDs(1,&platform,0) ); CHECK( clGetDeviceIDs(platform,CL_DEVICE_TYPE_ACCELERATOR,1,&device,0) ); cl_context_properties props[] = { // Specify creation and initialization of a program library data store. CL_CONTEXT_COMPILER_MODE_ALTERA, (cl_context_properties)CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA // Where is the data store rooted? CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA (cl_context_properties)"/data/myproject/proglib" // Terminate the properties list. 0 }; context = clCreateContext( props, 1, &device, 0, 0, &status ); CHECK( status ); cl_command_queue cq = clCreateCommandQueue( context, device, 0, &status ); CHECK( status ); const char* source = "kernel void foo( global int* A ) { *A = 42; }"; cl_program program = clCreateProgramWithSource( context, 1, &source, NULL, &status ); CHECK( status ); // Perform a stub compilation, and create an entry for this (device,source,options) // combination in the program library data store. CHECK( clBuildProgram( program, 1, &device, "-cl-opt-disable", 0, 0 ) ); cl_mem mem = clCreateBuffer(context,CL_MEM_READ_WRITE,sizeof(cl_int),0,&status); CHECK( status ); // Only a stub compilation has been performed. // But this still enables full setup and enqueue of a kernel for execution. cl_kernel kernel = clCreateKernel( program, "foo", &status ); CHECK( status ); CHECK( clSetKernelArg( kernel, 0, sizeof(cl_mem), &mem ) ); // Expect that kernels scheduled by clEnqueueNDRangeKernel and // clEnqueueTask may not produce expected side effects. // It may appear that each kernel executes no statements. CHECK( clEnqueueTask( cq, kernel, 0, 0, 0 ) ); cl_int the_answer = 0; CHECK( clEnqueueReadBuffer( cq, mem, 1 /*block*/, 0, sizeof(the_answer),&the_answer,0,0,0) ); // Because we're using an "offline device", we can't rely on the_answer being 42. // ... Example for using an already-populated data store of program binaries compiled in an offline manner. The code is the same as the previous example, except use these context properties instead: cl_context_properties props[] = { // Specify the use of a program data store library of pre-compiled device binaries. CL_CONTEXT_COMPILER_MODE_ALTERA, (cl_context_properties)CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA // Where is the data store rooted? CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA (cl_context_properties)"/data/myproject/proglib" // Terminate the properties list. 0 }; Example for using a "preloaded binary": cl_context_properties props[] = { // Specify the use of a preloaded binary. CL_CONTEXT_COMPILER_MODE_ALTERA, (cl_context_properties)CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA, // Terminate the properties list. 0 }; context = clCreateContext( props, 1, &device, 0, 0, &status ); ... // All cl_program objects refer to the preloaded binary. // clCreateProgramWithBinary accepts invalid device binaries. // clBuildProgram succeeds but does not actually compile the program from source. Conformance Tests None. Revision History Version 0, 2013-05-06 - Initial revision. Documents the behaviours in the Altera SDK for OpenCL, version 13.0. Version 1, 2014-02-06 - Assigned registered values to tokens.