Dependencies
This extension is written against the OpenCL Specification v3.0.3.
This extension requires OpenCL 2.0.
Overview
This extension gives applications explicit control over some aspects of work scheduling. It can be used for performance tuning or debugging.
New API Enums
Accepted value for the param_name parameter to clGetDeviceInfo:
CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM 0x41E4
CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM (1 << 0)
CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM (1 << 1)
CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM (1 << 2)
CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM (1 << 3)
CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM (1 << 4)
CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM 0x41EB
Accepted value for the param_name parameter to clSetKernelExecInfo:
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM 0x41E5
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM 0x41E6
Accepted value for the properties parameter to clCreateCommandQueueWithProperties:
CL_QUEUE_KERNEL_BATCHING_ARM 0x41E7
CL_QUEUE_DEFERRED_FLUSH_ARM 0x41EC
New build options
This extension adds a build option to control the number of registers allocated to each thread:
-fregister-allocation=
Missing before version 0.3.0.
Modifications to the OpenCL API Specification
- (Modify Section 4.2, Querying Devices)
- (Add the following to Table 5, Device Queries)
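cl_device_info | Return Type | Description |
---|---|---|
CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM | | Returns a bitfield of the scheduling controls this device supports: CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM, CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM, CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM, CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM, CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM |
CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM | | Returns an array of valid register allocations for this device. Each of the returned values can be passed to the |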
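As an illustration of how the capabilities bitfield might be interpreted, the following is a minimal sketch in plain C. It does not call the OpenCL API: the bit values are copied from the New API Enums section, while the `scheduling_caps_t` typedef (a 64-bit stand-in for the bitfield type) and both helper functions are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit values copied from the "New API Enums" section of this extension. */
#ifndef CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM
#define CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM               (1 << 0)
#define CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM          (1 << 1)
#define CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM (1 << 2)
#define CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM                (1 << 3)
#define CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM           (1 << 4)
#endif

/* Assumption: a 64-bit unsigned stand-in for the bitfield type. */
typedef uint64_t scheduling_caps_t;

/* Returns 1 if every bit of `control` is set in `caps`, otherwise 0. */
static int supports_scheduling_control(scheduling_caps_t caps,
                                       scheduling_caps_t control)
{
    return (caps & control) == control;
}

/* Stores the names of the supported controls in `names` (at most `max`
 * entries) and returns how many names were stored. */
static size_t list_scheduling_controls(scheduling_caps_t caps,
                                       const char **names, size_t max)
{
    static const struct { scheduling_caps_t bit; const char *name; } table[] = {
        { CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM,
          "CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM" },
        { CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM,
          "CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM" },
        { CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM,
          "CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM" },
        { CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM,
          "CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM" },
        { CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM,
          "CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM" },
    };
    size_t count = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0] && count < max; ++i) {
        if (supports_scheduling_control(caps, table[i].bit)) {
            names[count++] = table[i].name;
        }
    }
    return count;
}
```

In an application, `caps` would be the value obtained from clGetDeviceInfo with CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM, checked before any of the corresponding controls is used.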
- (Modify Section 5.1, Command Queues)
- (Add the following to Table 9, List of supported queue creation properties)
Queue Properties | Type | Description |
---|---|---|
CL_QUEUE_KERNEL_BATCHING_ARM | | Whether kernels enqueued to this queue should be batched for submission to the device. |
CL_QUEUE_DEFERRED_FLUSH_ARM | | Whether flush operations are performed in the thread triggering the flush or deferred for execution in another thread managed by the OpenCL runtime. |
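The queue properties above are passed to clCreateCommandQueueWithProperties as a zero-terminated list of name/value pairs. The following is a minimal sketch of building such a list in plain C, without calling the OpenCL API. The property names and values come from the New API Enums section; `queue_property_t` (a 64-bit stand-in for cl_queue_properties) and the helper function are hypothetical, and treating each value as a boolean (1 or 0) is an assumption, since the Type column is not available here.

```c
#include <stddef.h>
#include <stdint.h>

/* Property names copied from the "New API Enums" section. */
#ifndef CL_QUEUE_KERNEL_BATCHING_ARM
#define CL_QUEUE_KERNEL_BATCHING_ARM 0x41E7
#endif
#ifndef CL_QUEUE_DEFERRED_FLUSH_ARM
#define CL_QUEUE_DEFERRED_FLUSH_ARM  0x41EC
#endif

/* Stand-in for cl_queue_properties, which is a 64-bit unsigned integer. */
typedef uint64_t queue_property_t;

/* Fills `props` with both scheduling properties followed by the 0
 * terminator. `capacity` is the number of elements in `props` and must be
 * at least 5. Returns the number of elements written, or 0 on error.
 * Assumption: each property takes a boolean value (1 = on, 0 = off). */
static size_t build_scheduling_queue_properties(queue_property_t *props,
                                                size_t capacity,
                                                int kernel_batching,
                                                int deferred_flush)
{
    if (props == NULL || capacity < 5) {
        return 0;
    }
    size_t n = 0;
    props[n++] = CL_QUEUE_KERNEL_BATCHING_ARM;
    props[n++] = kernel_batching ? 1 : 0;
    props[n++] = CL_QUEUE_DEFERRED_FLUSH_ARM;
    props[n++] = deferred_flush ? 1 : 0;
    props[n++] = 0; /* terminator required by the properties list format */
    return n;
}
```

The resulting array would be passed as the properties argument when creating the queue, after checking that the device reports the matching capability bits.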
- (Modify Section 5.8.6, Compiler Options)
- (Add the following to Optimization Options)
The following options are supported when building programs created from source or intermediate language. Specifying these options when building a program created from a binary will result in CL_INVALID_BUILD_OPTIONS being returned by clBuildProgram.
-fregister-allocation=<number-of-registers-per-thread>
This option overrides the compiler’s selection of the number of machine registers allocated to each thread for all kernels in the program.
-fregister-allocation=<kernel-name>:<number-of-registers-per-thread>[,…]
This option overrides the compiler’s selection of the number of machine registers allocated to each thread for specific kernels in the program.
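Both forms of the option are ordinary strings placed in the options argument of clBuildProgram. The following is a minimal sketch of formatting them in plain C. The two helper functions are hypothetical names introduced here, and the sketch formats a single kernel entry only rather than the repeated form.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats the program-wide form of the option into `buf`.
 * Returns the string length, or -1 if `buf` is too small. */
static int format_register_allocation_all(char *buf, size_t size,
                                          unsigned registers_per_thread)
{
    int n = snprintf(buf, size, "-fregister-allocation=%u",
                     registers_per_thread);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}

/* Formats the per-kernel form of the option for one kernel into `buf`.
 * Returns the string length, or -1 on a NULL name or a too-small buffer. */
static int format_register_allocation_kernel(char *buf, size_t size,
                                             const char *kernel_name,
                                             unsigned registers_per_thread)
{
    if (kernel_name == NULL) {
        return -1;
    }
    int n = snprintf(buf, size, "-fregister-allocation=%s:%u",
                     kernel_name, registers_per_thread);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

For example, calling the first helper with 32 yields `-fregister-allocation=32`, and calling the second with a kernel named `my_kernel` and 64 yields `-fregister-allocation=my_kernel:64`. Both numbers and the kernel name are illustrative only; the allocations a device actually accepts are device-specific.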
- (Modify Section 5.9, Kernel Objects)
- (Add the following to Table 31, List of param_values accepted by *clSetKernelExecInfo*)
cl_kernel_exec_info | Type | Description |
---|---|---|
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM | | Set the size of batches of work groups distributed to compute units. The value is a number of work groups. If set to 0, then the runtime will pick a suitable value automatically. Defaults to 0. If the value is greater than the number of work groups necessary to execute a given NDRange, the actual batch size will be capped at the number of work groups in the NDRange. When a value is not directly usable due to device-specific constraints, it will be rounded up to the next usable value. |
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM | | Modify the size of batches of work groups distributed to compute units. On devices that support On devices that do not support |
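The capping rule for CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM can be illustrated with a small calculation. The following is a minimal sketch in plain C with hypothetical helper names. It models only the cap at the number of work groups in the NDRange and the meaning of 0; the device-specific rounding to the next usable value is not modelled because it depends on the device.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of work groups needed to execute an NDRange: the product over all
 * dimensions of ceil(global / local). Returns 0 if any local size is 0. */
static uint64_t count_work_groups(unsigned dims, const size_t *global,
                                  const size_t *local)
{
    uint64_t total = 1;
    for (unsigned d = 0; d < dims; ++d) {
        if (local[d] == 0) {
            return 0;
        }
        total *= ((uint64_t)global[d] + local[d] - 1) / local[d];
    }
    return total;
}

/* Applies the capping rule to a requested batch size. A request of 0 is
 * returned unchanged because it means the runtime picks a value itself. */
static uint64_t cap_batch_size(uint64_t requested, uint64_t work_groups)
{
    if (requested == 0) {
        return 0;
    }
    return requested > work_groups ? work_groups : requested;
}
```

For a 1024 x 64 NDRange with 16 x 16 work groups there are 256 work groups, so a requested batch size of 512 would be capped at 256, while a request of 8 would be left as it is.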
Interactions with Other Extensions
Some features in this extension interact with cl_arm_thread_limit_hint.
If CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM or CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM is set to a non-default value at the time a kernel is enqueued, then any thread limit hint specified in the kernel source using the arm_thread_limit_hint attribute will be ignored.