Dependencies
This extension is written against the OpenCL Specification v3.0.3.
This extension requires OpenCL 2.0.
Overview
This extension gives applications explicit control over some aspects of work scheduling. It can be used for performance tuning or debugging.
New API Enums
Accepted values for the param_name parameter to clGetDeviceInfo:
CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM 0x41E4
CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM 0x41EB
CL_DEVICE_MAX_WARP_COUNT_ARM 0x41EA
Bits that may be set in the value returned for CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM:
CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM (1 << 0)
CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM (1 << 1)
CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM (1 << 2)
CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM (1 << 3)
CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM (1 << 4)
CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM (1 << 5)
CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM (1 << 6)
Accepted values for the param_name parameter to clSetKernelExecInfo:
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM 0x41E5
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM 0x41E6
CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM 0x41E8
CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM 0x41F1
Accepted values for the properties parameter to clCreateCommandQueueWithProperties:
CL_QUEUE_KERNEL_BATCHING_ARM 0x41E7
CL_QUEUE_DEFERRED_FLUSH_ARM 0x41EC
Accepted value for the param_name parameter to clGetKernelInfo:
CL_KERNEL_MAX_WARP_COUNT_ARM 0x41E9
New build options
This extension adds a build option to control the number of registers allocated to each thread:
-fregister-allocation=
This build option is missing before version 0.3.0 of this extension.
Modifications to the OpenCL API Specification
- (Modify Section 4.2, Querying Devices)
- (Add the following to Table 5, Device Queries)
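cl_device_info | Return Type | Description |
---|---|---|
CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM |  | Returns a bitfield of the scheduling controls this device supports: CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM, CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM, CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM, CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM, CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM, CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM, CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM. |
CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM |  | Returns an array of valid register allocations for this device. Each of the returned values can be passed to the -fregister-allocation build option. |
CL_DEVICE_MAX_WARP_COUNT_ARM |  | Returns the maximum number of warps per compute unit a kernel may use. The value returned is an upper bound for any possible kernel. When … |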
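As a sketch of how an application might interpret these queries, the C code below decodes the capability bitfield and checks a candidate register allocation against the supported list. It is deliberately self-contained: the bit values are copied from the New API Enums list, `sched_caps_t` stands in for `cl_bitfield`, the helper names are hypothetical, and the element type of the register-allocation array is assumed to be `cl_uint` (here `uint32_t`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Capability bits as listed under New API Enums. A real host program
 * gets these (and the cl_bitfield type) from CL/cl_ext.h instead. */
#define CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM               (1 << 0)
#define CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM          (1 << 1)
#define CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM (1 << 2)
#define CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM                (1 << 3)
#define CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM           (1 << 4)
#define CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM               (1 << 5)
#define CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM (1 << 6)

/* Stand-in for cl_bitfield (a 64-bit unsigned integer). In a real program:
 *   clGetDeviceInfo(device, CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM,
 *                   sizeof(caps), &caps, NULL);                           */
typedef uint64_t sched_caps_t;

/* Returns nonzero if the control identified by bit is supported. */
static int has_control(sched_caps_t caps, sched_caps_t bit)
{
    return (caps & bit) != 0;
}

/* Prints the name of each supported control and returns how many there are. */
static int print_controls(sched_caps_t caps)
{
    static const struct { sched_caps_t bit; const char *name; } controls[] = {
        { CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM, "kernel batching" },
        { CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM, "work group batch size" },
        { CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM, "work group batch size modifier" },
        { CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM, "deferred flush" },
        { CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM, "register allocation" },
        { CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM, "warp throttling" },
        { CL_DEVICE_SCHEDULING_COMPUTE_UNIT_BATCH_QUEUE_SIZE_ARM, "compute unit batch queue size" },
    };
    int count = 0;
    for (size_t i = 0; i < sizeof controls / sizeof controls[0]; ++i) {
        if (has_control(caps, controls[i].bit)) {
            printf("%s\n", controls[i].name);
            ++count;
        }
    }
    return count;
}

/* Returns nonzero if regs appears in the array reported by
 * CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM (element type assumed cl_uint).
 * The array would be fetched with the usual two-call pattern: first query the
 * size with param_value == NULL, then allocate a buffer and query again. */
static int is_supported_allocation(const uint32_t *allocs, size_t count, uint32_t regs)
{
    assert(allocs != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (allocs[i] == regs)
            return 1;
    }
    return 0;
}
```

Checking membership before building with a register allocation lets the application fall back to the compiler's default instead of passing a value the device does not list.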
- (Modify Section 5.1, Command Queues)
- (Add the following to Table 9, List of supported queue creation properties)
Queue Properties | Type | Description |
---|---|---|
CL_QUEUE_KERNEL_BATCHING_ARM |  | Whether kernels enqueued to this queue should be batched for submission to the device. |
CL_QUEUE_DEFERRED_FLUSH_ARM |  | Whether flush operations are performed in the thread triggering the flush or deferred for execution in another thread managed by the OpenCL runtime. |
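A minimal sketch of building the zero-terminated key/value list that clCreateCommandQueueWithProperties expects, adding each of these properties only when the matching capability bit is reported. The helper name is hypothetical, `queue_prop_t` stands in for `cl_queue_properties` (a 64-bit `cl_bitfield`), and the property values are assumed to be `cl_bool` (1 for `CL_TRUE`, 0 for `CL_FALSE`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from New API Enums; normally provided by CL/cl_ext.h. */
#define CL_QUEUE_KERNEL_BATCHING_ARM             0x41E7
#define CL_QUEUE_DEFERRED_FLUSH_ARM              0x41EC
#define CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM (1 << 0)
#define CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM  (1 << 3)

typedef uint64_t queue_prop_t; /* stand-in for cl_queue_properties */

/* Fills props with key/value pairs followed by the terminating 0.
 * props must have room for at least 5 entries. Returns the number of
 * entries written, including the terminator. */
static size_t build_queue_props(queue_prop_t *props, uint64_t caps,
                                int batching, int deferred_flush)
{
    size_t n = 0;
    assert(props != NULL);
    if (caps & CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM) {
        props[n++] = CL_QUEUE_KERNEL_BATCHING_ARM;
        props[n++] = batching ? 1 : 0;       /* assumed cl_bool */
    }
    if (caps & CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM) {
        props[n++] = CL_QUEUE_DEFERRED_FLUSH_ARM;
        props[n++] = deferred_flush ? 1 : 0; /* assumed cl_bool */
    }
    props[n++] = 0; /* end of the property list */
    return n;
}
```

The resulting array would be passed as the properties argument of `clCreateCommandQueueWithProperties(context, device, props, &err)`. Omitting properties the device does not advertise is a design choice of this sketch, not a rule stated by the extension.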
- (Modify Section 5.8.6, Compiler Options)
- (Add the following to Optimization Options)
The following options are supported when building programs created from
source or intermediate language. Specifying these options when building
a program created from a binary will result in CL_INVALID_
being returned by clBuildProgram.
-fregister-allocation=<number-of-registers-per-thread>
This option overrides the compiler’s selection of the number of machine registers allocated to each thread for all kernels in the program.
-fregister-allocation=<kernel-name>:<number-of-registers-per-thread>[,…]
This option overrides the compiler’s selection of the number of machine registers allocated to each thread for specific kernels in the program.
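The two option forms can be assembled with ordinary string formatting. The sketch below produces both the whole-program form and the per-kernel form, where entries are `<kernel-name>:<number-of-registers-per-thread>` pairs separated by commas. The `reg_override` struct and function names are hypothetical; the register counts would presumably be chosen from the values reported by CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one per-kernel override. */
struct reg_override {
    const char *kernel;
    unsigned regs;
};

/* Whole-program form: "-fregister-allocation=<regs>".
 * Returns 0 on success, -1 if buf is too small. */
static int format_global_option(char *buf, size_t len, unsigned regs)
{
    int w = snprintf(buf, len, "-fregister-allocation=%u", regs);
    return (w < 0 || (size_t)w >= len) ? -1 : 0;
}

/* Per-kernel form: "-fregister-allocation=k1:r1,k2:r2,...".
 * Returns 0 on success, -1 if buf is too small. */
static int format_kernel_option(char *buf, size_t len,
                                const struct reg_override *ov, size_t count)
{
    assert(buf != NULL && len > 0 && ov != NULL && count > 0);
    int w = snprintf(buf, len, "-fregister-allocation=");
    if (w < 0 || (size_t)w >= len)
        return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < count; ++i) {
        /* A comma separates entries after the first one. */
        w = snprintf(buf + used, len - used, "%s%s:%u",
                     i ? "," : "", ov[i].kernel, ov[i].regs);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

Either string would be passed as the options argument of `clBuildProgram`. Because these options are rejected for programs created from a binary, the program has to be created from source or intermediate language.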
- (Modify Section 5.9, Kernel Objects)
- (Add the following to Table 31, List of param_values accepted by *clSetKernelExecInfo*)
cl_kernel_exec_info | Type | Description |
---|---|---|
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM |  | Set the size of batches of work groups distributed to compute units. The value is a number of work groups. If it is set to 0, the runtime picks a suitable value automatically. Defaults to 0. If the value is greater than the number of work groups needed to execute a given NDRange, the actual batch size is capped at the number of work groups in the NDRange. When a value is not directly usable due to device-specific constraints, it is rounded up to the next usable value. |
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM |  | Modify the size of batches of work groups distributed to compute units. On devices that support … On devices that do not support … |
CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM |  | Limit the number of warps allowed to run in each compute unit for this kernel. |
CL_KERNEL_EXEC_INFO_COMPUTE_UNIT_MAX_QUEUED_BATCHES_ARM |  | Limit the number of work group batches each compute unit can have in its queue. |
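To make the batch-size rules concrete, here is a small C model of how a requested CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM value could map to an effective batch size. It is only a model, not the runtime's behavior: the device-specific "usable value" constraint is unspecified, so it is represented by a hypothetical `granule` (usable sizes assumed to be multiples of it), and rounding before capping is an assumed ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of work groups in an NDRange: the product over dimensions of
 * ceil(global[d] / local[d]). */
static uint64_t ndrange_work_groups(unsigned dims, const size_t *global,
                                    const size_t *local)
{
    uint64_t total = 1;
    assert(dims >= 1 && dims <= 3);
    for (unsigned d = 0; d < dims; ++d) {
        assert(local[d] > 0);
        total *= (global[d] + local[d] - 1) / local[d];
    }
    return total;
}

/* Models the effective batch size. Returns 0 when the runtime is left to
 * choose (requested == 0). Otherwise rounds up to the next multiple of the
 * hypothetical granule, then caps at the NDRange's work group count. */
static uint64_t effective_batch_size(uint64_t requested, uint64_t total_groups,
                                     uint64_t granule)
{
    assert(granule > 0);
    if (requested == 0)
        return 0; /* the runtime picks a suitable value automatically */
    uint64_t rounded = (requested + granule - 1) / granule * granule;
    return rounded > total_groups ? total_groups : rounded;
}
```

On a real device, the requested value would be set with `clSetKernelExecInfo(kernel, CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM, sizeof(cl_uint), &batch)` before enqueuing the kernel; the `cl_uint` parameter type is assumed.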
- (Add the following to Table 32, List of supported param_names by *clGetKernelInfo*)
Kernel Info | Return type | Description |
---|---|---|
CL_KERNEL_MAX_WARP_COUNT_ARM |  | Returns the maximum number of warps this kernel can use per compute unit. |
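For performance tuning, one possible approach is to sweep CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM downward from the value reported by CL_KERNEL_MAX_WARP_COUNT_ARM and time each run. The sketch below only generates the candidate limits by halving at each step; the helper name and the halving strategy are hypothetical, both values are assumed to be `cl_uint`, and the OpenCL calls that would surround it are shown in comments only. It presumes the device reports CL_DEVICE_SCHEDULING_WARP_THROTTLING_ARM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes candidate warp-count limits max, max/2, max/4, ..., 1 into out
 * (at most cap entries) and returns how many were written.
 *
 * Surrounding host code (not compiled here) might look like:
 *   cl_uint max_warps;
 *   clGetKernelInfo(kernel, CL_KERNEL_MAX_WARP_COUNT_ARM,
 *                   sizeof(max_warps), &max_warps, NULL);
 *   for each candidate w:
 *     clSetKernelExecInfo(kernel, CL_KERNEL_EXEC_INFO_WARP_COUNT_LIMIT_ARM,
 *                         sizeof(w), &w);
 *     enqueue the kernel and time it                                      */
static size_t warp_limit_sweep(uint32_t kernel_max_warps, uint32_t *out,
                               size_t cap)
{
    size_t n = 0;
    assert(out != NULL || cap == 0);
    for (uint32_t w = kernel_max_warps; w >= 1 && n < cap; w /= 2)
        out[n++] = w;
    return n;
}
```

Halving keeps the number of timed runs logarithmic in the maximum warp count; a finer linear sweep could be substituted once a promising range is found.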
Interactions with Other Extensions
Some features in this extension interact with cl_arm_thread_limit_hint.
If CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM or CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM is set to a non-default value at the time a kernel is enqueued, then any thread limit hint specified in the kernel source using the arm_thread_limit_hint attribute will be ignored.
Revision History
Version | Date | Author | Changes |
---|---|---|---|
0.5.0 | 2022-06-29 | Kévin Petit | Add support for compute unit batch size queue control |
0.4.0 | 2022-02-28 | Kévin Petit | Add support for warp throttling |
0.3.0 | 2021-01-19 | Kévin Petit | Add support for register allocation control |
0.2.0 | 2020-09-14 | Kévin Petit | Add support for deferred queue flush control |
0.1.0 | 2020-08-28 | Kévin Petit | Initial version |