Name Strings

cl_arm_scheduling_controls

Contact

Kevin Petit (kevin.petit 'at' arm.com)

Contributors

Kevin Petit, Arm Ltd.

Notice

Copyright (c) 2020 Arm Ltd.

Status

Shipping

Version

Built On: 2021-02-11
Version: 0.3.0

Dependencies

This extension is written against the OpenCL Specification v3.0.3.

This extension requires OpenCL 2.0.

Overview

This extension gives applications explicit control over some aspects of work scheduling. It can be used for performance tuning or debugging.

New API Enums

Accepted values for the param_name parameter to clGetDeviceInfo:

CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM           0x41E4
CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM             0x41EB

Bits of the bitfield returned for CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM:

CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM                (1 << 0)
CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM           (1 << 1)
CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM  (1 << 2)
CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM                 (1 << 3)
CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM            (1 << 4)

Accepted values for the param_name parameter to clSetKernelExecInfo:

CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM           0x41E5
CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM  0x41E6

Accepted values for the properties parameter to clCreateCommandQueueWithProperties:

CL_QUEUE_KERNEL_BATCHING_ARM 0x41E7
CL_QUEUE_DEFERRED_FLUSH_ARM  0x41EC

New build options

This extension adds a build option to control the number of registers allocated to each thread:

-fregister-allocation=
Missing before version 0.3.0

Modifications to the OpenCL API Specification

(Modify Section 4.2, Querying Devices)
(Add the following to Table 5, Device Queries)
cl_device_info Return Type Description

CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM

cl_device_scheduling_controls_capabilities_arm

Returns a bitfield of the scheduling controls this device supports:
- CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM is set when the device supports CL_QUEUE_KERNEL_BATCHING_ARM.
- CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM is set when the device supports CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM.
- CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM is set when the device supports CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM.
- CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM is set when the device supports CL_QUEUE_DEFERRED_FLUSH_ARM.
- CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM is set when the device compiler supports the -fregister-allocation option.

CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM

cl_int[]

Returns an array of valid register allocations for this device. Each of the returned values can be passed to the -fregister-allocation build option.
Missing before version 0.3.
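
As a rough illustration of how an application might interpret these two queries, the sketch below tests bits of the capabilities bitfield and checks a requested register count against the returned array. It is plain C and does not call the OpenCL runtime: both inputs are assumed to have been fetched with clGetDeviceInfo beforehand. The helper names, and the use of uint64_t and int32_t as stand-ins for the OpenCL bitfield and cl_int types, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values copied from the New API Enums section; guarded in case the
 * platform's OpenCL extension header already defines them. */
#ifndef CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM
#define CL_DEVICE_SCHEDULING_KERNEL_BATCHING_ARM               (1 << 0)
#define CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_ARM          (1 << 1)
#define CL_DEVICE_SCHEDULING_WORKGROUP_BATCH_SIZE_MODIFIER_ARM (1 << 2)
#define CL_DEVICE_SCHEDULING_DEFERRED_FLUSH_ARM                (1 << 3)
#define CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM           (1 << 4)
#endif

/* Hypothetical helper: non-zero when every bit of `control` is set in the
 * bitfield returned for CL_DEVICE_SCHEDULING_CONTROLS_CAPABILITIES_ARM. */
int has_scheduling_control(uint64_t caps, uint64_t control)
{
    return (caps & control) == control;
}

/* Hypothetical helper: non-zero when `wanted` appears in the array returned
 * for CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM. */
int is_supported_register_allocation(const int32_t *allocs, size_t count,
                                     int32_t wanted)
{
    assert(allocs != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (allocs[i] == wanted) {
            return 1;
        }
    }
    return 0;
}
```

An application would typically check CL_DEVICE_SCHEDULING_REGISTER_ALLOCATION_ARM first and only then validate the register count it intends to pass to the build option.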

(Modify Section 5.1, Command Queues)
(Add the following to Table 9, List of supported queue creation properties)
Queue Properties Type Description

CL_QUEUE_KERNEL_BATCHING_ARM

cl_bool

Whether kernels enqueued to this queue should be batched for submission to the device. CL_TRUE means kernels will be batched, CL_FALSE that they will not. Defaults to CL_TRUE.

CL_QUEUE_DEFERRED_FLUSH_ARM

cl_bool

Whether flush operations are performed in the thread triggering the flush or deferred for execution in another thread managed by the OpenCL runtime. CL_TRUE means flush operations are deferred. Defaults to CL_TRUE.
Missing before version 0.2.
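
The two queue properties are passed as key/value pairs in the zero-terminated properties list given to clCreateCommandQueueWithProperties. The sketch below only builds that list; it does not create a queue. The function name is hypothetical, and uint64_t is used as an assumed stand-in for cl_queue_properties so that the example compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the New API Enums section. */
#ifndef CL_QUEUE_KERNEL_BATCHING_ARM
#define CL_QUEUE_KERNEL_BATCHING_ARM 0x41E7
#endif
#ifndef CL_QUEUE_DEFERRED_FLUSH_ARM
#define CL_QUEUE_DEFERRED_FLUSH_ARM  0x41EC
#endif

/* Two key/value pairs plus the terminating zero. */
#define SCHEDULING_QUEUE_PROPERTY_ENTRIES 5

/* Hypothetical helper: fills `props` with both scheduling properties and the
 * list terminator, and returns the number of entries written. */
size_t fill_scheduling_queue_properties(uint64_t *props, size_t capacity,
                                        int batch_kernels, int defer_flush)
{
    assert(props != NULL);
    assert(capacity >= SCHEDULING_QUEUE_PROPERTY_ENTRIES);

    size_t n = 0;
    props[n++] = CL_QUEUE_KERNEL_BATCHING_ARM;
    props[n++] = batch_kernels ? 1u : 0u;   /* CL_TRUE : CL_FALSE */
    props[n++] = CL_QUEUE_DEFERRED_FLUSH_ARM;
    props[n++] = defer_flush ? 1u : 0u;     /* CL_TRUE : CL_FALSE */
    props[n++] = 0;                         /* end of the properties list */
    return n;
}
```

Because both properties default to CL_TRUE, an application only needs to list the ones it wants to disable. Checking the matching capability bit before adding a property is a sensible precaution.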

(Modify Section 5.8.6, Compiler Options)
(Add the following to Optimization Options)

The following options are supported when building programs created from source or intermediate language. Specifying these options when building a program created from a binary results in clBuildProgram returning CL_INVALID_BUILD_OPTIONS.

-fregister-allocation=<number-of-registers-per-thread>

This option overrides the compiler’s selection of the number of machine registers allocated to each thread for all kernels in the program.

-fregister-allocation=<kernel-name>:<number-of-registers-per-thread>[,…]

This option overrides the compiler’s selection of the number of machine registers allocated to each thread for specific kernels in the program.
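
The two forms of the option can be assembled with ordinary string formatting before being handed to clBuildProgram. The following is a minimal sketch: the function and struct names are hypothetical, and the per-kernel form assumes that the repeated unit after each comma is another <kernel-name>:<number-of-registers-per-thread> pair.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "-fregister-allocation=<registers>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int format_register_allocation(char *buf, size_t size, int registers)
{
    assert(buf != NULL && registers > 0);
    int n = snprintf(buf, size, "-fregister-allocation=%d", registers);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* One per-kernel override (hypothetical type for this example). */
struct kernel_register_allocation {
    const char *kernel_name;
    int registers;
};

/* Writes "-fregister-allocation=<name>:<n>,<name>:<n>,..." into buf.
 * Returns 0 on success, -1 if buf is too small. */
int format_kernel_register_allocations(
    char *buf, size_t size,
    const struct kernel_register_allocation *entries, size_t count)
{
    assert(buf != NULL && entries != NULL && count > 0);

    int n = snprintf(buf, size, "-fregister-allocation=");
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    size_t used = (size_t)n;

    for (size_t i = 0; i < count; ++i) {
        assert(entries[i].kernel_name != NULL);
        assert(strlen(entries[i].kernel_name) > 0);
        /* Entries after the first are preceded by a comma. */
        n = snprintf(buf + used, size - used, "%s%s:%d",
                     i ? "," : "", entries[i].kernel_name,
                     entries[i].registers);
        if (n < 0 || (size_t)n >= size - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return 0;
}
```

The register counts passed in should be taken from the values returned for CL_DEVICE_SUPPORTED_REGISTER_ALLOCATIONS_ARM.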

(Modify Section 5.9, Kernel Objects)
(Add the following to Table 31, List of param_values accepted by clSetKernelExecInfo)
cl_kernel_exec_info Type Description

CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM

cl_uint

Set the size of batches of work groups distributed to compute units. The value is a number of work groups. If set to 0, then the runtime will pick a suitable value automatically. Defaults to 0. If the value is greater than the number of work groups necessary to execute a given NDRange, the actual batch size will be capped at the number of work groups in the NDRange. When a value is not directly usable due to device-specific constraints, it will be rounded up to the next usable value.

CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM

cl_int

Modify the size of batches of work groups distributed to compute units.

On devices that support CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM, the value is a number of work groups added to the batch size calculated by the runtime (when CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM is set to 0) or set by the application (when CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM is set to a value greater than 0).

On devices that do not support CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM, the value is a number in the range [-31,+31]. When set to 0, the runtime-selected batch size is used unmodified. When set to a non-zero value, each unit step away from zero halves (negative values) or doubles (positive values) the batch size, so a value of n scales the runtime-selected batch size by 2^n. If the requested modification is not possible due to hardware constraints, the greatest possible modification will be used.
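
The arithmetic described for these two parameters can be modelled in a few lines of C. This is only a sketch of the rules as written above, not a description of how any driver computes batch sizes. The function names are hypothetical, the hw_min and hw_max limits are invented placeholders for the unspecified hardware constraints, and the rounding of unusable values to the next usable one is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* A requested size of 0 means "let the runtime pick". The result is capped
 * at the number of work groups in the NDRange. */
uint32_t resolve_batch_size(uint32_t requested, uint32_t runtime_pick,
                            uint32_t ndrange_groups)
{
    assert(runtime_pick > 0 && ndrange_groups > 0);
    uint32_t size = requested ? requested : runtime_pick;
    return size > ndrange_groups ? ndrange_groups : size;
}

/* Devices that support the batch size control: the modifier is a number of
 * work groups added to the base batch size. */
uint32_t batch_size_additive(uint32_t base, int32_t modifier)
{
    int64_t v = (int64_t)base + modifier;
    if (v < 1) {
        v = 1;              /* assumption: a batch holds at least one group */
    }
    if (v > UINT32_MAX) {
        v = UINT32_MAX;
    }
    return (uint32_t)v;
}

/* Devices without the batch size control: each unit of the modifier doubles
 * (positive) or halves (negative) the base size. Scaling stops at the
 * assumed hardware limits, giving the greatest possible modification. */
uint32_t batch_size_scaled(uint32_t base, int32_t modifier,
                           uint32_t hw_min, uint32_t hw_max)
{
    assert(modifier >= -31 && modifier <= 31);
    assert(hw_min >= 1 && hw_min <= hw_max);
    assert(base >= hw_min && base <= hw_max);

    uint32_t v = base;
    for (; modifier > 0 && v <= hw_max / 2; --modifier) {
        v *= 2;
    }
    for (; modifier < 0 && v / 2 >= hw_min; ++modifier) {
        v /= 2;
    }
    return v;
}
```

For example, under these assumptions a base batch size of 8 with a modifier of 2 yields 32 in the scaling mode and 10 in the additive mode.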

Interactions with Other Extensions

Some features in this extension interact with cl_arm_thread_limit_hint.

If CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_ARM or CL_KERNEL_EXEC_INFO_WORKGROUP_BATCH_SIZE_MODIFIER_ARM is set to a non-default value at the time a kernel is enqueued, then any thread limit hint specified in the kernel source using the arm_thread_limit_hint attribute will be ignored.

Issues

None.

Revision History

Version  Date        Author       Changes

0.3.0    2021-01-19  Kévin Petit  Add support for register allocation control
0.2.0    2020-09-14  Kévin Petit  Add support for deferred queue flush control
0.1.0    2020-08-28  Kévin Petit  Initial version