44. Execution Graphs
Execution graphs provide a way for applications to dispatch multiple operations dynamically from a single initial command on the host. To achieve this, a new execution graph pipeline type links together multiple shaders or pipelines, each of which describes one or more operations that can be dispatched within the execution graph. Each linked pipeline or shader describes an execution node within the graph, which can be dispatched dynamically from another shader within the same graph. This allows applications to describe much richer execution topologies, at a finer granularity, than would typically be possible with API commands alone.
44.1. Pipeline Creation
To create execution graph pipelines, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkCreateExecutionGraphPipelinesAMDX(
    VkDevice device,
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkExecutionGraphPipelineCreateInfoAMDX* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines);
- device is the logical device that creates the execution graph pipelines.
- pipelineCache is either VK_NULL_HANDLE, indicating that pipeline caching is disabled; or the handle of a valid pipeline cache object, in which case use of that cache is enabled for the duration of the command.
- createInfoCount is the length of the pCreateInfos and pPipelines arrays.
- pCreateInfos is a pointer to an array of VkExecutionGraphPipelineCreateInfoAMDX structures.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
- pPipelines is a pointer to an array of VkPipeline handles in which the resulting execution graph pipeline objects are returned.
The implementation will create a pipeline in each element of pPipelines from the corresponding element of pCreateInfos. If creation of any pipeline fails, that pipeline will be set to VK_NULL_HANDLE.
If creation fails for a pipeline create info with a VkExecutionGraphPipelineCreateInfoAMDX::flags value that included VK_PIPELINE_CREATE_EARLY_RETURN_ON_FAILURE_BIT, all pipelines at a greater index automatically fail.
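The failure rules above can be modelled on the host as a small sketch. This is not how any implementation works internally; it only predicts which output elements end up as null handles. The flag constant, the function name, and the created_ok input (whether each entry would succeed on its own) are hypothetical stand-ins so that the example compiles without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VK_PIPELINE_CREATE_EARLY_RETURN_ON_FAILURE_BIT;
 * the real value comes from the Vulkan headers. */
#define SKETCH_EARLY_RETURN_ON_FAILURE 0x1u

/* Fills is_null[i] with true where the corresponding pipeline would be
 * VK_NULL_HANDLE, and returns the number of failed entries.
 * created_ok[i] is an assumed input: would entry i succeed on its own? */
static uint32_t sketch_mark_failed_pipelines(const bool *created_ok,
                                             const uint32_t *flags,
                                             uint32_t count,
                                             bool *is_null)
{
    assert(count == 0 || (created_ok && flags && is_null));

    bool stop = false;      /* set once an early-return entry has failed */
    uint32_t failed = 0;

    for (uint32_t i = 0; i < count; ++i) {
        bool ok = !stop && created_ok[i];
        is_null[i] = !ok;
        if (!ok) {
            ++failed;
            /* A failing entry that carries the flag fails all later entries. */
            if (flags[i] & SKETCH_EARLY_RETURN_ON_FAILURE)
                stop = true;
        }
    }
    return failed;
}
```

In practice an application would simply check each element of pPipelines against VK_NULL_HANDLE after the call; the sketch only makes the ordering dependency explicit.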
The VkExecutionGraphPipelineCreateInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkExecutionGraphPipelineCreateInfoAMDX {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCreateFlags flags;
    uint32_t stageCount;
    const VkPipelineShaderStageCreateInfo* pStages;
    const VkPipelineLibraryCreateInfoKHR* pLibraryInfo;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
} VkExecutionGraphPipelineCreateInfoAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- flags is a bitmask of VkPipelineCreateFlagBits specifying how the pipeline will be generated.
- stageCount is the number of entries in the pStages array.
- pStages is a pointer to an array of stageCount VkPipelineShaderStageCreateInfo structures describing the set of shader stages to be included in the execution graph pipeline.
- pLibraryInfo is a pointer to a VkPipelineLibraryCreateInfoKHR structure defining pipeline libraries to include.
- layout is the description of binding locations used by both the pipeline and descriptor sets used with the pipeline.
- basePipelineHandle is a pipeline to derive from.
- basePipelineIndex is an index into the pCreateInfos parameter to use as a pipeline to derive from.
The parameters basePipelineHandle and basePipelineIndex are described in more detail in Pipeline Derivatives.
Each shader stage provided when creating an execution graph pipeline (including those in libraries) is associated with a name and an index, determined by the inclusion or omission of a VkPipelineShaderStageNodeCreateInfoAMDX structure in its pNext chain.
In addition to the shader name and index, an internal "node index" is also generated for each node, which can be queried with vkGetExecutionGraphPipelineNodeIndexAMDX, and is used exclusively for initial dispatch of an execution graph.
VK_SHADER_INDEX_UNUSED_AMDX is a special shader index used to indicate that the created node does not override the index. In this case, the shader index is determined through other means. It is defined as:
#define VK_SHADER_INDEX_UNUSED_AMDX (~0U)
The VkPipelineShaderStageNodeCreateInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkPipelineShaderStageNodeCreateInfoAMDX {
    VkStructureType sType;
    const void* pNext;
    const char* pName;
    uint32_t index;
} VkPipelineShaderStageNodeCreateInfoAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- pName is the shader name to use when creating a node in an execution graph. If pName is NULL, the name of the entry point specified in SPIR-V is used as the shader name.
- index is the shader index to use when creating a node in an execution graph. If index is VK_SHADER_INDEX_UNUSED_AMDX then the original index is used, either as specified by the ShaderIndexAMDX execution mode, or 0 if that too is not specified.
When included in the pNext chain of a VkPipelineShaderStageCreateInfo structure, this structure specifies the shader name and shader index of a node when creating an execution graph pipeline. If this structure is omitted, the shader name is set to the name of the entry point in SPIR-V and the shader index is set to 0.
When dispatching a node from another shader, the name is fixed at pipeline creation, but the index can be set dynamically. By associating multiple shaders with the same name but different indexes, applications can dynamically select different nodes to execute. Applications must ensure each node has a unique combination of name and index.
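The naming rules above can be summarized as a small host-side sketch. The types and function names are hypothetical stand-ins (prefixed Sketch) that mirror only the pName and index members described in this section; they are not part of the API, and the has_shader_index_mode and shader_index_mode_value parameters are assumed inputs describing whether the SPIR-V module declares the ShaderIndexAMDX execution mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SHADER_INDEX_UNUSED (~0U) /* mirrors VK_SHADER_INDEX_UNUSED_AMDX */

/* Stand-in for the pName/index members of VkPipelineShaderStageNodeCreateInfoAMDX. */
typedef struct SketchNodeCreateInfo {
    const char *pName;
    uint32_t index;
} SketchNodeCreateInfo;

typedef struct SketchNodeIdentity {
    const char *name;
    uint32_t index;
} SketchNodeIdentity;

/* Resolves the shader name and index of a node as described above.
 * info is NULL when the structure is omitted from the pNext chain. */
static SketchNodeIdentity sketch_resolve_node(const SketchNodeCreateInfo *info,
                                              const char *entry_point,
                                              int has_shader_index_mode,
                                              uint32_t shader_index_mode_value)
{
    assert(entry_point != NULL);

    /* Structure omitted: entry point name and index 0, as stated in the text. */
    SketchNodeIdentity id = { entry_point, 0 };
    if (info == NULL)
        return id;

    if (info->pName != NULL)
        id.name = info->pName;                  /* name override */

    if (info->index != SKETCH_SHADER_INDEX_UNUSED)
        id.index = info->index;                 /* index override */
    else if (has_shader_index_mode)
        id.index = shader_index_mode_value;     /* original ShaderIndexAMDX value */

    return id;
}

/* Two nodes clash when both the name and the index match. */
static int sketch_same_node(SketchNodeIdentity a, SketchNodeIdentity b)
{
    return a.index == b.index && strcmp(a.name, b.name) == 0;
}
```

An application could run a check like sketch_same_node over every pair of stages before pipeline creation to catch duplicate name and index combinations early.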
To query the internal node index for a particular node in an execution graph, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkGetExecutionGraphPipelineNodeIndexAMDX(
    VkDevice device,
    VkPipeline executionGraph,
    const VkPipelineShaderStageNodeCreateInfoAMDX* pNodeInfo,
    uint32_t* pNodeIndex);
- device is the logical device that executionGraph was created on.
- executionGraph is the execution graph pipeline to query the internal node index for.
- pNodeInfo is a pointer to a VkPipelineShaderStageNodeCreateInfoAMDX structure identifying the name and index of the node to query.
- pNodeIndex is a pointer to a value in which the internal node index of the identified node is returned.
Once this function returns, the contents of pNodeIndex contain the internal node index of the identified node.
44.2. Initializing Scratch Memory
Implementations may need scratch memory to manage dispatch queues or similar structures when executing an execution graph pipeline; this memory is explicitly managed by the application.
To query the scratch space required to dispatch an execution graph, call:
// Provided by VK_AMDX_shader_enqueue
VkResult vkGetExecutionGraphPipelineScratchSizeAMDX(
    VkDevice device,
    VkPipeline executionGraph,
    VkExecutionGraphPipelineScratchSizeAMDX* pSizeInfo);
- device is the logical device that executionGraph was created on.
- executionGraph is the execution graph pipeline to query the scratch space for.
- pSizeInfo is a pointer to a VkExecutionGraphPipelineScratchSizeAMDX structure that will contain the required scratch size.
After this function returns, information about the scratch space required will be returned in pSizeInfo.
The VkExecutionGraphPipelineScratchSizeAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkExecutionGraphPipelineScratchSizeAMDX {
    VkStructureType sType;
    void* pNext;
    VkDeviceSize size;
} VkExecutionGraphPipelineScratchSizeAMDX;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- size indicates the scratch space required to dispatch the queried execution graph.
To initialize scratch memory for a particular execution graph, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdInitializeGraphScratchMemoryAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is a pointer to the scratch memory to be initialized.
This command must be called before using scratch to dispatch the currently bound execution graph pipeline.
Execution of this command may modify any memory locations in the range [scratch, scratch + size), where size is the value returned in VkExecutionGraphPipelineScratchSizeAMDX::size by vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound execution graph pipeline. Accesses to this memory range are performed in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT and VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
If any portion of scratch is modified by any command other than vkCmdDispatchGraphAMDX, vkCmdDispatchGraphIndirectAMDX, vkCmdDispatchGraphIndirectCountAMDX, or vkCmdInitializeGraphScratchMemoryAMDX with the same execution graph, it must be reinitialized for the execution graph again before dispatching against it.
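Because any outside write into [scratch, scratch + size) forces a reinitialization, an application that tracks its own writes may want a half-open range overlap test. The following is a minimal sketch of one; the function name is hypothetical, addresses are modelled as plain 64-bit integers, and the helper is an application-side bookkeeping aid rather than anything the API provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the half-open ranges [a, a + a_size) and [b, b + b_size)
 * share at least one byte. Assumes neither range wraps around the end of
 * the 64-bit address space. */
static bool sketch_ranges_overlap(uint64_t a, uint64_t a_size,
                                  uint64_t b, uint64_t b_size)
{
    assert(a + a_size >= a && b + b_size >= b); /* no wraparound */

    if (a_size == 0 || b_size == 0)
        return false;                           /* empty ranges touch nothing */

    return a < b + b_size && b < a + a_size;
}
```

For example, a write of 16 bytes starting exactly at scratch + size does not overlap the scratch range, whereas a one-byte write at scratch + size - 1 does and would require the scratch memory to be initialized again.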
44.3. Dispatching a Graph
Initial dispatch of an execution graph is done from the host in the same way as any other command, and can be used in a similar way to compute dispatch commands, with indirect variants available.
To record an execution graph dispatch, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    const VkDispatchGraphCountInfoAMDX* pCountInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is a pointer to the scratch memory to be used.
- pCountInfo is a host pointer to a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in pCountInfo are executed. Nodes executed as part of this command are not implicitly synchronized in any way against each other once they are dispatched.
For this command, all device/host pointers in substructures are treated as host pointers and read only during host execution of this command. Once this command returns, no reference to the original pointers is retained.
Execution of this command may modify any memory locations in the range [scratch, scratch + size), where size is the value returned in VkExecutionGraphPipelineScratchSizeAMDX::size by vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound execution graph pipeline. Accesses to this memory range are performed in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT and VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
To record an execution graph dispatch with node and payload parameters read on device, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphIndirectAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    const VkDispatchGraphCountInfoAMDX* pCountInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is a pointer to the scratch memory to be used.
- pCountInfo is a host pointer to a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in pCountInfo are executed. Nodes executed as part of this command are not implicitly synchronized in any way against each other once they are dispatched.
For this command, all device/host pointers in substructures are treated as device pointers and read during device execution of this command. The allocation and contents of these pointers only need to be valid during device execution. All of these addresses will be read in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT access flag.
Execution of this command may modify any memory locations in the range [scratch, scratch + size), where size is the value returned in VkExecutionGraphPipelineScratchSizeAMDX::size by vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound execution graph pipeline. Accesses to this memory range are performed in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT and VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
To record an execution graph dispatch with all parameters read on device, call:
// Provided by VK_AMDX_shader_enqueue
void vkCmdDispatchGraphIndirectCountAMDX(
    VkCommandBuffer commandBuffer,
    VkDeviceAddress scratch,
    VkDeviceAddress countInfo);
- commandBuffer is the command buffer into which the command will be recorded.
- scratch is a pointer to the scratch memory to be used.
- countInfo is a device address of a VkDispatchGraphCountInfoAMDX structure defining the nodes which will be initially executed.
When this command is executed, the nodes specified in countInfo are executed. Nodes executed as part of this command are not implicitly synchronized in any way against each other once they are dispatched.
For this command, all pointers in substructures are treated as device pointers and read during device execution of this command. The allocation and contents of these pointers only need to be valid during device execution. All of these addresses will be read in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT access flag.
Execution of this command may modify any memory locations in the range [scratch, scratch + size), where size is the value returned in VkExecutionGraphPipelineScratchSizeAMDX::size by vkGetExecutionGraphPipelineScratchSizeAMDX for the currently bound execution graph pipeline. Accesses to this memory range are performed in the VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT pipeline stage with the VK_ACCESS_2_SHADER_STORAGE_READ_BIT and VK_ACCESS_2_SHADER_STORAGE_WRITE_BIT access flags.
The VkDeviceOrHostAddressConstAMDX union is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef union VkDeviceOrHostAddressConstAMDX {
    VkDeviceAddress deviceAddress;
    const void* hostAddress;
} VkDeviceOrHostAddressConstAMDX;
- deviceAddress is a buffer device address as returned by the vkGetBufferDeviceAddressKHR command.
- hostAddress is a const host memory address.
The VkDispatchGraphCountInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkDispatchGraphCountInfoAMDX {
    uint32_t count;
    VkDeviceOrHostAddressConstAMDX infos;
    uint64_t stride;
} VkDispatchGraphCountInfoAMDX;
- count is the number of dispatches to perform.
- infos is the device or host address of a flat array of VkDispatchGraphInfoAMDX structures.
- stride is the byte stride between successive VkDispatchGraphInfoAMDX structures in infos.
Whether infos is consumed as a device or host pointer is defined by the command this structure is used in.
The VkDispatchGraphInfoAMDX structure is defined as:
// Provided by VK_AMDX_shader_enqueue
typedef struct VkDispatchGraphInfoAMDX {
    uint32_t nodeIndex;
    uint32_t payloadCount;
    VkDeviceOrHostAddressConstAMDX payloads;
    uint64_t payloadStride;
} VkDispatchGraphInfoAMDX;
- nodeIndex is the index of a node in an execution graph to be dispatched.
- payloadCount is the number of payloads to dispatch for the specified node.
- payloads is the device or host address of a flat array of payloads, with a size in bytes equal to the product of payloadCount and payloadStride.
- payloadStride is the byte stride between successive payloads in payloads.
Whether payloads is consumed as a device or host pointer is defined by the command this structure is used in.
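To illustrate how count, stride, payloadCount, and payloadStride address the two nested arrays, here is a minimal host-side sketch that walks them using host pointers. The Sketch-prefixed types are local stand-ins that mirror the member layout shown above so that the example compiles without the Vulkan headers, and the accessor functions are hypothetical helpers, not API entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the structures defined above. */
typedef union SketchAddress {
    uint64_t deviceAddress;
    const void *hostAddress;
} SketchAddress;

typedef struct SketchDispatchGraphInfo {
    uint32_t nodeIndex;
    uint32_t payloadCount;
    SketchAddress payloads;
    uint64_t payloadStride;
} SketchDispatchGraphInfo;

typedef struct SketchDispatchGraphCountInfo {
    uint32_t count;
    SketchAddress infos;
    uint64_t stride;
} SketchDispatchGraphCountInfo;

/* Address of the i-th dispatch info: infos + i * stride (host pointer case). */
static const SketchDispatchGraphInfo *
sketch_info_at(const SketchDispatchGraphCountInfo *ci, uint32_t i)
{
    assert(ci != NULL && i < ci->count);
    const unsigned char *base = (const unsigned char *)ci->infos.hostAddress;
    return (const SketchDispatchGraphInfo *)(const void *)(base + (uint64_t)i * ci->stride);
}

/* Address of the j-th payload: payloads + j * payloadStride (host pointer case). */
static const void *sketch_payload_at(const SketchDispatchGraphInfo *info, uint32_t j)
{
    assert(info != NULL && j < info->payloadCount);
    const unsigned char *base = (const unsigned char *)info->payloads.hostAddress;
    return base + (uint64_t)j * info->payloadStride;
}

/* Total number of payloads across every dispatch info. */
static uint64_t sketch_total_payloads(const SketchDispatchGraphCountInfo *ci)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < ci->count; ++i)
        total += sketch_info_at(ci, i)->payloadCount;
    return total;
}
```

The same address arithmetic applies when the command consumes device addresses; only the address space in which the reads occur differs.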
44.4. Shader Enqueue
Compute shaders in an execution graph can use the OpInitializeNodePayloadsAMDX instruction to initialize nodes for dispatch. Any node payload initialized in this way will be enqueued for dispatch once the shader is done writing to the payload. As compilers may be conservative when making this determination, shaders can further call OpFinalizeNodePayloadsAMDX to guarantee that the payload is no longer being written.
The Node Name operand of the PayloadNodeNameAMDX decoration on a payload identifies the shader name of the node to be enqueued, and the Shader Index operand of OpInitializeNodePayloadsAMDX identifies the shader index. A node identified in this way is dispatched as described in the following sections.
44.4.1. Compute Nodes
Compute shaders added as nodes to an execution graph are executed differently based on the presence or absence of the StaticNumWorkgroupsAMDX or CoalescingAMDX execution modes.
Dispatching a compute shader node that does not declare either the StaticNumWorkgroupsAMDX or CoalescingAMDX execution mode will execute a number of workgroups in each dimension specified by the first 12 bytes of the payload, interpreted as a VkDispatchIndirectCommand. The same payload will be broadcast to each workgroup in the same dispatch. Additional values in the payload have no effect on execution.
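As a sketch of the payload layout this implies, the example below places the three 32-bit workgroup counts at the start of a payload and reads them back from its first 12 bytes. SketchDispatchIndirectCommand mirrors the x, y, z layout of VkDispatchIndirectCommand so that the example compiles without the Vulkan headers; SketchPayload, its materialId member, and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in mirroring VkDispatchIndirectCommand: three 32-bit counts. */
typedef struct SketchDispatchIndirectCommand {
    uint32_t x;
    uint32_t y;
    uint32_t z;
} SketchDispatchIndirectCommand;

/* Hypothetical application payload: the dispatch size must come first;
 * anything after it is application data that does not affect execution. */
typedef struct SketchPayload {
    SketchDispatchIndirectCommand grid;
    uint32_t materialId; /* illustrative extra data */
} SketchPayload;

/* Interprets the first 12 bytes of a payload as the workgroup counts. */
static SketchDispatchIndirectCommand sketch_read_grid(const void *payload)
{
    assert(payload != NULL);
    SketchDispatchIndirectCommand grid;
    memcpy(&grid, payload, sizeof grid);
    return grid;
}

/* Total number of workgroups launched for one payload. */
static uint64_t sketch_workgroup_count(SketchDispatchIndirectCommand grid)
{
    return (uint64_t)grid.x * grid.y * grid.z;
}
```

Keeping the dispatch size as the first member of the payload structure is the design choice that matters here; the remaining members can be laid out freely.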
Dispatching a compute shader node with the StaticNumWorkgroupsAMDX execution mode will execute workgroups in each dimension according to the x, y, and z size operands to the StaticNumWorkgroupsAMDX execution mode. The same payload will be broadcast to each workgroup in the same dispatch. Any values in the payload have no effect on execution.
Dispatching a compute shader node with the CoalescingAMDX execution mode will enqueue a single invocation for execution. Implementations may combine multiple such dispatches into the same workgroup, up to the size of the workgroup. The number of invocations coalesced into a given workgroup in this way can be queried via the CoalescedInputCountAMDX built-in. Any values in the payload have no effect on execution.
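Since coalescing is optional and bounded by the workgroup size, the number of workgroups launched for a batch of coalescing dispatches can only be bracketed, not predicted. The sketch below computes those bounds under that reading of the text; the function names are hypothetical and the result says nothing about what a particular implementation will actually do.

```c
#include <assert.h>
#include <stdint.h>

/* Fewest workgroups possible: every workgroup is filled to its size. */
static uint32_t sketch_min_coalesced_workgroups(uint32_t dispatches,
                                                uint32_t workgroup_size)
{
    assert(workgroup_size > 0);
    /* Ceiling division written to avoid overflow in dispatches + size - 1. */
    return dispatches / workgroup_size + (dispatches % workgroup_size != 0);
}

/* Most workgroups possible: no coalescing happens at all. */
static uint32_t sketch_max_coalesced_workgroups(uint32_t dispatches)
{
    return dispatches;
}
```

A shader therefore has to read CoalescedInputCountAMDX to learn how many inputs its workgroup actually received, rather than assuming a full workgroup.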