The OpenVX Specification
r31169
OpenVX is intended to be used either directly by applications or as the acceleration layer for higher-level vision frameworks, engines or platform APIs.
OpenVX is designed as a framework of standardized computer vision functions able to run on a wide variety of platforms and potentially to be accelerated by a vendor's implementation on that platform. OpenVX can improve the performance and efficiency of vision applications by providing an abstraction for commonly-used vision functions and an abstraction for aggregations of functions (a “graph”), thereby providing the implementer the opportunity to minimize the run-time overhead.
The functions in OpenVX 1.0 are intended to cover common functionality required by many vision applications.
This specification makes no statements as to which acceleration methodology or techniques may be used in its implementation. Vendors may choose any number of implementation methods such as parallelism and/or specialized hardware offload techniques.
This specification also makes no statement or requirements on a “level of performance” as this may vary significantly across platforms and use cases.
OpenVX 1.0 focuses on vision functions that can be significantly accelerated by diverse hardware. Future versions of this specification may adopt additional vision functions into the core standard when hardware acceleration for those functions becomes practical.
OpenVX 1.0 has been designed to maximize functional and performance portability wherever possible, while recognizing that the API is intended to be used on a wide diversity of devices with specific constraints and properties. Tradeoffs are made for portability where possible: for example, portable Graphs constructed using this API should work on any OpenVX implementation and return similar results within the precision bounds defined by the OpenVX conformance tests.
To avoid forcing hardware-specific requirements onto any particular implementation, the API is designed to be opaque.
OpenVX is intended to address a very broad range of devices and platforms - from deeply embedded systems to desktop machines, and even distributed computing architectures.
The range of implementations is quite diverse, and as such, the API can address all these spaces only through opaqueness.
All data, except client-facing structures, are opaque and hidden behind a reference that may be as thin or thick as an implementation needs. Each implementation provides the standardized interfaces for accessing data that takes care of specialized hardware, platform, or allocation requirements. Memory that is imported or shared from other APIs is not subsumed by OpenVX and is still maintained and accessible by the originator.
OpenVX does not dictate any requirements on memory allocation methods or the layout of opaque memory objects and it does not dictate byte packing or alignment for structures on architectures.
OpenVX objects are both strongly typed at compile-time for safety critical applications and strongly typed at run-time for dynamic applications. Each object has its typedef'd type and its associated enumerated value in the vx_type_e list. Any object may be down-cast to a vx_reference safely to be used in functions that require this, specifically vxQueryReference, which can be used to get the vx_type_e value using a vx_enum.
This specification defines the following OpenVX framework objects.
A parameter describes the following:
Object type: VX_TYPE_IMAGE, or VX_TYPE_ARRAY, or some other object type from vx_type_e.
Direction: VX_INPUT, VX_OUTPUT, or VX_BIDIRECTIONAL.
State: VX_PARAMETER_STATE_REQUIRED or VX_PARAMETER_STATE_OPTIONAL.
When a vx_parameter is extracted from a Node, an additional attribute can be accessed: the vx_reference assigned to this parameter index from the Node creation function (e.g., vxSobel3x3Node).
Data objects are objects that are processed by graphs in nodes.
vx_int16 values. Also contains a scaling factor for normalization. Used specifically with vxuConvolve and vxConvolveNode.
vx_df_image_e.
vxTableLookupNode and vxuTableLookup.
vx_image objects.
Error objects are specialized objects that may be returned from other object creator functions when a serious platform issue occurs (e.g., out of memory or out of handles). These can be checked at the time of creation of these objects, but checking may also be deferred until usage in other APIs or until verification time, in which case the implementation must return appropriate errors to indicate that an invalid object type was used.
The graph is the central computation concept of OpenVX. The purpose of using graphs to express the Computer Vision problem is to allow for the possibility of any implementation to maximize its optimization potential because all the operations of the graph and its dependencies are known ahead of time, before the graph is processed.
Graphs are composed of one or more nodes that are added to the graph through node creation functions. Graphs in OpenVX must be created ahead of processing time and verified by the implementation, after which they can be processed as many times as needed.
Graph Nodes are linked together via data dependencies with no explicitly-stated ordering. The same reference may be linked to other nodes. Linking has a limitation, however, in that only one node in a graph may output to any specific data object reference. That is, only a single writer of an object may exist in a given graph. This prevents indeterminate ordering from data dependencies. All writers in a graph shall produce output data before any reader of that data accesses it.
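The single-writer rule lends itself to a mechanical check. The sketch below is not OpenVX API: edge_t, dir_t, and has_single_writer are hypothetical names used only to illustrate the rule on a plain list of (node, data object, direction) links. It assumes a bidirectional parameter counts as a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a graph link; these are not OpenVX types. */
typedef enum { DIR_INPUT, DIR_OUTPUT, DIR_BIDIRECTIONAL } dir_t;

typedef struct {
    int   node;  /* index of the node            */
    int   data;  /* index of the data object     */
    dir_t dir;   /* how the node uses the object */
} edge_t;

/* Returns true when no data object has more than one writer.
 * A bidirectional parameter is treated as a write (an assumption). */
bool has_single_writer(const edge_t *edges, size_t n_edges)
{
    assert(edges != NULL || n_edges == 0);

    for (size_t i = 0; i < n_edges; ++i) {
        if (edges[i].dir == DIR_INPUT)
            continue;
        for (size_t j = i + 1; j < n_edges; ++j) {
            if (edges[j].dir != DIR_INPUT && edges[j].data == edges[i].data)
                return false;  /* second writer of the same object */
        }
    }
    return true;
}
```

A real implementation would perform this kind of check during graph verification; the quadratic scan here is chosen for clarity, not speed.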
Graphs in OpenVX depend on data objects to link together nodes. When clients of OpenVX know that they do not need access to these intermediate data objects, they may be created as virtual. Virtual data objects can be used in the same manner as non-virtual data objects to link nodes of a graph together; however, virtual data objects are different in the following respects.
Access/Commit calls on references returned from a virtual create function fail from a Graph external perspective. Calls to Access/Commit from within client-defined functions may succeed as they are Graph internal.
These restrictions give vendors the ability to optimize some aspects of the data object or its usage. Some vendors may not allocate such objects, some may create intermediate sub-objects of the object, and some may allocate the object on remote, inaccessible memories. OpenVX does not prescribe which optimization the vendor does, merely that it may happen.
Parameters to node creation functions are defined as either atomic types, such as vx_int32 and vx_enum, or as objects, such as vx_scalar and vx_image. The atomic variables of the Node creation functions shall be converted by the framework into vx_scalar references for use by the Nodes. A node parameter of type vx_scalar can be changed during graph execution, whereas a node parameter of an atomic type (vx_int32 etc.) requires at least a graph revalidation if changed. All node parameter objects may be modified by retrieving the reference to the vx_parameter via vxGetParameterByIndex, and then passing that to vxQueryParameter to retrieve the reference to the object.
If the type of the parameter is unknown, it may be retrieved with the same function.
Parameters may exist on Graphs, as well. These parameters are defined by the author of the Graph and each Graph parameter is defined as a specific parameter from a Node within the Graph using vxAddParameterToGraph. Graph parameters communicate to the implementation that there are specific Node parameters that may be modified by the client between Graph executions. Additionally, they are parameters that the client may set without the reference to the Node but with the reference to the Graph using vxSetGraphParameterByIndex. This allows for the Graph authors to construct Graph Factories. How these factories work falls outside the scope of this document.
Graphs must execute in both synchronous mode (vxProcessGraph will block until the graph has completed) and asynchronous mode (vxScheduleGraph and vxWaitGraph).
In asynchronous mode, Graphs must be single-issue-per-reference. This means that given a constructed graph reference \(G\), it may be scheduled multiple times but only executes sequentially with respect to itself. Multiple graph references given to the asynchronous graph interface do not have a defined behavior and may execute in parallel or in series based on the behavior of the vendor's implementation.
To use graphs, several rules must be put in place to allow deterministic execution of Graphs. The behavior of a processGraph(\(G\)) call is determined by the structure of the Processing Graph \(G\). The Processing Graph is a bipartite graph consisting of a set of Nodes \(N_1 \ldots N_n\) and a set of data objects \(D_1 \ldots D_i\). Each edge (\(N_x\), \(D_y\)) in the graph represents a data object \(D_y\) that is written by Node \(N_x\) and each edge (\(D_x\), \(N_y\)) represents a data object \(D_x\) that is read by Node \(N_y\). Each edge \(e\) has a name Name(\(e\)), which gives the parameter name of the node that references the corresponding data object. Each Node Parameter also has a type Type(node, name) in {INPUT, OUTPUT, INOUT}. Some data objects are Virtual, and some data objects are Delay. Delay data objects are just collections of data objects with indexing (like an image list) and known linking points in a graph. A node may be classified as a head node, which has no backward dependency. Alternatively, a node may be a dependent node, which has a backward dependency to the head node. In addition, the Processing Graph has several restrictions:
Type(\(N_x\), Name(\(N_x\), \(D_y\))) in {OUTPUT, INOUT}
Type(\(N_y\), Name(\(D_x\), \(N_y\))) in {INPUT} or {INOUT}
Type(\(N_x\), Name(\(N_x\), \(D_y\))) is INOUT implies \(D_y\) is non-Virtual.
The execution of each node in a graph consists of an atomic operation (sometimes referred to as firing) that consumes data representing each input data object, processes it, and produces data representing each output data object. A node may execute when all of its input edges are marked present. Before the graph executes, the following initial marking is used:
Processing a node results in unmarking all the corresponding input edges and marking all its output edges; marking an output edge (\(N_x\), \(D_y\)) where \(D_y\) is not a Delay results in marking all of the input edges (\(D_y\), \(N_z\)). Following these rules, it is possible to statically schedule the nodes in a graph as follows: Construct a precedence graph \(P\), including all the nodes \(N_1 \ldots N_n\), and an edge (\(N_x\), \(N_z\)) for every pair of edges (\(N_x\), \(D_y\)) and (\(D_y\), \(N_z\)) where \(D_y\) is not a Delay. Then unconditionally fire each node according to any topological sort of \(P\).
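The static scheduling rule can be sketched as a small topological sort. The code below is illustrative only and is not part of the OpenVX API: schedule_nodes, MAX_NODES, and the adjacency-matrix representation of the precedence graph are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16  /* arbitrary bound for this sketch */

/* precedes[a][b] is true when node a writes a non-Delay data object that
 * node b reads, i.e. the precedence graph P has an edge (a, b).
 * Writes a valid firing order into order[] and returns the number of nodes
 * scheduled; a result smaller than n means P contains a cycle. */
int schedule_nodes(int n, bool precedes[MAX_NODES][MAX_NODES],
                   int order[MAX_NODES])
{
    assert(n >= 0 && n <= MAX_NODES);

    int  pending[MAX_NODES] = {0};     /* unsatisfied predecessors per node */
    bool fired[MAX_NODES]   = {false};

    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            if (precedes[a][b])
                ++pending[b];

    int  count    = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (int v = 0; v < n; ++v) {
            if (fired[v] || pending[v] != 0)
                continue;
            fired[v] = true;           /* all inputs are marked: fire */
            order[count++] = v;
            for (int w = 0; w < n; ++w)
                if (precedes[v][w])
                    --pending[w];      /* mark the edges this node feeds */
            progress = true;
        }
    }
    return count;
}
```

For a four-node chain in which node 0 feeds node 1 and node 1 feeds nodes 2 and 3, the function schedules all four nodes, with node 0 first and node 1 second. Nodes 2 and 3 are independent of each other, so either relative order is a valid topological sort.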
The following assertions should be verified:
The execution model described here just acts as a formalism. For example, independent processing is allowed across multiple depended and depending nodes and edges, provided that the result is invariant with the execution model described here.
In the following example a client computes the gradient magnitude and gradient phase from a blurred input image. The vxMagnitudeNode and vxPhaseNode are independently computed, in that each does not depend on the output of the other. OpenVX does not mandate that they are run simultaneously or in parallel, but it could be implemented this way by the OpenVX vendor.
The code to construct such a graph can be seen below.
Graphs within OpenVX must go through a rigorous validation process before execution to satisfy the design concept of eliminating run-time overhead (parameter checking) that guarantees safe execution of the graph. OpenVX must check for (but is not limited to) these conditions:
Each parameter given to a node must have the right direction (vx_direction_e).
Each parameter given to a node must have the right object type (vx_type_e).
Scalar values must lie within their valid range (e.g., \( 0.5 \le k \le 1.0 \)). The implementation is not required to do run-time range checking of scalar values. If the value of the scalar changes at run time to go outside the range, the results are undefined. The rationale is that the potential performance hit for run-time range checking is too large to be enforced. It will still be checked at graph verification time as a time-zero sanity check. If the scalar is an output parameter of another node, it must be initialized to a legal value.
In the case of vxScaleImageNode, the relation of the input image dimensions to the output image dimensions determines the scaling factor. These values or attributes of data objects must be checked for compatibility on each platform.
The vx_graph must be a Directed Acyclic Graph (DAG). No cycles or feedback are allowed. The vx_delay object has been designed to explicitly address feedback between Graph executions.
Callbacks are a method to control graph flow and to make decisions based on completed work. The vxAssignNodeCallback call takes as a parameter a callback function. This function will be called after the execution of the particular node, but prior to the completion of the graph. If nodes are arranged into independent sets, the order of the callbacks is unspecified. Nodes that are arranged in a serial fashion due to data dependencies perform callbacks in order. The callback function may use the node reference first to extract parameters from the node, and then extract the data references. Data outputs of Nodes with callbacks shall be available (via Access/Commit methods) when the callback is called.
OpenVX supports the concept of client-defined functions that shall be executed as Nodes from inside the Graph or are Graph internal. The purpose of this paradigm is to:
In this example, to execute client-supplied functions, the graph does not have to be halted and then resumed. These nodes shall be executed in an independent fashion with respect to independent base nodes within OpenVX. This allows implementations to further minimize execution time if hardware to exploit this property exists.
User Kernels must aid in the Graph Verification effort by providing explicit validation functions for each vision function they implement. Each parameter passed to the instanced Node of a User Kernel is validated using the client-supplied validation functions. The client must check these attributes and/or values of each parameter:
Input validators execute before output validators. This allows any or all inputs to be used as dependents of output parameter validation.
The Meta Format Object is an opaque object used to collect requirements about the output parameter, which the OpenVX implementation then checks. The Client must manually set relevant object attributes to be checked against output parameters, such as dimensionality, format, scaling, etc.
There is a special case with vx_image output parameters where the User Kernel output validation function can specify a positional and/or size-related change of the valid region of the output image relative to the input image during verification time. This is intended to give the optimizer more information about memory usage, and could lead to better outcomes or different strategies. Delta rectangles (specified using the vx_delta_rectangle_t parameter) are used to update a valid region for the user kernels with a call to vxSetMetaFormatAttribute from the output validator.
For example, for a 5x5 box filter where 2 border pixels of the output are lost (invalid), and with no center shift, use:
vx_delta_rectangle_t delta = {2, 2, -2, -2};
For the same 5x5 box filter, except with a center-shift into the upper-left corner:
vx_delta_rectangle_t delta = {0, 0, -4, -4};
If this attribute has not been set prior to graph verification, the graph manager must determine the new valid region based on vxCommitImagePatch calls during the execution time.
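To make the two delta rectangle examples concrete, the following sketch applies a delta to an input valid region component by component. The types rect_t and delta_rect_t and the function apply_delta are hypothetical stand-ins, not the OpenVX structures themselves. The field order (start x, start y, end x, end y) is an assumption inferred from the initializers {2, 2, -2, -2} and {0, 0, -4, -4}.

```c
#include <assert.h>

/* Hypothetical stand-ins for a valid region and a delta rectangle.
 * Field order (start x, start y, end x, end y) is assumed. */
typedef struct { int start_x, start_y, end_x, end_y; } rect_t;
typedef struct { int d_start_x, d_start_y, d_end_x, d_end_y; } delta_rect_t;

/* Adds each delta component to the matching edge of the input region. */
rect_t apply_delta(rect_t in, delta_rect_t d)
{
    rect_t out = {
        in.start_x + d.d_start_x,
        in.start_y + d.d_start_y,
        in.end_x   + d.d_end_x,
        in.end_y   + d.d_end_y
    };
    /* A delta must not shrink the region past empty. */
    assert(out.start_x <= out.end_x && out.start_y <= out.end_y);
    return out;
}
```

Under these assumptions, a 640x480 input region {0, 0, 640, 480} becomes {2, 2, 638, 478} for the centered 5x5 box filter and {0, 0, 636, 476} for the upper-left shifted variant. Both outputs are 636x476; only the position differs.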
User Kernels must be exported with a unique name (see Naming Conventions for information on OpenVX conventions) and a unique enumeration. Clients of OpenVX may use either the name or enumeration to retrieve a kernel, so collisions due to non-unique names will cause problems. The kernel enumerations may be extended by following this example:
Each vendor of a vision function or an implementation must apply to Khronos to get a unique identifier (up to a limit of \( 2^{12}-1 \) vendors). Until they obtain a unique ID, vendors must use VX_ID_DEFAULT.
To construct a kernel enumeration, a vendor must have both their ID and a library ID. The library IDs are completely vendor defined (however, when using the VX_ID_DEFAULT ID, many libraries may collide in namespace).
Once both are defined, a kernel enumeration may be constructed using the VX_KERNEL_BASE macro and an offset. (The offset is optional, but very helpful for long enumerations.)
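As a rough illustration of how a vendor ID, a library ID, and an offset might combine into a single value, the sketch below packs them into a 32-bit word. It does not reproduce the OpenVX headers: EX_KERNEL_BASE, EX_ID_DEFAULT, EX_LIBRARY_XYZ, and the helper functions are hypothetical names. The 12-bit vendor field follows from the vendor limit stated above, while the 8-bit library field and 12-bit kernel offset are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a 32-bit kernel enumeration (illustrative only):
 * bits 31..20 vendor ID, bits 19..12 library ID, bits 11..0 kernel offset. */
#define EX_KERNEL_BASE(vendor, lib) \
    ((((uint32_t)(vendor)) << 20) | (((uint32_t)(lib)) << 12))

#define EX_ID_DEFAULT  0xFFFu  /* placeholder vendor ID for this sketch   */
#define EX_LIBRARY_XYZ 0x3u    /* hypothetical vendor-defined library ID */

/* Builds a kernel enumeration from its three parts. */
uint32_t ex_kernel_enum(uint32_t vendor, uint32_t lib, uint32_t offset)
{
    assert(vendor <= 0xFFFu && lib <= 0xFFu && offset <= 0xFFFu);
    return EX_KERNEL_BASE(vendor, lib) + offset;
}

/* Recover the individual fields from an enumeration value. */
uint32_t ex_kernel_vendor(uint32_t e)  { return e >> 20; }
uint32_t ex_kernel_library(uint32_t e) { return (e >> 12) & 0xFFu; }
uint32_t ex_kernel_offset(uint32_t e)  { return e & 0xFFFu; }
```

Keeping the offset in the low bits means consecutive kernels in one library get consecutive values, which is why an offset is convenient for long enumerations.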
OpenVX also contains an interface defined within <VX/vxu.h> that allows for immediate execution of vision functions. These interfaces are prefixed with vxu to distinguish them from the Node interfaces, which are of the form vx<Name>Node. Each of these interfaces replicates a Node interface with some exceptions. Immediate mode functions are defined to behave as Single Node Graphs, which have no leaking side-effects (e.g., no Log entries) within the Graph Framework after the function returns. The following tables refer to both the Immediate Mode and Graph Mode vision functions. The Module documentation for each vision function draws a distinction on each API by noting that it is either an immediate mode function with the tag [Immediate] or it is a Graph mode function by the tag [Graph].
OpenVX comes with a standard or base set of vision functions. The following table lists the supported set of vision functions, their input types (first table) and output types (second table), and the version of OpenVX in which they are supported.
Vision Function | U8 | U16 | S16 | S32 | U32 | F32 | color |
---|---|---|---|---|---|---|---|
AbsDiff | 1.0 | 1.0 | |||||
Accumulate | 1.0 | ||||||
AccumulateSquared | 1.0 | ||||||
AccumulateWeighted | 1.0 | ||||||
Add | 1.0 | 1.0 | |||||
And | 1.0 | ||||||
Box3x3 | 1.0 | ||||||
CannyEdgeDetector | 1.0 | ||||||
ChannelCombine | 1.0 | ||||||
ChannelExtract | 1.0 | ||||||
ColorConvert | 1.0 | ||||||
ConvertDepth | 1.0 | 1.0 | |||||
Convolve | 1.0 | ||||||
Dilate3x3 | 1.0 | ||||||
EqualizeHistogram | 1.0 | ||||||
Erode3x3 | 1.0 | ||||||
FastCorners | 1.0 | ||||||
Gaussian3x3 | 1.0 | ||||||
HarrisCorners | 1.0 | ||||||
HalfScaleGaussian | 1.0 | ||||||
Histogram | 1.0 | ||||||
IntegralImage | 1.0 | ||||||
TableLookup | 1.0 | ||||||
Magnitude | 1.0 | ||||||
MeanStdDev | 1.0 | ||||||
Median3x3 | 1.0 | ||||||
MinMaxLoc | 1.0 | 1.0 | |||||
Multiply | 1.0 | 1.0 | |||||
Not | 1.0 | ||||||
OpticalFlowLK | 1.0 | ||||||
Or | 1.0 | ||||||
Phase | 1.0 | ||||||
GaussianPyramid | 1.0 | ||||||
Remap | 1.0 | ||||||
ScaleImage | 1.0 | ||||||
Sobel3x3 | 1.0 | ||||||
Subtract | 1.0 | 1.0 | |||||
Threshold | 1.0 | ||||||
WarpAffine | 1.0 | ||||||
WarpPerspective | 1.0 | ||||||
Xor | 1.0 |
Vision Function | U8 | U16 | S16 | U32 | S32 | F32 | color |
---|---|---|---|---|---|---|---|
AbsDiff | 1.0 | 1.0 | |||||
Accumulate | 1.0 | ||||||
AccumulateSquared | 1.0 | ||||||
AccumulateWeighted | 1.0 | ||||||
Add | 1.0 | 1.0 | |||||
And | 1.0 | ||||||
Box3x3 | 1.0 | ||||||
CannyEdgeDetector | 1.0 | ||||||
ChannelCombine | 1.0 | ||||||
ChannelExtract | 1.0 | ||||||
ColorConvert | 1.0 | ||||||
ConvertDepth | 1.0 | 1.0 | |||||
Convolve | 1.0 | 1.0 | |||||
Dilate3x3 | 1.0 | ||||||
EqualizeHistogram | 1.0 | ||||||
Erode3x3 | 1.0 | ||||||
FastCorners | 1.0 | ||||||
Gaussian3x3 | 1.0 | ||||||
HarrisCorners | 1.0 | ||||||
HalfScaleGaussian | 1.0 | ||||||
Histogram | 1.0 | ||||||
IntegralImage | 1.0 | ||||||
TableLookup | 1.0 | ||||||
Magnitude | 1.0 | ||||||
MeanStdDev | 1.0 | ||||||
Median3x3 | 1.0 | ||||||
MinMaxLoc | 1.0 | 1.0 | 1.0 | ||||
Multiply | 1.0 | 1.0 | |||||
Not | 1.0 | ||||||
OpticalFlowLK | 1.0 | ||||||
Or | 1.0 | ||||||
Phase | 1.0 | ||||||
GaussianPyramid | 1.0 | ||||||
Remap | 1.0 | ||||||
ScaleImage | 1.0 | ||||||
Sobel3x3 | 1.0 | ||||||
Subtract | 1.0 | 1.0 | |||||
Threshold | 1.0 | ||||||
WarpAffine | 1.0 | ||||||
WarpPerspective | 1.0 | ||||||
Xor | 1.0 |
The lifecycle of the context is very simple.
OpenVX has four main phases of graph lifecycle:
Graphs are created via vxCreateGraph, and Nodes are connected together by data objects.
Graphs are executed via vxProcessGraph or vxScheduleGraph. Between executions data may be updated by the client or some other external mechanism. The client of OpenVX may change the reference of input data to a graph, but this may require the graph to be validated again, which can be determined by checking vxIsGraphVerified.
Graphs are released via vxReleaseGraph. All Nodes in the Graph are released.
All objects in OpenVX follow a similar lifecycle model. All objects are
created via vxCreate<Object><Method> or retrieved via vxGet<Object><Method> from the parent object if they are internally created, and
released via vxRelease<Object> or via vxReleaseContext when all objects are released.
This is an example of the Image Lifecycle using the OpenVX Framework API. This would also apply to other data types with changes to the types and function names.
For objects retrieved from OpenVX that are 2D in nature, such as vx_image, vx_matrix, and vx_convolution, the manner in which the host-side has access to these memory regions is well-defined. OpenVX uses row-major storage (that is, consecutive units within a row are adjacent in memory). Two-dimensional objects are always created (using vxCreateImage or vxCreateMatrix) in width (columns) by height (rows) notation, with the arguments in that order. When accessing these structures in “C” with two-dimensional arrays of declared size, the user must therefore provide the array dimensions in the reverse of the order of the arguments to the Create function. This layout ensures row-wise storage in C on the host. A pointer could also be allocated for the matrix data and would have to be indexed in this row-major method.
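The relationship between the (width, height) creation order and the C array declaration can be sketched as follows. The names grid_t, row_major_index, and read_flat are hypothetical and exist only for this illustration. No OpenVX calls are involved, and the element type int is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

#define WIDTH  4   /* columns: first dimension given to a Create function */
#define HEIGHT 3   /* rows:    second dimension given to a Create function */

/* Declared as [rows][columns], the reverse of the (width, height) order. */
typedef int grid_t[HEIGHT][WIDTH];

/* Offset of element (x, y) in a flat row-major buffer of the given width. */
size_t row_major_index(size_t x, size_t y, size_t width)
{
    assert(x < width);
    return y * width + x;
}

/* Reads element (x, y) through a flat pointer to the same storage. */
int read_flat(const int *base, size_t x, size_t y, size_t width)
{
    assert(base != NULL);
    return base[row_major_index(x, y, width)];
}
```

With these definitions, g[y][x] on a grid_t and read_flat(&g[0][0], x, y, WIDTH) refer to the same element, which is the equivalence the paragraph above describes for pointer-based access.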
Images and Arrays differ slightly in how they are accessed due to more complex memory layout requirements.
Arrays only require a single value, the stride, instead of the entire addressing structure that images need.
Access/Commit pairs can also be called on individual elements of array using a method similar to this:
Beyond User Kernels there are other mechanisms for vendors to extend features in OpenVX. These mechanisms are not available to User Kernels.
When extending attributes, vendors must use their assigned ID from vx_vendor_id_e in conjunction with the appropriate macros for creating new attributes with VX_ATTRIBUTE_BASE. The typical mechanism to extend a new attribute for some object type (for example a vx_node attribute from VX_ID_TI) would look like this:
Vendors wanting to add more kernels to the base set supplied to OpenVX should provide a header of the form
that contains definitions of each of the following.
Some extensions affect base vision functions and thus may be invisible to most users. In these circumstances, the vendor must report the supported extensions to the base nodes through the VX_CONTEXT_ATTRIBUTE_EXTENSIONS attribute on the context.
The contents of each extension in this list depend on the extension itself; it may or may not have a header, new kernels, framework features, or data objects. The common feature is that they are implemented and supported by the implementation vendor.
The specification defines a Hinting API that allows Clients to feed information to the implementation for optional behavior changes. See Framework: Hints. It is assumed that most of the hints will be vendor- or implementation-specific. Check with the OpenVX implementation vendor for information on vendor-specific extensions.
The specification defines a Directive API to control implementation behavior. See Framework: Directives. This may allow things like disabling parallelism for debugging, enabling cache writing-through for some buffers, or any implementation-specific optimization.
The User Kernel Tiling facility enables optimizations of the user kernels (e.g., locality of execution or parallelism) when performing computation on the image data. Modern processors have a diverse memory hierarchy that varies from relatively small but fast and expensive memory to relatively large but slow and inexpensive memory. Image data are typically too large to fit into the fast but small memory. The ability to break the image data into smaller sized units allows for optimized computation on these smaller units with fast memory access or parallel execution of a user kernel on multiple image tiles simultaneously. The OpenVX Graph Manager possesses the knowledge about the memory hierarchy of the platform and is hence in a position to break the image data into smaller units for memory optimization. Knowledge of the memory access pattern of an algorithm is key for the graph manager to enable optimizations.
The Khronos OpenVX Working Group will include this extension as part of the OpenVX 1.1 specification, contingent on community feedback.
Copyright
The Khronos Group 2011-2013. OpenVX™, OpenCL™, OpenGL™, and OpenMAX™ are trademarks of the Khronos Group™.