Detailed Description

Extracts Histogram of Oriented Gradients features from the input grayscale image.

The Histogram of Oriented Gradients (HOG) vision function is split into two nodes: vxHOGCellsNode and vxHOGFeaturesNode. The specification of these nodes covers a subset of possible HOG implementations. The vxHOGCellsNode calculates the gradient orientation histograms and average gradient magnitudes for each of the cells. The vxHOGFeaturesNode uses the cell histograms, and optionally the average gradient magnitudes of the cells, to produce a HOG feature vector. This involves grouping the cell histograms into blocks, which are then normalized. A moving window is applied to the input image, and for each window location the block data associated with the window is concatenated into the HOG feature vector.

Data Structures
struct	vx_hog_t
	The HOG descriptor structure.

Functions
vx_node VX_API_CALL	vxHOGCellsNode (vx_graph graph, vx_image input, vx_int32 cell_width, vx_int32 cell_height, vx_int32 num_bins, vx_tensor magnitudes, vx_tensor bins)
	[Graph] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

vx_node VX_API_CALL	vxHOGFeaturesNode (vx_graph graph, vx_image input, vx_tensor magnitudes, vx_tensor bins, const vx_hog_t *params, vx_size hog_param_size, vx_tensor features)
	[Graph] The node produces HOG features for the W1xW2 window in a sliding window fashion over the whole input image. Each position produces a HOG feature vector.

vx_status VX_API_CALL	vxuHOGCells (vx_context context, vx_image input, vx_int32 cell_width, vx_int32 cell_height, vx_int32 num_bins, vx_tensor magnitudes, vx_tensor bins)
	[Immediate] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

vx_status VX_API_CALL	vxuHOGFeatures (vx_context context, vx_image input, vx_tensor magnitudes, vx_tensor bins, const vx_hog_t *params, vx_size hog_param_size, vx_tensor features)
	[Immediate] Computes Histogram of Oriented Gradients features for the W1xW2 window in a sliding window fashion over the whole input image.

Data Structure Documentation

struct vx_hog_t

The HOG descriptor structure.

Definition at line 1699 of file vx_types.h.

Data Fields
vx_int32	cell_width	The histogram cell width of type `VX_TYPE_INT32`.
vx_int32	cell_height	The histogram cell height of type `VX_TYPE_INT32`.
vx_int32	block_width	The histogram block width of type `VX_TYPE_INT32`. Must be divisible by cell_width.
vx_int32	block_height	The histogram block height of type `VX_TYPE_INT32`. Must be divisible by cell_height.
vx_int32	block_stride	The histogram block stride within the window of type `VX_TYPE_INT32`. Must be an integer multiple of both cell_width and cell_height.
vx_int32	num_bins	The histogram size of type `VX_TYPE_INT32`.
vx_int32	window_width	The feature descriptor window width of type `VX_TYPE_INT32`.
vx_int32	window_height	The feature descriptor window height of type `VX_TYPE_INT32`.
vx_int32	window_stride	The feature descriptor window stride of type `VX_TYPE_INT32`.
vx_float32	threshold	The threshold for the maximum L2-norm value for a histogram bin. It is used as part of block normalization. It defaults to 0.2.
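
The divisibility constraints on these fields can be checked before the structure is handed to a node. The following is a minimal sketch, not part of the OpenVX API: `hog_params_t` is a hypothetical stand-in that mirrors the fields listed above so the example compiles without the OpenVX headers, and `hog_params_valid` is an illustrative helper name. The positivity checks are an assumption of this sketch; the window check reflects the requirement, stated in the function documentation, that the window contain an integer number of cells.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the vx_hog_t fields listed above, so the
 * sketch compiles without the OpenVX headers. */
typedef struct {
    int32_t cell_width, cell_height;
    int32_t block_width, block_height;
    int32_t block_stride;
    int32_t num_bins;
    int32_t window_width, window_height;
    int32_t window_stride;
    float   threshold;
} hog_params_t;

/* Returns true when the documented divisibility constraints hold. */
bool hog_params_valid(const hog_params_t *p)
{
    assert(p != NULL);
    /* Assumption of this sketch: all sizes and strides must be positive. */
    if (p->cell_width <= 0 || p->cell_height <= 0 || p->num_bins <= 0 ||
        p->block_width <= 0 || p->block_height <= 0 ||
        p->window_width <= 0 || p->window_height <= 0 ||
        p->block_stride <= 0 || p->window_stride <= 0)
        return false;
    /* Block size must be divisible by the cell size. */
    if (p->block_width % p->cell_width != 0) return false;
    if (p->block_height % p->cell_height != 0) return false;
    /* Block stride must be a whole number of cells in both directions. */
    if (p->block_stride % p->cell_width != 0) return false;
    if (p->block_stride % p->cell_height != 0) return false;
    /* The window must contain an integer number of cells. */
    if (p->window_width % p->cell_width != 0) return false;
    if (p->window_height % p->cell_height != 0) return false;
    return true;
}
```

A real application would fill in a `vx_hog_t` and pass `sizeof(vx_hog_t)` as the `hog_param_size` argument; the check above only illustrates the relationships between the fields.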

Function Documentation

vx_node VX_API_CALL vxHOGCellsNode	(	vx_graph	graph,
		vx_image	input,
		vx_int32	cell_width,
		vx_int32	cell_height,
		vx_int32	num_bins,
		vx_tensor	magnitudes,
		vx_tensor	bins
	)

[Graph] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

First, the gradient magnitude and gradient orientation are computed for each pixel in the input image. Two centred 1-D point discrete derivative masks are applied to the input image, one in the horizontal direction and one in the vertical direction.

\[ M_h = [-1, 0, 1] \]

and

\[ M_v = [-1, 0, 1]^T \]

\(G_v\) is the result of applying mask \(M_v\) to the input image, and \(G_h\) is the result of applying mask \(M_h\) to the input image. The border mode used for the gradient calculation is implementation dependent. Its behavior should be similar to VX_BORDER_UNDEFINED. The gradient magnitudes and gradient orientations for each pixel are then calculated in the following manner.

\[ G(x,y) = \sqrt{G_v(x,y)^2 + G_h(x,y)^2} \]

\[ \theta(x,y) = arctan(G_v(x,y), G_h(x,y)) \]

where

\[ arctan(v, h) = \begin{cases} tan^{-1}(v/h) & \text{if } h \neq 0 \\ -\pi/2 & \text{if } v < 0 \text{ and } h = 0 \\ \pi/2 & \text{if } v > 0 \text{ and } h = 0 \\ 0 & \text{if } v = 0 \text{ and } h = 0 \end{cases} \]

Secondly, the gradient magnitudes and orientations are used to compute the bins output tensor and optional magnitudes output tensor. These tensors are computed on a cell level where the cells are rectangular in shape. The magnitudes tensor contains the average gradient magnitude for each cell.

\[magnitudes(c) = \frac{1}{(cell\_width * cell\_height)}\sum\limits_{w=0}^{cell\_width-1} \sum\limits_{h=0}^{cell\_height-1} G_c(w,h)\]

where \(G_c\) denotes the gradient magnitudes of the pixels in cell \(c\). The bins tensor contains histograms of gradient orientations for each cell. The gradient orientations at each pixel range from 0 to 360 degrees. These are quantised into a set of histogram bins based on the num_bins parameter. Each pixel votes for a specific cell histogram bin based on its gradient orientation. The vote itself is the pixel's gradient magnitude.

\[bins(c, n) = \sum\limits_{w=0}^{cell\_width-1} \sum\limits_{h=0}^{cell\_height-1} G_c(w,h) * 1[B_c(w, h, num\_bins) == n]\]

where \(B_c\) produces the histogram bin number from the gradient orientation of the pixel at location \((w, h)\) in cell \(c\), given \(num\_bins\), and

\[1[B_c(w, h, num\_bins) == n]\]

is an indicator function with value 1 when \(B_c(w, h, num\_bins) == n\) and 0 otherwise.
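
The two cell-level sums above can be sketched as a single pass over the pixels of one cell. This is a hedged illustration only: the function names are hypothetical, the values are kept in `double` rather than in the `VX_TYPE_INT16` tensors used by the API, and the bin mapping \(B_c\) is assumed to be a uniform quantisation of an orientation in degrees already wrapped into the range 0 (inclusive) to 360 (exclusive), which the text above does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bin mapping B_c: uniform quantisation of an orientation in
 * degrees in [0, 360). The source does not fix this mapping. */
int hog_bin_index(double theta_deg, int num_bins)
{
    assert(num_bins > 0 && theta_deg >= 0.0 && theta_deg < 360.0);
    int n = (int)(theta_deg * num_bins / 360.0);
    return n < num_bins ? n : num_bins - 1; /* guard against rounding */
}

/* Accumulates one cell. Every pixel votes with its gradient magnitude into
 * the bin selected by its orientation. mag and theta_deg are row-major
 * arrays of cell_width * cell_height entries; bins has num_bins entries.
 * Returns the average gradient magnitude of the cell. */
double hog_accumulate_cell(const double *mag, const double *theta_deg,
                           int cell_width, int cell_height,
                           int num_bins, double *bins)
{
    assert(cell_width > 0 && cell_height > 0 && num_bins > 0);
    double sum = 0.0;
    for (int n = 0; n < num_bins; ++n) bins[n] = 0.0;
    for (int h = 0; h < cell_height; ++h) {
        for (int w = 0; w < cell_width; ++w) {
            size_t i = (size_t)h * (size_t)cell_width + (size_t)w;
            sum += mag[i];
            bins[hog_bin_index(theta_deg[i], num_bins)] += mag[i];
        }
    }
    return sum / ((double)cell_width * (double)cell_height);
}
```

For a 2x2 cell with magnitudes 1, 2, 3, 4, orientations 0, 45, 90, 350 degrees and 4 bins, the sketch produces the histogram 3, 3, 0, 4 and an average magnitude of 2.5.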

Parameters

[in]	graph	The reference to the graph.
[in]	input	The input image of type `VX_DF_IMAGE_U8`.
[in]	cell_width	The histogram cell width of type `VX_TYPE_INT32`.
[in]	cell_height	The histogram cell height of type `VX_TYPE_INT32`.
[in]	num_bins	The histogram size of type `VX_TYPE_INT32`.
[out]	magnitudes	(Optional) The output average gradient magnitudes per cell of `vx_tensor` of type `VX_TYPE_INT16` of size \( [floor(image_{width}/cell_{width}) ,floor(image_{height}/cell_{height}) ] \).
[out]	bins	The output gradient orientation histograms per cell of `vx_tensor` of type `VX_TYPE_INT16` of size \( [floor(image_{width}/cell_{width}) ,floor(image_{height}/cell_{height}), num_{bins}] \).

Returns: vx_node.

Return values

0	Node could not be created.
*	Node handle.

vx_node VX_API_CALL vxHOGFeaturesNode	(	vx_graph	graph,
		vx_image	input,
		vx_tensor	magnitudes,
		vx_tensor	bins,
		const vx_hog_t *	params,
		vx_size	hog_param_size,
		vx_tensor	features
	)

[Graph] The node produces HOG features for the W1xW2 window in a sliding window fashion over the whole input image. Each position produces a HOG feature vector.

First, if a magnitudes tensor is provided, the cell histograms in the bins tensor are normalized by the average cell gradient magnitudes.

\[bins(c,n) = \frac{bins(c,n)}{magnitudes(c)}\]

To account for changes in illumination and contrast, the cell histograms must be locally normalized, which requires grouping them into larger, spatially connected blocks. Blocks are rectangular grids described by three parameters: the number of cells per block, the number of pixels per cell, and the number of bins per cell histogram. These blocks typically overlap, meaning that each cell histogram contributes more than once to the final descriptor. To normalize a block, its cell histograms \(h\) are concatenated to form a vector \(v = [h_1, h_2, h_3, ... , h_n]\). This vector is normalized using L2-Hys: the L2-norm is applied to the vector, the result is clipped so that no component of \(v\) exceeds threshold, and the clipped vector is L2-normalized again. If the threshold is equal to zero then L2-Hys normalization is not performed.

\[L2norm(v) = \frac{v}{\sqrt{\|v\|_2^2 + \epsilon^2}}\]

where \( \|v\|_k \) is the k-norm of \(v\) for k=1, 2, and \( \epsilon \) is a small constant. For a specific window, the HOG descriptor is then the concatenation of the normalized cell histograms from all of the block regions contained in the window. The first W1xW2 window is positioned at coordinates (0, 0). If the input image dimensions do not divide into a whole number of W1xW2 window positions at the specified stride, the last positions, which contain only a partial W1xW2 window, are calculated with the remaining part of the window padded with zeroes. The W1xW2 window must also be sized so that it contains an integer number of cells; otherwise the node is not well-defined. The final output tensor contains one HOG descriptor per window position in the input image. The output features tensor has 3 dimensions, given by:

\[[ (floor((image_{width}-window_{width})/window_{stride}) + 1),\]

\[ (floor((image_{height}-window_{height})/window_{stride}) + 1),\]

\[ floor((window_{width} - block_{width})/block_{stride} + 1) * floor((window_{height} - block_{height})/block_{stride} + 1) *\]

\[ (((block_{width} * block_{height}) / (cell_{width} * cell_{height})) * num_{bins})] \]

See vxCreateTensor and vxCreateVirtualTensor. We recommend the output tensors always be virtual objects, with this node connected directly to the classifier. The output tensor will be very large, and using non-virtual tensors will result in a poorly optimized implementation. Merging of this node with a classifier node such as that described in the classifier extension will result in better performance. Notice that this node creation function has more parameters than the corresponding kernel. Numbering of kernel parameters (required if you create this node using the generic interface) is explicitly specified here.
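
When allocating the features tensor, its three dimensions can be computed directly from the formula above. The helper below is a sketch with a hypothetical name, `hog_features_dims`; it takes the individual `vx_hog_t` values as plain integers so that it compiles without the OpenVX headers, and it relies on C integer division of non-negative operands to implement `floor`.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the three dimensions of the features tensor from the formula
 * above. All operands are asserted non-negative, so integer division is
 * equivalent to floor(). */
void hog_features_dims(int image_width, int image_height,
                       int cell_width, int cell_height,
                       int block_width, int block_height, int block_stride,
                       int window_width, int window_height, int window_stride,
                       int num_bins, size_t dims[3])
{
    assert(image_width >= window_width && image_height >= window_height);
    assert(window_width >= block_width && window_height >= block_height);
    assert(window_stride > 0 && block_stride > 0);
    assert(cell_width > 0 && cell_height > 0 && num_bins > 0);

    /* Window positions along each image axis. */
    dims[0] = (size_t)((image_width - window_width) / window_stride + 1);
    dims[1] = (size_t)((image_height - window_height) / window_stride + 1);

    /* Blocks per window, times cells per block, times bins per cell. */
    size_t blocks_x = (size_t)((window_width - block_width) / block_stride + 1);
    size_t blocks_y = (size_t)((window_height - block_height) / block_stride + 1);
    size_t cells_per_block = (size_t)((block_width * block_height) /
                                      (cell_width * cell_height));
    dims[2] = blocks_x * blocks_y * cells_per_block * (size_t)num_bins;
}
```

As an illustration with values chosen for this example, a 640x480 image with a 64x128 window, 16x16 blocks, 8x8 cells, block and window strides of 8, and 9 bins gives dimensions 73, 45 and 3780.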

Parameters

[in]	graph	The reference to the graph.
[in]	input	The input image of type `VX_DF_IMAGE_U8`. (Kernel parameter #0)
[in]	magnitudes	(Optional) The gradient magnitudes per cell of `vx_tensor` of type `VX_TYPE_INT16`. It is the output of `vxHOGCellsNode`. (Kernel parameter #1)
[in]	bins	The gradient orientation histograms per cell of `vx_tensor` of type `VX_TYPE_INT16`. It is the output of `vxHOGCellsNode`. (Kernel parameter #2)
[in]	params	The parameters of type `vx_hog_t`. (Kernel parameter #3)
[in]	hog_param_size	Size of `vx_hog_t` in bytes. Note that this parameter is not counted as one of the kernel parameters.
[out]	features	The output HOG features of `vx_tensor` of type `VX_TYPE_INT16`. (Kernel parameter #4)

Returns: vx_node.

Return values

0	Node could not be created.
*	Node handle.

vx_status VX_API_CALL vxuHOGCells	(	vx_context	context,
		vx_image	input,
		vx_int32	cell_width,
		vx_int32	cell_height,
		vx_int32	num_bins,
		vx_tensor	magnitudes,
		vx_tensor	bins
	)

[Immediate] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

First, the gradient magnitude and gradient orientation are computed for each pixel in the input image. Two centred 1-D point discrete derivative masks are applied to the input image, one in the horizontal direction and one in the vertical direction.

\[ M_h = [-1, 0, 1] \]

and

\[ M_v = [-1, 0, 1]^T \]

\(G_v\) is the result of applying mask \(M_v\) to the input image, and \(G_h\) is the result of applying mask \(M_h\) to the input image. The border mode used for the gradient calculation is implementation dependent. Its behavior should be similar to VX_BORDER_UNDEFINED. The gradient magnitudes and gradient orientations for each pixel are then calculated in the following manner.

\[ G(x,y) = \sqrt{G_v(x,y)^2 + G_h(x,y)^2} \]

\[ \theta(x,y) = arctan(G_v(x,y), G_h(x,y)) \]

where

\[ arctan(v, h) = \begin{cases} tan^{-1}(v/h) & \text{if } h \neq 0 \\ -\pi/2 & \text{if } v < 0 \text{ and } h = 0 \\ \pi/2 & \text{if } v > 0 \text{ and } h = 0 \\ 0 & \text{if } v = 0 \text{ and } h = 0 \end{cases} \]

Secondly, the gradient magnitudes and orientations are used to compute the bins output tensor and optional magnitudes output tensor. These tensors are computed on a cell level where the cells are rectangular in shape. The magnitudes tensor contains the average gradient magnitude for each cell.

\[magnitudes(c) = \frac{1}{(cell\_width * cell\_height)}\sum\limits_{w=0}^{cell\_width-1} \sum\limits_{h=0}^{cell\_height-1} G_c(w,h)\]

where \(G_c\) denotes the gradient magnitudes of the pixels in cell \(c\). The bins tensor contains histograms of gradient orientations for each cell. The gradient orientations at each pixel range from 0 to 360 degrees. These are quantised into a set of histogram bins based on the num_bins parameter. Each pixel votes for a specific cell histogram bin based on its gradient orientation. The vote itself is the pixel's gradient magnitude.

\[bins(c, n) = \sum\limits_{w=0}^{cell\_width-1} \sum\limits_{h=0}^{cell\_height-1} G_c(w,h) * 1[B_c(w, h, num\_bins) == n]\]

where \(B_c\) produces the histogram bin number from the gradient orientation of the pixel at location \((w, h)\) in cell \(c\), given \(num\_bins\), and

\[1[B_c(w, h, num\_bins) == n]\]

is an indicator function with value 1 when \(B_c(w, h, num\_bins) == n\) and 0 otherwise.

Parameters

[in]	context	The reference to the overall context.
[in]	input	The input image of type `VX_DF_IMAGE_U8`.
[in]	cell_width	The histogram cell width of type `VX_TYPE_INT32`.
[in]	cell_height	The histogram cell height of type `VX_TYPE_INT32`.
[in]	num_bins	The histogram size of type `VX_TYPE_INT32`.
[out]	magnitudes	The output average gradient magnitudes per cell of `vx_tensor` of type `VX_TYPE_INT16` of size \( [floor(image_{width}/cell_{width}) ,floor(image_{height}/cell_{height}) ] \).
[out]	bins	The output gradient orientation histograms per cell of `vx_tensor` of type `VX_TYPE_INT16` of size \( [floor(image_{width}/cell_{width}) ,floor(image_{height}/cell_{height}), num_{bins}] \).
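
As a worked example of the tensor sizes above, assume a 100x50 input image, 8x8 cells and 9 bins (values chosen for illustration, not taken from the specification):

\[ magnitudes: [floor(100/8), floor(50/8)] = [12, 6] \]

\[ bins: [floor(100/8), floor(50/8), 9] = [12, 6, 9] \]

Under these assumptions the rightmost 4 pixel columns and the bottom 2 pixel rows do not fill a complete cell, so they have no corresponding entry in either tensor.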

Returns: A vx_status_e enumeration.

Return values

VX_SUCCESS	Success
*	An error occurred. See `vx_status_e`.

vx_status VX_API_CALL vxuHOGFeatures	(	vx_context	context,
		vx_image	input,
		vx_tensor	magnitudes,
		vx_tensor	bins,
		const vx_hog_t *	params,
		vx_size	hog_param_size,
		vx_tensor	features
	)

[Immediate] Computes Histogram of Oriented Gradients features for the W1xW2 window in a sliding window fashion over the whole input image.

First, if a magnitudes tensor is provided, the cell histograms in the bins tensor are normalized by the average cell gradient magnitudes.

\[bins(c,n) = \frac{bins(c,n)}{magnitudes(c)}\]

To account for changes in illumination and contrast, the cell histograms must be locally normalized, which requires grouping them into larger, spatially connected blocks. Blocks are rectangular grids described by three parameters: the number of cells per block, the number of pixels per cell, and the number of bins per cell histogram. These blocks typically overlap, meaning that each cell histogram contributes more than once to the final descriptor. To normalize a block, its cell histograms \(h\) are concatenated to form a vector \(v = [h_1, h_2, h_3, ... , h_n]\). This vector is normalized using L2-Hys: the L2-norm is applied to the vector, the result is clipped so that no component of \(v\) exceeds threshold, and the clipped vector is L2-normalized again. If the threshold is equal to zero then L2-Hys normalization is not performed.

\[L2norm(v) = \frac{v}{\sqrt{\|v\|_2^2 + \epsilon^2}}\]

where \( \|v\|_k \) is the k-norm of \(v\) for k=1, 2, and \( \epsilon \) is a small constant. For a specific window, the HOG descriptor is then the concatenation of the normalized cell histograms from all of the block regions contained in the window. The first W1xW2 window is positioned at coordinates (0, 0). If the input image dimensions do not divide into a whole number of W1xW2 window positions at the specified stride, the last positions, which contain only a partial W1xW2 window, are calculated with the remaining part of the window padded with zeroes. The W1xW2 window must also be sized so that it contains an integer number of cells; otherwise the node is not well-defined. The final output tensor contains one HOG descriptor per window position in the input image. The output features tensor has 3 dimensions, given by:

\[[ (floor((image_{width}-window_{width})/window_{stride}) + 1),\]

\[ (floor((image_{height}-window_{height})/window_{stride}) + 1),\]

\[ floor((window_{width} - block_{width})/block_{stride} + 1) * floor((window_{height} - block_{height})/block_{stride} + 1) *\]

\[ (((block_{width} * block_{height}) / (cell_{width} * cell_{height})) * num_{bins})] \]

See vxCreateTensor and vxCreateVirtualTensor. The output tensor from this function may be very large. For this reason, it is not recommended that this "immediate mode" version of the function be used. The preferred method to perform this function is as a graph node with a virtual tensor as the output.

Parameters

[in]	context	The reference to the overall context.
[in]	input	The input image of type `VX_DF_IMAGE_U8`.
[in]	magnitudes	The average gradient magnitudes per cell of `vx_tensor` of type `VX_TYPE_INT16`. It is the output of `vxuHOGCells`.
[in]	bins	The gradient orientation histogram per cell of `vx_tensor` of type `VX_TYPE_INT16`. It is the output of `vxuHOGCells`.
[in]	params	The parameters of type `vx_hog_t`.
[in]	hog_param_size	Size of `vx_hog_t` in bytes.
[out]	features	The output HOG features of `vx_tensor` of type `VX_TYPE_INT16`.

Returns: A vx_status_e enumeration.

Return values

VX_SUCCESS	Success
*	An error occurred. See `vx_status_e`.
