Structure tensor

In computer vision and image processing the structure tensor is a feature descriptor, computed from a local image region, which holds information about the region's structure, for example in terms of orientation or intrinsic dimension. It has been developed by several authors for use in different applications. In the literature, the structure tensor is sometimes referred to as the interest operator or the second moment matrix.

Formally, the structure tensor is computed as a local weighted average of the outer product of the image gradient with itself:


 * $$ \mathbf{T}(\mathbf{x}) = w * \left([\nabla I] [\nabla I]^{T}\right) $$

where $$ w\, $$ is the weighting function (the $$ * $$ denotes convolution), $$ I\, $$ is the image intensity function and $$ \nabla I $$ is the gradient of that function.
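Written out for the 2D case, with gradient $$ \nabla I = (I_x, I_y)^{T} $$, the definition above expands to

 * $$ \mathbf{T} = w * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix} $$

where the convolution with w is applied to each matrix element separately.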

An important fact about the structure tensor is that it can be defined for image data of arbitrary dimension, not only for 2D images. As a consequence, the structure tensor has found applications not only in the analysis of 2-dimensional images, but also in the analysis of 3-dimensional image data, such as MRI volumes, and of image sequences, for example in motion estimation.

How the structure tensor is computed
Here, the function $$ I\, $$ is the image intensity, which in practice is a function of discrete image coordinates. Typically, $$ I\, $$ represents the intensity of a gray value image, but the exact interpretation of what $$ I\, $$ is an intensity of is not necessary for $$ \mathbf{T} $$ to be well-defined. The dimension of the image, or the number of image coordinates, is here denoted $$ n\, $$.

According to the formal definition given above, the structure tensor can be computed by first estimating the local gradient of $$ I\, $$ at each point in the image. Since the image intensity normally has been sampled without strict compliance with the sampling theorem, and since a truncated operator has to be used for computing the gradient, only an estimate of the intensity gradient can be obtained. There are several proposals for implementing such a gradient estimate, ranging from very small filters, which are computationally efficient but in general produce poor estimates, to larger filters, which give a more accurate estimate but at higher computational cost. Either way, the result of this operation is a new image which at each image point holds an n-dimensional vector, the local gradient.
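To make the gradient step concrete, here is a minimal sketch in Python, assuming NumPy; the function name `image_gradient` and the use of central differences via `np.gradient` are illustrative choices, not part of any of the referenced derivations.

```python
import numpy as np

def image_gradient(I):
    """Estimate the intensity gradient of a 2D image with central differences.

    This corresponds to one of the 'very small filters' mentioned above:
    cheap to compute, but in general a cruder estimate than larger
    derivative filters would give.
    """
    # np.gradient returns derivatives along axis 0 (rows) and axis 1 (columns).
    Iy, Ix = np.gradient(I.astype(float))
    return Ix, Iy

# Example: a horizontal intensity ramp has a constant gradient along x.
I = np.tile(np.arange(5.0), (5, 1))
Ix, Iy = image_gradient(I)
```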

Next, at each point in the gradient image, the gradient is multiplied with itself in an outer product. The result of this operation is a new image which at each image point holds an $$ n \times n $$ symmetric matrix. Due to the symmetry of this matrix, it is possible to reduce the data by storing only the $$ n (n + 1)/2 $$ linearly independent elements of this matrix, for example the upper triangular elements including the diagonal.

Finally, this matrix is filtered by using the function w as a filter. In all applications for which the structure tensor has been developed, w is assumed to represent some type of low-pass filter, typically a truncated approximation of an isotropic Gaussian function of suitable size. The low-pass character of the filter means that the result of the filtering can be interpreted as a weighted average of the outer product matrix over a local image region. For non-Gaussian w, the isotropy is still important, as is discussed below. A practical implementation of this filtering operation implies that each of the elements of the previous matrix has to be filtered independently with the same filter w. However, due to the symmetry of the matrix, the number of filtering operations can be reduced to $$ n (n + 1)/2 $$.
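The three steps above (gradient estimation, outer product, filtering with w) can be sketched for the 2D case as follows. The truncated Gaussian w is implemented here with a separable NumPy convolution; the function names, the truncation radius of three standard deviations, and the `sigma` parameter are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Truncated, normalized 1D Gaussian used as the weighting function w."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / g.sum()

def smooth(A, g):
    """Separable 2D filtering of A with the 1D kernel g along both axes."""
    A = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, A)
    A = np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, A)
    return A

def structure_tensor_2d(I, sigma=1.0):
    """2D structure tensor field T = w * (grad I grad I^T).

    Only the n(n+1)/2 = 3 linearly independent elements of the
    symmetric 2x2 matrix are stored and filtered, as described above.
    """
    Iy, Ix = np.gradient(I.astype(float))
    g = gaussian_kernel(sigma)
    return smooth(Ix * Ix, g), smooth(Ix * Iy, g), smooth(Iy * Iy, g)

# Example: a horizontal intensity ramp; only the Txx element is non-zero.
I = np.tile(np.arange(12.0), (12, 1))
Txx, Txy, Tyy = structure_tensor_2d(I, sigma=1.0)
```

Note that only three filtering passes are needed instead of four, exactly the $$ n (n + 1)/2 $$ reduction described above.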

As a result of this computation, it should be clear that the concept of the structure tensor refers to a specific tensor which has been computed from a specific but arbitrary image region. The size of this region is determined by the sizes of the operators which are used to estimate the gradient and by the size of the filter w.

Interpretation of the structure tensor
Most derivations of the structure tensor make a connection between the local region from which it has been computed and its eigenvalues and eigenvectors. This implies that once the tensor has been computed for an image, the next step is often to analyze each tensor by first decomposing it into its eigenvalues and eigenvectors. In some applications, however, certain information about the local image structure can be derived by applying simpler computations directly to the elements of the structure tensor.

In general, the eigenvalues of the structure tensor describe the intrinsic dimension of the local image region from which the tensor is computed. More precisely, if the region is intrinsically i-dimensional, then the structure tensor has i non-zero eigenvalues. In this context, it is worth pointing out that the construction of the structure tensor assures that it is positive semi-definite.
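For a 2D image the eigenvalues can be obtained in closed form directly from the three tensor elements, without a general eigensolver; a minimal sketch under that assumption:

```python
import math

def tensor_eigenvalues(Txx, Txy, Tyy):
    """Closed-form eigenvalues of the symmetric 2x2 matrix [[Txx, Txy], [Txy, Tyy]].

    The eigenvalues are (trace +/- sqrt((Txx - Tyy)^2 + 4*Txy^2)) / 2;
    both are >= 0 because the structure tensor is positive semi-definite.
    """
    tr = Txx + Tyy
    disc = math.sqrt((Txx - Tyy) ** 2 + 4.0 * Txy ** 2)
    return 0.5 * (tr + disc), 0.5 * (tr - disc)  # lam1 >= lam2

# An intrinsically 1-dimensional region (a straight edge) yields exactly
# one non-zero eigenvalue.
lam1, lam2 = tensor_eigenvalues(4.0, 0.0, 0.0)
```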

The eigenvectors in general provide information about the dominant orientation of the region. More specifically, the eigenvectors corresponding to the non-zero eigenvalues (or to the zero eigenvalues, depending on the application) describe the orientation of the region in terms of a subspace, which can be characterized as the one that produces the largest possible variation of the local image when the image coordinates are changed.

However, the concept of orientation is only well-defined if the intrinsic dimensionality satisfies 0 < i < n.

Applications
The derivation of the structure tensor can be made from different initial assumptions and problem formulations. The resulting tensor has been proposed independently, and at about the same time, by several authors who were working on rather different applications. Interestingly, the resulting tensor has always appeared as part of solving an optimization problem, where some error or other target function is to be minimized or maximized. It is the formulation of the optimization problem that differs between the various applications described here.

Förstner and Gülch: interest point detection
Förstner and Gülch study the problem of detecting interest points, which they describe in terms of a least squares estimation problem defined within a local image region. Four different cases of point estimation are studied, including finding a corner point defined as the intersection of two or more edges. It is shown that all four cases lead to equations which include a normal equation matrix N in the form of a structure tensor as it is defined above. Furthermore, it is shown that the eigensystem of N is related to the error of the estimated point. More precisely, an error ellipse can be defined around the estimated point such that the true point lies within the ellipse with a certainty depending on the size of the ellipse. The size of the ellipse, for a given certainty, depends on the eigenvalues of N, and can be determined by means of rather simple computations on the elements of N. An interest point operator is proposed based on first computing N for all points in an image and then detecting those points where the error ellipse is smaller than a specified threshold. These points are input to a non-max suppression algorithm which extracts the points that are local minima in terms of error ellipse size. The resulting points are the detected interest points. The authors use a box function as w, but also propose that triangular or Gaussian shaped functions can be useful.
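One such simple computation on the elements of N is the scalar det(N)/trace(N), which combines the two eigenvalues (it equals lam1*lam2/(lam1+lam2)) and grows as the error ellipse shrinks; the following sketch assumes this particular measure and a 2D matrix N, and is an illustration rather than a reproduction of the original operator.

```python
def foerstner_measure(Nxx, Nxy, Nyy, eps=1e-12):
    """Point strength w = det(N) / trace(N) for the 2x2 normal equation matrix N.

    A small error ellipse corresponds to a large w, so thresholding w
    and keeping its local maxima selects candidate interest points
    without an explicit eigendecomposition. eps guards against a zero trace.
    """
    det = Nxx * Nyy - Nxy * Nxy
    tr = Nxx + Nyy
    return det / (tr + eps)

# A corner-like N (two large eigenvalues) scores high; an edge-like N
# (one zero eigenvalue, hence det = 0) scores zero.
w_corner = foerstner_measure(2.0, 0.0, 2.0)
w_edge = foerstner_measure(4.0, 0.0, 0.0)
```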

Bigün and Granlund: orientation estimation
Bigün and Granlund study the problem of finding the best orientation estimate for a given local region. An image region which has a well-defined orientation is in the Fourier domain represented as an impulse line which passes through the origin. In the presence of noise, or if more than one orientation is present, there may be non-zero components scattered over the entire frequency domain, and the problem is to find the line which can be optimally fitted to this data. It is shown that this line is given by an eigenvector of the inertia matrix J of the spectrum of the local region. It is then shown that J can be approximated by means of operations in the signal domain, resulting in an expression similar to the structure tensor defined above. By making an eigensystem decomposition of the resulting J, computed at each point in the image, an analysis of the eigenvalues and eigenvectors provides information both about the dominant orientation and about the certainty that the corresponding region can be described by that orientation alone. A Gaussian window function w is suggested.
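For 2D tensors the orientation of the dominant eigenvector can be written in double-angle form directly from the tensor elements, avoiding an explicit eigendecomposition; a minimal sketch, assuming a 2D tensor with elements Txx, Txy, Tyy:

```python
import math

def dominant_orientation(Txx, Txy, Tyy):
    """Angle of the eigenvector of the largest eigenvalue of the 2x2
    structure tensor, via the double-angle form
    tan(2*theta) = 2*Txy / (Txx - Tyy).

    Returns theta in (-pi/2, pi/2], measured from the x-axis.
    """
    return 0.5 * math.atan2(2.0 * Txy, Txx - Tyy)

# Gradient energy purely along x: dominant eigenvector along the x-axis.
theta_x = dominant_orientation(4.0, 0.0, 0.0)
# Gradient energy purely along y: dominant eigenvector along the y-axis.
theta_y = dominant_orientation(0.0, 0.0, 4.0)
```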

Harris and Stephens: interest point detection
Harris and Stephens also studied the problem of finding interest points and arrived at the same solution as Förstner and Gülch, but from a different initial formulation of the problem. A change function E(x,y) is defined which describes how different an image region becomes if its position is changed by a displacement (x,y). A Taylor expansion of E then leads to its Hessian matrix M. It is then argued that both interest points, characterized by a local maximum or minimum of E, and edge points, where E is constant in one orientation and varies in the perpendicular orientation, can be detected by analyzing the eigenvalues of M. In the case of interest points, the analysis is a minor variation of the operator described by Förstner and Gülch. A Gaussian window function w is used.
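In their published operator, Harris and Stephens avoid the explicit eigendecomposition by using the response R = det(M) - k * trace(M)^2, where k is an empirically chosen constant; the value k = 0.04 below is a common choice, assumed here for illustration.

```python
def harris_response(Mxx, Mxy, Myy, k=0.04):
    """Harris-Stephens corner response R = det(M) - k * trace(M)^2.

    Computed directly from the elements of the 2x2 matrix M:
    large positive R indicates a corner (both eigenvalues large),
    large negative R an edge (one eigenvalue large), and |R| small
    a flat region.
    """
    det = Mxx * Myy - Mxy * Mxy
    tr = Mxx + Myy
    return det - k * tr * tr

# Corner-like M (both eigenvalues large) versus edge-like M
# (one eigenvalue large, one zero).
r_corner = harris_response(2.0, 0.0, 2.0)
r_edge = harris_response(4.0, 0.0, 0.0)
```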