Homography (computer vision)

In the field of computer vision, any two images of the same planar surface in space are related by a homography (assuming a pinhole camera model). This has many practical applications, such as image rectification, image registration, or computation of camera motion (rotation and translation) between two images. Once the camera pose (rotation and translation) has been recovered from an estimated homography matrix (camera resectioning), this information may be used for navigation, or to insert models of 3D objects into an image or video so that they are rendered with the correct perspective and appear to have been part of the original scene (see Augmented reality).

3D plane to plane equation
Consider two cameras a and b looking at points $$P_i$$ that lie in a plane. The projection $${}^bp_i=\left({}^bu_i;{}^bv_i;1\right)$$ of $$P_i$$ in b is mapped to the projection $${}^ap_i=\left({}^au_i;{}^av_i;1\right)$$ of $$P_i$$ in a by


 * $${}^ap_i = \frac{{}^bz_i}{{}^az_i}K_a \cdot H_{ab} \cdot K_b^{-1} \cdot {}^bp_i$$

where $${}^az_i$$ and $${}^bz_i$$ are the z coordinates of $$P_i$$ in each camera frame, and the homography matrix $$H_{ab}$$ is given by


 * $$H_{ab} = R - \frac{t n^T}{d}$$.

$$R$$ is the rotation matrix by which b is rotated in relation to a; $$t$$ is the translation vector from a to b; $$n$$ and $$d$$ are the normal vector of the plane and the distance from the origin of camera b to the plane, respectively, both expressed in the frame of camera b. $$K_a$$ and $$K_b$$ are the cameras' intrinsic parameter matrices.
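A minimal numerical sketch of the two equations above, in dependency-free Python (plain lists instead of a matrix library). It builds $$H_{ab}$$ from an assumed pose and plane, forms the pixel-to-pixel matrix $$K_a H_{ab} K_b^{-1}$$, and checks that it reproduces the direct projection of a point on the plane, including the depth ratio $${}^bz_i/{}^az_i$$. The helper names, the zero-skew intrinsics, the rotation, the translation and the plane are all illustrative assumptions.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def plane_homography(R, t, n, d):
    """H_ab = R - t n^T / d for the plane n^T P + d = 0 (camera b frame)."""
    return [[R[i][j] - t[i] * n[j] / d for j in range(3)] for i in range(3)]

def intrinsics(f, cx, cy):
    """Hypothetical zero-skew intrinsic matrix with square pixels."""
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]

def intrinsics_inv(f, cx, cy):
    """Closed-form inverse of intrinsics(f, cx, cy)."""
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

def project(K, P):
    """Pinhole projection of a 3D point P to a homogeneous pixel (u, v, 1)."""
    x = matvec(K, P)
    return [x[0] / x[2], x[1] / x[2], 1.0]

# Illustrative values (assumptions, not from the text).
theta = 0.1                                   # rotation about the y axis
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [0.2, -0.1, 0.05]
n = [0.0, -0.6, -0.8]                         # unit normal in camera b's frame
d = 4.0                                       # plane: n^T P + d = 0
K_a = intrinsics(800.0, 320.0, 240.0)
K_b = intrinsics(800.0, 320.0, 240.0)
K_b_inv = intrinsics_inv(800.0, 320.0, 240.0)

P_b = [0.3, 0.5, 4.625]                       # satisfies n^T P_b + d = 0
P_a = [r + ti for r, ti in zip(matvec(R, P_b), t)]   # same point in camera a

p_b = project(K_b, P_b)
p_a = project(K_a, P_a)

G = matmul(matmul(K_a, plane_homography(R, t, n, d)), K_b_inv)
mapped = matvec(G, p_b)

# Equation form: ^a p = (^b z / ^a z) * K_a H_ab K_b^-1 * ^b p
scale = P_b[2] / P_a[2]
predicted = [scale * c for c in mapped]

# Usual practice: the depth ratio is unknown, so dehomogenize instead.
pixel_a = [mapped[0] / mapped[2], mapped[1] / mapped[2]]
print(p_a[:2], pixel_a)
```

Because the depth ratio only rescales the homogeneous vector, dividing by the third coordinate gives the same pixel; this is why a homography is usually treated as defined only up to scale.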



The figure shows camera b looking at the plane at distance d. Taking $$n^T P_i + d = 0$$ as the plane model, with $$n$$ a unit normal and $$P_i$$ expressed in the frame of camera b, the term $$n^T P_i$$ is the projection of the vector $$P_i$$ onto $$n$$ and equals $$-d$$. Hence $$t = t \cdot 1 = t \left(-\frac{n^T P_i}{d}\right)$$, and substituting this into the rigid motion gives $$R P_i + t = R P_i - \frac{t n^T}{d} P_i = H_{ab} P_i$$, where $$H_{ab} = R - \frac{t n^T}{d}$$.
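A quick numerical check of this identity under made-up values: for a point on the plane, $$H_{ab} P_i$$ and $$R P_i + t$$ agree, while for a point off the plane they differ by $$-t\,(n^T P + d)/d$$, which is why the homography describes only the points of that one plane. The helper names and all numbers are hypothetical.

```python
import math

def matvec(A, v):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def plane_homography(R, t, n, d):
    """H_ab = R - t n^T / d."""
    return [[R[i][j] - t[i] * n[j] / d for j in range(3)] for i in range(3)]

def rigid_motion(R, t, P):
    """R P + t: the point P (camera b frame) expressed in camera a's frame."""
    return [r + ti for r, ti in zip(matvec(R, P), t)]

def mismatch(H, R, t, P):
    """Largest absolute component of H P - (R P + t)."""
    return max(abs(h - m) for h, m in zip(matvec(H, P), rigid_motion(R, t, P)))

# Illustrative values (assumptions): rotation about z, plane z = 3 in b's frame.
phi = 0.2
R = [[math.cos(phi), -math.sin(phi), 0.0],
     [math.sin(phi), math.cos(phi), 0.0],
     [0.0, 0.0, 1.0]]
t = [0.5, 0.1, -0.2]
n, d = [0.0, 0.0, -1.0], 3.0          # n^T P + d = 0  <=>  z = 3
H = plane_homography(R, t, n, d)

on_plane = [1.0, -2.0, 3.0]           # n^T P + d = 0
off_plane = [1.0, -2.0, 5.0]          # n^T P + d = -2

print(mismatch(H, R, t, on_plane))    # zero up to rounding
print(mismatch(H, R, t, off_plane))   # (2/3) * max|t_k|, about 1/3 here
```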

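The formula above takes the frame of camera b as the world frame, i.e. camera b has no rotation and no translation, so that $$R$$ and $$t$$ describe the pose of camera a. In the general case where $$R_a,R_b$$ and $$t_a,t_b$$ are the respective rotations and translations of cameras a and b, $$R=R_a R_b^T$$, $$t = t_a - R_a R_b^T t_b$$, and the homography matrix $$H_{ab}$$ becomes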


 * $$H_{ab} = R_a R_b^T - \frac{\left(t_a - R_a R_b^T t_b\right) n^T}{d}$$

where $$d$$ is the distance from camera b to the plane and $$n$$ is the plane normal, both expressed in the frame of camera b.
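A sketch of the general case, assuming the world-to-camera convention $$X_c = R_c X + t_c$$ (the text does not state a convention; this is the one consistent with $$R = R_a R_b^T$$). A world plane $$n_w^T X + d_w = 0$$ is first moved into camera b's frame as $$n = R_b n_w$$, $$d = d_w - n^T t_b$$, then the homography is checked on a world point of that plane. Helper names, poses and the plane are illustrative assumptions.

```python
import math

def matmul(A, B):
    """3x3 matrix product (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    """3x3 matrix times 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def general_homography(R_a, t_a, R_b, t_b, n, d):
    """H_ab = R_a R_b^T - (t_a - R_a R_b^T t_b) n^T / d, n and d in b's frame."""
    R = matmul(R_a, transpose(R_b))
    R_tb = matvec(R, t_b)
    t = [t_a[i] - R_tb[i] for i in range(3)]
    return [[R[i][j] - t[i] * n[j] / d for j in range(3)] for i in range(3)]

# Assumed convention: a world point X maps to camera c as X_c = R_c X + t_c.
R_a, t_a = rot_y(0.2), [0.1, 0.0, 2.0]
R_b, t_b = rot_x(-0.1), [-0.3, 0.2, 2.5]

# World plane Z = 0 as n_w^T X + d_w = 0, moved into camera b's frame.
n_w, d_w = [0.0, 0.0, -1.0], 0.0
n_b = matvec(R_b, n_w)
d_b = d_w - sum(nk * tk for nk, tk in zip(n_b, t_b))

H = general_homography(R_a, t_a, R_b, t_b, n_b, d_b)

X = [0.4, -0.7, 0.0]                                   # world point on the plane
X_a = [r + tk for r, tk in zip(matvec(R_a, X), t_a)]
X_b = [r + tk for r, tk in zip(matvec(R_b, X), t_b)]
residual = max(abs(h - a) for h, a in zip(matvec(H, X_b), X_a))
print(d_b, residual)   # residual is zero up to rounding
```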

Affine homography
When the image region in which the homography is computed is small, or the image has been acquired with a large focal length, an affine homography is a more appropriate model of image displacements. An affine homography is a special case of a general homography whose last row is fixed to


 * $$h_{31}=h_{32}=0, \; h_{33}=1.$$
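With the last row fixed to $$(0, 0, 1)$$, the homogeneous scale stays 1, so the mapping reduces to a linear map plus a translation and, unlike a general homography, preserves parallel lines and ratios along a line (for example, midpoints). A small sketch contrasting the two cases; the matrices and helper names are illustrative assumptions.

```python
import math

def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography and dehomogenize."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def is_affine(H, tol=1e-12):
    """True if the last row is (0, 0, 1)."""
    return abs(H[2][0]) < tol and abs(H[2][1]) < tol and abs(H[2][2] - 1.0) < tol

def midpoint_error(H, p, q):
    """Distance between the image of the midpoint of pq and the midpoint of the images."""
    m = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    hp, hq, hm = (apply_homography(H, *pt) for pt in (p, q, m))
    return math.hypot(hm[0] - (hp[0] + hq[0]) / 2, hm[1] - (hp[1] + hq[1]) / 2)

# Illustrative matrices (assumptions): same upper rows, different last row.
A = [[1.2, 0.1, 5.0], [-0.2, 0.9, -3.0], [0.0, 0.0, 1.0]]
G = [[1.2, 0.1, 5.0], [-0.2, 0.9, -3.0], [0.001, 0.002, 1.0]]

p, q = (0.0, 0.0), (100.0, 200.0)
print(is_affine(A), midpoint_error(A, p, q))   # affine: midpoint preserved
print(is_affine(G), midpoint_error(G, p, q))   # projective: it is not
```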

Toolboxes

 * homest is a GPL C/C++ library for robust, non-linear (based on the Levenberg–Marquardt algorithm) homography estimation from matched point pairs (Manolis Lourakis).
 * OpenCV is a complete (open and free) computer vision software library that has many routines related to homography estimation (cvFindHomography) and re-projection (cvPerspectiveTransform).
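Both toolboxes estimate a homography from matched point pairs. As a much simpler illustration of that task, and not necessarily what either library implements (homest, for instance, uses robust estimation with Levenberg–Marquardt refinement), the sketch below fixes $$h_{33} = 1$$ and solves the resulting 8×8 linear system exactly for four correspondences. The helper names are hypothetical; with noisy data one would instead use more points, a least-squares or robust fit, and usually coordinate normalization. Fixing $$h_{33} = 1$$ also fails if the true $$h_{33}$$ is zero.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography and dehomogenize."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography_from_4_points(src, dst):
    """Hypothetical helper: exact H (with h33 = 1) from four correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v.
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

# Synthetic example (assumed values): generate matches from a known H.
H_true = [[1.1, 0.05, 2.0], [0.02, 0.95, -1.0], [0.01, 0.02, 1.0]]
src = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
dst = [apply_homography(H_true, x, y) for x, y in src]
H_est = homography_from_4_points(src, dst)
print(H_est)
```

Four points with no three collinear determine the homography uniquely, which is why exactly eight equations suffice here.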