Black–Jepson method

The Black–Jepson method is a method for computing optical flow that was developed by Allan Jepson and Michael Black in 1993. It was one of the first optical flow algorithms that could efficiently detect multiple directions of optical flow within a single image patch.

Concept
Many optical flow algorithms operate under the constraint that, for a finite image region, only one direction of flow is present (i.e. a single motion). This assumption is often violated in cases involving transparency, depth discontinuities, shadows, specular reflections, etc. In such cases, traditional algorithms often generate faulty optical flow estimates. The Black–Jepson method relaxes the single-motion assumption and attempts to detect multiple directions of optical flow within the image region (i.e. multiple motions). Intuitively, the key observation is that when multiple motions are present within an image region, the individual motion estimates form distinct clusters. The Black–Jepson method takes advantage of this clustering by constructing a probabilistic mixture of distributions that parameterizes each cluster, and thereby each direction of optical flow present within the image region.

Mathematical Details
Let $$I(\mathbf{x}(t), t)$$ be the given image region. Let $$S(\mathbf{x}(t), t)$$ be the image obtained after some kind of pre-processing of $$I(\mathbf{x}(t), t)$$ (e.g. smoothing), where $$\mathbf{x}(t)$$ is a pixel location evolving over time $$t$$. The processed image region $$S$$ is assumed to be locally constant over space and time in the direction of image motion. This is called the data conservation constraint.

$$ S(\mathbf{x}(t), t) = $$ constant

Differentiating this equation with respect to time gives us the motion constraint equation.

$$\vec{\nabla}S(\mathbf{x}(t), t) \cdot \mathbf{v}(\mathbf{x}(t), t) = 0 $$
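The motion constraint can be checked numerically. The sketch below (Python/NumPy; the function name and the finite-difference approximation of the gradients are illustrative choices, not from the paper) evaluates the per-pixel residual of the constraint for a candidate velocity:

```python
import numpy as np

def motion_constraint_residual(S_prev, S_next, v, dt=1.0):
    """Evaluate the motion constraint  grad(S) . v = 0  at every pixel.

    S_prev, S_next: two consecutive frames of the pre-processed image S.
    v: candidate 2D velocity (v1, v2); in homogeneous form the constraint
       pairs [v1, v2, 1] with the spatiotemporal gradient [Sx, Sy, St].
    Returns the per-pixel residual Sx*v1 + Sy*v2 + St, which is ~0 where
    the candidate velocity satisfies data conservation.
    """
    Sy, Sx = np.gradient(S_prev)      # spatial gradients (axis 0 = y, axis 1 = x)
    St = (S_next - S_prev) / dt       # temporal gradient
    v1, v2 = v
    return Sx * v1 + Sy * v2 + St

# A translating ramp: S(x, y, t) = x - t shifts one pixel right per frame,
# so the candidate velocity v = (1, 0) should zero out the residual.
S0 = np.tile(np.arange(16.0), (16, 1))
r = motion_constraint_residual(S0, S0 - 1.0, (1.0, 0.0))
```

For this synthetic ramp the residual vanishes everywhere, confirming that (1, 0) satisfies the constraint.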

Here $$\mathbf{v}$$ is a three-dimensional vector in homogeneous coordinates,

$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ 1 \end{bmatrix}$$

that describes the 2D image velocity of the region. $$v_1$$ and $$v_2$$ are related by the motion constraint equation but do not have a unique solution; this is the well-known aperture problem (see Optical flow for further description). Typically, these equations are solved by gathering constraints from the spatiotemporal gradient around each point and then solving the motion constraint equation under the assumption of a single underlying motion model. The motion model is allowed to be constant, affine, or even quadratic.
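The traditional single-motion solve described above can be sketched as a least-squares problem, in the style of Lucas–Kanade (this is the baseline approach, not the Black–Jepson method itself; the function and variable names are illustrative):

```python
import numpy as np

def single_motion_lsq(Sx, Sy, St):
    """Classic single-motion estimate: stack one constraint per pixel,
        Sx_k * v1 + Sy_k * v2 = -St_k,
    and solve the overdetermined system by least squares.  Pooling many
    gradient directions over the region resolves the aperture problem."""
    A = np.stack([Sx.ravel(), Sy.ravel()], axis=1)   # K x 2 constraint matrix
    b = -St.ravel()                                  # K right-hand sides
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v                                         # estimated (v1, v2)

# Synthetic gradients for a patch translating with velocity (2, -1):
rng = np.random.default_rng(0)
Sx = rng.standard_normal((8, 8))
Sy = rng.standard_normal((8, 8))
St = -(Sx * 2.0 + Sy * -1.0)        # exact, noise-free constraints
v_hat = single_motion_lsq(Sx, Sy, St)
```

With noise-free constraints the least-squares solution recovers the true velocity exactly; with a single motion model, however, the estimate is an average when two motions are actually present, which is the failure mode the Black–Jepson method addresses.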

The Black–Jepson method assumes that the underlying motions present in an image region form distinct clusters. To model this, a parameterized layer is independently generated to describe each motion. It is important to note that each layer is allowed to occupy all of a given image region (the clustering comes about as a result of the later optimization step). Mixture model theory is then used to combine these layers into a parameterized description of all the directions of optical flow within the image region. Each individual layer is generated in a manner similar to traditional algorithms, i.e. the motion model is allowed to be constant, affine, or quadratic within the given image region. For each layer, a "component probability" is assigned that is a function of a measured constraint vector given a point's spatial location and the layer's model parameters ($$p_n(\mathbf{c}_k|\mathbf{x}_k, \mathbf{a}_n)$$, where $$\mathbf{a}_n$$ are the model parameters of layer $$n$$). To combine the layers, additional parameters $$m_n$$ called "mixture probabilities" are assigned to the different layers, with the additional constraint that they must sum to 1 ($$\sum_{n=1}^{N}m_n = 1$$). The individual layer probabilities are combined to form a comprehensive probability function that describes the probability of a certain constraint vector occurring given the measurements made from each layer.

$$p(\mathbf{c}_k|\mathbf{x}_k, \mathbf{a}_1,\ldots,\mathbf{a}_N) = \sum_{n=1}^{N} m_n\, p_n(\mathbf{c}_k|\mathbf{x}_k, \mathbf{a}_n)$$
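The mixture density is straightforward to compute once each layer's component probability is defined. In the sketch below (Python/NumPy; modeling the constraint residual of each layer as zero-mean Gaussian noise is an illustrative choice, not necessarily the density used in the original paper):

```python
import numpy as np

def layer_likelihood(residual, sigma=1.0):
    """p_n(c_k | x_k, a_n): likelihood of layer n explaining constraint k.
    The residual of the constraint under layer n's motion is modeled as
    zero-mean Gaussian noise (an illustrative choice)."""
    return np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_likelihood(residuals, m):
    """p(c_k | x_k, a_1..a_N) = sum_n m_n * p_n(c_k | x_k, a_n).

    residuals: (N, K) array -- residual of constraint k under layer n's motion.
    m: length-N mixture probabilities (must sum to 1).
    Returns a length-K array of mixture densities, one per constraint."""
    return m @ layer_likelihood(residuals)

# Two layers, three constraints: layer 0 explains every constraint perfectly
# (zero residual), layer 1 explains none of them (residual 3 sigma).
residuals = np.array([[0.0, 0.0, 0.0],
                      [3.0, 3.0, 3.0]])
p = mixture_likelihood(residuals, np.array([0.5, 0.5]))
```

Each constraint's density is the mixture-weighted sum of the two Gaussian values, so even the poorly-explained layer contributes a small but nonzero amount.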

These equations parameterize every point. To generate a model of the complete image region, we combine these probabilities over all the points. For ease of computation, a log-likelihood function over all the points is used to estimate the motion parameters ($$\mathbf{a}_1,\ldots,\mathbf{a}_N$$) and the mixture probabilities ($$m_1,\ldots,m_N$$).

$$\log L(\mathbf{m},\mathbf{a}_1,\ldots,\mathbf{a}_N) = \sum_{k=1}^{K} \log p(\mathbf{c}_k|\mathbf{x}_k, \mathbf{a}_1,\ldots,\mathbf{a}_N)$$

The iterative expectation–maximization (EM) algorithm is used to maximize the likelihood function by fine-tuning the parameters of each layer in relation to the others. Intuitively, the EM algorithm softly assigns each constraint to a cluster and maximizes each cluster's likelihood. The parameters obtained at convergence thus give the multiple directions of motion that are present within the image region.
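A toy EM loop for the two-layer case may make this concrete. The sketch below (Python/NumPy; all names are my own) is a heavy simplification of the Black–Jepson formulation: it uses constant motions and a Gaussian noise model on the constraint residuals, whereas the paper supports N layers and affine or quadratic motion models:

```python
import numpy as np

def em_two_motions(Sx, Sy, St, iters=100, sigma=0.5, seed=0):
    """Toy EM for a two-layer mixture of constant motions.

    Each pixel k contributes one constraint  Sx_k*v1 + Sy_k*v2 + St_k = 0.
    E-step: compute each layer's "ownership" of each constraint.
    M-step: weighted least-squares re-fit of each layer's velocity,
            plus an update of the mixture probabilities."""
    A = np.stack([np.ravel(Sx), np.ravel(Sy)], axis=1)   # K x 2 constraints
    b = -np.ravel(St)
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((2, 2))      # current velocity estimate per layer
    m = np.array([0.5, 0.5])             # mixture probabilities
    for _ in range(iters):
        # E-step: Gaussian likelihood of each constraint under each layer,
        # normalized into per-constraint ownerships (responsibilities).
        resid = A @ V.T - b[:, None]                     # K x 2 residuals
        lik = m * np.exp(-0.5 * (resid / sigma) ** 2)
        own = lik / (lik.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: weighted normal equations per layer, then mixture update.
        for n in range(2):
            Aw = A * own[:, n:n + 1]
            V[n] = np.linalg.solve(Aw.T @ A + 1e-9 * np.eye(2), Aw.T @ b)
        m = own.mean(axis=0)
        m = m / m.sum()
    return V, m

# Synthetic region with two transparent motions: half the constraints come
# from velocity (2, 0), the other half from velocity (0, 2).
rng = np.random.default_rng(1)
Sx = rng.standard_normal(400)
Sy = rng.standard_normal(400)
v_true = np.zeros((400, 2))
v_true[:200] = (2.0, 0.0)
v_true[200:] = (0.0, 2.0)
St = -(Sx * v_true[:, 0] + Sy * v_true[:, 1])
V, m = em_two_motions(Sx, Sy, St)
```

With well-separated motions like these, the two recovered velocities in `V` typically converge to (2, 0) and (0, 2) in some order, illustrating how the mixture isolates the two clusters that a single-motion solve would blur together.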

Advantages over Previous Methods

 * Detection of multiple directions of motion in an image region
 * Accurate detection and removal of outliers
 * Accurate estimation in the presence of occlusion boundaries
 * Accurate estimation in the presence of transparency
 * Use of the well-defined EM algorithm allows for an efficient implementation

Disadvantages

 * The underlying mixture model is simplistic: velocities are modeled as being constant over time.
 * The constraint equations are not always valid (e.g. in the presence of shading variation, specular highlights, etc.).
 * Though formulated for an arbitrary number of layers, the algorithm was tested with only two layers/two directions of motion.

Implementation
An implementation of the algorithm is available on the personal website of one of its authors, Michael J. Black. It is written in C and is distributed as a .tar archive.