Teknomo–Fernandez algorithm

The Teknomo–Fernandez algorithm (TF algorithm) is an efficient algorithm for generating the background image of a given video sequence.

By assuming that the background is shown in the majority of the video, the algorithm generates a good background image in $$O(R)$$ time, where $$R$$ is the image resolution, using only a small number of binary and Boolean bit operations. These operations require little memory and are available as built-in operators in many programming languages such as C, C++, and Java.

History
People tracking from videos usually involves some form of background subtraction to segment foreground from background. Once the foreground images are extracted, the desired algorithms (such as those for motion tracking, object tracking, and facial recognition) may be executed on these images.

However, background subtraction requires that the background image already be available, which is not always the case. Traditionally, the background image is found, manually or automatically, by searching the video for frames that contain no objects. More recently, automatic background generation through object detection, median filtering, medoid filtering, approximated median filtering, linear predictive filters, non-parametric models, Kalman filters, and adaptive smoothing has been suggested; however, most of these methods have high computational complexity and are resource-intensive.

The Teknomo–Fernandez algorithm is also an automatic background generation algorithm. Its advantages are its computational speed of only $$O(R)$$ time, which depends only on the resolution $$R$$ of an image, and its accuracy, which is reached within a manageable number of frames. As few as three frames from a video are needed to produce the background image, assuming that for every pixel position the background occurs in the majority of the video. Furthermore, the algorithm can be applied to both grayscale and color videos.

Assumptions

 * The camera is stationary.
 * The light of the environment changes only slowly relative to the motions of the people in the scene.
 * People do not occupy the same place in the scene for most of the time.

More generally, the algorithm works whenever the following single assumption holds: "For each pixel position, the majority of the pixel values in the entire video contain the pixel value of the actual background image (at that position)." As long as each part of the background is shown in the majority of the video, the entire background image need not appear in any single frame, and the algorithm is still expected to work accurately.

Equations
 * 1) For three frames of an image sequence, $$x_1$$, $$x_2$$, and $$x_3$$, the background image $$B$$ is obtained using $$B = x_3(x_1\oplus x_2)+x_1x_2$$
 * 2) The Boolean mode function $$S$$ of $$n$$ images takes the value 1 when the number of 1 entries is larger than half of the number of images, such that $$S=\begin{cases} 1, & \text{if } \sum_{i=1}^n x_i\ge\left\lfloor \frac n 2 \right\rfloor + 1, \text{ and } n\ge 3 \\ 0, & \text{otherwise} \end{cases}$$
 * 3) For three images, the background image $$B$$ can be taken as the value $$\bar{x}_1 x_2x_3+x_1\bar{x}_2 x_3+x_1x_2\bar{x}_3+x_1x_2x_3$$
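The three-image equation maps directly onto built-in bitwise operators. The following is a minimal Python sketch, not a reference implementation: the function names `majority3` and `majority3_sum_of_products` and the `width` parameter are illustrative assumptions. It treats each pixel value as an integer and applies the formula to all of its bits at once, with Boolean product as `&`, Boolean sum as `|`, and $$\oplus$$ as `^`.

```python
def majority3(x1: int, x2: int, x3: int) -> int:
    """Bitwise majority of three integers: B = x3 (x1 XOR x2) + x1 x2."""
    return (x3 & (x1 ^ x2)) | (x1 & x2)


def majority3_sum_of_products(x1: int, x2: int, x3: int, width: int = 8) -> int:
    """The same function written as the four-term Boolean sum of products."""
    mask = (1 << width) - 1  # keep the complements within `width` bits
    n1, n2, n3 = ~x1 & mask, ~x2 & mask, ~x3 & mask
    return (n1 & x2 & x3) | (x1 & n2 & x3) | (x1 & x2 & n3) | (x1 & x2 & x3)
```

For example, `majority3(0b1100, 0b1010, 0b0110)` evaluates to `0b1110`, because each of the three upper bit positions is set in two of the three inputs. When two of the three pixel values are identical, as in `majority3(200, 200, 37)`, the result is that shared value.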

Background generation algorithm
At the first level, three frames are selected at random from the image sequence and combined using the first equation to produce a background image. At the second level, three first-level background images are combined in the same way, which yields a better background image. The procedure is repeated until the desired level $$L$$ is reached.
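The level-by-level procedure can be sketched recursively. The Python code below is an illustration under stated assumptions rather than the authors' implementation: frames are represented as flat lists of integer pixel values, the three frames at the first level are drawn with replacement, and the names `combine3` and `tf_background` are hypothetical.

```python
import random
from typing import List, Sequence

Frame = List[int]  # assumption: one frame as a flat list of integer pixel values


def combine3(a: Sequence[int], b: Sequence[int], c: Sequence[int]) -> Frame:
    """Per-pixel bitwise majority of three equally sized frames."""
    return [(z & (x ^ y)) | (x & y) for x, y, z in zip(a, b, c)]


def tf_background(frames: Sequence[Frame], level: int, rng: random.Random) -> Frame:
    """Return a background estimate at the given level (level >= 1)."""
    if level == 1:
        # Base case: combine three frames drawn at random from the video.
        a, b, c = (rng.choice(frames) for _ in range(3))
        return combine3(a, b, c)
    # Higher levels: combine three independent estimates from the level below.
    return combine3(*(tf_background(frames, level - 1, rng) for _ in range(3)))
```

A call such as `tf_background(frames, 6, random.Random(0))` draws $$3^6 = 729$$ frames in total. Passing the random generator explicitly keeps a run reproducible, which is a design choice of this sketch.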

Theoretical accuracy
At level $$\ell$$, the probability $$p_\ell$$ that the predicted modal bit is the actual modal bit is given by the equation $$p_\ell = (p_{\ell-1})^3 + 3(p_{\ell-1})^2(1-p_{\ell-1})$$. The table below gives the computed probability values across several levels using some specific initial probabilities. It can be observed that even if the modal bit at the considered position appears in as few as 60% of the frames, the probability of accurate modal bit determination is already more than 99% at 6 levels.
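The recurrence is easy to iterate numerically. The short Python sketch below does so, with $$p_0$$ taken as the fraction of frames in which the modal bit appears; the function name `modal_bit_probability` is an illustrative assumption.

```python
from typing import List


def modal_bit_probability(p0: float, levels: int) -> List[float]:
    """Return [p_0, p_1, ..., p_levels] for the recurrence
    p_l = p_{l-1}^3 + 3 * p_{l-1}^2 * (1 - p_{l-1})."""
    probs = [p0]
    for _ in range(levels):
        p = probs[-1]
        probs.append(p**3 + 3 * p**2 * (1 - p))
    return probs


if __name__ == "__main__":
    for level, p in enumerate(modal_bit_probability(0.6, 6)):
        print(f"level {level}: {p:.4f}")
```

Starting from $$p_0 = 0.6$$, the first step gives $$p_1 = 0.648$$, and the value first exceeds 0.99 at level 6, where it is roughly 0.9976.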

Space complexity
The space requirement of the Teknomo–Fernandez algorithm is $$O(RF+R3^L)$$, depending on the resolution $$R$$ of the image, the number $$F$$ of frames in the video, and the desired number $$L$$ of levels. However, because $$L$$ will probably not exceed 6, the factor $$3^L$$ is bounded by a constant and the space complexity reduces to $$O(RF)$$.

Time complexity
The entire algorithm runs in $$O(R)$$ time, depending only on the resolution of the image. Computing the modal bit for each bit can be done in $$O(1)$$ time, while computing the resulting image from three given images can be done in $$O(R)$$ time. The number of images to be processed in $$L$$ levels is $$O(3^L)$$. However, since $$L \le 6$$, this is actually $$O(1)$$, and thus the algorithm runs in $$O(R)$$ time.
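As a rough check of the constant involved, assume that each level-$$\ell$$ image is built from three independent level-$$(\ell-1)$$ images (a full ternary recursion, which is an assumption about the implementation rather than something stated above). A level-$$L$$ background then draws $$3^L$$ frames, and the number of three-image combinations is $$\sum_{\ell=1}^{L} 3^{L-\ell} = \frac{3^L-1}{2}$$, each costing $$O(R)$$. For $$L = 6$$ this gives 729 frame draws and 364 combinations, a count that does not depend on the number of frames $$F$$ in the video.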

Variants
A variant of the Teknomo–Fernandez algorithm that incorporates the Monte Carlo method, named CRF, has been developed. Two different configurations of CRF were implemented: $$CRF_{9,2}$$ and $$CRF_{81,1}$$. Experiments on some color video sequences showed that the CRF configurations outperform the TF algorithm in terms of accuracy. However, the TF algorithm remains more efficient in terms of processing time.

Applications

 * Object detection
 * Face detection
 * Face recognition
 * Pedestrian detection
 * Video surveillance
 * Motion capture
 * Human-computer interaction
 * Content-based video coding
 * Traffic monitoring
 * Real-time gesture recognition