
The maximum margin Hough transform is a feature extraction technique used in computer vision to detect objects in image analysis and digital image processing. It places the Hough transform in a discriminative framework in which local parts probabilistically vote for the location of the object, and the vote weights are learned in a max-margin framework to optimize classification performance.

Introduction
Various object detection techniques have been used, such as the sliding window classifier, the constellation model, and the implicit shape model. The Hough transform is another such method and has been applied to a variety of pose estimation problems, including shape detection. When the Hough transform is placed in a discriminative framework, each local part casts a weighted vote for the possible locations of the object center, and the learning framework takes into account both the appearance of a part and the spatial distribution of its position with respect to the object center. Parts whose appearance and spatial distribution recur at a consistent location relative to the object are assigned higher weights than the rest. The globally optimal solution can be obtained using off-the-shelf optimization packages. This approach is known as the Maximum Margin Hough Transform, or $$M^2HT$$. The parts and the probability distribution of the object locations are treated as a black box, which allows weights to be learned for the popular implicit shape model.

Theory
Let $$f_i$$ denote the features observed at a location $$l_i$$, which could be based on the properties of the local patch around $$l_i$$. Let S(O, x) denote the score of object O at a location x, where x denotes pose-related properties such as position, scale, and aspect ratio. Let $$C_i$$ denote the i'th codebook entry of the vector-quantized space of features f.
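As a concrete illustration of the codebook setup, the following sketch vector-quantizes a local feature against a codebook and produces a soft assignment that plays the role of $$p(C_i|f_j)$$ in the voting equations. All names here (`codebook`, `soft_assign`, the Gaussian kernel width `sigma`) are illustrative assumptions; in practice the codebook would come from clustering (e.g. k-means) of training patch descriptors.

```python
import numpy as np

# Hypothetical sketch: a codebook of K entries in a D-dimensional feature
# space, with a soft assignment p(C_i | f) for an observed feature f.
# The random codebook stands in for one learned from training data.

rng = np.random.default_rng(0)
K, D = 5, 8                      # K codebook entries, D-dimensional features
codebook = rng.normal(size=(K, D))

def soft_assign(f, codebook, sigma=1.0):
    """Return p(C_i | f): a distribution over codebook entries for feature f."""
    d2 = ((codebook - f) ** 2).sum(axis=1)   # squared distance to each C_i
    p = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian kernel weighting
    return p / p.sum()                       # normalize to a distribution

f = rng.normal(size=D)
p = soft_assign(f, codebook)
```

A hard (nearest-neighbor) assignment is the limiting case of this soft assignment as `sigma` goes to zero.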

The probabilistic Hough transform can be seen as a weighted vote for object locations, summed over the codebook entries $$C_i$$. In the max-margin Hough transform, the weights $$w_i$$ are instead learned in a discriminative manner that optimizes classification performance. The key observation is that the score S(O, x) is a linear function of $$p(O|C_i)$$, assuming that the probability $$p(O|C_i, l)$$ is independent of the location (location invariance):

$$ \textstyle S(O, x) = \sum_{i,j} p(x|O,C_i,l_j)\, p(C_i|f_j)\, p(O|C_i, l_j) $$

$$ = \textstyle \sum_{i,j} p(x|O, C_i, l_j)\, p(C_i|f_j)\, p(O|C_i) $$

$$ = \textstyle \sum_{i} p(O|C_i) \sum_{j} p(x|O,C_i, l_j)\, p(C_i|f_j) $$

$$ = \textstyle \sum_{i} w_i\, a_i(x) = w^T A(x) $$

where $$\textstyle A^T = [a_1\, a_2\, \ldots\, a_K]$$ is the activation vector and $$a_i$$ is given by the following equation:

$$ a_i(x) = \textstyle\sum_{j} p(x|O,C_i, l_j)p(C_i|f_j) $$
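The factorization above can be sketched numerically. In this illustration the arrays are synthetic placeholders: `vote[i, j]` stands for $$p(x|O,C_i,l_j)$$ evaluated at one fixed candidate location x, `assign[i, j]` for $$p(C_i|f_j)$$, and `w` for the learned codebook weights.

```python
import numpy as np

# Minimal sketch of the Hough score as a weighted sum of per-codebook
# activations a_i(x). All inputs are random placeholders for illustration.

rng = np.random.default_rng(1)
K, M = 5, 10                      # K codebook entries, M observed features
vote = rng.random((K, M))         # p(x | O, C_i, l_j) at one location x
assign = rng.random((K, M))       # p(C_i | f_j)
w = rng.random(K)                 # nonnegative codebook weights w_i

a = (vote * assign).sum(axis=1)   # a_i(x) = sum_j p(x|O,C_i,l_j) p(C_i|f_j)
score = w @ a                     # S(O, x) = w^T A(x)
```

With uniform weights $$w_i$$, this reduces to the standard probabilistic Hough score up to a scale factor; the discriminative training below chooses non-uniform weights instead.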

This algorithm finds weights that maximize the score S at correct object locations relative to incorrect ones. Rather than estimating $$w_i$$ from codebook activations alone, the conditional distribution of the object centers can be used to learn the weights.

Discriminative Training
Let $$\{(y_i, x_i)\}_{i=1}^N$$ be a set of training examples, where $$y_i \in \{+1, -1\}$$ is the label and $$x_i$$ is the location of the i'th training instance.

The first stage is to compute the activations $$A_i = A(x_i)$$ for each example by running the voting process and accumulating the votes for each feature $$f_j$$ found at location $$l_j$$, according to the equation in the previous section. The score assigned by the model to instance i is then $$w^T A_i$$. Weights are learned by maximizing this score at correct object locations relative to incorrect ones. To be robust to outliers and avoid overfitting, a max-margin formulation is used, leading to the following optimization problem:

$$ \min_{w,b,\xi} \; \tfrac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i $$

$$ \text{subject to} \quad y_i \left( w^T A_i + b \right) \ge 1 - \xi_i, \qquad w \ge 0, \qquad \xi_i \ge 0, \qquad i = 1, 2, \ldots, N $$

where $$\xi_i$$ are slack variables and C controls the trade-off between the margin and the training error. This is a standard soft-margin formulation with the additional constraint that the weights be nonnegative, so the learned weights remain valid Hough vote weights.
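An illustrative way to solve a problem of this form (not the paper's exact solver, which uses off-the-shelf optimization packages) is projected subgradient descent on the equivalent hinge-loss objective, projecting onto the nonnegativity constraint after each step. The activations and labels below are synthetic placeholders.

```python
import numpy as np

# Sketch: minimize 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w^T A_i + b))
# subject to w >= 0, via projected subgradient descent. The data here is
# synthetic; in practice A_i comes from the Hough voting stage above.

rng = np.random.default_rng(2)
N, K = 200, 5
A = rng.random((N, K))                          # activation vectors A_i
true_w = np.array([2.0, 0.0, 1.0, 0.0, 0.5])    # hypothetical "true" weights
y = np.sign(A @ true_w - np.median(A @ true_w)) # separable synthetic labels
y[y == 0] = 1.0

w, b = np.zeros(K), 0.0
C, lr = 1.0, 0.01
for _ in range(500):
    margins = y * (A @ w + b)
    viol = margins < 1                           # examples inside the margin
    grad_w = w - C * (y[viol, None] * A[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w = np.maximum(w - lr * grad_w, 0.0)         # project onto w >= 0
    b -= lr * grad_b

acc = (np.sign(A @ w + b) == y).mean()
```

The projection step `np.maximum(w - lr * grad_w, 0.0)` is what enforces the nonnegativity constraint that distinguishes this problem from a standard linear SVM.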