Draft:Selective search

In computer vision and image processing, selective search is a method of object detection that takes an image and captures objects within that image in defined spaces called bounding boxes (each defined by an origin point, a height, and a width). These boxes then act as Regions of Interest (ROIs) for an image classifier, which determines what object is present in each bounding box.[1]
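A minimal sketch of the bounding-box idea: given a box stored as an origin point plus width and height, the corresponding Region of Interest is a crop of the image array. The helper name `crop_roi` and the `(x, y, w, h)` ordering are illustrative assumptions, not from the article.

```python
import numpy as np

def crop_roi(image: np.ndarray, box: tuple) -> np.ndarray:
    """Cut the Region of Interest for a box (origin x, origin y, width, height)."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]

image = np.zeros((100, 100, 3), dtype=np.uint8)  # blank 100x100 RGB image
roi = crop_roi(image, (10, 20, 30, 40))          # box at (10, 20), 30 wide, 40 tall
print(roi.shape)  # (40, 30, 3)
```

In a detection pipeline, each such crop would be handed to the classifier to decide which object (if any) the box contains.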

The selective search method attempts to make object detection less computationally taxing than exhaustive search while capturing the benefits of segmentation: the boundary lines of the boxes are established based on the shape of the object being classified.

It avoids having to make many correct guesses about its parameters by using an ensemble of segmentations rather than a single model. It also avoids a fixed bounding-box size: by combining segments into regions of varying sizes, rather than searching from a fixed-size grid overlaid on the image as computationally feasible exhaustive approaches do, it can detect objects at different scales.

Uses
It is the fundamental Region of Interest extractor for the original Region-Based Convolutional Neural Network (R-CNN) as well as for Fast R-CNN.

How it works
The image is taken in and split up into an initial set of small starting regions by the fast method of Felzenszwalb and Huttenlocher.[2] A greedy algorithm is then used to iteratively group regions together: for a given region, the similarities between it and all neighbouring regions are calculated, and the two most similar regions are merged. This occurs recursively until a single region spanning the image results. The Hierarchical Grouping Algorithm is as follows:

Input: (color) image
Output: set of object location hypotheses L

Obtain initial regions R = {r1, ..., rn} using the fast method of Felzenszwalb and Huttenlocher
Initialize similarity set S = Ø
foreach neighboring region pair (ri, rj) do
    Calculate similarity s(ri, rj)
    S = S ∪ s(ri, rj)

while S ≠ Ø do
    Get highest similarity s(ri, rj) = max(S)
    Merge corresponding regions rt = ri ∪ rj
    Remove similarities regarding ri: S = S \ s(ri, r*)
    Remove similarities regarding rj: S = S \ s(r*, rj)
    Calculate similarity set St between rt and its neighbors
    S = S ∪ St
    R = R ∪ rt

Extract object location boxes L from all regions in R
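The grouping loop above can be sketched in Python under toy assumptions: regions are sets of pixel indices, and the similarity is a size-only stand-in for the paper's combined colour/texture/size/fill measure (small regions merge first). All names here are illustrative, not from the paper.

```python
def similarity(ra, rb, image_size):
    # Toy stand-in for s(ri, rj): favours merging small regions first.
    return 1.0 - (len(ra) + len(rb)) / image_size

def hierarchical_grouping(regions, neighbours, image_size):
    """regions: dict id -> set of pixels; neighbours: set of frozenset id pairs."""
    locations = [frozenset(r) for r in regions.values()]  # initial hypotheses
    S = {p: similarity(regions[min(p)], regions[max(p)], image_size)
         for p in neighbours}
    next_id = max(regions) + 1
    while S:
        a, b = tuple(max(S, key=S.get))            # most similar pair
        merged = regions.pop(a) | regions.pop(b)   # rt = ri U rj
        # neighbours of the merged region = union of the old neighbours
        nbrs = {x for p in S if a in p or b in p for x in p} - {a, b}
        S = {p: s for p, s in S.items() if a not in p and b not in p}
        regions[next_id] = merged
        for n in nbrs:
            S[frozenset((next_id, n))] = similarity(merged, regions[n], image_size)
        locations.append(frozenset(merged))        # R = R U rt
        next_id += 1
    return locations
```

Every region produced along the way is kept as an object location hypothesis, which is what makes the method multi-scale.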

Selective search relies on variety in its assembly of larger regions, and it accomplishes this variety in three ways: using a variety of color spaces with different invariance properties, using different similarity measures s(ri, rj), and varying the starting regions of the calculation.

The similarity measure s(ri, rj) in the above algorithm is a combination of four complementary measures:

s(ri, rj) = a1 s_colour(ri, rj) + a2 s_texture(ri, rj) + a3 s_size(ri, rj) + a4 s_fill(ri, rj),   ai ∈ {0,1}

where each ai denotes whether the corresponding component measure is used.
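A minimal sketch of this combined measure: the four component similarities (colour, texture, size, fill) are assumed to have been computed elsewhere and to lie in [0, 1]; each weight ai ∈ {0, 1} switches one component on or off. The function name and argument shapes are illustrative assumptions.

```python
def combined_similarity(components, a):
    """components: (s_colour, s_texture, s_size, s_fill); a: four 0/1 flags."""
    assert all(ai in (0, 1) for ai in a)  # each weight only switches a term on/off
    return sum(ai * si for ai, si in zip(a, components))

# e.g. use only the colour and fill similarities:
print(combined_similarity((0.75, 0.5, 0.25, 1.0), (1, 0, 0, 1)))  # 1.75
```

Varying which flags are on is one of the diversification strategies mentioned above.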

Efficient Graph-Based Image Segmentation: Felzenszwalb and Huttenlocher (Fast Method for Image Component Generation)
For the initial segments list R above, a method has to be employed to generate regions of the image that have some kind of structure. In other words, pixels have to be associated into groups that share some kind of meaning.

These groups are called components and the segments list (R) is filled with them.

The method used in the original selective search paper was created by Felzenszwalb and Huttenlocher.[2]

Its algorithm goes as follows:

Input: Graph(Vertices, Edges)
Output: Segmentation list(Components), where Components = list(vertices)

NOTE: the image used to generate the graph typically has a Gaussian blur of radius 0.8 applied to reduce artifacting in the final segmentation list.

Edges = list(Edge)
Edge = (Vertex1, Vertex2, weight)
Component = (list(vertices), internal_diff: int)
THRESHOLD = an arbitrary value used to grow or shrink the number of total components by increasing or decreasing the threshold for merging components when a comparison for merging occurs.

Sorted_Edges = list of Edges from Graph sorted from smallest to largest by weight

Segmentation_list = []

for vertex in Vertices:
    Segmentation_list.append(Component(vertex, internal_diff = 0))

for edge in Sorted_Edges:
    if edge.v1 is not in the same component as edge.v2
       and edge.weight <= min(Component(v1).internal_diff + THRESHOLD,
                              Component(v2).internal_diff + THRESHOLD):
        merge the two components
        merged component internal_diff = max(edge.weight, v1.internal_diff, v2.internal_diff)
        update Segmentation_list with the union of the two components
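A compact Python sketch of this merge loop, using a union-find (disjoint-set) structure. As in the pseudocode above, THRESHOLD (k) is treated as a constant added to each component's internal difference; note that the original paper actually scales it per component as tau(C) = k / |C|. All names are illustrative.

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.internal_diff = [0.0] * n  # max edge weight inside each component

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

def segment(num_vertices, edges, k):
    """edges: iterable of (v1, v2, weight); returns the union-find forest."""
    ds = DisjointSet(num_vertices)
    for v1, v2, w in sorted(edges, key=lambda e: e[2]):  # smallest weight first
        a, b = ds.find(v1), ds.find(v2)
        if a != b and w <= min(ds.internal_diff[a] + k,
                               ds.internal_diff[b] + k):
            ds.parent[b] = a
            # edges arrive in ascending order, so w is the new max internal edge
            ds.internal_diff[a] = w
    return ds

# Toy 4-vertex graph: two cheaply connected pairs joined by one expensive edge.
forest = segment(4, [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 10.0)], k=5)
```

With k = 5, the two cheap edges merge their endpoints but the weight-10 edge exceeds both components' internal difference plus the threshold, so two components remain.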