VisualRank

VisualRank is a system for finding and ranking images by analysing and comparing their content, rather than searching image names, Web links or other text. Google scientists made their VisualRank work public in a paper describing applying PageRank to Google image search at the International World Wide Web Conference in Beijing in 2008.

Methods
Both computer vision techniques and locality-sensitive hashing (LSH) are used in the VisualRank algorithm. Consider an image search initiated by a text query. An existing search technique based on image metadata and surrounding text is used to retrieve the initial result candidates (PageRank), which along with other images in the index are clustered in a graph according to their similarity (which is precomputed). Centrality is then measured on the clustering, which will return the most canonical image(s) with respect to the query. The idea here is that agreement between users of the web about the image and its related concepts will result in those images being deemed more similar. VisualRank is defined iteratively by $$VR = S^* \times VR$$, where $$S^*$$ is the image similarity matrix. As matrices are used, eigenvector centrality will be the measure applied, with repeated multiplication of $$VR$$ and $$S^*$$ producing the eigenvector we're looking for. Clearly, the image similarity measure is crucial to the performance of VisualRank since it determines the underlying graph structure.

The main VisualRank system begins with local feature vectors being extracted from images using scale-invariant feature transform (SIFT). Local feature descriptors are used instead of color histograms as they allow similarity to be considered between images with potential rotation, scale, and perspective transformations. Locality-sensitive hashing is then applied to these feature vectors using the p-stable distribution scheme. In addition to this, LSH amplification using AND/OR constructions are applied. As part of the applied scheme, a Gaussian distribution is used under the $\ell_2$ norm.