Color moments

Color moments are measures that characterise the color distribution in an image, in the same way that central moments uniquely describe a probability distribution. Color moments are mainly used for color indexing purposes, as features in image retrieval applications, in order to compare how similar two images are based on color. Usually one image is compared against a database of digital images with pre-computed features in order to find and retrieve similar images. Each comparison between images results in a similarity score; the lower this score is, the more similar the two images are supposed to be.

Overview
Color moments are scaling and rotation invariant. It is usually the case that only the first three color moments are used as features in image retrieval applications as most of the color distribution information is contained in the low-order moments. Since color moments encode both shape and color information they are a good feature to use under changing lighting conditions, but they cannot handle occlusion very successfully. Color moments can be computed for any color model. Three color moments are computed per channel (e.g. 9 moments if the color model is RGB and 12 moments if the color model is CMYK). Computing color moments is done in the same way as computing moments of a probability distribution.

Mean
The first color moment can be interpreted as the average color in the image, and it can be calculated using the following formula:


 * $$E_i=\textstyle\sum_{j=1}^{N} \frac{1}{N}p_{ij}$$

where N is the number of pixels in the image and $$p_{ij}$$ is the value of the j-th pixel of the image at the i-th color channel.
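The mean formula can be sketched in a few lines of NumPy. This is an illustrative implementation, assuming the image is stored as an array of shape (height, width, channels); the function name is hypothetical:

```python
import numpy as np

def color_mean(image):
    """Return E_i for each color channel i: the average value of that channel."""
    pixels = image.reshape(-1, image.shape[-1])  # flatten to N x channels
    return pixels.mean(axis=0)                   # (1/N) * sum over pixels

# Tiny 2x2 RGB example
img = np.array([[[10, 20, 30], [30, 40, 50]],
                [[50, 60, 70], [70, 80, 90]]], dtype=float)
print(color_mean(img))  # [40. 50. 60.]
```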

Standard deviation
The second color moment is the standard deviation, which is obtained by taking the square root of the variance of the color distribution.


 * $$\sigma_i=\sqrt{(\frac{1}{N}\textstyle\sum_{j=1}^{N}(p_{ij}-E_i)^2)}$$

where $$E_i$$ is the mean value, or first color moment, for the i-th color channel of the image.
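A minimal sketch of the second moment, under the same assumed array layout as before. Note that the formula uses the biased (1/N) variance, which matches NumPy's default `ddof=0`:

```python
import numpy as np

def color_std(image):
    """Return sigma_i: the square root of the mean squared deviation per channel."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    mean = pixels.mean(axis=0)                              # E_i per channel
    return np.sqrt(((pixels - mean) ** 2).mean(axis=0))     # sqrt of (1/N) sum

img = np.array([[[10, 20, 30], [30, 40, 50]],
                [[50, 60, 70], [70, 80, 90]]], dtype=float)
# Equivalent shortcut: np.std(img.reshape(-1, 3), axis=0)
print(color_std(img))
```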

Skewness
The third color moment is the skewness. It measures how asymmetric the color distribution is, and thus it gives information about the shape of the color distribution. Skewness can be computed with the following formula:
 * $$s_i=\sqrt[3]{(\frac{1}{N}\textstyle\sum_{j=1}^{N}(p_{ij}-E_i)^3)}$$
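The third moment can be sketched the same way. One practical caveat in a floating-point implementation: the mean cubed deviation can be negative, and raising a negative float to the power 1/3 yields NaN in NumPy, so the sign-preserving `np.cbrt` is used instead (an implementation detail, not part of the formula itself):

```python
import numpy as np

def color_skewness(image):
    """Return s_i: the signed cube root of the mean cubed deviation per channel."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    mean = pixels.mean(axis=0)                       # E_i per channel
    m3 = ((pixels - mean) ** 3).mean(axis=0)         # (1/N) sum of cubed deviations
    return np.cbrt(m3)                               # cbrt preserves sign, unlike ** (1/3)

# A channel-wise symmetric distribution has zero skewness
img = np.array([[[10, 20, 30], [30, 40, 50]],
                [[50, 60, 70], [70, 80, 90]]], dtype=float)
print(color_skewness(img))  # [0. 0. 0.]
```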

Kurtosis
Kurtosis is the fourth color moment, and, similarly to skewness, it provides information about the shape of the color distribution. More specifically, kurtosis is a measure of how extreme the tails are in comparison to the normal distribution.

Higher-order color moments
Higher-order color moments are usually not part of the color moments feature set in image retrieval tasks as they require more data in order to obtain a good estimate of their value, and also the lower-order moments generally provide enough information.

Applications
Color moments have significant applications in image retrieval. They can be used in order to compare how similar two images are. This is a relatively new approach to color indexing. The greatest advantage of using color moments comes from the fact that there is no need to store the complete color distribution. This greatly speeds up image retrieval, since there are fewer features to compare. In addition, the first three color moments have the same units, which allows for comparison between them.

Color indexing
Color indexing is the main application of color moments. Images can be indexed, and the index will contain the computed color moments. Then, if someone has a particular image and wants to find similar images in the database, the color moments of the image of interest will also be computed. After that the following function will be used in order to compute a similarity score between the image of interest and all the images in the database:
 * $$d_{mom}(H,I)=\textstyle\sum_{i=1}^{r}\left(w_{i1}|E_i^1-E_i^2|+w_{i2}|\sigma_i^1-\sigma_i^2|+w_{i3}|s_i^1-s_i^2|\right)$$

where:
 * H and I are the color distributions of the two images that are being compared
 * i is the channel index and r is the total number of channels
 * $$E_i^1$$ and $$E_i^2$$ are the first-order moments computed for the two image distributions.
 * $$\sigma_i^1$$ and $$\sigma_i^2$$ are the second-order moments computed for the two image distributions.
 * $$s_i^1$$ and $$s_i^2$$ are the third-order moments computed for the two image distributions.
 * $$w_{i1}$$, $$w_{i2}$$, and $$w_{i3}$$ are weights, specified by the user, for each of the three color moments used.

Finally, the images in the database will be ranked according to the computed similarity score with the image of interest, and the database images with the lowest $$d_{mom}(H,I)$$ value should be retrieved. "A retrieval based on $$d_{mom}(H,I)$$ may produce false positives because the index contains no information about the correlation between the color channels".
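The similarity function above translates directly into code. In this sketch each image is summarized as a list of per-channel (mean, standard deviation, skewness) triples, and the weights are likewise per channel; the function and parameter names are illustrative:

```python
def d_mom(feats1, feats2, weights):
    """Weighted sum of absolute moment differences over all channels.

    feats1, feats2: lists of (E, sigma, s) triples, one per color channel.
    weights:        list of (w1, w2, w3) triples, chosen by the user.
    """
    score = 0.0
    for (E1, sig1, s1), (E2, sig2, s2), (w1, w2, w3) in zip(feats1, feats2, weights):
        score += w1 * abs(E1 - E2) + w2 * abs(sig1 - sig2) + w3 * abs(s1 - s2)
    return score

# Identical feature vectors give a score of 0 (a perfect match)
print(d_mom([(1.0, 2.0, 3.0)], [(1.0, 2.0, 3.0)], [(1.0, 1.0, 1.0)]))  # 0.0
```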

Example
A simple and concise example of the use of color moments for image retrieval tasks is illustrated below.

Consider having several test images in a database and a "New Image". The goal is to retrieve images from the database that are similar to the "New Image". The first three color moments are used as features. There are several steps in this computation.


 * 1) Image preprocessing (optional) - For example, in this step all images could be resized to the same dimensions (in pixels). However, since color moments are invariant to scaling, it is not necessary to make all images the same width and height.
 * 2) Computing the features - Use the color moments formulae in order to compute the first three moments for each of the color channels in the image. For example, if the HSV color space is used, this means that for each of the images, 9 features in total will be computed (the first three order moments for the Hue, Saturation, and Value channels).
 * 3) Calculating the similarity score - After computing the color moments, the weights for each of the moments in the $$d_{mom}(H,I)$$ function should be determined by the user. The weights have to be adjusted for each application, depending on the conditions and quality of the images. Following that, the $$d_{mom}(H,I)$$ function is used to calculate a similarity score between the "New Image" and each of the images in the database.
 * 4) Ranking and image retrieval - From the previous step the $$d_{mom}(H,I)$$ values were obtained. Now a comparison of these values can be made in order to decide which of the images in the database are more similar to the "New Image", and thus rank the database images accordingly. The smaller the $$d_{mom}(H,I)$$ value is the more similar the two color distributions are supposed to be. Finally, some of the top ranked images (the ones with the smallest $$d_{mom}(H,I)$$ value) from the database are retrieved.
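The four steps above can be sketched end to end. This is an illustrative pipeline with uniform weights and randomly generated stand-in images (real use would load and decode actual image files and tune the weights); all names are hypothetical:

```python
import numpy as np

def moments(image):
    """Steps 1-2: flatten and compute 3 moments per channel (9 features for 3 channels)."""
    p = image.reshape(-1, image.shape[-1]).astype(float)
    E = p.mean(axis=0)
    sigma = np.sqrt(((p - E) ** 2).mean(axis=0))
    s = np.cbrt(((p - E) ** 3).mean(axis=0))
    return np.concatenate([E, sigma, s])

def rank(query, database, weights=None):
    """Steps 3-4: score every database image against the query and rank ascending."""
    qf = moments(query)
    w = np.ones(qf.size) if weights is None else np.asarray(weights)
    scores = {name: float(np.sum(w * np.abs(qf - moments(img))))
              for name, img in database.items()}
    return sorted(scores, key=scores.get)  # smallest score first = most similar

rng = np.random.default_rng(0)
query = rng.random((8, 8, 3))                       # the "New Image"
db = {"copy": query.copy(),                         # identical image, score 0
      "noise": rng.random((8, 8, 3))}               # unrelated image
print(rank(query, db))                              # "copy" should rank first
```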