Robust collaborative filtering

Robust collaborative filtering, or attack-resistant collaborative filtering, refers to algorithms or techniques that aim to make collaborative filtering more robust against manipulation, while ideally maintaining recommendation quality. Such manipulation usually takes the form of shilling attacks, also called profile injection attacks. Collaborative filtering predicts a user's ratings of items by finding similar users and examining their ratings, and because it is possible to create a nearly unlimited number of user profiles in an online system, collaborative filtering becomes vulnerable when many fake profiles are introduced into the system. Several approaches have been suggested to improve the robustness of both model-based and memory-based collaborative filtering. However, robust collaborative filtering remains an active research field, and major applications are yet to come.

Introduction
One of the biggest challenges to collaborative filtering is the shilling attack: malicious users or a competitor may deliberately inject a certain number of fake profiles into the system (typically 1~5% of all profiles) in such a way that they degrade recommendation quality or even bias the predicted ratings in their favor. The main shilling attack strategies are random attacks, average attacks, bandwagon attacks, and segment-focused attacks.

Random attacks insert profiles that give random ratings to a subset of items; average attacks give each item its mean rating. Bandwagon and segment-focused attacks are newer and more sophisticated attack models. Bandwagon attack profiles give random ratings to a subset of items and the maximum rating to very popular items, in an effort to increase the chance that these fake profiles have many neighbors. A segment-focused attack is similar to a bandwagon attack, but it gives the maximum rating to items that are expected to be highly rated by the target user group, rather than to frequently rated items.
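The three attack models above differ only in how the filler ratings are chosen. A minimal sketch of profile generators, using hypothetical item counts, popular-item ids, and a promoted target item (none of these come from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 20
item_means = rng.uniform(1, 5, n_items)   # assumed per-item mean ratings
popular = [0, 1, 2]                       # assumed "very popular" item ids
target = 5                                # item the attacker wants promoted

def random_attack_profile(filler_size=5):
    """Random attack: random ratings on a filler subset, max rating on the target."""
    profile = np.full(n_items, np.nan)    # NaN = item not rated by this profile
    filler = rng.choice(n_items, filler_size, replace=False)
    profile[filler] = rng.integers(1, 6, filler_size)
    profile[target] = 5.0
    return profile

def average_attack_profile(filler_size=5):
    """Average attack: each filler item gets its observed mean rating."""
    profile = np.full(n_items, np.nan)
    filler = rng.choice(n_items, filler_size, replace=False)
    profile[filler] = item_means[filler]
    profile[target] = 5.0
    return profile

def bandwagon_attack_profile(filler_size=5):
    """Bandwagon attack: random filler ratings plus max rating on popular items."""
    profile = random_attack_profile(filler_size)
    profile[popular] = 5.0
    return profile
```

The average attack is harder to detect than the random attack because its filler ratings match the observed rating distribution; the bandwagon variant trades some of that stealth for a higher chance of landing in many users' neighborhoods.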

In general, item-based collaborative filtering is known to be more robust than user-based collaborative filtering. However, item-based collaborative filtering is still not completely immune to bandwagon and segment-focused attacks.

Robust collaborative filtering typically works as follows:

 1. Build a spam-user detection model.
 2. Follow the workflow of a regular collaborative filtering system, but use only the rating data of non-spam users.

User relationships
This is a detection method suggested by Gao et al. to make memory-based collaborative filtering more robust. Popular metrics used in collaborative filtering to measure user similarity include the Pearson correlation coefficient, interest similarity, and cosine distance (refer to memory-based CF for definitions). A recommender system can detect attacks by exploiting the fact that the distributions of these metrics change when spam users are present in the system. Because a shilling attack injects not a single fake profile but a large number of similar fake profiles, these spam users exhibit unusually high similarity with one another compared with normal users.
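The shift in the similarity distribution is easy to reproduce on synthetic data. The following sketch (an illustrative simulation, not the data or metric set used by Gao et al.) compares average pairwise Pearson correlation among genuine users with that among near-identical injected profiles:

```python
import numpy as np

rng = np.random.default_rng(1)

# 30 genuine users with varied tastes over 15 items
genuine = rng.integers(1, 6, size=(30, 15)).astype(float)

# 10 injected profiles: copies of one template with tiny noise
spam = np.tile(rng.integers(1, 6, size=(1, 15)), (10, 1)).astype(float)
spam += rng.normal(0, 0.1, spam.shape)

ratings = np.vstack([genuine, spam])
corr = np.corrcoef(ratings)          # pairwise Pearson similarity between users
np.fill_diagonal(corr, np.nan)       # ignore self-similarity

genuine_sim = np.nanmean(corr[:30, :30])   # average similarity among genuine users
spam_sim = np.nanmean(corr[30:, 30:])      # average similarity among spam users
```

Here `spam_sim` sits close to 1 while `genuine_sim` stays near 0, which is exactly the distributional gap the detector exploits.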

The entire system works as follows. Given a rating matrix, it runs a density-based clustering algorithm on the user-relationship metrics to detect spam users, and assigns a weight of 0 to spam users and a weight of 1 to normal users. That is, the system considers only ratings from normal users when computing predictions. The rest of the algorithm works exactly the same as normal item-based collaborative filtering.
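The detection step can be sketched with a DBSCAN-style core-point test over correlation distance: a user surrounded by many near-duplicates is flagged as spam. The `eps` and `min_pts` values below are illustrative assumptions, not parameters reported by Gao et al.:

```python
import numpy as np

def flag_spam_users(ratings, eps=0.05, min_pts=5):
    """Density-based spam flagging: a user is flagged if at least `min_pts`
    other users lie within correlation-distance `eps`. Genuine tastes are
    rarely packed that densely, but injected copies of one profile are."""
    corr = np.corrcoef(ratings)
    dist = 1.0 - corr                 # correlation distance between users
    np.fill_diagonal(dist, np.inf)    # a user is not its own neighbor
    neighbors = (dist < eps).sum(axis=1)
    spam = neighbors >= min_pts
    return np.where(spam, 0.0, 1.0)   # weight 0 for spam, 1 for normal users

rng = np.random.default_rng(2)
genuine = rng.integers(1, 6, size=(30, 15)).astype(float)
spam = np.tile(rng.integers(1, 6, size=(1, 15)), (8, 1)).astype(float)
weights = flag_spam_users(np.vstack([genuine, spam]))

# Downstream item-based CF would then use only rows with weight 1
normal_ratings = np.vstack([genuine, spam])[weights == 1.0]
```

With the 0/1 weights in hand, the prediction step is unchanged item-based collaborative filtering restricted to the normal-user rows.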

According to experimental results on MovieLens data, this robust CF approach preserves accuracy compared with normal item-based CF while being more stable: predictions of normal CF shift by 30-40% when spam user profiles are injected, whereas this robust approach shifts by only about 5-10%.