
= Rating-Based Collaborative Filtering =

Rating-Based Collaborative Filtering is a branch of Collaborative Filtering (CF) that uses only numeric measures of a user's preference for an item. Collaborative Filtering is a technique for recommender systems that focuses on presenting each user only those items they are most likely to be interested in.

History
In the early 1990s, the idea of helping users focus only on the items they will like set the stage for collaborative filtering recommendation systems. The insight behind collaborative filtering recommender systems is that people have relatively stable tastes. Therefore, if two people have agreed in the past, they will likely continue to agree in the future. A key part of this insight – and the major difference between this and content-based recommendation strategies – is that it doesn't matter why two users agree. They could share tastes in books because they both like the same style of book binding, or they could share taste in movies due to a nuance of directing; a collaborative filtering recommender system doesn't care. As long as the two users continue agreeing, we can use the stated preferences of one user to predict the preferences of another. Since very little supplementary information is needed, collaborative filtering algorithms are applicable in a wide range of circumstances.

Since the mid-1990s, collaborative filtering recommender systems have become very popular in industry. Amazon, Netflix, Google, Facebook, and many others have deployed collaborative filtering algorithms to help their users find things they would enjoy. The popularity of these deployments has pushed the field of recommender systems forward, leading to faster, more accurate recommender systems. These improvements have been coupled with changes in how we think about deploying collaborative filtering systems to support users. Originally, collaborative filtering systems were used as simple filters that helped users isolate interesting news articles from a broader stream of information. As recommender systems have grown larger, it has become evident that users may not want a filter, since they do not need to see every good item. Instead, it is now common to think of these algorithms as recommending a short list of the best items for a target user, even if that means some good items will be missed.

Concepts and Notation
The two most central objects in a recommendation system are the users the system recommends to and the items the system might recommend. One user represents one independently tracked account for recommendation. Typically this represents one system account, and is assumed to represent one person's tastes. We will denote the set of all users as $$U$$ with $$u,v,w \in U$$ being individual users from the set. One item represents one independently tracked thing that can be recommended. Most systems only track one kind of thing, which maps naturally to items for recommendation. Some systems track things at multiple levels of detail, some more applicable for recommendation than others. We will denote the set of all items as $$I$$ with $$i,j,k \in I$$ being individual items from the set.

In a rating-based collaborative filtering system, ratings are collected from users on a given rating scale, such as the 1-to-5 star scale used by MovieLens, Amazon, and Netflix, or the ten-star scale used by the Internet Movie Database. Often the choice of rating scale is not relevant to a recommendation algorithm; some interfaces, however, deserve special consideration. Small scales such as a binary (thumbs up, thumbs down) scale or unary "like" scales may require special adaptations when applying algorithms designed with a larger scale in mind.

Recommender systems often represent the set of all ratings as a sparse rating matrix $$R$$, which is a $$|U|\times|I|$$ matrix. $$R$$ only has values at $$r_{ui}\in R$$ where user $$u$$ has rated item $$i$$; for all other pairs of $$u$$ and $$i$$, $$r_{ui}$$ is blank. The set of all items rated by a user $$u$$ is $$I_u\subset I$$; the collection of all ratings by one user can also be expressed as a sparse vector $$r_u$$. Similarly, the set of all users who have rated one item $$i$$ is $$U_i \subset U$$, and the collection of these ratings can be expressed as a sparse vector $$r_i$$.
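
The sparse structures above ($$R$$, $$I_u$$, $$U_i$$, $$r_u$$) can be sketched with plain dictionaries; the user and item ids and rating values below are illustrative assumptions:

```python
# A minimal sketch of the sparse rating representation: only observed
# (user, item) pairs are stored, everything else is implicitly blank.

ratings = {  # r_ui for observed pairs only
    ("u1", "i1"): 4.0, ("u1", "i2"): 3.0,
    ("u2", "i1"): 5.0, ("u3", "i2"): 2.0,
}

def items_rated_by(u):
    """I_u: the set of items rated by user u."""
    return {i for (v, i) in ratings if v == u}

def users_who_rated(i):
    """U_i: the set of users who have rated item i."""
    return {u for (u, j) in ratings if j == i}

def rating_vector_for_user(u):
    """r_u as a sparse vector (dict mapping item -> rating)."""
    return {i: r for (v, i), r in ratings.items() if v == u}
```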

Algorithm
In collaborative filtering, there are two different tasks:
 * 1) Prediction task: predict what a target user $$u$$ would rate an item $$i$$. Rating predictions can be used by users to quickly evaluate an unknown item: if the item is predicted highly, it might be worth further consideration.
 * 2) Ranking task: generate a personalized ranking of the item set for each user.

Any prediction algorithm can be used to generate rankings, but not all ranking algorithms produce scores that can be thought of as a prediction. We will use the syntax $$S(u,i)$$ to represent the output of both types of algorithm.

Baseline Predictors
The global baseline is $$S(u,i)=\mu$$, which simply predicts the average rating value across all users and items. It can be trivially improved by using a different constant for every item or user, leading to the item baseline $$S(u,i)=\mu_i$$ and the user baseline $$S(u,i)=\mu_u$$ respectively. However, most rating scales are not well anchored: two users might use different rating values to express the same preference for an item. This leads to the generic form of the baseline algorithm, the user-item bias model, which includes $$\mu$$, the average rating; $$b_i$$, the item bias, representing whether an item is, on average, rated better or worse than average; and $$b_u$$, the user bias, representing whether the user tends to rate high or low on average. These functions usually contain a damping term $$\beta$$ to avoid poor performance when a user or item has very few ratings. Damping values of $$\beta$$ between 5 and 25 usually work well, but for best results $$\beta$$ should be re-tuned for any given system.

$$S(u,i) = \mu +b_u + b_i $$

$$\mu = \frac{\sum_{r_{ui}\in R} r_{ui}}{|R|}$$

$$b_i = \frac{\sum_{u\in U_i} (r_{ui}-\mu)}{|U_i|} \ \ \ or \ \ \ b_i = \frac{\sum_{u\in U_i} (r_{ui}-\mu)}{|U_i|+\beta}$$

$$b_u = \frac{\sum_{i\in I_u} (r_{ui}-b_i-\mu)}{|I_u|} \ \ \ or \ \ \ b_u = \frac{\sum_{i\in I_u} (r_{ui}-b_i-\mu)}{|I_u|+\beta}$$
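
The damped baseline formulas above can be sketched directly; the tiny rating set and the damping value $$\beta = 5$$ are illustrative assumptions:

```python
# Sketch of the damped user-item bias baseline S(u,i) = mu + b_u + b_i.

ratings = {("u1", "i1"): 4.0, ("u1", "i2"): 2.0,
           ("u2", "i1"): 5.0, ("u2", "i2"): 3.0}
beta = 5.0  # damping term; should be re-tuned for a real system

mu = sum(ratings.values()) / len(ratings)  # global mean rating

def item_bias(i):
    """Damped item bias: average deviation of item i's ratings from mu."""
    devs = [r - mu for (u, j), r in ratings.items() if j == i]
    return sum(devs) / (len(devs) + beta)

def user_bias(u):
    """Damped user bias: average leftover deviation after mu and b_i."""
    devs = [r - item_bias(i) - mu for (v, i), r in ratings.items() if v == u]
    return sum(devs) / (len(devs) + beta)

def predict(u, i):
    """Baseline prediction S(u, i) = mu + b_u + b_i."""
    return mu + user_bias(u) + item_bias(i)
```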

Prediction Algorithm
Based on machine learning taxonomies, prediction algorithms can be separated into memory-based algorithms and model-based algorithms. The most important rating-based collaborative filtering algorithms in these two groups are nearest neighbor algorithms (memory-based) and matrix factorization algorithms (model-based), respectively.

Nearest Neighbor Algorithms
Nearest neighbor algorithms were the first collaborative filtering algorithms, and they are the most direct implementation of the idea behind collaborative filtering: simply find users who have agreed with the current user in the past and use their ratings to make predictions for the current user.
 * User-User Nearest Neighbor

For a given user $$u$$ and item $$i$$, the algorithm generates a set of users who are similar to $$u$$ and have rated item $$i$$. The set of similar users is normally referred to as the user's neighborhood $$N_u$$, with the subset who have rated an item $$i$$ being $$N_{ui}$$. Once we have the set of neighbors, we take a weighted average of their ratings on the item as the prediction. Usually, Pearson's r is used as the similarity function $$sim(u,v)$$. However, Pearson correlation has two significant weaknesses:
 * 1) It is only defined over the items rated by both users, and a decision needs to be made on how to handle ratings for other items. The standard statistical way of handling this is to ignore those ratings completely when computing the correlation.
 * 2) The correlation between any two users who have rated exactly two of the same items is 1, and in general, correlations based on small numbers of common ratings tend toward extreme values supported by little data. When looking for similar neighbors to produce high-quality predictions, it is better to prefer neighbors whose similarity is well substantiated by the data.
 * Item-based Nearest Neighbor

Item-based nearest neighbor is more popular than user-user nearest neighbor. A key advantage of the item-item algorithm is that item similarities and neighborhoods can be shared between users; furthermore, in systems where the set of users is much larger than the set of items, we would expect the average item to have very many ratings. In addition, item-based nearest neighbor is efficient, since the similarity model can be stored in memory.

One interesting variation of nearest neighbor algorithms is the k-furthest neighbors algorithm, which researchers have used to increase the diversity of a recommendation system.
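
The user-user scheme can be sketched as follows: Pearson's r over co-rated items as $$sim(u,v)$$, then a similarity-weighted average of neighbors' ratings as the prediction. The rating data are illustrative assumptions:

```python
import math

# User-user nearest neighbor sketch: Pearson correlation over co-rated
# items, then a similarity-weighted average of neighbors' ratings.

ratings = {
    "u": {"a": 4, "b": 3, "c": 5},
    "v": {"a": 5, "b": 2, "c": 5, "d": 4},
    "w": {"a": 2, "b": 5, "d": 1},
}

def pearson(u, v):
    """Pearson's r computed only over items rated by both users."""
    common = set(ratings[u]) & set(ratings[v])  # weakness 1: co-rated only
    if len(common) < 2:
        return 0.0
    mu_u = sum(ratings[u][i] for i in common) / len(common)
    mu_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den = (math.sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common)) *
           math.sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(u, i):
    """Similarity-weighted average of the ratings of u's neighbors on i."""
    neighbours = [(pearson(u, v), r[i]) for v, r in ratings.items()
                  if v != u and i in r]
    num = sum(s * r for s, r in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return num / den if den else None
```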

Matrix Factorization Algorithm
The matrix factorization algorithm is a kind of latent feature model. Its goal is to find a user preference vector $$p_u$$ and an item feature-relevance vector $$q_i$$ that satisfy $$S(u,i) = p_u\cdot q_i = \sum_{f=1}^k p_{uf}q_{if}$$. There are two common ways to approach this problem:
 * Singular Value Decomposition (SVD)

A key difficulty with the singular value decomposition in its natural form is that it is only defined over complete matrices, but most of $$R$$ is unknown. In a rating-based system, this problem can be addressed by imputation, i.e. assuming a default value (e.g. the item's mean rating) for unknown entries. If the rating matrix is normalized by subtracting a baseline prediction before being decomposed, then the unknown values can be left as $$0$$'s and the normalized matrix $$R'$$ processed using standard sparse matrix routines.
 * Training Matrix Decomposition Models

The goal of training a matrix decomposition model is to learn matrices $$P$$ ($$|U|\times k$$) and $$Q$$ ($$|I|\times k$$) such that predicting the known ratings in $$R$$ with the product $$PQ^T$$ has minimal squared error ($$\min \|R-PQ^T\|^2_2$$). To account for user and item biases, the formula can be rewritten as a biased matrix factorization model:

$$S(u,i) = \mu +b_i+b_u+p_u\cdot q_i$$

where $$\mu$$, $$b_i$$, $$b_u$$, $$p_u$$, and $$q_i$$ can be learned by a stochastic gradient descent algorithm.
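
A minimal sketch of learning the biased matrix factorization model with stochastic gradient descent; the rating data, factor count $$k$$, learning rate, and regularization strength are illustrative assumptions:

```python
import random

# Train S(u,i) = mu + b_u + b_i + p_u . q_i by stochastic gradient descent,
# minimizing squared error on the observed ratings with L2 regularization.

random.seed(0)
ratings = [("u1", "i1", 4.0), ("u1", "i2", 2.0),
           ("u2", "i1", 5.0), ("u2", "i2", 3.0)]
k, lr, reg, epochs = 2, 0.01, 0.02, 200  # illustrative hyperparameters

mu = sum(r for _, _, r in ratings) / len(ratings)
users = {u for u, _, _ in ratings}
items = {i for _, i, _ in ratings}
b_u = {u: 0.0 for u in users}
b_i = {i: 0.0 for i in items}
p = {u: [random.gauss(0, 0.1) for _ in range(k)] for u in users}
q = {i: [random.gauss(0, 0.1) for _ in range(k)] for i in items}

def score(u, i):
    """Biased matrix factorization prediction."""
    return mu + b_u[u] + b_i[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

for _ in range(epochs):
    for u, i, r in ratings:
        e = r - score(u, i)                  # prediction error
        b_u[u] += lr * (e - reg * b_u[u])    # update biases
        b_i[i] += lr * (e - reg * b_i[i])
        for f in range(k):                   # update latent factors
            puf, qif = p[u][f], q[i][f]
            p[u][f] += lr * (e * qif - reg * puf)
            q[i][f] += lr * (e * puf - reg * qif)
```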

Learn-to-rank Algorithm
Learning-to-rank algorithms are a recently popular class of algorithms from the broader field of machine learning. As their name suggests, these algorithms directly learn how to produce good rankings of items, instead of the indirect approach taken by the Top-N recommendation strategy. While there are several approaches, most learning-to-rank algorithms produce rankings by learning a ranking function. Like a prediction, a ranking function scores each user-item pair individually. Unlike predictions, however, the ranking score has no deliberate relationship with rating values, and is only interesting for its ranked order. Based on the relationships they consider, learning-to-rank algorithms can be divided into three classes:
 * Pointwise algorithm: This category covers traditional prediction algorithms, as well as some algorithms that allow for a broader treatment of rating data, such as the OrdRec algorithm.
 * Pairwise algorithm:   Pairwise algorithms focus on pairs of items, usually representing that a user prefers one item over another. Example: Bayesian Personalized Ranking (BPR) algorithm.
 * Listwise algorithm: Listwise algorithms focus on properties of the entire set of items for a given user, such as which item should be ranked first.
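
As a concrete illustration of the pairwise approach, here is a minimal sketch of a BPR-style gradient step, which nudges a preferred item's score above a less-preferred item's score; the factor model and hyperparameters are illustrative assumptions:

```python
import math
import random

# BPR-style pairwise update: for a user u who prefers item i over item j,
# take a gradient ascent step on ln sigma(x_uij), where
# x_uij = score(u, i) - score(u, j).

random.seed(1)
k, lr, reg = 2, 0.05, 0.01  # illustrative hyperparameters
p = {"u": [random.gauss(0, 0.1) for _ in range(k)]}
q = {i: [random.gauss(0, 0.1) for _ in range(k)] for i in ("i", "j")}

def score(u, i):
    """Ranking score: dot product of user and item factors."""
    return sum(a * b for a, b in zip(p[u], q[i]))

def bpr_step(u, i, j):
    """One stochastic gradient step for the preference i > j."""
    x_uij = score(u, i) - score(u, j)
    g = 1.0 / (1.0 + math.exp(x_uij))   # d ln sigma(x)/dx = sigma(-x)
    for f in range(k):
        puf, qif, qjf = p[u][f], q[i][f], q[j][f]
        p[u][f] += lr * (g * (qif - qjf) - reg * puf)
        q[i][f] += lr * (g * puf - reg * qif)
        q[j][f] += lr * (-g * puf - reg * qjf)

for _ in range(500):
    bpr_step("u", "i", "j")  # repeatedly observe: u prefers i over j
```

After training, the preferred item is ranked above the other; only the order of the scores matters, not their values.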

Other Algorithms

 * Probabilistic Model: Each cluster (profile) has a distinct probability distribution over movies, describing the movies that cluster tends to watch, and, for each movie, a probability distribution over ratings for that cluster. Examples: PLSI (probabilistic latent semantic indexing) and LDA (latent Dirichlet allocation).
 * Linear Regression Approaches: In Slope One, we compute an average offset between all pairs of items, then predict an item $$i$$ for a user by applying the offset to each of the user's other ratings and taking a weighted average. Example: the Slope One recommendation algorithm.
 * Graph-based recommender algorithms: These algorithms impose a graph structure by connecting users to the items they have rated. Graph-based algorithms can, for example, combine content-based and collaborative filtering approaches.
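
The Slope One approach mentioned above can be sketched in a few lines; the rating data are illustrative assumptions:

```python
# Slope One sketch: compute the average rating offset between item pairs,
# then predict an item by applying each offset to the user's other ratings
# and averaging, weighted by how many users co-rated each pair.

ratings = {
    "u1": {"a": 4, "b": 3},
    "u2": {"a": 5, "b": 4, "c": 3},
    "u3": {"b": 2, "c": 4},
}

def dev(i, j):
    """Average offset r_ui - r_uj over users who rated both, plus count."""
    diffs = [r[i] - r[j] for r in ratings.values() if i in r and j in r]
    return (sum(diffs) / len(diffs), len(diffs)) if diffs else (0.0, 0)

def predict(u, i):
    """Weighted Slope One prediction of user u's rating for item i."""
    num = den = 0.0
    for j, r_uj in ratings[u].items():
        d, n = dev(i, j)
        if n:
            num += (d + r_uj) * n
            den += n
    return num / den if den else None
```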

Ensemble Recommendation
To improve the performance of a recommendation system, different algorithms can be combined. A simple ensemble takes a linear blend of the component algorithms' scores:

$$S(u,i) = \alpha + \beta_1 S_1(u,i) + \beta_2 S_2(u,i)$$

Metrics and Evaluation
Every recommendation system is different, and no single algorithm has been found that works best for all systems. Different systems tend to have subtly different data properties, usage patterns, and recommendation needs. In general, the best way of knowing which recommender is best for a system is to test the system with multiple algorithms and real users, but such testing is expensive. To address this problem, recommender systems researchers have developed ways to evaluate a recommender algorithm offline, without real users. Offline evaluations are a common evaluation strategy in machine learning and information retrieval, but they make it difficult to evaluate how well the system supports discovery of new preferences and favorites. The best practice is therefore a combination of offline and online tests: use offline evaluation to narrow the possible algorithms down to a small set of best choices, then make the final selection of the best algorithm based on an online evaluation (lab study, virtual lab study, or A/B test).

Prediction Metrics

 * Prediction coverage metric (coverage metric): simply the percentage of user-item pairs in the whole system (or in the test set) for which a prediction can be generated.
 * Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
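
MAE and RMSE can be sketched directly from their definitions; the prediction/actual pairs below are illustrative:

```python
import math

# Prediction accuracy metrics over (prediction, actual) rating pairs:
# MAE is the mean absolute error, RMSE the root mean squared error.

pairs = [(3.5, 4.0), (2.0, 2.0), (4.5, 3.0)]  # illustrative test set

mae = sum(abs(p - a) for p, a in pairs) / len(pairs)
rmse = math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))
```

RMSE penalizes large errors more heavily than MAE, so it is always at least as large as MAE on the same data.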

Ranking Quality

 * Spearman ρ or Pearson r correlation coefficients.
 * Discounted cumulative gain metric (DCG).
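
A minimal sketch of DCG (and its normalized variant, nDCG) using the standard logarithmic position discount; the relevance values are illustrative:

```python
import math

# Discounted cumulative gain: relevance at rank 1 counts fully, and each
# later rank k is discounted by log2(k). nDCG divides by the DCG of the
# ideal (best possible) ordering, giving a score in (0, 1].

def dcg(relevances):
    """DCG of a ranked list: rel_1 + sum over k >= 2 of rel_k / log2(k)."""
    return relevances[0] + sum(
        rel / math.log2(k) for k, rel in enumerate(relevances[1:], start=2))

ranked = [3.0, 2.0, 0.0, 1.0]          # relevance in recommended order
ideal = sorted(ranked, reverse=True)   # best possible ordering
ndcg = dcg(ranked) / dcg(ideal)
```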

Decision Support Metrics

 * Confusion matrix, Precision, Recall and F-Score
 * ROC curve and AUC

Other Metrics

 * Fallout metric: measures how often bad items are recommended, computed as the percentage of recommended items that are known to be bad.
 * Mean average precision metric (MAP)
 * Mean reciprocal rank metric (MRR)