User:Xianteng/sandbox

Social Recommender System
A social recommender system is a personalized recommender system that takes advantage of users' social connections in online social networks. The rapid development of online social networks has encouraged web users to create social connections with their friends. Because people tend to befriend others who share similar interests, opinions, and tastes, this information can help recommender systems predict users' preferences more accurately. It narrows the search for like-minded users and improves the user experience by surfacing more relevant items.

Sociology theory
Two sociological theories support the use of social links in recommender systems: homophily and social influence. Homophily is the tendency of people to interact selectively with others who have similar social status and social values. Social status covers personal characteristics (e.g. age, religion, location) as well as positions and classes. Social values refer to knowledge state and internal beliefs. Social influence describes the tendency of people to be influenced by others and to grow to resemble their friends. Deutsch and Gerard (1955) divided social influence into two categories: normative influence and informational influence. Normative influence means that people tend to conform their behavior to a community's norms in order to be accepted as members; informational influence means that people accept information from others in order to better understand reality. Sociologists suggest that homophily and social influence are the two major reasons why people are similar to their neighbors. On the one hand, people selectively form links with others who are similar to them; on the other hand, people change their behavior and knowledge to resemble their friends. Accordingly, incorporating social links can help recommender systems find like-minded users.

Social links
Social links can be defined broadly. They may be explicit or implicit, weighted or unweighted, directed or undirected, and pairwise or group-wise. For example, Facebook requires mutual agreement between users to create a link, while Twitter supports asymmetric following relationships. Even when explicit connections are not available, implicit information can be used to construct social links, such as reviewing the same products, commenting on the same items, or belonging to the same online groups.
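As an illustration of implicit links, the following minimal sketch derives undirected, weighted links from co-reviewed items. The input format (a list of user-item pairs), the function name `implicit_links`, and the `min_shared` threshold are assumptions made for this example, not part of any particular system.

```python
from collections import defaultdict
from itertools import combinations

def implicit_links(reviews, min_shared=1):
    """Build undirected, weighted links between users who reviewed the same items.

    reviews: iterable of (user, item) pairs (hypothetical input format).
    Returns {(u, v): number of shared items}, with u < v so each pair appears once.
    """
    users_by_item = defaultdict(set)
    for user, item in reviews:
        users_by_item[item].add(user)

    weights = defaultdict(int)
    for users in users_by_item.values():
        # Every pair of users who reviewed this item shares one more item.
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return {pair: w for pair, w in weights.items() if w >= min_shared}

reviews = [("ann", "book1"), ("bob", "book1"), ("ann", "book2"),
           ("bob", "book2"), ("cat", "book2")]
links = implicit_links(reviews)            # all co-review links
strong = implicit_links(reviews, 2)        # only pairs sharing at least 2 items
```

Here the link weight is simply the number of shared items; a directed or group-wise variant would need a different data structure.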

Direct Friend-to-friend Recommendation
Direct friend-to-friend recommendation is one of the simplest recommendation strategies, imitating the "word-of-mouth" phenomenon in social networks. Sharing ideas, discussing movies, and recommending restaurants to friends are all examples of direct recommendation. Despite its simplicity, direct recommendation has shown promising potential for improving recommendation quality. Sinha and Swearingen (2001) conducted an experiment in which participants were asked to evaluate items (books and movies) suggested by recommender systems as well as by their close friends. Their results revealed that close friends consistently gave more satisfying recommendations than the recommender systems of the time. Another experiment, conducted by Bonhard et al. (2006), studied how personal familiarity between the advice-seeker and the recommender influences the judgement of recommendations. Their results showed that people were more willing to accept recommendations from friends than from strangers.

Nearest Neighbor (NN) Recommendation
Given a target user, the nearest neighbors are the users most similar to him or her. By aggregating the opinions of this neighborhood, the algorithm predicts the unknown rating that the target user would probably give to a candidate item. The traditional nearest neighbor algorithm relies only on the user-item matrix, without any additional information. As a result, the selected like-minded neighbors are often strangers whom the target user does not know at all. Another problem with traditional similarity measures is the sparsity of the user-item matrix, which makes prediction difficult for a new user who has few rating records.
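The traditional approach can be sketched as follows. This is a minimal illustration that assumes cosine similarity and a similarity-weighted average; the function names and the toy ratings are hypothetical. The user `dan` shares no rated item with anyone, so no prediction can be made for him, which illustrates the sparsity problem.

```python
import math

def cosine(a, b):
    """Cosine similarity between two users' rating dicts {item: rating}."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm = (math.sqrt(sum(x * x for x in a.values()))
            * math.sqrt(sum(x * x for x in b.values())))
    return dot / norm if norm else 0.0

def predict_knn(ratings, user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    candidates = [(cosine(ratings[user], r), r[item])
                  for other, r in ratings.items()
                  if other != user and item in r]
    top = sorted(candidates, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbor, e.g. a user with no overlapping ratings
    return sum(sim * rating for sim, rating in top) / total

ratings = {
    "ann": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "cat": {"m1": 1, "m2": 5, "m3": 2},
    "dan": {"m4": 4},
}
ann_m3 = predict_knn(ratings, "ann", "m3")   # between cat's 2 and bob's 4, closer to bob
dan_m3 = predict_knn(ratings, "dan", "m3")   # None: dan overlaps with nobody
```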

To address these issues, researchers have proposed using social links in neighborhood generation. Instead of automatically selecting neighbors from the entire user set, social link-based recommendation replaces or complements them with user-specified social links. This ensures that the selected neighbors are known to the target user and also helps with predictions for new users. The effectiveness of social links has been demonstrated in prior work. For example, Groh and Ehmig (2007) proposed social filtering as a counterpart to collaborative filtering. Unlike collaborative filtering, social filtering chooses neighbors based on social links (friendship) rather than the user-item matrix. They found that social filtering outperformed collaborative filtering in all their experimental settings. In addition, Massa and Avesani (2009) formulated a trust score for selecting neighbors in recommender systems. The trust score, which is derived from social links, measures how much the target user trusts a given neighbor. Unlike traditional approaches, their method remains applicable to users who have provided few ratings.
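A minimal sketch of neighbor selection through social links is given below. It assumes a trust-weighted average over the friends who rated the item; the `trust` dictionary, its values, and the function name are hypothetical and do not reproduce the exact formulas of the cited studies. Note that `dan` has no ratings of his own but still receives a prediction.

```python
def predict_from_friends(ratings, trust, user, item):
    """Trust-weighted average of the ratings that `user`'s friends gave to `item`.

    ratings: {user: {item: rating}}
    trust:   {user: {friend: weight}}, weights assumed to be non-negative.
    """
    weighted = [(w, ratings[f][item])
                for f, w in trust.get(user, {}).items()
                if w > 0 and item in ratings.get(f, {})]
    total = sum(w for w, _ in weighted)
    if total == 0:
        return None  # no trusted friend has rated the item
    return sum(w * r for w, r in weighted) / total

ratings = {"bob": {"m3": 4}, "cat": {"m3": 2}, "dan": {}}
trust = {"dan": {"bob": 1.0, "cat": 0.5}}   # dan trusts bob more than cat

dan_m3 = predict_from_friends(ratings, trust, "dan", "m3")  # (1.0*4 + 0.5*2) / 1.5
```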

Matrix Factorization Recommendation
Matrix factorization is an effective solution to the data sparsity problem in recommender systems. It decomposes a sparse matrix into smaller matrices whose reduced dimension represents latent features. For example, let $$\mathbf{R} \in \R^{n\times l}$$ be the original user-item matrix for $$n$$ users and $$l$$ items, where each element $$r_{ui}\in\R$$ is the rating user $$u$$ gives to item $$i$$. Then $$\mathbf{R}$$ can be decomposed into two lower-dimensional matrices $$\mathbf{P} \in \R^{n \times f}$$ and $$\mathbf{Q} \in \R^{l \times f}$$ as $$\mathbf{R} \approx \mathbf{P}\mathbf{Q}^T$$, where $$f$$ is the number of latent features. Each row $$p_u \in \R^{1\times f}$$ of $$\mathbf{P}$$ represents user $$u$$'s feature vector, and each row $$q_i \in \R^{1\times f}$$ of $$\mathbf{Q}$$ represents item $$i$$'s feature vector. The approximated user-item matrix $$\hat{\mathbf{R}} = \mathbf{P}\mathbf{Q}^T$$ is dense, with elements $$\hat{r}_{ui} = p_u {q_i}^T$$. Traditional matrix factorization relies only on the user-item matrix and tries to minimize the discrepancy between the original matrix $$\mathbf{R}$$ and the estimated matrix $$\hat{\mathbf{R}}$$:

$$\min_{\mathbf{P},\mathbf{Q}}\sum_u\sum_i\sigma_{ui}(r_{ui}-\hat{r}_{ui})^2+\lambda(||\mathbf{P}||_F^2+||\mathbf{Q}||_F^2)$$

where $$\sigma_{ui} = 1$$ if user $$u$$ has rated item $$i$$ and $$\sigma_{ui} = 0$$ otherwise, and $$\lambda$$ is the regularization parameter. The traditional method assumes that users rate items independently and ignores the fact that users are influenced by their social connections. Therefore, more advanced matrix factorization algorithms, such as social link-based co-factorization, have been proposed. Social link-based co-factorization assumes that social links also shape users' features, so it factorizes not only the user-item matrix but also the social links, with the user feature matrix $$\mathbf{P}$$ shared between the two factorizations. Let $$\mathbf{S}\in\R^{n\times n}$$ denote the user-user matrix of the social network, with element $$s_{uv}$$ describing the link between users $$u$$ and $$v$$. It can be approximated by $$\mathbf{S}\approx \mathbf{P}\mathbf{Z}^T$$, where $$\mathbf{Z}\in\R^{n\times f}$$ is an additional latent factor matrix with rows $$z_v$$, so that $$\hat{s}_{uv} = p_u {z_v}^T$$. Social link-based co-factorization is then described by:

$$\min_{\mathbf{P},\mathbf{Q},\mathbf{Z}}\sum_u\sum_i\sigma_{ui}(r_{ui}-\hat{r}_{ui})^2+\alpha\sum_u\sum_v\xi_{uv}(s_{uv}-\hat{s}_{uv})^2+\lambda(||\mathbf{P}||_F^2+||\mathbf{Q}||_F^2+||\mathbf{Z}||_F^2)$$

where $$\xi_{uv} = 1$$ if users $$u$$ and $$v$$ have a social connection and $$\xi_{uv} = 0$$ otherwise, and $$\alpha$$ controls the relative weight of the social term. The objective function simultaneously minimizes two types of discrepancy: one from the rating records and one from the social links.
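The co-factorization objective can be minimized in many ways. The sketch below uses plain stochastic gradient descent in pure Python, which is an assumption of this example rather than a method prescribed above. The hyperparameters (`f`, `alpha`, `lam`, `lr`, `epochs`), the toy data, and all function names are hypothetical. Only observed ratings and observed links are visited, which corresponds to the indicator terms in the objective.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def init_matrix(rows, f, rng):
    """Small random initial factors."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(rows)]

def co_factorize(ratings, links, n_users, n_items, f=2, alpha=0.5,
                 lam=0.02, lr=0.02, epochs=500, seed=0):
    """SGD over observed ratings (u, i, r) and observed links (u, v, s).

    P is shared by both factorizations: R ~ P Q^T and S ~ P Z^T.
    The constant factor 2 of the gradients is absorbed into the learning rate,
    and the regularization is applied per visited entry (a common approximation).
    """
    rng = random.Random(seed)
    P = init_matrix(n_users, f, rng)
    Q = init_matrix(n_items, f, rng)
    Z = init_matrix(n_users, f, rng)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for k in range(f):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - lam * pu)
                Q[i][k] += lr * (err * pu - lam * qi)
        for u, v, s in links:
            err = s - dot(P[u], Z[v])
            for k in range(f):
                pu, zv = P[u][k], Z[v][k]
                P[u][k] += lr * (alpha * err * zv - lam * pu)
                Z[v][k] += lr * (alpha * err * pu - lam * zv)
    return P, Q, Z

def objective(P, Q, Z, ratings, links, alpha=0.5, lam=0.02):
    """Value of the co-factorization objective on the observed entries."""
    fit = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    social = sum((s - dot(P[u], Z[v])) ** 2 for u, v, s in links)
    reg = sum(x * x for M in (P, Q, Z) for row in M for x in row)
    return fit + alpha * social + lam * reg

# Toy data: (user, item, rating) and (user, user, link strength).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
links = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0)]

before = objective(*co_factorize(ratings, links, 3, 3, epochs=0), ratings, links)
after = objective(*co_factorize(ratings, links, 3, 3), ratings, links)
```

Setting `alpha=0` removes the influence of the social term on the user factors, which recovers the traditional objective.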

Ensemble Method
Ensemble methods estimate a missing rating by linearly combining the target user's own predicted rating with the predicted ratings of his or her social neighbors. For example, Ma et al. (2009) proposed an ensemble method called Social Trust Ensemble. Given the learned user feature vector $$p_u$$ and item feature vector $$q_i$$, the final prediction $$\hat{r}_{ui}$$ can be computed as follows:

$$\hat{r}_{ui} = \alpha p_u {q_i}^T + (1-\alpha)\sum_v\xi_{uv}p_v {q_i}^T$$

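where $$\xi_{uv}$$ denotes the strength of the social connection between users $$u$$ and $$v$$, and $$\alpha \in [0,1]$$ balances the user's own prediction against those of the social neighbors.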
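The prediction rule can be sketched directly from the formula. This example assumes that the connection weights of each user are normalized to sum to one, so that the social part is a weighted average; the feature vectors, weights, and function name are hypothetical values chosen for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ensemble_predict(P, Q, weights, u, i, alpha=0.6):
    """alpha * own prediction + (1 - alpha) * weighted predictions of neighbors.

    weights: {user: {neighbor: connection strength}}; assumed normalized per user.
    """
    own = dot(P[u], Q[i])
    social = sum(w * dot(P[v], Q[i]) for v, w in weights.get(u, {}).items())
    return alpha * own + (1 - alpha) * social

P = {"u": [1.0, 0.0], "v": [0.0, 1.0], "w": [1.0, 1.0]}   # user feature vectors
Q = {"i": [4.0, 2.0]}                                      # item feature vectors
weights = {"u": {"v": 0.75, "w": 0.25}}

# own = 4.0, social = 0.75 * 2.0 + 0.25 * 6.0 = 3.0
blended = ensemble_predict(P, Q, weights, "u", "i")         # 0.6 * 4.0 + 0.4 * 3.0
own_only = ensemble_predict(P, Q, weights, "u", "i", alpha=1.0)
```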

Regularization Method
Regularization methods extend matrix factorization by adding extra regularization terms, known as "social regularization". Social regularization penalizes the distance between the feature vector of a user and those of his or her social friends, weighting each friend by similarity so that highly similar friends are pulled closer together. Ma et al. (2011) proposed two types of social regularization: individual-based and average-based. The former imposes constraints between a user and each of his or her social friends individually:

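$$\min_{\mathbf{P},\mathbf{Q}}\sum_u\sum_i\sigma_{ui}(r_{ui}-\hat{r}_{ui})^2+\lambda_1||\mathbf{P}||_F^2+\lambda_2||\mathbf{Q}||_F^2+\beta\sum_u\sum_{v\in\mathcal{F}_u}sim(u,v)||p_u-p_v||_F^2$$

where $$sim(u,v)$$ represents the similarity between users $$u$$ and $$v$$, $$\mathcal{F}_u$$ denotes user $$u$$'s friend set, and $$\beta$$ controls the strength of the social regularization. As in the previous sections, $$\mathbf{P}$$ and $$\mathbf{Q}$$ are the user and item feature matrices with rows $$p_u$$ and $$q_i$$. The second type of regularization constrains each user's feature vector to be close to the average of his or her social friends:

$$\min_{\mathbf{P},\mathbf{Q}}\sum_u\sum_i\sigma_{ui}(r_{ui}-\hat{r}_{ui})^2+\lambda_1||\mathbf{P}||_F^2+\lambda_2||\mathbf{Q}||_F^2+\beta\sum_u||p_u-\frac{\sum_{v\in\mathcal{F}_u}sim(u,v)p_v}{\sum_{v\in\mathcal{F}_u} sim(u,v)}||_F^2$$

where the average is a similarity-weighted combination of the feature vectors of all social friends.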
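To make the difference between the two penalties concrete, the sketch below evaluates only the social terms (the parts multiplied by $$\beta$$) for a toy example. The dictionaries holding user feature vectors, friend sets, and similarities are hypothetical, as are the function names. In the example, a user sits exactly at the average of two friends who each differ from him, so the average-based penalty is zero while the individual-based penalty is not.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def individual_penalty(features, friends, sim):
    """Sum over users and friends of sim(u, v) * ||p_u - p_v||^2."""
    return sum(sim[u][v] * sq_dist(features[u], features[v])
               for u, fs in friends.items() for v in fs)

def average_penalty(features, friends, sim):
    """Sum over users of ||p_u - similarity-weighted average of friends||^2."""
    total = 0.0
    for u, fs in friends.items():
        norm = sum(sim[u][v] for v in fs)
        if norm == 0:
            continue  # no friends or zero similarity: nothing to compare against
        avg = [sum(sim[u][v] * features[v][k] for v in fs) / norm
               for k in range(len(features[u]))]
        total += sq_dist(features[u], avg)
    return total

features = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [2.0, 0.0]}
friends = {"a": ["b", "c"]}
sim = {"a": {"b": 1.0, "c": 1.0}}

ind = individual_penalty(features, friends, sim)   # 1*1 + 1*1 = 2.0
avg = average_penalty(features, friends, sim)      # a equals the friends' average: 0.0
```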

Graph-based Recommendation
Instead of leveraging social link information indirectly, graph-based strategies operate directly on the structure of social graphs. A popular social graph is the trust graph, in which vertices are users and links indicate trust. Typically, a random walker traverses the graph by following trust links to reach reliable friends (direct or indirect) and returns the ratings that those trusted friends gave to a candidate item. In addition to homogeneous graphs, researchers also construct heterogeneous graphs that contain several types of vertices, such as items and tags. Whatever the type of graph, these methods commonly adopt random walk with restart as a generic framework for searching the graph.
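Random walk with restart can be sketched as a simple power iteration over a directed trust graph. The restart probability, the iteration count, the handling of dead ends (sending the walker back to the source), and the toy graph are all assumptions of this example. The resulting scores can be read as how strongly the source user trusts each other user, directly or indirectly.

```python
def random_walk_with_restart(graph, source, restart=0.15, iters=100):
    """Approximate visiting probabilities of a walker that follows trust links
    and jumps back to `source` with probability `restart` at every step.

    graph: {user: [trusted users]}; every user must appear as a key.
    """
    score = {n: 0.0 for n in graph}
    score[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for n, out in graph.items():
            moving = (1 - restart) * score[n]
            if not out:
                nxt[source] += moving          # dead end: return to the source
                continue
            for m in out:
                nxt[m] += moving / len(out)    # spread evenly over trusted users
        nxt[source] += restart                 # restart mass (scores sum to 1)
        score = nxt
    return score

trust_graph = {"ann": ["bob"], "bob": ["cat"], "cat": [], "dan": ["ann"]}
scores = random_walk_with_restart(trust_graph, "ann")
# bob (direct friend) scores higher than cat (friend of a friend);
# dan is unreachable from ann and keeps a score of 0.
```

A recommender built on this could, for example, weight the ratings of other users for a candidate item by these scores.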