Network homophily

Network homophily refers to the theory in network science which states that, based on node attributes, similar nodes may be more likely to attach to each other than dissimilar ones. The hypothesis is linked to the model of preferential attachment and draws on the phenomenon of homophily in the social sciences; much of the scientific analysis of the creation of social ties based on similarity comes from network science. Empirical research seems to indicate that homophily occurs frequently in real networks. Homophily in social relations may lead to a commensurate distance in networks, creating the clusters that have been observed in social networking services. Homophily is a key topic in network science because it can determine how quickly information and ideas diffuse.
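The definition can be made concrete with a simple check. Compare the share of edges that join nodes with the same attribute against the share expected if pairs of nodes were linked at random. The sketch below is a minimal illustration. It assumes an undirected simple graph given as a list of node pairs plus a dictionary of node attributes. The function name `homophily_ratio` and the toy graph are hypothetical.

```python
from collections import Counter


def homophily_ratio(edges, node_type):
    """Return (observed, expected) shares of same-type edges.

    observed: fraction of edges whose two endpoints share an attribute value.
    expected: chance that two distinct nodes picked uniformly at random share one.
    """
    edges = list(edges)
    same = sum(1 for u, v in edges if node_type[u] == node_type[v])
    observed = same / len(edges)
    # Random-pairing baseline based only on group sizes.
    counts = Counter(node_type.values())
    n = len(node_type)
    expected = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return observed, expected


# Hypothetical toy graph: two groups of three nodes and a single cross-group edge.
node_type = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "y"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]
observed, expected = homophily_ratio(edges, node_type)
print(round(observed, 3), round(expected, 3))  # prints 0.833 0.4
```

An observed share well above the baseline points towards homophily. This baseline ignores node degrees, however, so it does not separate homophily from degree effects.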

Node attributes and homophily
The existence of network homophily may necessitate a closer examination of node attributes, in contrast to other theories of network evolution that focus on network properties. These theories often assume that nodes are identical and that the evolution of a network is determined by characteristics of the broader network, such as degree. Degree heterogeneity, in which a large number of nodes have few links and a few nodes have many, is also observed as a prevalent phenomenon. It may be linked to homophily because the two can leave similar signatures in a network: a large number of excess links caused by degree heterogeneity might be confused with homophily.
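One common way to stop degree heterogeneity from masquerading as homophily is to weight each attribute group by its share of edge endpoints rather than its share of nodes. Newman's attribute assortativity coefficient uses this baseline. Under it, a group that simply has more high-degree members is not credited with extra homophily. The sketch below computes that coefficient for an undirected edge list. It illustrates one possible baseline and is not the measure used by the studies cited in this article. The helper name and toy graph are hypothetical.

```python
from collections import Counter


def attribute_assortativity(edges, node_type):
    """Newman's assortativity coefficient for a categorical attribute (undirected graph).

    Compares the observed share of same-type edges with the share expected if edge
    endpoints were paired at random, so each type is weighted by its total degree.
    Undefined (division by zero) if every endpoint has the same type.
    """
    edges = list(edges)
    m = len(edges)
    observed = sum(1 for u, v in edges if node_type[u] == node_type[v]) / m
    # Count edge endpoints per type: high-degree nodes contribute more.
    endpoint_counts = Counter()
    for u, v in edges:
        endpoint_counts[node_type[u]] += 1
        endpoint_counts[node_type[v]] += 1
    baseline = sum((c / (2 * m)) ** 2 for c in endpoint_counts.values())
    return (observed - baseline) / (1 - baseline)


node_type = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "y"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]
print(round(attribute_assortativity(edges, node_type), 3))  # prints 0.657
```

The coefficient is 1 when every edge is same-type and negative when cross-type edges exceed the degree-weighted baseline. For real data, NetworkX's `attribute_assortativity_coefficient` computes the same quantity.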

Influence on network evolution
Kim and Altmann (2017) find that homophily may affect the evolution of the degree distribution of scale-free networks. More specifically, homophily may bias the cumulative degree distribution towards a convex shape instead of the often hypothesised concave shape. Thus, homophily can significantly (and uniformly) affect the emergence of scale-free networks influenced by preferential attachment, regardless of the type of seed network (e.g. whether it is centralized or decentralized), although the size of clusters might affect the magnitude of relative homophily. A higher level of homophily can be associated with a more convex cumulative degree distribution instead of a concave one. Although its effect is less salient, the link density of the network might also lead to short-term, localized deviations in the shape of the distribution.
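The mechanism can be explored with a toy growth model. Each newcomer attaches to existing nodes with probability proportional to their degree. That weight is multiplied by a homophily factor `h` when the target shares the newcomer's type. This is a minimal sketch in the spirit of the setting described above and not Kim and Altmann's model. The two-type setup, the clique seed network, and the parameter names `h`, `m`, and `p_type` are all assumptions.

```python
import random
from collections import Counter


def grow_homophilic_pa(n_nodes, m=2, h=1.0, p_type=0.5, seed=0):
    """Grow a network by degree-proportional attachment reweighted by type similarity.

    h (> 0) is a hypothetical homophily weight: a same-type target's weight is
    multiplied by h, so h = 1 recovers plain preferential attachment.
    Returns the final degree and type of every node.
    """
    rng = random.Random(seed)
    # Seed network: a clique of m + 1 nodes, each with degree m.
    types = [0 if rng.random() < p_type else 1 for _ in range(m + 1)]
    degree = [m] * (m + 1)
    for new in range(m + 1, n_nodes):
        t = 0 if rng.random() < p_type else 1
        weights = [degree[j] * (h if types[j] == t else 1.0) for j in range(new)]
        # Draw m distinct targets according to the weights.
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(range(new), weights=weights)[0])
        for j in targets:
            degree[j] += 1
        types.append(t)
        degree.append(m)
    return degree, types


def cumulative_degree_distribution(degree):
    """Return sorted (k, P(K >= k)) pairs."""
    counts = Counter(degree)
    n = len(degree)
    result, remaining = [], n
    for k in sorted(counts):
        result.append((k, remaining / n))
        remaining -= counts[k]
    return result


degree, types = grow_homophilic_pa(1000, m=2, h=5.0, seed=1)
for k, p in cumulative_degree_distribution(degree)[:5]:
    print(k, p)
```

Plotting the cumulative distribution on log-log axes for `h = 1` and for larger `h` is one way to look for the change in shape discussed here. The quadratic cost per run keeps this suitable only for small networks.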

In the development of the shape of the cumulative degree distribution curve, the link structure of existing nodes (among themselves and with new nodes) and homophily work against each other: the former leads to concavity and homophily leads to convexity. Consequently, there is a level of homophily at which the two effects cancel each other out and the cumulative degree distribution becomes linear on a log-log scale. These opposing effects may explain the large variety of shapes observed in empirical studies of real complex networks. A low level of homophily could then be linked to the concave cumulative degree distributions observed in networks of Facebook wall posts, Flickr users, and message boards, while linear shapes have been noted in networks of software class dependencies, Yahoo adverts, and YouTube users. Compared with these two shapes, convexity seems to be much less prevalent, with examples in the Google Plus and Filmtipset networks. This can be explained by the argument that high levels of homophily may significantly decrease the viability of networks, making convexity less frequent in complex networks.
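A measured curve can be labelled concave, linear, or convex on a log-log scale with one crude heuristic. Compare the curve with the straight chord joining its first and last points. A concave curve lies above the chord, a convex one lies below it, and a pure power law lies on it. The sketch below assumes the distribution is given as sorted `(k, P(K >= k))` pairs with positive values and at least two distinct degrees. The function name and the averaging of gaps are illustrative choices, not a standard statistical test.

```python
import math


def loglog_curvature(ccdf):
    """Mean signed gap between a log-log CCDF and the chord joining its end points.

    ccdf: sorted (k, P(K >= k)) pairs, k > 0, P > 0, at least two distinct k.
    Positive: the curve lies above the chord (concave).
    Negative: it lies below the chord (convex).
    Near zero: roughly a straight line, as for a pure power law.
    """
    pts = [(math.log(k), math.log(p)) for k, p in ccdf]
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    slope = (y1 - y0) / (x1 - x0)
    gaps = [y - (y0 + slope * (x - x0)) for x, y in pts]
    return sum(gaps) / len(gaps)


power_law = [(1, 1.0), (2, 0.5), (4, 0.25), (8, 0.125)]
exponential = [(k, math.exp(-(k - 1))) for k in (1, 2, 3, 4)]
print(loglog_curvature(power_law))    # near 0: straight line
print(loglog_curvature(exponential))  # positive: concave
```

On empirical data, the sparse and noisy tail can dominate a chord-based summary. Binning or fitting over a restricted degree range may therefore be needed before the sign is meaningful.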

Long-run convergence
In the long run, the types of a node's connections tend to converge to the type distribution of the network as a whole when network-based search is unbiased. Nevertheless, younger nodes might show some bias in their connections. Bias may arise during network development through random meetings and network-based search, the two main processes through which new agents connect to established nodes. Bramoullé et al. (2012) illustrate this with a study of the citation network of physics journals from the American Physical Society (APS) between 1985 and 2003. In this context, new nodes join the network in two stages: authors first find an article or reference at random, though possibly with a bias by type, and then discover further references through the citations in popular articles. Because similar articles are likely to cite similar references, bias may arise in the formation of connections. Convergence is described by three notions of integration: weak integration, long-run integration, and partial integration.
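The two-stage process can be sketched as a toy simulation. Each new article first finds a few existing articles by random meeting. With probability `bias`, that search is restricted to articles of its own type. It then cites some of the references of the articles it found, using unbiased network-based search. This is an illustrative model built on assumptions: two types, and the hypothetical parameters `m_r`, `m_s`, `bias`, and `p_type`. It is not the specification used by Bramoullé et al. (2012). For every article, the simulation records the types of the articles that cite it.

```python
import random


def grow_citation_network(n_nodes, m_r=2, m_s=2, bias=0.8, p_type=0.5, seed=0):
    """Toy growth: type-biased random meeting, then unbiased network-based search.

    bias: hypothetical probability that a newcomer's random meetings are drawn
    only from nodes of its own type. Types are booleans (two types).
    Returns node types and, for each node, the types of the nodes citing it.
    """
    rng = random.Random(seed)
    start = m_r + m_s
    types = [rng.random() < p_type for _ in range(start)]
    refs_of = [[] for _ in range(start)]      # out-links (references)
    citer_types = [[] for _ in range(start)]  # types of in-neighbours (citers)
    for new in range(start, n_nodes):
        t = rng.random() < p_type
        own = [j for j in range(new) if types[j] == t]
        # Stage 1: random meeting, restricted to own type with probability bias.
        pool = own if own and rng.random() < bias else list(range(new))
        found = rng.sample(pool, min(m_r, len(pool)))
        # Stage 2: network-based search among the references of the articles found.
        candidates = sorted({r for j in found for r in refs_of[j]} - set(found))
        searched = rng.sample(candidates, min(m_s, len(candidates)))
        for j in found + searched:
            citer_types[j].append(t)
        types.append(t)
        refs_of.append(found + searched)
        citer_types.append([])
    return types, citer_types


def same_type_share(types, citer_types, nodes):
    """Share of citations to the given nodes that come from same-type citers."""
    same = total = 0
    for j in nodes:
        same += sum(1 for t in citer_types[j] if t == types[j])
        total += len(citer_types[j])
    return same / total if total else float("nan")


types, citer_types = grow_citation_network(2000, seed=42)
print("oldest 10%:", same_type_share(types, citer_types, range(200)))
print("middle-aged 40-60%:", same_type_share(types, citer_types, range(800, 1200)))
```

Comparing the same-type share among citers of older articles with that of younger ones is one way to look for the age effect discussed in this section. With `p_type = 0.5`, citers drawn without regard to type would give a share near 0.5.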

Weak integration states that well-established nodes have a higher tendency to create new connections than young nodes, regardless of node type; thus bias in link probabilities is eliminated over time as nodes age. Long-run integration states that the types of the neighbours of any node will converge to the global distribution of types in the network as a whole, which eliminates biases among neighbouring nodes. Partial integration causes the distribution of types among neighbouring nodes to move monotonically towards the global distribution over time, although some bias may remain in the limit.
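Each notion of integration can be checked by tracking, as a node ages, how far its neighbourhood is from the network-wide mix of types. The sketch below measures that gap with total variation distance. The choice of distance and the helper names are assumptions made for illustration. Under long-run integration the gap would tend to zero with age. Under partial integration it would shrink monotonically but could level off above zero.

```python
from collections import Counter


def type_distribution(labels):
    """Relative frequency of each type in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def integration_gap(neighbour_types, all_types):
    """0 when a node's neighbours mirror the network-wide type mix; larger means more biased."""
    return total_variation(type_distribution(neighbour_types), type_distribution(all_types))


all_types = ["A"] * 6 + ["B"] * 4
print(integration_gap(["A", "A", "A", "A"], all_types))       # 0.4
print(integration_gap(["A", "A", "A", "B", "B"], all_types))  # 0.0
```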

Homophily leads new nodes to connect to similar nodes with a higher probability, but this bias is weaker among a node's second-degree neighbours than among its first-degree neighbours. Over time, connections created by network-based search become more prevalent (as the number of neighbours grows). Because second-degree connections contain an increasing share of randomly found nodes, the connections of older nodes become more diverse and less influenced by homophily. Thus the citations of an older article are likely to come from a wider variety of subjects and scientific research fields.
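The first- versus second-degree comparison can be computed directly from an adjacency list. Take the nodes adjacent to a given node, then the nodes exactly two steps away, and compute the same-type share in each group. The sketch assumes an undirected graph stored as a dictionary of neighbour lists. The name `same_type_share_by_distance` and the toy graph are hypothetical.

```python
def same_type_share_by_distance(adj, node_type, node):
    """Same-type share among a node's direct neighbours and among nodes exactly two steps away."""
    first = set(adj[node])
    second = {w for v in first for w in adj[v]} - first - {node}

    def share(group):
        if not group:
            return float("nan")
        return sum(node_type[w] == node_type[node] for w in group) / len(group)

    return share(first), share(second)


adj = {
    "a": ["b", "c", "d"], "b": ["a", "e"], "c": ["a", "f"], "d": ["a", "g"],
    "e": ["b"], "f": ["c"], "g": ["d"],
}
node_type = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "x", "g": "y"}
first, second = same_type_share_by_distance(adj, node_type, "a")
print(round(first, 3), round(second, 3))  # prints 0.667 0.333
```

For a directed citation network, the same idea applies if `adj` holds only the chosen link direction, for example the articles that cite each article.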