Node influence metric

In graph theory and network analysis, node influence metrics are measures that rank or quantify the influence of every node (also called vertex) within a graph. They are related to centrality indices. Applications include measuring the influence of each person in a social network, understanding the role of infrastructure nodes in transportation networks, the Internet, or urban networks, and the participation of a given node in disease dynamics.

Origin and development
The traditional approach to understanding node importance is via centrality indicators. Centrality indices are designed to produce a ranking which accurately identifies the most influential nodes. Since the mid-2000s, however, social scientists and network physicists have questioned the suitability of centrality indices for understanding node influence. Centralities may indicate the most influential nodes, but they are much less informative for the vast majority of nodes, which are not highly influential.

Borgatti and Everett's 2006 review article showed that the accuracy of centrality indices is highly dependent on network topology. This finding has been repeatedly observed since then. In 2012, Bauer and colleagues pointed out that centrality indices only rank nodes but do not quantify the difference between them. In 2013, Sikic and colleagues presented strong evidence that centrality indices considerably underestimate the power of non-hub nodes. The reason follows from the earlier finding: the accuracy of a centrality measure depends on network topology, but complex networks have heterogeneous topology. Hence a centrality measure which is appropriate for identifying highly influential nodes will most likely be inappropriate for the remainder of the network.

This has inspired the development of novel methods designed to measure the influence of all network nodes. The most general of these are the accessibility, which uses the diversity of random walks to measure how accessible the rest of the network is from a given start node, and the expected force, derived from the expected value of the force of infection generated by a node. Both of these measures can be meaningfully computed from the structure of the network alone.

Accessibility
The accessibility is derived from the theory of random walks. It measures the diversity of self-avoiding walks which start from a given node. A walk on a network is a sequence of adjacent vertices; a self-avoiding walk visits (lists) each vertex at most once. The original work used simulated walks of length 60 to characterize the network of urban streets in a Brazilian city. It was later formalized as a modified form of hierarchical degree which controls for both transmission probabilities and the diversity of walks of a given fixed length.

Definition
The hierarchical degree measures the number of nodes reachable from a start node by performing walks of length $$h$$. For a fixed $$h$$ and walk type, each of these reachable nodes $$j$$ is reached with a (potentially different) probability $$p_j^{(h)}$$. Given a vector of such probabilities, the accessibility of node $$i$$ at scale $$h$$ is defined as


 * $$\kappa_i^{(h)} = \exp \left( - \sum_j p_j^{(h)} \log p_j^{(h)} \right) $$

The probabilities can be based on uniform-probability random walks, or additionally modulated by edge weights and/or explicit (per edge) transmission probabilities.
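
The definition above can be illustrated with a minimal Python sketch. It assumes the simplest case mentioned in the text, uniform-probability random walks of exactly $$h$$ steps on an unweighted graph, and does not enforce the self-avoiding constraint. The function names, the adjacency-dictionary representation, and the example star graph are illustrative choices, not part of any published implementation. Every node is assumed to have at least one neighbor.

```python
import math


def walk_probabilities(adj, start, h):
    """Distribution over end nodes of an h-step uniform random walk.

    adj maps each node to a list of its neighbors (assumed non-empty).
    Walks here may revisit nodes; this is a simplifying assumption.
    """
    probs = {start: 1.0}
    for _ in range(h):
        nxt = {}
        for node, p in probs.items():
            neighbors = adj[node]
            share = p / len(neighbors)  # uniform choice among neighbors
            for nb in neighbors:
                nxt[nb] = nxt.get(nb, 0.0) + share
        probs = nxt
    return probs


def accessibility(adj, start, h):
    """Exponential of the Shannon entropy of the h-step walk distribution."""
    probs = walk_probabilities(adj, start, h)
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return math.exp(entropy)


# Hypothetical example: a star graph with hub 0 and leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
hub_score = accessibility(star, 0, 1)   # four equally likely end nodes
leaf_score = accessibility(star, 1, 1)  # a single certain end node
```

In this example the hub's one-step accessibility is approximately 4, since four nodes are reached with equal probability, while a leaf's is 1, since its walk can only end at the hub. The measure therefore behaves as an effective number of reachable nodes. Edge weights or per-edge transmission probabilities would replace the uniform `share` in the sketch.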

Applications
The accessibility has been shown to reveal community structure in urban networks, to correspond to the number of nodes which can be visited in a defined time period, and to be predictive of the outcome of epidemiological SIR model spreading processes on networks with large diameter and low density.

Expected force
The expected force measures node influence from an epidemiological perspective. It is the expected value of the force of infection generated by the node after two transmissions.

Definition
The expected force of a node $$i$$ is given by


 * $$\kappa_i = - \sum_{j \in J} d_j \log(d_j)$$

where the sum is taken over the set $$J$$ of all possible transmission clusters resulting from two transmissions starting from $$i$$. Each cluster consists either of node $$i$$ and two of its neighbors, or of $$i$$, one of its neighbors (called infected), and a neighbor of the infected neighbor. $$J$$ contains all possible orderings of the transmission events, so two clusters may contain the same nodes if those nodes were infected in a different order. $$d_j$$ is the normalized cluster degree of cluster $$j \in J$$. The cluster degree is the number of edges with exactly one endpoint in cluster $$j$$; it is normalized by dividing by the sum of the cluster degrees over all clusters in $$J$$, so that the $$d_j$$ sum to one.

The definition naturally extends to directed networks by limiting the enumeration $$J$$ by edge direction. Likewise, extension to weighted networks, or networks with heterogeneous transmission probabilities, is a matter of adjusting the normalization of $$d_j$$ to include the probability that the cluster forms. It is also possible to use more than two transmissions to define the set $$J$$.
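
The following Python sketch shows one way to compute the expected force for an undirected, unweighted graph without self-loops. It rests on stated assumptions: clusters are enumerated as ordered pairs of transmission events (a first transmission from the seed to a neighbor, then a second from either infected node to a susceptible node), which is one reading of "all possible orderings" above, and cluster degrees are normalized by their sum. The function names and the example path graph are hypothetical.

```python
import math


def cluster_degree(adj, cluster):
    """Number of edges with exactly one endpoint inside the cluster."""
    return sum(1 for node in cluster for nb in adj[node] if nb not in cluster)


def expected_force(adj, seed):
    """Entropy of normalized cluster degrees after two transmissions.

    adj maps each node to a list of neighbors (symmetric, no self-loops).
    Each (first transmission, second transmission) sequence contributes
    one cluster; this enumeration is an assumption of the sketch.
    """
    degrees = []
    for first in adj[seed]:
        infected = {seed, first}
        for source in (seed, first):
            for second in adj[source]:
                if second not in infected:
                    degrees.append(cluster_degree(adj, infected | {second}))
    total = sum(degrees)
    if total == 0:
        return 0.0  # no cluster has any outgoing edge
    return -sum((d / total) * math.log(d / total) for d in degrees if d > 0)


# Hypothetical example: a path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
center_force = expected_force(path, 2)
end_force = expected_force(path, 0)
```

On this path graph the central node yields four clusters with degrees 2, 1, 2 and 1, whereas an end node yields a single cluster, so the central node receives the higher score. Restricting the inner loops to outgoing edges would give the directed variant described above.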

Applications
The expected force has been shown to strongly correlate with SI, SIS, and SIR epidemic outcomes over a broad range of network topologies, both simulated and empirical. It has also been used to measure the pandemic potential of world airports, and mentioned in the context of digital payments, ecology, fitness, and project management.

Other approaches
Other authors propose metrics which explicitly encode the dynamics of a specified process unfolding on the network. The dynamic influence is the proportion of infinite walks starting from each node, where walk steps are scaled such that the linear dynamics of the system are expected to converge to a non-null steady state. The impact sums, over increasing walk lengths, the probability that a walk transmits to its end node and that this end node has not previously been visited by a shorter walk. While both measures predict the outcome of the dynamical systems they encode well, in each case the authors acknowledge that results from one dynamic do not translate to other dynamics.