User:ErWenn/Network robustness

In graph theory and network analysis, the "robustness" of a graph or class of graphs measures its resilience (in terms of connectivity) to the removal of edges or vertices. Robustness can be formalized in a variety of ways, depending on how connectivity is measured, whether edges or vertices are removed, and how the edges or vertices to be removed are chosen.

Robustness is of practical concern to those working with physical networks, including those involved in gene regulation, disease transmission, and the Internet. A gene or computer network that is robust with respect to random removals retains much of its functionality in the face of random failures. If the elimination of vertices or edges is directed in some fashion, then robustness implies resilience to targeted attacks. This kind of robustness is also of interest to epidemiologists who want to break up disease transmission networks by targeted vaccinations.

There are many ways to measure the connectedness of a graph, ranging from simply deciding whether the graph is connected to computing the graph's diameter (i.e., the maximum length of the shortest path between any two vertices of the graph). For the purposes of studying robustness, the size of the graph's largest component is frequently used as the measure of connectivity. Of particular interest is whether the graph has a giant component (a component containing a majority of the graph's vertices). The giant component is also sometimes referred to as a "spanning cluster", which is related, but not identical, to the concept of the same name in percolation theory. Although this connectivity measure is commonly used, some have claimed that it is not the most appropriate way to measure robustness for certain networks.

The robustness of a graph greatly depends on the graph's structure. In particular, the vulnerability of a network to random failures has been connected to its degree distribution. (The degree of a vertex is the number of neighbors it has, and the degree distribution describes how many vertices have each possible degree.) Randomly removing vertices from a random graph with a Poisson distribution of degrees will usually break the graph up into many small components fairly quickly. Performing the same procedure on a graph with a long-tail degree distribution (in which a small number of vertices have extremely large degrees) tends to leave most of the graph connected. The effects of structure on robustness are studied by examining the effect of vertex and edge deletions both on specific graphs representing particular physical networks and on abstract graphs generated by stochastic methods designed to produce particular structural features. Classes of such random graphs are analyzed both by purely theoretical means and by large-scale computation.

Network Models
The simplest method for producing random graphs is the Erdős–Rényi model, in which a fixed number of vertices are randomly connected so that each pair of vertices has the same probability of being connected as any other pair. Erdős–Rényi random graphs are often referred to as "ER random graphs" or often just "random graphs". The degree distribution of such a graph is approximately a Poisson distribution: most vertices have degrees fairly close to the average degree, with the number of vertices trailing off exponentially as the degree moves away from that average.
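
The ER model is simple enough to sketch directly. The following pure-Python implementation (function names are illustrative, not taken from any particular library) generates a G(n, p) graph as an adjacency list and tallies its degree distribution:

```python
import random

def er_graph(n, p, seed=None):
    """Generate an Erdős–Rényi G(n, p) graph as an adjacency list:
    each of the n*(n-1)/2 possible edges is present independently
    with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def degree_distribution(adj):
    """Map each degree to the number of vertices with that degree."""
    counts = {}
    for nbrs in adj.values():
        counts[len(nbrs)] = counts.get(len(nbrs), 0) + 1
    return counts

g = er_graph(1000, 0.01, seed=1)
dist = degree_distribution(g)
# Degrees cluster near the mean n*p = 10, with counts falling off
# quickly for degrees far from the mean (approximately Poisson).
```

Tallying `dist` for larger and larger graphs makes the Poisson shape of the distribution increasingly apparent.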

Unfortunately, many real-world networks (including the Internet, gene regulation networks, and contact networks for sexually transmitted diseases) do not display this kind of distribution. The degree distributions of such networks include small but statistically significant numbers of high-degree nodes. In graphs with such long-tail degree distributions, the presence of these high-degree nodes skews the average degree in such a way that the degrees no longer cluster around it. In fact, for some classes of networks with long-tail distributions, the average degree may not even be well-defined (technically, the average degree of any finite network is definable, but as the number of nodes increases, the average degree grows without bound, so it is not a meaningful measure for the class as a whole). There is much potential variety amongst long-tail distributions, but many real-world networks have distributions that are well-approximated by a power law, i.e., one in which the number $$P(n)$$ of nodes with degree $$n$$ is directly proportional to some fixed power $$-\gamma$$ of $$n$$:


 * $$P(n) \propto n^{-\gamma}$$
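
The claim that the average degree can grow without bound is easy to check numerically. The sketch below (a direct computation, not a simulation; the particular exponents 1.8 and 3 are chosen for illustration) computes the mean of a power-law degree distribution truncated at a maximum degree. For $$\gamma = 3$$ the mean settles down as the truncation point grows, while for $$\gamma = 1.8$$ it keeps increasing:

```python
def truncated_mean(gamma, max_degree):
    """Mean of the power-law distribution P(n) proportional to n**-gamma,
    with degrees n ranging from 1 to max_degree."""
    degrees = range(1, max_degree + 1)
    weights = [n ** -gamma for n in degrees]
    z = sum(weights)  # normalizing constant
    return sum(n * w for n, w in zip(degrees, weights)) / z

# gamma = 3: the mean barely moves as the cutoff grows (it converges).
# gamma = 1.8: the mean keeps growing with the cutoff, so the "average
# degree" depends on how large the network is allowed to get.
```

This is the precise sense in which the average degree is not a meaningful summary statistic for heavy-tailed networks.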

Networks with power-law degree distributions are often called "scale-free networks". Scale-free networks are encountered quite frequently, so models that generate graphs with power-law distributions are often employed to study how this kind of structure affects robustness. The well-known Barabási–Albert (BA) model is used to generate graphs with power-law distributions with $$\gamma = 3$$. In this model, vertices are added to the graph one at a time, and each new vertex is connected to existing vertices with probability directly proportional to those vertices' degrees. Models of scale-free networks with different values of $$\gamma$$ can be generated using algorithms similar to the BA model in that new vertices are connected to old vertices with a probability dependent on the old vertices' degrees.
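
Preferential attachment can be sketched in a few lines. The implementation below is a generic illustration (not any particular library's code), using the common trick of keeping a list with one copy of each vertex per incident edge, so that a uniform draw from the list samples vertices in proportion to their degree:

```python
import random

def ba_graph(n, m, seed=None):
    """Barabási–Albert-style preferential attachment: each new vertex
    attaches m edges to existing vertices chosen with probability
    proportional to degree. Requires n > m + 1."""
    rng = random.Random(seed)
    # Start from a clique on m+1 vertices so every vertex has degree >= m.
    adj = {v: set() for v in range(m + 1)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # 'targets' lists each vertex once per incident edge, so a uniform
    # draw from it picks a vertex with probability proportional to degree.
    targets = [v for v, nbrs in adj.items() for _ in nbrs]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:          # m distinct attachment points
            chosen.add(rng.choice(targets))
        adj[new] = set(chosen)
        for v in chosen:
            adj[v].add(new)
            targets += [new, v]
    return adj

g = ba_graph(500, 2, seed=0)
# Early vertices accumulate far more edges than late ones: the
# "rich get richer" dynamic that produces the power-law tail.
```

Even at 500 vertices, the resulting graph already contains a few hub vertices with degrees far above the minimum of `m`.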

Such models are successful in capturing the scale-free nature of many real-world networks, but they do not necessarily reflect other structural features (such as clustering) found in many of these networks. There are also real-world networks that are neither scale-free nor well-modeled by ER random graphs. As a result, many other generative models are in use.

Robustness of Diameter
Even when a graph remains connected despite failures or attacks, the distance between nodes can increase. For computer networks, this may result in slower response times. The diameter of a graph (i.e., the maximum length of the shortest path between any two vertices) can be used to study the effect of failures and attacks on this kind of interconnectedness. When a random vertex is removed from a random graph (generated by the ER model), the diameter of the graph tends to grow. Subsequent random removals will likely increase the diameter by about the same amount each time; in other words, the diameter of the graph increases roughly linearly in the number of removed vertices. If the same procedure is performed on a scale-free graph (generated by the BA model), there is essentially no effect on the diameter of the graph, even when up to 5% of the nodes are removed. Results like this are usually verified both by simulating many random graphs and, whenever possible, by theoretical means (in this case through the use of percolation theory). These features have been observed in many real-world scale-free networks (such as the Internet and the World Wide Web), which helps explain why random failures of Internet routers have little impact on overall connectivity.
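
Experiments of this kind follow a simple recipe: measure the diameter, delete a random fraction of the vertices, and measure again (averaging over many trials in practice). A minimal sketch, using BFS and restricting the diameter to pairs of vertices that remain connected:

```python
import random
from collections import deque

def bfs_distances(adj, src):
    """Shortest-path distances from src to every reachable vertex."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def diameter(adj):
    """Longest shortest path between any two mutually reachable
    vertices (the usual diameter when the graph is connected)."""
    return max((max(bfs_distances(adj, s).values(), default=0) for s in adj),
               default=0)

def remove_random_vertices(adj, fraction, seed=None):
    """Delete a given fraction of the vertices, chosen uniformly at random."""
    rng = random.Random(seed)
    doomed = set(rng.sample(sorted(adj), int(fraction * len(adj))))
    return {v: nbrs - doomed for v, nbrs in adj.items() if v not in doomed}

# A cycle on 20 vertices has diameter 10; deleting random vertices
# splits it into paths, whose diameter can then be re-measured.
cycle = {v: {(v - 1) % 20, (v + 1) % 20} for v in range(20)}
```

Replacing the toy cycle with ER or BA graphs from the models above reproduces the contrast described in the text.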

Things are different if the removal of vertices is targeted at those that are most well-connected (i.e., those of highest degree). For random graphs, the result is much the same as in the random-failure situation, with the diameter growing at almost exactly the same rate. However, the diameter of a BA scale-free graph grows linearly (in the number of highest-degree vertices removed) at a rate much greater than that for a random graph. Again, this behavior has been verified computationally and theoretically, and observed in many real-world networks.

Robustness of the Giant Component
The diameter of a network is important in terms of how strong the connections in a network are, but it's not as useful for studying what happens when the network is no longer connected at all. In the case of computer networks, it is desirable to avoid having large portions of the network cut off from the rest of the network. For this reason, many measures of robustness focus on the size of the largest component of the network. In most situations, this component contains a majority of the network's nodes, and is referred to as the "giant component". In the ideal initial situation, the giant component is the entire network, but in reality, there are often small components completely disconnected from the rest of the network. As long as the vast majority of the network is still connected, the network is considered healthy.

Random failures affect the size $$S$$ of the largest component in a much more complex way than they do the diameter. For many graphs (including random graphs), with a small number of removed nodes, $$S$$ decreases slowly, but the rate of decrease accelerates as the fraction of removed nodes approaches some fixed number $$f^e_c$$, called the fragmentation threshold. After this point, $$S$$ drops rapidly, quickly fragmenting the network into a large number of small, disconnected components. For large random graphs, the fragmentation threshold $$f^e_c$$ is approximately 0.28. For random graphs, switching to attacks targeted at the well-connected nodes changes essentially nothing. Scale-free networks with $$\gamma \leq 3$$ behave differently. Under random node removal, $$S$$ decreases linearly at a slow rate, and there is effectively no fragmentation threshold. (Technically, every finite graph must have a fragmentation threshold, but as the number of nodes in the graph increases, this threshold rapidly approaches 1.) The ability of scale-free networks (such as the Internet) to remain almost entirely connected in the face of random failures is often necessary for them to remain operational. In 2000, the fragmentation threshold for the network of Internet routers was estimated to be greater than 0.99. Since this network has increased in size since then, $$f^e_c$$ has likely increased as well.
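
A removal sweep of this kind can be sketched as follows (a generic illustration; the 0.28 and 0.99 figures above come from the literature, not from this code). For each removal fraction, it deletes that share of the vertices in a fixed random order and reports the giant component's share of the survivors:

```python
import random
from collections import deque

def largest_component_size(adj):
    """Size of the largest connected component, via BFS."""
    seen, best = set(), 0
    for src in adj:
        if src in seen:
            continue
        comp, q = 0, deque([src])
        seen.add(src)
        while q:
            u = q.popleft()
            comp += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        best = max(best, comp)
    return best

def random_removal_sweep(adj, fractions, seed=0):
    """For each removal fraction f, delete the first f*n vertices of a
    fixed random order and report the largest component's share of the
    surviving vertices (the quantity S, normalized)."""
    rng = random.Random(seed)
    order = sorted(adj)
    rng.shuffle(order)
    n, results = len(adj), []
    for f in fractions:
        doomed = set(order[: int(f * n)])
        pruned = {v: nbrs - doomed for v, nbrs in adj.items() if v not in doomed}
        results.append(largest_component_size(pruned) / max(len(pruned), 1))
    return results

# Tiny demo graph: one 3-vertex path, one 2-vertex component, one
# isolated vertex, so the giant component holds 3 of 6 vertices.
demo = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
```

Plotting the sweep for ER versus BA graphs over a fine grid of fractions reveals the fragmentation threshold as the point where the curve collapses.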

Scale-free networks with any power-law exponent fare much more poorly under targeted deletions, exhibiting behavior similar to the random-graph case but with a much lower fragmentation threshold (roughly 0.18 for large graphs generated by the BA algorithm). Different values of $$f^e_c$$ are observed for scale-free networks with different power-law exponents $$\gamma$$. For the network of Internet routers (where $$\gamma$$ is approximately 2.48), $$f^e_c$$ has been measured to be near 0.03. Some researchers have argued that this is an inherent and potentially disastrous weakness in the structure of the Internet, claiming that the entire network could be disabled with a few carefully placed attacks.

It should be noted that while a scale-free network can be fragmented by removing only a small fraction of the nodes, this presumes that this fraction includes all of the highest-degree nodes. Some have suggested that instead of measuring the fraction of highest-degree nodes needed for fragmentation, we measure the maximum degree any node may retain in the fragmented graph (called the cutoff). From this point of view, scale-free graphs seem less fragile, as the cutoff must be set to a very low value to ensure fragmentation. For example, for a scale-free graph with $$\gamma = 2.7$$ (generated by a BA-like algorithm), the fragmentation threshold is 0.01, meaning that removing the top 1% of nodes (as measured by degree) results in a fragmented graph. For the same graph, the cutoff necessary to fragment the graph is 10, meaning that every node of degree greater than 10 must be removed to fragment the graph. Effectively, this means that scale-free networks can be taken down with targeted attacks, but these attacks may have to be perfectly targeted. It is not sufficient to target high-degree nodes; one must target the nodes of highest degree.
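
The two attack notions can be contrasted directly in code (a sketch; the function names are ours): removing a fixed number of highest-degree vertices versus removing every vertex above a degree cutoff.

```python
def attack_highest_degree(adj, k):
    """Remove the k highest-degree vertices (ties broken by label)."""
    doomed = set(sorted(adj, key=lambda v: (-len(adj[v]), v))[:k])
    return {v: nbrs - doomed for v, nbrs in adj.items() if v not in doomed}

def cutoff_removal(adj, cutoff):
    """Remove every vertex whose degree exceeds the cutoff, forcing
    the surviving graph's maximum degree down to at most the cutoff."""
    doomed = {v for v, nbrs in adj.items() if len(nbrs) > cutoff}
    return {v: nbrs - doomed for v, nbrs in adj.items() if v not in doomed}

# On a star (one hub joined to five leaves), removing the single
# highest-degree vertex, or imposing any cutoff below 5, deletes the
# hub and isolates all the leaves.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
```

On heavy-tailed graphs the count-based attack is devastating only if it captures every hub, which is the distinction the cutoff viewpoint makes explicit.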

Robustness of Average Path Length
Diameter measures worst-case distances, but sometimes it is more useful to consider the average distance between nodes in the giant component. For many classes of graphs, including random graphs and scale-free networks, when the graph is highly connected this average distance is proportional to $$\log_k N$$, where $$N$$ is the size of the graph and $$k$$ is the average degree. Under targeted removal of highest-degree nodes, when the fraction of removed nodes is close to the fragmentation threshold, the average distance is instead proportional to $$\sqrt{M}$$, where $$M$$ is the size of the giant component. The effect of this is that in large graphs, when highest-degree nodes are removed, average distances increase much more rapidly near the fragmentation threshold than they do earlier in the removal process.
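
Average distance can be computed with repeated BFS, restricting attention to pairs of vertices that can reach each other (a straightforward sketch; to study the giant component specifically, one would first extract that component and pass it in alone):

```python
from collections import deque

def average_path_length(adj):
    """Mean shortest-path distance over all ordered pairs of vertices
    that are connected to each other."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# On the path 0-1-2 the six ordered pairs have distances
# 1, 2, 1, 1, 2, 1, so the average is 8/6.
path = {0: {1}, 1: {0, 2}, 2: {1}}
```

Tracking this quantity while removing highest-degree vertices makes the sharp growth near the fragmentation threshold visible even in modest simulations.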