User:Ensemblized/sandbox

The degree-corrected stochastic block model is a generalization of the stochastic block model, a generative model for random graphs, that takes into account the broad degree distribution of empirical networks. The model tends to produce graphs containing blocks, subsets of nodes that share the same statistical tendencies, while allowing an arbitrary degree distribution within each block. For example, edges may be more common within communities than between communities, and nodes within a community may have different activity rates (degrees).

The current formulation of the degree-corrected stochastic block model was introduced in 2011 in the field of network science by Karrer and Newman. The degree-corrected stochastic block model outperforms the simple stochastic block model in the task of community detection.

Definition
The degree-corrected stochastic block model takes the following parameters:
 * the number $$n$$ of vertices;
 * a partition of the vertex set $$\{1,\ldots,n\}$$ into disjoint subsets $$C_1,\ldots,C_r$$, called blocks;
 * a symmetric $$r \times r$$ matrix $$\mathbf{\omega} $$ indicating the interaction intensity between nodes of the respective blocks;
 * a degree parameter $$\theta_i$$ for each vertex $$i \in \{1,\ldots,n\}$$, controlling the expected degree of that vertex.

The expected value of the adjacency matrix element $$A_{ij}$$, which gives the expected number of edges between nodes $$i$$ and $$j$$, is $$\theta_i \theta_j \omega_{g_i g_j}$$, where $$g_i$$ and $$g_j$$ denote the block memberships of nodes $$i$$ and $$j$$, respectively. In sparse networks this value is small and approximates the probability of a connection. It incorporates both the activity rate (expected degree) of each node and the interaction intensity determined by the nodes' block memberships.
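The definition above can be sketched as a small sampler. This is a minimal illustration, not a reference implementation: it assumes Poisson-distributed edge counts (as in the Karrer and Newman formulation), omits self-loops for simplicity, and uses NumPy. The function names `expected_adjacency` and `sample_dcsbm` and the parameter values are hypothetical, chosen only for this example.

```python
import numpy as np

def expected_adjacency(theta, omega, g):
    """Return the n x n matrix with entries theta_i * theta_j * omega[g_i, g_j]."""
    theta = np.asarray(theta, dtype=float)
    omega = np.asarray(omega, dtype=float)
    g = np.asarray(g)
    # omega[np.ix_(g, g)] expands the r x r block matrix to an n x n node matrix.
    return np.outer(theta, theta) * omega[np.ix_(g, g)]

def sample_dcsbm(theta, omega, g, seed=None):
    """Draw a symmetric multigraph adjacency matrix with Poisson edge counts.

    Assumption for this sketch: self-loops are dropped (zero diagonal).
    """
    rng = np.random.default_rng(seed)
    mean = expected_adjacency(theta, omega, g)
    # Sample each unordered pair once (strict upper triangle), then mirror it.
    upper = np.triu(rng.poisson(mean), k=1)
    return upper + upper.T

# Illustrative parameters: two blocks of two nodes each.
theta = np.array([0.5, 1.5, 1.0, 1.0])        # per-node degree parameters
omega = np.array([[2.0, 0.1], [0.1, 2.0]])    # block interaction intensities
g = np.array([0, 0, 1, 1])                    # block membership of each node
A = sample_dcsbm(theta, omega, g, seed=0)
```

Because entries are Poisson counts, the sampled matrix may contain values greater than 1 (multi-edges) when the expected values are not small.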

Special cases
If the activity rates $$\theta$$ are constant and uniform across all nodes of the network, the model is identical to a regular stochastic block model, in which the $$\mathbf{\omega} $$ matrix reduces to the probability of connection between two blocks.

If the interaction between all groups is uniform ($$\omega_{g_i g_j} = C $$), the model reduces to a configuration model in which $$\theta_i$$ indicates the outgoing stubs/degree of each node.

If both $$\theta$$ and $$\mathbf{\omega} $$ are constant, the result is the Erdős–Rényi model $$G(n,p)$$. This case is degenerate, since the partition into communities becomes irrelevant, but it illustrates a close relationship to the Erdős–Rényi model.
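The three reductions can be checked numerically on the expected adjacency matrix $$\theta_i \theta_j \omega_{g_i g_j}$$. The following is a small sketch with made-up parameter values; the helper `expected_adjacency` is a hypothetical name used only here.

```python
import numpy as np

def expected_adjacency(theta, omega, g):
    """Return the n x n matrix with entries theta_i * theta_j * omega[g_i, g_j]."""
    theta = np.asarray(theta, dtype=float)
    omega = np.asarray(omega, dtype=float)
    g = np.asarray(g)
    return np.outer(theta, theta) * omega[np.ix_(g, g)]

g = np.array([0, 0, 1, 1])                    # two blocks of two nodes
omega = np.array([[0.8, 0.1], [0.1, 0.6]])    # illustrative block matrix
theta = np.array([0.5, 1.5, 1.0, 2.0])        # illustrative degree parameters
c = 0.3                                       # illustrative uniform intensity

# Constant theta: entries depend only on the block pair (regular SBM).
sbm = expected_adjacency(np.ones(4), omega, g)

# Constant omega: entries factorize as c * theta_i * theta_j, so block
# membership no longer matters (configuration-model-like structure).
config = expected_adjacency(theta, np.full((2, 2), c), g)

# Both constant: every pair has the same expected value (Erdos-Renyi-like).
er = expected_adjacency(np.ones(4), np.full((2, 2), c), g)
```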

Performance
In networks with highly homogeneous degree distributions, there is little to no advantage in using the degree-corrected version of the stochastic block model instead of the regular version. This is expected, as the extra variables $$\theta$$ in the degree-corrected model only alter the distribution in the case of degree heterogeneity.

However, in networks with high degree heterogeneity, the degree-corrected stochastic block model has a clear advantage. For example, the regular stochastic block model is not capable of capturing the correct blocks of the famous Zachary Karate Club network, but Karrer and Newman demonstrate that the degree-corrected stochastic block model distinguishes the underlying communities of this network with high precision.

This model exhibits asymptotic consistency, meaning that it can correctly infer the blocks in a network produced by a generative stochastic block model. The degree-corrected stochastic block model also outperforms the regular version in inferring the blocks of synthetic networks.

Algorithms
A diverse set of algorithms can be employed for the degree-corrected stochastic block model inference problem. Popular algorithms include spectral clustering of the vertices, semidefinite programming, forms of belief propagation, and community detection, among others.

The recovery of blocks in the model can be done using the principle of maximum likelihood, but this amounts to solving a constrained or regularized cut problem such as minimum bisection that is typically NP-complete. Hence, no known efficient algorithms will correctly compute the maximum-likelihood estimate in the worst case.
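As an illustration of the maximum-likelihood criterion, the sketch below evaluates the profile log-likelihood of Karrer and Newman for a given block assignment, $$\mathcal{L}(G \mid g) = \sum_{rs} m_{rs} \log \frac{m_{rs}}{\kappa_r \kappa_s}$$, where $$m_{rs}$$ is the number of edges between blocks $$r$$ and $$s$$ (counted twice when $$r = s$$) and $$\kappa_r$$ is the sum of the degrees in block $$r$$. The function name `dcsbm_log_likelihood` and the toy graph are assumptions made for this example. The code only scores a candidate partition; it does not perform the search over partitions, which is the hard part.

```python
import numpy as np

def dcsbm_log_likelihood(A, g, n_blocks):
    """Profile log-likelihood sum_rs m_rs * log(m_rs / (kappa_r * kappa_s)).

    A is a symmetric adjacency matrix with zero diagonal; g assigns each
    node a block label in {0, ..., n_blocks - 1}.
    """
    A = np.asarray(A, dtype=float)
    g = np.asarray(g)
    Z = np.eye(n_blocks)[g]          # one-hot membership matrix (n x r)
    m = Z.T @ A @ Z                  # m[r, s]; within-block edges counted twice
    kappa = m.sum(axis=1)            # total degree of each block
    denom = np.outer(kappa, kappa)
    mask = m > 0                     # terms with m_rs = 0 contribute zero
    return float(np.sum(m[mask] * np.log(m[mask] / denom[mask])))

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

by_triangle = dcsbm_log_likelihood(A, [0, 0, 0, 1, 1, 1], 2)
alternating = dcsbm_log_likelihood(A, [0, 1, 0, 1, 0, 1], 2)
```

On this toy graph the partition that follows the two triangles scores higher than the alternating partition, which is the behavior the criterion is meant to capture.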

Similar methods
Several other methods have been developed to address the degree distribution heterogeneity of empirical networks. Some methods devise more complex exponential random graph models, but are not as analytically tractable as the stochastic block model. Other attempts allow overlapping blocks where nodes can be members of several blocks, or mixed memberships, introducing membership probability-like vectors.

Directed stochastic block models with arbitrary expected in-degree and out-degree have also been formalized. However, these models are not solvable in closed form, which limits their applicability. Degree heterogeneity has also been incorporated in diverse forms.