Leiden algorithm

The Leiden algorithm is a community detection algorithm developed by Traag et al. at Leiden University. It was designed as a modification of the Louvain method to address the Louvain method's tendency to produce poorly connected, and even disconnected, communities.

Graph components
Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph.

Vertices and edges
A graph is composed of vertices (nodes) and edges. Each edge connects two vertices, and each vertex may be incident to zero or more edges. In drawings, edges are typically represented by lines and vertices by circles or points. In set notation, let $$V$$ be the set of vertices and $$E$$ be the set of edges:

$$ \begin{align} V &:= \{v_1, v_2, \dots, v_n \} \\ E &:= \{e_{ij}, e_{ik}, \dots, e_{kl} \} \end{align} $$

where $$e_{ij}$$ is the directed edge from vertex $$v_i$$ to vertex $$v_j$$. We can also write this as an ordered pair:

$$ \begin{align} e_{ij} &:= (v_i, v_j) \end{align} $$
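For illustration, the sets above map directly onto Python sets, with each edge stored as an ordered pair $$(v_i, v_j)$$. The vertex names and edges below are made up, and the helper `neighbours` (a hypothetical name) ignores edge direction, as community detection code commonly does.

```python
# Hypothetical example graph: four vertices and four edges stored as ordered pairs (v_i, v_j).
V = {"v1", "v2", "v3", "v4"}
E = {("v1", "v2"), ("v2", "v3"), ("v3", "v1"), ("v3", "v4")}


def neighbours(vertices, edges):
    """Map each vertex to the set of vertices it shares an edge with, ignoring direction."""
    adj = {v: set() for v in vertices}  # vertices with zero edges still get an entry
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


adj = neighbours(V, E)
print(sorted(adj["v3"]))  # ['v1', 'v2', 'v4']
```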

Community
A community is a subset of the vertices, and no vertex belongs to more than one community:

$$ \begin{align} C_i &\subseteq V \\ C_i &\bigcap C_j = \emptyset ~ \forall ~ i \neq j \end{align} $$

and the union of all communities must be the total set of vertices:

$$ \begin{align} V &= \bigcup_{i} C_i \end{align} $$
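Both conditions (pairwise disjoint, union equal to $$V$$) can be checked directly. A minimal Python sketch, where `is_valid_community_set` is a hypothetical helper and communities are given as a list of Python sets:

```python
from itertools import combinations


def is_valid_community_set(vertices, communities):
    """Check that communities are pairwise disjoint and together cover V."""
    # Disjointness: C_i ∩ C_j = ∅ for every pair i ≠ j
    for a, b in combinations(communities, 2):
        if a & b:
            return False
    # Coverage: the union of all communities equals V
    return set().union(*communities) == set(vertices)


V = {1, 2, 3, 4, 5}
print(is_valid_community_set(V, [{1, 2}, {3, 4, 5}]))     # True
print(is_valid_community_set(V, [{1, 2, 3}, {3, 4, 5}]))  # False: vertex 3 is in two communities
print(is_valid_community_set(V, [{1, 2}, {3, 4}]))        # False: vertex 5 is in no community
```

The coverage test also rejects a community that contains a vertex outside $$V$$, since the union would then differ from $$V$$.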

Partition
A partition is the set of all communities:

$$ \begin{align} \mathcal{P} &= \{C_1, C_2, \dots, C_n \} \end{align} $$
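In code, a partition is often stored the other way round, as a mapping from each node to the label of its community, so that looking up a node's community is a single dictionary access. A minimal sketch of converting between the two forms (both function names are hypothetical):

```python
def to_membership(partition):
    """Partition given as a list of communities -> dict node -> community index."""
    return {node: i for i, community in enumerate(partition) for node in community}


def to_partition(membership):
    """Inverse conversion: group the nodes by community label."""
    groups = {}
    for node, label in membership.items():
        groups.setdefault(label, set()).add(node)
    return list(groups.values())


P = [{"v1", "v2", "v3"}, {"v4", "v5"}]
m = to_membership(P)
print(m["v4"])  # 1
```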

Quality
Like modularity, the quality function assesses how well the nodes have been allocated to communities. The Leiden algorithm is described here with the Constant Potts Model (CPM) as its quality function:

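$$ \begin{align} \mathcal{H}(G, \mathcal{P}) &= \sum_{C \in \mathcal{P}} \left( |E(C, C)| - \gamma \binom{\|C\|}{2} \right) \end{align} $$

where $$|E(C, C)|$$ is the number of edges between nodes of $$C$$, $$\|C\|$$ is the size of $$C$$ (its number of nodes, or its total node weight), and $$\gamma > 0$$ is the resolution parameter.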
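A minimal Python sketch of the CPM quality, assuming an unweighted, undirected graph given as an edge list (each edge listed once) and nodes of size 1, so that $$\|C\|$$ is just the number of nodes in $$C$$. The function name `cpm_quality` is hypothetical.

```python
from math import comb


def cpm_quality(edges, partition, gamma):
    """H(G, P) under CPM for an unweighted, undirected graph where every node
    has size 1, so that ||C|| is simply the number of nodes in C."""
    membership = {v: i for i, C in enumerate(partition) for v in C}
    internal = [0] * len(partition)  # |E(C, C)| for each community
    for u, v in edges:
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[i] - gamma * comb(len(C), 2) for i, C in enumerate(partition))


# Two triangles joined by a single "bridge" edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(cpm_quality(edges, [{1, 2, 3}, {4, 5, 6}], gamma=0.5))  # 3.0
print(cpm_quality(edges, [{1, 2, 3, 4, 5, 6}], gamma=0.5))    # -0.5
```

The split scores $$6 - 6\gamma$$ and the single community scores $$7 - 15\gamma$$, so the single community only wins for $$\gamma < 1/9$$; this is how $$\gamma$$ controls the resolution of the detected communities.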

Algorithm


The Leiden algorithm is similar to the Louvain method, with some important modifications.

Step 1: First, each node in the network is assigned to its own community.

Step 2: Next, we decide which communities to move the nodes into and update the partition $$\mathcal{P}$$.

queue = V(G)                            # create a queue from the nodes

while queue is not empty:
    node = queue.pop_front()            # remove the next node from the queue
    best_delta_H = 0
    best_community = community of node
    for C in communities of P:          # compute the change in quality for each community
        if delta_H(node, C) > best_delta_H:
            best_delta_H = delta_H(node, C)
            best_community = C
    if best_delta_H > 0:
        move node to best_community
        # find the nodes which are connected to the node but not in its new community
        outside_nodes = { node_i | (node, node_i) is an edge and node_i is not in best_community }
        queue.add(outside_nodes not already in queue)
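The local moving step can be sketched in Python as below. This is a minimal sketch, not a reference implementation: it assumes the CPM quality function, an undirected graph stored as a dict of neighbour-weight dicts, and a hypothetical function name `move_nodes_fast`. Like the pseudocode, it scans every existing community for each node and never moves a node into an empty community. For CPM, moving node $$v$$ of size $$n$$ from community $$A$$ to community $$B$$ changes the quality by $$\Delta\mathcal{H} = (k_{v,B} - k_{v,A}) - \gamma n (\|B\| - (\|A\| - n))$$, where $$k_{v,X}$$ is the edge weight from $$v$$ to $$X$$ excluding self-loops.

```python
from collections import deque


def move_nodes_fast(adj, membership, gamma, node_size=None):
    """Queue-based local moving under CPM.

    adj:        dict node -> dict neighbour -> edge weight (undirected, symmetric)
    membership: dict node -> community label of P (updated in place and returned)
    node_size:  dict node -> ||v||; every node has size 1 if omitted
    """
    if node_size is None:
        node_size = {v: 1 for v in adj}
    comm_size = {}  # ||C|| for every non-empty community
    for v, c in membership.items():
        comm_size[c] = comm_size.get(c, 0) + node_size[v]

    queue, in_queue = deque(adj), set(adj)  # queue = V(G)
    while queue:
        node = queue.popleft()
        in_queue.discard(node)
        current, n = membership[node], node_size[node]
        k = {}  # k[c]: edge weight from node to community c (self-loops ignored)
        for nb, w in adj[node].items():
            if nb != node:
                k[membership[nb]] = k.get(membership[nb], 0) + w
        best_delta, best_comm = 0, current
        for c, size in comm_size.items():
            if c == current:
                continue
            # CPM change from moving node out of `current` and into c
            delta = (k.get(c, 0) - k.get(current, 0)) - gamma * n * (size - (comm_size[current] - n))
            if delta > best_delta:
                best_delta, best_comm = delta, c
        if best_delta > 0:
            membership[node] = best_comm
            comm_size[current] -= n
            if comm_size[current] == 0:
                del comm_size[current]
            comm_size[best_comm] += n
            # re-queue neighbours that are not in the node's new community
            for nb in adj[node]:
                if membership[nb] != best_comm and nb not in in_queue:
                    queue.append(nb)
                    in_queue.add(nb)
    return membership


# Two triangles joined by the bridge edge (3, 4); start from singletons (Step 1).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
adj = {v: {} for v in range(1, 7)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1
membership = move_nodes_fast(adj, {v: v for v in adj}, gamma=0.5)
```

With $$\gamma = 0.5$$ the sketch ends with nodes 1, 2, 3 in one community and nodes 4, 5, 6 in another. Production implementations usually only evaluate the communities of a node's neighbours plus one empty community, which is much cheaper than scanning all communities.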

Step 3: Assign each node in the graph to its own community in a new partition called $$\mathcal{P}_{\text{refined}}$$.

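Step 4: Refine the partition. The goal of this step is to split poorly connected communities of $$\mathcal{P}$$ into well-connected subcommunities, which are recorded in $$\mathcal{P}_{\text{refined}}$$: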

for C in communities of P:
    # find the nodes in C which have many edges to the rest of C
    well_connected_nodes = { node | node is in C, |E(node, C \ node)| >= gamma * ||node|| * (||C|| - ||node||) }
    for node in well_connected_nodes:
        if node is a singleton under P_refined:
            well_connected_communities = { C_i | C_i is in P_refined, C_i is a subset of C,
                                           |E(C_i, C \ C_i)| >= gamma * ||C_i|| * (||C|| - ||C_i||) }
            for C_i in well_connected_communities:
                compute probability P(C_i)   # 0 if assigning node to C_i decreases the quality of P_refined;
                                             # greater weights for greater quality increases
            assign node to a community C_i sampled from the distribution P(C_i)
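A minimal Python sketch of the refinement step, assuming the CPM quality function, an undirected graph stored as a dict of neighbour-weight dicts, and hypothetical names (`refine_partition`, `theta`). The pseudocode only says that greater quality increases get greater weights; this sketch assumes weights proportional to $$\exp(\Delta\mathcal{H}/\theta)$$ for $$\Delta\mathcal{H} \ge 0$$ with a small randomness parameter $$\theta$$, following our reading of Traag et al.'s paper. Moving a singleton node $$v$$ into a refined community $$C_i$$ changes the CPM quality by $$|E(v, C_i)| - \gamma \|v\| \|C_i\|$$.

```python
import math
import random


def refine_partition(adj, membership, gamma, theta=0.01, node_size=None, rng=None):
    """Refinement sketch under CPM: returns P_refined as a dict node -> label.

    adj: dict node -> dict neighbour -> weight; membership: node -> community of P.
    theta is an assumed randomness parameter for the sampling step.
    """
    rng = random.Random() if rng is None else rng
    if node_size is None:
        node_size = {v: 1 for v in adj}
    refined = {v: v for v in adj}  # Step 3: every node starts as a singleton
    groups = {}
    for v, c in membership.items():
        groups.setdefault(c, []).append(v)

    for C in groups.values():
        in_C = set(C)
        size_C = sum(node_size[v] for v in C)
        # Per refined community r inside C: ||r|| and |E(r, C \ r)|
        r_size = {v: node_size[v] for v in C}
        r_ext = {v: sum(w for u, w in adj[v].items() if u in in_C and u != v) for v in C}

        def well_connected(r):
            return r_ext[r] >= gamma * r_size[r] * (size_C - r_size[r])

        nodes = [v for v in C if well_connected(v)]  # well_connected_nodes
        rng.shuffle(nodes)
        for v in nodes:
            own = refined[v]
            if r_size[own] != node_size[v]:  # v is no longer a singleton
                continue
            k = {}  # edge weight from v to each refined community inside C
            for u, w in adj[v].items():
                if u in in_C and u != v:
                    k[refined[u]] = k.get(refined[u], 0) + w
            candidates, deltas = [own], [0.0]  # staying put changes nothing
            for r, k_vr in k.items():
                if r != own and well_connected(r):
                    delta = k_vr - gamma * node_size[v] * r_size[r]  # CPM change
                    if delta >= 0:
                        candidates.append(r)
                        deltas.append(delta)
            top = max(deltas)  # subtract the maximum to avoid overflow in exp
            weights = [math.exp((d - top) / theta) for d in deltas]
            target = rng.choices(candidates, weights=weights)[0]
            if target != own:
                # edges between v and target become internal to the merged community
                r_ext[target] += r_ext.pop(own) - 2 * k[target]
                r_size[target] += r_size.pop(own)
                refined[v] = target
    return refined


edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
adj = {v: {} for v in range(1, 7)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1
P = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
refined = refine_partition(adj, P, gamma=0.5, rng=random.Random(0))
```

Here each triangle of $$\mathcal{P}$$ should be merged back into one refined community (with $$\theta = 0.01$$ the chance of a node staying alone is negligible). Because merges only happen inside a community of $$\mathcal{P}$$, nodes 3 and 4 can never share a refined community.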

Step 5: Use the refined partition $$\mathcal{P}_{\text{refined}}$$ to aggregate the graph. Each community in $$\mathcal{P}_{\text{refined}}$$ becomes a node in the new graph $$G_{\text{agg}}$$.
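A minimal Python sketch of the aggregation, assuming an undirected graph stored as a dict of neighbour-weight dicts and a refined partition given as a node-to-label dict (the name `aggregate_graph` is hypothetical). Each aggregate node gets a size equal to the total size of its members, edge weights between aggregate nodes are summed, and edges inside a refined community are kept as a self-loop; storing the internal weight as a self-loop is an implementation choice of this sketch.

```python
def aggregate_graph(adj, refined, node_size=None):
    """Step 5 sketch: one aggregate node per refined community.

    adj:     dict node -> dict neighbour -> weight (undirected, symmetric)
    refined: dict node -> label of its community in P_refined
    Returns (agg_adj, agg_size).
    """
    if node_size is None:
        node_size = {v: 1 for v in adj}
    agg_adj = {r: {} for r in refined.values()}
    agg_size = {r: 0 for r in agg_adj}
    for v, r in refined.items():
        agg_size[r] += node_size[v]  # size of aggregate node = total size of its members
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            ru, rv = refined[u], refined[v]
            if ru != rv or u == v:
                add = w  # seen once in this direction (or an existing self-loop)
            else:
                add = w / 2  # internal edge, visited once from each endpoint
            agg_adj[ru][rv] = agg_adj[ru].get(rv, 0) + add
    return agg_adj, agg_size


edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
adj = {v: {} for v in range(1, 7)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1
agg_adj, agg_size = aggregate_graph(adj, {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"})
```

For two triangles joined by one edge, with each triangle as a refined community, the aggregated graph has two nodes of size 3, each with a self-loop of weight 3, joined by an edge of weight 1.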

Example: Suppose that we have:

$$ \begin{align} V &= \{ v_1, v_2, v_3, v_4, v_5, v_6, v_7 \} \\ C_1 &= \{ v_1, v_2, v_3, v_4 \} \\ C_2 &= \{ v_5, v_6, v_7 \} \\ \mathcal{P} &= \{ C_1, C_2 \} \\ \mathcal{P}_{\text{refined}} &= \{ C_{1a}, C_{1b}, C_2 \} \end{align} $$

Then our new set of nodes will be:

$$ \begin{align} V_{agg} &= \{ C_{1a} \mapsto w_{1a}, C_{1b} \mapsto w_{1b}, C_2 \mapsto w_{2} \} \end{align} $$

Step 6: Update the partition $$\mathcal{P}$$ for the aggregated graph. The communities of $$\mathcal{P}$$ are kept, but each community may now consist of several nodes of the aggregated graph, one for each community of the refined partition $$\mathcal{P}_{\text{refined}}$$ that it contains:

$$ \begin{align} \mathcal{P} &= \{ \{v ~ | ~ v \subseteq C, v \in V(G_{\text{agg}})\} ~ | ~ C \in \mathcal{P} \} \end{align} $$

Example: Suppose that $$C$$ is a poorly-connected community from the partition $$\mathcal{P}$$:

$$ \begin{align} C &= \{ v_1, v_2, v_3, v_4, v_5 \} \\ \mathcal{P} &= \{ C \} \end{align} $$

Then suppose during the refinement step, it was separated into two communities, $$C_1$$ and $$C_2$$:

$$ \begin{align} C_1 &= \{ v_1, v_2, v_3 \} \\ C_2 &= \{ v_4, v_5 \} \\ \mathcal{P}_{\text{refined}} &= \{ C_1, C_2 \} \end{align} $$

When we aggregate the graph, the new nodes will be:

$$ \begin{align} V(G_{\text{agg}}) &= \{ C_1, C_2 \} \end{align} $$

but we will keep the old partition:

$$ \begin{align} \mathcal{P} &= \{ \{ C_1, C_2 \} \} \end{align} $$
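Step 6 amounts to relabelling: each aggregate node inherits the label of the community of $$\mathcal{P}$$ that contains its members. A minimal Python sketch applied to the example above, assuming partitions are stored as node-to-label dicts (`aggregate_partition` is a hypothetical name):

```python
def aggregate_partition(membership, refined):
    """Step 6 sketch: give each aggregate node (a refined community) the label
    of the community of P that contains it.

    membership: dict node -> community of P
    refined:    dict node -> community of P_refined
    """
    agg_membership = {}
    for v, r in refined.items():
        c = membership[v]
        # P_refined must refine P: all members of r share the same P-community
        if agg_membership.setdefault(r, c) != c:
            raise ValueError(f"refined community {r!r} spans several communities of P")
    return agg_membership


# The example above: C = {v1, ..., v5} was split into C1 = {v1, v2, v3} and C2 = {v4, v5}.
P = {v: "C" for v in ["v1", "v2", "v3", "v4", "v5"]}
P_refined = {"v1": "C1", "v2": "C1", "v3": "C1", "v4": "C2", "v5": "C2"}
print(aggregate_partition(P, P_refined))  # {'C1': 'C', 'C2': 'C'}
```

Both aggregate nodes carry the label C, which is the partition $$\{ \{ C_1, C_2 \} \}$$ from the example.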

Step 7: Repeat Steps 2 to 6 on the aggregated graph until each community in $$\mathcal{P}$$ consists of only one node of the aggregated graph.