Capacitated minimum spanning tree

Capacitated minimum spanning tree is a minimal cost spanning tree of a graph that has a designated root node $$r$$ and satisfies the capacity constraint $$c$$. The capacity constraint ensures that all subtrees (maximal subgraphs connected to the root by a single edge) incident on the root node $$r$$ have no more than $$c$$ nodes. If the tree nodes have weights, then the capacity constraint may be interpreted as follows: the sum of weights in any subtree should be no greater than $$c$$. The edges connecting the subgraphs to the root node are called gates. Finding the optimal solution is NP-hard.

Algorithms
Suppose we have a graph $$G = (V, E)$$, $$n = |G|$$ with a root $$r \in G$$. Let $$a_{i}$$ be all other nodes in $$G$$. Let $$c_{ij}$$ be the edge cost between vertices $$a_{i}$$ and $$a_{j}$$ which form a cost matrix $$C = {c_{ij}}$$.

Esau-Williams heuristic
Esau-Williams heuristic finds suboptimal CMST that are very close to the exact solutions, but on average EW produces better results than many other heuristics.

Initially, all nodes are connected to the root $$r$$ (star graph) and the network's cost is $$\displaystyle\sum_{i=0}^n c_{ri}$$; each of these edges is a gate. At each iteration, we seek the closest neighbor $$a_{j}$$ for every node in $$G-{r}$$ and evaluate the tradeoff function: $$t(a_{i}) = g_{i} - c_{ij}$$. We look for the greatest $$t(a_{i})$$ among the positive tradeoffs and, if the resulting subtree does not violate the capacity constraints, remove the gate $$g_{i}$$ connecting the $$i$$-th subtree to $$a_{j}$$ by an edge $$c_{ij}$$. We repeat the iterations until we can not make any further improvements to the tree.

Esau-Williams heuristics for computing a suboptimal CMST:

function CMST(c,C,r): T = {$$c_{1r}$$, $$c_{2r}$$, ..., $$c_{nr}$$} while have changes: for each node $$a_{i}$$ $$a_{j}$$ = closest node in a different subtree $$t(a_{i})$$ = $$g_{i}$$ - $$c_{ij}$$ t_max = max($$t(a_{i})$$) k = i such that $$t(a_{i})$$ = t_max if ( cost(i) + cost(j) <= c) T = T - $$g_{k}$$ T = T union $$c_{kj}$$ return T

It is easy to see that EW finds a solution in polynomial time.

Ahuja's heuristic
Ahuja's heuristic uses a local search in a large multi-exchange neighborhood from a randomized greedy initial solution.

Initial solution
The initial solution is found by using a randomized version of Esau-Williams. Randomization is achieved by executing a uniformly random join from the best $$p$$ ones instead of the best one in each step.

Local Search Neighborhood
Let $$T$$ be the initial solution with root $$r$$. The neighborhood consists of any combination of a single node or subtree (general subtrees, not as in the introduction of this article) displacing one in a different component of $$T\setminus r$$ such that the displaced structure is the next displacer, the last displacer displaces the first displacer, no original component has more than one displacer and the capacity is not exceeded in any resulting component.

Improvement Graph
An improvement graph is a tool to search a very large neighborhood efficiently. Paths through an improvement graph correspond to changes to a solution and the cost of the path is the change in the cost of the solution when applying the change. Here the improvement graph is a directed multigraph built by using 2 copies $$i',i''$$ of each node $$i\in V(T)$$ and up to 4 edges from any node to any node in a different component of $$T\setminus r$$. The edge $$i',j''$$ corresponds to the change of removing the node $$i$$ from its original component and replacing the subtree rooted at $$j$$ in the target component. Combining nodes $$i'$$ and subtrees $$i''$$ yields the 4 possible edges. An edge exists if the corresponding change does not lead to the target component exceeding the capacity. The cost of an edge is the difference in the cost of the minimal spanning trees on the vertices in the target component before and after the displacement. Thus neighbors in the local search correspond to cycles in the improvement graph that contain at most one node from each component.

Local Search Step
The local search step uses a dynamic programming approach to find a minimum cost cycle in the improvement graph. Paths through the improvement graph with increasing length are generated and only the most favorable with the same start and end as well as involved components is stored. To this end a hash table with the tuple of those 3 properties as key is used to hold paths. Since in each negative cycle there is a node such that all paths within that cycle containing this node have negative cost, only paths with negative cost need to be considered at all. As the comparison of sets of involved components between paths is one of the most common operations in the algorithm, it is implemented as comparison of indicator bit arrays stored as integers for speed. This however clearly stems from a lot of hash collisions, which might be a consequence of the particular choice of hash function and table structure, as well as high load factor due to space restrictions (paper from 2003).

Performance
At the time the paper was written (2003) this algorithm was state of the art on a standard operations research benchmark. The execution was dominated by the building (respectively updating) of the improvement graph. The number of edges in the improvement graph empirically scaled quadratically with the size of the input graph and since this determines the number of times the comparatively complex step of finding a minimum spanning tree has to be run, this is the most critical factor. Thus one can conclude that less dense input graphs greatly benefit the running time, as this reduces the number of edges in the improvement graph.

Applications
CMST problem is important in network design: when many terminal computers have to be connected to the central hub, the star configuration is usually not the minimum cost design. Finding a CMST that organizes the terminals into subnetworks can lower the cost of implementing a network.