Parallel single-source shortest path algorithm

A central problem in algorithmic graph theory is the shortest path problem. One of the generalizations of the shortest path problem is known as the single-source-shortest-paths (SSSP) problem, which consists of finding the shortest paths from a source vertex $$s$$ to all other vertices in the graph. There are classical sequential algorithms which solve this problem, such as Dijkstra's algorithm. In this article, however, we present two parallel algorithms solving this problem.

Another variation of the problem is the all-pairs-shortest-paths (APSP) problem, for which parallel approaches also exist; see Parallel all-pairs shortest path algorithm.

Problem definition
Let $$G=(V,E)$$ be a directed graph with $$|V|=n$$ nodes and $$|E|=m$$ edges. Let $$s$$ be a distinguished vertex (called "source") and $$c$$ be a function assigning a non-negative real-valued weight to each edge. The goal of the single-source-shortest-paths problem is to compute, for every vertex $$v$$ reachable from $$s$$, the weight of a minimum-weight path from $$s$$ to $$v$$, denoted by $$\operatorname{dist}(s,v)$$ and abbreviated $$\operatorname{dist}(v)$$. The weight of a path is the sum of the weights of its edges. We set $$\operatorname{dist}(u,v):=\infty$$ if $$v$$ is unreachable from $$u$$.

Sequential shortest path algorithms commonly apply iterative labeling methods based on maintaining a tentative distance for all nodes; $$\operatorname{tent}(v)$$ is always $$\infty$$ or the weight of some path from $$s$$ to $$v$$ and hence an upper bound on $$\operatorname{dist}(v)$$. Tentative distances are improved by performing edge relaxations, i.e., for an edge $$(v,w)\in E$$ the algorithm sets $$\operatorname{tent}(w):=\min\{\operatorname{tent}(w), \operatorname{tent}(v)+c(v,w)\}$$.
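As a minimal illustration of a single edge relaxation, the following Python sketch keeps the tentative distances in a dictionary. The names `relax_edge` and `tent`, as well as the three-node example and its weights, are assumptions made for this illustration only.

```python
import math

def relax_edge(tent, v, w, cost):
    """Relax edge (v, w) of weight `cost`; return True if tent[w] improved."""
    candidate = tent[v] + cost
    if candidate < tent[w]:
        tent[w] = candidate
        return True
    return False

# Hypothetical example: source "s" starts at 0, all other nodes at infinity.
tent = {"s": 0.0, "a": math.inf, "b": math.inf}
relax_edge(tent, "s", "a", 2.0)   # tent["a"] drops from infinity to 2.0
relax_edge(tent, "a", "b", 1.5)   # tent["b"] drops from infinity to 3.5
relax_edge(tent, "s", "b", 5.0)   # no change: 5.0 is not below 3.5
```

Every value stored in `tent` is either infinity or the weight of an actual path from the source, so it never falls below the true distance.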

For all parallel algorithms we will assume a PRAM model with concurrent reads and concurrent writes.

Delta stepping algorithm
The delta stepping algorithm is a label-correcting algorithm, which means the tentative distance of a vertex can be corrected several times via edge relaxations until the last step of the algorithm, when all tentative distances are fixed.

The algorithm maintains eligible nodes with tentative distances in an array of buckets, each of which represents a distance range of size $$\Delta$$. During each phase, the algorithm removes all nodes of the first nonempty bucket and relaxes all of their outgoing edges of weight at most $$\Delta$$. Edges of higher weight are relaxed only once their respective starting nodes are surely settled. The parameter $$\Delta$$ is a positive real number that is also called the "step width" or "bucket width".

Parallelism is obtained by concurrently removing all nodes of the first nonempty bucket and relaxing their outgoing light edges in a single phase. If a node $$v$$ has been removed from the current bucket $$B[i]$$ with non-final distance value then, in some subsequent phase, $$v$$ will eventually be reinserted into $$B[i]$$, and the outgoing light edges of $$v$$ will be re-relaxed. The remaining heavy edges emanating from all nodes that have been removed from $$B[i]$$ so far are relaxed once and for all when $$B[i]$$ finally remains empty. Subsequently, the algorithm searches for the next nonempty bucket and proceeds as described above.

The maximum shortest path weight for the source node $$s$$ is defined as $$L(s):=\max\{\operatorname{dist}(s,v) : \operatorname{dist}(s,v)<\infty\}$$, abbreviated $$L$$. Also, the size of a path is defined to be the number of edges on the path.

We distinguish light edges from heavy edges: light edges have weight at most $$\Delta$$, and heavy edges have weight greater than $$\Delta$$.
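The light/heavy distinction can be sketched in Python as a simple partition of an edge list. The function name `split_edges`, the `(v, w, cost)` tuple layout and the sample edges are assumptions for this illustration.

```python
def split_edges(edges, delta):
    """Partition weighted edges (v, w, cost) into light and heavy lists."""
    light = [(v, w, c) for (v, w, c) in edges if c <= delta]   # weight at most delta
    heavy = [(v, w, c) for (v, w, c) in edges if c > delta]    # weight above delta
    return light, heavy

# Hypothetical edge list with bucket width delta = 3.
edges = [("A", "B", 3), ("A", "D", 5), ("B", "C", 3), ("E", "F", 5)]
light, heavy = split_edges(edges, delta=3)
```

An edge whose weight equals the bucket width counts as light, matching the definition above.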

Following is the delta stepping algorithm in pseudocode:

foreach $$v \in V$$ do $$\operatorname{tent}(v):=\infty$$
$$relax(s, 0)$$;    (*Insert source node with distance 0*)
while $$\lnot isEmpty(B)$$ do    (*A phase: Some queued nodes left (a)*)
    $$i:=\min\{j\geq 0: B[j]\neq \emptyset\}$$    (*Smallest nonempty bucket (b)*)
    $$R:=\emptyset$$    (*No nodes deleted for bucket B[i] yet*)
    while $$B[i]\neq \emptyset$$ do    (*New phase (c)*)
        $$Req:=findRequests(B[i],light)$$    (*Create requests for light edges (d)*)
        $$R:=R\cup B[i]$$    (*Remember deleted nodes (e)*)
        $$B[i]:=\emptyset$$    (*Current bucket empty*)
        $$relaxRequests(Req)$$    (*Do relaxations, nodes may (re)enter B[i] (f)*)
    $$Req:=findRequests(R,heavy)$$    (*Create requests for heavy edges (g)*)
    $$relaxRequests(Req)$$    (*Relaxations will not refill B[i] (h)*)

function $$findRequests(V',kind:\{\text{light}, \text{heavy}\})$$: set of Request
    return $$\{(w, \operatorname{tent}(v)+c(v,w)): v\in V' \land (v,w) \in E_\text{kind}\}$$

procedure $$relaxRequests(Req)$$
    foreach $$(w, x)\in Req$$ do $$relax(w,x)$$

procedure $$relax(w,x)$$    (*Insert or move w in B if $$x<\operatorname{tent}(w)$$*)
    if $$x < \operatorname{tent}(w)$$ then
        $$B[\lfloor \operatorname{tent}(w)/\Delta\rfloor]:=B[\lfloor \operatorname{tent}(w)/\Delta\rfloor]\setminus \{w\}$$    (*If in, remove from old bucket*)
        $$B[\lfloor x/\Delta\rfloor]:=B[\lfloor x/\Delta\rfloor]\cup \{w\}$$    (*Insert into new bucket*)
        $$\operatorname{tent}(w):=x$$
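The pseudocode can be transcribed into a short sequential Python sketch. It is an illustration under stated assumptions rather than a reference implementation. The graph is assumed to be a dictionary that maps every node (including nodes without outgoing edges) to a list of `(neighbor, weight)` pairs with non-negative weights. The example graph is hypothetical. The loops that a parallel version would run concurrently are executed one request at a time here.

```python
import math
from collections import defaultdict

def delta_stepping(graph, source, delta):
    """Sequential sketch of delta stepping; returns the final distances."""
    tent = {v: math.inf for v in graph}
    buckets = defaultdict(set)              # bucket index -> queued nodes

    def relax(w, x):
        # Insert or move w if x improves its tentative distance.
        if x < tent[w]:
            if tent[w] != math.inf:         # w may sit in an old bucket
                buckets[int(tent[w] // delta)].discard(w)
            buckets[int(x // delta)].add(w)
            tent[w] = x

    def find_requests(nodes, want_light):
        # Requests (w, x) for light or heavy edges leaving `nodes`.
        return [(w, tent[v] + c)
                for v in nodes
                for (w, c) in graph[v]
                if (c <= delta) == want_light]

    relax(source, 0)
    while any(buckets.values()):
        i = min(j for j, b in buckets.items() if b)   # smallest nonempty bucket
        removed = set()
        while buckets[i]:
            requests = find_requests(buckets[i], want_light=True)
            removed |= buckets[i]           # remember deleted nodes
            buckets[i] = set()
            for w, x in requests:
                relax(w, x)                 # nodes may (re)enter bucket i
        for w, x in find_requests(removed, want_light=False):
            relax(w, x)                     # heavy edges cannot refill bucket i
    return tent

# Hypothetical directed example graph.
graph = {
    "A": [("B", 3), ("D", 5), ("E", 3)],
    "B": [("C", 3)],
    "C": [],
    "D": [("C", 1)],
    "E": [("F", 5)],
    "F": [],
}
distances = delta_stepping(graph, "A", delta=3)
```

The choice of `delta` only affects how the work is grouped into phases. The resulting distances are the same for every positive bucket width.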

Example
Following is a step by step description of the algorithm execution for a small example graph. The source vertex is the vertex A and $$\Delta$$ is equal to 3.

At the beginning of the algorithm, all vertices except for the source vertex A have infinite tentative distances.

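With $$\Delta = 3$$, bucket $$B[0]$$ covers the tentative distances in $$[0,3)$$, bucket $$B[1]$$ those in $$[3,6)$$ and bucket $$B[2]$$ those in $$[6,9)$$; for integer distances these are the ranges $$[0,2]$$, $$[3,5]$$ and $$[6,8]$$.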
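The mapping from a tentative distance to its bucket follows the index $$\lfloor \operatorname{tent}(w)/\Delta\rfloor$$ used in the pseudocode. The helper name `bucket_index` in this small Python sketch is an assumption.

```python
def bucket_index(tentative_distance, delta):
    """Index of the bucket that holds a node with the given tentative distance."""
    return int(tentative_distance // delta)

# Bucket indices for a few integer distances with delta = 3.
indices = [bucket_index(x, 3) for x in (0, 2, 3, 5, 6, 8)]
```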

The bucket $$B[0]$$ contains the vertex A. All other buckets are empty. The algorithm relaxes all light edges incident to $$B[0]$$, which are the edges connecting A to B, G and E.

The vertices B, G and E are inserted into bucket $$B[1]$$. Since $$B[0]$$ remains empty after these relaxations, the heavy edge connecting A to D is also relaxed. Now the light edges incident to $$B[1]$$ are relaxed. The vertex C is inserted into bucket $$B[2]$$. Since $$B[1]$$ is now empty, the heavy edge connecting E to F can be relaxed. In the next step, the bucket $$B[2]$$ is examined, but this does not change any tentative distances.

The algorithm terminates.

Runtime
As mentioned earlier, $$L$$ is the maximum shortest path weight.

Let us call a path with total weight at most $$\Delta$$ and without edge repetitions a $$\Delta$$-path.

Let $$C_\Delta$$ denote the set of all node pairs $$\langle u,v \rangle$$ connected by some $$\Delta$$-path $$(u,\dots, v)$$ and let $$n_\Delta:=|C_{\Delta}|$$. Similarly, define $$C_{\Delta+}$$ as the set of triples $$\langle u,v',v \rangle$$ such that $$\langle u,v' \rangle \in C_\Delta$$ and $$(v',v)$$ is a light edge and let $$m_\Delta:=|C_{\Delta+}|$$.

The sequential delta-stepping algorithm needs at most $$\mathcal{O} (n+m + n_\Delta + m_\Delta + L/\Delta)$$ operations. A simple parallelization runs in time $$\mathcal{O} \left(\frac{L}{\Delta}\cdot d\ell_\Delta \log n\right)$$.

If we take $$\Delta = \Theta(1/d)$$ for graphs with maximum degree $$d$$ and random edge weights uniformly distributed in $$[0,1]$$, the sequential version of the algorithm needs $$\mathcal{O}(n+m+dL)$$ total average-case time, and a simple parallelization takes $$\mathcal{O}(d^2 L \log^2n)$$ time on average.

Graph 500
The third computational kernel of the Graph 500 benchmark runs a single-source shortest path computation. The reference implementation of the Graph 500 benchmark uses the delta stepping algorithm for this computation.

Radius stepping algorithm
For the radius stepping algorithm, we must assume that our graph $$G$$ is undirected.

The input to the algorithm is a weighted, undirected graph, a source vertex, and a target radius value for every vertex, given as a function $$r : V \rightarrow \mathbb{R}_+$$. The algorithm visits vertices in order of increasing distance from the source $$s$$. In each step $$i$$, the radius stepping algorithm increases the radius centered at $$s$$ from $$d_{i-1}$$ to $$d_i$$ and settles all vertices $$v$$ in the annulus $$d_{i-1}<d(s,v)\leq d_i$$.

Following is the radius stepping algorithm in pseudocode:

Input: A graph $$G=(V,E,w)$$, vertex radii $$r(\cdot)$$, and a source node $$s$$.
Output: The graph distances $$\delta(\cdot)$$ from $$s$$.
 1  $$\delta(\cdot)\leftarrow +\infty$$, $$\delta(s)\leftarrow 0$$
 2  foreach $$v \in N(s)$$ do $$\delta(v)\leftarrow w(s,v)$$; $$S_0\leftarrow \{s\}$$, $$i \leftarrow 1$$
 3  while $$|S_{i-1}|<|V|$$ do
 4      $$d_i\leftarrow \min_{v\in V\setminus S_{i-1}}\{\delta(v) + r(v)\}$$
 5      repeat
 6          foreach $$u \in V\setminus S_{i-1}$$ s.t. $$\delta(u) \leq d_i$$ do
 7              foreach $$v \in N(u)\setminus S_{i-1}$$ do
 8                  $$\delta(v) \leftarrow \min\{\delta(v), \delta(u)+w(u,v)\}$$
 9      until no $$\delta(v)\leq d_i$$ was updated
10      $$S_i\leftarrow\{v\;|\;\delta(v)\leq d_i\}$$
11      $$i\leftarrow i+1$$
12  return $$\delta(\cdot)$$

For all $$S\subseteq V$$, define $$N(S)=\bigcup_{u\in S}\{v\in V\mid (u,v)\in E\}$$ to be the neighbor set of $$S$$. During the execution of standard breadth-first search or Dijkstra's algorithm, the frontier is the neighbor set of all visited vertices.

In the Radius-Stepping algorithm, a new round distance $$d_i$$ is decided on each round with the goal of bounding the number of substeps. The algorithm takes a radius $$r(v)$$ for each vertex and selects a $$d_i$$ on step $$i$$ by taking the minimum $$\delta(v) + r(v)$$ over all $$v$$ in the frontier (Line 4).

Lines 5-9 then run the Bellman-Ford substeps until all vertices whose distance from $$s$$ is at most $$d_i$$ are settled. Vertices within distance $$d_i$$ are then added to the visited set $$S_i$$.
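The rounds described above can be sketched sequentially in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The undirected graph is assumed to be a symmetric dictionary of dictionaries, `radius` maps every vertex to a non-negative number, and the example graph is hypothetical. The early exit for unreachable vertices is an addition that the pseudocode does not contain.

```python
import math

def radius_stepping(graph, radius, source):
    """Sequential sketch of radius stepping; returns distances from source."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    for v, w in graph[source].items():      # relax the neighbors of the source
        dist[v] = w
    settled = {source}
    while len(settled) < len(graph):
        # Round distance: minimum of dist(v) + r(v) over unsettled vertices.
        d_i = min(dist[v] + radius[v] for v in graph if v not in settled)
        if d_i == math.inf:
            break                           # assumption: rest is unreachable
        updated = True
        while updated:                      # Bellman-Ford substeps
            updated = False
            for u in graph:
                if u in settled or dist[u] > d_i:
                    continue
                for v, w in graph[u].items():
                    if v in settled:
                        continue
                    candidate = dist[u] + w
                    if candidate < dist[v]:
                        dist[v] = candidate
                        if candidate <= d_i:
                            updated = True  # a value within d_i changed
        settled = {v for v in graph if dist[v] <= d_i}
    return dist

# Hypothetical undirected example graph, every radius equal to 1.
graph = {
    "A": {"B": 3, "D": 5, "E": 3},
    "B": {"A": 3, "C": 3},
    "C": {"B": 3, "D": 1},
    "D": {"A": 5, "C": 1},
    "E": {"A": 3, "F": 5},
    "F": {"E": 5},
}
radii = {v: 1 for v in graph}
distances = radius_stepping(graph, radii, "A")
```

With all radii set to 0, each round settles only the closest unsettled vertices, much like Dijkstra's algorithm. Very large radii let a single round of Bellman-Ford substeps settle every reachable vertex.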

Example
Following is a step by step description of the algorithm execution for a small example graph. The source vertex is the vertex A and the radius of every vertex is equal to 1.

At the beginning of the algorithm, all vertices except for the source vertex A have infinite tentative distances, denoted by $$\delta$$ in the pseudocode.

All neighbors of A are relaxed and $$S_0=\{A\}$$. The variable $$d_1$$ is chosen to be equal to 4 and the neighbors of the vertices B, E and G are relaxed, so $$S_1=\{A,B,E,G\}$$. The variable $$d_2$$ is chosen to be equal to 6 and no values are changed, so $$S_2=\{A,B,C,D,E,G\}$$.

The variable $$d_3$$ is chosen to be equal to 9 and no values are changed, so $$S_3=\{A,B,C,D,E,F,G\}$$.

The algorithm terminates.

Runtime
After a preprocessing phase, the radius stepping algorithm can solve the SSSP problem in $$\mathcal{O}(m\log n)$$ work and $$\mathcal{O}\left(\frac{n}{p}\log n\log pL\right)$$ depth, for $$p\leq \sqrt{n}$$. In addition, the preprocessing phase takes $$\mathcal{O}(m\log n + np^2)$$ work and $$\mathcal{O}(p^2)$$ depth, or $$\mathcal{O}(m\log n + np^2\log n)$$ work and $$\mathcal{O}(p\log p)$$ depth.