Distance oracle

In computing, a distance oracle (DO) is a data structure for calculating distances between vertices in a graph.

Introduction
Let G(V,E) be an undirected, weighted graph, with n = |V| nodes and m = |E| edges. We would like to answer queries of the form "what is the distance between the nodes s and t?".

One way to do this is just run the Dijkstra algorithm. This takes time $$O(m + n \log n)$$, and requires no extra space (besides the graph itself).

In order to answer many queries more efficiently, we can spend some time in pre-processing the graph and creating an auxiliary data structure.

A simple data structure that achieves this goal is a matrix which specifies, for each pair of nodes, the distance between them. This structure allows us to answer queries in constant time $$O(1)$$, but requires $$O(n^2)$$ extra space. It can be initialized in time $$O(n^3)$$ using an all-pairs shortest paths algorithm, such as the Floyd–Warshall algorithm.

A DO lies between these two extremes. It uses less than $$O(n^2)$$ space in order to answer queries in less than $$O(m + n \log n)$$ time. Most DOs have to compromise on accuracy, i.e. they don't return the accurate distance but rather a constant-factor approximation of it.

Approximate DO
Thorup and Zwick describe more than 10 different DOs. They then suggest a new DO that, for every k, requires space $$O(k n^{1+1/k})$$, such that any subsequent distance query can be approximately answered in time $$O(k)$$. The approximate distance returned is of stretch at most $$2k-1$$, that is, the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and $$2k-1$$. The initialization time is $$O(kmn^{1/k})$$.

Some special cases include: Higher values of k do not improve the space or preprocessing time.
 * For $$k=1$$ we get the simple distance matrix.
 * For $$k=2$$ we get a structure using $$O(n^{1.5})$$ space which answers each query in constant time and approximation factor at most 3.
 * For $$k=\lfloor \log n \rfloor$$, we get a structure using $$O(n \log n)$$ space, query time $$O(\log n)$$, and stretch $$O(\log n)$$.

DO for general metric spaces
The oracle is built of a decreasing collection of k+1 sets of vertices:
 * $$A_0 = V$$
 * For every $$i=1,\ldots,k-1$$: $$A_i$$ contains each element of $$A_{i-1}$$, independently, with probability $$n^{-1/k}$$. Note that the expected size of $$A_i$$ is $$n^{1-i/k}$$. The elements of $$A_i$$ are called i-centers.
 * $$A_k = \emptyset$$

For every node v, calculate its distance from each of these sets:
 * For every $$i=0,\ldots,k-1$$: $$d(A_i,v) = \min{(d(w,v)|w\in A_i}$$ and $$p_i(v) = \arg \min{(d(w,v)|w\in A_i}$$. I.e., $$p_i(v)$$ is the i-center nearest to v, and $$d(A_i,v)$$ is the distance between them. Note that for a fixed v, this distance is weakly increasing with i. Also note that for every v, $$d(A_0,v)=0$$ and $$p_0(v)=v$$.
 * $$d(A_k,v)=\infty$$.

For every node v, calculate:
 * For every $$i=0,\ldots,k-1$$: $$B_i(v) = \{w\in A_i\setminus A_{i+1}|d(w,v)<d(A_{i+1},v)\}$$. $$B_i(v)$$ contains all vertices in $$A_i$$ which are strictly closer to v than all vertices in $$A_{i+1}$$. The partial unions of $$B_i$$s are balls in increasing diameter, that contain vertices with distances up to the first vertex of the next level.

For every v, compute its bunch: It is possible to show that the expected size of $$B(v)$$ is at most $$kn^{1/k}$$.
 * $$B(v) = \bigcup_{i=0}^{k-1}B_i(v).$$

For every bunch $$B(v)$$, construct a hash table that holds, for every $$w\in B(V)$$, the distance $$d(w,v)$$.

The total size of the data structure is $$O(kn+\Sigma|B(v)|)=O(kn+nkn^{1/k})=O(kn^{1+1/k})$$

Having this structure initialized, the following algorithm finds the distance between two nodes, u and v:


 * $$w:=u, i:=0$$
 * while $$w\notin B(v)$$:
 * $$i:=i+1$$
 * $$(u,v):=(v,u)$$ (swap the two input nodes; this does not change the distance between them since the graph is undirected).
 * $$w := p_i(u)$$
 * return $$d(w,u)+d(w,v)$$

It is possible to show that, in each iteration, the distance $$d(w,u)$$ grows by at most $$d(v,u)$$. Since $$A_{k-1}\subseteq B(v)$$, there are at most k-1 iterations, so finally $$d(w,u)\leq (k-1)d(v,u)$$. Now by the triangle inequality, $$d(w,v)\leq d(w,u)+d(u,v)\leq kd(v,u)$$, so the distance returned is at most $$(2k-1)d(u,v)$$.

Improvements
The above result was later improved by Patrascu and Roditty who suggest a DO of size $$O(n^{4/3} m^{1/3})$$ which returns a factor 2 approximation.

Reduction from set intersection oracle
If there is a DO with an approximation factor of at most 2, then it is possible to build a set intersection oracle (SIO) with query time $$O(1)$$ and space requirements $$O(N+n)$$, where n is the number of sets and N the sum of their sizes; see set intersection oracle.

It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires $$\Omega(n^2)$$ space to answer queries in time $$O(1)$$, e.g. using an n-by-n table with the intersection between each two sets. If this conjecture is true, this implies that there is no DO with an approximation factor of less than 2 and a constant query time.