User:Solveering BenM/sandbox

History
M* was developed by Ben Masefield in 2015 while working on a project for Solveering LLC aimed at finding connections within a set of 3D graphs from a large number of starting nodes to a large number of end nodes. The computer-implemented algorithm was patented in 2018.

Algorithm
Let the node from which the path starts be called the starting node and the node at which it ends be called the destination node. Each node in the graph has a limited number of neighboring nodes, which are the nodes that can be reached by traversing a single edge (or connection). The cost of traversing a single connection is called the cost value.

The M* algorithm relies on a matrix (also called M*) which is created by:


 * 1) Create a zero matrix of size NxN (where N is the number of nodes in the graph).
 * 2) For each pair of connected nodes i and j, set the entry a(i,j) to 1/cost of the connection from node i to node j. Entries for unconnected pairs remain zero.
 * 3) For each diagonal entry a(i,i), set the value to the negative sum of all the other entries in row i.
 * 4) Add a term e to the first entry on the diagonal, a(0,0). The value of e can be any value that does not make this entry zero.
 * 5) Invert the matrix.
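
The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the graph is assumed to be undirected and is passed as a hypothetical dictionary `costs`, step 5 uses a plain Gauss-Jordan helper (`invert`, introduced here), and the triangle graph and the value e=0.5 are made up for the demonstration.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def build_m_star(n, costs, e=0.1):
    """Follow steps 1-5; costs maps an undirected node pair (i, j) to its cost."""
    a = [[0.0] * n for _ in range(n)]      # step 1: zero matrix
    for (i, j), cost in costs.items():     # step 2: 1/cost for each connection
        a[i][j] = a[j][i] = 1.0 / cost
    for i in range(n):                     # step 3: diagonal is still 0 here,
        a[i][i] = -sum(a[i])               #         so this sums the other entries
    a[0][0] += e                           # step 4: add e to a(0,0)
    return invert(a)                       # step 5: invert


# Hypothetical triangle graph: 0-1 and 1-2 cost 1, the direct link 0-2 costs 4.
costs = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 4.0}
m_star = build_m_star(3, costs, e=0.5)
for row in m_star:
    print([round(v, 3) for v in row])
```

In this sketch the first row and first column of the result all equal 1/e, the same pattern as the 10.0 entries in the example matrix for e=0.1.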

To solve for the path from the starting node to the destination node, the matrix is multiplied by a vector of length N in which the entry for each row is:

 * -1 if the node is the starting node
 * +1 if the node is the destination node
 * 0 elsewhere

Multiplying the matrix by the vector gives a new vector, which is considered the Potential of each node for the specific configuration of start and destination nodes. The M* matrix, on the other hand, does not depend on the start and destination nodes and can thus be reused for the same graph with different start/destination configurations.
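
Because the vector has only two non-zero entries, the product can be written out entry by entry. The symbols below are introduced here for illustration only: $$b$$ is the vector, $$\phi_i$$ the Potential of node i, s the starting node and d the destination node.

$$\phi_i = \sum_{j} M^*_{i,j}\, b_j = M^*_{i,d} - M^*_{i,s}$$

Each Potential is therefore the difference of two matrix entries in the same row, one from the destination column and one from the start column.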

The path is obtained, beginning at the starting node, by repeating the following at each node until the destination is reached:


 * 1) For each neighboring node, calculate the Potential at the current node minus the Potential at the neighboring node, and divide this difference by the cost of the connection between them.
 * 2) Compare the values for all neighboring nodes and follow the connection with the highest value to find the next node in the path.
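
The two steps can be sketched as a short loop. This is an illustrative sketch, not a definitive implementation: the function names, the `neighbors` dictionary layout and the step limit are assumptions, and the small matrix was worked out by hand for a hypothetical triangle graph with e=0.5.

```python
def potentials(m_star, start, dest):
    """Multiply the matrix by the vector holding -1 at start and +1 at dest."""
    n = len(m_star)
    vector = [0.0] * n
    vector[start] = -1.0
    vector[dest] = 1.0
    return [sum(m_star[i][j] * vector[j] for j in range(n)) for i in range(n)]


def find_path(m_star, neighbors, start, dest):
    """Repeatedly follow the connection with the highest value."""
    phi = potentials(m_star, start, dest)
    path = [start]
    current = start
    while current != dest:
        if len(path) > len(m_star):
            # Safety limit (an assumption of this sketch): stop after N moves.
            raise RuntimeError("no path found within N steps")
        # Step 1: (Potential here - Potential at neighbor) / connection cost.
        values = {nb: (phi[current] - phi[nb]) / cost
                  for nb, cost in neighbors[current].items()}
        # Step 2: move along the connection with the highest value.
        current = max(values, key=values.get)
        path.append(current)
    return path


# Hypothetical triangle graph (0-1 and 1-2 cost 1, 0-2 costs 4) and its
# matrix, precomputed by hand for e = 0.5.
neighbors = {0: {1: 1.0, 2: 4.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 4.0, 1: 1.0}}
m_star = [[2.0, 2.0, 2.0],
          [2.0, 7 / 6, 4 / 3],
          [2.0, 4 / 3, 2 / 3]]
print(find_path(m_star, neighbors, start=0, dest=2))  # prints [0, 1, 2]
```

Here the walk avoids the direct connection 0-2 (cost 4) and goes through Node 1 for a total cost of 2.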

Description
The M* algorithm is similar to other path finding algorithms in that it seeks to minimize the total path distance between two locations. In this sense, the path distance can represent a number of things: not just a physical distance but also, for example, travel time (in a road system) or the effort required to traverse it. As long as there is a parameter that represents the quantity to be minimized and that applies to a single connection (or edge in formal terminology), the shortest total path can be found.

With some minor modifications, the algorithm uses an approach similar to that found in an electrical circuit where the starting node is the inflow from a current source and the destination node is the ground connection. The approach is the same as that used for nodal analysis, with some minor differences in how the matrix is set up. If nodal analysis were used unchanged, the matrix and solution would be fixed to one specific configuration of the start/destination nodes and the matrix would need to be recalculated each time. Instead, the nodes are used in a more generic form which allows the matrix to be written independently of the location of the current source so that it can be reused.

The matrix-vector product does not need to be computed before the path finding process starts. As the vector multiplying the matrix is sparse (it only has entries for the start and destination nodes), the relevant entries can be obtained directly while computing the values for all neighboring nodes at any given location. This not only saves computation in advance of the path finding, it also allows a single step to be calculated for cases where finding the next move is more important than finding the entire path.
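
A single step computed on demand might look like the sketch below. It reads only the matrix entries it needs and never forms the full vector of Potentials. The function names and the `neighbors` layout are assumptions, and the matrix is a hand-computed one for a hypothetical triangle graph with e=0.5.

```python
def potential(m_star, node, start, dest):
    """Potential of one node, read from two matrix entries only."""
    return m_star[node][dest] - m_star[node][start]


def next_node(m_star, neighbors, current, start, dest):
    """Pick the next move without forming the full Potential vector."""
    here = potential(m_star, current, start, dest)
    best, best_value = None, float("-inf")
    for nb, cost in neighbors[current].items():
        value = (here - potential(m_star, nb, start, dest)) / cost
        if value > best_value:
            best, best_value = nb, value
    return best


# Hypothetical triangle graph: 0-1 and 1-2 cost 1, 0-2 costs 4 (e = 0.5).
neighbors = {0: {1: 1.0, 2: 4.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 4.0, 1: 1.0}}
m_star = [[2.0, 2.0, 2.0],
          [2.0, 7 / 6, 4 / 3],
          [2.0, 4 / 3, 2 / 3]]
print(next_node(m_star, neighbors, current=0, start=0, dest=2))  # prints 1
```

Each call touches two matrix entries per node considered, so the work for one move depends on the number of neighbors rather than on the size of the graph.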

Example
Given the graph, where the nodes are identified by the numbers in the circles and the cost of traversing a connection is the value shown next to its line, following the approach for generating the M* matrix gives the following (e=0.1):

$$M^* = \begin{bmatrix} 10.0&10.0&10.0&10.0&10.0&10.0&10.0&10.0&10.0\\10.0&7.651&8.035&8.15&7.974&8.083&8.916&8.321&8.174\\10.0&8.035&5.077&5.74&7.571&5.354&8.274&6.731&5.879\\10.0&8.15&5.74&3.932&7.605&5.428&8.083&6.142&4.482\\10.0&7.974&7.571&7.605&6.739&7.585&8.376&7.656&7.612\\10.0&8.083&5.354&5.428&7.585&4.753&8.194&6.485&5.548\\10.0&8.916&8.274&8.083&8.376&8.194&6.807&7.798&8.043\\10.0&8.321&6.731&6.142&7.656&6.485&7.798&5.262&6.019\\10.0&8.174&5.879&4.482&7.612&5.548&8.043&6.019&4.149 \end{bmatrix}$$

The steps for finding the path from Node 0 to Node 5 are thus (Vx->y indicates the value to go from Node x to Node y):

At Node 0:


 * V0->1=((10-10)-(8.083-10))/3=0.639
 * V0->6=((10-10)-(8.194-10))/5=0.361

Since 0.639 > 0.361, Node 1 is the next node on the path. At Node 1:


 * V1->0=((8.083-10)-(10-10))/3=-0.639
 * V1->2=((8.083-10)-(5.354-10))/7=0.390
 * V1->4=((8.083-10)-(7.585-10))/2=0.249

The value of -0.639 (V1->0) is simply the negative of V0->1 and could have been skipped.

Here, V1->2 is the largest value and thus Node 2 is next:


 * V2->1=((5.354-10)-(8.083-10))/7=-0.390
 * V2->3=((5.354-10)-(5.428-10))/9=-0.008
 * V2->4=((5.354-10)-(7.585-10))/11=-0.203
 * V2->5=((5.354-10)-(4.753-10))/1=0.601

From this the path concludes at Node 5. Though the last step may seem unnecessary as Node 5 was a neighboring node, running the last set of calculations is necessary as there may be cases where it is better to move to another node before ending at the destination. As a comparison to a Nearest Neighbor Search approach: if the same start/end locations were chosen, the path would go from Node 0 to 1, then 4, 6 (where it would get stuck trying to get back to 0), 7, 8, 3 and finally 5 for a total path cost of 30 compared to 11 for this approach.
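
The worked steps can be checked with a few lines of Python. The sketch below uses only columns 0 and 5 of the M* matrix printed above and the connection costs that appear as divisors in the steps. Costs for the remaining connections are not stated in the text, so only Nodes 0, 1 and 2 are listed. The names `col_start`, `col_dest`, `neighbors` and `value` are illustrative.

```python
# Columns 0 and 5 of the example M* matrix (start = Node 0, destination = Node 5).
col_start = [10.0] * 9
col_dest = [10.0, 8.083, 5.354, 5.428, 7.585, 4.753, 8.194, 6.485, 5.548]

# Connection costs read off the divisors in the worked steps.
neighbors = {
    0: {1: 3, 6: 5},
    1: {0: 3, 2: 7, 4: 2},
    2: {1: 7, 3: 9, 4: 11, 5: 1},
}


def value(x, y):
    """V x->y: Potential difference between x and y divided by the cost."""
    phi_x = col_dest[x] - col_start[x]
    phi_y = col_dest[y] - col_start[y]
    return (phi_x - phi_y) / neighbors[x][y]


node, path = 0, [0]
while node != 5:
    values = {nb: value(node, nb) for nb in neighbors[node]}
    for nb, v in values.items():
        print(f"V{node}->{nb} = {v:.3f}")
    node = max(values, key=values.get)  # follow the highest value
    path.append(node)
print(path)
```

A useful sanity check from the circuit analogy: the values leaving the starting node sum to 1 (0.639 + 0.361), and the values at each intermediate node sum to 0.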

Optimizations
The general approach, in which the matrix is multiplied by the vector to obtain the potential at each node, spends calculations on nodes that are likely never visited. As shown in the example, the potential at each node can instead be calculated directly from the matrix, so that values are computed only as needed. This keeps the overall memory consumption smaller and allows larger matrices to be computed and stored in a way that gives access to individual elements when needed, rather than keeping the entire data set in memory.

Since there is no need for a high level of precision in the data, it is possible to store the data as half-precision numbers or as integers (multiplying the values by a constant factor to retain more precision if needed). By using integer values (8, 16 or 32 bit) rather than floating point values, the calculations can be sped up.
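
As a rough sketch of the integer variant, the example values can be scaled by a constant and held in 16-bit integers. The factor of 1000, the helper name `better` and the use of Python's `array` module are assumptions made for illustration. The sketch shows the representation only, since any speed gain depends on the language and hardware used.

```python
from array import array

SCALE = 1000  # hypothetical constant factor; keeps three decimal places

# Column 5 of the example M* matrix, stored as signed 16-bit integers.
col_dest_float = [10.0, 8.083, 5.354, 5.428, 7.585, 4.753, 8.194, 6.485, 5.548]
col_dest = array("h", (round(v * SCALE) for v in col_dest_float))
col_start = array("h", [10 * SCALE] * 9)  # column 0 is 10.0 everywhere


def better(current, nb_a, cost_a, nb_b, cost_b):
    """True if the step to nb_a has a higher value than the step to nb_b.

    Compares (phi_cur - phi_a) / cost_a with (phi_cur - phi_b) / cost_b by
    cross-multiplying, so only integer arithmetic is used. This requires the
    costs to be positive integers.
    """
    def phi(n):
        return col_dest[n] - col_start[n]

    return (phi(current) - phi(nb_a)) * cost_b > (phi(current) - phi(nb_b)) * cost_a


# At Node 0 of the example: is the step to Node 1 (cost 3) better than
# the step to Node 6 (cost 5)?
print(better(0, 1, 3, 6, 5))  # prints True
```

The cross-multiplication avoids the division altogether, which keeps every intermediate result an integer.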

Running Time
There are two aspects to the running time to consider: building the matrix and performing a step by step determination of the path.

Using big-O notation and designating $$|V|$$ as the number of nodes, $$|C|$$ the number of connections, $$|P|$$ the final number of steps along the path and $$|N|$$ the average number of neighbors for each node, the time to build the matrix is in $$O(|V|)$$ since $$|N|$$ can be assumed constant. Inversion of the matrix takes $$O(|V|^3)$$[1], which is generally slower than running the Dijkstra algorithm, which runs in $$O(|V|^2)$$[2]. Once the matrix has been processed, though, finding the path from a starting point to a destination takes $$O(|P| \cdot |N|)$$, which is independent of the number of nodes in the system.

Related problems

 * The algorithm is well suited to implementation in a Traveling Salesman Problem (TSP) where the distances between locations are given by the total travel distance along a road map rather than just the Euclidean distance.
 * The destination node is mathematically not limited to a single node and can be a plurality of nodes. The algorithm automatically determines the path to the closest node and different weights can be assigned to each destination.

Links
[1] https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations

[2] https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm