User:Regnaron~enwiki/Playground

The A* search algorithm is used in Computer Science to compute a shortest Path between two nodes in a graph with positive Edge Weights. It was first described in 1968 by Peter Hart, Nils Nilson and Betram Raphael. The A* search algorithm is a so called informed search, what means that the algorithm uses a heuristic to search towards the goal. Therefore it does – in contrast to uninformed search algorithms – expand the nodes that seem to be most promising leading to the goal first.

Gerneral Informtion
The special feature of the A* search algorithm is that it searches towards the goal, whereby it can ignore many of the possible nodes one could consider when searching a path from the start node to the goal node. Hence the algorithm generally detects the goal very fast.

However one has to decide which nodes are the most promising leading to the goal. This is done by a heuristic with which one can make some assumptions about the graph on which the A* search algorithm is running. Herby the algorithms is working the more goal oriented, the more accurate the heuristic is.

If one requests the heuristic to be monotone, which means that it never overestimates the actual distance between two nodes, one can proof that the A* search algorithm always finds a shortest path. But vice versa, this means that if one uses a bad heuristic, it cannot be guaranteed that the path which the algorithm will find, is actually a shortest one. A formal proof for the aspect of the Optimality of the A* search algorithm can be found under Proof of optimality later in the article.

Note: If one modifies the A* search algorithm a bit, it suffices to use an admissable heuristic, which means that the heuristic must not overestimate the actual cost getting to the goal, but may overestimate the actual distance between two nodes in general, to guarantee that the algorithm will always find a shortest path. A further insight into the differences between monotone and admissable heuristics can be found under heuristics later in the article.

How it works
{|
 * [[Image:MapGermanyGraph.svg|542px|thumb|right|A map of Germany used for this example with the distances between some cities annotated. The numbers denote the distance between the two connected cities.]]

If one applies the A* search algorithm on a node u, the first thing it does, is computing all the reacheable neighbors of u. Afterwards the heuristic estimates for every of this neighbors how costly it is to get from there to the goal. For every of these neighbors v the A* search algorithm adds the estimated cost getting from v to the goal to the costs getting from u to v. This approach can be described by the following equation:

f(v) = g(v) + h(v)

Whereas h(v) stands for the estimated cost of getting from vto the goal, g(v) stands for the cumulated cost getting from the start node to v, and f(v) stands for the expected costs if one follows the path from the start to the current position (u) to v, and from there somehow further to the goal. The node that has the lowest f-value will be the one to be explored in the next step.

Example
An example for the application of the A* search algorithm is searching for a shortest path on a map. In this example the A* search algorithm will be used to compute a shortest path between the german citys of Frankfurt and Munich.

First of all, one has to find a suitable heuristic for the problem, which gets very close to the actual distance between two cities, but never overestimates the distance between the two. For this example, the straight line distance between two cities is a very good choice for a heuristic. On the one hand, all road-connections between two cities are at least as long as the airline distance between the two cities, but on the other hand, most roads will be build quite directly, so that the straight line distance should not be too optimistic. Thus, estimating the straight line distances between all cities and the goal Munich, leads to the following table which will be used for the heuristic function h(v).

Augsburg <--> Munich: 43 km Erfurt <--> Munich: 342 km Frankfurt <--> Munich: 353 km Karlsruhe <--> Munich: 260 km Kassel <--> Munich: 446 km Mannheim <--> Munich: 311 km Munich <--> Munich: 0 km Nürnberg <--> Munich: 151 km Stuttgart <--> Munich: 199 km Würzburg <--> Munich: 229 km

Furthermore one needs the costs for all paths between two cities for gradually computing how costly it is to get from the start to a city u. These values will later be used for the function g(u) and can be obtained from the map above used for this example.

Note: In the following example, the number written in brackets behind the name of the city is the already computed value f(v) for the corresponding city v, which itself is composed of the actual distance from Frankfurt to the City v and the estimated cost of getting from that city to Munich.

Algorithm
For finding a shortest path, the A* search algorithm needs six inputs: The graph (graph) it should run on, the node where the search should start (start), the goal node in which the shortest path should end (goal), a Priority Queue (S) that stores all the nodes already known by the algorithm (and whose f-values are thereby known), but have not yet been visited, a list (N) that stores all the nodes that already have been visited, and the heuristic h which estimates for every node the distance between that node and the goal.

In the first step, the starting node gets inserted into the Priority Queue. Afterwards a loop is being started which runs until either a shortest path to the goal is found, or a node that already had been explored should be explored once more, which only occurs when there is no path between the start and the goal.

Within the loop, the first node, which is – since the queue is a priority queue – the node with the lowest f-cost, is removed from the queue and added to the list of already visited nodes. Afterwards it is checked if the algorithm can already be terminated due to two reasons: First of all, the just visited node could already have been visited once, which could normally not happen due to the fact that a monotone heuristic was used. If the A* search algorithms tries to expand a node that has already been expanded earlier, there cannot exist a path from the start to the goal, since the algorithm would have found that path before exploring this node a for a second time. A more precise explanation for this can be found in the part Proof of optimality further down. If the algorithm terminated because it found the goal node, it calls the subfunction reconstruct_shortest_path which reconstructs the path from the goal node backwards to the starting node. Therefore the function gets the starting node, the goal node and the complete graph. Now it follows step by step, starting at the goal, the pointers to the predecessors until it reaches the starting node. These pointers have been created by the A* search algorithm within the graph while expanding the nodes, so that the function reconstruct_shortest_path for a node v simply need to lookup in the graph from which node u the node v was visited. After determine the predecessor of a nodem this predecessor will be stored in a Stack and reconstruct_shortest_path computes recursively the predecessor of that predecessor until it reaches the starting node. Now it can simply return the Stack which by now contains all the nodes on the shortest path in the correct order.

But if the goal has not yet been found by the algorithm, all nodes that are reacheable from the current node are computed by expanding the current node. In this step, for all newly discovered nodes the algorithm also saves from which node they have been discovered. So, if such a node is expanded later on itself, this data can simply be saved within the graph, because due to the fact that a monotone heuristic is used, the first path used to get to a node always is a shortest one. Now, the algorithm computes the f-value for all the newly discovered nodes, while the computation uses the already given formula f(n) = g(n) + h(n) for a node n. The value of g(n) can directly be read from the graph and corresponds to the weight of the edge between the current node and the node just discovered plus the cost of getting from the start to the current node. Applying the function h(n) is easy since it is directly acessible to the algorithm. At last all newly detected nodes get inserted into the priority queue and the loop starts again with examining the node with the lowest f-value.

Note: In addition to the here used method of implementing the A* search algorithm with a monotone heuristic there is also a implementation which heavily uses a so called closed list of already visited nodes where it has to check in each step if the currently found route to a node is better or worse than the already computed one. This is necessary if one does not use a monotone heuristic, but just an admissable one. This is not necessary if one uses a monotone heuristic where it can be guaranteed that the first path computed to any node is in fact a shortest path to that node. A more-depth distinction between monotone and admissable heuristics can be found under heuristics further down in the article.

/* * Algorithm reconstructing a shortest path *  * P: computed shortest path * start: starting node * node: goal node */ reconstruct_shortest_path (start, node, graph) {   if (node != start) {      push(node, P); // Read the predecessor of the current node predecessor := getPredecessor(node, graph); reconstruct_shortest_path (start, predecessor, graph) }   else return P; }

/* * The main A* search algorithm *  * graph: The graph being examined * start: starting node * goal: goal node * h: heuristic for estimating the distance between any node and the goal * S: priority queue of the nodes whose f-value is already known * N: list of all nodes already visited */ A-Star (graph, start, goal, h, S, N) { insert (start,S); while not isEmpty(S) {      current_node := pop(S); if current_node in N then // There exists no path between start and goal return fail; insert (current_node, N); if (current_node == goal) then return reconstruct_shortest_path(start,goal, graph); else {         // Compute all nodes reacheable from the current node successors := expand(current_node, graph); // Save the predecessor of the current node save_predecessor_in_graph(current_node, graph) forall s in sucessors do {            // Save the node from which s was discovered predecessor(s) := current_node; // Compute and save the f-values f(s) := g(s) + h(s); insert(s,S); }      }    }    // No path could be found return fail; }

Heuristics
When considering heuristics for the A* search algorithm, one distinguishes between two kinds of heuristics: On the one hand admissble heuristics, which never overestimate the total cost getting from a node to the goal, but which may overestimate the cost getting from one node to another, and on the other hand monotone heuristics that guarantee not just to never overestimate the total cost from any node to the goal, but that are also guarantee to never overestimate the total cost between any two nodes. Hence, if a heuristic is monotone, this is a stronger proposition than if the heuristic would just be admissable. Even though it is possible to construct heuristics that are admissable but not monotone, most heuristics are both. The straight line distance, a heuristic for estimating the distance between two points is not just admissable, but also monotone, since it is not possible to find any way getting from point A to point B that is shorter than the straight line distance between those two points.

The most important difference between those two heuristics emerges when trying to implement the A* search algorithm. If one only can use an admissable heuristic, it may happen that the actual distance between two nodes u and v is overestimated by the heuristic, and therefore the algorithm may get from u to v via another node w, which is obviously a detour, if one could directly get from u to v. This now entails in the fact that the computed distance between u and v is not optimal, since there certainly exists a shorter path from u to v which just omits the vertex w. This now leads to the problem that one has to check for each discovered node if the newly discovered path to that node is actually shorter or longer as the previously computed one. If the new path is shorter than the old one, the old path has to be replaced by the new one, and if the newly computed path is longer than the old one was, the new path has to be discarded. On the other hand, if one chooses a monotone heuristic, this is unnecessary, since every time a node gets expanded the first time, the path to this node found so far is assured to be a shortest one.

Furthermore the chosen heuristic also has great influences on the running time of the A* search algorithm. On the one hand, there is the perfect heuristic, which will in each step just estimate the exact cost of getting from u to the goal, but this is useless, as it is the solution to the problem that should be solved by the A* search algorithm. On the other hand the trivial heuristic that estimates a value of 0 for getting from any node to the goal is obviously monotone, but is as useless as the perfect heuristic, as this "heuristic" does not help at all deciding which way to go, so that the algorithm has to search blindly for the goal, which it can just find by coincidence. This version of the A* search algorithm, where the heuristic estimates a distance of 0 between any node and the goal is better known as Dijkstra's algorithm. So a good heuristic is a tradeoff between accuracy and the cost it takes to compute it.

Proof of optimality
As already mentioned, a good heuristic is one of the key requirements for the A* search algorithm to work efficently. If one assumes that the used heuristic is a monotone one, which means that it never overestimates the distance between any two nodes, and if one furthermore assumes that the graph on which the algorithm runs has no negative edge weights, one can proof that the A* search algorithm will always find a shortest path, if one exists. If one examines how the algorithm works, one observes, that every time a node is picked, it is the one with the lowest f-value. But since a monotone heuristic was chosen, and therefore by assumption the estimated distance between any two nodes is never overestimated by the algorithm, and the costs for getting to the node u that currently has to be estimated, are known (because the algorithm has already found a path leading to u), also, the final f-value for that node u is a optimistic estimate, which will definately not be more than the actual cost. If there exists a node a with f(a) = 450, this node will not be expanded until all other nodes x with f(x) < 450) got expanded. So the algorithm cannot omit any nodes that possibly could lead cheaper to the goal. However, if one would allow negative edge weights, there could exists nodes – due to the edges with negative weights leading to them – with f-cost lower than 450. If one writes down these considerations, the optiality of the A* search algorithm can formally proven as follows:

To be proofed: Every path to the goal computed by the A* search algorithm is a shortest path and thereby an optimal solution.

Assumption: The A* search algorithm finds a suboptimal solution G2, while the optimal solution is G1 and has costs of C1.

Since for all Goal nodes the estimated distance from the goal to the goal is 0, the following equation holds:

h(G2) = 0

Since $$G_2$$ is a suboptimal solution after the made assumption the following equation holds:

f(G2) = g(G2) + h(G2) = g(G2) &gt; C1

If one now considers an arbitrary node n on the shortes path to the optimal goal G1 and the fact that the heuristic never overestimates the actual costs, it is true that:

f(n) = g(n) + h(n) &le; C1

By combining the two equations, one obtains:

f(n) &le; C1 < f(G2)

But this now means that the A* search algorithm never explores the node G2 before exploring the node G1 which is the optimal solution. So the Algorithm first discovers the solution G1 and thereby in fact computes a shortest path.

Field of application
The A* search algorithm is applied when computing a shortest path between two poins, as for example the route planner in a car or a route planner in the Internet. Another field of application is Artificial Intelligence as in computer games, where the algorithm is heavily used when some unit has to find its way to a special point on the map. This example can be generalized for robots or software agents, which autonomously should find a route in some specified world.