| 0xDE ( @ 2008-04-03 23:17:00 |
| Entry tags: | graph algorithms |
Reweighting a graph for faster shortest paths
When I teach shortest paths in my undergraduate algorithms classes, the methods I consider most important (in roughly that order) are breadth first search, Dijkstra's algorithm, A*, the linear time DAG shortest path algorithm (section 24.2 of CLRS), and Bellman-Ford. Every self-respecting computer scientist knows the first two, but it came as a bit of a shock to me to discover that very few of our incoming graduate students had even heard of A*.
Perhaps their ignorance should have been unsurprising: there's very little to say from the algorithms-analysis point of view about A*, so it often doesn't get covered in algorithms classes, and it's mentioned neither in the algorithms text I use (Goodrich and Tamassia's Algorithm Design) nor in the popular Introduction to Algorithms by Cormen et al. It's often considered an AI technique, but it may not get mentioned in many artificial intelligence classes either, because it's not knowledge representation or probabilistic reasoning or whatever else current AI researchers are generally interested in.
If you do ask someone who has heard of A* what it is, I think the answer you'll most often get is that it's a graph searching algorithm, similar to Dijkstra but more complicated. And if you're lucky they might also tell you that it's faster than Dijkstra. If you haven't heard of A* but read the Wikipedia article on it, you'll get the same impression of a more complicated version of Dijkstra, without even any explanation of why you'd want to use it. But that's not my current view of A*: I don't see it as an algorithm, but as a preconditioner that changes the weights of a graph to allow other algorithms (such as Dijkstra) to run faster. I think this view is not only simpler (if you already understand Dijkstra) and more general, but also more intuitive explanation for why A* is a good idea and when it would be appropriate to use it. So, what I'd like to do in the rest of this post is to explain this point of view.
Reweighting schemes
Dijkstra's algorithm, applied to the problem of finding a shortest path from a given start vertex to a given goal vertex that are at distance D from each other, will end up examining all vertices within distance D of the start. Geometrically, we can visualize this by drawing a circle, centered at the start with the shortest path to the goal as its radius; the algorithm will search everything inside the circle. The part that it searches along the actual shortest path is necessary, but everything else is wasted work.

The basic idea of A* is to change the distances in the graph in such a way that the shortest paths remain unchanged, but so that vertices that aren't on the shortest path appear farther from the start, causing fewer vertices to be expanded. The changed distances cause the "circle" to be distorted, in a way that (we hope) makes it smaller.

To change the edge lengths, suppose we have a real number ƒ(v) for every vertex v of a directed graph. Replace the edge length Lu,v of an edge (u,v) by the new length L*u,v = Lu,v − ƒ(u) + ƒ(v), and use these new lengths to measure the distances d*(u,v) between pairs of vertices. The same construction applies to an undirected graph, by replacing each undirected edge {u,v} by two directed edges (u,v) and (v,u) prior to reweighting.
If a path from u to v has length p in the original graph, it will have length p* = p − ƒ(u) + ƒ(v) in the reweighted graph; the weights at the vertices in the interior of the path contribute positively to one edge of the path and negatively to another edge, and these two contributions cancel out, leaving contributions only from the weights at the path endpoints. Therefore, if we compare any two paths between the same two vertices, the comparison is the same as if we did it in the original graph; shortest paths remain shortest paths.
However, for the same reason, if we are comparing the distances from some start vertex u to two different vertices v and w, the new distances d*(u,v) = d(u,v) − ƒ(u) + ƒ(v) and d*(u,w) = d(u,w) − ƒ(u) + ƒ(w) will not, in general, have the same ordering that they did before the reweighting: if we choose a small value for ƒ(v) and a large value for ƒ(w), the reweighting will cause v to appear closer to u than it was before, and w to appear farther away.
Thus, by choosing an appropriate weighting function ƒ we can distort the circle that has the start-goal shortest path as its radius, causing it to contain fewer unnecessary vertices, while preserving the shortest path between the start and the goal.
A* for monotonic weights
The simplest form of the A* algorithm is the following: First, choose a "monotonic heuristic", that is, a weight function ƒ such that the reweighted edge lengths are non-negative. For technical reasons, we'll also assume that ƒ is always non-negative and that it equals zero at the goal. Second, reweight the graph as described above. Third, run Dijkstra's algorithm on the reweighted graph. That's all! A* is not usually described this way. But, if you follow the sequence of steps performed by the usual pseudocode descriptions of A* and compare them to the steps described above, you'll find that they're identical.
Why should ƒ be required to be monotonic? Monotonicity is important because Dijkstra is guaranteed to work correctly when the lengths of all edges are non-negative. Therefore, when ƒ is a monotonic heuristic, A* correctly finds the shortest path.
For every non-goal vertex v, the reweighting causes v to appear farther away from the start vertex by ƒ(v) than it was in the unweighted graph, because ƒ is non-negative. However, the requirement that ƒ be zero at the goal implies that its reweighted distance from the start is essentially unchanged. Therefore, A* will never explore any additional vertices beyond the set of vertices explored by Dijkstra on the initial graph, and if we can find a weight function ƒ that is large enough then A* will explore a strict subset of the vertices explored by Dijkstra. Also, clearly, the same reweighting technique can be (and has been) applied not just to Dijkstra but also to any other general-purpose shortest path algorithm, such as bidirectional search or iterative deepening or certain K shortest paths algorithms.
A* for admissible weights
A* is often described in terms of an admissible heuristic: a non-negative weight function ƒ such that, for every vertex v, ƒ(v) is at most equal to the true distance to the goal. But, if we don't require monotonicity, it's easy to come up with examples of graphs in which the reweighted edges are negative, and in which Dijkstra can be tricked into finding a non-shortest path. In the example below, the lower path with length six to the penultimate vertex is expanded before the shorter upper path of length four is found, causing the algorithm to eventually find the wrong path to the goal.

So what is wrong? Isn't A* supposed to work for any admissible heuristic? It is, but it's missing a step in this case.
Graphs with non-negative edges aren't the only graphs for which Dijkstra's algorithm works correctly. Another important class of graphs on which it works are the trees. If the graph we're given isn't a tree, we can replace it by a path tree that has as its nodes the paths from the given starting vertex, where the parent of a node in the path tree is the prefix of the corresponding path formed by omitting the path's final edge.

The version of A* for an admissible but not monotonic heuristic is to choose the heuristic, use it to reweight the graph, and then apply Dijkstra to the path tree of the reweighted graph. As soon as any copy of the goal node is closed by Dijkstra's algorithm, we return the corresponding path as the shortest path to the goal. Admissibility of the heuristic implies that, before we could expand any non-shortest path to a copy of the goal vertex, we would have found all nodes on the true shortest path and therefore the shortest path itself. Therefore, if the heuristic is admissible, this algorithm finds the correct shortest path. Of course, it may expand many more nodes than Dijkstra due to the repetition of nodes caused by the path tree expansion, but this may be made up for by the reduction in nodes due to the A* reweighting.
Where do the weights come from?
The ideal weight function would be the actual distance to the goal. If we could use this as our weight function, it would be monotonic and admissible, and if we could use it to reweight the edges then all shortest path edges would get weight zero while all other edges would get nonzero weights. Therefore, using Dijkstra on this reweighted graph would focus exclusively on the shortest path itself with no wasted effort. It sounds silly to think about using this as an actual weight function for A*, though, rather than as an ideal to be approximated by our heuristic weight function, since the distance to the goal is the unknown that we're trying to compute: if we had a good heuristic way of computing it, we wouldn't need the search algorithm. But despite its silliness in this application, the idea is useful in other contexts: I used it as an important component of my K shortest paths algorithm, which uses a run of Dijkstra to compute the distances to the goal, reweights the edges using this distance as weight function, and then uses the modified weights to guide its search.
Another algorithm that computes a weight function algorithmically, for an arbitrary graph, is Johnson's algorithm for all pairs shortest paths in a graph with negative edge weights but no negative cycles. If all edge weights were non-negative, we could find all pairs shortest paths by running Dijkstra n times, once for each possible source, in time O(nm + n2 log n). If we just wanted shortest paths from a single source in a graph with negative weights, we could run Bellman-Ford, in time O(nm). Johnson's algorithm combines both of these ideas: run Bellman-Ford once using an artificial start vertex that has a zero-length edge to all other vertices, set ƒ(v) to be the absolute value of the distance from the start vertex as computed by this run of Bellman-Ford, reweight the graph, and now run Dijkstra for each possible source. The weights ƒ(v) computed in this way are monotonic, so the n runs of Dijkstra get non-negative edge weights and are guaranteed to find correct shortest paths. Some recent papers at SODA on shortest path implementations use distances from a few known "landmarks" to construct a weight function on the vertices in a similar way.
But for most applications of A*, these ideas are useless, because they involve a preprocessing phase that itself computes shortest paths, the very problem we're trying to solve. This is why the weight function is called a heuristic, and why the A* algorithm is studied as a branch of artificial intelligence: one generally has to import some domain-specific knowledge to the problem, some actual intelligence, to construct a good heuristic that will speed up the shortest path search. For finding shortest paths in road networks, for instance, one can use straight-line distance as a heuristic; it's not hard to show that this is monotonic and admissible. For finding shortest paths for sprites in computer games, a grid-based distance usually makes more sense, so one can use the Manhattan distance as the weight function. My recent paper on flip graphs suggested using a form of earthmover distance on an associated graph as a heuristic in an A* approach to computing flip distance (for point sets for which the other algorithms in my paper don't work well enough); again, it's not hard to show that it's monotonic and admissible.
In conclusion
If you think Dijkstra is the right algorithm for your problem, maybe A* is even righter. And if you think some other shortest path algorithm is right instead, A* still might help. You'll have to think some more, in order to come up with the distance estimate it needs, but you don't need any more code in the guts of the search algorithm.
And if you're thinking about implementing A*, think about implementing Dijkstra and graph reweighting separately. The combination does the same thing, and the modular design will make it easier to change one part while leaving the other alone.