star - route finding algorithm

What algorithms compute directions from point A to point B on a map? (12)

Graph algorithms like Dijkstra's algorithm will not work because the graph is enormous.

This argument doesn't necessarily hold because Dijkstra will not usually look at the complete graph but rather just a very small subset (the better interconnected the graph, the smaller this subset).

Dijkstra may actually perform rather well for well-behaved graphs. On the other hand, with careful parametrization A* will always perform just as good, or better. Have you already tried how it would perform on your data?

That said, I'd also be very interested to hear about other peoples' experiences. Of course, prominent examples like Google Map's search are particularly interesting. I could imagine something like a directed nearest neighbour heuristic.

How do map providers (such as Google or Yahoo! Maps) suggest directions?

I mean, they probably have real-world data in some form, certainly including distances but also perhaps things like driving speeds, presence of sidewalks, train schedules, etc. But suppose the data were in a simpler format, say a very large directed graph with edge weights reflecting distances. I want to be able to quickly compute directions from one arbitrary point to another. Sometimes these points will be close together (within one city) while sometimes they will be far apart (cross-country).

Graph algorithms like Dijkstra's algorithm will not work because the graph is enormous. Luckily, heuristic algorithms like A* will probably work. However, our data is very structured, and perhaps some kind of tiered approach might work? (For example, store precomputed directions between certain "key" points far apart, as well as some local directions. Then directions for two far-away points will involve local directions to a key points, global directions to another key point, and then local directions again.)

What algorithms are actually used in practice?

PS. This question was motivated by finding quirks in online mapping directions. Contrary to the triangle inequality, sometimes Google Maps thinks that X-Z takes longer and is farther than using an intermediate point as in X-Y-Z. But maybe their walking directions optimize for another parameter, too?

PPS. Here's another violation of the triangle inequality that suggests (to me) that they use some kind of tiered approach: X-Z versus X-Y-Z. The former seems to use prominent Boulevard de Sebastopol even though it's slightly out of the way.

Edit: Neither of these examples seem to work anymore, but both did at the time of the original post.

An all-pairs shortest path algorithm will compute the shortest paths between all vertices in a graph. This will allow paths to be pre-computed instead of requiring a path to be calculated each time someone wants to find the shortest path between a source and a destination. The Floyd-Warshall algorithm is an all-pairs shortest path algorithm.

I am a little suprised to not see Floyd Warshall's algorithm mentioned here. This algorithm work's very much like Dijkstra's. It also has one very nice feature which is it allows you to compute as long as you would like to continue allowing more intermediate vertices. So it will naturally find the routes which use interstates or highways fairly quickly.

I had some more thoughts on this:

1) Remember that maps represent a physical organization. Store the latitude/longitude of every intersection. You don't need to check much beyond the points that lie in the direction of your target. Only if you find yourself blocked do you need to go beyond this. If you store an overlay of superior connections you can limit it even more--you will normally never go across one of those in a way that goes away from your final destination.

2) Divide up the world into a whole bunch of zones defined by limited connectivity, define all connectivity points between the zones. Find what zones your source and target are in, for the start and end zone route from your location to each connection point, for the zones between simply map between connection points. (I suspect a lot of the latter is already pre-calculated.)

Note that zones can be smaller than a metropolitan area. Any city with terrain features that divide it up (say, a river) would be multiple zones.

I see what's up with the maps in the OP:

Look at the route with the intermediate point specified: The route goes slightly backwards due to that road that isn't straight.

If their algorithm won't backtrack it won't see the shorter route.

I was very curious about the heuristics used, when a while back we got routes from the same starting location near Santa Rosa, to two different campgrounds in Yosemite National Park. These different destinations produced quite different routes (via I-580 or CA-12) despite the fact that both routes converged for the last 100 miles (along CA-120) before diverging again by a few miles at the end. This was quite repeatable. The two routes were up to 50 miles apart for around 100 miles, but the distances/times were pretty close to each other as you would expect.

Alas I cannot reproduce that - the algorithms must have changed. But it had me curious about the algorithm. All I can speculate is that there was some directional pruning which happened to be exquisitely sensitive to the tiny angular difference between the destinations as seen from far away, or there were different precomputed segments selected by the choice of final destination.

I've not worked on Google or Microsoft or Yahoo Maps before, so I can't tell you how they work.

However, I did architect a custom supply chain optimization system for an energy company which included a scheduling and routing application for their fleet of trucks. However, our criteria on routing was far more business-specific than where is construction or traffic slows or lane closures.

We employed a technique called ACO (Ant colony optimization) to schedule and route trucks. This technique is an AI technique that was applied to the traveling salesman problem to solve routing problems. The trick with ACO is to build an error calculation based upon known facts of the routing so that the graph solving model knows when to quit (when is the error small enough).

You can google ACO or TSP to find more on this technique. I've not used any of the open-source AI tools for this however, so cannot suggest one (though I heard SWARM was pretty comprehensive).

Just addressing the triangle inequality violations, hopefully the extra factor they're optimising for is common sense. You don't necessarily want the shortest or fastest route, as it can lead to chaos and destruction. If you want your directions to prefer the major routes that are truck-friendly and can cope with having every sat-nav-following driver sent down them, you quickly discard the triangle inequality[1].

If Y is a narrow residential street between X and Z, you probably do only want to use the shortcut via Y if the user explicitly asks for X-Y-Z. If they ask for X-Z, they should stick to major roads even if it's a bit further and takes a bit longer. It's similar to Braess's paradox - if everyone tries to take the shortest, fastest route, the resulting congestion means that it's not the fastest route for anyone any more. From here we stray from graph theory into game theory.

[1] In fact, any hope that the distances produced will be a distance function in the mathematical sense dies when you allow one-way roads and lose the symmetry requirement. Losing the triangle inequality too is just rubbing salt in the wound.

Probably similar to the answer on pre-computed routes between major locations and layered maps, but my understanding is that in games, to speed up A*, you have a map that is very coarse for macro navigation, and a fine-grained map for navigation to the boundary of macro directions. So you have 2 small paths to calculate, and hence your search space is much much smaller than simply doing a single path to the destination. And if you're in the business of doing this a lot, you'd have a lot of that data pre-computed so at least part of the search is a search for pre-computed data, rather than a search for a path.

Speaking as someone who spent 18 months working at a mapping company, which included working on the routing algorithm... yes, Dijkstra's does work, with a couple of modifications:

  • Instead of doing Dijkstra's once from source to dest, you start at each end, and expand both sides until they meet in the middle. This eliminates roughly half the work (2*pi*(r/2)^2 vs pi*r^2).
  • To avoid exploring the back-alleys of every city between your source and destination, you can have several layers of map data: A 'highways' layer that contains only highways, a 'secondary' layer that contains only secondary streets, and so forth. Then, you explore only smaller sections of the more detailed layers, expanding as necessary. Obviously this description leaves out a lot of detail, but you get the idea.

With modifications along those lines, you can do even cross-country routing in a very reasonable timeframe.

The current state of the art in terms of query times for static road networks is the Hub labelling algorithm proposed by Abraham et al. . A through and excellently written survey of the field was recently published as a Microsoft technical report .

The short version is...

The Hub labelling algorithm provides the fastest queries for static road networks but requires a large amount of ram to run (18 GiB).

Transit node routing is slightly slower, although, it only requires around 2 GiB of memory and has a quicker preprocessing time.

Contraction Hierarchies provide a nice trade off between quick preprocessing times, low space requirements (0.4 GiB) and fast query times.

No one algorithm is completely dominate...

This Google tech talk by Peter Sanders may be of interest

Also this talk by Andrew Goldberg

An open source implementation of contraction hierarchies is available from Peter Sanders research group website at KIT.

Also an easily accessible blog post written by Microsoft on there usage of the CRP algorithm...

This is pure speculation on my part, but I suppose that they may use an influence map data structure overlaying the directed map in order to narrow the search domain. This would allow the search algorithm to direct the path to major routes when the desired trip is long.

Given that this is a Google app, it's also reasonable to suppose that a lot of the magic is done via extensive caching. :) I wouldn't be surprised if caching the top 5% most common Google Map route requests allowed for a large chunk (20%? 50%?) of requests to be answered by a simple look-up.