# sparse - johnson's algorithm cycle

## Good algorithm for finding the diameter of a(sparse) graph? (8)

*Edit* I'm undeleting again, simply so I can continue commenting. I have some comments on Johnson's Algorithm below this answer. - Aaron

*My original comment* :
I too am curious about this problem, but don't have an answer. It seems related to the Minimum Spanning Tree, the subgraph connecting all vertices but having fewest (or lowest weight) edges. That is an old problem with a number of algorithms; some of which seem quite easy to implement.

I had initially hoped that the diameter would be obvious once the MST had been found, but I'm losing hope now :-( Perhaps the MST can be used to place a reasonable upper bound on the diameter, which you can use to speed up your search for the actual diameter?

I have a large, connected, sparse graph in adjacency-list form. I would like to find two vertices that are as far apart as possible, that is, the diameter of the graph and two vertices achieving it.

I am interested in this problem in both the undirected and directed cases, for different applications. In the directed case, I of course care about directed distance (the shortest directed path from one vertex to another).

**Is there a better approach than computing all-pairs shortest paths?**

*Edit*: By "as far apart as possible", I of course mean the "longest shortest path" -- that is, the maximum over all pairs of vertices of the shortest distance from one to the other.

A dirty method:

We know that for a graph G(V,E) with |V|=n and |E|=m, Dijkstra algorithm runs in `O(m+nlogn)`

and this is for a single source. For your all-pairs problem, you need to run Dijkstra for each node as a starting point.

However, if you have many machines, you can easily parallel this process.

This method is easiest to implement, definitely not very good.

Here's some thoughts on doing better than all pairs shortest paths in an undirected graph, although I'm not sure just how much of an improvement it would be.

Here's a subroutine that will find two nodes distance D apart, if there are any. Pick an arbitrary node x and compute M[x] = maximum distance from x to any other node (using any single source shortest path algorithm). If M[x] >= D, then x is one of our nodes and the other is easy to find. However, if M[x] < D, then neither endpoint we're looking for can be less than distance D - M[x] from x (because there are paths from that node to all other nodes, through x, of distance < D). So find all nodes of distance less than D-M[x] from x and mark them as bad. Pick a new x, this time making sure we avoid all nodes marked as bad, and repeat. Hopefully, we'll mark lots of nodes as bad so we'll have to do many fewer than |V| shortest path computations.

Now we just need to set D=diam(G) and run the above procedure. We don't know what diam(G) is, but we can get a pretty tight range for it, as for any x, M[x] <= diam(G) <= 2M[x]. Pick a few x to start, compute M[x] for each, and compute upper and lower bounds on diam(G) as a result. We can then do binary search in the resulting range, using the above procedure to find a path of the guessed length, if any.

Of course, this is undirected only. I think you could do a similar scheme with directed graphs. The bad nodes are those which can reach x in less than D-M[x], and the upper bound on diam(G) doesn't work so you'd need a larger binary search range.

I don't know of a better method for computing diameter other than all shortest paths, but Mathematica uses the following approximation for PseudoDiameter:

- A graph geodesic is the shortest path between two vertices of a graph. The graph diameter is the longest possible length of all graph geodesics of the graph. PseudoDiameter finds an approximate graph diameter. It works by starting from a vertex u, and finds a vertex v that is farthest away from u. This process is repeated by treating v as the new starting vertex, and ends when the graph distance no longer increases. A vertex from the last level set that has the smallest degree is chosen as the final starting vertex u, and a traversal is done to see if the graph distance can be increased. This graph distance is taken to be the pseudo-diameter.

http://reference.wolfram.com/mathematica/GraphUtilities/ref/PseudoDiameter.html

Not sure if it fits the bill, but interesting:

HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop

U. Kang, C. Tsourakakis, A. P. Appel, C. Faloutsos, J. Leskovec, “HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop”, CMU ML Tech Report CMU-ML-08-117, 2008.

One way to obtain an *estimate* of this number is to start at a random point, and do a breadth-first "grassfire" algorithm, marking the shortest distance to each node. The longest distance here is your estimate.

Running this extremely fast algorithm multiple times with different starting points and then taking the maximum will increase the accuracy of the estimate, and, of course, give you a decent lower bound. Depending on the distribution and connectivity of your graph, this estimate may even be accurate!

If your graph is large enough, asymptotic analysis of how the estimate changes as you add more samples might allow you to project to an even better guess.

If you're interested in an exact answer, it seems unlikely that you can get away with cutting corners too much unless your graph is easy to partition into components that are weakly connected with each other - in which case you can restrict your search to shortest path between all pairs of vertices in different components.

Well, I've put a little bit of thought on the problem, and a bit of googling, and I'm sorry, but I can't find any algorithm that doesn't seem to be "just find all pairs shortest path".

However, if you assume that Floyd-Warshall is the only algorithm for computing such a thing (Big-Theta of |V|^3), then I have a bit of good news for you: Johnson's Algorithm for Sparse Graphs (thank you, trusty CLRS!) computes all pairs shortest paths in (Big-Oh (|V|^2 * lgV + VE)), which should be asymptotically faster for sparse graphs.

Wikipedia says it works for directed (not sure about undirected, but at least I can't think of a reason why not), here's the link.

Is there anything else about the graph that may be useful? If it can be mapped easily onto a 2D plane (so, its planar and the edge weights obey the triangle inequality [it may need to satisfy a stricter requirement, I'm not sure]) you may be able to break out some geometric algorithms (convex-hull can run in nlogn, and finding the farthest pair of points is easy from there).

Hope this helps! - Agor

Edit: I hope the link works now. If not, just google it. :)

Yes there is a better method for finding the Diameter of the graph. Here I made a simple class to demonstrate it. The Vertices would be the Points on your graph.

```
public class MyTestClass
{
//Simple Point struct
struct Vertex
{
public float X, Y;
public Vertex(float pX, float pY)
{
X = pX;
Y = pY;
}
}
//For getting the bounds of your graph
struct BoundingBox
{
public float Left, Right, Bottom, Top;
public BoundingBox(float pLeft, float pRight, float pBottom, float pTop)
{
Left = pLeft;
Right = pRight;
Bottom = pBottom;
Top = pTop;
}
}
//Properties
Vertex[] vertices;
BoundingBox bound;
float diameter;
//Constructor
//Here is the fastest way to get the diameter >>
public MyTestClass()
{
//Init objects
vertices = new Vertex[100];
for(int i = 0; i != vertices.Length; ++i) vertices[i] = new Vertex(i, i);
bound = new BoundingBox(vertices[0].X, vertices[0].X, vertices[0].Y, vertices[0].Y);
//Calculate BoundingBox
for(int i = 0; i != vertices.Length; ++i)
{
bound.Left = (vertices[i].X <= bound.Left) ? vertices[i].X:bound.Left;
bound.Right = (vertices[i].X >= bound.Right) ? vertices[i].X:bound.Right;
bound.Bottom = (vertices[i].Y <= bound.Bottom) ? vertices[i].Y:bound.Bottom;//NOTE: If Y is faces down, then flip bottom & top comparison
bound.Top = (vertices[i].Y >= bound.Top) ? vertices[i].Y:bound.Top;
}
//Messure Size of the BoundingBox
float vecX = (bound.Right-bound.Left);
float vecY = (bound.Top-bound.Bottom);
diameter = (float)System.Math.Sqrt((vecX*vecX) + (vecY*vecY));
}
}
```