java - method - under what circumstances inlining functions can make your code slower

Java-calling static methods vs manual inlining-performance overhead (4)

However, the other day, I replaced this manually inlined code with invocation of static methods and seen a performance boost. How is that possible?

Probably the JVM profiler sees the bottleneck more easily if it is in one place (a static method) than if it is implemented several times separately.

I am interested whether should I manually inline small methods which are called 100k - 1 million times in some performance-sensitive algorithm.

First, I thought that, by not inlining, I am incurring some overhead since JVM will have to find determine whether or not to inline this method (or even fail to do so).

However, the other day, I replaced this manually inlined code with invocation of static methods and seen a performance boost. How is that possible? Does this suggest that there is actually no overhead and that by letting JVM inline at "its will" actually boosts performance? Or this hugely depends on the platform/architecture?

(The example in which a performance boost occurred was replacing array swapping (int t = a[i]; a[i] = a[j]; a[j] = t;) with a static method call swap(int[] a, int i, int j). Another example in which there was no performance difference was when I inlined a 10-liner method which was called 1000000 times.)

I have seen something similar. "Manual inlining" isn't necessarily faster, the result program can be too complex for optimizer to analyze.

In your example let's make some wild guesses. When you use the swap() method, JVM may be able to analyze the method body, and conclude that since i and j don't change, although there are 4 array accesses, only 2 range checks are needed instead of 4. Also the local variable t isn't necessary, JVM can use 2 registers to do the job, without involving r/w of t on stack.

Later, the body of swap() is inlined into the caller method. That is after the previous optimization, so the saves are still in place. It's even possible that caller method body has proved that i and j are always within range, so the 2 remaining range checks are also dropped.

Now in the manually inlined version, the optimizer has to analyze the whole program at once, there are too many variables and too many actions, it may fail to prove that it's safe to save range checks, or eliminate the local variable t. In the worst case this version may cost 6 more memory accesses to do the swap, which is a huge overhead. Even if there is only 1 extra memory read, it is still very noticeable.

Of course, we have no basis to believe that it's always better to do manual "outlining", i.e. extract small methods, wishfully thinking that it will help the optimizer.


What I've learned is that, forget manual micro optimizations. It's not that I don't care about micro performance improvements, it's not that I always trust JVM's optimization. It is that I have absolutely no idea what to do that does more good than bad. So I gave up.

The Hotspot JIT compiler is capable of inlining a lot of things, especially in -server mode, although I don't know how you got an actual performance boost. (My guess would be that inlining is done by method invocation count and the method swapping the two values isn't called too often.)

By the way, if its performance really matters, you could try this for swapping two int values. (I'm not saying it will be faster, but it may be worth a punt.)

a[i] = a[i] ^ a[j];
a[j] = a[i] ^ a[j];
a[i] = a[i] ^ a[j];

The JVM can inline small methods very efficiently. The only benifit inlining yourself is if you can remove code i.e. simplify what it does by inlining it.

The JVM looks for certain structures and has some "hand coded" optimisations when it recognises those structures. By using a swap method, the JVM may recognise the structure and optimise it differently with a specific optimisation.

You might be interested to try the OpenJDK 7 debug version which has an option to print out the native code it generates.