java how - Why does a 4 billion-iteration Java loop take only 2 ms?




3 Answers

There are one of two possibilities going on here:

  1. The compiler realized that the loop is redundant and doing nothing so it optimized it away.

  2. The JIT (just-in-time compiler) realized that the loop is redundant and doing nothing, so it optimized it away.

Modern compilers are very intelligent; they can see when code is useless. Try putting an empty loop into GodBolt and look at the output, then turn on -O2 optimizations, you will see that the output is something along the lines of

main():
    xor eax, eax
    ret

I would like to clarify something, in Java most of the optimizations are done by the JIT. In some other languages (like C/C++) most of the optimizations are done by the first compiler.

long for

I'm running the following Java code on a laptop with 2.7 GHz Intel Core i7. I intended to let it measure how long it takes to finish a loop with 2^32 iterations, which I expected to be roughly 1.48 seconds (4/2.7 = 1.48).

But actually it only takes 2 milliseconds, instead of 1.48 s. I'm wondering if this is a result of any JVM optimization underneath?

public static void main(String[] args)
{
    long start = System.nanoTime();

    for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++){
    }
    long finish = System.nanoTime();
    long d = (finish - start) / 1000000;

    System.out.println("Used " + d);
}



I just will state the obvious - that this is a JVM optimization that happens, the loop will simply be remove at all. Here is a small test that shows what a huge difference JIT has when enabled/enabled only for C1 Compiler and disabled at all.

Disclaimer: don't write tests like this - this is just to prove that the actual loop "removal" happens in the C2 Compiler:

@Benchmark
@Fork(1)
public void full() {
    long result = 0;
    for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
        ++result;
    }
}

@Benchmark
@Fork(1)
public void minusOne() {
    long result = 0;
    for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE - 1; i++) {
        ++result;
    }
}

@Benchmark
@Fork(value = 1, jvmArgsAppend = { "-XX:TieredStopAtLevel=1" })
public void withoutC2() {
    long result = 0;
    for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE - 1; i++) {
        ++result;
    }
}

@Benchmark
@Fork(value = 1, jvmArgsAppend = { "-Xint" })
public void withoutAll() {
    long result = 0;
    for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE - 1; i++) {
        ++result;
    }
}

The results show that depending on which part of the JIT is enabled, method gets faster (so much faster that it looks like it's doing "nothing" - loop removal, which seems to be happening in the C2 Compiler - which is the maximum level):

 Benchmark                Mode  Cnt      Score   Error  Units
 Loop.full        avgt    2     ≈ 10⁻⁷          ms/op
 Loop.minusOne    avgt    2     ≈ 10⁻⁶          ms/op
 Loop.withoutAll  avgt    2  51782.751          ms/op
 Loop.withoutC2   avgt    2   1699.137          ms/op 



You are measuring the time it take to detect the loop doesn't do anything, compile the code in a background thread and eliminate the code.

for (int t = 0; t < 5; t++) {
    long start = System.nanoTime();
    for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
    }
    long time = System.nanoTime() - start;

    String s = String.format("%d: Took %.6f ms", t, time / 1e6);
    Thread.sleep(50);
    System.out.println(s);
    Thread.sleep(50);
}

If you run this with -XX:+PrintCompilation you can see the code has been compiled in the background to level 3 or C1 compiler and after a few loops to level 4 of C4.

    129   34 %     3       A::main @ 15 (93 bytes)
    130   35       3       A::main (93 bytes)
    130   36 %     4       A::main @ 15 (93 bytes)
    131   34 %     3       A::main @ -2 (93 bytes)   made not entrant
    131   36 %     4       A::main @ -2 (93 bytes)   made not entrant
0: Took 2.510408 ms
    268   75 %     3       A::main @ 15 (93 bytes)
    271   76 %     4       A::main @ 15 (93 bytes)
    274   75 %     3       A::main @ -2 (93 bytes)   made not entrant
1: Took 5.629456 ms
2: Took 0.000000 ms
3: Took 0.000364 ms
4: Took 0.000365 ms

If you change the loop to use a long it doesn't get as optimised.

    for (long i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
    }

instead you get

0: Took 1579.267321 ms
1: Took 1674.148662 ms
2: Took 1885.692166 ms
3: Took 1709.870567 ms
4: Took 1754.005112 ms



Related

java for-loop jvm