[c++] An expensive jump with GCC 5.4.0



Answers

One important thing to note is that

(curr[i] < 479) && (l[i + shift] < 479)

and

(curr[i] < 479) * (l[i + shift] < 479)

are not semantically equivalent! In particular, the if you ever have the situation where:

  • 0 <= i and i < curr.size() are both true
  • curr[i] < 479 is false
  • i + shift < 0 or i + shift >= l.size() is true

then the expression (curr[i] < 479) && (l[i + shift] < 479) is guaranteed to be a well-defined boolean value. For example, it does not cause a segmentation fault.

However, under these circumstances, the expression (curr[i] < 479) * (l[i + shift] < 479) is undefined behavior; it is allowed to cause a segmentation fault.

This means that for the original code snippet, for example, the compiler can't just write a loop that performs both comparisons and does an and operation, unless the compiler can also prove that l[i + shift] will never cause a segfault in a situation it's required not to.

In short, the original piece of code offers fewer opportunities for optimization than the latter. (of course, whether or not the compiler recognizes the opportunity is an entirely different question)

You might fix the original version by instead doing

bool t1 = (curr[i] < 479);
bool t2 = (l[i + shift] < 479);
if (t1 && t2) {
    // ...
Question

I had a function which looked like this (showing only the important part):

double CompareShifted(const std::vector<uint16_t>& l, const std::vector<uint16_t> &curr, int shift, int shiftY)  {
...
  for(std::size_t i=std::max(0,-shift);i<max;i++) {
     if ((curr[i] < 479) && (l[i + shift] < 479)) {
       nontopOverlap++;
     }
     ...
  }
...
}

Written like this, the function took ~34ms on my machine. After changing the condition to bool multiplication (making the code look like this):

double CompareShifted(const std::vector<uint16_t>& l, const std::vector<uint16_t> &curr, int shift, int shiftY)  {
...
  for(std::size_t i=std::max(0,-shift);i<max;i++) {
     if ((curr[i] < 479) * (l[i + shift] < 479)) {
       nontopOverlap++;
     }
     ...
  }
...
}

the execution time decreased to ~19ms.

The compiler used was GCC 5.4.0 with -O3 and after checking the generated asm code using godbolt.org I found out that the first example generates a jump, while the second one does not. I decided to try GCC 6.2.0 which also generates a jump instruction when using the first example, but GCC 7 seems to not generate one anymore.

Finding out this way to speed up the code was rather gruesome and took quite some time. Why does the compiler behave this way? Is it intended and is it something the programmers should look out for? Are there any more things similar to this?

EDIT: link to godbolt https://godbolt.org/g/5lKPF3




This might be because when you are using the logical operator && the compiler has to check two conditions for the if statement to succeed. However in the second case since you are implicitly converting an int value to a bool, the compiler makes some assumptions based on the types and values being passed in, along with (possibly) a single jump condition. It is also possible that the compiler completely optimizes away the jmps with bit shifts.




Links



Tags

c++ c++   gcc