value - rvo c++98




Optimizing away a “while(1);” in C++0x (6)

Updated, see below!

I have heard and read that C++0x allows an compiler to print "Hello" for the following snippet

#include <iostream>

int main() {
  while(1) 
    ;
  std::cout << "Hello" << std::endl;
}

It apparently has something to do with threads and optimization capabilities. It looks to me that this can surprise many people though.

Does someone have a good explanation of why this was necessary to allow? For reference, the most recent C++0x draft says at 6.5/5

A loop that, outside of the for-init-statement in the case of a for statement,

  • makes no calls to library I/O functions, and
  • does not access or modify volatile objects, and
  • performs no synchronization operations (1.10) or atomic operations (Clause 29)

may be assumed by the implementation to terminate. [ Note: This is intended to allow compiler transfor- mations, such as removal of empty loops, even when termination cannot be proven. — end note ]

Edit:

This insightful article says about that Standards text

Unfortunately, the words "undefined behavior" are not used. However, anytime the standard says "the compiler may assume P," it is implied that a program which has the property not-P has undefined semantics.

Is that correct, and is the compiler allowed to print "Bye" for the above program?


There is an even more insightful thread here, which is about an analogous change to C, started off by the Guy done the above linked article. Among other useful facts, they present a solution that seems to also apply to C++0x (Update: This won't work anymore with n3225 - see below!)

endless:
  goto endless;

A compiler is not allowed to optimize that away, it seems, because it's not a loop, but a jump. Another guy summarizes the proposed change in C++0x and C201X

By writing a loop, the programmer is asserting either that the loop does something with visible behavior (performs I/O, accesses volatile objects, or performs synchronization or atomic operations), or that it eventually terminates. If I violate that assumption by writing an infinite loop with no side effects, I am lying to the compiler, and my program's behavior is undefined. (If I'm lucky, the compiler might warn me about it.) The language doesn't provide (no longer provides?) a way to express an infinite loop without visible behavior.


Update on 3.1.2011 with n3225: Committee moved the text to 1.10/24 and say

The implementation may assume that any thread will eventually do one of the following:

  • terminate,
  • make a call to a library I/O function,
  • access or modify a volatile object, or
  • perform a synchronization operation or an atomic operation.

The goto trick will not work anymore!


Does someone have a good explanation of why this was necessary to allow?

Yes, Hans Boehm provides a rationale for this in N1528: Why undefined behavior for infinite loops?, although this is WG14 document the rationale applies to C++ as well and the document refers to both WG14 and WG21:

As N1509 correctly points out, the current draft essentially gives undefined behavior to infinite loops in 6.8.5p6. A major issue for doing so is that it allows code to move across a potentially non-terminating loop. For example, assume we have the following loops, where count and count2 are global variables (or have had their address taken), and p is a local variable, whose address has not been taken:

for (p = q; p != 0; p = p -> next) {
    ++count;
}
for (p = q; p != 0; p = p -> next) {
    ++count2;
}

Could these two loops be merged and replaced by the following loop?

for (p = q; p != 0; p = p -> next) {
        ++count;
        ++count2;
}

Without the special dispensation in 6.8.5p6 for infinite loops, this would be disallowed: If the first loop doesn't terminate because q points to a circular list, the original never writes to count2. Thus it could be run in parallel with another thread that accesses or updates count2. This is no longer safe with the transformed version which does access count2 in spite of the infinite loop. Thus the transformation potentially introduces a data race.

In cases like this, it is very unlikely that a compiler would be able to prove loop termination; it would have to understand that q points to an acyclic list, which I believe is beyond the ability of most mainstream compilers, and often impossible without whole program information.

The restrictions imposed by non-terminating loops are a restriction on the optimization of terminating loops for which the compiler cannot prove termination, as well as on the optimization of actually non-terminating loops. The former are much more common than the latter, and often more interesting to optimize.

There are clearly also for-loops with an integer loop variable in which it would be difficult for a compiler to prove termination, and it would thus be difficult for the compiler to restructure loops without 6.8.5p6. Even something like

for (i = 1; i != 15; i += 2)

or

for (i = 1; i <= 10; i += j)

seems nontrivial to handle. (In the former case, some basic number theory is required to prove termination, in the latter case, we need to know something about the possible values of j to do so. Wrap-around for unsigned integers may complicate some of this reasoning further.)

This issue seems to apply to almost all loop restructuring transformations, including compiler parallelization and cache-optimization transformations, both of which are likely to gain in importance, and are already often important for numerical code. This appears likely to turn into a substantial cost for the benefit of being able to write infinite loops in the most natural way possible, especially since most of us rarely write intentionally infinite loops.

The one major difference with C is that C11 provides an exception for controlling expressions that are constant expressions which differs from C++ and makes your specific example well-defined in C11.


I think it's worth pointing out that loops which would be infinite except for the fact that they interact with other threads via non-volatile, non-synchronised variables can now yield incorrect behaviour with a new compiler.

I other words, make your globals volatile -- as well as arguments passed into such a loop via pointer/reference.


I think the issue could perhaps best be stated, as "If a later piece of code does not depend on an earlier piece of code, and the earlier piece of code has no side-effects on any other part of the system, the compiler's output may execute the later piece of code before, after, or intermixed with, the execution of the former, even if the former contains loops, without regard for when or whether the former code would actually complete. For example, the compiler could rewrite:

void testfermat(int n)
{
  int a=1,b=1,c=1;
  while(pow(a,n)+pow(b,n) != pow(c,n))
  {
    if (b > a) a++; else if (c > b) {a=1; b++}; else {a=1; b=1; c++};
  }
  printf("The result is ");
  printf("%d/%d/%d", a,b,c);
}

as

void testfermat(int n)
{
  if (fork_is_first_thread())
  {
    int a=1,b=1,c=1;
    while(pow(a,n)+pow(b,n) != pow(c,n))
    {
      if (b > a) a++; else if (c > b) {a=1; b++}; else {a=1; b=1; c++};
    }
    signal_other_thread_and_die();
  }
  else // Second thread
  {
    printf("The result is ");
    wait_for_other_thread();
  }
  printf("%d/%d/%d", a,b,c);
}

Generally not unreasonable, though I might worry that:

  int total=0;
  for (i=0; num_reps > i; i++)
  {
    update_progress_bar(i);
    total+=do_something_slow_with_no_side_effects(i);
  }
  show_result(total);

would become

  int total=0;
  if (fork_is_first_thread())
  {
    for (i=0; num_reps > i; i++)
      total+=do_something_slow_with_no_side_effects(i);
    signal_other_thread_and_die();
  }
  else
  {
    for (i=0; num_reps > i; i++)
      update_progress_bar(i);
    wait_for_other_thread();
  }
  show_result(total);

By having one CPU handle the calculations and another handle the progress bar updates, the rewrite would improve efficiency. Unfortunately, it would make the progress bar updates rather less useful than they should be.


I think this is along the lines of the this type of question, which references another thread. Optimization can occasionally remove empty loops.


The relevant issue is that the compiler is allowed to reorder code whose side effects do not conflict. The surprising order of execution could occur even if the compiler produced non-terminating machine code for the infinite loop.

I believe this is the right approach. The language spec defines ways to enforce order of execution. If you want an infinite loop that cannot be reordered around, write this:

volatile int dummy_side_effect;

while (1) {
    dummy_side_effect = 0;
}

printf("Never prints.\n");

To me, the relevant justification is:

This is intended to allow compiler transfor- mations, such as removal of empty loops, even when termination cannot be proven.

Presumably, this is because proving termination mechanically is difficult, and the inability to prove termination hampers compilers which could otherwise make useful transformations, such as moving nondependent operations from before the loop to after or vice versa, performing post-loop operations in one thread while the loop executes in another, and so on. Without these transformations, a loop might block all other threads while they wait for the one thread to finish said loop. (I use "thread" loosely to mean any form of parallel processing, including separate VLIW instruction streams.)

EDIT: Dumb example:

while (complicated_condition()) {
    x = complicated_but_externally_invisible_operation(x);
}
complex_io_operation();
cout << "Results:" << endl;
cout << x << endl;

Here, it would be faster for one thread to do the complex_io_operation while the other is doing all the complex calculations in the loop. But without the clause you have quoted, the compiler has to prove two things before it can make the optimisation: 1) that complex_io_operation() doesn't depend on the results of the loop, and 2) that the loop will terminate. Proving 1) is pretty easy, proving 2) is the halting problem. With the clause, it may assume the loop terminates and get a parallelisation win.

I also imagine that the designers considered that the cases where infinite loops occur in production code are very rare and are usually things like event-driven loops which access I/O in some manner. As a result, they have pessimised the rare case (infinite loops) in favour of optimising the more common case (noninfinite, but difficult to mechanically prove noninfinite, loops).

It does, however, mean that infinite loops used in learning examples will suffer as a result, and will raise gotchas in beginner code. I can't say this is entirely a good thing.

EDIT: with respect to the insightful article you now link, I would say that "the compiler may assume X about the program" is logically equivalent to "if the program doesn't satisfy X, the behaviour is undefined". We can show this as follows: suppose there exists a program which does not satisfy property X. Where would the behaviour of this program be defined? The Standard only defines behaviour assuming property X is true. Although the Standard does not explicitly declare the behaviour undefined, it has declared it undefined by omission.

Consider a similar argument: "the compiler may assume a variable x is only assigned to at most once between sequence points" is equivalent to "assigning to x more than once between sequence points is undefined".





c++11