# visual - writing an assembler in c++

## How to force GCC to assume that a floating-point expression is non-negative?

There are cases where you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does `sqrt(a[0]*a[0] + ... + a[N-1]*a[N-1])` (NB: I *am* aware of `std::hypot`; it is not relevant to the question), and the expression under the square root is clearly non-negative. However, GCC outputs the following assembly for `sqrt(x*x)`:

```
mulss xmm0, xmm0
pxor xmm1, xmm1
ucomiss xmm1, xmm0
ja .L10
sqrtss xmm0, xmm0
ret
.L10:
jmp sqrtf
```

That is, it compares the result of `x*x` to zero, and if the result is non-negative, it executes the `sqrtss` instruction; otherwise it calls `sqrtf`.

So, my question is: **how can I force GCC into assuming that `x*x` is always non-negative so that it skips the comparison and the `sqrtf` call, without writing inline assembly?**

I wish to emphasize that I am interested in a local solution, and not in things like `-ffast-math`, `-fno-math-errno`, or `-ffinite-math-only` (though these do indeed solve the issue, thanks to ks1322, harold, and Eric Postpischil in the comments).

Furthermore, "force GCC into assuming `x*x` is non-negative" should be interpreted as `assert(x*x >= 0.f)`, so this also excludes the case of `x*x` being NaN.

I am OK with compiler-specific, platform-specific, CPU-specific, etc. solutions.

After about a week, I asked about this on GCC Bugzilla, and they provided a solution that is the closest to what I had in mind:

```
#include <cmath>

float test (float x)
{
    float y = x*x;
    if (std::isless(y, 0.f))
        __builtin_unreachable();  // promise: y is never less than zero
    return std::sqrt(y);
}
```

that compiles to the following assembly:

```
test(float):
mulss xmm0, xmm0
sqrtss xmm0, xmm0
ret
```

I'm still not quite sure what exactly happens here, though.

Pass the option `-fno-math-errno` to gcc. This fixes the problem without making your code unportable or leaving the realm of ISO/IEC 9899:2011 (C11).

With this option, the compiler does not attempt to set `errno` when a math library function fails:

> `-fno-math-errno`: Do not set "errno" after calling math functions that are executed with a single instruction, e.g., "sqrt". A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility. This option is not turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. The default is -fmath-errno. On Darwin systems, the math library never sets "errno". There is therefore no reason for the compiler to consider the possibility that it might, and -fno-math-errno is the default.

Given that you don't seem to be particularly interested in math routines setting `errno`, this seems like a good solution.

You can write `assert(x*x >= 0.f)` as a compile-time promise instead of a runtime check as follows in GNU C:

```
#include <cmath>

float test1 (float x)
{
    float tmp = x*x;
    if (!(tmp >= 0.0f))
        __builtin_unreachable();
    return std::sqrt(tmp);
}
```

(Related: What optimizations does __builtin_unreachable facilitate? You could also wrap `if(!x)__builtin_unreachable()` in a macro and call it `promise()` or something.)
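The macro idea mentioned above could look roughly like the following sketch. The name `PROMISE` and the function `sqrt_of_square` are hypothetical, chosen only for illustration, and `__builtin_unreachable()` is a GNU extension, so this assumes GCC or Clang.

```cpp
#include <cmath>

// Hypothetical helper (not a standard or GCC-provided macro): tells the
// compiler that `cond` always holds. If `cond` is ever false at run time,
// the behavior is undefined, so this is a promise, not a check.
#define PROMISE(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

float sqrt_of_square(float x)
{
    float y = x * x;
    PROMISE(y >= 0.0f);   // false for negative values and for NaN
    return std::sqrt(y);
}
```

Note that the condition is evaluated as ordinary code, so it should be free of side effects. Whether the optimizer actually exploits a given promise depends on the compiler and on which pass sees it.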

But gcc doesn't know how to take advantage of the promise that `tmp` is non-NaN and non-negative. We still get (on Godbolt) the same canned asm sequence that checks for `x>=0` and otherwise calls `sqrtf` to set `errno`. **Presumably that expansion into a compare-and-branch happens after other optimization passes,** so it doesn't help for the compiler to know more.

This is a missed optimization in the logic that speculatively inlines `sqrt` when `-fmath-errno` is enabled (on by default, unfortunately).

## What you want instead is `-fno-math-errno`, which is safe globally

**This is 100% safe if you don't rely on math functions ever setting `errno`**. Nobody wants that; that's what NaN propagation and/or sticky flags that record masked FP exceptions are for, e.g. C99/C++11 `fenv` access via `#pragma STDC FENV_ACCESS ON` and then functions like `fetestexcept()`. See the example in `feclearexcept`, which shows using it to detect division by zero.

The FP environment is part of the thread context; `errno` was historically a single global, although C11 and POSIX make it thread-local as well.
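As a rough illustration of the flag-based approach, here is a minimal sketch that detects an invalid `sqrt` through the sticky FE_INVALID flag instead of `errno`. The function name `sqrt_checked` is hypothetical. The sketch uses `volatile` temporaries to discourage constant folding and reordering, on the assumption that the compiler may not honor `#pragma STDC FENV_ACCESS ON`.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: computes sqrt(x) into `out` and reports whether the
// operation completed without raising the IEEE "invalid" exception flag.
bool sqrt_checked(double x, double& out)
{
    volatile double in = x;              // keep the operation at run time
    std::feclearexcept(FE_ALL_EXCEPT);   // start from clean sticky flags
    volatile double r = std::sqrt(in);
    out = r;
    return std::fetestexcept(FE_INVALID) == 0;
}
```

A caller would test the return value rather than inspecting `errno`; for a negative input, `out` holds NaN and the function returns false on implementations that support floating-point exception flags.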

Support for this obsolete misfeature is not free; you should just turn it off unless you have old code that was written to use it. Don't use it in new code: use `fenv`. Ideally support for `-fmath-errno` would be as cheap as possible, but the rarity of anyone actually using `__builtin_unreachable()` or other things to rule out a NaN input presumably made it not worth developers' time to implement the optimization. Still, you could report a missed-optimization bug if you wanted.

Real-world FPU hardware does in fact have these sticky flags that stay set until cleared, e.g. x86's `mxcsr` status/control register for SSE/AVX math, or hardware FPUs in other ISAs. On hardware where the FPU can detect exceptions, a quality C++ implementation will support stuff like `fetestexcept()`. And if not, then math-`errno` probably doesn't work either.

`errno` for math was an old obsolete design that C and C++ are still stuck with by default, and it is now widely considered a bad idea. It makes it harder for compilers to inline math functions efficiently. Or maybe we're not as stuck with it as I thought: "Why errno is not set to EDOM even sqrt takes out of domain arguement?" explains that setting `errno` in math functions is *optional* in ISO C11, and an implementation can indicate whether it does so or not. Presumably in C++ as well.
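The way an implementation indicates this is the `math_errhandling` macro from `<cmath>`. The sketch below queries it and also observes whether `sqrt` of a given value leaves `EDOM` in `errno`; the function names are hypothetical, and the observed behavior depends on the C library and on flags such as `-fno-math-errno`.

```cpp
#include <cerrno>
#include <cmath>

// True if this implementation says it reports math errors through errno.
bool math_uses_errno()
{
    return (math_errhandling & MATH_ERRNO) != 0;
}

// True if computing sqrt(x) left EDOM in errno.
bool sqrt_sets_edom(float x)
{
    volatile float in = x;    // discourage compile-time evaluation
    errno = 0;
    volatile float out = std::sqrt(in);
    (void)out;
    return errno == EDOM;
}
```

If `math_uses_errno()` returns false, code that checks `errno` after math calls cannot rely on it and should test the floating-point exception flags instead.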

**It's a big mistake to lump `-fno-math-errno` in with value-changing optimizations like `-ffast-math` or `-ffinite-math-only`.** You should strongly consider enabling it globally, or at least for the whole file containing this function.

```
float test2 (float x)
{
    return std::sqrt(x*x);
}
```

```
# g++ -fno-math-errno -std=gnu++17 -O3
test2(float): # and test1 is the same
mulss xmm0, xmm0
sqrtss xmm0, xmm0
ret
```
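If changing the flags for a whole file is not an option, GCC's `optimize` function attribute can apply the same option to a single function. This is only a sketch: the macro name `NO_MATH_ERRNO` and the function `test3` are hypothetical, the attribute is GCC-specific (it is compiled out for other compilers here), and the GCC manual describes the `optimize` attribute as intended for debugging rather than production use, so treat it with caution.

```cpp
#include <cmath>

// Hypothetical convenience macro: expands to GCC's per-function equivalent
// of -fno-math-errno, and to nothing on compilers without that attribute.
#if defined(__GNUC__) && !defined(__clang__)
#  define NO_MATH_ERRNO __attribute__((optimize("no-math-errno")))
#else
#  define NO_MATH_ERRNO
#endif

NO_MATH_ERRNO
float test3(float x)
{
    return std::sqrt(x * x);
}
```

One design consideration: functions with differing optimization options may not be inlined into their callers, so a per-function attribute can cost more than it saves in hot loops. It is worth checking the generated assembly.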

You might also use `-fno-trapping-math`, if you aren't ever going to unmask any FP exceptions with `feenableexcept()`. (That option isn't required for this optimization, though; it's only the `errno`-setting crap that's a problem here.)

`-fno-trapping-math` doesn't assume no-NaN or anything; it only assumes that FP exceptions like Invalid or Inexact won't ever actually invoke a signal handler instead of producing NaN or a rounded result. `-ftrapping-math` is the default, but it's broken and "never worked" according to GCC dev Marc Glisse. (Even with it on, GCC does some optimizations which can change the number of exceptions that would be raised from zero to non-zero or vice versa. And it blocks some safe optimizations.) But unfortunately, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54192 (make it off by default) is still open.

If you actually ever did unmask exceptions, it might be better to have `-ftrapping-math`, but again it's very rare that you'd ever want that instead of just checking flags after some math operations, or checking for NaN. And it doesn't actually preserve exact exception semantics anyway.

See SIMD for float threshold operation for a case where the default `-ftrapping-math` incorrectly blocks a safe optimization. (Even after hoisting a potentially-trapping operation so the C does it unconditionally, gcc makes non-vectorized asm that does it conditionally! So not only does it block vectorization, it changes the exception semantics vs. the C abstract machine.)