Bitwise '&' with signed vs unsigned operand

I faced an interesting scenario in which I got different results depending on the right operand type, and I can't really understand the reason for it.

Here is the minimal code:

#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0x8123U;

    uint64_t new_check = (check & 0xFFFF) << 16;

    std::cout << std::hex << new_check << std::endl;

    new_check = (check & 0xFFFFU) << 16;

    std::cout << std::hex << new_check << std::endl;

    return 0;
}

I compiled this code with g++ (gcc version 4.5.2) on 64-bit Linux: g++ -std=c++0x -Wall example.cpp -o example

The output was:

ffffffff81230000

81230000

I can't really understand the reason for the output in the first case.

Why would any of the intermediate results be promoted to a signed 64-bit value (int64_t) at some point, resulting in the sign extension?

I would accept a result of '0' in both cases if a 16-bit value were shifted 16 bits left first and then promoted to a 64-bit value. I would also accept the second output if the compiler first promoted check to uint64_t and then performed the other operations.

But how come & with 0xFFFF (int32_t) vs. 0xFFFFU (uint32_t) would result in those two different outputs?


0xFFFF is a signed int. So after the & operation, we have a 32-bit signed value:

#include <stdint.h>
#include <type_traits>

uint64_t foo(uint16_t a) {
  auto x = (a & 0xFFFF);
  // Passes: the result of the & is an int (int32_t on this platform).
  static_assert(std::is_same<int32_t, decltype(x)>::value, "not an int32_t");
  // Fires: x is not a uint16_t, which is the point of the demonstration.
  static_assert(std::is_same<uint16_t, decltype(x)>::value, "not a uint16_t");
  return x;
}

http://ideone.com/tEQmbP

Your original 16 bits are then left-shifted, which results in a 32-bit value with the high bit (0x80000000U) set, so it has a negative value. During the conversion to 64 bits, sign extension occurs, populating the upper 32 bits with 1s.
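Here is a minimal standalone sketch (my addition, not part of the answer) of that last conversion step in isolation, building the negative 32-bit value with well-defined arithmetic:

#include <cstdint>
#include <iostream>

int main()
{
    // INT32_MIN + 0x01230000 is -2128412672, whose two's complement bit
    // pattern is 0x81230000 -- the same value the left shift produces.
    int32_t shifted = INT32_MIN + 0x01230000;

    // Converting the negative int32_t to uint64_t sign-extends, so the
    // upper 32 bits are filled with 1s.
    uint64_t widened = shifted;

    std::cout << std::hex << widened << std::endl;   // prints ffffffff81230000
    return 0;
}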


Let's take a look at

uint64_t new_check = (check & 0xFFFF) << 16;

Here, 0xFFFF is a signed constant, so (check & 0xFFFF) gives us a signed integer by the rules of integer promotion.

In your case, with a 32-bit int type, the most significant bit of this integer is 1 after the left shift, so the conversion to a 64-bit unsigned type performs sign extension, filling the upper bits with 1's. Interpreted as two's complement, the extended bit pattern represents the same negative value.

In the second case, 0xFFFFU is unsigned, so we get unsigned integers and the left shift operator works as expected.
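As an aside (my addition, not part of this answer): one way to sidestep the problem entirely is to widen the left operand to an unsigned 64-bit type before doing any arithmetic, for example:

uint64_t new_check = (static_cast<uint64_t>(check) & 0xFFFF) << 16;   // 81230000

Every intermediate result is then already uint64_t, so there is no signed intermediate and no sign extension can occur.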

If your toolchain supports __PRETTY_FUNCTION__, a very handy feature, you can quickly determine how the compiler perceives expression types:

#include <iostream>
#include <cstdint>

template<typename T>
void typecheck(T const& t)
{
    std::cout << __PRETTY_FUNCTION__ << '\n';
    std::cout << t << '\n';
}
int main()
{
    uint16_t check = 0x8123U;

    typecheck(0xFFFF);
    typecheck(check & 0xFFFF);
    typecheck((check & 0xFFFF) << 16);

    typecheck(0xFFFFU);
    typecheck(check & 0xFFFFU);
    typecheck((check & 0xFFFFU) << 16);

    return 0;
}

Output

void typecheck(const T &) [T = int]
65535
void typecheck(const T &) [T = int]
33059
void typecheck(const T &) [T = int]
-2128412672
void typecheck(const T &) [T = unsigned int]
65535
void typecheck(const T &) [T = unsigned int]
33059
void typecheck(const T &) [T = unsigned int]
2166554624

The & operation has two operands. The first is an unsigned short, which undergoes the usual promotions to become an int. The second is a constant, in one case of type int, in the other case of type unsigned int. The result of the & is therefore int in one case, unsigned int in the other. That value is then shifted to the left, resulting either in an int with the sign bit set or in an unsigned int. Converting a negative int to uint64_t sign-extends and thus yields a huge unsigned value, which is what the first output line shows.
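The same deductions can be checked with static_assert instead of printing the types; a small sketch (my addition, assuming a platform where int is 32 bits):

#include <cstdint>
#include <type_traits>

uint16_t check = 0x8123U;

// The uint16_t operand is promoted to int; & with an int literal yields int.
static_assert(std::is_same<decltype(check & 0xFFFF), int>::value,
              "int operand -> int result");

// With an unsigned int literal the promoted int is converted to unsigned int,
// so the & yields unsigned int.
static_assert(std::is_same<decltype(check & 0xFFFFU), unsigned int>::value,
              "unsigned operand -> unsigned result");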

Of course you should always follow the rule: If you do something, and you don't understand the result, then don't do that!


The first thing to realize is that binary operators like a&b for built-in types only work if both sides have the same type. (With user-defined types and overloads, anything goes). This might be realized via implicit conversions.

Now, in your case, there definitely is such a conversion, because there simply isn't a binary operator & that takes a type smaller than int. Both sides are converted to at least int size, but what exact types?

As it happens, on your GCC int is indeed 32 bits. This is important, because it means that all values of uint16_t can be represented as an int. There is no overflow.

Hence, check & 0xFFFF is a simple case. The right side is already an int, the left side promotes to int, so the result is int(0x8123). This is perfectly fine.

Now, the next operation is 0x8123 << 16. Remember, on your system int is 32 bits, and INT_MAX is 0x7FFF'FFFF. In the absence of overflow, 0x8123 << 16 would be 0x81230000, but that clearly is bigger than INT_MAX so there is in fact overflow.
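To make that comparison concrete, here is a one-line check (my addition, valid on a platform with 32-bit int) that the mathematically expected result does not fit:

#include <climits>

// 0x8123 << 16 is mathematically 0x8123 * 2^16 = 0x81230000 = 2166554624,
// which exceeds INT_MAX (2147483647) for a 32-bit int.
static_assert(0x8123LL * 0x10000LL > INT_MAX, "shifted value does not fit in int");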

Signed integer overflow in C++11 is Undefined Behavior. Literally any outcome is correct, including purple or no output at all. At least you got a numerical value, but GCC is known to outright eliminate code paths which unavoidably cause overflow.

[edit] Newer GCC versions support C++14, where this particular form of overflow has become implementation-defined - see Serge's answer.


Your platform has 32-bit int.

Your code is exactly equivalent to

#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0x8123U;
    auto a1 = (check & 0xFFFF) << 16;
    uint64_t new_check = a1;
    std::cout << std::hex << new_check << std::endl;

    auto a2 = (check & 0xFFFFU) << 16;
    new_check = a2;
    std::cout << std::hex << new_check << std::endl;
    return 0;
}

What's the type of a1 and a2?

  • For a2, the result is promoted to unsigned int.
  • More interestingly, for a1 the result is promoted to int, and then it gets sign-extended as it's widened to uint64_t.

Here's a shorter demonstration, in decimal so that the difference between signed and unsigned types is apparent:

#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0;
    std::cout << check
              << "  " << (int)(check + 0x80000000)
              << "  " << (uint64_t)(int)(check + 0x80000000) << std::endl;
    return 0;
}

On my system (also 32-bit int), I get

0  -2147483648  18446744071562067968

showing where the promotion and sign-extension happens.
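For reference (my addition, not part of the answer), printing the same expression in hex makes the filled upper bits visible and ties it back to the question's first output:

#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0;
    // Same expression as in the demo above, printed in hex:
    // the upper 32 bits of the uint64_t are all 1s.
    std::cout << std::hex
              << (uint64_t)(int)(check + 0x80000000) << std::endl;   // ffffffff80000000
    return 0;
}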






