tutorial - unions in c++




Purpose of Unions in C and C++ (10)

I have used unions earlier comfortably; today I was alarmed when I read this post and came to know that this code

union ARGB
{
    uint32_t colour;

    struct componentsTag
    {
        uint8_t b;
        uint8_t g;
        uint8_t r;
        uint8_t a;
    } components;

} pixel;

pixel.colour = 0xff040201;  // ARGB::colour is the active member from now on

// somewhere down the line, without any edit to pixel

if(pixel.components.a)      // accessing the non-active member ARGB::components

is actually undefined behaviour I.e. reading from a member of the union other than the one recently written to leads to undefined behaviour. If this isn't the intended usage of unions, what is? Can some one please explain it elaborately?

Update:

I wanted to clarify a few things in hindsight.

  • The answer to the question isn't the same for C and C++; my ignorant younger self tagged it as both C and C++.
  • After scouring through C++11's standard I couldn't conclusively say that it calls out accessing/inspecting a non-active union member is undefined/unspecified/implementation-defined. All I could find was §9.5/1:

    If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members. §9.2/19: Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

  • While in C, (C99 TC3 - DR 283 onwards) it's legal to do so (thanks to Pascal Cuoq for bringing this up). However, attempting to do it can still lead to undefined behavior, if the value read happens to be invalid (so called "trap representation") for the type it is read through. Otherwise, the value read is implementation defined.
  • C89/90 called this out under unspecified behavior (Annex J) and K&R's book says it's implementation defined. Quote from K&R:

    This is the purpose of a union - a single variable that can legitimately hold any of one of several types. [...] so long as the usage is consistent: the type retrieved must be the type most recently stored. It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another.

  • Extract from Stroustrup's TC++PL (emphasis mine)

    Use of unions can be essential for compatness of data [...] sometimes misused for "type conversion".

Above all, this question (whose title remains unchanged since my ask) was posed with an intention of understanding the purpose of unions AND not on what the standard allows E.g. Using inheritance for code reuse is, of course, allowed by the C++ standard, but it wasn't the purpose or the original intention of introducing inheritance as a C++ language feature. This is the reason Andrey's answer continues to remain as the accepted one.


Although this is strictly undefined behaviour, in practice it will work with pretty much any compiler. It is such a widely used paradigm that any self-respecting compiler will need to do "the right thing" in cases such as this. It's certainly to be preferred over type-punning, which may well generate broken code with some compilers.


As you say, this is strictly undefined behaviour, though it will "work" on many platforms. The real reason for using unions is to create variant records.

union A {
   int i;
   double d;
};

A a[10];    // records in "a" can be either ints or doubles 
a[0].i = 42;
a[1].d = 1.23;

Of course, you also need some sort of discriminator to say what the variant actually contains. And note that in C++ unions are not much use because they can only contain POD types - effectively those without constructors and destructors.


In C it was a nice way to implement something like an variant.

enum possibleTypes{
  eInt,
  eDouble,
  eChar
}


struct Value{

    union Value {
      int iVal_;
      double dval;
      char cVal;
    } value_;
    possibleTypes discriminator_;
} 

switch(val.discriminator_)
{
  case eInt: val.value_.iVal_; break;

In times of litlle memory this structure is using less memory than a struct that has all the member.

By the way C provides

    typedef struct {
      unsigned int mantissa_low:32;      //mantissa
      unsigned int mantissa_high:20;
      unsigned int exponent:11;         //exponent
      unsigned int sign:1;
    } realVal;

to access bit values.


In C++, Boost Variant implement a safe version of the union, designed to prevent undefined behavior as much as possible.

Its performances are identical to the enum + union construct (stack allocated too etc) but it uses a template list of types instead of the enum :)


Others have mentioned the architecture differences (little - big endian).

I read the problem that since the memory for the variables is shared, then by writing to one, the others change and, depending on their type, the value could be meaningless.

eg. union{ float f; int i; } x;

Writing to x.i would be meaningless if you then read from x.f - unless that is what you intended in order to look at the sign, exponent or mantissa components of the float.

I think there is also an issue of alignment: If some variables must be word aligned then you might not get the expected result.

eg. union{ char c[4]; int i; } x;

If, hypothetically, on some machine a char had to be word aligned then c[0] and c[1] would share storage with i but not c[2] and c[3].


Technically it's undefined, but in reality most (all?) compilers treat it exactly the same as using a reinterpret_cast from one type to the other, the result of which is implementation defined. I wouldn't lose sleep over your current code.


The behaviour may be undefined, but that just means there isn't a "standard". All decent compilers offer #pragmas to control packing and alignment, but may have different defaults. The defaults will also change depending on the optimisation settings used.

Also, unions are not just for saving space. They can help modern compilers with type punning. If you reinterpret_cast<> everything the compiler can't make assumptions about what you are doing. It may have to throw away what it knows about your type and start again (forcing a write back to memory, which is very inefficient these days compared to CPU clock speed).


The most common use of union I regularly come across is aliasing.

Consider the following:

union Vector3f
{
  struct{ float x,y,z ; } ;
  float elts[3];
}

What does this do? It allows clean, neat access of a Vector3f vec;'s members by either name:

vec.x=vec.y=vec.z=1.f ;

or by integer access into the array

for( int i = 0 ; i < 3 ; i++ )
  vec.elts[i]=1.f;

In some cases, accessing by name is the clearest thing you can do. In other cases, especially when the axis is chosen programmatically, the easier thing to do is to access the axis by numerical index - 0 for x, 1 for y, and 2 for z.


You can use a a union for two main reasons:

  1. A handy way to access the same data in different ways, like in your example
  2. A way to save space when there are different data members of which only one can ever be 'active'

1 Is really more of a C-style hack to short-cut writing code on the basis you know how the target system's memory architecture works. As already said you can normally get away with it if you don't actually target lots of different platforms. I believe some compilers might let you use packing directives also (I know they do on structs)?

A good example of 2. can be found in the VARIANT type used extensively in COM.


You could use unions to create structs like the following, which contains a field that tells us which component of the union is actually used:

struct VAROBJECT
{
    enum o_t { Int, Double, String } objectType;

    union
    {
        int intValue;
        double dblValue;
        char *strValue;
    } value;
} object;




type-punning