The union Data Structure
The union data structure is a legacy feature carried over from the C programming language, and the kind of facility it provides can often be delivered in C++ by using the inheritance mechanism.
A union is like a struct in that it generally has several fields, all of which are public by default. Unlike a struct, however, only one of the fields is used at any given time. In other words, it is a structure that allows the same storage space to be used to store values of different data types at different times. Thus it is necessary, and the programmer's responsibility, to keep track of what is actually stored in the union at any given time.
Unions can be used to conserve memory when a structure is needed in which several pieces of information of different types must be represented but only one will be used at a time, as when the components of a container must contain values of differing data types.
The usual syntax for a union is illustrated by
union UnionType
{
    int i;
    double d;
    char c;
};
UnionType u;
in which we see the definition of a union data type called UnionType, followed by the declaration of a variable of that type, namely u. The variable u can hold either an int value, or a double value, or a char value at any given time, but only one of these values.
The syntax for accessing the value in u is what you would probably expect, namely one of the following, depending on what kind of value is stored in the union:
u.i
u.d
u.c
Arrow notation (up->i, if up is a pointer to a union of the above type) can also be used, if and when appropriate.
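The following is a minimal sketch of what "sharing the same storage" means in practice, using the UnionType defined above. The function names membersShareStorage and storeInTurn are hypothetical and chosen only for illustration. The sketch checks that every member begins at the address of the union itself, and then stores an int, a double, and a char in turn, each new value replacing the previous one.

```cpp
#include <cassert>

union UnionType
{
    int i;
    double d;
    char c;
};

// The union must be able to hold its largest member.
static_assert(sizeof(UnionType) >= sizeof(double),
              "a union is at least as large as its largest member");

// Hypothetical helper: returns true if every member of u begins at the
// address of u itself, i.e. all members occupy the same storage.
bool membersShareStorage(const UnionType& u)
{
    const void* base = &u;
    return base == static_cast<const void*>(&u.i)
        && base == static_cast<const void*>(&u.d)
        && base == static_cast<const void*>(&u.c);
}

// Hypothetical helper: stores values of three different types in the same
// union, one after another, and returns the last one stored.
char storeInTurn()
{
    UnionType u;
    UnionType* up = &u;

    u.i = 42;          // dot notation: u now holds an int
    assert(u.i == 42);

    up->d = 2.5;       // arrow notation: the int is replaced by a double
    assert(u.d == 2.5);

    up->c = 'x';       // now only the char member is meaningful
    return u.c;
}
```

Note that each read in the sketch uses the member that was written most recently. Reading a different member from the one last stored is exactly the mistake the programmer must guard against.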
Because it is the programmer's responsibility to keep track of what is in a union at any time, unions are frequently enclosed in a larger structure which has, perhaps among many other fields, a field called a "tag field" to keep track of what is in the union. For example, the above union might appear in a situation like this:
enum TagType { INTEGER, DOUBLE, CHARACTER };
struct DataType
{
    OtherType otherField;
    TagType tag;
    UnionType u;
};
DataType myData;
With a setup like this, if you want to store a double value, you might do this:
myData.tag = DOUBLE;
myData.u.d = 3.14;
Then, later on, you can do this:
if (myData.tag == DOUBLE)
{
    // You know you can use a double value here ...
}
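One way to keep the tag and the union from drifting out of step is to route every store and every read through small helper functions, as in the sketch below. This is only an illustration under stated assumptions: the original leaves OtherType unspecified, so it is taken to be int here, and the helpers setInt, setDouble, setChar, getDouble, and toDouble are hypothetical names, not part of any library.

```cpp
#include <cassert>

typedef int OtherType;  // assumption: the notes do not say what OtherType is

enum TagType { INTEGER, DOUBLE, CHARACTER };

union UnionType
{
    int i;
    double d;
    char c;
};

struct DataType
{
    OtherType otherField;
    TagType tag;
    UnionType u;
};

// Hypothetical setters: the tag and the value are always updated together.
void setInt(DataType& data, int value)       { data.tag = INTEGER;   data.u.i = value; }
void setDouble(DataType& data, double value) { data.tag = DOUBLE;    data.u.d = value; }
void setChar(DataType& data, char value)     { data.tag = CHARACTER; data.u.c = value; }

// Hypothetical getter: refuses (in debug builds) to read the double member
// unless the tag says a double is what is currently stored.
double getDouble(const DataType& data)
{
    assert(data.tag == DOUBLE);
    return data.u.d;
}

// Hypothetical helper: consults the tag to decide which member to read,
// and converts whatever is stored to a double.
double toDouble(const DataType& data)
{
    switch (data.tag)
    {
        case INTEGER:   return data.u.i;
        case DOUBLE:    return data.u.d;
        case CHARACTER: return data.u.c;
    }
    return 0.0;  // not reached for a valid tag
}
```

With helpers like these, client code would call setDouble(myData, 3.14) rather than assigning to myData.tag and myData.u.d separately, so the two cannot be updated inconsistently.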
You can also use an "anonymous" union (a union without a name), as in
struct DataType
{
    OtherType otherField;
    TagType tag;
    union
    {
        int i;
        double d;
        char c;
    };
};
in which case the fields of the union are treated simply as fields of the struct (so a DataType variable such as myData accesses them directly, as in myData.d rather than myData.u.d, with no extra . or -> step through a union name, but there must of course be no name conflicts with other fields in the struct).
In either of the above examples the field otherField may be called the "invariant" part of the struct (the part that does not vary in the kind of data it holds), while the union part might be called the "variant" part of the struct (the part that does vary).
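As a small illustration of the anonymous form, the sketch below marks the invariant and variant parts in comments and reaches the union's fields directly through the struct. OtherType is again assumed to be int, since the original does not define it, and the helpers makeChar and getChar are hypothetical.

```cpp
#include <cassert>

typedef int OtherType;  // assumption: the notes do not say what OtherType is

enum TagType { INTEGER, DOUBLE, CHARACTER };

struct DataType
{
    OtherType otherField;  // invariant part
    TagType tag;           // records which variant field is in use
    union                  // variant part: an anonymous union
    {
        int i;
        double d;
        char c;
    };
};

// Hypothetical helper: builds a DataType that holds a char.
DataType makeChar(char value)
{
    DataType data;
    data.otherField = 0;
    data.tag = CHARACTER;
    data.c = value;        // data.c, not data.u.c: the union has no name
    return data;
}

// Hypothetical helper: the same direct access works through a pointer.
char getChar(const DataType* dp)
{
    assert(dp->tag == CHARACTER);
    return dp->c;          // dp->c, not dp->u.c
}
```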
Here are some general rules regarding unions that might occasionally come in handy, but we give them only for the sake of completeness, since we will be using unions only in the context of binary expression trees:
See the Deitel reference for a good discussion of unions.
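Since unions are used here only in the context of binary expression trees, it may help to see how a tree node could combine a tag field with an anonymous union. The sketch below is one possible layout, not necessarily the one used elsewhere in these notes: the names NodeKind, ExprNode, makeOperand, makeOperator, and evaluate are all assumptions made for illustration. A node holds either a numeric operand or an operator character, never both.

```cpp
#include <cassert>

enum NodeKind { OPERAND, OPERATOR };

struct ExprNode
{
    NodeKind kind;      // tag field: says which union member is in use
    union               // variant part
    {
        double value;   // meaningful when kind == OPERAND
        char op;        // meaningful when kind == OPERATOR: + - * /
    };
    ExprNode* left;     // both null for an operand (leaf) node
    ExprNode* right;
};

// Hypothetical helper: builds a leaf node holding a number.
ExprNode makeOperand(double value)
{
    ExprNode node;
    node.kind = OPERAND;
    node.value = value;
    node.left = nullptr;
    node.right = nullptr;
    return node;
}

// Hypothetical helper: builds an interior node holding an operator.
ExprNode makeOperator(char op, ExprNode* left, ExprNode* right)
{
    ExprNode node;
    node.kind = OPERATOR;
    node.op = op;
    node.left = left;
    node.right = right;
    return node;
}

// Evaluates the expression rooted at node, checking the tag before
// reading either member of the union.
double evaluate(const ExprNode* node)
{
    assert(node != nullptr);
    if (node->kind == OPERAND)
        return node->value;

    double lhs = evaluate(node->left);
    double rhs = evaluate(node->right);
    switch (node->op)
    {
        case '+': return lhs + rhs;
        case '-': return lhs - rhs;
        case '*': return lhs * rhs;
        case '/': return lhs / rhs;
    }
    assert(false && "unknown operator");
    return 0.0;
}
```

For example, building the tree for (2 + 3) * 4 from three operand nodes and two operator nodes and passing its root to evaluate yields 20.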