Basic Pointer Concepts

Definition of a pointer

A pointer is a variable which can contain the address (storage location) of another variable or object.

Pointer syntax and terminology

Here are some declarations of pointer variables:

```
int* p_int; //p_int is a "pointer to an int".
            //p_int can contain the storage location of an int.
char* p_char; //p_char is a "pointer to a char".
              //p_char can contain the storage location of a char.
struct ClubMember
{
    string name;
    int age;
    double balance;
};
ClubMember* p_cm; //p_cm is a "pointer to a ClubMember".
                  //p_cm can contain the storage location of a ClubMember.
```

Notes on the declaration syntax illustrated above:

All of the following declarations are equivalent:
```
int* p;   //Option 1 ... seems to be the most frequently used
int * p;  //Option 2 ... seems to be the least frequently used
int *p;   //Option 3 ... somewhere in between in terms of usage
```
But you need to be careful, since in all cases the * symbol "binds" to the p and not to the int. This means, for example, that if you give declarations like
```
int* p1, p2;
```
it is natural to think that you have declared two "pointer-to-int" variables. But you have not. The first, p1, is a pointer-to-int, but the second, p2, is just an ordinary int variable. This is just one more argument for having only one declaration per line, since you do not have this potential misunderstanding with these declarations:
```
int* p1;
int* p2;
```
Those who use Option 3 above as their declaration convention are somewhat less likely to have a problem, since they would be inclined to use the following (correct) syntax when placing two declarations on one line:
```
int *p1, *p2;
```
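Each of the pointer variables declared above can contain the address of a value of its corresponding type, but at the moment none of them has been given one. (An uninitialized local pointer variable holds an unpredictable garbage value, so it must be given a value before it is used.)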

Initializing, and assigning values to, pointer variables

There are essentially two times we can make a pointer variable "point" at something:

by initializing the pointer at the time of its declaration, or
by declaring it first, and assigning a value to it at some later time

In this sense, a pointer variable is no different from any other "ordinary" variable.

There are also essentially two kinds of things a pointer can be made to point at:

at something that already exists, or
at something that is "created" in memory (on "the heap", which is also called "the free store") at the same time the initialization or assignment of the pointer value is made

Here are some relevant examples in which a pointer is made to point at something that already exists:

In this first case, the pointer variable p_int is initialized with the "address of i", and therefore may be said to "point to" or "point at" the already-existing int variable i. Note that & is the C++ "address of" operator. When prepended to a variable, this operator returns the address of that variable in memory.
```
int i = 6;
int* p_int = &i;
```
In the following case, the pointer variable p_int is not initialized at the time of declaration. Instead, it is first just declared, and then assigned a value, which is again the address of i. The final result is, of course, the same as in the previous example.
```
int i = 6;
int* p_int;
p_int = &i;
```
In this case, the pointer variable p_cm is initialized to point at a variable of the struct type ClubMember. It could also be declared first and assigned to later, as was done with p_int in the previous example.
```
ClubMember cm = { "John", 32, 100.00 };
ClubMember* p_cm = &cm;
```

And here are some relevant examples in which a pointer is made to point at a newly acquired memory location (i.e., a piece of memory acquired "dynamically" by the program when it executes the given statement).

In this first case, the pointer variable p_int is initialized with the address of a memory location on the "heap" (or the "free store") where a value of type int can be stored. Once again p_int may be said to "point to" or "point at" this location, which at the moment does not contain a meaningful value (its contents are indeterminate).
```
int* p_int = new int;
```
In the following case, the pointer variable p_int is not initialized at the time of declaration. Instead, it is first just declared, and then assigned a value, which is again the address of a newly-acquired memory location from the heap. The final result is, of course, the same as in the previous example.
```
int* p_int;
p_int = new int;
```
In this case, the pointer variable p_cm is initialized to point at a new memory location on the heap which can store a value of type ClubMember. Here too, p_cm could first be declared, and later assigned to, as was done with p_int in the previous example.
```
ClubMember* p_cm = new ClubMember;
```

There is an important "universal" pointer value called nullptr (a keyword introduced in C++11). A pointer variable of any pointer type may be assigned this value to indicate explicitly that the pointer does not point at anything. Note that this is not the same as having a pointer variable whose value is "undefined". Thus the declaration
```
AnyType* p = nullptr;
```
is always valid, provided that AnyType itself has been declared. A pointer variable that has never been given a value, not even nullptr, and so contains a garbage value, is usually called a "wild pointer". A "dangling pointer", by contrast, is one that still holds the address of memory that has since been returned to the heap or has otherwise ceased to exist.

Note that you will continue to see the predefined constant NULL and the value 0 used to indicate that a pointer does not point to anything. Legacy code (older code that has not been updated) will contain these values for this purpose, but any new code that you write should use nullptr.

Dereferencing pointer variables (or, referring to the value pointed at)

If we have a pointer variable, say p, which "points at" or "points to" something, a natural question is, "How do we use the pointer variable p to gain access to the thing to which it points?" The answer is that the expression *p refers to the thing to which p points. If the thing to which p points is a memory location on the heap, then that memory location may have no other name, and may therefore be accessible only in this way (via the pointer variable p). That is why such memory locations on the heap are sometimes referred to as anonymous variables, as well as dynamic variables (because they are obtained "dynamically", as the program runs).

Here is a simple example, in which we first obtain a new memory location (on the heap) that is capable of holding an integer, then we assign the value 17 to that location (we write 17 to the location, in effect), then we alter the value in that location (we alter the contents of the location in exactly the same way we would alter the contents of the location of an "ordinary" variable), and finally we output the value at that location (we read from the location, in effect):

```
int* p = new int;  //Obtain a new location
*p = 17;           //Put the value 17 into that location
*p = *p + 5;       //Modify the value in that location
cout << *p;        //Display the value in that location
```

When a pointer variable p points to a variable (or object) of a structured type, such as a struct type (or a class type), the situation is somewhat more complex. For example, if we have
```
ClubMember* p = new ClubMember;
```
then, as before, *p refers to the entire club member. But what if we want to refer to just one of the fields of the value of type ClubMember, say the "name" field? A good guess would be
```
*p.name
```
since if cm is an "ordinary" ClubMember variable, cm.name would be the correct syntax. However, this does not compile. The . operator has a higher precedence than the unary * operator, so *p.name is parsed as *(p.name), and p, being a pointer, has no member called name. We therefore have to use parentheses instead:
```
(*p).name
```
Because this is at best awkward and inconvenient, C++ provides an alternative syntax using the "arrow operator", which, for the above example, would look like this:
```
p->name
```
This notation should be preferred when dealing with data or function members of struct or class variables, as in the following example.
```
ClubMember* p = new ClubMember;
p->name = "John";
p->age = 32;
p->balance = 100.00;
cout << p->name << " owes $" << p->balance << endl;
```

Pointers and arrays, including "pointer arithmetic"

There is an "intimate" connection between arrays and pointers in C++ (and C), which permits pointers to be used as an alternate and sometimes more convenient way of gaining access to array elements and traversing some or all of the elements of an array.

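First, in most expressions the name of an array is automatically converted to a pointer to the first element of that array. That is, the name of an array may be treated as a (const) value which is equal to the address of the first element of that array. (The main exceptions are when the array name is the operand of sizeof or of the & operator; sizeof(a), for example, gives the size of the whole array, not the size of a pointer.) This means, for example, that if we have the array
```
int a[] = { 1, 2, 3, 4, 5 };
```
then the two declarations shown below are equivalent (they are alternatives: both declare p_int, so only one of them could appear in a given scope).
```
int* p_int = a;
int* p_int = &a[0];
```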

In the context of arrays, C++ and C programmers often perform "pointer arithmetic". Given below are some examples of the typical kinds of expressions one often sees, and which you may find useful to employ yourself. Note in particular the use of the increment/decrement operators.

```
int* p_int = &a[2]; //p_int points at the third element of a
cout << *(p_int+2) << endl;//Display the fifth element of a
cout << *(p_int-2) << endl;//Display the first element of a
++p_int; //p_int now points at the fourth element of a
p_int++; //p_int now points at the fifth element of a
--p_int; //p_int now points at the fourth element of a
p_int--; //p_int now points at the third element of a
```

It's important to realize that pointer arithmetic in the context of arrays is "smart", in the sense that when the pointer is incremented (or decremented), the resulting pointer value is the address of the next (or previous) element of the array, whether the element type of the array is int, or double, or ClubMember. The compiler achieves this by scaling each step by the size of the element type, so incrementing a double* advances the address by sizeof(double) bytes, not by 1 byte. An analogous statement holds if an integer value is added to or subtracted from a pointer value.

The above discussion involved pointers being used in the context of ordinary fixed-size arrays, but of course we can also have dynamic arrays. A dynamic int array of size 5 would be obtained with the statement
```
int* p_int = new int[5];
```
after which the third element of the array may be referred to by either of the following expressions:
```
p_int[2]
*(p_int+2)
```

Returning no-longer-required memory to the heap, and memory leaks

When the memory acquired from the heap during the running of a program, and "pointed to" by a pointer variable in the program, is no longer required by the program, the program should "return the memory to the heap". Doing so simply means that your program is behaving like a "good citizen", since otherwise it might cause a potentially serious "memory leak". If this happened, it would mean that your program had laid claim to some memory that it no longer was using, and perhaps could no longer even access, and other software running on the computer could not access it either, so long as your program continued to run. A situation like this could actually cause your computer to "run out of memory".

The syntax for returning memory to the heap differs, depending on whether the memory being returned contains a simple variable or an array. The following examples illustrate the difference.

```
int* p_int = new int; //Declare and initialize p_int
*p_int = 17;          //Assign value of 17 to the dynamic memory
delete p_int;         //Return the dynamic memory to the heap
```

In this case the value 17 that was stored in the "deleted" memory can no longer be accessed by the program. Note that the syntax is potentially misleading. It would appear that we are "deleting p_int", but in fact p_int has gone nowhere: the variable still exists and still holds the same address, which is now invalid and must not be dereferenced. It's the memory that p_int points at that we are "deleting" (i.e., returning to the heap), not p_int itself. Note that here we are dealing with a pointer that points at a simple variable. If the pointer points at an array, the syntax is given in the next example.

```
int* p_int = new int[5]; //Declare the dynamic array
//... work with the dynamic array ...
delete [] p_int; //Return the dynamic array's memory to the heap
```

Use of `const` in pointer definitions

In using the const keyword with a pointer definition, there are essentially four possibilities, as illustrated below. The thing pointed at can be made constant, or the pointer itself, or both, or neither. In the illustrations below, "OK" means the corresponding line of code would compile, given the previous definitions of i and/or j, while "not OK" means that the line would not compile. We depart from our usual convention here and place a space on either side of the * in the definition of p, to emphasize the position of const.

Here are the four cases:

If const is not used at all, then (of course) both the pointer itself, as well as the thing being pointed to, can be altered.

```
int i = 1;
int j = 2;
int * p = &i;
p = &j; //OK (the pointer value can be altered)
*p = 6; //OK (the value pointed at can also be altered)
```

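If const is placed before the type, the effect is to make the value pointed at a constant value as seen through p: it cannot be altered via p, although the variable pointed at (i or j here) is not itself const and can still be altered directly.

```
int i = 1;
int j = 2;
const int * p = &i;
p = &j; //OK     (the pointer value can be altered)
*p = 6; //not OK (the value pointed at cannot be altered)
```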

If const is placed between the * and the pointer variable name, the effect is to make the pointer itself a constant.

```
int i = 1;
int j = 2;
int * const p = &i;
p = &j; //not OK (the pointer value cannot be altered)
*p = 6; //OK     (the value pointed at can be altered)
```

If const is placed both before the type name and between the * and the pointer variable name, the effect is to make both the value pointed at, and the pointer itself, constant values.
```
int i = 1;
int j = 2;
const int * const p = &i;
p = &j; //not OK (the pointer value cannot be altered)
*p = 6; //not OK (the value pointed at also cannot be altered)
```

Actually, the effect is the same if const is placed after the type name, instead of before. Thus the following two lines are equivalent (see Case 2 above):

```
const int * p = &i;
int const * p = &i;
```

And the following two lines are also equivalent (see Case 4 above):

```
const int * const p = &i;
int const * const p = &i;
```

In fact, the keyword const can be placed either before or after the type name in the declaration of an "ordinary" named constant as well. That is, although you most often see the first of the following two alternatives, the second is equally valid:

```
const int SIZE = 10;
int const SIZE = 10;
```

Problems with Normal Pointers

Some issues with normal pointers in C++ are as follows:

Memory Leak: This occurs when memory is allocated by a program but never freed. If it happens repeatedly, it leads to excessive memory consumption and can eventually exhaust the available memory, causing the program or even the whole system to fail.
Dangling Pointer: A pointer that still holds the address of an object after that object has been deallocated (or has gone out of scope), because the pointer's value was not modified. Using such a pointer leads to undefined behavior.
Wild Pointer: A pointer that is declared but never initialized to point to any valid object or address, so that it holds a garbage value.
Data Inconsistency: This occurs when data stored in memory is not updated in a consistent manner, for example when several pointers refer to the same data and code working through one of them is unaware of changes made through another.
Buffer Overflow: This occurs when a pointer is used to write data to a memory address that is outside of the allocated memory block. The result is corrupted data, which can be exploited by malicious attackers.

For a discussion of C++ smart pointers see here.