C++ Reference Material | Linked Nodes

Linked Nodes

Once you understand thoroughly the basic notion of a pointer variable, how to initialize it (statically or dynamically), and how to dereference it, there is not a great deal that's really new to learn about the use of pointers in the context of linked structures. There are details, though, and that is always where the devil hides.

Definition of a node

A node is a structured variable (or object) containing at least one field whose type is a pointer type.

Linked nodes and pictures

Since a node has at least one field (and may have several, or even many such fields) whose type is a pointer type, the value of that field may be (and often is) the address of another node of the same type. If so, the first node is said to be "linked to" the second one. Or, the two nodes are said to be "linked". Because many such fields are possible in a single node, and because the "linking" may go in "either direction", many different configurations of linked nodes are possible, including sequences (or lists), trees, and graphs.

The wonderful thing to remember about linked nodes is that no matter how complicated your nodes are, and no matter what you may be doing with them, there is always a nice picture you can draw to show you exactly what is happening. This is a fact worth keeping in mind and exploiting whenever you need it.

Sequences of linked nodes (often called "linked lists")

One of the most common uses of nodes with a single link field is to form sequences of linked nodes, i.e., sequences of nodes in which there is a first node with a link field "pointing to" the second node, which in turn has a link field pointing to the third node, and so on. Of course, each node also contains some data. In addition there must be an "external" pointer-to-Node (a pointer "outside" the sequence) which points to the first Node of the sequence, and the link in the last Node of the sequence must have the special value nullptr so that any routine traversing the sequence will be able to tell when it has arrived at the end.

We will refer to such a structure as a "sequence of linked nodes" (as we have done in the previous paragraph), but you will often see the term "linked list" used to refer to exactly the same kind of structure. We prefer the former term because it is more descriptive of the actual structure, and use the latter term to refer to a linked-node implementation of ADTList.

Node definitions in C++

In C++ we may define a Node data type having a single field which is a pointer data type in at least two ways. Here is an example of the first way:

typedef int DataType;
struct Node
{
    DataType data;
    Node* link;
};

There are a couple of things to note about this definition:

The Node struct contains two fields which may be viewed, conceptually, as the "data field" and the "link field". In this particular definition the first (data) field is just an int. However, because the type of the field is given as a typedef (DataType), we may make the data field of arbitrary complexity without changing the Node definition. We could, of course, make the link field more general as well, but for the moment we are only interested in nodes with a single link field.
The definition is self-referential, since the link field contains a "pointer-to-self".

Here is an alternate form of the definition, which makes use of a forward declaration of the struct data type Node:

typedef int DataType;
struct Node;
typedef Node* NodePointer;
struct Node
{
    DataType data;
    NodePointer link;
};

The forward declaration of a data type is analogous to the extern declaration of a variable, which is also a kind of forward declaration. In either case, the declaration simply says (to the compiler), "There is such a thing, and you will see its definition later, so don't worry about it now."

Node manipulations in C++ (Draw the pictures!)

Creation of a sequence of linked nodes follows a pattern something like the following. First, we create an initial node for the sequence with this statement:

Node* mySequence = new Node;

Then we establish a "current pointer" which we will "move along" to help us deal with whatever is happening at the "current node":

Node* current = mySequence;

Next we put some data into the current (first) node:

current->data = 25;

If we only wanted a sequence of one node (hardly a "real" sequence) we would now do this:

current->link = nullptr;

But if we want another node, we create that new node and attached to the current (first) one as follows:

current->link = new Node;

Then we move our "current pointer" along to point at the second node (where all the action will now take place) with the statement shown below. This statement is a very important "C++ idiom" for "moving a pointer along" and you should stare at it for a bit until you have committed it to memory.

current = current->link;

Now that our current pointer is pointing at the second node, we can use it to put some data into this node, and then attach yet another node if we so desire. We can continue to add nodes and fill them with data for as long as we like. When we have as many as we need, or simply tire of the tedium, we must make sure that the link field of the last node attached to the sequence has been set to nullptr to indicate the end of the sequence. Thus we will have our mySequence pointer pointing at the first node of the sequence and a nullptr pointer in the last node to tell us that the sequence has ended, if we are looking at each node of the sequence for any reason. A typical loop for processing a sequence of linked nodes thus has the following generic form:

Node* current = mySequence;
while (current != nullptr)
{
    Process(current);
    current = current->link;
}