General Reference Material | Building A Binary Search Tree Template Class

Building A Binary Search Tree Template Class

What is a binary search tree (BST)?

A binary search tree is a binary tree in which each node contains a key, all keys are unique (for simplicity) and for which the following criterion is satisfied: For each node in the tree, all keys in the nodes of its left subtree must be less than the key in the node, and all keys in the nodes of its right subtree must be greater than the key in the node. This is called the binary search tree property, and is a powerful and useful property for a tree to have. For example, it turns out (and this should be clear if you think about the property for a moment), that an inorder traversal of a binary search tree visits the nodes in ascending key order.

What kind of structure can be used to store a binary search tree?

All we need is a "garden variety" node with one field for data, which must either be able to serve as a key (i.e., the data must be of a type for which operator< is defined) and the usual two fields for pointers to the root of the left child and right child, respectively. That is, we want a structure like

struct TreeNode
{
    TreeNode* left;
    DataType  data;
    TreeNode* right;
};

in which DataType represents whatever kind of data we wish to store in the tree. Actually, because we want to have a template class, we really want this:

<typename DataType>
struct TreeNode
{
    TreeNode* left;
    DataType  data;
    TreeNode* right;
};

Notes on the Implementation of ADTBinarySearchTree (BinarySearchTree<T>)

First of all, we can use an underlying structure like that shown above as the basis for constructing a Binary Search Tree template class. But we should not overlook the fact that there may well be other structures we could also use; it's just that the above structure is kind of a "natural" choice.

Because we want to practice information hiding, and therefore don't want to reveal in our ADT interface the fact that we are indeed using the above structure to store each node in the tree, we shall have to make use of "auxiliary" or "helper" functions behind the scenes to do the actual work of dealing with the nodes of the tree.

Our template class will have as its single private data member a pointer to the root node of a tree whose nodes are of the above kind. Here are some of the operations that we might wish to implement, and some notes on each:

Default constructor (the only one we will have)

This will create an empty BST and just requires setting the root node pointer to nullptr.

Destructor

Here we need to be careful to deallocate (return to the heap) all storage that has been allocated on the heap to store the tree nodes. Questions that arise include: Can we traverse the tree in any order we like when we're deallocating the node storage? Is there a "best" order?

Copy constructor

This will be invoked whenever we need to make a copy of an already-existing BST. Note that new storage on the heap will be required for the copy.

Overloaded assignment operator

This will be invoked whenever we need to assign to an already existing object variable a new value. Note that we should first check to see if a value is being assigned to itself, in which case nothing needs to be done. If an actual assignment is to take place, then we must be sure to deallocate all storage for the BST that is being "overwritten", as well as to create new storage on the heap for the new BST.

Test if a BST is empty

This is very easy. All we have to do is check to see if the object data member (a pointer to the root node of the BST) points to nullptr.

Insert a value into a BST

In implementing this operation we will

always insert a new node as a leaf
maintain the "BST property" as we insert each new node
not be concerned about the shape of the tree (i.e., we will not attempt to achieve or maintain a "balanced" or "bushy" tree as we insert each new node)

The body of this function will just be a call to another (recursive) helper function that does the actual insertion into the tree. This function is, of course, not a member function, unlike the insert() function itself. Here is some pseudocode for this helper function, which will require a node pointer and the item to be inserted in its parameter list:

if this node is empty
    Put the value into this node
else if the value is < value in this node
    Put the value into the left subtree of this node
else if the value is > value in this node
    Put the value into the right subtree of this node

We should also observe at this point that the insert() operation that is used to populate an entire tree is the same one that is used to insert one additional item at a time, should the user decide to do this.

Write all values in a BST to an output stream

There are several things to consider in implementing this operation:

First, the presence of a an output stream parameter would mean that the output could be sent to a file as well as the screen.
Second, the user may want to have the tree contents displayed in any of several different orders, on demand. How to implement this? Well, we could add a parameter to the operation interface itself. This would mean that the decision of what order to display in would have to be made outside the ADT. On the other hand, we could also have this operation ask the user what order is desired, i.e., we could have the ADT itself determine from the user what to do.
In any case, once again the actual display will be accomplished by helper functions performing the appropriate kind of traversal.
Preorder, inorder and postorder we know all about, but what about "level-order, left to right"? Here's an idea for implementing that option: Rather than recursion, use an iterative approach that employs a queue as an auxiliary structure and whose pseudocode goes something like this:
```
Put the root node of the tree into the queue
while the queue is not empty
    Remove the first item from the queue and display its value
    Put left child of node just removed from queue into queue
    Put right child of node just removed from queue into queue
```
And perhaps the most useful kind of display of all would be one that actually shows some kind of "tree shape", so that the user can actually see what the tree looks like, where each node containing a data value is located, how the nodes are positioned relative to one another, and how the shape and size of the tree varies over time as various operations are performed on it.

Find a value in a BST

Because in our (simplified) BST there is nothing in a node except the key value, a simple search operation like this is not of much practical use (for retrieving additional information associated with a key, for example), and serves only as a way of determining whether a particular value is, or is not, in the tree. As usual, the actual "retrieving" will be done by an auxiliary helper function. Here is some pseudocode for this function, which, you will note, bears a marked resemblance to the pseudocode for the auxiliary function used by the Insert operation (and also, by the way, a marked resemblance to the binary search algorithm):

if this node is empty
    found is false
else if the value sought is < value in this node
    Retrieve the value sought from the left subtree
else if the value sought is > value in this node
    Retrieve the value from the right subtree
else
    found is true
    Copy the value found into the tree into the parameter

We may want to implement other useful search operations (for the maximum and/or minimum value in a BST, for example). To do so, we need to have some way (perhaps any way) of traversing the entire tree while examining the values to see if they satisfy a particular criterion. The question that needs to be answered is this: Should we have a separate traversal for each criterion, or one traversal routine to which we pass another routine which tests a value to see if it meets the required criterion.

Make an existing BST empty

The only difference between this and the class destructor is that this member function gets called explicitly by the class client, while the destructor does its work silently when the class object goes out of scope.

Delete a value from a BST

Implementation of this operation will require some care. At the highest level, there are two steps:

Find the location of the item to be deleted
Perform the actual deletion

Let us assume the existence of a find (or "retrieve") operation (see below) and then make use of that operation as a "helper" here. As for the actual deletion, there are three cases to consider, depending on the status of the node to be deleted:

If it's a leaf (i.e., it has no children), the deletion simply involves setting the appropriate link of its parent to nullptr and then disposing of the node itself.
If it has only one child, then we have to make the pointer from the parent of the node we intend to delete "skip" the deleted node and point at its child, and then dispose of the deleted node.
If it has two children, this is the most complicated case, and we choose to handle it as described in the paragraph below.

What element could we replace the deleted node with that would maintain the BST property? Either the logical predecessor or the logical successor will do, but we choose to go with the predecessor. In fact, rather than actually delete the given node, we copy the value of its predecessor into the node to be deleted, and then delete the predecessor node. The reason this makes sense is because of where the predecessor node is located. [The predecessor of a node is the largest value in the node's left subtree, which is the node "furthest to the right" in that subtree.]

Process all values in a BST

For operations such as these (squaring all integers, say, or converting all letters to upper case), we also need to have a way of traversing the entire tree. The question that needs to be answered is analogous to the one above: Should we have a separate traversal for each kind of change, or one traversal routine to which we pass another routine which determines what is to be done to each value during the traversal.

Report tree statistics of various kinds

Find the size of a BST

if the tree is empty
    return 0
else
    return 1 + left-subtree count + right-subtree count

Count the leaves in a BST

if the tree is empty
    return 0
else if both subtrees are empty
    return 1
else
    return left-subtree count + right-subtree count

Find the height (or depth) of a BST
```
if the tree is empty
    return -1
else
    return 1 + max(left-subtree depth, right-subtree depth)
```
This operation to find the height of the tree (or, equivalently, the depth of the tree, since both are the maximum length of any path from the root to a leaf) is a little more interesting. Note, in this connection, that the height of any node is one more than the maximum of the heights of the two subtrees of that node. In this case you will find it convenient to give an empty tree a "height" of -1, so that a tree with one node in it will have a height of 0.
This is a case in which, when writing the algorithm, it seems better to write the general part of the recursion first, and deduce from it what you want the stopping condition to be, since if you start by trying to deduce the stopping condition there may be a tendency to leap to the conclusion that the value should be zero, which is not what you want. This is a bit unusual, since writing recursive routines more often proceeds the other way.

Balance a BST

As we know, a binary search tree may or may not be balanced, and in the worst case may be nothing more than a linked list with each node having an extra (redundant) link in it. However, it is always possible to convert a binary search tree that is not balanced into one that is. There are many different schemes for doing this, some of them very complicated and quite difficult to implement correctly.

At the highest level, as you know, the reason we want "balance" or "bushiness" in our binary search trees is to guarantee O(log n) search times. We present one simple (but not necessarily efficient) algorithm for achieving balance. Some others are mentioned briefly in the following section.

Here is some pseudocode for building a balanced binary search tree:

Place all values into a vector
Sort values in the vector
Insert middle value from vector into BST as root (at level 0)
Insert middle values of two remaining subvectors as
   children of the level 0 node (root)
Insert the middle values of the four remaining subvectors
   as children of the level 1 nodes
Continue until all values have been inserted

Note that although this pseudocode is not stated in "standard" recursive form, you should see that it is clearly recursive and is best implemented that way.

Note that this algorithm involves placing all the values into an auxiliary container and performing a sort on them before beginning to build the BST. Once the tree is built, though, we have a balanced BST and can be assured of O(log n) searches, as long as there are no deletions from, or additional insertions into, the tree.

But suppose that, in addition to searching, we are also inserting into and deleting from, the tree. Then we can expect, over time, that the shape of the tree might well become "unbalanced" to a greater or lesser degree, which might, in turn, result in degradation in search times. How can we remedy this problem? That is, how can we re-balance a BST if necessary?

Well, it is, after all, a BST. So, at any time we can delete all the values and place them back into a vector. If we do this via an inorder traversal, the keys will automatically be in ascending order. Thus we won't have to sort, but we can then use the rest of the above algorithm to build a new BST which will again be balanced.

Binary Tree Rotations

Note that the above balancing method works "globally", i.e., it takes the entire tree and (potentially at least) rearranges all the values in it, in order to achieve balance. There are, of course, other methods that work this way as well. Still other methods work "locally" to maintain the balance in an already balanced tree when a new node is added or deleted. Many of these methods involve performing what are called "rotations" to reposition one or more nodes within the tree while maintaining the binary search tree property and also maintaining or improving the tree balance. Here is a Wikipedia discussion.

Here are the major features of binary search tree rotations:

There are essentially four kinds of rotations, each of which is said to be performed "around a node":
1. A simple left rotation, in which the node is moved to the position of its left child, and the node's right child is moved to where the node itself was; also the left subtree of the node's right child becomes the right subtree of the node
2. A simple right rotation, in which the node is moved to the position of its right child, and the node's left child is moved to where the node itself was; also the right subtree of the node's left child becomes the left subtree of the node
3. A double left rotation, which consists of a simple left rotation around the left child of the node, followed by a right rotation around the node itself
4. A double right rotation, which consists of a simple right rotation around the right child of the node, followed by a left rotation around the node itself
Note that left and right rotations are symmetric operations.
Nodes not contained within the subtree of the node around which the rotation is performed are unaffected by the rotation.
Any rotation maintains the binary search tree property.
Any rotation takes only constant time.

Other balancing methods for BSTs

Some of the other tree-balancing (or, more generally, tree restructuring) schemes that you might see references to, and which we list here just so you'll know what the names refer to, if and when you do encounter them, include:

AVL trees (devised by two Russians, G. M. Adel'son-Vel'skii and E. M. Landis)
An AVL tree is a BST which is empty or which satisfies the following two properties:

The heights of the left and right subtrees of the root differ by at most one.
The left and right subtrees are again AVL trees.

The algorithm for balancing AVL trees works locally, i.e., by performing rotations to maintain the AVL property as nodes are added or deleted. Although the algorithm is historically important, to quote the text by Mark Weiss, which was used for the course several years ago, "Better balanced search trees have since been discovered, so implementing an AVL tree is not worthwhile."

Red-black trees
A red-black tree is a BST which is empty, or in which each node is colored red or black, the root is black, and the following two properties are satisfied:

Black condition The number of black nodes must be the same on all paths from the root to a node with one child, or with no children.
Red condition If a node is red, its parent exists and is black.

Both red-black trees and AVL trees provide O(log n) searching, but red-black trees tend to be faster. As a matter of interest, the STL associative containers are generally implemented using red-black trees.

Splay trees
A splay tree is one kind of self-organizing (or self-restructuring) tree. The idea in a self-organizing tree is that not all elements are accessed with the same frequency. For example, if an element on the twenty-fifth level of a tree is used only infrequently, then the execution of the entire program is not severely impaired by accessing this level. On the other hand, if the same element is being continually accessed, it makes considerable difference whether that element is on the twenty-fifth level or close to the root. So, the strategy in self-adjusting trees is to restructure the tree only by moving up the tree those elements that are used more often, creating a kind of "priority tree". The frequency of node access can be determined in various ways (putting a counter field in each node, for example).

A very interesting thing about splay trees is this: Even though they can become highly unbalanced, so that a single access to some particular node can be very expensive, one can still show that, over a long sequence of accesses, they are not at all expensive, and in fact they are guaranteed to be similar in performance to AVL trees. To show this we would need to use something called amortized algorithm analysis, which is analogous to those insurance calculations in which a few expensive cases are averaged with in with many less expensive cases to obtain overall excellent performance (in this case, over a long sequence of operations on the BST). Also, remember that the STL "amortizes" the amount of work done in reallocating storage space for a full vector that wants to add an element by getting more space than is actually needed (in case more elements need to be added later).