Building A Binary Search Tree Template Class
A binary search tree is a binary tree in which each node contains a key, all keys are unique (for simplicity) and for which the following criterion is satisfied: For each node in the tree, all keys in the nodes of its left subtree must be less than the key in the node, and all keys in the nodes of its right subtree must be greater than the key in the node. This is called the binary search tree property, and is a powerful and useful property for a tree to have. For example, it turns out (and this should be clear if you think about the property for a moment), that an inorder traversal of a binary search tree visits the nodes in ascending key order.
All we need is a "garden variety" node with one field for data, which must be able to serve as a key (i.e., the data must be of a type for which operator< is defined), and the usual two fields for pointers to the roots of the left and right subtrees, respectively. That is, we want a structure like
struct TreeNode
{
    TreeNode* left;
    DataType data;
    TreeNode* right;
};

in which DataType represents whatever kind of data we wish to store in the tree. Actually, because we want to have a template class, we really want this:
template <typename DataType>
struct TreeNode
{
    TreeNode* left;
    DataType data;
    TreeNode* right;
};
First of all, we can use an underlying structure like that shown above as the basis for constructing a Binary Search Tree template class. But we should not overlook the fact that there may well be other structures we could also use; it's just that the above structure is kind of a "natural" choice.
Because we want to practice information hiding, and therefore don't want to reveal in our ADT interface the fact that we are indeed using the above structure to store each node in the tree, we shall have to make use of "auxiliary" or "helper" functions behind the scenes to do the actual work of dealing with the nodes of the tree.
Our template class will have as its single private data member a pointer to the root node of a tree whose nodes are of the above kind. Here are some of the operations that we might wish to implement, and some notes on each:
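Before looking at the individual operations, it may help to see how the pieces fit together. The following is a minimal sketch of what such a template class might look like; the class name BSTree and the particular member names are illustrative choices, not prescribed by these notes:

```cpp
// Sketch of a possible interface; names are illustrative only.
template <typename DataType>
class BSTree {
public:
    BSTree() : root(nullptr) {}                  // start with an empty tree
    bool isEmpty() const { return root == nullptr; }
    void insert(const DataType& item);           // delegates to a helper
    // ... traversals, retrieve, remove, counts, height, ...
private:
    struct TreeNode {                            // hidden from the user
        TreeNode* left;
        DataType data;
        TreeNode* right;
    };
    TreeNode* root;                              // the single private data member
};
```

Note that the TreeNode structure is nested inside the private section, which is one way of practicing the information hiding mentioned above.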
The default constructor simply sets the root pointer to nullptr, giving an empty tree, and a test for emptiness need only compare the root pointer with nullptr.

insert()
The body of this function will just be a call to another (recursive) helper function that does the actual insertion into the tree. This helper function is, of course, not a member function, unlike the insert() function itself. Here is some pseudocode for this helper function, which will require a node pointer and the item to be inserted in its parameter list:
if this node is empty
    put the value into this node
else if the value is < the value in this node
    put the value into the left subtree of this node
else if the value is > the value in this node
    put the value into the right subtree of this node
We should also observe at this point that the insert() operation that is used to populate an entire tree is the same one that is used to insert one additional item at a time, should the user decide to do this.
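The insertion helper above can be sketched in C++ as follows. This is a sketch only, using int keys for brevity rather than the DataType template parameter; passing the node pointer by reference lets the helper attach a new node directly to the tree:

```cpp
// A minimal sketch using int keys; the template version is analogous.
struct Node {
    Node* left;
    int data;
    Node* right;
};

// Recursive helper: a free function, not a member function. Taking the
// pointer by reference lets us attach the new node directly to the tree.
void insertHelper(Node*& node, int value) {
    if (node == nullptr)                          // empty spot: put the value here
        node = new Node{nullptr, value, nullptr};
    else if (value < node->data)                  // smaller keys go left
        insertHelper(node->left, value);
    else if (value > node->data)                  // larger keys go right
        insertHelper(node->right, value);
    // equal keys are ignored, since keys are assumed to be unique
}
```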
Preorder, inorder and postorder we know all about, but what about "level-order, left to right"? Here's an idea for implementing that option: Rather than recursion, use an iterative approach that employs a queue as an auxiliary structure and whose pseudocode goes something like this:
Put the root node of the tree into the queue
while the queue is not empty
    Remove the first item from the queue and display its value
    Put the left child of the node just removed from the queue into the queue
    Put the right child of the node just removed from the queue into the queue
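This queue-based idea can be sketched in C++ with std::queue. Again this is illustrative only, using int keys and returning the visited values as a string rather than printing them, so the result is easy to inspect:

```cpp
#include <queue>
#include <sstream>
#include <string>

struct Node {
    Node* left;
    int data;
    Node* right;
};

// Iterative level-order (left-to-right) traversal using std::queue as
// the auxiliary structure, following the pseudocode above.
std::string levelOrder(const Node* root) {
    std::ostringstream out;
    std::queue<const Node*> q;
    if (root != nullptr) q.push(root);       // start with the root node
    while (!q.empty()) {
        const Node* n = q.front();           // remove the first item ...
        q.pop();
        out << n->data << ' ';               // ... and record its value
        if (n->left)  q.push(n->left);       // enqueue left child, if any
        if (n->right) q.push(n->right);      // enqueue right child, if any
    }
    return out.str();
}
```

Note that no recursion is involved; the queue remembers which nodes are still waiting to be visited, one level at a time.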
And perhaps the most useful kind of display of all would be one that actually shows some kind of "tree shape", so that the user can actually see what the tree looks like, where each node containing a data value is located, how the nodes are positioned relative to one another, and how the shape and size of the tree varies over time as various operations are performed on it.
if this node is empty
    found is false
else if the value sought is < the value in this node
    retrieve the value sought from the left subtree
else if the value sought is > the value in this node
    retrieve the value sought from the right subtree
else
    found is true
    copy the value found in the tree into the parameter

We may want to implement other useful search operations (for the maximum and/or minimum value in a BST, for example). To do so, we need to have some way of traversing the entire tree while examining the values to see if they satisfy a particular criterion. The question that needs to be answered is this: Should we have a separate traversal for each criterion, or one traversal routine to which we pass another routine which tests a value to see if it meets the required criterion?
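The retrieval pseudocode translates into a recursive helper like the following sketch, again with int keys for brevity; in the template version the item parameter would be a DataType& that receives the stored value:

```cpp
struct Node {
    Node* left;
    int data;
    Node* right;
};

// Recursive retrieve helper, following the pseudocode above.
void retrieveHelper(const Node* node, int sought, int& item, bool& found) {
    if (node == nullptr)
        found = false;                            // empty subtree: not here
    else if (sought < node->data)
        retrieveHelper(node->left, sought, item, found);
    else if (sought > node->data)
        retrieveHelper(node->right, sought, item, found);
    else {
        found = true;                             // match
        item = node->data;                        // copy the value found
    }
}
```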
Let us assume the existence of a find (or "retrieve") operation (see below) and then make use of that operation as a "helper" here. As for the actual deletion, there are three cases to consider, depending on the status of the node to be deleted:
If the node to be deleted is a leaf, we can simply set the appropriate pointer in its parent to nullptr and then dispose of the node itself.

If the node has exactly one child, we can make the parent's pointer bypass the node and point directly to that child, and then dispose of the node itself.

If the node has two children, what element could we replace the deleted node with that would maintain the BST property? Either the logical predecessor or the logical successor will do, but we choose to go with the predecessor. In fact, rather than actually delete the given node, we copy the value of its predecessor into the node to be deleted, and then delete the predecessor node. The reason this makes sense is because of where the predecessor node is located. [The predecessor of a node is the largest value in the node's left subtree, which is the node "furthest to the right" in that subtree.]
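The deletion scheme can be sketched as a recursive helper like the one below, using int keys for brevity. The two-child case copies the value of the predecessor (the rightmost node of the left subtree) into the node, and then deletes the predecessor node instead:

```cpp
struct Node {
    Node* left;
    int data;
    Node* right;
};

// Recursive removal helper -- a sketch of the predecessor-copy scheme.
void removeHelper(Node*& node, int value) {
    if (node == nullptr) return;                  // value not in the tree
    if (value < node->data)
        removeHelper(node->left, value);
    else if (value > node->data)
        removeHelper(node->right, value);
    else if (node->left != nullptr && node->right != nullptr) {
        Node* pred = node->left;                  // find the predecessor:
        while (pred->right != nullptr)            // the rightmost node of
            pred = pred->right;                   // the left subtree
        node->data = pred->data;                  // copy its value up, then
        removeHelper(node->left, pred->data);     // delete the predecessor
    } else {                                      // zero or one child:
        Node* old = node;
        node = (node->left != nullptr) ? node->left : node->right;
        delete old;                               // bypass and dispose
    }
}
```

Note that the predecessor node itself has at most one child (it has no right child, by definition), so the recursive call to delete it always falls into the easy case.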
if the tree is empty
    return 0
else
    return 1 + left-subtree count + right-subtree count
if the tree is empty
    return 0
else if both subtrees are empty
    return 1
else
    return left-subtree count + right-subtree count
if the tree is empty
    return -1
else
    return 1 + max(left-subtree depth, right-subtree depth)

This operation to find the height of the tree (or, equivalently, the depth of the tree, since both are the maximum length of any path from the root to a leaf) is a little more interesting. Note, in this connection, that the height of any node is one more than the maximum of the heights of the two subtrees of that node. In this case you will find it convenient to give an empty tree a "height" of -1, so that a tree with one node in it will have a height of 0.
This is a case in which, when writing the algorithm, it seems better to write the general part of the recursion first and deduce from it what the stopping condition should be. If you start by trying to guess the stopping condition, there may be a tendency to leap to the conclusion that the value should be zero, which is not what you want. This is a bit unusual, since writing recursive routines more often proceeds the other way.
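The three counting recursions above can be sketched together in C++ as follows, once again with int keys for brevity:

```cpp
#include <algorithm>   // for std::max

struct Node {
    Node* left;
    int data;
    Node* right;
};

// Total number of nodes in the tree.
int nodeCount(const Node* t) {
    if (t == nullptr) return 0;
    return 1 + nodeCount(t->left) + nodeCount(t->right);
}

// Number of leaf nodes (nodes with two empty subtrees).
int leafCount(const Node* t) {
    if (t == nullptr) return 0;
    if (t->left == nullptr && t->right == nullptr) return 1;
    return leafCount(t->left) + leafCount(t->right);
}

// Height: an empty tree has height -1, a single node has height 0.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}
```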
As we know, a binary search tree may or may not be balanced, and in the worst case may be nothing more than a linked list with each node having an extra (redundant) link in it. However, it is always possible to convert a binary search tree that is not balanced into one that is. There are many different schemes for doing this, some of them very complicated and quite difficult to implement correctly.
At the highest level, as you know, the reason we want "balance" or "bushiness" in our binary search trees is to guarantee O(log n) search times. We present one simple (but not necessarily efficient) algorithm for achieving balance. Some others are mentioned briefly in the following section.
Here is some pseudocode for building a balanced binary search tree:
Place all values into a vector
Sort the values in the vector
Insert the middle value from the vector into the BST as the root (at level 0)
Insert the middle values of the two remaining subvectors as children of the level 0 node (the root)
Insert the middle values of the four remaining subvectors as children of the level 1 nodes
Continue until all values have been inserted
Note that although this pseudocode is not stated in "standard" recursive form, you should see that it is clearly recursive and is best implemented that way.
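Here is one way the recursive form might be sketched in C++: insert the middle element of the current subrange first, then recurse on the two halves. As before, int keys are used for brevity, and the function names are illustrative:

```cpp
#include <algorithm>
#include <vector>

struct Node {
    Node* left;
    int data;
    Node* right;
};

void insertHelper(Node*& node, int value) {        // ordinary BST insert
    if (node == nullptr)
        node = new Node{nullptr, value, nullptr};
    else if (value < node->data)
        insertHelper(node->left, value);
    else if (value > node->data)
        insertHelper(node->right, value);
}

// Insert the middle value of v[lo..hi] first, then recurse on the two
// remaining halves -- the recursive form of the algorithm above.
void insertMiddleFirst(Node*& root, const std::vector<int>& v, int lo, int hi) {
    if (lo > hi) return;                           // empty subrange: done
    int mid = lo + (hi - lo) / 2;
    insertHelper(root, v[mid]);                    // middle value goes in first
    insertMiddleFirst(root, v, lo, mid - 1);       // then the left half
    insertMiddleFirst(root, v, mid + 1, hi);       // and the right half
}

Node* buildBalanced(std::vector<int> values) {
    std::sort(values.begin(), values.end());       // sort the vector first
    Node* root = nullptr;
    insertMiddleFirst(root, values, 0, (int)values.size() - 1);
    return root;
}
```

Because each middle value is inserted before anything from its two halves, every value ends up on the "correct" side, and the resulting tree is as balanced as the ordinary insert routine can make it.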
Note that this algorithm involves placing all the values into an auxiliary container and performing a sort on them before beginning to build the BST. Once the tree is built, though, we have a balanced BST and can be assured of O(log n) searches, as long as there are no deletions from, or additional insertions into, the tree.
But suppose that, in addition to searching, we are also inserting into and deleting from, the tree. Then we can expect, over time, that the shape of the tree might well become "unbalanced" to a greater or lesser degree, which might, in turn, result in degradation in search times. How can we remedy this problem? That is, how can we re-balance a BST if necessary?
Well, it is, after all, a BST. So, at any time we can delete all the values and place them back into a vector. If we do this via an inorder traversal, the keys will automatically be in ascending order. Thus we won't have to sort, but we can then use the rest of the above algorithm to build a new BST which will again be balanced.
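The traversal step can be sketched as follows; because of the BST property, the vector comes out already sorted, so it can feed the middle-value-first rebuilding described above with no extra sort (int keys for brevity, as before):

```cpp
#include <vector>

struct Node {
    Node* left;
    int data;
    Node* right;
};

// Collect the keys of a BST into a vector via an inorder traversal.
// The BST property guarantees the vector is in ascending order.
void inorderCollect(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorderCollect(t->left, out);    // left subtree: all smaller keys
    out.push_back(t->data);          // then this node's key
    inorderCollect(t->right, out);   // right subtree: all larger keys
}
```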
Note that the above balancing method works "globally", i.e., it takes the entire tree and (potentially at least) rearranges all the values in it, in order to achieve balance. There are, of course, other methods that work this way as well. Still other methods work "locally" to maintain the balance in an already balanced tree when a new node is added or deleted. Many of these methods involve performing what are called "rotations" to reposition one or more nodes within the tree while maintaining the binary search tree property and also maintaining or improving the tree balance. (See the Wikipedia article on tree rotation for further discussion.)
Here are the major features of binary search tree rotations:

A rotation is a local restructuring that pivots the link between a node and one of its children, and it comes in two mirror-image forms (a "left rotation" and a "right rotation").
A rotation preserves the binary search tree property.
A rotation changes the heights of the subtrees involved, which is what makes rotations useful for rebalancing.
A single rotation requires only constant time, since just a few pointers need to be redirected.
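A right rotation, for example, can be sketched in just a few lines. This is an illustrative sketch with int keys: the node's left child is promoted to the node's position, and the node becomes that child's right child, with one subtree re-attached to keep the BST ordering intact:

```cpp
struct Node {
    Node* left;
    int data;
    Node* right;
};

// Right rotation about the subtree rooted at t. The left child becomes
// the new subtree root; the BST ordering is preserved throughout.
void rotateRight(Node*& t) {
    Node* pivot = t->left;        // the left child moves up ...
    t->left = pivot->right;       // ... its right subtree is re-attached ...
    pivot->right = t;             // ... and the old root moves down right
    t = pivot;                    // pivot is the new subtree root
}
```

A left rotation is the mirror image, with the roles of left and right exchanged.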
Some of the other tree-balancing (or, more generally, tree restructuring) schemes that you might see references to, and which we list here just so you'll know what the names refer to, if and when you do encounter them, include:
AVL trees (devised by two Russians, G. M. Adel'son-Vel'skii and E. M. Landis)
An AVL tree is a BST which is empty or which satisfies the following two properties: the heights of its left and right subtrees differ by at most one, and both of its subtrees are themselves AVL trees.
The algorithm for balancing AVL trees works locally, i.e., by performing rotations to maintain the AVL property as nodes are added or deleted. Although the algorithm is historically important, to quote the text by Mark Weiss, which was used for the course several years ago, "Better balanced search trees have since been discovered, so implementing an AVL tree is not worthwhile."
Red-black trees
A red-black tree is a BST which is empty, or in which each node is colored red or black, the root is black, and the following two properties are satisfied: a red node can have only black children, and every path from the root down to an empty subtree contains the same number of black nodes.
Both red-black trees and AVL trees provide O(log n) searching, but red-black trees tend to be faster. As a matter of interest, the STL associative containers are generally implemented using red-black trees.
Splay trees
A splay tree is one kind of self-organizing (or self-restructuring) tree. The idea in a self-organizing tree is that not all elements are accessed with the same frequency. For example, if an element on the twenty-fifth level of a tree is used only infrequently, then the execution of the entire program is not severely impaired by accessing this level. On the other hand, if the same element is being continually accessed, it makes a considerable difference whether that element is on the twenty-fifth level or close to the root. So, the strategy in self-adjusting trees is to restructure the tree only by moving up the tree those elements that are used more often, creating a kind of "priority tree". The frequency of node access can be determined in various ways (putting a counter field in each node, for example).
A very interesting thing about splay trees is this: Even though they can become highly unbalanced, so that a single access to some particular node can be very expensive, one can still show that, over a long sequence of accesses, they are not at all expensive, and in fact they are guaranteed to be similar in performance to AVL trees. To show this we would need to use something called amortized algorithm analysis, which is analogous to those insurance calculations in which a few expensive cases are averaged in with many less expensive cases to obtain overall excellent performance (in this case, over a long sequence of operations on the BST). Also, remember that the STL "amortizes" the amount of work done in reallocating storage space for a full vector that wants to add an element by getting more space than is actually needed (in case more elements need to be added later).