General Reference Material | Notes on Binary Search Trees

Notes on Binary Search Trees

What is a binary search tree (BST)?

A binary search tree is a binary tree in which each node contains a key, all keys are unique (for simplicity) and for which the following criterion is satisfied:

For each node in the tree, all keys in the nodes of its left subtree must be less than the key in the node, and all keys in the nodes of its right subtree must be greater than the key in the node.

This is called the binary search tree property, and is a powerful and useful property for a tree to have. For example, it turns out (and this should be clear if you think about the property for a moment), that an inorder traversal of a binary search tree visits the nodes in ascending key order.

What kind of structure can be used to store a binary search tree?

Here is one possibility for a suitable structure that we can use for a "nested" class in Java, if we limit our binary search trees to contain characters (i.e., our binary search tree class will not be generic):

private static class TreeNode
{
    private char data;      //The data in this node.
    private TreeNode left;  //Reference to left subtree.
    private TreeNode right; //Reference to right subtree.

    /**
    * Constructor.
    * Creates a node containing the input character as data.
    * References to the left and right subnodes will be null by default.
    */
    public TreeNode(char c)
    {
        data = c;
    }
}

Notes on Binary Search Tree Implementation

Rather than go to the trouble of creating an entire BinarySearchTree class, our demo program will just start with an empty tree (a private static root node variable set to null) and use whatever methods are required to build a tree on that root and perform the necessary tasks.

Insert a value into a BST

In implementing this operation we will

always insert a new node as a leaf
maintain the "BST property" as we insert each new node
not be concerned about the shape of the tree (i.e., we will not attempt to achieve or maintain a "balanced" or "bushy" tree as we insert each new node)

This method can be implemented recursively, like most of the others we want for our tree, but let's go with an iterative version, for which we have the following pseudocode:

if tree is empty
    Create a new node at the root, put the data into it and return
while (true) //A potentially infinite loop but not really since you're
             //going to look down the tree and must come to the end
    if newItem < data in the current node
        if left child is null
            Create a new node in the left child, put the data into it and return
        else
            Move down to the root of the left subtree
    else //newItem > data in the current node (equality not possible)
        if right child is null
            Create a new node in the right child, put the data into it and return
        else
            Move down to the root of the right subtree

Write all values in the binary search tree to the standard output

There are several things to consider in implementing this operation:
Preorder, inorder and postorder we know all about, but what about "levelorder, left to right"? Here's an idea for implementing that option:

Rather than recursion, use an iterative approach that employs a queue as an auxiliary structure and whose pseudocode goes something like this:

Put the root node of the tree into the queue
while the queue is not empty
    Remove the first item from the queue and display its value
    Put left child of node just removed from queue into queue
    Put right child of node just removed from queue into queue

Find a value in a BST (or check if a BST "contains" a given value)

Because in our (simplified) BST there is nothing in a node except the key value, a simple search operation like this is not of much practical use (for retrieving additional information associated with a key, for example), and serves only as a way of determining whether a particular value is, or is not, in the tree. As usual, the actual "retrieving" will be done by an auxiliary helper function. Here is some pseudocode for this function, which, you will note, bears a marked resemblance to the binary search algorithm):

if this node is empty
    found is false
else if the value sought is < value in this node
    Retrieve the value sought from the left subtree
else if the value sought is > value in this node
    Retrieve the value from the right subtree
else
    found is true

Report tree statistics of various kinds

Find the size of a BST

if the tree is empty
    return 0
else
    return 1 + left-subtree count + right-subtree count

Find the height (or depth) of a BST
```
if the tree is empty
    return -1
else
    return 1 + max(left-subtree depth, right-subtree depth)
```
This operation to find the height of the tree (or, equivalently, the depth of the tree, since both are the maximum length of any path from the root to a leaf) is a little more interesting. Note, in this connection, that the height of any node is one more than the maximum of the heights of the two subtrees of that node. In this case you will find it convenient to give an empty tree a "height" of -1, so that a tree with one node in it will have a height of 0.
This is a case in which, when writing the algorithm, it seems better to write the general part of the recursion first, and deduce from it what you want the stopping condition to be, since if you start by trying to deduce the stopping condition there may be a tendency to leap to the conclusion that the value should be zero, which is not what you want. This is a bit unusual, since writing recursive routines more often proceeds the other way.

Balance a BST

As we know, a binary search tree may or may not be balanced, and in the worst case may be nothing more than a linked list with each node having an extra (redundant) link in it. However, it is always possible to convert a binary search tree that is not balanced into one that is. There are many different schemes for doing this, some of them very complicated and quite difficult to implement correctly.

At the highest level, as you know, the reason we want "balance" or "bushiness" in our binary search trees is to guarantee O(log n) search times. We present one simple (but not necessarily efficient) algorithm for achieving balance. Some others are mentioned briefly in the following section.

Here is some pseudocode for building a balanced binary search tree:

Place all values into a linear structure
Sort values in the linear structure
Insert middle value from linear structure into BST as root (at level 0)
Insert middle values of two remaining linear sub-structures as
   children of the level 0 node (root)
Insert the middle values of the four remaining sub-structures
   as children of the level 1 nodes
Continue until all values have been inserted

Note that although this pseudocode is not stated in "standard" recursive form, you should see that it is clearly recursive and is best implemented that way.

Note that this algorithm involves placing all the values into an auxiliary container and performing a sort on them before beginning to build the BST. Once the tree is built, though, we have a balanced BST and can be assured of O(log n) searches, as long as there are no deletions from, or additional insertions into, the tree.

But suppose that, in addition to searching, we are also inserting into and deleting from, the tree. Then we can expect, over time, that the shape of the tree might well become "unbalanced" to a greater or lesser degree, which might, in turn, result in degradation in search times. How can we remedy this problem? That is, how can we re-balance a BST if necessary?

Well, it is, after all, a BST. So, at any time we can delete all the values and place them back into a vector. If we do this via an inorder traversal, the keys will automatically be in ascending order. Thus we won't have to sort, but we can then use the rest of the above algorithm to build a new BST which will again be balanced.