General Reference Material | Binary Expression Trees

Binary Arithmetic Expressions and Expression Trees

The discussion on this page is driven by the considerations necessary for the development of a command-line calculator that is capable of reading in and evaluating any fully-parenthesized expression with positive-integer-only operands.

What is a binary expression tree?

A binary expression tree is a binary tree that stores a binary expression (such as an arithmetic expression, but not necessarily an arithmetic expression) in such a way that each leaf node contains an operand of the expression, and each interior node contains an operator of the expression.

What kind of structure can be used to store a binary expression tree?

The first problem that arises in this context is the fact that although we only have one tree, the kind of data is not the same in each node of that tree. This is because the leaves contain operands, while the internal nodes contain operators. This is a classic situation in which a union can be used, since a tree node which is a union may, for example, hold just an operand (i.e., an integer value), while another union (of the same type) may hold an operator and two pointers to two other child nodes.

Here is a suitable struct type for the nodes of a binary expression tree. The node type contains a field which is of union type. The enum type is convenient for labelling what kind of data is in the tree node.

enum TagType {INT_ONLY, SUB_NODE};
struct Node
{
    TagType tag;
    union
    {
        int intValue;
        struct
        {
            Node* left;
            char op;
            Node* right;
        };
    };
};

You should draw some pictures of expression trees with each node being an appropriate thing of the above type.

How do we get a binary expression into a binary expression tree?

The first question here is, "From where?", and we answer it by supposing that we are reading the expression in as a sequence of command-line parameters, and we make the following further assumptions, each of which is designed to simplify the details, while still requiring the essential analysis:

First, it will not matter whether any, or how much, whitespace separates the operands, operators and/or parentheses of the expression on the command line.
Second, the expression will either be a single positive integer value, or will be fully parenthesized. More explicitly, and more helpfully, this means we shall only require our calculator to be able to deal with expressions that may be defined recursively as follows: A valid expression is either
- A positive integer value
- An expression of the form (a op b), where a and b are both positive integer values (or one or both or them are again of the form (a op b)), and op is one of +, -, * or /.

Given that we are able to read such a valid expression from a stream (the keyboard, a file stream, or a string stream, for example), we may use the following pseudocode for a recursive input-and-build routine that is given a node and attempts to find something to place into it, and in so doing reads the entire expression and builds the corresponding tree:

Algorithm expressionFound
-------------------------
Input: the stream from which an expression will be read
       an empty tree node to hold the expression
Output: if the read is successful, tree node contains either a positive integer,
         or valid sub-tree information (operator plus two pointers)
Return value: true if successful, false otherwise
---------------------------------------------------------------
if a positive integer is read
    Put that integer into the node
else if a left parenthesis is read then
    if expressionFound finds another valid expression then
        if a valid operator is read then
            if expressionFound finds another valid expression then
                if a right parenthesis is read
                    Put the expression into the node

How do we evaluate the expression in a binary expression tree?

The first thing to note in this context is that the value of the expression is completely determined by the structure of the tree. That is, given the binary expression tree, we can evaluate the expression it contains without having to worry about any precedence rules, or where parentheses might have been located when the expression was written in the usual infix form that we use for such expressions.

So, suppose a valid binary expression, as defined above, has been loaded into its corresponding valid binary expression tree, using the routine whose pseudocode is given above. Then the recursive definition of such a tree also provides us with a convenient recursive algorithm for the evaluation of the expression contained in the tree. The "value" of any "node" in the tree is interpreted to be

The value of the integer in the node, if the node does contain an integer.
Otherwise, it is the value of the node which is the left child of the given node, combined with the value of the node which is the right child of the given node (in that order) using the operator that is in the given node.

This interpretation gives us this pseudocode for the evaluation routine, valueOf, say,

Algorithm valueOf(someNode)
if someNode contains an integer
    return the value of that integer
else
    return valueOf(someNode's left Child) op valueOf(someNode's right child)
endif

from which it follows that the value of the complete binary expression contained in a binary expression tree will be valueOf(root), where "root" is the root node of that tree.

Three ways to write and evaluate binary expressions

There are three common forms in that one may use to write and evaluate binary arithmetic (or other binary) expressions, including

infix form, in which an operator comes between its two operands (which, of course, is the "usual form" that we use to write most of our arithmetic expressions on a daily basis)
prefix form, in which an operator comes before its two operands
postfix form, in which an operator comes after its two operands

Here are some examples of each:

Infix Form	Fully Parenthesized Infix Form	Prefix Form	Postfix Form	Value
`a / b`	`(a / b)`	`/ a b`	`a b /`
`a + b - c`	`((a + b) - c)`	`- + a b c`	`a b + c -`
`a + b / c`	`(a + (b / c))`	`+ a / b c`	`a b c / +`
`(a + b) / c`	`((a + b) / c)`	`/ + a b c`	`a b + c /`
`(6 + 2) * 3 / (9 - 7)`	`(((6 + 2) * 3) / (9 - 7))`	`/ * + 6 2 3 - 9 7`	`6 2 + 3 * 9 7 - /`	`12`
`2 * 3 + 15 / (5 + 7)`	`((2 * 3) + (15 / (5 + 7)))`	`+ * 2 3 / 15 + 5 7`	`2 3 * 15 5 7 + / +`	`7`

Let's look at how we evaluate expressions in each of these forms.

First, we know, or we should know, how to evaluate expressions in infix form, since this is the form we use in our everyday lives (if our everyday lives involve expression evaluation).

Here is the pseudocode for a simple recursive algorithm which evaluates an expression in prefix form:

Algorithm valueOfPrefixExpression(prefixExpression)
-------------------------------------------------
Input: a valid positive-integer arithmetic expression in prefix form
Return value: the value of the prefix expression
--------------------------------------------------------------------
if next token is an integer
    return the integer
else
    Read an operator, say op
    firstOperand gets valueOfPrefixExpression(remainingExpression)
    secondOperand gets valueOfPrefixExpression(remainingExpression)
    return firstOperand op secondOperand
endif

Finally, here is the pseudocode for a simple algorithm which reads and finds the value of an expression in postfix form. It makes use of an "auxiliary stack".

Algorithm ReadAndReturnValueOf(postfixExpression)
-------------------------------------------------
Input: a valid positive-integer arithmetic expression in postfix form
Return value: the value of the postfix expression
---------------------------------------------------------------------
while more tokens (operands or operators) to read
    Read next token
    if token is an operand
        Push token onto stack
    else if token is an operator ("op", say)
        Pop stack and put value in temp2
        Pop stack and put value in temp1
        Push the value of temp1 op temp2 back onto stack
    endif
endwhile
Pop value from stack and return it

It is very useful and informative to draw a picture of this process, with time proceeding from left to right and a picture of the stack and its contents after each token has been read and processed. See here for an example of such a diagram.

The prefix and postfix forms of any expression may be evaluated unambiguously, unlike the infix form, which needs (or at least, may need) additional input in the form of a precedence table, and/or one or more pairs of parentheses, to be properly evaluated.

Relationships between binary expression trees and the prefix, postfix and infix forms

The relationship between the binary expression tree form of a binary expression and these other forms may be summarized as follows:

Performing an inorder traversal of a binary expression tree and recording the node contents produces the "infix form" of the expression stored in the tree.
Performing a preorder traversal of a binary expression tree and recording the node contents produces the "prefix form" of the expression stored in the tree.
Performing a postorder traversal of a binary expression tree and recording the node contents produces the "postfix form" of the expression stored in the tree.

An obvious question seems to arise at this point: Given an expression in one of these three forms, how can we find either of the other two forms which are equivalent to this first one? This can be done, of course, but a full investigation would take us too far afield.

Additional notes on implementing a calculator for fully-parenthesized arithmetic expressions

Let's suppose we want to implement a simple, four-function calculator capable of reading in, from the command-line, an arbitrarily complex, but fully parenthesized, arithmetic expression and then computing the value of that expression and displaying both the expression and its value on the following line. No matter how much or how little whitespace appears in the input, the display of the expression must be consistently formatted, and followed by an equal sign with a space on either side, and finally by the value of the expression. This is to be the output if the input expression is valid and can thus be evaluated. If this is not the case, the calculator must simply output the following message line:

Bad input! Could not compute value.

As a program, the calculator "terminates" after each calculation, and each calculation of a new expression requires a new run of the program.

If such a calculator program had to be capable of dealing properly with any kind of valid arithmetic expression we might throw at it, the program itself would have to be quite complex and would require more programming effort than is necessary for the fully-parenthesized expression case.

So, to make our lives easier, we shall make some simplifying assumptions:

First, we shall assume only integer arithmetic. This essentially means that the only numbers allowed as input are integers, and division is restricted to integer division.
Second, we shall assume that all operators are binary. So, for example, "unary minus" is not permitted, and hence we will have to write an expression like
```
-3 * (8 + 5)
```
explicitly as
```
(0 - 3) * (8 + 5)
```
Third, we shall avoid the problem of operator precedence by requiring all input expressions to be "fully parenthesized" to indicate explicitly the order in which operations are to be performed. For example, the fully-parenthesized form (and only valid form, except that the amount of internal whitespace could vary) of the above expression is this:
```
((0 - 3) * (8 + 5))
```
Fourth, extra (redundant) parentheses will not be allowed either. So, for example, this is not valid:
```
((0 - 3) * ((8 + 5)))
```

The effect of these (somewhat limiting) assumptions is that our notion of an "arithmetic expression" may be stated quite concisely in the (recursive) definition which follows.

An arithmetic expression (in our current and restricted context) is either

A positive integer, or
Something of the form
```
(left op right)
```

in which:

The parentheses are necessary.
op is one of +, -, *, or /
Each of left and right is again an arithmetic expression.

With the above definition and assumptions, you should be able to see, for example, that none of the following three input lines is valid:

(3 + 7 * 2)   Not fully parenthesized
6 * 5         Not fully parenthesized
((3 + 3))     Contains redundant parentheses

On the other hand, you should also agree that each of the following expressions is valid:

26
((3 + 7) * 2)
(3 + (7 * 2))
(6 * 5)
(3+3)

For each expression, the calculator would build a suitable binary expression tree to hold the expression, and use this expression tree both to evaluate the expression, and to output the expression, along with its value, in the manner indicated below.

Use of a recursively-defined binary expression tree to hold each input expression means that each routine dealing with this structure (for input, for evaluation, and for output) should also be recursive. Furthermore, the calculator must ignore all extra and irrelevant blank space(s) on the input line, and the output routine must display the input expressions in a way that uses blank spaces consistently, as illustrated by the following sample input/output:

Sample input expressions   Corresponding sample output
123                        123 = 123
(15 * 3)                   (15 * 3) = 45
(15   *3 )                 (15 * 3) = 45
(3+(7*2))                  (3 + (7 * 2)) = 17
((44 - 3)/((2 + 5)*3))     ((44 - 3) / ((2 + 5) * 3)) = 1

In each case, compare the spacing in the input and the output, and note the consistent spacing that appears in the output, however the input may have been formatted.