Binary Search: Basic Idea and Key Assumption

This search requires one very important criterion to be satisfied; otherwise the algorithm cannot be guaranteed to work:

The data values being searched must be sorted.

The idea of this search is to use the fact that the data is sorted, as follows: at each stage, look at the middle value of all remaining data. Either that is the value you are looking for, or you can restrict further searching to one side or the other of that middle value, thereby eliminating at least half of the remaining data at each stage.
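For example (the values here are chosen purely for illustration), suppose we search for 23 in the sorted sequence 2 5 8 12 16 23 38. The middle value is 12; since 23 > 12, the search is restricted to the upper portion 16 23 38, eliminating more than half of the data in one step. The middle value of what remains is 23, so the target is found on the second pass.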

Binary Search: Name, Input and Output

Name: SearchBinary

Input:

A sequence of data values sorted in ascending order (sequence); the value being searched for (targetValue); and the first and last positions of the portion of the sequence to be searched (firstPosition and lastPosition).

Output:

The position at which the target value was found (targetPosition), and a flag indicating whether or not the target value was found (targetFound).

Binary Search: Pseudocode

Algorithm SearchBinary(sequence, targetValue, firstPosition, lastPosition,
                       targetPosition, targetFound)
--------------------------------------------------------------------------
set targetFound to false
while more values to look at (firstPosition <= lastPosition) and target value not found
  let middlePosition be the position halfway between firstPosition and lastPosition
  if target value < value at middlePosition
    Look in lower portion of remaining values (set lastPosition to middlePosition - 1)
  else if target value > value at middlePosition
    Look in upper portion of remaining values (set firstPosition to middlePosition + 1)
  else
    Target value has been found at current middle value
    (set targetFound to true and targetPosition to middlePosition)
endwhile

Note that the three tests, which are performed to see whether the target value has been found and, if not, to determine which way to go, can be done in 3! = 6 different orders.

Note as well that if the algorithm finds the target value, there may also be other instances of that target value in the sequence; this version of the search makes no guarantee about which of those instances it reports.
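As a concrete illustration, here is one way the pseudocode might be rendered in Java. This is only a sketch under some assumptions: the class and method names (BinarySearchDemo, searchBinary) are made up for this example, and the method reports its result by returning the target's position, with -1 meaning "not found", rather than through the targetPosition and targetFound parameters of the pseudocode.

public class BinarySearchDemo {

    // A minimal sketch of the SearchBinary pseudocode in Java.
    // Returns the position of targetValue within sequence[firstPosition..lastPosition],
    // or -1 if the target is not present (an assumed convention; the pseudocode
    // instead communicates its result through targetPosition and targetFound).
    public static int searchBinary(int[] sequence, int targetValue,
                                   int firstPosition, int lastPosition) {
        // More values to look at as long as the range has not collapsed.
        while (firstPosition <= lastPosition) {
            int middlePosition = firstPosition + (lastPosition - firstPosition) / 2;
            int middleValue = sequence[middlePosition];
            if (targetValue < middleValue) {
                lastPosition = middlePosition - 1;    // look in lower portion
            } else if (targetValue > middleValue) {
                firstPosition = middlePosition + 1;   // look in upper portion
            } else {
                return middlePosition;                // target found at current middle value
            }
        }
        return -1;                                    // target not found
    }

    public static void main(String[] args) {
        int[] data = {2, 5, 8, 12, 16, 23, 38};       // the data must already be sorted
        System.out.println(searchBinary(data, 23, 0, data.length - 1));  // prints 5
        System.out.println(searchBinary(data, 7, 0, data.length - 1));   // prints -1
    }
}

The middle position is computed as firstPosition + (lastPosition - firstPosition) / 2 rather than (firstPosition + lastPosition) / 2 only to avoid integer overflow for very large positions; the two expressions are otherwise equivalent.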

Binary Search: Performance

The binary search algorithm is, in general, an O(log n) algorithm. That is, the algorithm has "logarithmic time complexity".

As is the case with any search algorithm, the best-case performance of binary search is O(1), since the target value might be found on the very first pass (when it happens to be the middle value of the whole sequence).

The binary search algorithm is a bit unusual in that its worst-case performance is still very good, and is easier to calculate than its average-case performance; both are O(log n). To show that the worst-case performance of binary search is O(log n), we proceed as follows.

We begin by noting that on each pass of the algorithm we eliminate at least half of the remaining values. We say "at least half" because on each pass either the target is found at the current middle value (and the search stops immediately), or the middle value itself, together with all of the remaining values on one side of it, is eliminated, and that is at least half of the values that remained at the start of the pass.

Note as well that on each pass of the algorithm, at most two comparisons are performed: one to see whether the target value is less than the current middle value, and possibly a second to see whether it is greater.

Now, let's assume we have n data values, and also assume that k is the smallest positive integer for which n <= 2^k. Then in the case of equality (n = 2^k) we know that, since each pass eliminates at least half of the remaining values, after k passes at most one value remains to be examined, and one further pass determines whether that value is the target.

Hence we see that the maximum number of passes of the algorithm that will be required to find the target value or determine that it is not present is k+1.
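For example, if n = 16 = 2^4 (so k = 4), then after one pass at most 8 values remain, after two passes at most 4, after three passes at most 2, and after four passes at most 1; a fifth pass, if needed, determines whether that last remaining value is the target. That is k+1 = 5 passes in all.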

And if n < 2^k (i.e., we have strict inequality rather than equality), then there are even fewer values to eliminate, so the maximum number of passes of the algorithm is still bounded above by k+1.

Now, since k was chosen to be the smallest positive integer for which n <= 2^k, we have

2^(k-1) < n <= 2^k, for that k

and because the logarithm function is an increasing function we also have

k-1 < log2 n <= k, for that k

which means that k = ceil(log2 n).

Finally, since there are at most two comparisons per iteration (pass) of the algorithm, and at most k+1 passes of the algorithm, it follows that there are at most 2*(k+1) = 2*(ceil(log2 n) + 1) comparisons for the complete binary search. Or, in other words, the search has a time complexity bound of O(log n).
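For example, if n = 1,000,000 then 2^19 = 524,288 < 1,000,000 <= 1,048,576 = 2^20, so k = 20, and at most 21 passes and 42 comparisons are needed, no matter which value is sought or whether it is present at all.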


Note that we are using Java/C++ notation for the "ceiling function" in the above paragraph.

Note also that log2 n is sometimes written lg n, or even simply log n, provided a base of 2 is understood rather than the base e (or base 10) usually assumed for log n in other contexts. A base of 2 is in fact what we assume in most scenarios involving algorithm analysis. Furthermore, when the logarithm function appears as a measure of complexity the base is irrelevant, since two logarithmic functions with different bases differ only by a multiplicative constant, and constant factors never appear in a "big Oh" or other complexity measure in any case.
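For example, log10 n = (log10 2) * log2 n, which is approximately 0.301 * log2 n, so O(log2 n) and O(log10 n) describe exactly the same growth rate.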


So, the worst-case performance of the binary search algorithm is O(log n). But don't forget that the data values must be sorted in advance!