Binary Search: Basic Idea and Key Assumption

This search requires one very important criterion to be satisfied; otherwise the algorithm cannot be guaranteed to work:

The data values being searched must be sorted.

The idea of this search is to use the fact that the data is sorted, as follows: at each stage, look at the middle value of all remaining data. Either that is the value you are looking for, or you can restrict further searching to one side or the other of that middle value, thereby eliminating at least half of the remaining data at each stage.
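For example (the values here are chosen purely for illustration), suppose we search for 23 in the sorted sequence 2 5 8 12 16 23 38. The middle value is 12; since 23 > 12, the search is restricted to the upper portion 16 23 38, eliminating more than half of the data in one step. The middle value of what remains is 23, so the target is found on the second pass.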

Binary Search: Name, Input and Output

Name: SearchBinary

Input:

A sequence of data values sorted in ascending order (sequence); the value being searched for (targetValue); and the first and last positions of the portion of the sequence to be searched (firstPosition and lastPosition).

Output:

The position at which the target value was found (targetPosition), and a flag indicating whether or not the target value was found (targetFound).

Binary Search: Pseudocode

Algorithm SearchBinary(sequence, targetValue, firstPosition, lastPosition,
                       targetPosition, targetFound)
--------------------------------------------------------------------------
set targetFound to false
while more values to look at (firstPosition <= lastPosition) and target value not found
  let middlePosition be the position halfway between firstPosition and lastPosition
  if target value < value at middlePosition
    Look in lower portion of remaining values (set lastPosition to middlePosition - 1)
  else if target value > value at middlePosition
    Look in upper portion of remaining values (set firstPosition to middlePosition + 1)
  else
    Target value has been found at current middle value
    (set targetFound to true and targetPosition to middlePosition)
endwhile

Note that the three tests, which are performed to see whether the target value has been found and, if not, to determine which way to go, can be done in 3! = 6 different orders.

Note as well that if the algorithm finds the target value, there may also be other instances of that target value in the sequence; this version of the search makes no guarantee about which of those instances it reports.
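As a concrete illustration, here is one way the pseudocode might be rendered in Java. This is only a sketch under some assumptions: the class and method names (BinarySearchDemo, searchBinary) are made up for this example, and the method reports its result by returning the target's position, with -1 meaning "not found", rather than through the targetPosition and targetFound parameters of the pseudocode.

public class BinarySearchDemo {

    // A minimal sketch of the SearchBinary pseudocode in Java.
    // Returns the position of targetValue within sequence[firstPosition..lastPosition],
    // or -1 if the target is not present (an assumed convention; the pseudocode
    // instead communicates its result through targetPosition and targetFound).
    public static int searchBinary(int[] sequence, int targetValue,
                                   int firstPosition, int lastPosition) {
        // More values to look at as long as the range has not collapsed.
        while (firstPosition <= lastPosition) {
            int middlePosition = firstPosition + (lastPosition - firstPosition) / 2;
            int middleValue = sequence[middlePosition];
            if (targetValue < middleValue) {
                lastPosition = middlePosition - 1;    // look in lower portion
            } else if (targetValue > middleValue) {
                firstPosition = middlePosition + 1;   // look in upper portion
            } else {
                return middlePosition;                // target found at current middle value
            }
        }
        return -1;                                    // target not found
    }

    public static void main(String[] args) {
        int[] data = {2, 5, 8, 12, 16, 23, 38};       // the data must already be sorted
        System.out.println(searchBinary(data, 23, 0, data.length - 1));  // prints 5
        System.out.println(searchBinary(data, 7, 0, data.length - 1));   // prints -1
    }
}

The middle position is computed as firstPosition + (lastPosition - firstPosition) / 2 rather than (firstPosition + lastPosition) / 2 only to avoid integer overflow for very large positions; the two expressions are otherwise equivalent.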

Binary Search: Performance

The binary search algorithm is, in general, an O(log n) algorithm. That is, the algorithm has "logarithmic time complexity".

As is the case with any search algorithm, the best-case performance of binary search is O(1), since the target value might be found on the very first pass (when it happens to be the middle value of the whole sequence).

The binary search algorithm is a bit unusual in that its worst-case performance is still very good, and is easier to calculate than its average-case performance; both are O(log n). To show that the worst-case performance of binary search is O(log n), we proceed as follows.

We begin by noting that on each pass of the algorithm we eliminate at least half of the remaining values. We say "at least half" because on each pass either the target is found at the current middle value (and the search stops immediately), or the middle value itself, together with all of the remaining values on one side of it, is eliminated, and that is at least half of the values that remained at the start of the pass.

Note as well that on each pass of the algorithm, at most two comparisons are performed: one to see whether the target value is less than the current middle value, and possibly a second to see whether it is greater.

Now, let's assume we have n data values, and also assume that k is the smallest positive integer for which n <= 2^k. Then in the case of equality (n = 2^k) we know that, since each pass eliminates at least half of the remaining values, after k passes at most one value remains to be examined, and one further pass determines whether that value is the target.

Hence we see that the maximum number of passes of the algorithm that will be required to find the target value or determine that it is not present is k+1.
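For example, if n = 16 = 2^4 (so k = 4), then after one pass at most 8 values remain, after two passes at most 4, after three passes at most 2, and after four passes at most 1; a fifth pass, if needed, determines whether that last remaining value is the target. That is k+1 = 5 passes in all.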

And if n < 2^k (i.e., we have strict inequality rather than equality), then there are even fewer values to eliminate, so the maximum number of passes of the algorithm is still bounded above by k+1.

Now, since k was chosen to be the smallest positive integer for which n <= 2^k, we have

2^(k-1) < n <= 2^k, for that k

and because the logarithm function is an increasing function we also have

k-1 < log2 n <= k, for that k

which means that k = ceil(log2 n).

Finally, since there are at most two comparisons per iteration (pass) of the algorithm, and at most k+1 passes of the algorithm, it follows that there are at most 2*(k+1) = 2*(ceil(log2 n) + 1) comparisons for the complete binary search. Or, in other words, the search has a time complexity bound of O(log n).
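For example, if n = 1,000,000 then 2^19 = 524,288 < 1,000,000 <= 1,048,576 = 2^20, so k = 20, and at most 21 passes and 42 comparisons are needed, no matter which value is sought or whether it is present at all.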


Note that we are using Java/C++ notation for the "ceiling function" in the above paragraph.

Note also that log2 n is sometimes written lg n, or even simply log n, provided a base of 2 is understood rather than the base e (or base 10) usually assumed for log n in other contexts. A base of 2 is in fact what we assume in most scenarios involving algorithm analysis. Furthermore, when the logarithm function appears as a measure of complexity the base is irrelevant, since two logarithmic functions with different bases differ only by a multiplicative constant, and constant factors never appear in a "big Oh" or other complexity measure in any case.
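For example, log10 n = (log10 2) * log2 n, which is approximately 0.301 * log2 n, so O(log2 n) and O(log10 n) describe exactly the same growth rate.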


So, the worst-case performance of the binary search algorithm is O(log n). But don't forget that the data values must be sorted in advance!