Wednesday, October 8, 2014

Binary Search Trees - Search

Note - The code examples used in this post are located at my GitHub account - Binary Search Trees.

Imagine you were hired to develop an algorithm for finding a name in a phonebook. 

The simplest way to achieve this would be to start from the beginning of the phonebook and systematically flip through every page until we have found the name. Whilst this will certainly provide the right answer every time, is it the most efficient? 

Think what would happen if we had to find someone with the surname of “Jerry”, we’d start from A and have to flip through the phonebook to J to find it. If our phonebook contained millions of names, this could take quite a while. 

The structure of the data in a phonebook is special because it has the benefit of being ordered alphabetically. We can take advantage of this to make our search run faster. 

Now imagine that we begin our search for “Jerry” in the middle of the phonebook at the letter M. We know that “Jerry” must be in the left half of our phonebook since J comes before M, so we find the midpoint in the left half of the phonebook (A and M) and start the search process again. Each time we don’t find the name we continue the search on the left or right side, depending on where we are in the phonebook and the name that we want to find. If the name we want comes before the letter we are currently searching, then we search the left side and vice versa. 

With these steps we will reach two eventual outcomes -
  1. We will find the name in the phonebook.
  2. We will run out of phonebook space to search.
Because we discard half of the remaining phonebook each time we split left or right, if the name isn’t in the phonebook we will keep on eliminating parts of the phonebook to search until there is nothing. When we get to this point, it means that the name isn’t in the phonebook.

Starting in the middle, doing a comparison and continuing our search in either the left or right subspace if we do not find the item, is what we call a binary search