Wednesday, October 8, 2014

Binary Search Trees - Search

Note - The code examples used in this post are located at my GitHub account - Binary Search Trees.

Imagine you were hired to develop an algorithm for finding a name in a phonebook. 

The simplest way to achieve this would be to start from the beginning of the phonebook and systematically flip through every page until we have found the name. Whilst this will certainly provide the right answer every time, is it the most efficient? 

Think what would happen if we had to find someone with the surname of “Jerry”, we’d start from A and have to flip through the phonebook to J to find it. If our phonebook contained millions of names, this could take quite a while. 

The structure of the data in a phonebook is special because it has the benefit of being ordered alphabetically. We can take advantage of this to make our search run faster. 

Now imagine that we begin our search for “Jerry” in the middle of the phonebook at the letter M. We know that “Jerry” must be in the left half of our phonebook since J comes before M, so we find the midpoint in the left half of the phonebook (A and M) and start the search process again. Each time we don’t find the name we continue the search on the left or right side, depending on where we are in the phonebook and the name that we want to find. If the name we want comes before the letter we are currently searching, then we search the left side and vice versa. 

With these steps we will reach two eventual outcomes -
  1. We will find the name in the phonebook.
  2. We will run out of phonebook space to search.
Because we discard half of the remaining phonebook each time we split left or right, if the name isn’t in the phonebook we will keep on eliminating parts of the phonebook to search until there is nothing. When we get to this point, it means that the name isn’t in the phonebook.

Starting in the middle, doing a comparison and continuing our search in either the left or right subspace if we do not find the item, is what we call a binary search

Binary Search Algorithm - 

To breakdown our binary search into a sequence of steps for our algorithm is described below - 

The search space is the remaining part of the phonebook that we can search in. The algorithm starts with the search space being the entire phonebook. 

The target value is the name that we wish to find (i.e. “Jerry”).
The current value is the name in the middle of the search space.

  1. Check to see if the size of the search space is greater than 0. If it is 0, we can stop our search since this means that the name that we’re looking for is not contained in our phonebook.  
  2. Compare the target value with the current value.
  3. If the target value matches the current value, we have found the name in the phonebook so we can stop and return otherwise we need to keep on searching.
  4. If the target value comes alphabetically before the current value, we continue our search on the left side of the phonebook.
  5. If the target value comes alphabetically after the current value, we continue our search on the right side of the phonebook.
  6. Once we’ve selected a direction to continue the search, we remove the side that we are not going to search from the search space (i.e. If we continue our search to the left, the search space should only contain the names to the left of the current middle and vice versa). We then update the current value to be the middle of our recently updated search space and go back to step 1
Figure 1 illustrates the binary search algorithm described above.


Figure 1 - Using binary search on a phonebook for Jerry

Binary search is an example of a “divide and conquer” algorithm. This is because at the end of each cycle, we divide the search space and only go down the path that leads us closer to the potential answer. 

This is evident in Figure 1, where from searching for the item "Jerry" we eliminate one side of the search space at each stage.

Example - Stepping through Figure 1 with our binary search algorithm

  • Step 1 -
    • Target value - Jerry
    • Current value - Mark (start at the middle of the array)
    • The target value does not equal the current value so we need to continue searching.
    • The target value comes alphabetically before the current value so we want to continue our search in the space to the left of our current value. We can safely discard everything to the right of the current value since we know that the target value cannot come after the current value.
    • The search space is reduced to everything to the left of the current value.

  • Step 2 -
    • Target value - Jerry
    • Current value - Dave (Middle element in the updated search space)
    • The target value does not equal the current value so we need to continue searching.
    • The target value comes alphabetically after the current value so else want to continue our search in the space to the right of our current value. Again we can safely discard everything to the left of the current value since we know that Jerry cannot come alphabetically before Dave.
    • The search space is reduced to now be everything to the right of the current value.

  • Step 3
    • Target value - Jerry
    • Current value - Jerry
    • The target value is equal to the current value so we can stop our search and return the values to say that our target value exists in our phonebook.

Binary Searches will only work on ordered data - 

As mentioned in the introduction, we can take advantage of the ordered nature of a phonebook to perform a faster search. In a binary search, the direction that we choose to go at the end of each step (if we have not found the value), depends on this ordering as it defines the relationship between the current value and the target value

We can prove that a binary search fails on data that isn’t ordered, try stepping through the algorithm to find “Tim” in Figure 2.


Figure 2 - Binary Search on an unordered array won’t succeed.

In Figure 2, if we search for the name Tim with our binary search algorithm, we’ll realise quite quickly that we run into a problem when we perform the target value and current value comparison when we choose a direction to go. The algorithm dictates that because Tim comes alphabetically after Jerry, we’re supposed to continue our search in the right subspace, however Tim actually exists in the left therefore causing our algorithm to fail. Therefore, to be able to perform a binary search on a set of elements, they must be ordered.

Binary Search Tree = Binary Search + Binary Tree - 

As you may have guessed, if we combine the node structure of a binary tree (a node comprising of a value and links to two children) with a binary search we’ll end up with a binary search tree. All we need to do is to ensure that our binary search tree structure follows the same rules that allows a binary search to go either left or right

This means, all values to the left of the current node are smaller than the value contained by the current node and all values to the right of the current node are larger than the value contained by the current node


Figure 3 - A binary search tree node (values to the left are smaller, values to the right are larger)

If we ensure that this relation is maintained throughout the entire tree, we’ll always have a valid binary search tree.

We can prove this by applying the binary search algorithm on the following binary search tree. -



Figure 4 - The phonebook as a Binary Search Tree

In Figure 4, if we start from the root (top of the tree) and use our algorithm, we can find any value in the tree using those steps. Additionally, a value is seen as not in the tree if we get to the bottom of one path branch. 

This means that to conduct a search on a binary search tree, all we’re required to do is to implement the binary search algorithm on the binary tree

Implementing the Binary Search Algorithm on a BST - (Recursive)


We’ll build the code for our algorithm with the steps that we defined earlier.

Step 1 - Base Case

As with all recursive methods, we’ll need a base case, a condition that lets our method know when to stop and turn back. In our binary search algorithm, our base case will be when the search space is 0. The search space will be 0, when we cannot go either left or right in our tree meaning that we’ve reached the bottom. 

Step 2 - Compare the current value with the target value 

In this step, we’ll compare the value contained in the current node to our target value. If it matches we can stop our method and return the appropriate result. 

Step 3 - Choosing a direction to go 

If our code makes it to this step, it means that we need to continue searching for the target value in our tree by either going down the left subtree or the right subtree. The direction we choose to go is as follows -

  • If the target value is less than (or comes before) the current value, we go to the left subtree.
  • If the target value is greater than (or comes after) the current value, we go to the right subtree.

And that is it, we’ve simplified our binary search steps for our binary search tree into only three steps. The code extends from the previous tutorial on Binary Search Tree’s.

/**
* Method to recursively search the BST for a value.
* @param currentNode - The current node.
* @param value - The value to find.
* @return - The {@link Node} holding the value, null
* if not found. 
*/
private Node search(Node currentNode, int value) {
    if (currentNode == null) { // End of the tree, value
         return null;  // is not found.
    }
    if (currentNode.value == value) {
         return currentNode; // The current node contains 
        // the value we want so return.
    }
    if (value < currentNode.value) {
       // The value we want is less than the value in the
       // current node, so search its left subtree (child).
         return search(currentNode.leftChild, value); 
    }
    else {
         return search(currentNode.rightChild, value);
    }
}
Figure 5 - The Binary Search on a Binary Tree.

Complexity - 

The worst case time complexity for searching for an item in a binary search tree is O(logn). The reason for this is that at each stage  of the algorithm we divide the search space in half which equates to a logarithmic time algorithm. 

Summary - 

Binary search is a divide and conquer algorithm in the sense that it divides the search space in half at the end of each cycle. It is a quick searching algorithm that runs faster than a linear search even in its worst case.

Unfortunately, a binary search is only applicable on datasets where the data is ordered as the search leverages this relationship between the items to find the result. 

A binary search tree maintains this relationship through its children where the all children in the left subtree are smaller than the current and all values in the right subtree are greater than the current

Exercises - 

  1. Modify the Binary Search Tree code to create the phonebook with names rather than integers.
  2. Implement the Binary Search Algorithm using iteration. 

No comments:

Post a Comment