Sunday, September 28, 2014

Binary Search Trees

Note - The code examples used in this post are located at my GitHub account - Binary Search Trees.

A binary search tree or BST is a node based data structure that contains the following properties in each node -
  • A key or value - The item contained within the node that can be used in comparisons.
  • A pointer to the left subtree - A reference to the children in its left subtree.
  • A pointer to the right subtree - A reference to the children in its right subtree.



Figure 2 - A Binary Search Tree Node.

Each node in the tree comprises of the following properties - 
  1. All nodes in its left subtree contain key/values that are smaller than the current key/value that is stored in the node. 
  2. All nodes in its right subtree contain key/values that are larger than the current key/value that is stored in the node.
  3. If the current node is a right child of its parent, the key/value in the current node will be larger than that of its parent.
  4. If the current node is a left child of its parent, the key/value in the current node will be smaller than that of its parent. 
A tree comprised of a single node is still a binary search tree. A larger tree can be formed by joining a bunch of individual nodes together so long as they still abide by the properties that are defined above. For example, the illustration below shows a BST built with 7 individual nodes (numbered 1 through to 7). It is a BST because it still maintains the left/right child and parent relationship properties.



Figure 3 - A Binary Search Tree composed of many individual nodes.

It is called a binary search tree because we can determine if a node exists in the tree by using the steps from a binary search. We will go into more detail about how this works in a followup post.

A BST is seen as a recursive data structure because it uses the same node structure repeatedly to build the tree in addition to all nodes containing the same properties. We can leverage these relationships to allow us to perform some pretty nifty operations on these trees with a small amount of code. 

Stacks and Queues - An Introduction

Previously we've discussed about two types of containers for storage of data, arrays and linked structures. To expand a little further, we'll introduce two abstract data types that can be built from linked structures or arrays, stacks and queues.

An abstract data type (ADT) provides a definition rather than the concrete implementation. It defines what the data structure or data type can and cannot do (operations it can perform, and constraints of the structure). ADT's are generally implemented using existing data structures or data types. 

Stacks and queues are examples of ADT's. Conceptually each provide the description of the operations it can perform, and the limitations as to how they are supposed to operate. 

Stacks - 

The stack data structure functions as literally as the name suggests. Imagine at a dockyard, a crane stacking a number of shipping containers one on top of another. 


Figure 1 - Stacks of shipping containers 

Figure 1 shows a bunch of shipping containers all stacked on top of another. If we treat each column of containers as a single stack, we begin to notice that we're unable to remove the containers towards the bottom without removing the top most containers first. This is exactly how the ADT stack works.

A stack functions as a "Last in First Out" (LIFO) data structure, meaning the last item to be placed onto the stack is the first item taken out. Returning to the shipping container example, it means that we cannot remove the bottom most container until we've removed all the others above it. 

The process of adding and removing items from a stack is called push and pop.
  • Push (x, s) - Is the equivalent of adding an item onto the stack. This method definition means "to push item x onto the stack s".

  • Pop (s) - Is the equivalent of removing the top most item from the stack. This method definition means "to pop the top most item from stack s".

Push and pop are the two most basic operations that are performed on any stack. There is a third operation top (also known as peek), that is sometimes defined. 
  • Top/Peek (s) - Is the equivalent of asking "which item is at the top of the stack". It returns the requested item (if any) without removing it from the stack.


Queues - 

The queue data structure is just as literal as the definition for a stack. Imagine a single filed line of people at a bank like the illustration below.



Figure 2 - A queue of people in a line

Figure 2 illustrates exactly what a queue data structure represents. The person at the front of the line is the first one out of the line, all new people must follow the rules and line up at the end, no queue jumping or asking for favours. 

A queue functions as a "First in First Out" (FIFO) data structure. The two main operations to add and remove items from a queue are called enqueue and dequeue.

  • Enqueue (x, q) - Is the equivalent of "insert item x at the end of the queue q" or telling someone to line up at the end of the line. 

  • Dequeue(q) - Is the equivalent of "remove and return the first item from the queue q", or having the bank teller say "next person please!".

These are the two most basic operations on a queue, however like a stack, there is also a third operation that can also be implemented, front (or peek).
  • Front/Peek(q) - Is the equivalent of looking to see who's at the front of the queue, without actually removing them from the line. 

Summary - 

Stacks and queues can be implemented with both arrays and linked structures. To determine which structure is best to use is dependant on the situation.

If the we know the size of the stack or queue in advance, then we can use an array, otherwise it would be better to use a linked list since it is more dynamic when adding additional items. 

Stacks are useful to use when retrieval order of items does not really matter like in batch jobs. It is also useful for reversing a particular order of items since we can take advantage of the "Last in First Out" property of a stack.

Queues are useful when we want to control the waiting times for particular items. A queue allows us to minimise the maximum waiting time.  It is also useful if ordering of items is important.