Saturday, March 15, 2014

Algorithm Analysis (Big Oh)

The theoretical analysis of algorithms is one of the trickier topics to get the hang of when learning the fundamentals of computer science; however, once you understand the basic concepts, it gets easier with practice.

The Basics:

The RAM model (Random Access Machine) is a simplified model of computation in which we count the number of steps it takes to perform particular computational tasks. For example, 

  • Conditional statements (if, switch/case) and method calls are counted as 1 step.
  • Loops (for, while) are counted by the maximum number of times they can loop.
  • Memory access is counted as 1 step.
The idea is to use a simple computational model that ignores other factors such as the language used, processor speed, memory access speed or memory space. Using this simple model, we can focus solely on the algorithm itself by measuring its speed as the number of steps it takes to run. 
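
For example, here is a small Python sketch (my own illustration, not from any particular source) of counting steps under the RAM model for a function that sums a list of n items:

    def sum_list(items):
        total = 0               # 1 step (assignment)
        for item in items:      # loops n times, where n = len(items)
            total += item       # 1 step per iteration
        return total            # 1 step

    # Under the RAM model this is roughly 1 + n + 1 = n + 2 steps,
    # regardless of the language, processor or memory speed used.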

Best, Worst and Average Case Complexity:

Complexity analysis is used to understand how well or badly an algorithm performs over all instances (combinations) of data that the algorithm can accept. 


Algorithms can perform better or worse depending on the order of the data fed in. For example, already-ordered data is the best case for insertion sort, as there is almost nothing to do, whereas reverse-ordered data is its worst case, since every element must be shifted past all the elements placed before it. 



Figure 1 - Best, Worst and Average Case Complexity

  • Worst Case Complexity - Is the maximum number of steps an algorithm may take on any input size of n. It is the highest curve and points in the graph.

  • Best Case Complexity - Is the minimum number of steps an algorithm may take on any input size of n. It is the lowest curve and points in the graph.

  • Average Case Complexity - Is the average number of steps an algorithm may take on any input size of n. In the figure above, it is the curve between the highest and lowest points.
Analysing an algorithm's worst case complexity yields the most useful results, as we ignore additional factors such as ordered input or other conditions that may improve an algorithm's best or average case. By analysing the worst case, we know the maximum number of steps that an algorithm may take over any order of input. 
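
To make this concrete, here is a minimal Python sketch of insertion sort (my own example): on already-sorted input the inner while loop never runs, giving a best case of roughly n steps, while on reverse-ordered input it runs every time, giving a worst case of roughly n^2 steps.

    def insertion_sort(items):
        for i in range(1, len(items)):
            key = items[i]
            j = i - 1
            # Shift larger elements one place to the right until key fits.
            # Already-sorted input: this loop body never executes (best case, ~n steps).
            # Reverse-ordered input: it executes i times for every i (worst case, ~n^2 steps).
            while j >= 0 and items[j] > key:
                items[j + 1] = items[j]
                j -= 1
            items[j + 1] = key
        return items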

Big Oh, Big Omega and Big Theta Notation:

It is difficult to calculate exact values when analysing the best, worst and average case time complexities of an algorithm, so they are usually represented as numerical functions of the input size, f(n).

The exact time complexity can be difficult to pin down due to various small factors. For example, a binary search runs slightly faster on odd sized arrays (n = 2^k - 1, where k >= 0), as it can divide the array evenly at every step. Capturing this level of detail would require far more precision than the RAM model is designed for.

To keep things simple, we use “Big Oh Notation”, which strips out the finer details that do not affect the analysis of the algorithm.

Big Oh notation ignores the differences between multiplicative constants in numerical functions, which means that if two machines run the same algorithm and the running time complexities are:

Machine 1 - f(n) = 2n

Machine 2 - g(n) = n

We consider them identical functions in Big Oh analysis. We make this assumption because both machines implement the same algorithm, and the difference between the functions could be attributed to the language used or the processor speed. These external factors tell us nothing about the algorithm itself, so it is safe to ignore the constant factors.

Definitions:

In the following section, please note that n0 means “for sufficiently large n”, i.e. “for some really large input of n items (approaching infinity)”.

It might be confusing to see something like n^2 = O(n^3), but the trick here is not to read = as “equals”, but rather as “is one of the functions in” or “is within the bounds of”.

Big Oh (O):

f(n) = O(g(n)) if and only if there are positive constants n0 and c such that, to the right of n0, the value of f(n) always lies on or below c * g(n). In other words, there is some constant c that will always make c * g(n) at least as large as f(n).

Big Oh defines the upper bounds of a function, meaning that at the very worst, the function f(n) will take O(g(n)) time. 



 Figure 2 - In Big Oh, f(n) must always be below c * g(n)


Examples:

f(n) = 3n^2 - 100n + 6 = O(n^2) because c * n^2 > 3n^2 - 100n + 6 where c = 3 (for all n >= 1).
f(n) = 3n^2 - 100n + 6 = O(n^3) because c * n^3 > 3n^2 - 100n + 6 where c = 0.1 (for sufficiently large n).
f(n) = 3n^2 - 100n + 6 != O(n) because c * n can never stay greater than 3n^2 - 100n + 6 for sufficiently large n, irrespective of the value of c. 

This means that for functions satisfying f(n) = O(g(n)), there is a constant multiple c * g(n) that grows at least as fast as f(n).
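
To build intuition for the definition, a quick numerical check in Python (an informal sanity check, not a proof) can confirm that c * g(n) stays on or above f(n) for every n we try beyond n0:

    # Informal check that f(n) = 3n^2 - 100n + 6 is O(n^2) with c = 3 and n0 = 1.
    def f(n):
        return 3 * n**2 - 100 * n + 6

    def g(n):
        return n**2

    c, n0 = 3, 1
    assert all(c * g(n) >= f(n) for n in range(n0, 10000))   # holds for every n tested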

Big Omega (Ω):

f(n) = Ω(g(n)) if and only if there are positive constants n0 and c such that, to the right of n0, the value of f(n) always lies on or above c * g(n). In other words, f(n) will always be greater than or equal to c * g(n) for sufficiently large n. 

Figure 3 - In Big Omega, f(n) must always be above c * g(n)

Examples:

f(n) = 3n^2 - 100n + 6 = Ω(n^2) because 
  • c * n^2 < 3n^2 - 100n + 6 where c = 2 (for sufficiently large n).
f(n) = 3n^2 - 100n + 6 != Ω(n^3) because 
  • there is no constant c for which c * n^3 stays below 3n^2 - 100n + 6 (g(n) cannot stay smaller than f(n), as n^3 will always grow faster than n^2).
f(n) = 3n^2 - 100n + 6 = Ω(n) because 
  • c * n < 3n^2 - 100n + 6 where c = 1 (any constant c will eventually result in a smaller c * g(n)). 
Big Theta (Θ):

f(n) = Θ(g(n)) if and only if there are positive constants n0, c1 and c2 such that, to the right of n0, the value of f(n) always lies between c1 * g(n) and c2 * g(n). In other words, f(n) must satisfy both O(g(n)) and Ω(g(n)).

Figure 4 - In Big Theta, f(n) must always be between c1 * g(n) and c2 * g(n)

Examples:

f(n) = 3n^2 - 100n + 6 = Θ(n^2) because 
  • c1 * n^2 > 3n^2 - 100n + 6 and c2 * n^2 < 3n^2 - 100n + 6 where c1 = 3 and c2 = 2.
f(n) = 3n^2 - 100n + 6 != Θ(n^3) because 
  • it cannot fulfil the lower bound (Ω) condition, as n^3 will always grow faster than n^2 no matter the value of c.
f(n) = 3n^2 - 100n + 6 != Θ(n) because 
  • n^2 grows faster than n, therefore it cannot satisfy the upper bound (O) condition no matter the value of c.
Growth Rates:

The growth rate of an algorithm describes how its running time increases as the input size (n) increases. 

Growth rates of general algorithms are as follows (slowest to fastest growing):

1 << log(n) << n << nlog(n) << n^2 << n^3 << 2^n << n!

To give a sense of scale for these growth rates (time taken) as the input size n varies:
  • All of the above take roughly the same time when n = 10.
  • n! algorithms become useless when n >= 20. 
  • 2^n algorithms have a good operating range but become impractical once n > 40. 
  • n^2 algorithms are usable up to n = 10,000 but slow significantly past that point. 
  • n and nlog(n) algorithms remain practical with inputs up to 1 billion items. 
  • log(n) algorithms are always good.
To understand the importance of growth rates in algorithm analysis, imagine that each step takes 1 second to execute; the time taken by the faster-growing rates (n!, 2^n) quickly gets out of control.
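
The divergence is easy to see by tabulating each growth rate for a few values of n; the short Python sketch below (input sizes chosen arbitrarily for illustration) prints how rapidly 2^n and n! explode compared with the others:

    import math

    growth_rates = {
        "log n":   lambda n: math.log2(n),
        "n":       lambda n: n,
        "n log n": lambda n: n * math.log2(n),
        "n^2":     lambda n: n ** 2,
        "2^n":     lambda n: 2 ** n,
        "n!":      lambda n: math.factorial(n),
    }

    for n in (10, 20, 30):
        print("n =", n)
        for name, fn in growth_rates.items():
            print("  {:8} -> {:,.0f}".format(name, fn(n)))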

Dominance Relations:

A dominance relation describes the class which a function f(n) belongs to. To write a dominance relation we denote it with >>, so if we wish to say that n! functions dominate n^2 functions, we write n! >> n^2.

Basically a dominance relation describes what type of functions grow faster than others. 

The following classes of functions are listed in order of least dominant to most dominant: 
  • Constant Function (1) - The cost of adding two numbers or printing a value; there is no dependence on the input size.
  • Logarithmic Function (log(n)) - As in binary search; it grows very slowly, but faster than constant functions. 
  • Linear Function (n) - The cost of looking at each item in an input stream.
  • Superlinear Function (nlog(n)) - Quick sort or merge sort; grows slightly faster than linear but still quick.
  • Quadratic Functions (n^2) - Insertion and selection sorts; the cost of nested loops or looking at pairs of items.
  • Cubic Functions (n^3) - Enumerating through all triples of items in an n element universe. 
  • Exponential Functions (c^n, for a constant c > 1) - For example 2^n, the cost of enumerating all subsets of n items.
  • Factorial Functions (n!) - Generating permutations or orderings of n items, TSP and optimisation algorithms.
When we're working with dominance relations within a function, we can drop the less dominant terms from f(n), as they become insignificant as n -> ∞. For example, 3n^2 - 100n + 6 simplifies to Θ(n^2).

Working with Big Oh:

Adding Functions: 

We can add functions together, with the result determined by the most dominant (largest / fastest-growing) of the two:
  • O(f(n)) + O(g(n)) = O(max(f(n), g(n)))
  • Ω(f(n)) + Ω(g(n)) = Ω(max(f(n), g(n)))
  • Θ(f(n)) + Θ(g(n)) = Θ(max(f(n), g(n)))
When adding two functions together, the overall running time is determined by whichever of the two grows the fastest, i.e. the more dominant one. 
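
As a small Python illustration (a hypothetical example of my own), running an O(n) pass followed by an O(nlog(n)) pass is O(n) + O(nlog(n)) = O(nlog(n)) overall, because the faster-growing term dominates:

    def total_and_sorted(items):
        total = sum(items)         # one pass over the data: O(n)
        ordered = sorted(items)    # comparison sort: O(n log n)
        # Total work: O(n) + O(n log n) = O(max(n, n log n)) = O(n log n)
        return total, ordered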

Multiplying Functions: multiplication arises when one function is performed for every step of another, such as a loop nested inside another loop. 
  • O(f(n)) x O(g(n)) = O(f(n) x g(n))
  • Ω(f(n)) x Ω(g(n)) = Ω(f(n) x g(n)) 
  • Θ(f(n)) x Θ(g(n)) = Θ(f(n) x g(n))
A good rule of thumb for Big Oh is that the worst case running time usually follows from the largest number of times each nested loop can iterate. In worst case analysis we can ignore early termination conditions, since we are only concerned with the upper bound: what we want to know is how long the algorithm would take if it ran to completion on the worst possible input. 
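
For example (a sketch of the rule of thumb, using a duplicate check of my own devising): the outer loop below runs n times and the inner loop up to n times per iteration, so the worst case is O(n) x O(n) = O(n^2), even though the function can return early when a duplicate is found.

    def has_duplicate(items):
        n = len(items)
        for i in range(n):                    # outer loop: n iterations
            for j in range(i + 1, n):         # inner loop: up to n iterations each time
                if items[i] == items[j]:
                    return True               # early exit, ignored in worst case analysis
        return False                          # worst case (no duplicates): O(n^2)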

Logarithms:

Are inverse exponential functions and grow slowly.
  • b^x = y is equivalent to x = logb(y)
  • b^(logb(y)) = y
Binary search works by halving the search space at each step, meaning the number of remaining candidates is halved each time; at most about log2(n) steps are needed, giving O(log(n)) running time. 
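
A standard iterative binary search in Python (a minimal sketch, assuming the input list is already sorted) shows the halving directly:

    def binary_search(sorted_items, target):
        low, high = 0, len(sorted_items) - 1
        while low <= high:
            mid = (low + high) // 2
            if sorted_items[mid] == target:
                return mid                    # found the target
            elif sorted_items[mid] < target:
                low = mid + 1                 # discard the lower half
            else:
                high = mid - 1                # discard the upper half
        return -1                             # not found

    # Each iteration halves the remaining search space,
    # so at most about log2(n) iterations are needed: O(log n).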

Logarithms obey the following rules:
  • loga(xy) = loga(x) + loga(y) - (the log of a product is the sum of the logs)
  • loga(n^b) = b * loga(n) - (the log of a power is the exponent times the log)

Properties of Logarithms:

Base 2 - (log2(n)) - Binary logarithm, used whenever we repeatedly halve or double something, as in binary search. (On a calculator, lg(x).)
Base e - (loge(n), or ln(n)) - Natural logarithm, where e = 2.71828…; its inverse is exp(x) = e^x. (On a calculator, ln(x).)
Base 10 - (log10(n)) - Common logarithm, often written simply log(x). (On a calculator, Log.)

loga(b) = logc(b) / logc(a) - converting loga(b) to base c involves dividing by logc(a).

We generally ignore the base when conducting Big Oh analysis, because (as the identity above shows) changing the base only changes the result by a constant factor. 
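
A quick check with Python's standard math module shows why: converting between bases only scales the answer by a constant.

    import math

    n = 1000000
    print(math.log2(n))                    # ~19.93, computed directly in base 2
    print(math.log10(n) / math.log10(2))   # same value via the change-of-base identity
    # The two differ only by the constant factor 1 / log10(2), which Big Oh ignores.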

Binary Trees: 

A binary tree is a tree in which each node has at most 2 children. At a height of 1 it can have 2 leaves; at a height of 2 it can have 4 leaves. 

How many leaves can a binary tree of height h have?
  • The number of leaves doubles each time the height increases by 1.
  • n leaves = 2^h, where h = height. 
    • This means that h = log2(n), where n is the number of leaves.
    • More generally, if d = the number of children per node, then n = d^h and h = logd(n).
This is worked out using the logarithm identities above, and lets us determine the height and number of leaves of n-ary trees.
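
As a small sketch (assuming a complete tree where every internal node has exactly d children), the height can be computed directly from the identities above:

    import math

    def tree_height(num_leaves, d=2):
        # Height of a complete d-ary tree with the given number of leaves: h = log_d(n)
        return math.log(num_leaves, d)

    print(round(tree_height(1024)))        # 10 -> a binary tree with 1024 leaves has height 10
    print(round(tree_height(729, d=3)))    # 6  -> a ternary tree with 729 leaves has height 6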

Summary:

Whilst it can initially be confusing to see what Big Oh analysis is for, it is quite useful for comparing algorithms and determining which one is better to use in a particular situation. 

Dominance relations can assist in deciding which algorithm to use as an algorithm that runs in linear time (n) is much faster than an algorithm that runs in factorial time (n!) for large data sets. 

Even though some functions dominate others, we still need to consider the size of the input data when choosing an algorithm. If the input is small, a potentially much simpler factorial-time (n!) algorithm may run just as quickly as a very complex linear-time (n) one, so there is something to be said for not over-engineering algorithms. 

Questions: 

1. Do the following functions satisfy the given complexity? (Yes/No)
     a. Is 3n = O(2n)?
     b. Is 3n = Ω(2n)?
     c. Is n^2 + 3n + 4 = Θ(6n + 7)?

2. For the following statements, explain your answer:
     a. If an algorithm takes O(n^2) worst case time, is it possible that it takes O(n) on some inputs?
     b. If an algorithm takes O(n^2) worst case time, is it possible that it takes O(n) on all inputs?



