Saturday, August 10, 2013

Arrays

Code examples for this entry are located here - Arrays

Arrays are probably one of the most important yet basic data structures that we use. By definition, an array is a data structure that is used to represent a group of items that can be accessed by an index. In other words, we can imagine an array as the mailboxes along a street, where the number at the front of the mailbox represents the index and the street represents the size of the array.


Figure 1. An array represented as a group of mailboxes with index starting at 0

When we create an array, we usually give it a predefined size, which determines the maximum number of items we can store within it.

In programming languages such as Java and C, array indices always start from 0 rather than 1, which might cause some confusion initially. All this means is that to access the nth item in the array, we look for it in position n - 1.

Using our mailbox example and referring to Figure 1, the number of mailboxes is the size of our array, which in the above example is 4. The numbers on the mailbox themselves represent the array indices, which in this instance begins at 0 and goes to 3. This means that to get to the 2nd mailbox, we look for it at the n - 1 position, where n = 2. Subsequently to get to the 3rd mailbox, it is at position 2 and so forth.  

Continuing with our mailbox example, if we want to add or remove items from any mailbox, we access it by the same way in which we look for it. Once we have found it, we can easily add and remove items from our mailbox. 


Arrays are very flexible data structures and can be used in a variety of different ways. A few of the advantages and disadvantages of arrays are outlined below.

Advantages:
  • Any item within the array can be accessed in constant time by its index - O(1)
  • Arrays contain only data, meaning that there is no wasted space used by links to other parts of memory.
  • Memory addresses used by arrays are successive providing fast data access.

Disadvantages:
  • Arrays are not always dynamic. In some languages, arrays are required to be created with a predefined size, therefore if we run out of space we will need to create a new array and copy the old data over.
  • Inserting and deleting an item from an arbitrary index in the array requires a shift of O(n) items on average to create or remove the space for the item, where n is the size of the array. 
  • It may be computationally costly to find continuous space in memory to store a very large array. 
  • Arrays generally come in two types, static and dynamic. Static arrays are arrays that are defined before the program is compiled, whereas dynamic arrays are created during runtime and can be deleted from memory. Although they are created during the programs runtime, dynamic arrays cannot be resized.  

Dynamic Arrays:

Although we cannot resize a dynamic array, what we can do is create an additional larger array and copy the items over and then return the larger array. The following sequence of steps outlines how we can leverage this operation to continuously add items to an array. 

1. Create a new array that is larger than the original array.
2. Copy the data from the original array over to the new array.
3. Free the memory that was taken by the old array and return the new array. 

Freeing the memory used by the old array is important because we should always free objects which we no longer wish to use. If we do not do this, each time we enlarge our array we keep more and more references to the old arrays we no longer use or can access which will eventually cause our program to run out memory.

How much is too much?
When we create this new larger array, it is important to consider how large we wish this new array to be, because it can be seen as computationally expensive to constantly have to go through the sequence of creating, copying, deleting and returning arrays. It might be logical to create an array that is large enough to never call this resizing sequence, but then we'd be wasting lots of space, however we also do not want to create a new array that is too small so that we always have to resize. 

A general rule of thumb when determining a new array size would be to double the size of the old array, from m to 2m. For n number of items in our array, at most we only perform this doubling operation log2n times.

Following the creation of the new array, we will need to copy the items from the old array into the new one. Each item is moved at most 2n times. For sufficiently large n, the total amount of work done to manage a dynamic array is O(n)

Implementation of Dynamic Arrays:

The simplest way to implement a dynamic array is to construct it by using a fixed size and dividing it into 2 parts, the first which is the actual space you want to store your items in, and the second is reserved space to allow for future expansion. By doing so we can add or remove elements from the end of the array in constant time by using that reserved space. 

Figure 2. The logical size of our array vs its capacity.

In Figure 2 above, we have an array of size 4, but we've divided it into 2 parts, the logical size and capacity. The yellow denotes the current number of items we have in the array. The logical size will overlap and point to the first available free slot in the array, which in the above example is at index 2.

Logical Size: Is the number of items that we currently have within our array.

Capacity: Is the maximum size of our array including the reserved space. If we exceed this capacity, we must resize our array.

To avoid incurring incurring costs by constantly resizing the dynamic array, we usually increase the size by a factor of 2 and use the reserved space for future expansion. When we insert an item into the array, we need to first check whether we have enough space to insert into, if we do we simply insert, however if we don't, we must perform the array resizing. 


Inserting an item to the end of a dynamic array:

A possible method to insert an item to the end of an dynamic array might look like this:


Figure 3. Method to insert an item at the end of an array

Following through the code in Figure 3, when the logical size is equal to the capacity it means that our array is full and we are required to make it bigger. We do this by entering inside the condition statement, where we increase the capacity by a factor of 2. 

We then call the method resizeArrayAndCopyItems, which resizes the array to the new capacity and copies all the items over and returns us a new array. For now we'll assume that the memory for the old array is freed within this method. 

The next step is to simply insert the item into the array index denoted by the logicalSize and then increment its value so that it points to the next empty slot in our array.

By doubling the array capacity each time it is full, we ensure that inserting n elements takes a constant O(n) time overall. 

Removing items from a dynamic array:
Just like how we can allocate additional memory when an array becomes full, we can do the opposite and de-allocate memory if our array becomes too empty. This usually occurs when we remove a large amount of items from our array and wish to save some space. This is called packing and it allows us to resize an array to exactly the size of the number of items that are currently in there. 

Of course, just like how we don't want to increase the array size all the time, we also do not want to decrease the size all the time either. The easiest method to follow is to do the opposite of what we did to increase, so when the logical size is half of the capacity we resize the array so the capacity is now equal to the logical size.

The following snippet of code is an example of how this removal is done and also to de-allocate memory when the logical size drops below half the capacity.


Figure 4. Removing an item from the end of the array. Notice the similarities to the insert method.

In the above code snippet, there is a slight issue when we combine both the adding and removing methods together. If we increase the capacity when its full by 2 and reduce the capacity by half when its half full, there will be an instance when we're in a adding and removing sequence where we'll be resizing upon each add and remove. Consider the following sequence:


Figure 5. Logical Size is 3, Capacity is 4.

In Figure 5, there are 3 items currently in the array and the capacity is 4. We perform a removal first and then check to see if the logical size is half the capacity. 

Figure 6. Logical Size is 2, Capacity is 4.

Once we've removed, our logical size is 2 and the capacity is 4, meaning that we'll need to resize.



Figure 7. Logical Size is 2, Capacity is 2.

Now that we've resized, the logical size and capacity are the same. But what if our next operation is to add an item to the array. Since our array is full, we'll need to increase the capacity size by 2 again, which means we'll also need to copy the items over.


Figure 8. Logical Size is 3, Capacity is 2.


You'll notice that we're seemingly back at the start again if we add an item. If we remove the item again, we'll have to perform another resize. Thus if we keep on adding and removing, it'll become extremely expensive to constantly have to resize. A way to get around this issue is to resize the array only when the logical size is about 1/3 of the capacity and then reduce the capacity by half. This way, even if we perform an add on the next step, we will avoid having to immediately resize the array again. It might cost a little bit of additional space but it'll save time by avoiding the constant resizing. 

Aside from packing, there is another term call trimming. Trimming allows us to trim the array to the exact space that it occupies wasting no memory. This can be used when we've modified the array and wish to leave it finalised. 

Another term called trimming allows us to allocate exactly the space for the array so we waste no memory. This can be used when we've populated it and want to leave it as is.

Inserting into a desired index in an dynamic array:

Because we can search an array by its index, it means that we can also insert items to any index in the array. The method in which to do so is similar to the insertion method mentioned above, just that we now need to perform an additional check to see that we have at least logical size + 1 space. This is because we'll need room to insert the element into its desired index i and also to shift all the elements from i to logical size - 1 over to the right.

Basically we want to clear a space to insert the item into the desired position. However if we want to insert it at the end, we don't need to shift any items. 


The algorithm to insert an item to a desired index in an array:

1. Find the index in which we want to insert the item into.
2. Shift the items from index i to logicalSize - 1, 1 position to the right.
3. Pack or trim the array if required. 




Figure 9. Code snippet to insert an item into a particular index of an array.

Removing from a desired index in an dynamic array:
The removal method works basically the same as the insertion, except that instead of removing the item specifically, setting it to an invalid value or otherwise, we can simply copy all the items 1 spot to the left meaning that the item is overwritten rather than removed, and then we just reduce the logicalSize of the array by 1. After that we can perform some packing to free up some additional memory.

The algorithm follows these steps:

1. Find the index of the element we want to remove
2. Shift elements from i to size - 1, 1 position to the left
3. Pack the array to reduce the memory size.



Figure 10. Code snippet to remove an item from a particular index of an array.


Dynamic array performance:

  • A dynamic array has performance similar to a normal array with the benefit of being able to add/remove elements. 
  • Getting or setting a value at a particular index takes constant time O(1) 
  • Iterating over the array in order takes linear time O(n)
  • Inserting or deleting an element in the middle of the array takes linear time O(n)
  • Inserting or deleting an element at the end of an array takes constant time O(1)
Comparison with Linked Lists:


  • Comparing dynamic arrays with linked lists, dynamic arrays have a faster indexing time (constant vs linear) and typically faster iteration due to locality of reference. However dynamic arrays are slower at insertion or deletion in an arbitrary location where they require linear time compared to link lists that can do it in constant time.
  • Linked lists also have the advantage of being able to be stored in fragmented memory as they do not require to be stored contiguously, as their data structure has a pointer to the next memory address location. 


Summary:

Arrays are very simply yet powerful data structures for storing information. The ability to search an array by index allows for easy addition and removal of items. Notes worth taking from this section are:
  • Constant time access to get/set an item at a particular index.
  • With languages where array indices start at 0, the nth item is located at position n - 1.
  • To efficiently enlarge a dynamic array, expand it by a factor of 2 when the logical size reaches capacity.
  • To efficiently reduce the size of a dynamic array, reduce the capacity by half only when the logical size is 1/3 of the capacity to avoid having to constantly resize when adding/removing consecutively. 
  • When inserting into a selected index, make sure that there is logical size + 1 space still available as we need to shift all items to the right of the selected index across to accommodate the insertion. 
  • When removing from a selected index, we can just shift all the items on the right of the selected index across to the left by 1 and reduce the logical size.

Exercise: 
1. In a chosen language, implement a dynamic array by using primitives (no API's). The dynamic array must be able to insert and remove items from the end of the array and resize accordingly.


2. Expand the functionality from the previous question to allow for the insertion and removal from a given index. 

No comments:

Post a Comment