Pages

Sunday, February 7, 2016

A Simple Heapsort and Heap data structure (Priority Queues)

To understand Heapsort we need to spend a little bit of time understanding the Heap data structure. So lets try to implement a simplified version Heap and later build a Heap Sort based on it

Priority queues are often implemented over the Heap data structure i.e Heap data structure has the property that it serves the next element which is of the highest priority.

Note: The code below is not optimized/clean, it is written for easier understanding and verbose in nature to help the reader grasp the algorithm and the data structure.

Heap data structure : 

Heap is a specialize binary tree data structure that satisfies the following properties
  1. Heap is a complete binary tree : A binary tree is said to be complete, if all its levels are filled except for the deepest level.  i.e deepest level of the tree may not be complete since it does not have enough nodes in the tree, say we have only 2 node in a tree, in this case the root node itself will have only one child while the other child is empty.
  2.  Heap Property : Any parent node in the tree is larger (or can be configured to be smaller) than its children's - Hence the root node has the largest (or smallest) of all the elements in the tree
    1. If largest element is stored in the root node it is called max heap
    2. If smallest element is stored in the root node it is called a min heap

Heap Implementation :

Usually Heap is implemented over arrays. In such a case here are additional properties of the heap, Given any node i  i.e where i is the index of a node in the array
  1. Parent of node i is calculated using the formula  = (i - 1) / 2
  2. Left child of node i is calculated using the formula  =  2 * i + 1
  3. Right child of node i is calculated using the formula  = 2 * i + 2
So lets implement this first as methods so that its easier to use in our code later
    private int getLeftChildIndex(int nodeIndex) {
        return 2 * nodeIndex + 1;
    }
    private int getRightChildIndex(int nodeIndex) {
        return 2 * nodeIndex + 2;
    }
    private int getParentIndex(int nodeIndex) {
        return (nodeIndex - 1) / 2;
    }

Now let us setup the basic structure of the Java class before we get into the complicated methods. The above methods which we wrote, will be eventually folded into this class. So for the purpose of understanding lets go over, section by section of the code.

Basic Java class setup of Heap :

public class Heap { 
    private int[] data;              // We are going to store integers in our heap data structure
    private int heapSize;        // A counter which stores the current size of heap 
    public Heap(int size) {    // A consumer/user of the API create a Heap of specified size
        data = new int[size];
        heapSize = 0;
    } 
    public int getMinimum() {     // Returns the element at the root (min heap)
        if (isEmpty()) {                   // This method does NOT delete the root/minimum
            throw new HeapException("Heap is empty");
        } else {
            return data[0];
        }
    } 
    public boolean isEmpty() {   // A simple method to test if the heap is empty
        return (heapSize == 0);
    } 
}

Inserting a element into the Heap :

Inserting into the Heap can be simplified as follows
  1. Insert the new element into the end of the heap
  2. Since the inserted new element might break the heap property (i.e the parent node has the smallest or largest element compared to its children) - We need to fix the tree, so we call shiftUp() method which recursively fixes the tree by traversing up the tree
Here is an implementation of insert method
    public void insert(int value) {
        if (heapSize == data.length) {
            throw new HeapException("Heap's underlying storage is overflow");
        } else {
            heapSize++;
            data[heapSize - 1] = value; 
            shiftUp(heapSize - 1);
        }
    }

The shiftUp() operation :

Shifting up is done as follows (Note: its a recursive function)
  1. Compare the value of the given node with its parent's node value.
  2. If they do NOT satisfy the heap property swap them ( In our case of min heap, parent node value should be less than the current node's value)
  3. Continue recursively moving up the tree i.e do shiftUp() on the parent
Here is an implementation of the above explanation
    private void shiftUp(int nodeIndex) {
        int parentIndex, tmp; 
        if (nodeIndex != 0) {
            parentIndex = getParentIndex(nodeIndex); 
            if (data[parentIndex] > data[nodeIndex]) {
                tmp = data[parentIndex];                       //Swapping code
                data[parentIndex] = data[nodeIndex];
                data[nodeIndex] = tmp; 
                shiftUp(parentIndex);                           //Move up the tree
            }
        }
    } 

Deleting the min/max from Heap : 

Since we are implementing a min heap lets call it removeMin() - Removing a element from the heap can be simplified as follows
  1. Remove the root node - The root node is the first element in the array
  2. Take the last element from the array and copy it as the root node
  3. Since moving the last element as new root node might break the heap property we call another recursive function called shiftDown() - which fixes the tree
Here is an implementation for the same
    public void removeMin() {
        if (isEmpty()) {
            throw new HeapException("Heap is empty");
        } 
        data[0] = data[heapSize - 1];
        heapSize--; 
        if (heapSize > 0) {
            shiftDown(0);
        }
    }

The shiftDown() operation :

Shifting down is done as follows (Note: its a recursive function)
  1. Case 1 : If current node has no children, shifting is over
  2. Case 2 : If current node has one child, check, if heap property is broken, if yes then swap current node's value and child value, continue shifting down the child
  3. Case 3 : If current node has two children: find the smallest of them - If heap property is broken, then swap current node's value and selected child value continue shifting down the child.
Here is an implementation of the above explanation, The implementation has the below code blocks

  1. Given a node index find its left and right child
  2. Make sure that left child and right child are within the bounds 
    1. We may have created a heap of 10 elements/size, but inserted only 2 elements - So check  the index of left and right child are within the limits of the current heap size (if not they don't exist)
  3.  Given two nodes i.e left and right child, find the smallest of them and assign it to a variable (minIndex in the below code)
    1. Note : We may NOT always have 2 children because of the above statement  i.e children out of range or beyond current heap size, take care of this edge case.
  4. Swap the current node value with the minimum value we just found, only if it violates the heap property (In our min hash case current node value is greater than minimum value we swap)
  5. Recursively fix the tree by moving down the tree

    private void shiftDown(int nodeIndex) {
        int minIndex; 
        int leftChildIndex = getLeftChildIndex(nodeIndex);
        int rightChildIndex = getRightChildIndex(nodeIndex);
  
        if (rightChildIndex >= heapSize) {          // Find the minIndex i.e smallest child
            if (leftChildIndex >= heapSize) {        // if both child exists else assign left/right
                return;                                               // child as minIndex child
            } else {
                minIndex = leftChildIndex;
            }
        } else {
            if (data[leftChildIndex] <= data[rightChildIndex]) {
                minIndex = leftChildIndex;
            } else {
                minIndex = rightChildIndex;
            }
        }
     
        if (data[nodeIndex] > data[minIndex]) {
            int tmp = data[minIndex];                        //Swap code if heap property is violated
            data[minIndex] = data[nodeIndex];
            data[nodeIndex] = tmp; 
            shiftDown(minIndex);                            //Recurse, continue fixing the tree
        }
    }

Heapsort Implementation :  

 The easiest way to implement heap sort is as follows
  1. Insert ALL the elements into the Heap 
  2. Keep removing the mininum element from the heap until it becomes empty
Here is an implementation for the same
    public void doHeapSort(int array[]) {
        for (int value : array) {     //Insert all elements into the Heap
            insert(value);
        } 
        while (!isEmpty()) {        //Print the elements - which will be in sorted order
            System.out.println(getMinimum());
            removeMin();
        }
    }

Optimizations : 

Now that you have learned the data structure and the algorithm, I am sure there is plenty of resources on the web that gives you a concise/optimized implementations of Heap sort which you'll have no trouble understanding.

There are better optimized implementations available rather than inserting and deleting like we did to understand Heap sort, I'll let you probe further on this. 

Heap sorts have a time complexity of  O(n log n) 

Complete Heapsort Java Code :

You can download the full java file from here HeapSort.java

No comments:

Post a Comment