Pages

Sunday, September 4, 2016

A Simple Quicksort

Let's try to understand the Quicksort algorithm, as always, our intent here is, NOT to write the most optimal implementation of Quicksort, but to dissect the algorithm and learn it, so that we will be able to remember it for a long time.

Quicksort is a very efficient sorting algorithm invented by C.A.R. Hoare, It has two phases
  1. The partition phase and
  2. The sort phase.
Most of the work is done in the partition phase, it divides the array into two subarrays with elements less than pivot in lower subarray and elements greater than pivot in upper subarray. The sort phase simply sorts the two smaller problems that are generated in the partition phase.

Don't bother, if you don't get the above description, we'll get into the details, by the end of this article, you'll be able to implement the Quicksort algorithm on your own.

The tricky part about the algorithm, is to understand the partition phase correctly. So let's spend some time first isolating the problem and implementing the partition phase first.

Quicksort Partition Phase :

The problem statement for this phase is simple, We are given an array and an index in the array. All we need to do is, order the elements in the array such that
  1. All elements less than value of the index element, is in the left side of the given index.
  2. All elements that are greater than the value of the index, is in the right side of the given index.
Say we have an array like,  array = [9, 4, 2, 6, 5, 8, 3, 11, 0] and we are given an index = 4  (Note : index starts from 0-zero ), so we have

array[index] = 5 the value, in the above array for given index 4.

Now the partition phases say's we have to move
  1. All elements, less than 5, to left side of the index(4). 
  2. All elements, greater than 5 to right side of the index(4).

Step 1 : Choose a pivot

In the above we said an index is given to us, one way to arrive at an index/pivot is as below. In process let's also initialize some more indexes/variables, we'll be using in the rest of the code.

Let's says, an array is given to us as

        int array[] = {9, 4, 2, 6, 5, 8, 3, 11, 0};
  1. Let's choose middle of the array as our partition/pivot index, see below snippet
  2. We initialize the start of the array to the variable 'lowerIndex' (initial value)
  3. We initialize end of the array to the variable 'higherIndex' (initial value)
        int pivotIndex = array.length / 2; 

        int lowerIndex = 0; 
        int higherIndex = array.length - 1; 

Here is the pictorial representation of the above array with the pivot index highlighted.

Step 2 : Find an element greater the pivot value

Lets start from beginning of array, find an element greater than 5. Do not go beyond the pivot index(4), i.e. search from index 0, to 3. So, we have the below index called 'lowerIndex' identified.

Now lowerIndex has the correct value, initially we had set it to 0 the beginning of the array - It happened by coincidence that element at 0 is greater than pivot index in this example - However it need not be the case always.



Let's implement the above step 2 as a function

    /*
     * Starts the search from startIndex, increments start index until it finds
     * an element greater than element at pivot
     */
    private int findElementHigherThanPivot(int array[], int startIndex, int pivotValue) {
        while (array[startIndex] < pivotValue) {
            startIndex++;
        }
        return startIndex;
    }

Step 3 : Find an element lesser the pivot value

Let's do the similar stuff for endIndex, start the search from end of the array, find an element lesser than 5,  Do not go beyond the pivot index(4), i.e. search from index 8, to 5



Let's implement the above step 3 as a function

    /*
     * Starts the search from endIndex, decrements end index until it finds
     * an element less than element at pivot
     */
    private int findElementLowerThanPivot(int array[], int endIndex, int pivotValue) {
        while (array[endIndex] > pivotValue) {
            endIndex--;
        }
        return endIndex;
    }

Step 4 : Swap conditionally

Now we have identified 2 indexes (Note : We are dealing with indexes in the below code and NOT the value in the indexes) 
  1. An element greater than the pivot index (from step 2 - called lowerIndex = 0)
  2. An element lesser than the pivot index (from step 3 - called higherIndex = 8)
We swap only if lowerIndex is less than the higherIndex, a safeguard or an exit which tell us when to stop doing the above processes. (0 < 8 - Hence we swap)



After swap, we increment the lowerIndex by one and decrement higherIndex value by one (again these are initial values for next iteration like we did in step 1)

Here is the code snippet that does this step.

   //The below code uses 'i' as lowerIndex and 'j' as higherIndex
   //Find the element that is higher than pivot from lowerIndex

   int i = findElementHigherThanPivot(array, i, pivotValue);
   //Find the element that is lower than pivot from higherIndex

   int j = findElementLowerThanPivot(array, j, pivotValue);

   // Simple swap code
   // if the lowerIndex value is less than higherIndex value, swap

   if (i <= j) {

      int tmp = array[i];
      array[i] = array[j];
      array[j] = tmp;

      i++;
      j--;
   }

Step 5 : Repeat until we have covered all the elements of the array

Continue step 2, 3, & 4 until all the elements less than 5 is in lower part of the array and all elements greater than 5 are in upper part of the array.



The pseudo code for this step doing in the repeated fashion is as below. Again the below code uses 'i' as lowerIndex and 'j' as higherIndex

int i = 0; //lowerIndex
int j = array.length - 1; //higherIndex
while (i <= j) { //Do step2 i = findElementHigherThanPivot(array, i, pivotValue);
//Do step3 j = findElementLowerThanPivot(array, j, pivotValue); //Do step4 - Swap code if (i <= j) {
//swap goes here } //End if } //End while



Here is the final state of the array - We have no more elements to which satisfy our step4 condition, so the loop just finds NO more elements to swap - so, it completes after iterating without any changes to array.


Quicksort Partition Code:

Having covered all the sections - Here is the complete sample partition code of Quicksort - Its nothing but all above snippets put together as a function and some indexes being passed as parameters to the function rather than we computing.

    /*
     * Sample Partition phase of quicksort simplified partition phase 
     * for easier understanding.
     */
    private int samplePartition(int[] array, int lowerIndex, int higherIndex) {

        int i = lowerIndex, j = higherIndex;

        // choose the middle element of the array as pivot
        int pivotIndex = (lowerIndex + higherIndex) / 2;
        int pivotValue = array[pivotIndex];

        // Printing debug information : For understanding
        System.out.println("Partitioning around index : "
                + pivotIndex + " with value : " + pivotValue);

        System.out.println("(Elements less than : " + pivotValue + ")-----then "
                + pivotValue
                + " then----(Elements greater than : " + pivotValue + ")");

        //Loop as long as the lowerIndex is less than higherIndex
        while (i <= j) {

            //Find the element that is higher than pivot from lowerIndex
            i = findElementHigherThanPivot(array, i, pivotValue);

            //Find the element that is lower than pivot from higherIndex
            j = findElementLowerThanPivot(array, j, pivotValue);

            // Simple swap code
            // if the lowerIndex value is less than higherIndex value, swap
            if (i <= j) {

                int tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;

                i++;
                j--;
            }
        }

        return i;
    }

Why not return pivotIndex ?

In all the above code, we return the last value of "i" as the pivot index and NOT the pivotIndex which we calculated as initially as 

        int pivotIndex = (lowerIndex + higherIndex) / 2; 

I leave this as an exercise for you to work out the above steps say for a sample array like

        int sampleArray2[] = {7, 5, 5, 2, 1};

I am sure, if you workout the indexes on a paper you'll get the reason behind it. You can also download the code attached below and try it.

Quicksort : The Sort Phase

Having completed the partition phase, now lets look at the quicksort algorithm itself,  Here is a simplified excerpt from the wikipedia

  1. Pick an element, called pivot, from the array.
  2. Partitioning: the array (we have covered what partition does - so skipping the details)
    1. After partition we can think it as , We have 2 sub arrays, one array with smaller values than pivot and other with larger value than pivot.
  3. Recursively apply the above steps to the two sub-arrays  i.e. 
    1. Apply Quicksort for elements with smaller values and 
    2. Apply Quicksort for the sub-array of elements with greater values.

Let's implement the above algorithm - The only condition added to the code is to make sure we have a exit condition for the recursive function, rest is direct mapping of the above algorithm text to code.

    private void quicksort(int[] array, int lowerIndex, int higherIndex) {

        if (lowerIndex < higherIndex) {

            int pivot = partition(array, lowerIndex, higherIndex);

            //Sort the lower part
            quicksort(array, lowerIndex, pivot - 1);
            //Sort the upper part
            quicksort(array, pivot + 1, higherIndex);
        }
    }

Complexity of Quicksort

In quicksort, in the partition phase we pick an element called the pivot in each step and re-arrange the array in such a way that all elements less than the pivot now appear to the left of the pivot, and all elements larger than the pivot appear on the right side of the pivot.

So what is the complexity of partition phase its : O(n) - Since we need to iterate over all the elements of the array to move/swap elements as compared with the pivot.

Its is important to note that once partition is complete we say a pivot element is the right place of the array,  the position of this pivot element in the array will remain unchanged for all subsequent iterations.

Best case & Worst case

In quick sort, the best case scenario occurs when we choose an pivot element that splits the given array into 2 equal subarrays, i.e for an array of 'n' elements. The partition phase produces two subarrays of length n/2.

The worst case scenario occurs when partition is done such as one sub array is of size 1, and other subarray is of size (n - 1).

Quicksort's performance is dependent on the pivot selection in the partition phase. The most naive way to choose a pivot is to, just choose the first element in the array as the pivot. This results in worst case behavior if the data is already sorted (the first element will always be the minimum in this case). In our code, we had chosen the pivot to be the middle element of the array.

The complexity works out to :  for worst case and  O(nlogn) for best case. For the exact derivations please check this link. For more details on general complexity analysis please check out this article on how to find the complexity of the given code.

Quicksort Complete Code

Here is the complete code of the sample QuickSort. The code has 2 versions of Quicksort implemented one which is used in this article and other a cleanup and optimized version which you typically see on all over the web with comments for easier understanding.

/*
 * Copyright CodenLearn.com
 *
 * License : http://www.apache.org/licenses/LICENSE-2.0
 *
 * Quicksort is a very efficient sorting algorithm invented by C.A.R. Hoare.
 * It has two phases:
 *
 * 1. the partition phase and
 * 2. the sort phase.
 * 
 * Most of the work is done in the partition phase, it divides the array
 * into two subarrays with elements less than pivot in lower subarray
 * and elements greater than pivot in upper subarray.
 *
 * The sort phase simply sorts the two smaller problems 
 * that are generated in the partition phase.  
 *
 *  Complexity Avg case : O(nlgn) - Worst case O(n pow 2)
 */
public class QuickSort {

    /*
     * Starts the search from startIndex, increments start index until it finds
     * an element greater than element at pivot
     */
    private int findElementHigherThanPivot(int array[], int startIndex, int pivotValue) {
        while (array[startIndex] < pivotValue) {
            startIndex++;
        }
        return startIndex;
    }

    /*
     * Starts the search from endIndex, decrements end index until it finds
     * an element less than element at pivot
     */
    private int findElementLowerThanPivot(int array[], int endIndex, int pivotValue) {
        while (array[endIndex] > pivotValue) {
            endIndex--;
        }
        return endIndex;
    }

    /*
     * Sample Partition phase of quicksort simplified  partition phase 
     * for easier understanding.
     */
    private int samplePartition(int[] array, int lowerIndex, int higherIndex) {

        int i = lowerIndex, j = higherIndex;

        // choose the middle element of the array as pivot
        int pivotIndex = (lowerIndex + higherIndex) / 2;
        int pivotValue = array[pivotIndex];

        // Printing debug information : For understanding
        System.out.println("Partitioning around index : "
                + pivotIndex + " with value : " + pivotValue);

        System.out.println("(Elements less than : " + pivotValue + ")-----then "
                + pivotValue
                + " then----(Elements greater than : " + pivotValue + ")");

        //As long as the lowerIndex is less than higherIndex
        while (i <= j) {

            //Find the element that is higher than pivot from lowerIndex
            i = findElementHigherThanPivot(array, i, pivotValue);

            //Find the element that is lower than pivot from higherIndex
            j = findElementLowerThanPivot(array, j, pivotValue);

            // Simple swap code
            // if the lowerIndex value is less than higherIndex value, swap
            if (i <= j) {

                int tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;

                i++;
                j--;
            }
        }

        return i;
    }

    private void sampleQuicksort(int[] array, int lowerIndex, int higherIndex) {

        if (lowerIndex < higherIndex) {

            int pivot = samplePartition(array, lowerIndex, higherIndex);

            //Sort the lower part
            sampleQuicksort(array, lowerIndex, pivot - 1);
            //Sort the upper part
            sampleQuicksort(array, pivot + 1, higherIndex);
        }
    }

    /*
     * A clean version of partition implementation - same as samplePartition().
     * Having done samplePartition(), Now it should be easier to understand
     * the below partition() method used in quicksort
     */
    private int partition(int[] array, int lowerIndex, int higherIndex) {

        int i = lowerIndex, j = higherIndex;

        int tmp;

        // choose the middle element of the array as pivot
        int pivot = array[(lowerIndex + higherIndex) / 2];

        //As long as the lowerIndex is less than higherIndex
        while (i <= j) {

            //Find the element that is higher than pivot from lowerIndex
            while (array[i] < pivot) {
                i++;
            }

            //Find the element that is lower than pivot from higherIndex
            while (array[j] > pivot) {
                j--;
            }

            // Simple swap code
            // if the lowerIndex value is less than higherIndex value, swap
            if (i <= j) {

                tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;

                i++;
                j--;
            }
        }

        return i;
    }

    /*
     * Quicksort is a very efficient sorting algorithm invented by C.A.R. Hoare.
     * It has two phases:
     *
     * 1. the partition phase and
     * 2. the sort phase.
     * 
     * Most of the work is done in the partition phase, it divides the array
     * into two subarrays with elements less than pivot in lower subarray
     * and elements greater than pivot in upper subarray.
     *
     * The sort phase simply sorts the two smaller problems 
     * that are generated in the partition phase.  
     */
    private void quicksort(int[] array, int lowerIndex, int higherIndex) {

        if (lowerIndex < higherIndex) {

            int pivot = partition(array, lowerIndex, higherIndex);

            //Sort the lower part
            quicksort(array, lowerIndex, pivot - 1);
            //Sort the upper part
            quicksort(array, pivot + 1, higherIndex);
        }
    }

    /*
     * Utility function - Given an array just prints it to output/console
     */
    public static void dumpArray(int array[]) {

        StringBuilder output = new StringBuilder();
        for (int value : array) {
            output.append(value).append(",");
        }
        System.out.println("Array as string : " + output);
    }

    public static void main(String[] args) {

        QuickSort quickSort = new QuickSort();

        int sampleArray1[] = {9, 4, 2, 6, 5, 8, 3, 11, 0};
        int sampleArray2[] = {7, 5, 5, 2, 1};

        int sampleArray3[] = {9, 3, 1, 2, 1, 0, 8, 7, 6, 5};
        int sampleArray4[] = {9, 9, 5, 9, 9, 0, 0, 0, 0};

        System.out.println("\n\n****** Testcase 1 - Demonstrate Partition ****  ");
        dumpArray(sampleArray1);
        quickSort.samplePartition(sampleArray1, 0, (sampleArray1.length - 1));
        dumpArray(sampleArray1);

        System.out.println("\n\n****** Testcase 1 - Demonstrate return Index ****  ");
        dumpArray(sampleArray2);
        quickSort.samplePartition(sampleArray2, 0, (sampleArray2.length - 1));
        dumpArray(sampleArray2);

        System.out.println("\n\n****** Testcase 3 - Demonstrate QuickSort ****  ");
        dumpArray(sampleArray3);
        quickSort.sampleQuicksort(sampleArray3, 0, (sampleArray3.length - 1));
        dumpArray(sampleArray3);

        System.out.println("\n\n****** Testcase 4 - Cleanedup QuickSort ****  ");
        dumpArray(sampleArray4);
        quickSort.quicksort(sampleArray4, 0, (sampleArray4.length - 1));
        dumpArray(sampleArray4);
    }
}

1 comment: