Friday, September 1, 2023

Comparison of Sorting Algorithms - in-depth analysis



Let's go through and compare the five most famous algorithms that we already covered in previous posts: Bubble sort, Selection sort, Insertion sort, Merge sort, and Quick sort.


At the end, we will have a good understanding of the pros and cons of each algorithm, and we will also put together a cheat sheet that can be very helpful when preparing for an interview.

Properties that we will be exploring:
  • Time Complexity - Every algorithm has a best, average, and worst case.
    The best case is usually when the input array is already sorted, so only the minimum number of operations is needed to return the sorted array.
    The worst case is when the maximum number of operations is needed to sort the input array.
    The average case describes the expected number of operations over all possible inputs.
  • Space Complexity - How much extra memory is needed to perform the sorting.
  • Online - Indicates whether the algorithm can sort data as it arrives. If it is online, we don't need to provide the whole array beforehand: elements can keep arriving while the sort is running, and the algorithm continues undisturbed.
  • Stable - If the input array contains equal elements, an algorithm that preserves their relative order is called stable; one that may swap their positions is called unstable.
  • In-place - An in-place algorithm does not need significant extra space; it produces the output in the same memory that holds the input, transforming it 'in place'.

Comparison-based sorting algorithms

All of the above five algorithms plus heap sort are comparison-based algorithms. That means to get to the sorted order output, we need to compare the elements with each other.

It is important to note that any comparison-based sorting algorithm requires at least Ω(n log n) comparisons in the worst case - no comparison sort can beat this lower bound, even though individual best cases (for example, insertion sort on already-sorted input) can be faster.


Let's see how these algorithms use comparison to do the sorting process:
  • Bubble sort - Compares the elements to bubble up the maximum to the end
  • Selection sort - Compares the elements to find the minimum, which then gets placed into the sorted part
  • Insertion sort - Compares the elements to determine the correct position of the current element in the sorted part of the array
  • Merge sort - Compares the elements of two sorted halves to merge them into the final sorted array
  • Quick sort - Compares the elements to partition the unsorted array into two different halves around the pivot
  • Heap sort - Compares the elements during the heapify process to place the elements into the correct position of the sorted array
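As a quick illustration of the heapify-based approach, here is a minimal heap sort sketch using Python's standard heapq module (a min-heap variant that pops the minimum repeatedly, rather than the classic in-place max-heap version):

```python
import heapq

def heap_sort(arr):
    """Heap sort sketch: heapify the input, then pop the minimum n times."""
    heap = list(arr)
    heapq.heapify(heap)  # bottom-up heapify runs in O(n)
    # each pop re-heapifies in O(log n), giving O(n log n) overall
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

Note that this version copies the input into a separate heap for simplicity; the textbook in-place heap sort builds a max-heap inside the array itself.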


Sorting algorithms - in-depth analysis

Let's explore the algorithms based on the properties we mentioned above:

Time Complexity

Bubble sort

Worst case O(n^2) - when the input array is sorted in reverse order. The largest element starts at the beginning of the array, so every element has to be swapped repeatedly to bubble toward its final position at the end.

Average time O(n^2) - when the array is not sorted, or partially sorted.

Best time O(n) - when the array is already sorted (ascending). After the first full pass, the swapped flag remains false, so we stop iterating: no swaps in a pass means the array is already sorted.
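The early-exit behavior can be sketched as follows (a minimal Python version; the swapped flag is the mechanism described above):

```python
def bubble_sort(arr):
    """Bubble sort with the 'swapped' flag that gives the O(n) best case."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):  # the last i elements are already in place
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # a full pass with no swaps: the array is sorted
            break
    return arr
```

On an already-sorted array the outer loop runs exactly once, performing n-1 comparisons and zero swaps.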



Selection sort

Worst and Average time O(n^2)  - to find the minimum, we need to iterate through the entire array.

Best time O(n^2) - it is still O(n^2) because, even for an already-sorted array, we must traverse the entire unsorted part to find each minimum. No swaps will occur, but the comparisons remain.
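A minimal sketch showing why the full scan is unavoidable (the inner loop always runs to the end of the array, regardless of input order):

```python
def selection_sort(arr):
    """Selection sort: the inner scan always covers the whole unsorted part."""
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):  # full scan to find the minimum - even if sorted
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr
```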

 

Insertion sort

Worst case and Average case O(n^2) - When the array is not sorted or sorted in reverse order. In that case, a lot of comparisons and swapping will occur.


Best case O(n) - if the array is already sorted, insertion sort performs n-1 comparisons and no shifts; the inner while loop never runs.
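A minimal Python sketch; the inner while loop is the part that stays idle on sorted input, giving the O(n) best case:

```python
def insertion_sort(arr):
    """Insertion sort: shift larger elements right, then drop the current one in."""
    for i in range(1, len(arr)):
        current = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > current:  # never true if input is sorted
            arr[j + 1] = arr[j]  # shift the larger element one slot right
            j -= 1
        arr[j + 1] = current
    return arr
```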



Merge sort

Merge sort has a time complexity of O(n log n) in all cases - worst, average, and best - as it always divides the array into two halves and takes linear time to merge them. Whether the input array is sorted or unsorted, the same steps and operations are executed.
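A minimal recursive sketch; note that the split and the linear-time merge happen unconditionally, which is why all three cases cost the same:

```python
def merge_sort(arr):
    """Merge sort: always split in half, always merge in linear time."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stable)
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```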



Quick sort

Best and average case O(n log n) - The best case for quicksort occurs when the pivot chosen at each step divides the array into roughly equal halves, producing balanced partitions and efficient sorting. Even with imperfect pivots, the average case is still O(n log n), which makes this algorithm suitable for sorting large sets of data.

Worst case O(n^2) - with a naive pivot choice (for example, always the first or last element), input that is sorted, reverse sorted, or all equal produces highly unbalanced partitions: one side of every partition is nearly the whole array.
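A minimal sketch using the last element as the pivot (Lomuto partition scheme); this is exactly the variant that degrades to O(n^2) on already-sorted input:

```python
def quick_sort(arr):
    """Quick sort with a last-element pivot (Lomuto partition)."""
    def partition(lo, hi):
        pivot = arr[hi]  # naive pivot choice: always the last element
        i = lo - 1
        for j in range(lo, hi):
            if arr[j] < pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
        return i + 1  # final position of the pivot

    def sort(lo, hi):
        if lo < hi:
            p = partition(lo, hi)
            sort(lo, p - 1)   # left of pivot
            sort(p + 1, hi)   # right of pivot

    sort(0, len(arr) - 1)
    return arr
```

Choosing a random or median-of-three pivot instead makes the worst case very unlikely in practice.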



Space Complexity

Bubble sort:
Space complexity is always O(1) - because we are not creating any data structure, hence no additional memory will be taken.

Selection sort:
Space complexity is always O(1) - because we are not creating any data structure, hence no additional memory will be taken.

Insertion sort:
Space complexity is always O(1) - we are not creating new data structures during the sorting process

Merge sort:
Space complexity is O(n) since the merge step creates temporary arrays, so we need extra memory proportional to the size of the input array.

Quick sort:
Space complexity is O(log n) on average - quicksort creates no new data structures, but each recursive call consumes stack space. With balanced partitions the recursion depth is O(log n); in the worst case (unbalanced partitions) it grows to O(n).



Online and offline

All of the above-mentioned algorithms are offline, except insertion sort.

If more elements are added to the input array at runtime, insertion sort is not disturbed and still produces the final sorted array as intended.
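The online property can be sketched like this (a hypothetical online_insert helper; each arriving value is inserted into the already-sorted prefix, which is exactly one insertion sort step):

```python
def online_insert(sorted_arr, value):
    """Insert a newly arriving value into an already-sorted list."""
    sorted_arr.append(value)
    j = len(sorted_arr) - 1
    while j > 0 and sorted_arr[j - 1] > sorted_arr[j]:  # bubble it into place
        sorted_arr[j - 1], sorted_arr[j] = sorted_arr[j], sorted_arr[j - 1]
        j -= 1
    return sorted_arr

# elements arriving one at a time - the list stays sorted after every arrival
stream = [4, 1, 3, 2]
result = []
for x in stream:
    online_insert(result, x)
```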


Stable and Unstable

Bubble sort, Insertion sort, and Merge sort are stable algorithms since they preserve the order of the duplicate elements in the array. 

Selection sort and Quick sort are unstable sorting algorithms.
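A small demonstration of why selection sort is unstable: its long-distance swap can carry an element past an equal one. Here records are (key, label) pairs sorted by key only (the helper name and data are my own, for illustration):

```python
def selection_sort_by_key(pairs):
    """Selection sort comparing only the numeric key; the swap breaks stability."""
    n = len(pairs)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if pairs[j][0] < pairs[min_idx][0]:
                min_idx = j
        pairs[i], pairs[min_idx] = pairs[min_idx], pairs[i]  # long-distance swap
    return pairs

records = [(2, 'a'), (2, 'b'), (1, 'c')]
print(selection_sort_by_key(records[:]))    # (2, 'b') now precedes (2, 'a'): unstable
print(sorted(records, key=lambda p: p[0]))  # Python's built-in sort is stable
```

The swap at i=0 moves (2, 'a') behind (2, 'b'), reversing the original order of the equal keys; a stable sort such as Python's built-in sorted() keeps 'a' before 'b'.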


In-place

Bubble sort, Selection sort, Insertion sort, and Quick sort are in-place algorithms since they rearrange elements within the input array itself (quicksort's recursion stack aside).

Among the algorithms discussed here, merge sort is the only one that requires O(n) additional memory.



How to choose the best sorting algorithm?


The choice of the best sorting algorithm depends on several factors, including the size of the input data, the order of the data, memory usage, stability, performance, etc.

Here are some tips:

  • For small input data, a simple algorithm like insertion sort can work best. However, for larger data sets, more efficient algorithms like quicksort, merge sort, or heapsort are the best choices.
  • If the data is almost sorted, insertion sort works best with O(n) time complexity. If the data is random, quicksort, merge sort, or heapsort can be better options.
  • When memory usage is an important consideration, algorithms like heapsort or quicksort are preferred over merge sort.
  • For sorting linked lists, merge sort is the optimal choice. It is relatively simple to implement and requires O(n log n) time and O(1) extra space. However, linked lists have slow random-access performance, resulting in poor performance for algorithms such as quicksort and making others like heapsort infeasible.
  • In a parallel computing environment, merge sort is often the preferred choice due to its divide-and-conquer approach. This method divides the input equally at each stage, and each smaller sub-problem is independent of the others. This makes it easy to distribute and process data in parallel across multiple clusters.
  • Quick sort and merge sort can be relatively simple to implement, but heapsort may require a deeper understanding of binary heaps.
  • If stability is a concern, merge sort is the best choice because quicksort and heapsort are not stable sorting algorithms.
  • Quick sort can be cache-friendly due to its in-place sorting property and fewer memory accesses compared to merge sort. Heapsort can also have good cache performance due to its use of binary heaps.
  • Regardless of the order of the data, when guaranteed O(n log n) performance is required, merge sort and heapsort are the best choices for sorting. Quick sort performs exceptionally well on random data with O(n log n) average time complexity, but its performance can degrade on sorted or nearly sorted data [O(n^2) time].


Sorting algorithms cheat sheet


Here is the final cheat sheet where we put everything we covered in this post:

Algorithm       | Best       | Average    | Worst      | Space    | Stable | In-place | Online
Bubble sort     | O(n)       | O(n^2)     | O(n^2)     | O(1)     | Yes    | Yes      | No
Selection sort  | O(n^2)     | O(n^2)     | O(n^2)     | O(1)     | No     | Yes      | No
Insertion sort  | O(n)       | O(n^2)     | O(n^2)     | O(1)     | Yes    | Yes      | Yes
Merge sort      | O(n log n) | O(n log n) | O(n log n) | O(n)     | Yes    | No       | No
Quick sort      | O(n log n) | O(n log n) | O(n^2)     | O(log n) | No     | Yes      | No