Big O notation is an excellent tool for expressing an algorithm’s efficiency. It allows us to quantify differences, like (binary search) being much faster than (linear search).

By using Big O, you can compare your algorithm to common standards. If your algorithm is classified as slow, you have the opportunity to optimize it—to try and modify it to fall under a faster Big O category. This chapter will show a practical example of writing an algorithm, measuring its Big O, and then optimizing it for a significant speed boost.

Bubble Sort

Before tackling the practical problem, we need to introduce a new Big O category using a classic example: a sorting algorithm. Sorting algorithms solve the problem of taking an unsorted array and arranging its values in ascending order.

Bubble Sort is one of the simple sorts—it’s easy to understand, but not the most efficient. It works by repeatedly looping through the array and comparing adjacent elements, swapping them if they are out of order.

Here are the steps for Bubble Sort:

  1. Compare: Point to two consecutive (next to each other) values in the array (starting with the first two). Compare the left item with the right item.
  2. Swap: If the two items are out of order (left is greater than right), swap them. If they are in order, do nothing for this step.
  3. Move Pointers: Move the pointers one cell to the right to look at the next pair of values.
  4. Complete Pass: Repeat Steps 1 through 3 until you reach the end of the array. This completes one pass-through.
  5. Repeat Passes: Move the pointers back to the beginning and execute another full pass-through (Steps 1 through 4). You keep executing these passes until you complete an entire pass-through in which no swaps were performed. At that point, the array is fully sorted.

Bubble Sort in Action 🎬

Let’s trace the sorting of the array using Bubble Sort. The goal is to move the largest unsorted element to its correct position at the end of the unsorted section in each pass.

First Pass-Through (4 swaps occurred)

  1. Compare and . Out of order Swap. Array:
  2. Compare and . In order No swap. Array:
  3. Compare and . Out of order Swap. Array:
  4. Compare and . Out of order Swap. Array:

The highest unsorted value, 7, has “bubbled” to the end and is now in its correct, sorted position. Since swaps occurred, we continue to the next pass.

Second Pass-Through (2 swaps occurred)

We only compare up to the index before 7, as 7 is sorted:

  1. Compare and . In order No swap. Array:
  2. Compare and . Out of order Swap. Array:
  3. Compare and . Out of order Swap. Array:

The next highest unsorted value, 4, is now in its correct, sorted position. Swaps occurred, so we continue.

Third Pass-Through (1 swap occurred)

We only compare up to the index before 4, as 4 and 7 are sorted:

  1. Compare and . Out of order Swap. Array:
  2. Compare and . In order No swap.

The value 3 is now in its correct, sorted position. Swaps occurred, so we continue.

Fourth Pass-Through (0 swaps occurred)

We only compare the remaining unsorted values.

  1. Compare and . In order No swap. Array:

Since an entire pass-through occurred with no swaps, the array is completely sorted, and the algorithm stops.

Code Implementation: Bubble Sort

def bubble_sort(array):
	unsorted_until_index = len(array) - 1
	sorted = False
	while not sorted:
		sorted = True
		for i in range(unsorted_until_index):
			if array[i] > array[i+1]:
				array[i], array[i+1] = array[i+1], array[i]
				sorted = False
		unsorted_until_index -= 1
	return array

The Python function bubble_sort(array) implements this logic using a while loop for the pass-throughs and a for loop for the comparisons/swaps within each pass.

  • unsorted_until_index = len(array) - 1: This variable marks the rightmost boundary of the unsorted portion of the array. It starts at the last index.
  • sorted = False: This flag controls the main while loop; the algorithm runs as long as the array is not sorted.

Inside the while loop (which represents one pass-through):

  1. sorted = True: The code optimistically assumes the array is sorted at the start of the pass.
  2. The for loop iterates from the beginning up to the unsorted_until_index.
  3. Comparison and Swap: If array[i] > array[i+1], the elements are swapped (array[i], array[i+1] = array[i+1], array[i]), and the sorted flag is immediately set back to False.
  4. unsorted_until_index -= 1: After the pass, the rightmost element has bubbled up and is correctly positioned, so the unsorted boundary is moved one index to the left.

The while loop breaks only when a full pass completes without changing the sorted flag from True back to False, meaning the array is finished.

The Efficiency of Bubble Sort

The Bubble Sort algorithm has two main types of steps: comparisons and swaps. The algorithm is inherently inefficient because it involves nested operations: for every pass-through, the algorithm performs a series of comparisons and swaps across the data. In the worst case, the number of steps grows dramatically as the data increases.

When analyzed mathematically, the growth of steps is said to be quadratic, which is why in Big O notation, Bubble Sort is classified as having an efficiency of: . This is pronounced “Oh of N squared” and is also referred to as quadratic time.

is considered a relatively inefficient algorithm. As the data size () increases, the number of steps increases at a much faster rate than .

On the graph, the line curves sharply upward, contrasting with the simple diagonal line of the much faster (linear time) algorithm.

A Quadratic Problem

Here is a practical situation where you can replace a slow algorithm with a faster one. The problem is to create a function, has_duplicate_value(array), that checks whether an array of ratings (0–10) contains any duplicate numbers.

The Algorithm (Using Nested Loops)

A straightforward way to solve this is using nested loops:

def has_duplicate_value(array):
    for i in range(len(array)):
        for j in range(len(array)):
            if (i != j) and (array[i] == array[j]):
                return True
    return False

This function works by taking every value () and comparing it against every other value () in the array.

To find the Big O, we ask the key question: for values, how many steps will the algorithm take in the worst-case scenario?

  • The worst case is when the array has no duplicates, forcing the code to execute every possible comparison.
  • The outer loop runs times.
  • The inner loop runs times for each iteration of the outer loop.
  • Total steps = comparisons.

This algorithm is classified as . A simple way to spot this inefficiency is that algorithms with nested loops often, but not always, fall into the category.

A Linear Solution

A more efficient implementation of the has_duplicate_value function avoids nested loops by using a separate array to track encountered numbers.

def has_duplicate_value(array):
    existing_numbers = [0] * 11
    for i in range(len(array)):
        if existing_numbers[array[i]] == 1:
            return True
        else:
            existing_numbers[array[i]] = 1
    return False
  1. Tracker Array: The function creates a new, fixed-size array called existing_numbers initialized with zeroes. Since product ratings range from 0 to 10, this array has 11 slots (indices 0 through 10).
  2. Tracking by Index: The code iterates through the input array once. For each number encountered (e.g., the number 3), it looks at the corresponding index in existing_numbers (index 3).
  3. Duplicate Check: Before marking a number as seen, the code checks if the value at that index in existing_numbers is already a 1.
    • If it’s 1, it means the number was seen before (a duplicate exists), and the function immediately returns True.
    • If it’s 0, the code stores a 1 at that index to mark the number as encountered.
  4. If the single loop completes without finding a duplicate, the function returns False.

To determine the Big O efficiency, we analyze the worst-case scenario: when the input array contains no duplicates:

  • In this scenario, the algorithm must complete the entire single loop.
  • The core step (comparison/check) happens once for every element in the input array.
  • For elements in the input array, the loop runs times, resulting in significant steps.

This algorithm, therefore, has an efficiency of O(N) (linear time). This new approach provides a huge speed boost over the first nested loop implementation.

(While this solution is much faster, it uses more memory for the secondary existing_numbers array, which is a space complexity trade-off that will be discussed later.)