In the previous chapter, you learned that the choice of data structure (like an array versus a set) affects code efficiency. In this chapter, you’ll learn that even with a chosen data structure, the algorithm you use is a second major factor in code performance.

An algorithm is simply a set of instructions for completing a specific task. Anything that involves a defined sequence of steps, like preparing cereal, is technically an algorithm. In computing, an algorithm is the set of instructions given to the computer to achieve a task.

It’s common to find two different algorithms that solve the exact same problem. As you saw with the even numbers example, one algorithm can be much faster (take fewer steps) than the other. This chapter will show an example where the speed difference is even more dramatic.

Ordered Arrays

An ordered array is almost the same as a classic array, but with one critical rule: its values must always be kept sorted (in order).

Insertion into an Ordered Array

When you insert a new value into an ordered array (e.g., inserting 75 into ), the process is much slower than inserting into a regular array because you cannot just place it at the end. You must:

  1. Search/Compare: The computer must perform comparisons by checking each existing value one by one to find the correct spot where the new value belongs. This is where the new value is immediately less than the existing value (e.g., 75 is less than 80).
  2. Shift: After the spot is found, the computer must shift the subsequent values to the right to make room.
  3. Insert: Finally, the new value is inserted.

Measuring Insertion Steps

  • In the example provided, inserting 75 took 6 steps for an array of 4 elements.
  • In terms of N (the number of elements), the insertion operation generally takes N + 2 steps (comparisons + shifts + final insertion).
  • The fewest steps occur when the new value is larger than everything else and ends up at the very end. In that case, you still need N comparisons to check every element, plus 1 step for the insertion, totaling N + 1 steps.

While insertion is less efficient for an ordered array than for a classic array, the ordered array is much faster for the searching operation.

Searching an Ordered Array

The standard method for searching any array is linear search, where the computer checks each cell one at a time from beginning to end. When using linear search, an ordered array offers a small advantage over a classic (unsorted) array:

  • Classic Array: If you search for a value that doesn’t exist (like 22 in ), you must check every single element because the value could be anywhere.
  • Ordered Array: If the value doesn’t exist (like 22 in ), you can stop the search early as soon as you find an element that is greater than the value you’re looking for (e.g., stopping at 75, since nothing after it could possibly be 22). Here’s the Python implementation of linear search on an ordered array:
    def linear_search(array, search_value):
    	for index, element in enumerate(array):
    		if element == search_value:
    			return index
    		elif element > search_value:
    			break
    	return None

While this early stopping is helpful, the worst-case scenario for linear search remains the same for both array types:

  • If the value is the last element, or if you search an ordered array for a value that is larger than every element and not present, you still check every cell.
  • For an array with N elements, linear search can take up to N steps.

Although linear search is a simple way to search, it is not the only algorithm available. The big advantage of an ordered array is that its sorted nature allows you to use a much faster searching algorithm called binary search. This new algorithm is so powerful that it will be exponentially faster than linear search.

Binary Search

Binary search is a much faster algorithm for finding a value in an ordered array. The core idea is based on the “guess a number” game, where you always pick the midpoint to eliminate half of the remaining possibilities with every guess.

Because the array is ordered, when you check a central value, you immediately know which half of the array the target value must be in (if it exists). Here’s the complete process:

  1. Start at the Middle: The computer finds the midpoint of the current search range (initially the entire array) and checks the value there.
  2. Compare and Eliminate:
    • If the target value is less than the value at the midpoint, the computer eliminates the midpoint and all elements to its right.
    • If the target value is greater than the value at the midpoint, the computer eliminates the midpoint and all elements to its left.
  3. Repeat: The process is repeated on the newly narrowed, remaining half of the array.

Say we’d like to search for the value 7 inside this ordered array:

This approach is only possible with an ordered array, as a classic unsorted array would provide no clue whether to look left or right.

Here’s an implementation of binary search in Python:

def binary_search(array, search_value):
	lower_bound = 0
	upper_bound = len(array) - 1
	while lower_bound <= upper_bound:
		midpoint = (upper_bound + lower_bound) // 2
		value_at_midpoint = array[midpoint]
		if search_value == value_at_midpoint:
			return midpoint
		elif search_value < value_at_midpoint:
			upper_bound = midpoint - 1
		elif search_value > value_at_midpoint:
			lower_bound = midpoint + 1
	return None

Here’s some details:

  • lower_bound: The starting index of the current range (initially 0).
  • upper_bound: The ending index of the current range (initially the last index).
  • The search runs inside a while loop, which continues as long as the lower_bound is less than or equal to the upper_bound (meaning there is still a possible range to check). Inside the loop:
    1. Find Midpoint: The code calculates the midpoint by averaging the lower_bound and upper_bound: midpoint = (upper_bound + lower_bound) // 2.
    2. Found: If the search_value equals the value_at_midpoint, the index is immediately returned.
    3. Narrow the Range (Lower): If the search_value is less than the value at the midpoint, the target must be in the left half. The upper_bound is updated to one index less than the midpoint (midpoint - 1), eliminating the entire right half.
    4. Narrow the Range (Higher): If the search_value is greater than the value at the midpoint, the target must be in the right half. The lower_bound is updated to one index more than the midpoint (midpoint + 1), eliminating the entire left half.

If the loop finishes because the lower_bound becomes greater than the upper_bound, it means the range has narrowed to zero, and the search value is not present in the array, so the function returns None.

Binary Search vs. Linear Search

While binary search offers little benefit for small arrays, its advantage becomes massive when dealing with large amounts of data.

Linear search is simple: the maximum number of steps it takes is equal to the size of the array (). For 100 values, it takes up to 100 steps; for one million values, it takes up to one million steps.

Binary search is dramatically more efficient because every step eliminates half of the remaining possibilities.

  • For an array of 100 values, binary search only takes a maximum of 7 steps.
  • For 10,000 elements, it takes a maximum of 13 steps.
  • For one million elements, it takes a maximum of 20 steps.

The key pattern with binary search is its unusual efficiency: every time you double the size of the ordered array, the number of steps required for the search only increases by one. This happens because the first inspection instantly cuts that doubled data in half. In contrast, if you double the array size, linear search must double its steps.

This huge performance gap is visible in graphs: the linear search line is a straight diagonal, meaning steps increase proportionally with the data. The binary search line is nearly flat, showing that the steps increase only slightly, even when the data grows exponentially.

The choice of using an ordered array (which enables binary search) involves a crucial trade-off for your application:

  • You gain much faster searching.
  • You accept slower insertion (because it still requires shifting elements, though the initial search can be upgraded to the faster binary search).

You must always analyze whether your software will perform more insertions or more searches to decide if the ordered array is the best fit.