Introduction
Suppose I handed you a dictionary and asked you to find the word "chocolate."
Chances are, you'd flip open the book and whittle your way through until you've reached the "c's" ... "ch's" ... "cho's" ... until finally "chocolate" has appeared.
Now, unfortunately, designing an algorithm that mimics human intelligence is beyond most of our programming capabilities; however, algorithms like binary search allow us to comb through sets of data quickly and efficiently in order to pluck specific data.
Let's break down binary search, a close cousin to merge sort in a way, uncovering everything from its pros and cons and an implementation within C, to a calculation of its time complexity.
So... Why Binary Search?
While at first binary search may seem like a one-size-fits-all type of searching algorithm for its relatively high speed, it also comes with certain drawbacks.
Let's enumerate some of the pros and cons of this searching method.
Advantages:
Faster and more efficient than other searching algorithms
especially for larger datasets
Disadvantages:
Can only be performed on pre-sorted arrays
when the algorithm slices an array in half, the left half is assumed to contain smaller numbers than the right half
Must be performed on data structures that store data in contiguous memory locations (like arrays)
relies on indexing
would not work on linked lists, as to find the middle element would require traversing the entire list (inefficient)
Elements must be comparable in some way
Binary Search in C
Before we jump into a formal implementation of binary search in C, let's try to understand this algorithm on a higher level:
If the array has one or more elements, continue. Else, exit
If the middle element of the array equals target, return the index of that element
If target is less than the middle element of the array, perform binary search on the left half of the array
Else (if target is greater than the middle element), perform binary search on the right half of the array
We can simulate this algorithm using the sorted list [ 3, 5, 8, 9]. Assume we're searching for the number 9.
At the third and final step, 3--the index of 9--is returned.
Now, onto the actual code...
Note that the comments (denoted with //) can be helpful in understanding what each part of the program does!
#include <stdio.h>
// Recursive function that returns the location of target in the provided array
int binarySearch(int arr[], int low, int high, int target)
{
// Checks that the length of the array is at least one
// If not, returns -1
if (high >= low) {
// Calculates the central index of the array
int mid = low + (high - low) / 2;
// Return the desired index if target is in the middle
if (arr[mid] == target)
return mid;
// If target is smaller than the middle element, then perform
// binary search on the left half of the array
if (arr[mid] > target)
return binarySearch(arr, low, mid - 1, target);
// Else, target can only by present in right half of the array
// performs binary search on the right half
return binarySearch(arr, mid + 1, high, target);
}
// Returns -1 if the element is not present in the array
return -1;
}
int main()
{
int arr[] = { 2, 3, 4, 10, 40 };
int n = sizeof(arr) / sizeof(arr[0]);
int target = 10;
int result = binarySearch(arr, 0, n - 1, target);
return 0;
}
Calculating Time Complexity
Computer scientists measure algorithmic efficiency by calculating its time complexity, written using Big-O Notation (worst case scenario) and Big-Omega Notation (best case scenario), on which you can find more about in my CS50's Week 3 blog!
Let's let T(n) represent the time complexity of binary search, where n, the number of elements, tends to infinity...
Best Case Scenario: the desired value lies directly in the middle of the array
Intuitively, if the target is in the center of our array, only one comparison occurs:
if (arr[mid] == target)
return mid;
Thus, we can express the lower bound of binary search as Ω(1).
Worst Case Scenario: the desired value is not present in the array
T(n) = C + T(n/2), where T(n/2) is the time complexity of either the left or right half of the array and C is some constant representing the checks done through conditional statements (see binarySearch() function)
T(n/2) = C + T(n/4)
T(n/4) = C + T(n/8)
T(n) = C + C + T(n/4) → T(n) = C + C + C + T(n/8) → T(n) = 3C + T(n/2³)
T(1) = 1, as we know that if the array size is one, the only element at the middle can only be the desired value (and thus, one check is performed)
T(n) = (x)C + T(n/2ˣ), equation 5 rewritten in terms of some value x
n/2ˣ = 1 → x = log₂(n), finding a value for x at which T(n/2ˣ) = 1
T(n) = C(log₂(n)) + T(n/n) → T(n) = C(log₂(n)) + 1
Now, because the most prominent term is log₂(n), we can express this as Ο(log₂(n)) or simply Ο(log(n)).
Final Thoughts
Overall, binary search is definitely one of the more applicable searching algorithms, especially when you have an efficient way to sort your dataset.
If you're interested in mathematical proofs for the time complexity of different sorting algorithms, consider taking a look at my previous blogs on merge, insertion, and selection sort.
That's all for binary search!
Comments