Although merge sort runs in $\Theta(n\lg{n})$ worst-case time and insertion sort runs in $\Theta(n^2)$ worst-case time, the constant factors in insertion sort can make it faster in practice for small problem sizes on many machines. Thus, it makes sense to coarsen the leaves of the recursion by using insertion sort within merge sort when subproblems become sufficiently small. Consider a modification to merge sort in which $n/k$ sublists of length $k$ are sorted using insertion sort and then merged using the standard merging mechanism, where $k$ is a value to be determined.

- Show that insertion sort can sort the $n/k$ sublists, each of length $k$, in $\Theta(nk)$ worst-case time.
- Show how to merge the sublists in $\Theta(n\lg(n/k))$ worst-case time.
- Given that the modified algorithm runs in $\Theta(nk + n\lg(n/k))$ worst-case time, what is the largest value of $k$ as a function of $n$ for which the modified algorithm has the same running time as standard merge sort, in terms of $\Theta$-notation?
- How should we choose $k$ in practice?

This is simple enough. We know that sorting each sublist takes $ak^2 + bk + c$ worst-case time for some constants $a$, $b$ and $c$. We have $n/k$ such sublists, thus:

$$ \frac{n}{k}(ak^2 + bk + c) = ank + bn + \frac{cn}{k} = \Theta(nk) $$
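As a quick sanity check (my own sketch, not part of the exercise), we can count the element shifts insertion sort performs on $n/k$ reversed sublists — its worst case — and confirm the $(n/k) \cdot k(k-1)/2 = \Theta(nk)$ count:

```python
def insertion_sort_shifts(A):
    """Sort A in place, returning the number of element shifts performed."""
    shifts = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
            shifts += 1
        A[i + 1] = key
    return shifts

n, k = 1024, 16
# n/k reversed sublists of length k: the worst case for insertion sort.
total = sum(insertion_sort_shifts(list(range(k, 0, -1))) for _ in range(n // k))
# Each sublist needs k(k-1)/2 shifts, so the total is (n/k) * k(k-1)/2 = n(k-1)/2.
assert total == n * (k - 1) // 2
```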

This is a bit trickier. Merging $a$ sorted sublists of length $k$ each takes:

$$ T(a) = \begin{cases} 0 & \text{if } a = 1, \\ 2T(a/2) + ak & \text{if } a = 2^p \text{ for } p > 0. \end{cases} $$

This makes sense: merging one sublist is trivial, and merging $a$ sublists means dividing them into two groups of $a/2$ lists, merging each group recursively, and then combining the results in $ak$ steps, since we have two arrays, each of length $\frac{a}{2}k$.

I don't know the master theorem yet, but it seems to me that the solution to the recurrence is $ak\lg{a}$. Let's prove this by induction:

**Base**. Simple as ever:

$$ T(1) = 1 \cdot k\lg{1} = k \cdot 0 = 0 $$

**Step**. We assume that $T(a) = ak\lg{a}$ and we calculate $T(2a)$:

$$ \begin{align} T(2a) &= 2T(a) + 2ak = 2(T(a) + ak) = 2(ak\lg{a} + ak) \\ &= 2ak(\lg{a} + 1) = 2ak(\lg{a} + \lg{2}) = 2ak\lg(2a) \end{align} $$

This proves it. Now if we substitute the number of sublists $n/k$ for $a$:

$$ T(n/k) = \frac{n}{k}k\lg{\frac{n}{k}} = n\lg(n/k) $$

While this is exact only when $n/k$ is a power of 2, it tells us that the overall time complexity of the merge is $\Theta(n\lg(n/k))$.
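To double-check the closed form against the recurrence (my own sketch), a small script can evaluate both for powers of two:

```python
from math import log2

def T(a, k):
    """Merge cost recurrence for a = 2^p sorted sublists of length k each."""
    if a == 1:
        return 0
    return 2 * T(a // 2, k) + a * k

# The closed form a*k*lg(a) matches the recurrence exactly
# whenever a is a power of two (lg is exact there).
k = 8
for p in range(11):
    a = 2 ** p
    assert T(a, k) == a * k * log2(a)
```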

The largest value is $k = \lg{n}$. Substituting, we get:

$$ \Theta(n\lg{n} + n\lg{\frac{n}{\lg{n}}}) = \Theta(n\lg{n}) $$

If $k = f(n)$ grows faster than $\lg{n}$, the $nk$ term dominates and the complexity becomes $\Theta(nf(n))$, which is asymptotically slower than merge sort.
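A quick numeric check (my own, not part of the exercise) that with $k = \lg{n}$ the total cost stays within a constant factor of $n\lg{n}$:

```python
from math import log2

for n in (2 ** 10, 2 ** 20, 2 ** 30):
    k = log2(n)
    cost = n * k + n * log2(n / k)  # nk + n*lg(n/k)
    ratio = cost / (n * log2(n))
    # The ratio approaches 2 but stays bounded, so the sum is Theta(n lg n).
    assert 1 < ratio < 2
```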

It comes down to constant factors, so we just measure the input size at which insertion sort beats merge sort, exactly as we did in exercise 1.2.2, and pick that number for $k$.
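A rough sketch of that measurement (mine, using `timeit`; the crossover size varies by machine and language, so treat the numbers as indicative only):

```python
import random
import timeit

def insertion_sort(A):
    """In-place insertion sort; returns A for convenience."""
    for j in range(1, len(A)):
        key, i = A[j], j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return A

def merge_sort(A):
    """Top-down merge sort returning a new sorted list."""
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    L, R = merge_sort(A[:mid]), merge_sort(A[mid:])
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    return out + L[i:] + R[j:]

# Time both on small random inputs; pick k around the size where
# the insertion/merge ratio crosses 1.
for n in (8, 16, 32, 64):
    data = [random.random() for _ in range(n)]
    t_ins = timeit.timeit(lambda: insertion_sort(data[:]), number=1000)
    t_mrg = timeit.timeit(lambda: merge_sort(data[:]), number=1000)
    print(f"n={n}: insertion/merge time ratio = {t_ins / t_mrg:.2f}")
```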

I implemented this in C and in Python. I added selection sort for completeness' sake in the C version. I ran two variants, depending on whether `merge()` allocates its arrays on the stack or on the heap (stack allocation won't work for huge arrays). Here are the results:

```
STACK ALLOCATION
================
merge-sort = 0.173352
mixed-insertion = 0.150485
mixed-selection = 0.165806
HEAP ALLOCATION
===============
merge-sort = 1.731111
mixed-insertion = 0.903480
mixed-selection = 1.017437
```

Here are the results I got from Python:

```
merge-sort = 2.6207s
mixed-sort = 1.4959s
```

I can safely conclude that this approach is faster.

```
merge-sort = 0.153748
merge-insertion = 0.064804
merge-selection = 0.069240
```

```
merge-sort = 0.1067s
mixed-sort = 0.0561s
```

The C implementation:

```c
#include <stdlib.h>
#include <string.h>

#define INSERTION_SORT_THRESHOLD 20
#define SELECTION_SORT_THRESHOLD 15

void merge(int A[], int p, int q, int r) {
    int i, j, k;
    int n1 = q - p + 1;
    int n2 = r - q;

#ifdef MERGE_HEAP_ALLOCATION
    int *L = calloc(n1, sizeof(int));
    int *R = calloc(n2, sizeof(int));
#else
    int L[n1];
    int R[n2];
#endif

    memcpy(L, A + p, n1 * sizeof(int));
    memcpy(R, A + q + 1, n2 * sizeof(int));

    for (i = 0, j = 0, k = p; k <= r; k++) {
        if (i == n1) {
            A[k] = R[j++];
        } else if (j == n2) {
            A[k] = L[i++];
        } else if (L[i] <= R[j]) {
            A[k] = L[i++];
        } else {
            A[k] = R[j++];
        }
    }

#ifdef MERGE_HEAP_ALLOCATION
    free(L);
    free(R);
#endif
}

void merge_sort(int A[], int p, int r) {
    if (p < r) {
        int q = (p + r) / 2;
        merge_sort(A, p, q);
        merge_sort(A, q + 1, r);
        merge(A, p, q, r);
    }
}

void insertion_sort(int A[], int p, int r) {
    int i, j, key;

    for (j = p + 1; j <= r; j++) {
        key = A[j];
        i = j - 1;
        while (i >= p && A[i] > key) {
            A[i + 1] = A[i];
            i = i - 1;
        }
        A[i + 1] = key;
    }
}

void selection_sort(int A[], int p, int r) {
    int min, temp;

    for (int i = p; i < r; i++) {
        min = i;
        for (int j = i + 1; j <= r; j++)
            if (A[j] < A[min])
                min = j;
        temp = A[i];
        A[i] = A[min];
        A[min] = temp;
    }
}

void mixed_sort_insertion(int A[], int p, int r) {
    if (p >= r) return;

    if (r - p < INSERTION_SORT_THRESHOLD) {
        insertion_sort(A, p, r);
    } else {
        int q = (p + r) / 2;
        mixed_sort_insertion(A, p, q);
        mixed_sort_insertion(A, q + 1, r);
        merge(A, p, q, r);
    }
}

void mixed_sort_selection(int A[], int p, int r) {
    if (p >= r) return;

    if (r - p < SELECTION_SORT_THRESHOLD) {
        selection_sort(A, p, r);
    } else {
        int q = (p + r) / 2;
        mixed_sort_selection(A, p, q);
        mixed_sort_selection(A, q + 1, r);
        merge(A, p, q, r);
    }
}
```

And the Python implementation:

```python
from itertools import repeat


def insertion_sort(A, p, r):
    for j in range(p + 1, r + 1):
        key = A[j]
        i = j - 1
        while i >= p and A[i] > key:
            A[i + 1] = A[i]
            i = i - 1
        A[i + 1] = key


def merge(A, p, q, r):
    n1 = q - p + 1
    n2 = r - q
    L = list(repeat(None, n1))
    R = list(repeat(None, n2))

    for i in range(n1):
        L[i] = A[p + i]
    for j in range(n2):
        R[j] = A[q + j + 1]

    i = 0
    j = 0
    for k in range(p, r + 1):
        if i == n1:
            A[k] = R[j]
            j += 1
        elif j == n2:
            A[k] = L[i]
            i += 1
        elif L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1


def merge_sort(A, p, r):
    if p < r:
        q = (p + r) // 2
        merge_sort(A, p, q)
        merge_sort(A, q + 1, r)
        merge(A, p, q, r)


def mixed_sort(A, p, r):
    if p >= r:
        return

    if r - p < 20:
        insertion_sort(A, p, r)
    else:
        q = (p + r) // 2
        mixed_sort(A, p, q)
        mixed_sort(A, q + 1, r)
        merge(A, p, q, r)
```