Problem 9.3

Small order statistics

We showed that the worst-case number $T(n)$ of comparisons used by SELECT to select the $i$th order statistic from $n$ numbers satisfies $T(n) = \Theta(n)$, but the constant hidden by the $\Theta$-notation is rather large. When $i$ is small relative to $n$, we can implement a different procedure that uses SELECT as a subroutine but makes fewer comparisons in the worst case.

  1. Describe an algorithm that uses $U_i(n)$ comparisons to find the $i$th smallest of $n$ elements, where $$ U_i(n) = \begin{cases} T(n) & \text{if } i \ge n/2 \\ \lfloor n/2 \rfloor + U_i(\lceil n/2 \rceil) + T(2i) & \text{otherwise} \end{cases} $$ (Hint: Begin with $\lfloor n/2 \rfloor$ disjoint pairwise comparisons, and recurse on the set containing the smaller element from each pair.)
  2. Show that, if $i < n/2$, then $U_i(n) = n + \O(T(2i)\lg(n/i))$.
  3. Show that, if $i$ is a constant less than $n/2$, then $U_i(n) = n + \O(\lg{n})$.
  4. Show that, if $i = n/k$ for $k \ge 2$, then $U_i(n) = n + \O(T(2n/k)\lg{k})$.

The algorithm

This is a modified version of SELECT. Not only it finds the $i$th order statistic, but it also partitions the array, thus finding the $i-1$ smaller elements.

  1. If $i \ge n/2$, we just use SELECT
  2. Otherwise, split the array in pairs and compare each pair.
  3. We take the smaller elements of each pair, but keep track of the other one.
  4. We recursively find the first $i$ elements among the smaller elements
  5. The $i$th order statistic is among the pairs containing the smaller elements we found in the previous step. We call SELECT on those $2i$ elements. That's the final answer.

Just picking the smaller element of each pair is not enough. For example, if we're looking for the 2nd order statistic and our pairs are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, the answer is in the larger part of the first pair. That's why we need to keep track and later perform SELECT on $2i$ elements.

Steps 1-4 can be implemented in place by modifying the algorithm to put the larger elements of the pairs on the inactive side of the pivot and modifying PARTITION to swap the elements on the inactive side every time it swaps elements on the active side. More details can be found in the Instructor's Manual.

The math

We can prove (b) by induction. This is the step:

$$ \begin{aligned} U_i(n) &= \lfloor n/2 \rfloor + U_i(\lceil n/2 \rceil) + T(2i) \\ &= \lfloor n/2 \rfloor + \lceil n/2 \rceil + \O(T(2i)\lg(\lfloor n/2 \rfloor / i)) + T(2i) \\ &= n + \O(T(2i)\lg(n/i)) + T(2i) \\ &= n + \O(T(2i)\lg(n/i)) \end{aligned} $$

This is a bit more sloppy that doing it with the substitution method, but that feels like grunt work to me at this point.

The other two are obvious:

$$ \begin{aligned} U_i(n) &= n + \O(T(2i)\lg(n/i)) \\ &= n + \O(\O(1)\lg(n/i)) \\ &= n + \O(\lg{n} - \lg{i}) \\ &= n + \O(\lg{n} - \O(1)) \\ &= n + \O(\lg{n}) \end{aligned} $$

$$ \begin{aligned} U_i(n) &= n + \O(T(2i)\lg(n/i)) \\ &= n + \O(T(2n/k)\lg(n/(n/k))) \\ &= n + \O(T(2n/k)\lg{k}) \\ \end{aligned} $$

Again, this reasoning is sloppy, but I don't feel like applying the substitution method.