# Problem 9.3

## Small order statistics

We showed that the worst-case number $T(n)$ of comparisons used by

`SELECT`

to select the $i$th order statistic from $n$ numbers satisfies $T(n) = \Theta(n)$, but the constant hidden by the $\Theta$-notation is rather large. When $i$ is small relative to $n$, we can implement a different procedure that uses`SELECT`

as a subroutine but makes fewer comparisons in the worst case.

- Describe an algorithm that uses $U_i(n)$ comparisons to find the $i$th smallest of $n$ elements, where $$ U_i(n) = \begin{cases} T(n) & \text{if } i \ge n/2 \\ \lfloor n/2 \rfloor + U_i(\lceil n/2 \rceil) + T(2i) & \text{otherwise} \end{cases} $$ (
Hint:Begin with $\lfloor n/2 \rfloor$ disjoint pairwise comparisons, and recurse on the set containing the smaller element from each pair.)- Show that, if $i < n/2$, then $U_i(n) = n + \O(T(2i)\lg(n/i))$.
- Show that, if $i$ is a constant less than $n/2$, then $U_i(n) = n + \O(\lg{n})$.
- Show that, if $i = n/k$ for $k \ge 2$, then $U_i(n) = n + \O(T(2n/k)\lg{k})$.

### The algorithm

This is a modified version of `SELECT`

. Not only it finds the $i$th order
statistic, but it also partitions the array, thus finding the $i-1$ smaller
elements.

- If $i \ge n/2$, we just use
`SELECT`

- Otherwise, split the array in pairs and compare each pair.
- We take the smaller elements of each pair, but keep track of the other one.
- We recursively find the first $i$ elements among the smaller elements
- The $i$th order statistic is among the pairs containing the smaller
elements we found in the previous step. We call
`SELECT`

on those $2i$ elements. That's the final answer.

Just picking the smaller element of each pair is not enough. For example, if
we're looking for the 2nd order statistic and our pairs are `1, 2`

, `3, 4`

,
`5, 6`

, `7, 8`

, `9, 10`

, the answer is in the larger part of the first pair.
That's why we need to keep track and later perform `SELECT`

on $2i$ elements.

Steps 1-4 can be implemented in place by modifying the algorithm to put the
larger elements of the pairs on the inactive side of the pivot and modifying
`PARTITION`

to swap the elements on the inactive side every time it swaps
elements on the active side. More details can be found in the Instructor's
Manual.

### The math

We can prove (b) by induction. This is the step:

$$ \begin{aligned} U_i(n) &= \lfloor n/2 \rfloor + U_i(\lceil n/2 \rceil) + T(2i) \\ &= \lfloor n/2 \rfloor + \lceil n/2 \rceil + \O(T(2i)\lg(\lfloor n/2 \rfloor / i)) + T(2i) \\ &= n + \O(T(2i)\lg(n/i)) + T(2i) \\ &= n + \O(T(2i)\lg(n/i)) \end{aligned} $$

This is a bit more sloppy that doing it with the substitution method, but that feels like grunt work to me at this point.

The other two are obvious:

$$ \begin{aligned} U_i(n) &= n + \O(T(2i)\lg(n/i)) \\ &= n + \O(\O(1)\lg(n/i)) \\ &= n + \O(\lg{n} - \lg{i}) \\ &= n + \O(\lg{n} - \O(1)) \\ &= n + \O(\lg{n}) \end{aligned} $$

$$ \begin{aligned} U_i(n) &= n + \O(T(2i)\lg(n/i)) \\ &= n + \O(T(2n/k)\lg(n/(n/k))) \\ &= n + \O(T(2n/k)\lg{k}) \\ \end{aligned} $$

Again, this reasoning is sloppy, but I don't feel like applying the substitution method.