Problem 11.2

Slot-size bound for chaining

Suppose that we have a hash table with $n$ slots, with collisions resolved by chaining, and suppose that $n$ keys are inserted into the table. Each key is equally likely to be hashed into each slot. Let $M$ be the maximum number of keys in any slot after all the keys have been inserted. Your mission is to prove an $\O(\lg{n}/\lg\lg{n})$ upper bound on $\E[M]$, the expected value of $M$.

a. Argue that the probability $\mathcal{Q}_k$ that exactly $k$ keys hash to a particular slot is given by: $$ \mathcal{Q}_k = \left( \frac{1}{n} \right)^k \left( 1 - \frac{1}{n} \right)^{n-k} \binom{n}{k} $$

b. Let $P_k$ be the probability that $M = k$, that is, the probability that the slot containing the most keys contains $k$ keys. Show that $P_k \le n\mathcal{Q}_k$.

c. Use Stirling's approximation, equation (3.18), to show that $\mathcal{Q}_k < e^k / k^k$.

d. Show that there exists a constant $c > 1$ such that $\mathcal{Q}_{k_0} < 1 / n^3$ for $k_0 = c \lg{n} / \lg\lg{n}$. Conclude that $P_k < 1/n^2$ for $k \ge k_0 = c \lg{n} / \lg\lg{n}$.

e. Argue that

$$ \E[M] \le \Pr \left\{ M > \frac{c\lg{n}}{\lg\lg{n}} \right\} \cdot n + \Pr \left\{ M \le \frac{c\lg{n}}{\lg\lg{n}} \right\} \cdot \frac{c \lg{n}}{\lg\lg{n}} $$

Conclude that $\E[M] = \O(\lg{n} / \lg\lg{n})$.

a. Probability of exactly $k$ keys in a slot

That's kinda obvious from the definitions. Each of the $k$ keys lands in that slot with probability $1/n$, each of the remaining $n-k$ keys lands in a different slot with probability $(n-1)/n = 1 - 1/n$ (and the keys hash independently), and there are $\binom{n}{k}$ ways to pick which $k$ of the $n$ keys collide in that slot.
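
Not part of the problem, but here's a quick Python sketch I'd use to sanity-check the formula (the function names are my own), comparing the closed form against empirical frequencies:

```python
import math
import random

def q_k(n, k):
    """Closed-form Q_k: probability that a fixed slot gets exactly k of n keys."""
    return (1 / n) ** k * (1 - 1 / n) ** (n - k) * math.comb(n, k)

def empirical_q_k(n, k, trials=100_000):
    """Hash n keys uniformly into n slots; count how often slot 0 holds exactly k."""
    hits = sum(
        sum(random.randrange(n) == 0 for _ in range(n)) == k
        for _ in range(trials)
    )
    return hits / trials

n = 20
for k in range(4):
    print(f"k={k}: Q_k={q_k(n, k):.5f}, empirical={empirical_q_k(n, k):.5f}")
```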

b. Probability that the longest chain has $k$ keys

We're looking for an upper bound on the probability that the longest chain is exactly $k$, that is, $M = k$. If $M = k$, then at least one of the $n$ slots contains exactly $k$ keys, and any particular slot does so with probability $\mathcal{Q}_k$, so the union bound (Boole's inequality) finishes the job.

If $M_i$ is the number of keys contained in the $i$th slot of the table, we have:

$$ \Pr\{M = k\} \le \Pr \left\{ \bigcup_{i=1}^n \left( M_i = k \right) \right\} \le \sum_{i=1}^n \Pr\{M_i = k\} = n \mathcal{Q}_k $$
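
Same deal, a simulation sketch: estimating $\Pr\{M = k\}$ directly and comparing it to the $n \mathcal{Q}_k$ bound (which over-counts overlapping events, so there's room to spare):

```python
import math
import random
from collections import Counter

def max_load(n):
    """Hash n keys uniformly into n slots and return M, the largest chain length."""
    return max(Counter(random.randrange(n) for _ in range(n)).values())

n, trials = 50, 100_000
dist = Counter(max_load(n) for _ in range(trials))
for k in sorted(dist):
    bound = n * (1 / n) ** k * (1 - 1 / n) ** (n - k) * math.comb(n, k)
    print(f"Pr{{M = {k}}} ~ {dist[k] / trials:.4f}  vs  n*Q_k = {bound:.4f}")
```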

c. Bounding $\mathcal{Q}_k$

From Stirling's approximation:

$$ n! = \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n \left( 1 + \Theta \left( \frac{1}{n} \right) \right) $$

Since the correction factor $1 + \Theta(1/n)$ is at least $1$ (as is $\sqrt{2 \pi n}$ for $n \ge 1$), we can drop both:

$$ n! = \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n \left( 1 + \Theta \left( \frac{1}{n} \right) \right) \ge \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n > \frac{n^n}{e^n} $$

Or simply: $k! > k^k / e^k$.

$$ \begin{aligned} \mathcal{Q}_k &= \left( \frac{1}{n} \right)^k \left( 1 - \frac{1}{n} \right)^{n-k} \binom{n}{k} \\ &= \frac{1}{n^k} \cdot \frac{(n-1)^{n-k}}{n^{n-k}} \cdot \frac{n!}{k!(n-k)!} \\ &= \left( 1 - \frac{1}{n} \right)^{n-k} \cdot \frac{n \cdot (n-1) \cdots (n-k+1)}{n^k} \cdot \frac{1}{k!} \\ &\le \frac{ \overbrace{n \cdot (n-1) \cdots (n-k+1)}^{k \text{ factors}} }{ n^k } \cdot \frac{1}{k!} && \text{(since $0 \le 1 - 1/n \le 1$)} \\ &\le \frac{ \overbrace{n \cdot n \cdots n}^{k \text{ factors}} }{n^k} \cdot \frac{1}{k!} \\ &= \frac{n^k}{n^k} \cdot \frac{1}{k!} \\ &= \frac{1}{k!} \\ &< \frac{e^k}{k^k} \end{aligned} $$
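
A quick numeric spot check of the whole chain $\mathcal{Q}_k \le 1/k! < e^k/k^k$ (again my own sketch, not part of the solution):

```python
import math

n = 100
for k in range(1, 8):
    # Q_k from part (a), then the two upper bounds derived above.
    q = (1 / n) ** k * (1 - 1 / n) ** (n - k) * math.comb(n, k)
    print(f"k={k}: Q_k={q:.2e}  1/k!={1 / math.factorial(k):.2e}  "
          f"e^k/k^k={math.e ** k / k ** k:.2e}")
```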

d. More bounds

I got stuck here, consulted the Instructor's Manual, and discovered it also does a bunch of hand-waving. Go and consult it if you want; I'll try to summarize how I understood the approach here.

We're looking for $\frac{e^{k_0}}{k_0^{k_0}} < \frac{1}{n^3}$, or equivalently $n^3 < \frac{k_0^{k_0}}{e^{k_0}}$. Taking $\lg$ of both sides we get:

$$ 3\lg{n} < k_0(\lg k_0 - \lg e) \\ \Updownarrow \\ 3 < \frac{ k_0(\lg k_0 - \lg e) }{ \lg n } $$

We now plug $k_0$ in and get something Lovecraftian:

$$ \begin{aligned} 3 &< \frac{ c \lg n }{ \lg n \lg \lg n } \left( \lg \frac{ c \lg n }{ \lg \lg n } - \lg e \right) \\ &= \frac{ c }{ \lg \lg n } \left( \lg c + \lg \lg n - \lg \lg \lg n - \lg e \right) \\ &= c \left( \frac{\lg c}{\lg \lg n} + \frac{\lg \lg n}{\lg \lg n} - \frac{\lg \lg \lg n}{\lg \lg n} - \frac{\lg e}{\lg \lg n} \right) \\ &= c \left( 1 + \frac{ \lg c - \lg e }{ \lg \lg n } - \frac{ \lg \lg \lg n }{ \lg \lg n } \right) \end{aligned} $$

Now for the hand-waving. First, we notice that picking the necessary $c$ depends on the value of $n$. Next, let's call the expression in parentheses $A$ and notice that:

$$ \lim_{n \to \infty} A = 1 $$

Hence, there is an $n_0$ such that $A \ge 1/2$ for all $n \ge n_0$, and therefore if we pick $c > 6$ we have $3 < cA$ whenever $n \ge n_0$. Next we need to handle $3 \le n < n_0$. (We need $n \ge 3$, because $\lg \lg 2 = \lg 1 = 0$ and that won't work: we'd be dividing by $0$ in $k_0$.)

So for each $n$ with $3 \le n < n_0$ we can find a $c_n$ satisfying $3 < c_n A$, since $cA \to \infty$ as $c \to \infty$ for any fixed $n \ge 3$. We then pick $c$ to be the larger of $6$ and $\max_{3 \le n < n_0} c_n$ and we're done. We're convinced a number like that exists, although we haven't spelled it out.

Now we know that $\mathcal{Q}_{k_0} < 1/n^3$, and since $e^k/k^k$ is decreasing in $k$ (for $k > 1$), the same bound $\mathcal{Q}_k < 1/n^3$ holds for all $k \ge k_0$. We can easily conclude, then, that:

$$ P_k \le n \mathcal{Q}_k < n \frac{1}{n^3} = 1/n^2 $$
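
To make the hand-waving a bit more concrete, here's a numeric check with a guessed constant ($c = 7$ is an illustrative pick, not a derived value) that the bound from part c, $e^{k_0}/k_0^{k_0}$, indeed drops below $1/n^3$:

```python
import math

c = 7  # illustrative guess for the constant, not a derived value
for n in [4, 16, 256, 4096, 2 ** 20]:
    k0 = c * math.log2(n) / math.log2(math.log2(n))
    lhs = math.e ** k0 / k0 ** k0  # upper bound on Q_{k0} from part (c)
    print(f"n={n}: e^k0/k0^k0 = {lhs:.2e} < 1/n^3 = {1 / n ** 3:.2e}? "
          f"{lhs < 1 / n ** 3}")
```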

e. Expectation

This is simpler:

$$ \begin{aligned} \E[M] &= \sum_{k=0}^{n} k \cdot \Pr \{ M = k \} \\ &= \sum_{k=0}^{k_0} k \cdot \Pr \{ M = k \} + \sum_{k=k_0 + 1}^{n} k \cdot \Pr \{ M = k \} \\ &\le \sum_{k=0}^{k_0} k_0 \cdot \Pr \{ M = k \} + \sum_{k=k_0 + 1}^{n} n \cdot \Pr \{ M = k \} \\ &= k_0 \cdot \sum_{k=0}^{k_0} \Pr \{ M = k \} + n \cdot \sum_{k=k_0 + 1}^{n} \Pr \{ M = k \} \\ &= k_0 \cdot \Pr\{ M \le k_0 \} + n \cdot \Pr \{ M > k_0 \} \end{aligned} $$

This is exactly the long expression we needed to show.

Taking the last piece, we have:

$$ \Pr \{ M > k_0 \} = \sum_{k=k_0+1}^{n} \Pr\{ M = k \} \le \sum_{k=k_0+1}^{n} \frac{1}{n^2} \le n \frac{1}{n^2} = \frac{1}{n} $$

We know that $\Pr\{M \le k_0\} \le 1$ (probability axiom), so:

$$ \begin{aligned} \E[M] &\le k_0 \cdot \Pr\{ M \le k_0 \} + n \cdot \Pr \{ M > k_0 \} \\ &\le k_0 + n \cdot \frac{1}{n} \\ &= \frac{c \lg n}{\lg \lg n} + 1 \\ &= \O(\lg n / \lg \lg n) \end{aligned} $$
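
For what it's worth, a rough simulation of the growth rate (constants ignored, purely illustrative):

```python
import math
import random
from collections import Counter

def max_load(n):
    """Hash n keys uniformly into n slots and return the largest chain length."""
    return max(Counter(random.randrange(n) for _ in range(n)).values())

for n in [16, 256, 4096, 65536]:
    trials = 1_000
    avg = sum(max_load(n) for _ in range(trials)) / trials
    print(f"n={n}: E[M] ~ {avg:.2f}, lg(n)/lg(lg(n)) = "
          f"{math.log2(n) / math.log2(math.log2(n)):.2f}")
```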

How one is supposed to figure this out on their own, I have no idea. Maybe by reading Knuth.