Problem 10.3

Searching a sorted compact list

Exercise 10.3-4 asked how we might maintain an $n$-element list compactly in the first $n$ positions of an array. We shall assume that the keys are distinct and that the compact list is also sorted, that is, key[i] < key[next[i]] for all $i = 1, 2, \ldots, n$ such that next[i] ≠ NIL. We will also assume that we have a variable $L$ that contains the index of the first element on the list. Under these assumptions, you will show that we can use the following randomized algorithm to search the list in $\O(\sqrt{n})$ expected time.
COMPACT-LIST-SEARCH(L, n, k)
  i = L
  while i ≠ ␀ and key[i] < k
      j = RANDOM(1, n)
      if key[i] < key[j] and key[j] ≤ k
          i = j
          if key[i] == k
              return i
      i = next[i]
  if i == ␀ or key[i] > k
      return ␀
If we ignore lines 3-7 of the procedure, we have an ordinary algorithm for searching a sorted linked list, in which index $i$ points to each position of the list in turn. The search terminates once the index $i$ "falls off" the end of the list or once key[i] ≥ k. In the latter case, if key[i] = k, clearly we have found a key with value $k$. If, however, key[i] > k, then we will never find a key with the value $k$, and so terminating the search was the right thing to do.

Lines 3-7 attempt to skip ahead to a randomly chosen position $j$. Such a skip benefits us if key[j] is larger than key[i] and no larger than $k$; in such a case, $j$ marks a position in the list that $i$ would have to reach during an ordinary list search. Because the list is compact, we know that in any choice of $j$ between $1$ and $n$ indexes some object in the list rather than a slot on the free list.

Instead of analyzing the performance of COMPACT-LIST-SEARCH directly, we shall analyze a related algorithm, COMPACT-LIST-SEARCH', which executes two separate loops. This algorithm takes an additional parameter $t$ which determines an upper bound on the number of iterations of the first loop.
COMPACT-LIST-SEARCH(L, n, k)
  i = L
  for q = 1 to t
      j = RANDOM(1, n)
      if key[i] < key[j] and key[j] ≤ k
          i = j
          if key[i] == k
              return i
  while i ≠ ␀ and key[i] < k
      i = next[i]
  if i == ␀ or key[i] > k
      return ␀
  else
      return i
To compare the execution of the algorithms COMPACT-LIST-SEARCH(L, n, k) and COMPACT-LIST-SEARCH(L, n, k, t), assume that the sequence of integers returned by the calls of RANDOM(1, n) is the same for both algorithms.

Suppose that COMPACT-LIST-SEARCH(L, n, k) takes $t$ iterations of the while loop of lines 2-8. Argue that COMPACT-LIST-SEARCH'(L, n, k, t) returns the same answer and that total number of iterations of both the for and while loops within COMPACT-LIST-SEARCH' is at least $t$.

In the call COMPACT-LIST-SEARCH'(L, n, k, t), let $X_t$ be the random variable that describes the distance in the linked list (that is, through the chain of next pointers) from position $i$ in the desired key $k$ after $t$ iterations of the for loop of lines 2-7 have occurred.

Argue that the expected running time of COMPACT-LIST-SEARCH'(L, n, k, t) is $\O(t + \E[X_t])$.

Show that $\E[X_t] \le \sum_{r=1}^n(1 - r/n)^t$. (Hint: Use equation (C.25).)

Show that $\sum_{r=0}^{n-1} r^t \le n^{t+1}/(t + 1)$.

Prove that $\E[X_t] \le n/(t+1)$.

Show that COMPACT-LIST-SEARCH'(L, n, k, t) runs in $\O(t + n/t)$ expected time.

Conclude that COMPACT-LIST-SEARCH runs in $\O(\sqrt{n})$ expected time.

Why do we assume that all keys are distinct in COMPACT-LIST-SEARCH? Argue that random skips do not necessarily help asymptotically when the list contains repeated key values.

This is a very interesting problem.

First, let's note that COMPACT-LIST-SEARCH a number of iterations, less or equal to those of COMPACT-LIST-SEARCH'. If the first version found the element on a random skip-ahead in $t$ iterations, so will the second version. If not, the last $k$ iterations only advanced the pointer until the result was found. Furthermore, none of the last $t - k$ iterations made a skip-ahead (by the definition of $k$. Since the second version does not advance inbetween skip-aheads, it has to perform $k$ additional iterations of its while loop until the result is found.

Note also, that the first version minimizes the number of iterations. That is, $t$ is picked in an optimal way.

Let's move on to the math. The expected running time of COMPACT-LIST-SEARCH' is indeed $\O(t + \E[X_t])$, since it either finds the element in $t$ skip-aheads, or it has to move forward a number of times, equal to the distance to $X_t$. Note that if the key is not present, this distance will either be the successor of that key or the last element of the array, so the analysis still holds.

Let's find the value of the expectation. The probability of having a distance at least $r$ is the probability less than $r$. The probability of having distance less than $k$ when $t = 1$ is $(n-r)/n$, thus:

$$ \Pr\{X_t \ge r\} = \bigg(\frac{n - r}{n}\bigg)^t = \bigg(1 - \frac{r}{n}\bigg)^t$$

That is, one of the RANDOM calls should advance to the desired distance, while the rest should advance to elements before it.

Using the (C.25), we get:

$$ \E[X_t] = \sum_{r=1}^{\infty} \Pr\{X_t \ge r\} = \sum_{r=1}^n \Pr\{X_t \ge r\} = \sum_{r=1}^n \bigg(1 - \frac{r}{n}\bigg)^t $$

The probability of getting distance, larger than $n$ is 0, so that's why we can bound the sum index to $n$.

We can show (d) by approximating the sum with an integral with (A.11):

$$ \sum_{r=0}^{n-1} r^t \le \int_0^n x^t dx = \frac{n^{t+1}}{t+1} $$

This lets us give an upper bound on the expectation:

$$ \begin{aligned} \E[X_t] &= \sum_{r=1}^n \bigg(1 - \frac{r}{n}\bigg)^t \\ &= \sum_{r=0}^{n-1} \bigg(\frac{r}{n}\bigg)^t \\ &= \frac{1}{n^t} \sum_{r=0}^{n-1} r^t \\ &\le \frac{1}{n^t} \cdot \frac{n^{t+1}}{t + 1} \\ &= \frac{n}{t+1} \end{aligned} $$

The expected running time of COMPACT-LIST-SEARCH'(L, n, k, t) is thus:

$$ \O(t + \E[X_t]) = \O(t + n/(t+1)) = \O(t + n/t) $$

Since COMPACT-LIST-SEARCH minimizes this running time, we need to find the minimum of $t + n/t$. The first derivative is $1 - n/t^2$ which is zero at $\sqrt{n}$ and this is a local minimum. It's also the minimum in the interval $[1,n]$.

This makes the expected running time of the first version of the algorithm $\O(\sqrt{n})$.

As for duplicates, we won't be able to conclude (c) if there are duplicates. The algorithm is able to skip ahead only if the value found by RANDOM is greater than the current. For example, if we have a list of 0s and we're looking for a 1, the algorithm will still need to iterate to the end of the list, since it will not skip-ahead at all.