Problem 9.2

Weighted median

For nn distinct elements x1,x2,,xnx_1, x_2, \ldots, x_n with positive weights w1,w2,,wnw_1, w_2, \ldots, w_n such that i=1nwi=1\sum_{i=1}^n w_i = 1, the weighted (lower) median is the element xkx_k satisfying

xi<xkwi<12 \sum_{x_i < x_k} w_i < \frac{1}{2} and xi>xkwi12 \sum_{x_i > x_k} w_i \le \frac{1}{2}

For example, if the elments are 0.1,0.35,0.05,0.1,0.15,0.05,0.20.1, 0.35, 0.05, 0.1, 0.15, 0.05, 0.2 and each element equals its weight (that is, wi=xiw_i = x_i for i=1,2,,7i = 1, 2, \ldots, 7, then the median is 0.10.1, but the weighted median is 0.20.2.

  1. Argue that the median of x1,x2,,xnx_1, x_2, \ldots, x_n is the weighted median of xix_i with weights wi=1/nw_i = 1/n for i=1,2,,ni = 1, 2, \ldots, n.
  2. Show how to compute the weighted median of nn elements in O(nlgn)\O(n\lg{n}) worst-case time using sorting.
  3. Show how to compute the weighted median in Θ(n)\Theta(n) worst-case time using a linear-time median algorithm such as `SELECT` from Section 9.3.

The post-office location problem is defined as follows. We are given nn points p1,p2,,pnp_1, p_2, \ldots, p_n with associated weights w1,w2,,wnw_1, w_2, \ldots, w_n. We wish to find a point pp (not necessarily one of the input points) that minimizes the sum i=1nwid(p,pi)\sum_{i=1}^n w_i d(p,p_i), where d(a,b)d(a, b) is the distance between the points aa and bb.

  1. Argue that the weighted median is a best solution for the 1-dimensional post-office location problem, in which points are simply real numbers and the distance between points aa and bb is d(a,b)=abd(a,b) = |a - b|.
  2. Find the best solution for the 2-dimensional post-office location problem, in which the points are (x,y)(x,y) coordinate pairs and the distance between points a=(x1,y1)a = (x_1, y_1) and b=(x2,y2)b = (x_2, y_2) is the **Manhattan distance** given by d(a,b)=x1x2+y1y2d(a, b) = |x_1 - x_2| + |y_1 - y_2|.

Median and weighted median

If the weights all elements are 1/n1/n, then the sum of the weights of the elements, smaller than the median, is n121n\lfloor \frac{n - 1}{2} \rfloor \frac{1}{n} and the sum of the weights of the larger elements is n121n\lceil \frac{n - 1}{2} \rceil \frac{1}{n}. This satisfies the condition for weighted median. Furthermore, choosing a smaller or greater value will not hold in the condition.

Computing with sorting

  1. Sort the array
  2. Start walking the array from left to right, accumulating the weights of the elements encountered
  3. The first element with accumulated weight w1/2w \ge 1/2 is the weighted median

Computing in linear time

It's a very similar to SELECT, but instead of passing ii, we pass a number around which the weights should partition (initially 1/21/2). We find a good pivot in linear time and we partition around it. When we sum the weights in the lower part of the array and the weights in the upper part. If they fulfill the condition, we have our weighted median.

1-dimensional post-office location problem

I'll present an informal argument, since it is convincing enough. A more formal one can be found in the instructor's manual.

The situation is similar to exercise 9.3.8. Let's assume that we pick the weighted median as the solution and then start moving left or right. As we move away from the weighted median (in any direction), we're moving towards elements with combined weight less than 1/21/2 and away from elements wight combined weight greater than 1/21/2. Every "step" we take, we're increasing the total distance.

2-dimensional post-office location problem with Manhattan distance

The solution is finding (xm,ym)(x_m, y_m) where those are the weighted medians of the xx- and yy- coordinates.

I'm not even going to start proving this formally, since it requires mathematics above my current comfort level. Reasoning informally, by the definition of Manhattan distance, the xx coordinates and the yy coordinates are independent ­ we can rearrange the xx in any way we want, without affecting the yy coordinate of the solution and vice-versa.