Question:  Suppose that we sequentially place $n$ balls into $n$ boxes by putting each ball into a randomly chosen box. It is well known that when we are done, the fullest box has with high probability $(1+o(1))\ln n/\ln \ln n$ balls in it. Suppose instead that for each ball we choose two boxes at random and place the ball into the one which is less full at the time of placement. What will be the maximum occupancy?

Answer: A beautiful theorem of Yossi Azar, Andrei Broder, Anna Karlin, and Eli Upfal asserts that with high probability, the fullest box contains only $\ln \ln n/\ln 2+O(1)$ balls—exponentially less than before.  (The descripion follows Xueliang Li’s mathscinet review.) And here is a link to the paper. Here is a related post “Balls and Bins on Graphs” on Windows on Theory.

This entry was posted in Computer Science and Optimization, Probability, Test your intuition and tagged . Bookmark the permalink.

### 12 Responses to Test Your Intuition (1)

1. Gil says:

Actually, I am not aware of a simple explanation (even a heuristic one) why we go down to log log n. I will be happy to hear such an explanation.

Also, this theorem is related to many developments in theoretical CS and probabilistic combinatorics and remarks on these are most welcomed.

2. kunal says:

There is in fact more to this! Suppose now you throw m balls in to n bins, where m is much larger than n. For one choice, the imbalance is easily seen to grow as \tilde{O}(\sqrt{m/n}).

For the two choice process, the imbalance is in fact independent of n. It stays at O(log log n) w.h.p. See this paper:

Petra Berenbrink, Artur Czumaj, Angelika Steger, and Berthold Vöcking. Balanced allocations: The heavily loaded case. In Proceedings of the 32th ACM Symposium on Theory of Computing (STOC’00), pages 745-754, 2000.

3. Gil Kalai says:

Dear Kunal, indeed this is part of a large story both in TCS and probabilistic combinatorics. I was motivated by a lecture of Michael Krivelevich on “Achlioptas processes”. In these processes you build a random graph by choosing at each round one out of r random edges in order to postpone or embrace a certain graph proprty.

Another connection that comes to mind (which perhaps fits a series of posts under the name: “difficult proofs for easy theorems”) is to find a probabilistic proof that if m is not too large compare to n, it is possible to put n balls into m boxes so that no box contains more than one ball. A simple union bound works for something like $m \ge n^2$ and using Lovasz local lemma you can get it down to $m \ge 6n$ or so. I do not know what is the world record.

Anyway, what I am most curious about is a simple conceptual explanation in a few words for the loglogn in the theorem of Azar, Broder, Karlin, and Upfal.

4. Omer says:

Ori pointed your question out and we came up with some explanation.

Suppose you place the ball in the less populated set. At time t you have $\sim t$ occupied boxes, so the rate of introducing doubly occupied boxes is $(t/n)^2$. This shows that at time $t$ you expect $\approx t^3/n^2$ boxes with two balls. The first appears at time $\approx n^{2/3}$

Repeating the argument, the rate at which boxes with three balls are created is $(t^3/n^3)^2$, so the number of such boxes is $\approx t^t/n^6$. In general, the first box with $k$ balls appears at time $n\cdot n^{-1/(2^k-1)}$. If $k=\log\log n/\log 2$ then this is of order $n$, giving the claimed asymptotics.

More formally, if $X_i(t)$ is the number of boxes with at least k balls at time t then in expectation, $X_i' = (X_{i-1}/n)^2$. If there are $M$ choices at each step the same scheme gives $\log\log n/\log M$.

5. Gil Kalai says:

Many thanks, Omer!

6. arnab says:

Thanks for this post! So, as suggested by the informal explanation given above, is it really true that taking M = log n gives a constant number of balls in the fullest bin with high probability?

7. Ori says:

aranb: taking anything asymptotically less then $\sqrt{n}$ will give you only 1 ball in the fullest bin with high probability, even if you have not choice (bin selected randomly). In the case you have a choice between 2 bins, this goes up to $n^{1/3}$.

8. Ori says:

Gil,
I thought you might be interested in the following paper (http://arxiv.org/abs/0901.4056), by Noga Alon, Eyal Lubetzky and yours truly, in which we consider the above problem under memory constraint.

9. irfan says: