Academic Degrees and Sex

“251 Mercer street” I said, “it is in the village,” “I know that” said the driver “are you going to NYU?” “Yes” I said. I was going to give a lecture at Ricky and Eli’s Tuesday evening’s Geometry Seminar and came earlier to have a chance to chat with the guys. “Let me tell you something” the driver said, “I did not finish elementary school, I left in the 6th grade and started working, and everyday I come home and have sex. My friends who finished elementary school have sex 2-3 times a week, tops. And , you know,” continued the driver “those who went on to finish high school, they have sex maybe once a week, no more.” “I see,” I said. Continue reading

The Simplified Pill Algorithm


Ok so I had to take one pill every day, and fortunately the pill package was marked, it contained 14 pills with labels Sunday, Monday and so on. It started with Sunday, which was the mark for the leftmost pill. The pharmacy gave me 30 pills every month, two packages and 2 separately, which meant that taking the pills would be out of synchronization. 

However, I had it all figured out: All I need is to remember a single number between -3 and 3 for the entire four weeks. Say the starting day is Tuesday. Then I had to remember the number -2. On Tuesday I take the leftmost pill for Sunday, on Wednesday I take the pill for Monday, on Thursday I take the pill for Tuesday,  and so on. If, for example, the starting date is Thursday, then I have to remember +3. The pill for Sunday I take on Thursday, and then the pill for Monday I take on Friday etc. Easy. (I tried to write this number of the package but this didn’t work.)

I can remember several 7 digit phone numbers so, in theory, I did not see any difficulty in remembering one small integer number for an entire month. The math itself was not too challenging.  

In practice it did not work so well. Sometimes I could not remember: either I already took the pill and the magic number is 2 or I did not and it is 3. Other times, I wondered if the magic number +2 or -2.

After several months Continue reading

Sarkaria’s Proof of Tverberg’s Theorem 2


Karanbir Sarkaria

4. Sarkaria’s proof:

Tverberg’s theorem (1965): Let v_1, v_2,\dots, v_m be points in R^d, m \ge (r-1)(d+1)+1. Then there is a partition S_1,S_2,\dots, S_r of \{1,2,\dots,m\} such that \cap _{j=1}^rconv (v_i: i \in S_j) \ne \emptyset.

Proof: We can assume that m=(r-1)(d+1)+1. First suppose that the points v_1, v_2,\dots,v_m belong to the d-dimensional affine space H in V=R^{d+1} of points with the property that the sum of all coordinates is 1. Next consider another (r-1)-dimensional vector space W and r vectors in W, w_1,w_2, \dots, w_r  such that w_1+w_2+\dots +w_r = 0  is the only linear relation among the w_is. (So we can take w_1,\dots w_{r-1} as the standard basis in R^{r-1} and w_r=-w_1-w_2-\dots-w_r. )

Now we consider the tensor product V \otimes W.

Nothing to be scared of: U=V \otimes W can be regarded just as the (d+1)(r-1)-dimensional vector space of d+1  by r-1 matrices. We can define the tensor product  x \otimes y of two vectors x = (x_1,x_2,\dots,x_{d+1}) \in V and y =(y_1,y_2,\dots,y_{r-1}) \in W, as the (rank-one) matrix (x_i \cdot y_j)_{1 \le i \le d+1,1 \le j \le r-1}.

Consider in U the following sets of points:

S_1 = \{v_1 \otimes w_j: j=1,2,\dots r \}

S_2 = \{v_2 \otimes w_j: j=1,2,\dots r \}

S_i = \{v_i \otimes w_j: j=1,2,\dots r \}

S_m = \{v_m \otimes w_j: j=1,2,\dots r \}.

Note that 0 is in the convex hull of every S_i. Indeed it is the sum of the elements in S_j. (And thus 0=1/r(v_i \otimes w_1+ v_i \otimes w_2 + \dots + v_i \otimes w_r). )

By the Colorful Caratheodory Theorem we can  Continue reading

Sarkaria’s Proof of Tverberg’s Theorem 1

Helge Tverberg

Helge Tverberg

Ladies and gentlemen, this is an excellent time to tell you about the beautiful theorem of Tverberg and the startling proof of Sarkaria to Tverberg’s theorem (two parts). A good place to start is Radon’s theorem.

1. The theorems of Radon, Helly, and Caratheodory.

Radon’s theorem

Radon’s theorem: Let x_1,x_2,\dots, x_m be points in R^d, m \ge d+2. Then there is a partition S,T of \{1,2,\dots,m\} such that conv(x_i: i \in S) \cap conv(x_i: i \in T) \ne \emptyset.  (Such a partition is called a Radon partition.)

Proof: Since m>d+1 the points x_1,x_2, \dots, x_m are affinely dependent. Namely, there are coefficients, not all zero,  \lambda_1,\lambda_2,\dots,\lambda_m such that \sum _{i=1}^m \lambda_i x_i = 0 and \sum _{i=1}^m \lambda_i=0. Now we can write S=\{i:\lambda_i >0\} and T = \{i: \lambda_i \le 0\}, and note that

(*) \sum_{i \in S} \lambda_i x_i = \sum _{i \in T} (-\lambda_i) x_i,

and also \sum _{i \in S} \lambda_i = \sum_{i \in T} (-\lambda_i). This last sum is positive because not all the \lambdas are equal to zero. We call it t.

To exhibit a convex combination of x_i: i\in S which is equal to a convex combination in x_i: i \in T just divide relation (*) by t. Walla.

This trick of basing a partition on the signs of the coefficients repeats in other proofs. Take note!

Radon used his theorem to prove Helly’s theorem.

Helly’s theorem

Helly’s theorem: For a family K_1,K_2,\dots, K_n, n \ge d+1, of convex sets in R^d, if every d+1 of the sets have a point in common then all of the sets have a point in common.

Proof: It is enough to show that when n> d+1 if every n-1 of the sets have a point in common then there is a point in common to them all. Let x_k \in \cap\{K_j: 1\le j\le n, j \ne k\}. In words, x_k is a point that belongs to all the K_js except perhaps to K_k. We assumed that such a point x_k exists for every k.

Now we can apply Radon’s theorem: Consider the Radon partition of the points, namely a partition S,T of \{1,2,\dots,m\} such that conv(x_i: i \in S) \cap conv(x_i: i \in T) \ne \emptyset.  Let z \in conv(x_i: i \in S) \cap conv (x_i: i \in T). Since z belongs to the convex hull of  conv (x_i: i \in S) and every x_i for i \in S belongs to every K_j for every j not in S we obtain that z belongs to every K_j for j not in S. By the same argument z belongs to every K_j for j not in T. Since S and T are disjoint, the point z belongs to all K_js. Ahla.


The proof of Helly’s theorem from Radon’s theorem as described on the cover of a book by Steven R. Lay

Caratheodory’s theorem

Caratheodory’s theorem: For S \subset R^d, if x \in conv (S) then x \in conv (R) for some R \subset S, |R| \le d+1.

Like for Radon’s theorem, there is a simple linear algebra proof. We can assume that S is finite; we start with a presentation of x as a convex combination of vectors in S, and if |S|>d+1 we can use an affine dependency among the vectors in S to change the convex combination, forcing one coefficient to vanish.

Without further ado here is Tverberg’s theorem.

2. Tverberg’s Theorem

Tverberg Theorem (1965):Let x_1,x_2,\dots, x_m be points in R^d, m \ge (r-1)(d+1)+1. Then there is a partition S_1,S_2,\dots, S_r of \{1,2,\dots,m\} such that \cap _{j=1}^rconv (x_i: i \in S_j) \ne \emptyset.

So Tverberg’s theorem is very similar to Radon’s theorem except that we want more than two parts! Continue reading

Bad Advice; An Answer to an Old Trivia Question

  A transparency and a lecture using transparencies. (No relation to the advice.)

Our bad, worse and worst advice corner 

Bad  –  When you give a talk with transparencies or computer presentations, don’t go over the content of the transparencies but rather assume that the audience reads and digests them while you develop the matter even further and make comments about their content.

Worse - The same as above, except that the transparencies are very dense and technical and hard to understand.

Worst – The same as above except that the transparencies are handwritten in an unreadable way. 


Answer to Trivia question:

Q: What do Yehuda Agnon, Michael Ben-Or, Ehud Lehrer, Uzi Segal (whose calibration theorem was mentioned in the post on the controversy around expected utility theory), Mike Werman, myself, and quite a few others, all have in common?




Answer: We all, (and altogether more than 20 students) took part in Continue reading

Thomas Bayes and Probability

Decision Theory under Uncertainty

How can we assign probabilities in cases of uncertainty?  And what is the nature of probabilities, to start with?  And what is the rational mechanism for making a choice under uncertainty?

Thomas Bayes lived in the eighteenth century.  Bayes’ famous formula shows how to update probabilities given some new evidence. Following is an example for an application of Bayes’ rule:

Suppose that ninety percent of pedestrians cross a certain crosswalk when the light is green, and ten percent cross it when the light is red. Suppose also that the probability of being hit by a car is 0.1% for a pedestrian who crosses on a green light, but the probability of being hit by a car is 2% for a pedestrian who crosses on a red light. A pedestrian is hit by a car at this particular crossing and brought to the hospital. How likely is it that he crossed on a red light?

Well, to start with (or a priori), only ten percent of the people who cross the crosswalk cross it on a red light, but now that we are told that this person was hit by a car it makes the probability that he crossed illegally higher. But by how much? Bayes’ rule allows us to compute this (a posteriori) probability. I will not describe the mathematical formula, but I will tell you the outcome: the probability that this person crossed on a red light is 2/3.   

The Bayesian approach can be described as follows. We start by assigning probabilities to certain events of interest and, as more evidence is gathered, we update these probabilities. This approach is applied to mundane decision-making and also to the evaluation of scientific claims and theories in philosophy of science.

Bayes’ rule tells us how to update probabilities but we are left with the question of how to assign probabilities in cases of uncertainty to begin with. What is the probability of success in a medical operation? What is the chance of your team winning the next baseball game? How likely is it that war will break out in the Middle East in the next decade? How risky are your stock-market investments? Continue reading

About Conjectures: Shmuel Weinberger

The following paragraph is taken from the original “too personal for publication draft” of an article entitled ” ‘Final values’ of functors” by Shmuel Weinberger for a volume in honor of Guido Mislin’s retirement from ETH. (L’enseignement Mathematique 54(2008), 180-182.) Shmuel’s remarks about making conjectures and the different types of conjectures appear here for the first time. 


“Final Values” of FunctorsProf. G. Mislin


Shmuel Weinberger


A personal letter for, and to, Guido Mislin, a mathematician who always epitomized for me elegance, taste, and precision.


 Guido Mislin      

I have, on very rare occasions, conjectured things I really believe with evidence by analogy and the religious belief in the essential simplicity of the mathematical universe (looked at correctly)[1].  Besides beauty, such a conjecture[2] pragmatically serves as a guide through unknown landscape.  And even an unreliable guide helps to point in the right direction. In fact, an unreliable guide known to be unreliable can be useful indeed.  Sometimes one makes conjectures knowing them to be false, but feeling that their falsity is a deep phenomenon and most of the predictions made with the conjecture as guide will be true[3].  On other times, I have conjectured to lay down the gauntlet:  “See, you can’t even disprove this ridiculous idea.” [4] On yet other times, the conjectures come from daydreams:  it would be so nice if this were true[5].  And, yet on others one makes a conjecture hoping to probe the landscape that other conjectures have already illuminated[6].

The conjecture (and speculations) that I would like to present to you, Guido, is motivated by ideas I learnt from Goodwillie and Weiss, but I think it is also much in the spirit of the way you sometimes approach mathematics and it is somewhere between the last two kinds…


[1] These include the package of conjectures about homology manifolds made in [BFMW], which incidentally flatly contradict other standard conjectures like the Bing-Borsuk conjecture.

[2] Here I have in mind things like the Riemann hypothesis, indeed huge swaths of number theory; in topology and geometry, Thurston’s geometricization conjecture, Baum-Connes conjecture and Novikov conjectures, and so on are examples.

[3] Here I am thinking of the equivariant version of the Borel conjecture, or the stratified version. See [W, Chapter 13].

[4] I think this is what Lott had in mind (although he was usually careful to call it a question) — but I have less concern for my reputation, and am willing to conjecture the opposite of what I believe -when he formulated the zero-in-the-spectrum problem for all complete manifolds.  And it was disproved in [FW].  On the other hand, my dear friend and mentor Donald Newman made a conjecture of this same sort once in rational approximation, only to have it proven by V.A.Popov.

[5] This must be the case for, say, Gromov’s questions about large Riemannian manifolds [G] or the Weinberger conjecture discussed first in print in [D]

[6] The zero in the spectrum problem for universal covers of aspherical manifolds or even uniformly contractible manifolds, also discussed by Lott, is of this sort.  The conjecture of rationality of the difference of twisted APS invariants for homotopy equivalent manifolds was made after realizing that it would follow from the Borel conjecture for torsion free groups.  By now, it’s been proven three times.  [FLW][K][GHW] and [HR]. 


As further afterthoughts Shmuel writes: Continue reading

Would you decide the election if you could?

One mental experiment I am fond of asking people (usually before elections) is this:

Suppose that just a minute before the votes are counted you can change the outcome of the election (say, the identity of the winner, or even the entire distribution of ballots) according to your own preferences. Let’s assume that this act will be completely secret. Nobody else will ever know it.

Will you do it?

You can answer in this poll: 

Impagliazzo’s Multiverse

Update (July 2009): Here are links to a related post on Lipton’s blog, and a conference announcement on Russell’s possible worlds.

On the occasion of Luca’s post on his FOCS 2008 tutorial on average-case complexity here is a reminder of Russell Impagliazzo’s description of the possible universes we live in.

Being at FOCS was a splendid experience and my presentation of the paper of Ehud Friedgut, Noam Nisan and myself was well accepted. (And reported by Rahul Santhanam on “Computational Complexity“) The slides are here; Initially I felt bad for including only one out of three parts of the proof but eventually I had no time to describe any part of the proof.

Russell Impagliazzo’s multiverse – cryptography and complexity

Impaglliazzo’s paper deals with two central topics in theoretical computer science: Computational complexity studies the power of computers and cryptography studies the ability to create codes and to hide information. These two areas added beautiful new scientific insights, and the deep connections between them were described by computer scientist Shafi Goldwasser as “a match made in heaven.” The limited power of computers and the well known question “whether P=NP” are of central importance in Impaglliazzo’s classification,  as are the concepts of “one way functions” and “public keys”.


In this universe computers can quickly solve hard problems.  (In technical terms: P=NP.)  If you can quickly recognize a valid solution to a problem once the latter is presented to you,  (such problems are “in NP”), then you can apply this same method in order to actually find a solution! In this universe almost all optimization problems have a quick and automatic solution. No cryptography is possible and therefore there is no privacy.


Muhammad ibn Musa al-Khwarizmi (the term “algorithm” is called after his name, and not after All Gore as commonly believed)


In this universe Continue reading

Detrimental Noise

 “Imagine there’s no heaven, it’s easy(?) if you try,”   

John Lennon 


Disclaimer: It is a reasonable belief  (look here, and here), and an extremely reasonable working assumption (look  here) that computationally superior quantum computers can be built.   

(This post and the draft will be freely updated) I am struggling to meet the deadline of a week ago for a chapter regarding adversarial noise models for quantum error correction. (Update Nov 20: here is the draft; comments are welcomed. Update April 23: Here is the arxived paper, comments are welcome! ) My hypothetical model is called “detrimental.” (This is a reason substantial math postings are a bit slow recently; but I hope a few will come soon.) This project is quite central to my research in the last three years, and it often feels like running, over my head, after my own tail that might not be there. So this effort may well be a CDM (“career damaging move”) but I like it nevertheless. It is related to various exciting conceptual and foundational issues. 

I do have occasionally a sense of progress, (often followed by a backtrack) and for this chapter, rather than describing detrimental noise by various (counterintuitive) properties as I always did, I think I have an honest definition of detrimental noise. Let me tell you about it. (Here is a recent useful guide: a zoo of quantum algorithms, produced by Stephen Jordan.)

Detrimental noise

Consider a quantum memory with n qubits at a state \rho_0. Suppose that \rho_0 is a tensor product state. The noise affecting the memory in a short time interval can be described by a quantum operation E_0. Lets suppose that E_0 acts independently on different qubits and, for qubit i with some small probability p_iE_0 changes it state to the maximum entropy state \tau_i.

This is a very simple form of noise that can be regarded as basic to understanding the standard models of noise as well as of detrimental noise.

In the standard model of noise, E_0 describes the noise of the quantum memory regardless of the state \rho stored in the memory. This is a quite natural and indeed expected form of noise.

A detrimental noise will correspond to a scenario in which, when the quantum memory is at a state \rho and \rho= U \rho_0, the noise E will be U E_0 U^{-1}. Such noise is the effect of first applying E_0 to \rho_0 and then applying U to the outcome noiselessly.

Of course, in reality we cannot perform U instantly and noiselessly and the most we can hope for is that \rho will be the result of a process. The conjecture is that a noisy process leading to \rho will be subject to noise of the form we have just described. A weaker weaker conjecture is that detrimental noise is present in every natural noisy quantum process. I also conjecture that damaging effects of the detrimental noise cannot be canceled or healed by other components of the overall noise.When we model a noisy quantum system either by a the qubits/gates description or in other ways we make a distinction between “fresh” errors which are introduced in a single computer cycle (or infinitesimally when the evolution is described by a continuous model) and the cumulative errors along the process. The basic insight of fault tolerant quantum computing is that if the incremental errors are standard and sufficiently small then we can make sure that the cumulated errors are as well. The conjecture applies to fresh errors.

 (Updated: Nov 19; sorry guys, the blue part is over-simplified and incorrect; But an emergency quantifier replacement seemed to have helped; it seems ok now)  The definition of detrimental noise for general quantum systems that we propose is as follows:

A detrimental noise of a quantum system at a state \rho commutes with every some non-identity quantum operation which stabilizes \rho.

Note that this description,

Just like for the standard model of noise, we do not specify a single noise operation but rather gives an envelope for a family of noise operations.

In the standard model of noise the envelope \cal D_{\rho} of noise operations when the computer is at state \rho does not depend on \rho. For detrimental noise there is a systematic relation between the envelope of noise operations {\cal D}_\rho and the state \rho of the computer. Namely,

 {\cal D}_{U\rho} = U {\cal D}_\rho U^{-1}.

Why is it detrimental?

Detrimental noise leads to highly correlated errors when the state of the quantum memory is highly entangled. This is quite bad for quantum error-correction, but an even more devastating property of detrimental noise is that the notion of “expected number of qubit errors” becomes sharply different from the rate of noise as measured by fidelity or trace distance. Since conjugation by a unitary operator preserves fidelity-metric, the expected number of qubit errors increases linearly with the number of qubits for highly entangled states.

Here is another little thing from my paper that I’d like to try on you:

A riddle: Can noise remember the future?

Suppose we plan a process and carry it out up to a small amount of errors. Can there be a systematic relation between the errors at some time and the planned process at a later time? Continue reading