Polymath 8 – a Success!

Yitang Zhang

Update (July 22, ’14). The polymath8b paper “Variants of the Selberg sieve, and bounded intervals containing many primes” is now on the arXiv. See also this post on Terry Tao’s blog. Since the last update, we also had here at HUJI a beautiful learning seminar on small gaps between primes. James Maynard gave a series of three lectures, and additional lectures were given by Zeev Rudnick and Tamar Ziegler.

 

Update (Jan 9, ’14, corrected Jan 10): Polymath8b has just led to impressive progress: Goldston, Pintz, and Yıldırım showed that, conditioned on the Elliott-Halberstam conjecture (EHC), there are infinitely many pairs of primes with gap at most 16. Maynard improved this to 12. Polymath8b has now improved it further to 8, based on a generalized form of the EHC (proposed in 1986 by Bombieri, Friedlander, and Iwaniec). [Further update: 6, and there are reasons to suspect that further improvement requires a major breakthrough, namely getting over the "parity problem".] The unconditional bound for gaps now stands at 270.

Update: A paper by James Maynard entitled “Small gaps between primes” proved that for  every k there are infinitely many intervals of length f(k) each containing at least k primes. He also reduced the gap between infinitely many pairs of primes to 600. The method is also (said to be) much simpler. Amazing! Similar results were obtained independently by Terry Tao.

Terry Tao launched a followup polymath8b to  improve the bounds for gaps between primes based on Maynard’s results.

Update: Here is the paper: A NEW BOUND FOR GAPS BETWEEN PRIMES by D. H. J. Polymath.

Zhang’s breakthrough and Polymath8

The main objectives of the polymath8 project, initiated by Terry Tao back in June, were “to understand the recent breakthrough paper of Yitang Zhang establishing an infinite number of prime gaps bounded by a fixed constant H, and then to lower that value of H as much as possible.”

Polymath8 was a remarkable success! Within two months the best value of H, which was 70,000,000 in Zhang’s proof, was reduced to 5,414. Moreover, the polymath setting looked advantageous for this project, compared to traditional ways of doing mathematics.

The polymath project gave a number of researchers the opportunity to understand Zhang’s proof and the earlier breakthrough by Daniel Goldston, János Pintz, and Cem Yıldırım. It also gave a larger number of mathematicians an opportunity to get some feeling for the mathematics involved.

The story

Twin primes are two primes p and p+2. The ancient twin prime conjecture asserts that there are infinitely many twin primes. The prime number theorem asserts that there are (asymptotically) n/log n primes whose value is smaller than a positive integer n, and this implies that we can find arbitrarily large pairs of consecutive primes p and q such that q-p is at most log p. Until a few years ago nothing asymptotically better was known. Goldston, Pintz, and Yıldırım (GPY) showed in 2005 that there are infinitely many pairs of primes p and q such that q-p is O(\sqrt {\log p}). A crucial idea was to derive information on gaps between primes from the distribution of primes in arithmetic progressions. GPY showed that, conditioned on the Elliott-Halberstam conjecture (EHC), there are infinitely many pairs of primes with bounded gaps (going all the way to 16, depending on a certain parameter in the conjecture, but not to 2). Yitang Zhang did not prove the EHC but, based on further understanding of the situation, found a way to bypass the conjecture and to prove unconditionally that there are infinitely many pairs of primes with bounded gaps!
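To get a concrete feeling for the numbers in this paragraph, here is a minimal Python sketch (not part of the Polymath8 work; the function names `primes_up_to` and `gap_statistics` are made up for illustration). It sieves the primes up to a limit, counts twin prime pairs, and compares the average gap with log of the limit, which is the scale the prime number theorem predicts.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes p <= limit (limit >= 2)."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def gap_statistics(limit):
    """Summarize gaps between consecutive primes up to limit."""
    primes = primes_up_to(limit)
    gaps = [q - p for p, q in zip(primes, primes[1:])]
    return {
        "count": len(primes),
        # Consecutive primes at distance 2 are exactly the twin prime pairs.
        "twin_pairs": sum(1 for g in gaps if g == 2),
        "average_gap": sum(gaps) / len(gaps),
        "log_limit": math.log(limit),
    }

if __name__ == "__main__":
    for limit in (10**3, 10**5, 10**6):
        print(limit, gap_statistics(limit))
```

The average gap only reflects the prime number theorem; nothing in such a finite computation says anything about whether small gaps occur infinitely often, which is the content of Zhang's theorem.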

Here is a very nice 2007 survey article by Kannan Soundararajan on this general area of research and the GPY breakthrough. (One thing I recently learned is that Soundararajan is called “Sound” by friends and colleagues.) This article starts with a very thoughtful and endearing answer to the question: “Why do we care at all? After all primes were meant to be multiplied, not subtracted (or added).”

Here is a short list of thoughts (things I learned, things I wish to understand better…) from following (from a distance) Polymath8 and related Internet activity.

1) How does information on primes in arithmetic progressions lead to information on gaps between primes?

I do not really understand why information on primes in arithmetic progressions, e.g. the Elliott-Halberstam conjecture, leads to the conclusion regarding primes with bounded gaps. I would be very happy to get a feeling for it.

2) The three-primes barrier.

Already GPY  tried to extend their methods to show the existence of three primes in a bounded interval of integers. So far, it is not known how to show that intervals of the form [n,n+o(log n)] contain triples of primes infinitely often. Perhaps, to actually solve the twin prime conjecture we will need to get a breakthrough for triples of primes, but maybe not. See also this MO question asked by Noam Elkies.

Update: Here is another interesting MO question, Quantitative lower bounds related to Zhang’s theorem on bounded gaps, asked by Eric Naslund. Eric asks: what can be said, based on Zhang’s work, about the smallest value of a pair of primes of distance k apart?

3) Cauchy-Schwarz everywhere.

This may sound silly, but the way the Cauchy-Schwarz (C-S) inequality is used again and again makes you wonder why C-S is so useful, and why it is mainly C-S that is so useful.

4) Can detailed statistical understanding of primes in sets other than AP  be useful?

In recent years there was much activity (and I also was interested) in Mobius randomness and analogs of the prime number theorem for various more complicated subsets of integers. (E.g., subsets defined by various properties of the digital expansion.) Can understanding of this kind  also be used for the prime-gaps questions?

5) Usefulness of Deligne’s work on the Riemann hypothesis for function fields for questions in analytic number theory.

I knew, of course that Deligne famously proved analogs of the Riemann hypothesis for function fields in great generality but I was not aware that these results have applications to “ordinary” analytic number theory. Again, this is something I would be happy to know a little more about. There is a nice recent post on the Riemann hypothesis in various settings on “What’s new”.

6) Parity problem. (Added Nov 27) There is a difficult “parity problem” which seems to be a serious obstacle for getting the gap down to two (and for various related goals). Terry Tao wrote about it in 2007 in this post. In polymath8b VII an attempt to cross the “parity barrier” was made but (as people expected) it turned out that the parity barrier indeed shows up and causes this attempt to fail. (Update July 14:) This is further explained in this new post on Tao’s blog.

7) (Added Nov 27) One thing I am curious about is the following. Consider a random subset of primes (taking every prime with probability p, independently, and say p=1/2). Now consider only integers involving these primes. I think that it is known that this system of “integers” satisfies (almost surely) PNT but not at all RH. We can consider the properties BV (Bombieri-Vinogradov), or more generally EH(θ), and the quantities H_m. For such systems does BV typically hold? Or is it rare, like RH? Does Maynard’s implication apply in this generality? Nicely, here we can hope even for infinite consecutive primes. Update: after thinking about it further and a little discussion over polymath8b it looks like current sieve methods, and some of the statements involved, rely very strongly on both the multiplicative and additive structure of the integers and do not allow extensions to other systems of “integers.”

 

 

Update (August 23): Before moving to small gaps, Sound’s 2007 survey briefly describes the situation for large gaps. The Cramér probabilistic heuristic suggests that there are consecutive primes in [1,n] which are c(\log n)^2 apart, but not C (\log n)^2 apart, where c and C are some small and large positive constants. It follows from the prime number theorem that there is a gap of at least \log n. There were a few improvements in the 30s, ending with a remarkable result by Rankin, who showed that there is a gap as large as \log n \cdot \log \log n \cdot \log \log \log \log n \cdot (\log \log \log n)^{-2}. Last week Kevin Ford, Ben Green, Sergei Konyagin, and Terry Tao and, independently, James Maynard were able to improve Rankin’s estimate by a function that goes to infinity with n. See this post on “What’s new.”

 

Posted in Mathematics over the Internet, Number theory, Open problems

Analysis of Boolean Functions – Week 3

Lecture 4

In the third week we moved directly to the course’s “punchline” – the use of Fourier-Walsh expansion of Boolean functions and the use of Hypercontractivity.

Before that we  started with  a very nice discrete isoperimetric question on a graph which is very much related to the graph of the discrete cube.

Consider the graph G whose vertices are 0-1 vectors of length n with precisely r ‘1’s, and with edges corresponding to vertices of Hamming distance two. (Which is the minimal Hamming distance between distinct vertices.) Given a set A of m vertices, how small can E(A, \bar A) be? (We already talked about intersecting families of sets of constant size – the Erdos-Ko-Rado theorem and, in general, it is a nice challenge to extend some of the ideas/methods of the course to constant weight situation.)

And now for the main part of the lecture.

1) Basic harmonic analysis on the discrete cube. We considered the vector space of real functions on the discrete cube \Omega_n and defined an inner product structure. We also defined the p-th norm for 1\le p\le \infty. Next we defined the Fourier-Walsh functions and showed that they form an orthonormal basis. This now leads to the Fourier-Walsh expansion for an arbitrary real function f on the discrete cube, f=\sum_{S\subset[n]}\hat f(S)W_S, and we could easily verify Parseval’s formula.

2) Influence and Fourier. Let f be a real function on \Omega_n and let f=\sum \hat f(S)W_S be its Fourier-Walsh expansion. We showed that I_k(f)=\sum_{S:k \in S}\hat f^2(S). It follows that I(f)=\sum_S\hat f^2(S)|S|. The Fourier-theoretic proof for I(f) ≥ 4 t (1-t), where t=μ(f), now follows easily.

3) Khintchine, hypercontractivity and the discrete isoperimetric inequality.

Next we discussed what it will take to prove the better estimate I(f) ≥ K t log (1/t). We stated Khintchine’s inequality and explained why it is relevant: for Boolean functions the p-th power of the p-norm does not depend on p. (It always equals t.) Therefore, if t is small, the p-th norms themselves must be well apart! After spending a few moments on the history of the inequality (as told to me by Ron Blei) we discussed what kind of extension we need, and stated the Bonami-Gross-Beckner inequality. We used Bonami’s inequality to prove the inequality I(f) ≥ K t log (1/t) and briefly talked about what more it gives.
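The identity in item 2 above can be checked by brute force on small examples. The sketch below is only an illustration under stated assumptions, not the notation of the lecture: f takes values in {-1, +1}, W_S(x) = (-1)^{\sum_{i \in S} x_i}, \hat f(S) is the average of f W_S over the cube, and I_k(f) is the probability that flipping coordinate k changes f. (With 0/1-valued functions the same identity holds up to a constant factor.) The helper names are made up for this example.

```python
from itertools import product, combinations

def walsh(S, x):
    """Fourier-Walsh function W_S(x) = (-1)^(sum of x_i for i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier_coefficients(f, n):
    """Return {S: hat f(S)}, where hat f(S) is the average of f(x) W_S(x)."""
    cube = list(product((0, 1), repeat=n))
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    return {S: sum(f(x) * walsh(S, x) for x in cube) / len(cube) for S in subsets}

def influence(f, n, k):
    """I_k(f): probability that flipping coordinate k changes the value of f."""
    cube = list(product((0, 1), repeat=n))
    flips = sum(1 for x in cube if f(x) != f(x[:k] + (1 - x[k],) + x[k + 1:]))
    return flips / len(cube)

def majority3(x):
    """MAJORITY on three variables, with values in {-1, +1} (an assumption)."""
    return 1 if sum(x) >= 2 else -1

if __name__ == "__main__":
    coeffs = fourier_coefficients(majority3, 3)
    print("Parseval:", sum(c * c for c in coeffs.values()))
    for k in range(3):
        spectral = sum(c * c for S, c in coeffs.items() if k in S)
        print(k, influence(majority3, 3, k), spectral)
```

For MAJORITY on three variables the only non-zero coefficients sit on the three singletons and on the full set, so each influence is 1/4 + 1/4 = 1/2.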

Lecture 5

1) Review and examples. We reviewed what we did in the previous lecture and considered a few examples: Dictatorship, the AND function (an opportunity to mention the uncertainty principle), and MAJORITY on three variables. We also mentioned another connection between influences and Fourier-Walsh coefficients: for a monotone (non-decreasing) Boolean function f, I_k(f) = -2\hat f(\{k\}).

2) KKL’s theorem

KKL’s theorem: There is an absolute constant K so that for every Boolean function f, with t=μ(f), there exists k, 1 ≤ k ≤ n such that

I_k(f) \ge K t (1-t) \log n/n.

To prove KKL’s theorem we repeat, to a large extent, the steps from Lecture 4 (of course, the proof of KKL’s theorem was where this line of argument came from). We showed that if all individual influences are below 1/\sqrt n then I(f) \ge K t(1-t) \log n.

We mentioned one corollary: For a Boolean function invariant under a transitive group of permutations, all individual influences are equal and therefore I(f) \ge K t (1-t)\log n.

3) Further problems

In the  last part of the lecture we mentioned seven problems regarding influence of variables and KKL’s theorem (and I added two here):

1) What can be said about balanced Boolean functions with small total influence?

2) What can be said about Boolean functions for which I(f) ≤ K t log (1/t), for some constant K, where t=μ(f)?

3) What can be said about the connection between the symmetry group and the minimum total influence?

4) What can be said about Boolean functions (1/3 ≤ μ(f) ≤ 2/3, say) for which \max I_k(f) \le K \log n/n?

5) What more can be said about the vector of influences (I_1(f),I_2(f), \dots I_n(f))?

6)* What is the sharp constant in KKL’s theorem?

7)* What about edge expansions of (small) sets of vertices in general graphs?

8) Under what conditions is I(f) \ge n^\beta for some β > 0?

9) What about influence of larger sets? In particular, what is the smallest t (as a function of n) such that if \mu(f)=t there is a set S of variables with |S| ≤ 0.3n and I_S(f) \ge 0.9?

(This post  is a short version, I will add details later on.)

Posted in Combinatorics, Computer Science and Optimization, Probability, Teaching

Richard Stanley: How the Proof of the Upper Bound Theorem (for spheres) was Found

The upper bound theorem asserts that among all d-dimensional polytopes with n vertices, the cyclic polytope maximizes the number of facets (and k-faces for every k). It was proved by McMullen for polytopes in 1970, and by Stanley for general triangulations of spheres in 1975. This theorem is related to a lot of mathematics (and also computational geometry) and there are many interesting extensions and related conjectures.

Richard Stanley posted (on his homepage, which is full of interesting things) an article describing How the proof of the upper bound theorem for triangulations of spheres was found.

It is interesting how, for Richard, the work on face-numbers of polytopes started with his work on integer points in polytopes and especially the Anand-Dumir-Gupta conjecture on enumeration of “magic squares.” (See this survey article by Winfried Bruns.) Integer points in polytopes and face numbers represent the two initial chapters of Richard’s “green book” on Commutative algebra and combinatorics. Both these topics are related to commutative algebra and to the algebraic geometry of toric varieties.

See also these relevant posts: (Eran Nevo) The g-conjecture II: The commutative algebra connection, (Eran Nevo) The g-conjecture I, and How the g-conjecture came about. Various results and problems related to the upper bound theorem can be found in Section 2 of my paper Combinatorics with a Geometric Flavor.

Posted in Combinatorics, Convex polytopes

Simons@UCBerkeley

Raghu Meka talking at the workshop 

I am spending the semester in Berkeley at the newly founded Simons Institute for the Theory of Computing. The first two programs demonstrate well the scope of the center and why it is needed. One program, on real analysis in computer science, seems to demonstrate very theoretical aspects of the theory of computing and its relations with pure mathematics. The second program is on big data analysis, a hot topic in statistics, machine learning and many areas of science, technology, and beyond. And one surprise is that these two areas have a lot in common. Next semester, there will be one special program on quantum Hamiltonian complexity, which again may seem to be on the very theoretical side of TOC, as it largely deals with computational classes between NP and QMA (far beyond what seems relevant to real life), and yet this study is related to efficient classical algorithms in condensed matter physics, with connections to real-life physics and technology. The other special program in Spring 2014 is on evolution (evolutionary biology and TOC)!

The program I am mainly involved with is on real analysis in computer science. It started with a very successful and interesting workshop, Real Analysis in Testing, Learning and Inapproximability. This week (Sept 9-13) there is a “Boot camp on real analysis” with three mini-courses. The amount of activity in the center and around it is too vast to allow any sort of live-blogging, but I do hope to share with you a few things over the semester. Two things for now: Calvin Lab, the building where the institute is located, is beautiful! The video coverage of talks (both the live streaming and the video archives) is of very high quality (reflecting both the excellent equipment and the people who operate it).

It is always a special feeling to witness the first days of the academic year, whether it is in Israel or in the US. Many students returning to school, many first-year students, at times accompanied by their proud parents, and quite a few colleagues who share this excitement and are just as surprised at how much younger and younger these students are becoming (and some other colleagues, who this year are proud parents themselves). This time, I spent a few days at Yale and saw the excitement there, and then continued with the same “high” mood on the first days of school here at Berkeley.

Posted in Conferences, Updates

Analysis of Boolean functions – week 2

Post on week 1; home page of the course analysis of Boolean functions

Lecture II:

We discussed two important examples that were introduced by Ben-Or and Linial: Recursive majority and  tribes.

Recursive majority (RM): F_m is a Boolean function with 3^m variables and F_{m+1} (x,y,z) = F_1(F_m(x),F_m(y),F_m(z)). For the base case we use the majority function F_1(x,y,z)=MAJ(x,y,z).
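The recursion above can be sketched in a few lines of Python. This is a minimal illustration, assuming 0/1 values and the convention that F_0 is the identity on a single bit (so that F_1 comes out as MAJ); the function names are made up for this example. It also computes the influence of a variable by brute force.

```python
from itertools import product

def maj3(a, b, c):
    """Majority of three bits."""
    return 1 if a + b + c >= 2 else 0

def recursive_majority(bits):
    """Evaluate F_m on a tuple of 3^m bits, with F_0 the identity (assumed)."""
    if len(bits) == 1:
        return bits[0]
    third = len(bits) // 3
    return maj3(
        recursive_majority(bits[:third]),
        recursive_majority(bits[third:2 * third]),
        recursive_majority(bits[2 * third:]),
    )

def influence(m, k):
    """Probability that flipping variable k changes F_m (brute force)."""
    n = 3 ** m
    flips = 0
    for x in product((0, 1), repeat=n):
        y = x[:k] + (1 - x[k],) + x[k + 1:]
        flips += recursive_majority(x) != recursive_majority(y)
    return flips / 2 ** n

if __name__ == "__main__":
    print(influence(1, 0), influence(2, 0))
```

A variable of F_2 is pivotal exactly when it is pivotal inside its own block of three and that block is pivotal at the top level, which gives influence 1/2 times 1/2.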

Tribes: Divide your n variables into s pairwise disjoint sets (“tribes”) of cardinality t. f=1 if for some tribe all variables equal one, and thus f=0 if for every tribe there is a variable with value ‘0’.

We note that this is not an odd function, i.e., it is not symmetric with respect to switching ‘0’ and ‘1’. To have \mu=1/2 we need to set t=\log_2n - \log_2\log_2n+c. We computed the influence of every variable to be C \log n/n. The tribes function is a depth-two formula of linear size, and we briefly discussed what Boolean formulas and Boolean circuits are. (These notions can be found in many places and also in this post.)
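For small parameters the tribes function can be examined by brute force. The sketch below is an illustration only (the function names and the choice of consecutive blocks as tribes are assumptions). It compares a brute-force computation of \mu(f) and of the influence of one variable with the closed forms \mu = 1-(1-2^{-t})^s and I_k = 2^{-(t-1)}(1-2^{-t})^{s-1}: a variable is pivotal exactly when the rest of its tribe is all ones and no other tribe is all ones.

```python
from itertools import product

def tribes(x, t):
    """f = 1 iff some block of t consecutive variables is all ones."""
    return int(any(all(x[i:i + t]) for i in range(0, len(x), t)))

def brute_force(s, t):
    """Return (mu(f), influence of variable 0) for s tribes of size t."""
    n = s * t
    ones = flips = 0
    for x in product((0, 1), repeat=n):
        y = (1 - x[0],) + x[1:]
        ones += tribes(x, t)
        flips += tribes(x, t) != tribes(y, t)
    return ones / 2 ** n, flips / 2 ** n

def closed_form(s, t):
    """The same two quantities from the product formulas."""
    mu = 1 - (1 - 2 ** -t) ** s
    influence = 2 ** -(t - 1) * (1 - 2 ** -t) ** (s - 1)
    return mu, influence

if __name__ == "__main__":
    for s, t in [(3, 2), (2, 3), (4, 3)]:
        print((s, t), brute_force(s, t), closed_form(s, t))
```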

I stated several conjectures and questions that Ben-Or and Linial raised in their 1985 paper:

Conjecture 1: For every balanced Boolean function with n variables there is a variable k whose influence is \Omega (\log n/n).

Conjecture 2: For every balanced Boolean function with n variables there is a set S of n/log n variables whose influence I_S(f) is 1-o(1).

Question 3: To what extent can the bound in Conjecture 2 be improved if the function f is odd. (Namely, f(1-x_1,1-x_2,\dots, 1-x_n)=1-f(x_1,x_2,\dots, x_n).)

Our next theme was discrete isoperimetric results.  I noted the connection between total influence and edge expansion and proved the basic isoperimetric inequality: If μ(f)=t then I(f) ≥ 4 t(1-t). The proof uses the canonical paths argument.
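For very small n the inequality I(f) ≥ 4t(1-t) can be confirmed exhaustively. The following sketch is not the canonical paths proof; it simply enumerates all Boolean functions on n variables, taking I(f) to be the sum over k of the probability that flipping coordinate k changes f (an assumption about normalization, and the function name is made up), and reports the smallest value of I(f) - 4t(1-t).

```python
from itertools import product

def min_isoperimetric_slack(n):
    """Minimum of I(f) - 4 t (1 - t) over all Boolean functions on n variables."""
    cube = list(product((0, 1), repeat=n))
    index = {x: i for i, x in enumerate(cube)}
    # neighbors[i] lists the indices of the n points at Hamming distance one.
    neighbors = [
        [index[x[:k] + (1 - x[k],) + x[k + 1:]] for k in range(n)]
        for x in cube
    ]
    size = len(cube)
    best = None
    for table in product((0, 1), repeat=size):
        t = sum(table) / size
        # Count pairs (x, x with one coordinate flipped) where f disagrees;
        # dividing by 2^n sums the n individual influences.
        disagreements = sum(
            table[i] != table[j] for i in range(size) for j in neighbors[i]
        )
        slack = disagreements / size - 4 * t * (1 - t)
        best = slack if best is None else min(best, slack)
    return best

if __name__ == "__main__":
    print(min_isoperimetric_slack(2), min_isoperimetric_slack(3))
```

The minimum is attained, for instance, by constant functions and by dictatorships, where both sides of the inequality agree. The enumeration grows doubly exponentially, so this is only feasible for n up to 3 or 4.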

Lecture III:

Using “compression”, we proved the sharp bound on I(f) as a function of t=μ(f). We made the analogy between compression and Steiner symmetrization, a classic method for proving the classical isoperimetric theorem. We discussed similar results on vertex boundary and on Talagrand-Margulis boundary (to be elaborated later in the course).

Then we proved the Harris-Kleitman inequality and showed how to deduce the fact that an intersecting family of subsets of [n] with the property that the family of complements is also intersecting has at most 2^{n-2} sets.
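The 2^{n-2} bound can be checked by exhaustive search for tiny n. The sketch below is only a sanity check, not the Harris-Kleitman argument. It encodes subsets of [n] as bitmasks, calls two sets compatible if they intersect and their union is not all of [n] (equivalently, their complements intersect), and searches for the largest pairwise compatible family. The function name and the simple branch-and-bound are choices made for this example, and the empty set and [n] are excluded by assumption.

```python
def max_doubly_intersecting_family(n):
    """Largest family of subsets of [n] in which every two sets intersect
    and no two sets have union [n] (sets encoded as bitmasks)."""
    full = (1 << n) - 1
    # The empty set and [n] itself are excluded (an assumption of this sketch).
    sets = list(range(1, full))

    def compatible(a, b):
        return (a & b) != 0 and (a | b) != full

    best = 0

    def extend(size, candidates):
        nonlocal best
        best = max(best, size)
        for i, a in enumerate(candidates):
            # Prune: even taking every remaining candidate cannot beat best.
            if size + len(candidates) - i <= best:
                return
            extend(size + 1, [b for b in candidates[i + 1:] if compatible(a, b)])

    extend(0, sets)
    return best

if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, max_doubly_intersecting_family(n), 2 ** (n - 2))
```

An extremal example is the family of all sets that contain element 1 and avoid element 2.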

The next topic is spectral graph theory. We proved the Hoffman bound for the largest size of an independent set in a graph G.
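As a small numerical illustration, assume the usual form of the Hoffman bound for a d-regular graph on n vertices with smallest adjacency eigenvalue \lambda_{\min}: the independence number is at most n(-\lambda_{\min})/(d-\lambda_{\min}). The sketch below applies it to cycles, whose adjacency eigenvalues are 2\cos(2\pi j/n), and compares with a brute-force independence number. The function names are made up for this example.

```python
import math
from itertools import combinations

def cycle_eigenvalues(n):
    """Adjacency eigenvalues of the cycle C_n: 2 cos(2 pi j / n)."""
    return [2 * math.cos(2 * math.pi * j / n) for j in range(n)]

def hoffman_bound(n_vertices, degree, lambda_min):
    """Hoffman bound for a regular graph (form assumed in the lead-in)."""
    return n_vertices * (-lambda_min) / (degree - lambda_min)

def independence_number_cycle(n):
    """Brute-force independence number of C_n."""
    edges = [(i, (i + 1) % n) for i in range(n)]
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if not any(u in chosen and v in chosen for u, v in edges):
                return size
    return 0

if __name__ == "__main__":
    for n in (5, 6, 7, 8):
        bound = hoffman_bound(n, 2, min(cycle_eigenvalues(n)))
        print(n, independence_number_cycle(n), round(bound, 3))
```

For even cycles the bound is tight (n/2), while for odd cycles it is slightly above the true value (n-1)/2.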

I mentioned graph-Laplacians and the spectral bound for expansions (Alon-Milman, Tanner).

The proofs mentioned above are so lovely that I will add them on this page, but sometime later.

Next week I will introduce harmonic analysis on the discrete cube and give a Fourier-theoretic explanation for  the additional log (1/t) factor in the edge isoperimetric inequality.

Important announcement: Real analysis boot camp in the Simons Institute for the Theory of Computing, is part of the program in Real Analysis and Computer Science. It is taking place next week on September 9-13 and has three lecture series. All lecture series are related to the topic of the course and especially:

Posted in Combinatorics, Computer Science and Optimization, Probability, Teaching

Around Borsuk’s Conjecture 3: How to Save Borsuk’s conjecture

Borsuk asked in 1933 if every bounded set K of diameter 1 in R^d can be covered by d+1 sets of smaller diameter. A positive answer was referred to as the “Borsuk Conjecture,” and it was disproved by Jeff Kahn and me in 1993. Many interesting open problems remain. The first two posts in the series “Around Borsuk’s Conjecture” are here and here. See also these posts (I, II, III, IV), and the post “Surprises in mathematics and theory” on Lipton and Regan’s blog GLL.

Can we save the conjecture? We can certainly try, and in this post I would like to examine the possibility that Borsuk’s conjecture is correct except for some “coincidental” sets. The question is how to properly define “coincidental.”

Let K be a set of points in R^d and let A be a set of pairs of points in K. We say that the pair (K, A) is general if for every continuous deformation of the distances on A there is a deformation K’ of K which realizes the deformed distances.

(This condition is related to the “strong Arnold property” (aka “transversality”) in the theory of Colin de Verdière invariants of graphs; see also this paper  by van der Holst, Lovasz and Schrijver.)

Conjecture 1: If D is the set of diameters in K and (K,D) is general then K can be partitioned into d+1 sets of smaller diameter.

We propose also (somewhat stronger) that this conjecture holds even when “continuous deformation” is replaced with “infinitesimal deformation”.

The finite case is of special interest:

A graph embedded in R^d is stress-free if we cannot assign non-trivial weights to the edges so that the weighted sum of the edges containing any vertex v (regarded as vectors from v) is zero for every vertex v. (Here we embed the vertices and regard the edges as straight line segments. (Edges may intersect.) Such a graph is called a “geometric graph”.) When we restrict Conjecture 1 to finite configurations of points we get:

Conjecture 2: If G is a stress-free geometric graph of diameters in R^d then G is (d+1)-colorable.

A geometric graph of diameters is a geometric graph with all edges having the same length and all non-edges having smaller lengths. The attempt at “saving” the Borsuk Conjecture presented here, and Conjectures 1 and 2, first appeared in a 2002 collection of open problems dedicated to Daniel J. Kleitman, edited by Douglas West.

When we consider finite configurations of points  we can make a similar conjecture for the minimal distances:

Conjecture 3: If the geometric graph of pairs of vertices realizing the minimal distances of a point-configuration in R^d is stress-free, then it is (d+1)-colorable.

We can speculate that even the following stronger conjectures are true:

Conjecture 4: If G is a stress-free geometric graph in R^d so that all edges in G are longer than all non-edges of G, then G is (d+1)-colorable.

Conjecture 5: If G is a stress-free geometric graph in R^d so that all edges in G are shorter than all non-edges of G, then G is (d+1)-colorable.

We can even try to extend the condition further so edges in the geometric graph will be larger (or smaller) than non-edges only just “locally” for neighbors of each given vertex.

Comments:

1) It is not true that every stress-free geometric graph in R^d is (d+1)-colorable, and not even that every stress-free unit-distance graph is (d+1)-colorable. Here is the (well-known) example referred to as the Moser Spindle. Finding conditions under which stress-free graphs in R^d are (d+1)-colorable is an interesting challenge.

[Figure: the Moser spindle]

2) Since a stress-free graph with n vertices has at most dn - {{d+1} \choose {2}} edges, it must have a vertex of degree 2d-1 or less. Every subgraph of a stress-free graph is again stress-free, so we can remove such a vertex, color the rest inductively, and conclude that the graph is 2d-colorable. I expect this to be best possible but I am not sure about it. This shows that our “saved” version of Borsuk’s conjecture is of a very different nature from the original one. For graphs of diameters in R^d the chromatic number can, by the work of Jeff and me, be exponential in \sqrt d.

3) It would be interesting to show that conjecture 1 holds in the non-discrete case when  d+1 is replaced by 2d.

4) Coloring vertices of geometric graphs where the edges correspond to the minimal distance is also related to the well-known Erdos-Faber-Lovasz conjecture.

See also this 1994 article by Jeff Kahn on Hypergraphs matching, covering and coloring problems.

5) The most famous conjecture regarding coloring of graphs is, of course, the four-color conjecture asserting that every planar graph is 4-colorable that was proved by Appel and Haken in 1976.  Thinking about the four-color conjecture is always both fascinating and frustrating. An embedding for maximal planar graphs as vertices of a convex 3-dimensional polytope is stress-free (and so is, therefore, also a generic embedding), but we know that this property alone does not suffice for 4-colorability. Finding further conditions for  stress-free graphs in R^d that guarantee (d+1)-colorability can be relevant to the 4CT.

An old conjecture of mine asserts that

Conjecture 6: Let G be a graph obtained from the graph of a d-polytope P by triangulating each (non-triangular) face with non-intersecting diagonals. If G is stress-free (in which case the polytope P is called “elementary”) then G is (d+1)-colorable.

Closer to the conjectures of this post we can ask:

Conjecture 7: If G is a stress-free geometric graph in R^d so that every edge e of G is tangent to the unit ball and every non-edge of G intersects the interior of the unit ball, then G is (d+1)-colorable.
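Returning to Comment 1 above: the combinatorial part of the Moser spindle example is easy to verify by machine. The sketch below uses one standard labeling of the spindle (two rhombi, each made of two triangles, glued at vertex 0, with their far tips joined); the vertex numbering and function names are choices made for this example. It checks only the abstract graph: the edge count 2n-3 = 11 and the chromatic number 4. It does not verify the unit-distance embedding or stress-freeness.

```python
from itertools import product

MOSER_SPINDLE_EDGES = [
    (0, 1), (0, 2), (1, 2), (1, 3), (2, 3),  # first rhombus (two triangles)
    (0, 4), (0, 5), (4, 5), (4, 6), (5, 6),  # second rhombus (two triangles)
    (3, 6),                                  # edge joining the two far tips
]

def is_colorable(n_vertices, edges, colors):
    """True if some assignment of the given number of colors is proper."""
    return any(
        all(c[u] != c[v] for u, v in edges)
        for c in product(range(colors), repeat=n_vertices)
    )

def chromatic_number(n_vertices, edges):
    """Smallest number of colors admitting a proper coloring (brute force)."""
    k = 1
    while not is_colorable(n_vertices, edges, k):
        k += 1
    return k

if __name__ == "__main__":
    print(len(MOSER_SPINDLE_EDGES), chromatic_number(7, MOSER_SPINDLE_EDGES))
```

In any 3-coloring, vertices 3 and 6 would both be forced to take the color of vertex 0, which the edge between them forbids; so with d=2 the graph needs d+2 colors.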

A question that I forgot to include in part I.

What is the minimum diameter d_n such that the unit ball in R^n can be covered by n+1 sets of smaller diameter? It is known that 2-C'\log n/n \le d_n\le 2-C/n for some constants C and C’.

Posted in Combinatorics, Convexity, Geometry, Open problems

Analysis of Boolean Functions – week 1

Home page of the course.

In the first lecture I defined the discrete n-dimensional cube and  Boolean functions. Then I moved to discuss five problems in extremal combinatorics dealing with intersecting families of sets.

1) The largest possible intersecting family of subsets of [n];

2) The largest possible intersecting family of subsets of [n] so that the family of complements is also intersecting;

3) The largest possible family of graphs on v vertices such that every two graphs in the family contain a common triangle;

4) Chvatal’s conjecture regarding the maximum size of an intersecting family of sets contained in an ideal of sets;

5) Erdos-Ko-Rado Theorem.

Exercise: Prove one of the following

a) The Harris-Kleitman inequality

b) (from the H-K inequality) Every family of subsets of [n] with the property that every two sets have non-empty intersection and no full union contains at most 2^{n-2} sets.

More reading: this post: “Extremal combinatorics I: extremal problems on set systems”. Spoiler: The formulation of Chvatal’s conjecture and also the answer to the second exercise can be found in this post: Extremal combinatorics III: some basic theorems. (See also problem 25 in the 1972 paper Selected combinatorial research problems by Chvatal, Klarner and Knuth.)

I moved to discuss the problem of collective coin flipping and the notion of influence as defined by Ben-Or and Linial. I mentioned the Baton-passing protocol, the Alon-Naor result, and Feige’s two-rooms protocol.

More reading: this post: “Nati’s influence”. The original paper of Ben-Or and Linial: Collective coin flipping, M. Ben-Or and N. Linial, in “Randomness and Computation” (S. Micali ed.) Academic Press, New York, 1989, pp. 91-115.

Posted in Combinatorics, Computer Science and Optimization, Teaching

Open Collaborative Mathematics over the Internet – Three Examples

After much hesitation, I decided to share with you the videos of my lecture Open collaborative mathematics over the internet – three examples, which I gave last January in Doron Zeilberger’s seminar at Rutgers on experimental mathematics. Parts of the 47-minute talk are mathematical, while other parts are about mathematics on the Internet, blogs, the polymath projects, MathOverflow, etc.

Here is the video of part I (30 minutes) and here is the video of  part II (17 minutes).

I tried to pay some homage to Doron’s own lecture style, but when I saw the video, I could not ignore some aspects of my own style: complete indifference between a plus and a minus, between x^2 and \sqrt x, between multiplying by something and dividing by the same thing, between subscripts and superscripts, between what are “rows” and what are “columns”, etc., and, in addition, randomly ending English words with an ‘s’ regardless of what English grammar dictates. Apologies. This was a rare occasion of giving a talk about “meta” matters of doing mathematics.

My first (and main) example was the Erdős discrepancy problem, and I mentioned some experimental aspects and heuristic arguments. The second example was Möbius randomness, continued with some comments on MathOverflow, some of the “goodies” one can earn participating in MathOverflow, and some comments on a debate regarding polymath projects organized by I.A.S a few years ago. The third example was about mathematically oriented skepticism. This time not about my debate with Aram Harrow and quantum computing skepticism (that I briefly mentioned), but about my “Angry Birds” skepticism. The lecture was a mixture of a blackboard talk and a presentation of various Internet sites.

Summary and links

Here are the links to the Internet pages I presented with an outline of the lecture:

Video I (Erdős discrepancy problem)

00:00-00:43 Doron’s introduction; 00:43-4:00 On the screen: Erdős discrepancy problems: showing this post (EDP22) from Gowers’s blog. I talked  about the chaotic nature of mathematics on the Internet. Then I explained what are polymath projects, mentioned polymath1 and density Hales-Jewett, other polymath projects, and polymath5.

4:00-12:00 Polymath5, discrepancy of hypergraphs, and The Erdős Discrepancy problem. [6:06 "There is nothing more satisfying in a lecture than seeing attempts by the speaker to move an unmovable blackboard"]; Some basic observations, random signs, Mobius functions, the log n example.

12:00-16:00 Plan for the next fifteen minutes: a) Greedy methods, b)Heuristic approaches, c) Hereditary discrepancy

17:00 – 18:40 Alex Nikolov and Kunal Talwar’s work on hereditary discrepancy. On the screen: Talwar’s post discrepancy to privacy and back on Windows on Theory

18:40 – 23:00 Greedy approaches. On the screen: My question on MathOverflow – on a certain greedy algorithm for Erdős discrepancy problem; The MO answer by ‘rlo’; a follow up question

23:00 – 27:00 What is MathOverflow. On the screen: my old MO page. (Here is the new one.) What is  “MO- reputation,” which “badges” you can earn and for what; Earning “hats” in more advanced site: On the screen:  hat champions from TCS-Stackexchange, winter 2012. (This webpage is no longer available.)

27:00 – 30:00 My heuristic approach for EDP

Video II (Möbius randomness, and angry birds)

On the screen: My MO question on Möbius randomness of the Walsh functions

00:00-02:00 Diversion: The IAS debate on polymath projects between Gowers and Sarnak, and comments  by me and by Noga Alon.

02:00-05:40 Möbius randomness and computational complexity. What is Möbius randomness (MR); Peter Sarnak’s Jerusalem talk on MR, his belief regarding the hardness of factoring, and my opinion; The AC^0 prime number conjectures and Walsh functions. My MO question; Ben Green’s MO-answer to one problem and Bourgain’s result answering a second question.

05:40-7:50 A remaining question on MR of Rudin-Shapiro sequences. On the screen – my MO question Mobius randomness of the Rudin-Shapiro sequence; What is a bounty in MathOverflow; Diversion: The power that comes with high reputation on MO. Who won my bounty.

7:50-09:30 My debate with Aram Harrow on my quantum computers skepticism. On the screen: the debate’s first post and then the post “Can you hear the shape of a quantum computer?”

09:30-14:30 Example 3: Another mathematically related skepticism – Angry Birds. On the screen: this page from Arqade: Most voted questions (a few are very amusing); Introduction to the game “Angry Birds”; My skeptical theory: “Angry Birds” is becoming easier with new versions, and the statistical argument for it. I mentioned the Arqade question “Is Angry Birds deterministic?” that got (then) 143 upvotes, and was a role model for me.

14:30-17:00 My Arqade question, my hopes, and how it was accepted by the computer-games community;  On the screen: my Arqade question:

Poznań: Random Structures and Algorithms 2013

Michal Karonski (left) who built Poland’s probabilistic combinatorics group at Poznań, and a sculpture honoring the Polish mathematicians who first broke the Enigma machine (right, with David Conlon, picture taken by Jacob Fox).

I am now visiting Poznań for the 16th Conference on Random Structures and Algorithms. This biennial series of conferences started 30 years ago (as a satellite conference to the 1983 ICM which took place in Warsaw) and this time there was also a special celebration for Béla Bollobás’s 70th birthday. I was looking forward to this first visit to Poland which is, of course, a moving experience for me. Before Poznań I spent a few days in Gdańsk visiting Robert Alicki.

Today (Wednesday)  at the Poznań conference I gave a lecture on threshold phenomena and here are the slides. In the afternoon we had the traditional random run with a record number of runners.

Let me briefly tell you about a few of the other lectures:

Update (Thursday): A very good day, with, among others, a great talk by Jacob Fox on a relative Szemerédi theorem (click for the slides from a similar talk from Budapest), where he presented joint work with David Conlon and Yufei Zhao giving a very general and strong form of Szemerédi’s theorem for quasi-random sparse sets, which, among other applications, leads to a much simpler proof of the Green-Tao theorem.

Mathias Schacht

Mathias Schacht gave a wonderful talk on extremal results in random graphs (click for the slides) which described a large recent body of highly successful research on the topic.

Here are two crucial slides, and going through the whole presentation can give a very good overall picture.

[Images: two slides from Mathias Schacht’s talk]

Vera Sós

Vera Sós gave an inspiring talk about the random nature of graphs which are extremal to the Ramsey property and connections with graph limits. Vera presented the following very interesting conjecture on graph limits.

We say that a sequence of graphs (G_n) has a limit if for every k and every graph H with k vertices the proportion in G_n of induced H-subgraphs among all k-vertex induced subgraphs tends to a limit. Let us also say that (G_n) has a V-limit if for every k and every e the proportion in G_n of induced subgraphs with k vertices and e edges among all k-vertex induced subgraphs tends to a limit.
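To make the V-limit definition concrete, here is a minimal sketch of the statistic it tracks for a single finite graph: the proportion of k-vertex induced subgraphs having exactly e edges. The function name `edge_count_profile` and the brute-force enumeration are illustrative choices of mine, not something from Vera’s talk, and the enumeration is only practical for small graphs.

```python
from fractions import Fraction
from itertools import combinations


def edge_count_profile(n, edges, k):
    """Return {e: proportion of k-vertex induced subgraphs with exactly e edges}.

    The graph has vertices 0..n-1 and the given list of edges.
    (Hypothetical helper, brute force over all k-subsets of vertices.)
    """
    edge_set = {frozenset(edge) for edge in edges}
    counts = {}
    total = 0
    for subset in combinations(range(n), k):
        # Count the edges of the subgraph induced on this vertex subset.
        e = sum(1 for pair in combinations(subset, 2) if frozenset(pair) in edge_set)
        counts[e] = counts.get(e, 0) + 1
        total += 1
    return {e: Fraction(c, total) for e, c in counts.items()}


# Path on three vertices: two of the three vertex pairs span an edge.
path_profile = edge_count_profile(3, [(0, 1), (1, 2)], 2)

# 4-cycle: every 3-vertex induced subgraph is a path with 2 edges.
cycle_profile = edge_count_profile(4, [(0, 1), (1, 2), (2, 3), (3, 0)], 3)
```

A sequence (G_n) has a V-limit when, for each fixed k and e, these proportions converge as n grows; having a limit asks for the finer statistic that separates induced subgraphs by isomorphism type rather than by edge count alone.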

Sós’ question: Is having a V-limit equivalent to having a limit?

This is open even in the case of quasirandomness, namely, when the limit is given by the Erdős-Rényi model G(n,p). (Update: in this case V-limit is equivalent to limit, as several participants of the conference observed.)

Both a positive and a negative answer to this fundamental question would lead to many further (different) open problems.

Joel Spencer

Joel Spencer gave a great (blackboard) talk about algorithmic aspects of the probabilistic method, and how existence theorems via the probabilistic method now often require complicated randomized algorithms. Joel mentioned his famous six standard deviation theorem. In this case, Joel conjectured thirty years ago that there is no efficient algorithm to find the coloring promised by his theorem. Joel was delighted to see his conjecture being refuted first by Nikhil Bansal (who found an algorithm whose proof depends on the theorem) and then later by Shachar Lovett and Raghu Meka (who found a new algorithm giving a new proof). In fact, Joel said, having his conjecture disproved is even more delightful than having it proved.

Based on this experience Joel and I are now proposing another conjecture:

Kalai-Spencer (pre)conjecture: Every existence statement proved by the probabilistic method can be complemented by an efficient (possibly randomized) algorithm.

By “complemented by an efficient algorithm” we mean that there is an efficient (polynomial time) randomized algorithm to create the promised object with high probability. We refer to it as a preconjecture since the term “the probabilistic method” is not entirely well-defined. But it may be possible to put this conjecture on formal grounds, and to discuss it informally even before.

BosonSampling and (BKS) Noise Sensitivity

Following are some preliminary observations connecting BosonSampling, an interesting  computational task that quantum computers can perform (that we discussed in this post), and noise-sensitivity in the sense of Benjamini, Schramm, and myself (that we discussed here and here.)

BosonSampling and computational-complexity hierarchy-collapse

Suppose that you start with n bosons, each of which can occupy one of m locations. The i-th boson is in superposition and occupies the j-th location with complex weight a_{ij}. The bosons are indistinguishable, which makes the weight for a certain occupation pattern proportional to the permanent of a certain n by n submatrix of the n by m matrix of weights.
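As a small illustration of the quantity involved, here is a sketch that computes the permanent of the n by n submatrix picked out by an occupation pattern. It assumes the simplest case, where the n bosons end up in n distinct locations; the names `permanent` and `submatrix_permanent` are mine, and the sum over all permutations takes n! steps, so this is only for tiny examples.

```python
from itertools import permutations


def permanent(matrix):
    """Permanent of a square matrix: the determinant's sum over permutations, without signs."""
    n = len(matrix)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= matrix[i][sigma[i]]
        total += term
    return total


def submatrix_permanent(weights, locations):
    """Permanent of the n x n submatrix of the n x m weight matrix.

    `weights[i][j]` plays the role of a_{ij}; `locations` lists n distinct
    column indices (an assumption made to keep the sketch simple).
    """
    sub = [[row[j] for j in locations] for row in weights]
    return permanent(sub)


# Two bosons, three locations, pattern occupying locations 0 and 2.
example = submatrix_permanent([[1, 2, 3], [4, 5, 6]], (0, 2))  # 1*6 + 3*4
```

The same function works unchanged for complex weights, since it only uses addition and multiplication.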

BosonSampling is a task that a quantum computer can perform. As a matter of fact, it only requires a “boson machine” which represents only a fragment of quantum computation. A boson machine is a quantum computer which only manipulates indistinguishable bosons with gates described by phaseshifters and beamsplitters.

BosonSampling and boson machines were studied in a recent paper The Computational Complexity of Linear Optics of Scott Aaronson and Alex Arkhipov (AA). They proved (Theorem 1 in the paper) that if (exact) BosonSampling can be performed by a classical computer then this implies a collapse of the computational-complexity polynomial hierarchy (PH, for short). This result adds to a similar result achieved independently by Michael J. Bremner, Richard Jozsa, and Dan J. Shepherd (BJS) in the paper entitled: “Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy,” and to older results by  Barbara Terhal and David DiVincenzo (TD) in the paper Adaptive quantum computation, constant depth quantum circuits and Arthur-Merlin games, Quant. Inf. Comp. 4, 134-145 (2004).

Since universal quantum computers can achieve BosonSampling (and the other related computational tasks considered by TD and BJS), this is a very strong indication for the computational complexity advantage of quantum computers which arguably brings us with quantum computers to the “cozy neighborhood” of NP-hardness.

Noisy quantum computers with quantum fault-tolerance are also capable of exact BosonSampling and this strong computational-complexity quantum-superiority applies to them as well.

Realistic BosonSampling and Gaussian Permanent Estimation (GPE)

Aaronson and Arkhipov considered the following question that they referred to as Gaussian Permanent Estimation.

Problem (Problem 2 from AA’s paper): (|GPE|_{\pm}^2): Given as input a matrix X \sim {\cal N}(0,1)_{\bf C}^{n \times n} of i.i.d. Gaussians, together with error bounds \epsilon, \delta > 0, estimate |{\rm Per}(X)|^2 to within additive error \pm \epsilon \cdot n! with probability at least 1-\delta over X, in poly(n,1/\epsilon,1/\delta) time.

They conjectured that Gaussian Permanent Estimation is computationally hard and showed (Theorem 3) that this would imply that sampling w.r.t. states achievable by boson machines cannot even be approximated by classical computing (unless PH collapses). They regarded questions about approximation as more realistic in the context of decoherence, where we cannot expect exact sampling.

Scott Aaronson also expressed guarded optimism that even without quantum fault-tolerance BosonSampling can be demonstrated by boson machines for 20-30 bosons, leading to strong experimental evidence for computational advantage of quantum computers (or, if you wish, against the efficient Church-Turing thesis).

Is it so?

More realistic BosonSampling and Noisy Gaussian Permanent Estimation (NGPE)

Let us consider the following variation that we refer to as Noisy Gaussian Permanent Estimation:

Problem 2′: (|NGPE|_{\pm}^2): Given as input a matrix M \sim {\cal N}(0,1)_{\bf C}^{n \times n} of i.i.d. Gaussians, and a parameter t>0, let NPER(M) be the expected value of the permanent of \sqrt{1-t^2}M+E, where E \sim {\cal N}(0,t)_{\bf C}^{n \times n}. Given the input matrix M together with error bounds \epsilon, \delta > 0, estimate NPER(M) to within additive error \pm \epsilon n! with probability at least 1-\delta over M, in poly(n,1/\epsilon,1/\delta) time.

Problem 2′ seems more relevant for noisy boson machines (without fault-tolerance). The noisy state of the computer is based on every single boson  being slightly mixed, and the permanent computation will average these individual mixtures. We can consider the relevant value for t to be a small constant. Can we expect Problem 2′ to be hard?

The answer for Problem 2′ is surprising. I expect that even when t is very very tiny, namely t=n^{-\beta} for \beta <1, the expected value of NPER(M) (essentially) does not depend at all on M!

Noise Sensitivity a la Benjamini, Kalai and Schramm

Noise sensitivity in the sense described here for Boolean functions was studied in a paper by Benjamini, Schramm, and me in 1999. (A related notion was studied by Tsirelson and Vershik.) Lectures on noise sensitivity and percolation is a new beautiful monograph by Christophe Garban and Jeff Steif which contains a description of noise sensitivity. The setting extends easily to the Gaussian case. See this paper by Kindler and O’Donnell for the Gaussian case. In 2007, Ofer Zeitouni and I studied the noise sensitivity in the Gaussian model of the maximal eigenvalue of random Gaussian matrices (but did not write it up).

Noise sensitivity depends on the degree of the support of the Fourier expansion. For determinants or permanents of n by n matrices the basic formulas as sums of generalized diagonals describe the Fourier expansion, so the Fourier coefficients are supported on degree-n monomials. This implies that the determinant and the permanent are very noise sensitive.
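The step from “supported on degree-n monomials” to “very noise sensitive” can be spelled out with the standard noise operator. The following is a sketch under simplifying assumptions that are mine: real inputs x_i that are independent with mean zero and variance one, and a noisy copy y_i = \rho x_i + \sqrt{1-\rho^2} z_i with independent noise z_i of the same kind.

```latex
% Assumption: f is multilinear with all its Fourier weight on monomials of degree exactly n.
f(x) = \sum_{|S| = n} \hat f(S) \prod_{i \in S} x_i ,
\qquad
T_\rho f(x) := \mathbb{E}\bigl[ f(y) \mid x \bigr]
  = \sum_{|S| = n} \rho^{|S|} \, \hat f(S) \prod_{i \in S} x_i
  = \rho^{n} f(x).

% Consequently the correlation between f(x) and f(y) is exactly \rho^n:
\mathbb{E}\bigl[ f(x) f(y) \bigr] = \rho^{n} \, \mathbb{E}\bigl[ f(x)^2 \bigr].
```

So for any fixed \rho < 1 the correlation between the noiseless and noisy values decays exponentially in n, which is what noise sensitivity means for the function itself.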

Noisy Gaussian Permanent Estimation is easy

Noisy Gaussian Permanent Estimation is easy, even for a very small amount of noise, because the outcome does not depend at all on the input. It is an interesting question what the hardness of NGPE is when the noise is below the level of noise sensitivity.

Update (March, 2014) Exploring the connection between BosonSampling and BKS-noise sensitivity shows that the picture drawn here is incorrect. Indeed, the square of the permanent is not noise stable even when the noise is fairly small and this shows that the noisy distribution does not approximate the noiseless distribution. However the noisy distribution does depend on the input!

AA’s protocol and experimental BosonSampling

Scott and Alex proposed a simple experiment described as follows: “An important motivation for our results is that they immediately suggest a linear-optics experiment, which would use simple optical elements (beamsplitters and phaseshifters) to induce a Haar-random m \times m unitary transformation U on an input state of n photons, and would then check that the probabilities of various final states of the photons correspond to the permanents of n \times n submatrices, as predicted by quantum mechanics.”

Recently, four groups carried out interesting BosonSampling experiments with 3 bosons (thus for permanents of 3 by 3 matrices.) (See this post on Scott’s blog.)

BKS-noise sensitivity gives simple predictions on how things will behave as a function of the number of bosons, and this can be tested even with experiments with a very small number of bosons. When you increase the number of bosons the distribution will quickly become uniform for the various final states. The correlation between the probabilities and the value corresponding to permanents will rapidly go to zero.

Some follow-up questions

Here are a few interesting questions that deserve further study.

1) Does Problem 2′ capture the general behavior of noisy boson machines? In what generality does noise sensitivity apply to functions described by BosonSampling distributions?

(There are several versions of photon-based quantum computers, including even an important model by Knill, Laflamme, and Milburn that supports universal quantum computation. The relevance of BKS noise-sensitivity should be explored carefully for the various versions.)

2) Is the connection with noise sensitivity relevant to the possibility of having boson machines with fault tolerance?

3) What is the Gaussian-quantum analog for BKS’s theorem asserting that noise sensitivity is the law unless  we have substantial correlation with the majority function?

4) What can be said about noise-sensitivity of measurements for other quantum codes?

A few more remarks:

More regarding noisy boson machines and quantum fault tolerance

Noisy boson machines and BosonSampling are related to various other issues regarding quantum fault-tolerance. See my added recent remarks (about the issue of synchronization, and possible modeling using smoothed Lindblad evolutions) to my old post on AA’s work.

Noise sensitivity and the special role of the majority function

The main result of Itai, Oded, and me was that a Boolean function which is not noise sensitive must have a substantial correlation with the majority function. Noise sensitivity and the special role of majority for it gave me some motivation to look at quantum fault-tolerance in 2005 (see also this post), and this is briefly discussed in my first paper on the subject, but until now I didn’t find an actual connection between quantum fault-tolerance and BKS-noise-sensitivity.

Censorship

It is an interesting question which bosonic states are realistic, and it came up in some of my papers and in the discussion with Aram Harrow and Steve Flammia (and their paper on my “Conjecture C”).

A sort of conclusion

BosonSampling was offered as a way to prove that quantum mechanics allows a computational advantage without using the computational advantage for actual algorithmic purposes. If you wish, AA’s protocol is offered as a sort of zero-knowledge proof that the efficient Church-Turing thesis is false. It is a beautiful idea that attracted interest and good subsequent work from theoreticians and experimentalists. If the ideas described here are correct, BosonSampling and boson machines may give a clear understanding based on BKS noise-sensitivity for why quantum mechanics does not offer computational superiority (at least not without the magic of quantum fault-tolerance).

Avi’s joke and common sense

Here is a quote from AA referring to a joke by Avi Wigderson: “Besides bosons, the other basic particles in the universe are fermions; these include matter particles such as quarks and electrons. Remarkably, the amplitudes for n-fermion processes are given not by permanents but by determinants of n×n matrices. Despite the similarity of their definitions, it is well-known that the permanent and determinant differ dramatically in their computational properties; the former is #P-complete while the latter is in P. In a lecture in 2000, Wigderson called attention to this striking connection between the boson and fermion dichotomy of physics and the permanent-determinant dichotomy of computer science. He joked that, between bosons and fermions, ‘the bosons got the harder job.’ One could view this paper as a formalization of Wigderson’s joke.”

While sharing the admiration for Avi in general and Avi’s jokes in particular, if we do want to take Avi’s joke seriously (as we always should), then the common-sense approach would be first to try to understand why it is that nature treats bosons and fermions quite equally and the dramatic computational distinction is not manifested at all. The answer is that a crucial ingredient for a computational model is the modeling of noise/errors, and that noise-sensitivity makes bosons and fermions quite similar physically and computationally.

Eigenvalues, determinants, and Yuval Filmus

It is an interesting question (that I asked over MathOverflow) to understand the Fourier expansion of the set of eigenvalues, the maximum eigenvalue and related functions. At a later point, last May, I was curious about the Fourier expansion of the determinant, and for the Boolean case I noticed remarkable properties of its Fourier expansion. So I decided to ask Yuval Filmus about it:

Dear Yuval

 I am curious about the following. Let D be the function defined on {-1,1}^{n^2}
which associates to every +1/-1 matrix its determinant.
What can be said about the Fourier transform of D? It looks to me that easy arguments show that the Fourier transform is supported only on subsets of the entries
so that in every row and column there is an odd number of entries. Likely there are even further restrictions that I would be interested to explore.
Do you know anything about it?
best Gil

Yuval’s answer came a couple of hours later like a cold shower:

Hi Gil,

The determinant is a sum of products of generalized diagonals.
Each generalized diagonal is just a Fourier character, and they are all different.

In other words, the usual formula for the determinant *is* its Fourier transform
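Yuval’s point can be checked by brute force in the smallest case. The sketch below computes all Fourier coefficients of the 2 by 2 determinant on {-1,1}^4 directly from the definition (the average of f times a character); the helper names and the flattening of the matrix entries as (x11, x12, x21, x22) are my own conventions for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def det2(x):
    """Determinant of the 2x2 matrix with entries x = (x11, x12, x21, x22)."""
    return x[0] * x[3] - x[1] * x[2]


def fourier_coefficients(f, num_vars):
    """Nonzero Fourier coefficients of f on {-1,1}^num_vars, keyed by index subsets."""
    points = list(product((-1, 1), repeat=num_vars))
    coeffs = {}
    for size in range(num_vars + 1):
        for subset in combinations(range(num_vars), size):
            total = 0
            for x in points:
                chi = 1  # character chi_S(x) = product of x_i over i in S
                for i in subset:
                    chi *= x[i]
                total += f(x) * chi
            if total != 0:
                coeffs[subset] = Fraction(total, len(points))
    return coeffs


det_coeffs = fourier_coefficients(det2, 4)
```

Only two coefficients survive: +1 on the main diagonal {x11, x22} and -1 on the other generalized diagonal {x12, x21}, which is the usual formula for the determinant read as a Fourier expansion.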

This reminded me of a lovely story of how I introduced Moni Naor to himself that I should tell sometime.

What else can a quantum computer sample?

The ability of quantum computers to sample (exactly) random complex Gaussian matrices according to the value of their permanents is truly amazing! If you are not too impressed by efficient factoring but still do not believe that QC can reach the neighborhood of NP-hard problems you may find this possibility too good to be true.

I am curious if sharp-P reductions give us further results of this nature. For example, can a QC sample random 3-SAT formulas (by a uniform distribution or by a certain other distribution coming from sharp-P reductions) according to the number of their satisfying assignments?

Can QC sample integer polytopes by their volume or by the number of integer points in them? Graphs by the number of 4-colorings?
