Category Archives: Probability

More Reasons for Small Influence


Readers of the big-league ToC blogs have already heard about the breakthrough paper An average-case depth hierarchy theorem for Boolean circuits by Benjamin Rossman, Rocco Servedio, and Li-Yang Tan. Here are blog reports on Computational Complexity, on Shtetl-Optimized, and on Gödel's Lost Letter and P=NP. Let me mention one of the applications: refuting a 1999 conjecture by Benjamini, Schramm, and me.

Update: Li-Yang Tan explained matters in an excellent comment below, starting with: “In brief, we believe that an average-case depth hierarchy theorem rules out the possibility of a converse to Hastad-Boppana-LMN when viewed as a statement about the total influence of *constant*-depth circuits. However, while the Inf(f) \le (O(\log S))^{d-1} bound is often applied in the setting where d is constant, it in fact holds for all values of d. It would be interesting to explore the implications of our result in regimes where d is allowed to be super-constant.”

Let me add that the bounded-depth case is an important case (the one I referred to here), that there might be some issues with the conjecture failing for non-constant depth “for the wrong reasons”, and that I see good prospects that RST's work and techniques will refute the BKS conjecture in full, also for non-bounded depth.

Update: Rossman, Servedio, and Tan refuted some important variations of our conjecture, while other variations remain open. My description was not so accurate, and in hindsight I could also have explained the background and motivation better. So rather than keep updating this post, I will write a new one in a few weeks.


Theorem: If f is described by a bounded-depth circuit of size s and depth d, then I(f), the total influence of f, is at most (\log s)^{d-1}.

The total influence I(f) of f is defined as follows: for an input x write I(x) for the number of neighbors y of x with f(y) \ne f(x); then I(f) = \mathbb E\, I(x).
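Here is a quick brute-force sketch (mine, not from any of the papers discussed) of this definition in Python; it is only feasible for small n, but it is handy for checking small examples.

```python
from itertools import product

# Brute-force total influence: I(x) counts the neighbors of x on which f
# differs, and I(f) is the average of I(x) over all 2^n inputs.
def total_influence(f, n):
    total = 0
    for x in product([0, 1], repeat=n):
        for k in range(n):
            y = list(x)
            y[k] ^= 1
            if f(tuple(y)) != f(x):
                total += 1
    return total / 2 ** n

maj3 = lambda x: int(sum(x) >= 2)   # majority on three bits
print(total_influence(maj3, 3))     # -> 1.5
```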

The history of this result, as I remember it, is as follows: it is based in a crucial way on Håstad's switching lemma, going back to Håstad's 1986 thesis, and for monotone functions one can use an even earlier 1984 result by Boppana. It was first proved (with exponent d) in 1993 by Linial, Mansour, and Nisan, as a consequence of their theorem on the decay of Fourier coefficients of AC0 functions (also based on the switching lemma). With the correct exponent d-1 it is derived from the switching lemma by a short clean argument in a 1997 paper by Ravi Boppana; and finally it was extended to a sharpening of the LMN result about spectral decay by Håstad (2001).


Mike Sipser

Inverse Boppana-Håstad-LMN

Conjecture (Benjamini, Kalai, and Schramm, 1999): Every Boolean function f is “close” to some function computed by a depth-d, size-s circuit with (\log s)^{d-1} not much larger than I(f).

Of course, the exponent d-1 is the strongest possible, but replacing it with some constant times d is also of interest. (Also, the monotone case already captures much of the interest.)

As we will see, the conjecture is false even if the exponent d-1 is replaced by a constant times d. I do not know the optimal function u(d), if any, for which the exponent d-1 can be replaced by u(d).

Update: Following some comments by Boaz Barak, I am not sure that the new examples, and the results regarding them, indeed lead to a disproof of our conjecture. The remarkable part of RST's paper is that the RST example cannot be approximated by a circuit of smaller depth – even smaller by one. (This has various important applications.) In order to disprove our conjecture one needs to show that the influence of the example is smaller than what Boppana's inequality ((\log \mathrm{size})^{\mathrm{depth}-1}) gives. This is not proved in the paper (but it may be true).

RST's result does say that if the influence is (say) \log n (where n is the number of variables) and the function depends on a small number of variables, then it need not be correlated with a function in AC0.

Anyway I will keep you posted.  

In 2007, O'Donnell and Wimmer showed that our inverse conjecture is false as stated. They took a Boolean function which is a tribes function on half the variables and an “anti-tribes” function on the rest. This still left the possibility that the exponent d-1 could be replaced by d, or that “close” could be replaced by a weaker conclusion about substantial correlation.

Rossman, Servedio, and Tan found a genuinely new reason for small influence! Their example, named after Mike Sipser, is based on the AND-OR tree – a Boolean formula with alternating AND and OR levels and carefully designed parameters. The crucial part is to show that you cannot approximate this function by lower-depth circuits. The theorem proved by RST is amazingly strong and does not allow reducing the depth even by one! The novel technique of random projections used for proving it is very exciting.
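To get a feel for the shape of such constructions, here is a toy Python sketch of an alternating AND-OR tree. The parameters below (uniform fan-in, independent uniform input bits) are hypothetical placeholders of my own; the actual Sipser functions in the RST paper use carefully tuned, level-dependent fan-ins.

```python
import random

# Toy sketch of an alternating AND-OR tree: a depth-d read-once formula with
# uniform fan-in, AND at even levels and OR at odd levels, evaluated on fresh
# uniform random input bits. (Hypothetical parameters, not RST's.)
def and_or_tree(depth, fanin, level=0):
    if depth == 0:
        return random.randint(0, 1)
    children = (and_or_tree(depth - 1, fanin, level + 1) for _ in range(fanin))
    return int(all(children)) if level % 2 == 0 else int(any(children))

# Estimate the acceptance probability of a small tree.
samples = [and_or_tree(depth=4, fanin=4) for _ in range(10000)]
print(sum(samples) / len(samples))
```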

It is still possible (I think) that such inverse theorems hold when the individual influences of all variables are below polylog(n)/n, where n is the number of variables. Let me pose it as a conjecture:

Conjecture: Every Boolean function f with n variables and individual influences below polylog(n)/n is close to a function g in AC0 of size s and depth d, where (\log s)^d is polylog(n).

And here is a post on TCS Stackexchange with a question about “monotone vs. positive” for the class P. Similar questions for AC0 and TC0 were asked in this post.

Two Delightful Major Simplifications


Arguably, mathematics is getting harder, although some people claim that in the old days, too, parts of it were hard and known only to a few experts, until major simplifications changed matters. Let me report here on two recent remarkable simplifications of major theorems. I am thankful to Nati Linial, who told me about the first, and to Itai Benjamini and Gady Kozma, who told me about the second. Enjoy!

Random regular graphs are nearly Ramanujan: Charles Bordenave gives a new proof of Friedman’s second eigenvalue Theorem and its extension to random lifts

Here is the paper. Abstract: It was conjectured by Alon and proved by Friedman that a random d-regular graph has nearly the largest possible spectral gap, more precisely, the largest absolute value of the non-trivial eigenvalues of its adjacency matrix is at most 2\sqrt{d-1} +o(1) with probability tending to one as the size of the graph tends to infinity. We give a new proof of this statement. We also study related questions on random n-lifts of graphs and improve a recent result by Friedman and Kohler.
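For readers who want to see the phenomenon numerically, here is a small sketch (mine, assuming numpy and networkx are available): for a random d-regular graph, the largest absolute value of a non-trivial eigenvalue indeed lands close to 2\sqrt{d-1}.

```python
import numpy as np
import networkx as nx

# Sample a random d-regular graph and compare its largest non-trivial
# eigenvalue (in absolute value) with the Ramanujan bound 2*sqrt(d-1).
d, n = 4, 1000
G = nx.random_regular_graph(d, n, seed=0)
eigs = np.linalg.eigvalsh(nx.to_numpy_array(G))
# The trivial eigenvalue is d (the graph is connected with high probability);
# drop the largest absolute value and look at the rest.
nontrivial = sorted(abs(e) for e in eigs)[:-1]
print(max(nontrivial), 2 * np.sqrt(d - 1))   # typically ~3.4 vs 3.4641...
```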

A simple proof for the theorem of Aizenman and Barsky and of Menshikov. Hugo Duminil-Copin and Vincent Tassion give  a new proof of the sharpness of the phase transition for Bernoulli percolation on \mathbb Z^d

Here is the paper. Abstract: We provide a new proof of the sharpness of the phase transition for nearest-neighbour Bernoulli percolation. More precisely, we show that: for p<p_c, the probability that the origin is connected by an open path to distance n decays exponentially fast in n; for p>p_c, the probability that the origin belongs to an infinite cluster satisfies the mean-field lower bound \theta(p)\ge\tfrac{p-p_c}{p(1-p_c)}. This note presents the argument of this paper by the same authors, which is valid for long-range Bernoulli percolation (and for the Ising model) on arbitrary transitive graphs, in the simpler framework of nearest-neighbour Bernoulli percolation on \mathbb Z^d.

Influence, Threshold, and Noise



My dear friend Itai Benjamini told me that he won't be able to make it to my Tuesday talk on influence, threshold, and noise, and asked if I already have the slides. So it occurred to me that perhaps I can practice the lecture on you, my readers, not just with the slides (here they are) but also with roughly what I plan to say, some additional info, and some pedagogical hesitations. Of course, remarks can be very helpful.

I can also briefly report that there are plenty of exciting things happening around that I would love to report about – hopefully later in my travel-free summer. One more thing: while chatting with Yuval Rabani and Daniel Spielman I realized that there are various exciting things happening in algorithms (and not reported so much in blogs). Much progress has been made on basic questions: TSP, Bin Packing, flows & bipartite matching, market equilibria, and k-servers, to mention a few, and also new directions and methods. I am happy to announce that Yuval kindly agreed to write here an algorithmic column from time to time, and Daniel is considering contributing a guest post as well.

The second AMS-IMU meeting

Since the early 70s, I have been a devoted participant in our annual meetings of the Israel Mathematical Union (IMU), and this year we will have the second joint meeting with the American Mathematical Society (AMS). Here is the program. There are many exciting lectures. Let me mention that Eran Nevo, this year's Erdős Prize winner, will give a lecture about the g-conjecture. Congratulations, Eran! Among the 22 exciting special sessions there are a few related to combinatorics, and even one organized by me on Wednesday and Thursday.

Contact person: Gil Kalai
TAU, Dan David building, Room 103

Wed, 10:50-11:30 Van H. Vu (Yale University) Real roots of random polynomials (abstract)
Wed, 11:40-12:20 Oriol Serra (Universitat Politecnica de Catalunya, Barcelona) Arithmetic Removal Lemmas (abstract)
Wed, 12:30-13:10 Tali Kaufman (Bar-Ilan University) Bounded degree high dimensional expanders (abstract)
Wed, 16:00-16:40 Rom Pinchasi (Technion) On the union of arithmetic progressions (abstract)
Wed, 16:50-17:30 Isabella Novik (University of Washington, Seattle) Face numbers of balanced spheres, manifolds, and pseudomanifolds (abstract)
Wed, 17:40-18:20 Edward Scheinerman (Johns Hopkins University, Baltimore) On Vertex, Edge, and Vertex-Edge Random Graphs (abstract)
Thu, 9:20-10:00 Yael Tauman Kalai (MSR, New England) The Evolution of Proofs in Computer Science (abstract)
Thu, 10:10-10:50 Irit Dinur (Weizmann Institute) Lifting locally consistent solutions to global solutions (abstract)
Thu, 11:00-11:40 Benny Sudakov (ETH, Zurich) The minimum number of nonnegative edges in hypergraphs (abstract)


And now for my own lecture.

Influence, Threshold, and Noise:


Analysis of Boolean Functions – Weeks 5 and 6

Lecture 7

First passage percolation

1)  Models of percolation.

We talked about percolation, introduced by Broadbent and Hammersley in 1957. The basic model is a model of random subgraphs of a grid in n-dimensional space. (Other graphs were considered later as well.) Here, a grid is a graph whose vertices have integer coordinates and where two vertices are adjacent if their Euclidean distance is one. Every edge of the grid graph is taken (or is “open”, in the percolation jargon) with the same probability p, independently. We mentioned some basic questions: is there an infinite component? How many infinite components are there? What is the probability that the origin belongs to such an infinite component, as a function of p?
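Here is a minimal simulation sketch (mine, for d=2 on a finite box) of the basic model: sample open edges with probability p and explore the open cluster of the origin by breadth-first search.

```python
import random
from collections import deque

# Bond percolation on a finite box of the planar grid: every edge is open
# with probability p (sampled lazily); we explore the open cluster of the
# origin by breadth-first search.
def origin_cluster_size(p, radius=50):
    open_edge = {}
    def is_open(u, v):
        e = (min(u, v), max(u, v))
        if e not in open_edge:
            open_edge[e] = random.random() < p
        return open_edge[e]

    seen, queue = {(0, 0)}, deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if max(abs(nb[0]), abs(nb[1])) <= radius \
                    and nb not in seen and is_open((x, y), nb):
                seen.add(nb)
                queue.append(nb)
    return len(seen)

# Around the critical probability p_c = 1/2 (Kesten) the cluster of the
# origin starts to reach the scale of the whole box.
for p in (0.4, 0.5, 0.6):
    print(p, origin_cluster_size(p))
```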

I mentioned two results. The first is Kesten's celebrated result that the critical probability for planar percolation is 1/2. The other, by Burton and Keane, is that in very general situations, almost surely there is a unique infinite component or none at all. This was a good point to mention a famous conjecture, the dying percolation conjecture (especially in dimension 3), which asserts that at the critical probability there is no infinite component.

We will come back to this basic model of percolation later in the course, but for now we moved on to a related, more recent model.

2) First passage percolation

We talked about first passage percolation, introduced by Hammersley and Welsh in 1965. Again we consider the infinite grid graph, and this time we let the length of every edge be 1 with probability 1/2 and 2 with probability 1/2 (independently). These weights describe a random metric on this infinite graph that we wish to understand. We consider two vertices (0,0) and (v,0) (in high dimension the second entry can stand for a (d-1)-dimensional vector, but we can restrict our attention to d=2) and we let D(x) be the distance between these two vertices. We explained how D is an integer-valued function on a discrete cube with Lipschitz constant 1. The question we want to address is: what is the variance of D?
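Here is a small Python sketch (mine) of this model in dimension two: edge lengths are 1 or 2 with probability 1/2 each, and D is computed by Dijkstra's algorithm, so one can estimate the mean and variance of D empirically.

```python
import heapq
import random
import statistics

# First passage percolation in the plane: edge lengths are 1 or 2 with
# probability 1/2 each (sampled lazily), and D = dist((0,0),(v,0)) is
# computed by Dijkstra's algorithm. The search is restricted to a box that
# provably contains every geodesic: any path leaving the box uses more than
# 2v edges, while the straight path already has length at most 2v.
def fpp_distance(v):
    weight = {}
    def w(u, t):
        e = (min(u, t), max(u, t))
        if e not in weight:
            weight[e] = random.choice([1, 2])
        return weight[e]

    dist = {(0, 0): 0}
    heap = [(0, (0, 0))]
    while heap:
        du, u = heapq.heappop(heap)
        if u == (v, 0):
            return du
        if du > dist.get(u, float("inf")):
            continue                      # stale heap entry
        x, y = u
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if -v <= nb[0] <= 2 * v and abs(nb[1]) <= v:
                nd = du + w(u, nb)
                if nd < dist.get(nb, float("inf")):
                    dist[nb] = nd
                    heapq.heappush(heap, (nd, nb))

samples = [fpp_distance(30) for _ in range(200)]
print(statistics.mean(samples), statistics.variance(samples))
```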

Why do we study the variance, when we do not know the expectation exactly, you may ask? (I remember Larry Shepp asking this when I talked about it at Bell Labs in the early 90s.) One answer is that we know that the expectation of D is linear, while for the variance we do not know how it behaves. Second, we expect that determining the expectation precisely will depend on the model, while the way the variance grows, and perhaps D's limiting distribution, will be universal (say, for dimension 2). And third, we do not give up on the expectation either.

Here is what we showed:

1) From the inequality \mathrm{var}(D)=\sum_{S\ne \emptyset}\hat D^2(S)\le\sum_S \hat D^2(S)|S| we derived Kesten's bound \mathrm{var}(D)=O(v). (The right-hand side is the total influence of D; since D has Lipschitz constant 1, an edge can be pivotal essentially only when it lies on a geodesic, and geodesics here have O(v) edges.)

2) We considered the value s so that \mu(D>s)=t, and showed by the basic inequality above that the variance of D conditioned on D>s is also bounded by v. This corresponds to the exponential tail estimate proved by Kesten.

3) Using hypercontractivity we showed that the variance of D conditioned on D>s is actually bounded above by v/\log(1/t), which corresponds to Talagrand's sub-Gaussian tail estimate.

4) Almost finally, based on a certain very plausible lemma, we used hypercontractivity to show that most of the Fourier coefficients of D are above the \log v level, improving the variance upper bound to O(v/\log v).

5) Since the plausible lemma is still open (see this MO question) we showed how we can “shortcut” the lemma and prove the upper bound without it.

The major open question

It is an open question to give an upper bound of v^{1-\epsilon}, or even v^{2/3}, which is the expected answer in dimension two. Michel Ledoux wisely proposes to prove this first for directed percolation in the plane (where all edges are directed up and to the right) from (0,0) to (v,v), where the edge lengths are Gaussian or Bernoulli.

Lecture 8

Three Further Applications of Discrete Fourier Analysis (without hypercontractivity)

The next three topics will use Fourier analysis but not hypercontractivity. We started by talking about them.

1) The cap-set problem, some perspective and a little more extremal combinatorics

We talked about Roth's theorem, the density Hales-Jewett theorem, and the Erdős-Rado delta-system theorem and conjecture. We mentioned linearity testing.

2) Upper bounds for error-correcting codes

This was a good place to mention (and easily prove) a fundamental property used in both these cases: the Fourier transform of the convolution of two functions f and g is the product of the Fourier transforms of f and g.
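Here is a tiny numerical sketch (mine) of this property on the discrete cube, with convolution normalized as an average, (f*g)(x)=\mathbb E_y f(y)g(x\oplus y); under this normalization \widehat{f*g}(S)=\hat f(S)\hat g(S).

```python
import numpy as np
from itertools import product

# Convolution on the discrete cube, normalized as an average:
# (f*g)(x) = avg_y f(y) g(x XOR y). With hat(f)(S) = avg_x f(x) W_S(x),
# one gets hat(f*g)(S) = hat(f)(S) * hat(g)(S).
n = 4
xs = list(product([0, 1], repeat=n))

def walsh(S, x):                     # W_S(x) = (-1)^{sum_{i in S} x_i}
    return (-1) ** sum(x[i] for i in S)

def fourier(f, S):
    return np.mean([f[x] * walsh(S, x) for x in xs])

rng = np.random.default_rng(0)
f = {x: rng.standard_normal() for x in xs}
g = {x: rng.standard_normal() for x in xs}
conv = {x: np.mean([f[y] * g[tuple(a ^ b for a, b in zip(x, y))] for y in xs])
        for x in xs}

S = (0, 2)
print(fourier(conv, S), fourier(f, S) * fourier(g, S))   # the two agree
```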

3) Social choice and Arrow’s theorem

The Fourier-theoretic proof of Arrow's theorem uses only Parseval's formula, so we are going to start with that.

Fourier-theoretic proof of Arrow's theorem and related results.

We talked a little about Condorcet (we will later give a more detailed introduction to social choice). We mentioned Condorcet's paradox, Condorcet's jury theorem, and the notion of a Condorcet winner.

Next we formulated Arrow's theorem. Lecture 9 was devoted to a Fourier-theoretic proof of Arrow's theorem (in the balanced case). You can find it discussed in this blog post by Noam Nisan. Lecture 10 mentioned a few further applications of the Fourier method related to Arrow's theorem, as well as a simple combinatorial proof of Arrow's theorem in full generality. For the Fourier proof of Arrow's theorem we showed that a Boolean function with all its non-zero Fourier coefficients on levels 0 and 1 is constant, a dictatorship, or an anti-dictatorship. This time we formulated the FKN theorem and showed how it implies a stability version of Arrow's theorem in the neutral case.
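The structural fact used in the proof can be checked by brute force in tiny cases. Here is a sketch (mine) enumerating all ±1-valued functions on three bits and counting those whose Fourier expansion is supported on levels 0 and 1: exactly the two constants, three dictatorships, and three anti-dictatorships.

```python
import numpy as np
from itertools import product, combinations, chain

n = 3
xs = list(product([0, 1], repeat=n))
subsets = list(chain.from_iterable(combinations(range(n), r)
                                   for r in range(n + 1)))

def hat(f, S):   # Fourier coefficient of f (indexed by input order in xs)
    return np.mean([f[i] * (-1) ** sum(x[j] for j in S)
                    for i, x in enumerate(xs)])

count = 0
for signs in product([-1, 1], repeat=len(xs)):   # all 256 Boolean functions
    if all(abs(hat(signs, S)) < 1e-9 for S in subsets if len(S) >= 2):
        count += 1
print(count)   # -> 8: 2 constants + 3 dictatorships + 3 anti-dictatorships
```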

Analysis of Boolean Functions – Week 3

Lecture 4

In the third week we moved directly to the course's “punchline” – the use of the Fourier-Walsh expansion of Boolean functions and the use of hypercontractivity.

Before that, we started with a very nice discrete isoperimetric question on a graph which is closely related to the graph of the discrete cube.

Consider the graph G whose vertices are the 0-1 vectors of length n with precisely r ones, and whose edges correspond to pairs of vertices at Hamming distance two (the minimal Hamming distance between distinct vertices). Given a set A of m vertices, how small can E(A, \bar A) be? (We already talked about intersecting families of sets of constant size – the Erdős-Ko-Rado theorem – and, in general, it is a nice challenge to extend some of the ideas and methods of the course to the constant-weight situation.)

And now for the main part of the lecture.

1) Basic harmonic analysis on the discrete cube. We considered the vector space of real functions on the discrete cube \Omega_n and defined an inner product structure. We also defined the p-th norm for 1\le p\le \infty. Next we defined the Fourier-Walsh functions and showed that they form an orthonormal basis. This leads to the Fourier-Walsh expansion of an arbitrary real function f on the discrete cube, f=\sum_{S\subset[n]}\hat f(S)W_S, and we could easily verify Parseval's formula.

2) Influence and Fourier. Let f be a real function on \Omega_n and let f=\sum \hat f(S)W_S be its Fourier-Walsh expansion. We showed that I_k(f)=\sum_{S:k \in S}\hat f^2(S). It follows that I(f)=\sum_S\hat f^2(S)|S|. The Fourier-theoretic proof of I(f) ≥ 4t(1-t), where t=μ(f), now follows easily.
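Here is a small brute-force check (mine) of these identities for majority on three bits, using the ±1-valued convention for f, under which they hold with no extra constant factors.

```python
import numpy as np
from itertools import product, combinations, chain

# Brute-force check of the Fourier formulas for influence, for majority on
# three bits in the +-1-valued convention.
n = 3
xs = list(product([0, 1], repeat=n))
f = {x: 1 if sum(x) >= 2 else -1 for x in xs}

def walsh(S, x):
    return (-1) ** sum(x[i] for i in S)

def hat(S):
    return np.mean([f[x] * walsh(S, x) for x in xs])

subsets = list(chain.from_iterable(combinations(range(n), r)
                                   for r in range(n + 1)))

def influence(k):   # I_k(f) = Pr[f(x) != f(x with bit k flipped)]
    return np.mean([f[x] != f[x[:k] + (1 - x[k],) + x[k + 1:]] for x in xs])

for k in range(n):
    print(influence(k), sum(hat(S) ** 2 for S in subsets if k in S))  # both 0.5
print(sum(influence(k) for k in range(n)),
      sum(hat(S) ** 2 * len(S) for S in subsets))                     # both 1.5
```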

3) Khinchine, hypercontractivity, and the discrete isoperimetric inequality.

Next we discussed what it would take to prove the better estimate I(f) ≥ K t log(1/t). We stated the Khinchine inequality and explained why it is relevant: for Boolean functions the p-th power of the p-norm does not depend on p (it always equals t). Therefore, if t is small, the p-th norms themselves must be well apart! After spending a few moments on the history of the inequality (as told to me by Ron Blei), we discussed what kind of extension we need, and stated the Bonami-Gross-Beckner inequality. We used Bonami's inequality to prove the inequality I(f) ≥ K t log(1/t) and briefly talked about what more it gives.

Lecture 5

1) Review and examples. We reviewed what we did in the previous lecture and considered a few examples: dictatorship; the AND function (an opportunity to mention the uncertainty principle); and MAJORITY on three variables. We also mentioned another connection between influences and Fourier-Walsh coefficients: for a monotone (non-decreasing) Boolean function f, I_k(f) = -2\hat f(\{k\}).

2) KKL’s theorem

KKL’s theorem: There is an absolute constant K so that for every Boolean function f, with t=μ(f), there exists k, 1 ≤ k ≤ n such that

I_k(f) \ge K t (1-t) \log n/n.

To prove KKL's theorem we repeat, to a large extent, the steps from Lecture 4 (of course, the proof of KKL's theorem is where this line of argument came from). We showed that if all individual influences are below 1/\sqrt n then I(f) \ge K t(1-t) \log n.

We mentioned one corollary: for a Boolean function invariant under a transitive group of permutations, all individual influences are equal and therefore I(f) \ge K t (1-t)\log n.

3) Further problems

In the last part of the lecture we mentioned seven problems regarding influences of variables and KKL's theorem (and I added two more here):

1) What can be said about balanced Boolean functions with small total influence?

2) What can be said about Boolean functions for which I(f) ≤ K t log (1/t), for some constant K, where t=μ(f)?

3) What can be said about the connection between the symmetry group and the minimum total influence?

4) What can be said about Boolean functions (with 1/3 ≤ μ(f) ≤ 2/3, say) for which \max I_k(f) \le K \log n/n?

5) What more can be said about the vector of influences (I_1(f),I_2(f), \dots I_n(f))?

6)* What is the sharp constant in KKL’s theorem?

7)* What about edge expansions of (small) sets of vertices in general graphs?

8) Under what conditions is I(f) \ge n^\beta for some β > 0?

9) What about the influence of larger sets? In particular, what is the smallest t (as a function of n) such that if \mu(f)=t there is a set S of variables, |S| ≤ 0.3n, with I_S(f) \ge 0.9?

(This post is a short version; I will add details later on.)

Analysis of Boolean Functions – Week 2

Post on week 1; home page of the course analysis of Boolean functions

Lecture II:

We discussed two important examples that were introduced by Ben-Or and Linial: Recursive majority and  tribes.

Recursive majority (RM): F_m is a Boolean function with 3^m variables, defined recursively by F_{m+1} (x,y,z) = F_1(F_m(x),F_m(y),F_m(z)). For the base case we use the majority function F_1(x,y,z)=MAJ(x,y,z).
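A minimal Python sketch (mine) of recursive majority:

```python
import random

# Recursive majority on 3^m bits: split the input into three blocks, apply
# the function recursively to each, and take the majority of the three values.
def rec_maj(bits):
    if len(bits) == 1:
        return bits[0]
    third = len(bits) // 3
    votes = [rec_maj(bits[i * third:(i + 1) * third]) for i in range(3)]
    return int(sum(votes) >= 2)

x = [random.randint(0, 1) for _ in range(3 ** 4)]   # m = 4, i.e. 81 variables
print(rec_maj(x))
```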

Tribes: Divide your n variables into s pairwise disjoint sets (“tribes”) of cardinality t. Set f=1 if for some tribe all variables equal one, and thus f=0 if in every tribe there is a variable with value ‘0’.

We note that this is not an odd function, i.e. it is not symmetric with respect to switching ‘0’ and ‘1’. To have \mu=1/2 we need to set t=\log_2n - \log_2\log_2n+c. We computed the influence of every variable to be C \log n/n. The tribes function is a depth-two formula of linear size, and we briefly discussed what Boolean formulas and Boolean circuits are (these notions can be found in many places, including this post).
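Here is a quick numerical sketch (mine) of this calibration. Since t must be an integer, \mu straddles 1/2 between two consecutive values of t, and the individual influence indeed scales like C \log n/n.

```python
import math

# Tribes with s = n/t tribes of size t: mu = Pr[f=1] = 1 - (1 - 2^{-t})^s.
# A variable is pivotal iff the other t-1 bits of its tribe are all 1 and no
# other tribe is all-ones, giving influence 2^{-(t-1)} * (1 - 2^{-t})^{s-1},
# which is of order log(n)/n.
n = 2 ** 20
for t in (16, 17):                     # integer t straddling the ideal value
    s = n // t
    mu = 1 - (1 - 2.0 ** -t) ** s
    inf = 2.0 ** -(t - 1) * (1 - 2.0 ** -t) ** (s - 1)
    print(t, round(mu, 3), round(inf * n / math.log2(n), 3))
```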

I stated several conjectures and questions that Ben-Or and Linial raised in their 1985 paper:

Conjecture 1: For every balanced Boolean function with n variables there is a variable k whose influence is \Omega (\log n/n).

Conjecture 2: For every balanced Boolean function with n variables there is a set S of n/log n variables whose influence I_S(f) is 1-o(1).

Question 3: To what extent can the bound in Conjecture 2 be improved if the function f is odd? (Namely, f(1-x_1,1-x_2,\dots, 1-x_n)=1-f(x_1,x_2,\dots, x_n).)

Our next theme was discrete isoperimetric results. I noted the connection between total influence and edge expansion and proved the basic isoperimetric inequality: if μ(f)=t then I(f) ≥ 4t(1-t). The proof uses the canonical paths argument.

Lecture III:

We proved, using “compression”, a sharp bound on I(f) as a function of t=μ(f). We made the analogy between compression and Steiner symmetrization – a classic method for proving the classical isoperimetric theorem. We discussed similar results on the vertex boundary and on the Talagrand-Margulis boundary (to be elaborated on later in the course).

Then we proved the Harris-Kleitman inequality and showed how to deduce from it that an intersecting family of subsets of [n], with the property that the family of complements is also intersecting, has at most 2^{n-2} sets.

The next topic was spectral graph theory. We proved the Hoffman bound for the largest size of an independent set in a graph G.
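Here is a small numerical sketch (mine, assuming networkx) of Hoffman's ratio bound \alpha(G)\le n\cdot(-\lambda_{\min})/(d-\lambda_{\min}) for a d-regular graph; on the Petersen graph the bound is tight.

```python
import numpy as np
import networkx as nx

# Hoffman's (ratio) bound for a d-regular graph G on n vertices:
# alpha(G) <= n * (-lambda_min) / (d - lambda_min).
G = nx.petersen_graph()                  # 3-regular, n = 10, alpha = 4
eigs = np.linalg.eigvalsh(nx.to_numpy_array(G))
lam_min, d, n = eigs[0], 3, G.number_of_nodes()
print(n * (-lam_min) / (d - lam_min))    # -> 4.0, so the bound is tight here
```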

I mentioned graph Laplacians and the spectral bound for expansion (Alon-Milman, Tanner).

The proofs mentioned above are so lovely that I will add them to this page, but some time later.

Next week I will introduce harmonic analysis on the discrete cube and give a Fourier-theoretic explanation for  the additional log (1/t) factor in the edge isoperimetric inequality.

Important announcement: The real analysis boot camp at the Simons Institute for the Theory of Computing is part of the program in Real Analysis and Computer Science. It takes place next week, on September 9-13, and has three lecture series. All the lecture series are related to the topic of the course, and especially:

Poznań: Random Structures and Algorithms 2013


Michal Karonski (left), who built Poland's probabilistic combinatorics group at Poznań, and a sculpture honoring the Polish mathematicians who first broke the Enigma machine (right, with David Conlon; picture taken by Jacob Fox).

I am now visiting Poznań for the 16th Conference on Random Structures and Algorithms. This biennial series of conferences started 30 years ago (as a satellite conference to the 1983 ICM, which took place in Warsaw), and this time there was also a special celebration for Béla Bollobás' 70th birthday. I was looking forward to this first visit to Poland, which is, of course, a moving experience for me. Before Poznań I spent a few days in Gdańsk visiting Robert Alicki. Today (Wednesday) at the Poznań conference I gave a lecture on threshold phenomena; here are the slides. In the afternoon we had the traditional random run, with a record number of runners. Let me briefly tell you about a very few of the other lectures. Update (Thursday): A very good day, and among others a great talk by Jacob Fox on the relative Szemerédi theorem (click for the slides from a similar talk in Budapest), where he presented joint work with David Conlon and Yufei Zhao giving a very general and strong form of Szemerédi's theorem for quasi-random sparse sets, which, among other applications, leads to a much simpler proof of the Green-Tao theorem.

Mathias Schacht

Mathias Schacht gave a wonderful talk on extremal results in random graphs (click for the slides), which describes a large recent body of highly successful research on the topic. Here are two crucial slides, and going through the whole presentation can give a very good overall picture.

Vera Sós

Vera Sós gave an inspiring talk about the random nature of graphs which are extremal for the Ramsey property, and connections with graph limits. Vera presented the following very interesting conjecture on graph limits. We say that a sequence of graphs (G_n) has a limit if for every k and every graph H with k vertices, the proportion in G_n of induced H-subgraphs among all k-vertex induced subgraphs tends to a limit. Let us also say that (G_n) has a V-limit if for every k and every e, the proportion in G_n of induced subgraphs with k vertices and e edges among all k-vertex induced subgraphs tends to a limit. Sós' question: Is having a V-limit equivalent to having a limit? This is open even in the case of quasirandomness, namely, when the limit is given by the Erdős-Rényi model G(n,p). (Update: in this case a V-limit is equivalent to a limit, as several participants of the conference observed.) Both a positive and a negative answer to this fundamental question would lead to many further (different) open problems.

Joel Spencer

Joel Spencer gave a great (blackboard) talk about algorithmic aspects of the probabilistic method, and how existence theorems via the probabilistic method now often come with complicated randomized algorithms. Joel mentioned his famous six standard deviations theorem. In this case, Joel conjectured thirty years ago that there is no efficient algorithm to find the coloring promised by his theorem. Joel was delighted to see his conjecture refuted, first by Nikhil Bansal (who found an algorithm whose proof of correctness depends on the theorem) and then later by Shachar Lovett and Raghu Meka (who found a new algorithm giving a new proof). In fact, Joel said, having his conjecture disproved is even more delightful than having it proved. Based on this experience, Joel and I are now proposing another conjecture: Kalai-Spencer (pre)conjecture: Every existence statement proved by the probabilistic method can be complemented by an efficient (possibly randomized) algorithm. By “complemented by an efficient algorithm” we mean that there is an efficient (polynomial-time) randomized algorithm to create the promised object with high probability. We refer to it as a preconjecture since the term “the probabilistic method” is not entirely well-defined. But it may be possible to put this conjecture on formal grounds, and to discuss it informally even before that.