Category Archives: Computer Science and Optimization

Analysis of Boolean Functions week 5 and 6

Lecture 7

First passage percolation

1)  Models of percolation.

We talked about percolation, introduced by Broadbent and Hammersley in 1957. The basic model is a model of random subgraphs of a grid in n-dimensional space. (Other graphs were considered later as well.) Here, a grid is a graph whose vertices have integer coordinates and where two vertices are adjacent if their Euclidean distance is one. Every edge of the grid-graph is taken (or is “open” in the percolation jargon) with the same probability p, independently. We mentioned some basic questions: is there an infinite component? How many infinite components are there? What is the probability that the origin belongs to an infinite component, as a function of p?

I mentioned two results: The first is Kesten’s celebrated result that the critical probability for planar percolation is 1/2. The other, by Burton and Keane, is that in very general situations there is almost surely a unique infinite component or none at all. This was a good point to mention a famous conjecture, the dying percolation conjecture (especially in dimension 3), which asserts that at the critical probability there is no infinite component.

We will come back to this basic model of percolation later in the course, but for now we moved to a related more recent model.

2) First passage percolation

We talked about first passage percolation, introduced by Hammersley and Welsh in 1965. Again we consider the infinite grid graph, and this time we let the length of every edge be 1 with probability 1/2 and 2 with probability 1/2 (independently). These weights describe a random metric on this infinite graph that we wish to understand. We consider two vertices (0,0) and (v,0) (in higher dimensions the second entry can account for a (d-1)-dimensional vector, but we can restrict our attention to d=2) and we let D(x) be the distance between these two vertices, regarded as a function of the vector x of edge weights. We explained how D is an integer-valued function on a discrete cube with Lipschitz constant 1. The question we want to address is: what is the variance of D?
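
Here is a minimal Monte Carlo sketch of this random metric (not from the lecture): Dijkstra’s algorithm on a finite window of the planar grid with i.i.d. edge lengths in {1,2}. The window size pad and the sample counts are ad hoc choices, and truncating the infinite grid to a window perturbs the true D slightly.

```python
# Estimating E[D] and var(D) for first passage percolation on a finite
# window of the 2D grid; edge lengths are 1 or 2 with probability 1/2.
import heapq
import random

def fpp_distance(v, pad=10):
    weight = {}  # i.i.d. edge lengths in {1, 2}, sampled lazily

    def w(a, b):
        e = (min(a, b), max(a, b))
        if e not in weight:
            weight[e] = random.choice((1, 2))
        return weight[e]

    src, tgt = (0, 0), (v, 0)
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)          # Dijkstra on the window
        if u == tgt:
            return d
        if d > dist.get(u, float("inf")):
            continue
        x, y = u
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if -pad <= nb[0] <= v + pad and -pad <= nb[1] <= pad:
                nd = d + w(u, nb)
                if nd < dist.get(nb, float("inf")):
                    dist[nb] = nd
                    heapq.heappush(heap, (nd, nb))

samples = [fpp_distance(v=30) for _ in range(200)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"E[D] ~ {mean:.1f}, var(D) ~ {var:.2f}")
```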

Why do we study the variance when we do not know the expectation exactly, you may ask? (I remember Larry Shepp asking this when I talked about it at Bell Labs in the early 90s.) One answer is that we know that the expectation of D is linear, while for the variance we do not know how it behaves. Second, we expect that the precise expectation will depend on the model, while the way the variance grows, and perhaps D‘s limiting distribution, will be universal (say, for dimension 2). And third, we have not given up on the expectation either.

Here is what we showed:

1) From the inequality var(D)=\sum_{S\ne\emptyset}\hat D^2(S)\le\sum_S\hat D^2(S)|S| (the right-hand side is the total influence I(D)) we derived Kesten’s bound var(D)=O(v).

2) We considered the value s such that \mu(D>s)=t, and showed by the basic inequality above that the variance of D conditioned on D>s is also O(v). This corresponds to an exponential tail estimate proved by Kesten.

3) Using hypercontractivity we showed that the variance of D conditioned on D>s is actually bounded above by v/\log(1/t), which corresponds to Talagrand’s sub-Gaussian tail estimate.

4) Almost finally, based on a certain very plausible lemma, we used hypercontractivity to show that most of the Fourier weight of D lies above the \log v level, improving the variance upper bound to O(v/\log v).

5) Since the plausible lemma is still open (see this MO question), we showed how to “shortcut” the lemma and prove the upper bound without it.

The major open question

It is an open question to give an upper bound of v^{1-\epsilon}, or even v^{2/3}, which is the expected answer in dimension two. Michel Ledoux wisely proposes to prove it first for directed percolation in the plane (where all edges are directed up and to the right) from (0,0) to (v,v), where the edge lengths are Gaussian or Bernoulli.

Lecture 8

Three Further Applications of Discrete Fourier Analysis (without hypercontractivity)

The next three topics will use Fourier analysis but not hypercontractivity. We started by talking about them briefly.

1) The cap-set problem, some perspective and a little more extremal combinatorics

We talked about Roth’s theorem, the density Hales-Jewett theorem, and the Erdos-Rado delta-system theorem and conjecture. We also mentioned linearity testing.

2) Upper bounds for error-correcting codes

This was a good place to mention (and easily prove) a fundamental property used in both of these applications: the Fourier transform of the convolution of two functions f and g is the product of the Fourier transforms of f and of g.
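
In the expectation normalization (a sketch; texts that normalize convolution by sums differ by a factor of 2^n):

```latex
% Convolution on the discrete cube:
(f*g)(x) \;=\; \mathbb{E}_{y}\bigl[f(y)\,g(x+y)\bigr],
\qquad x+y \text{ taken coordinatewise mod } 2.
% Since characters are multiplicative, W_S(x) = W_S(y)\,W_S(x+y), so
\widehat{f*g}(S)
 \;=\; \mathbb{E}_{x,y}\bigl[f(y)\,g(x+y)\,W_S(y)\,W_S(x+y)\bigr]
 \;=\; \hat f(S)\,\hat g(S).
```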

3) Social choice and Arrow’s theorem

The Fourier-theoretic proof of Arrow’s theorem uses only Parseval’s formula, so we are going to start with that.

Fourier-theoretic proof of Arrow’s theorem and related results.

We talked a little about Condorcet (we will later give a more detailed introduction to social choice). We mentioned Condorcet’s paradox, Condorcet’s Jury Theorem, and the notion of a Condorcet winner.

Next we formulated Arrow’s theorem. Lecture 9 was devoted to a Fourier-theoretic proof of Arrow’s theorem (in the balanced case). You can find it discussed in this blog post by Noam Nisan. Lecture 10 mentioned a few further applications of the Fourier method related to Arrow’s theorem, as well as a simple combinatorial proof of Arrow’s theorem in full generality. For the Fourier proof of Arrow’s theorem we showed that a Boolean function with all its non-zero Fourier coefficients on levels 0 and 1 is constant, a dictatorship, or an anti-dictatorship. This time we formulated the FKN theorem and showed how it implies a stability version of Arrow’s theorem in the neutral case.
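
For reference, here is the level-0-1 statement with a one-line proof sketch (my notation, in the ±1 convention):

```latex
% Claim: if f:\{-1,1\}^n \to \{-1,1\} has Fourier support on levels 0,1,
%   f(x) = a_0 + a_1 x_1 + \dots + a_n x_n,
% then f is constant, a dictatorship, or an anti-dictatorship.
% Sketch: flipping x_k changes f by \pm 2a_k, while a Boolean value can
% only change by 0 or \pm 2, so every a_k \in \{-1,0,1\}.  If some
% a_k = \pm 1, Parseval,
\sum_{S}\hat f^2(S) \;=\; \mathbb{E}[f^2] \;=\; 1,
% kills all other coefficients and f = \pm x_k; otherwise f = a_0.
```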

Real Analysis Introductory Mini-courses at Simons Institute

The Real Analysis ‘Boot Camp’ included three excellent mini-courses.

Inapproximability of Constraint Satisfaction Problems (5 lectures)
Johan Håstad (KTH Royal Institute of Technology)

(Lecture I, Lecture II, Lecture III, Lecture IV, Lecture V)


Unlike more traditional ‘boot camps’, Johan rewarded answers and questions with chocolates (not available to the video audience).

Starting from the PCP-theorem (which we will take as given) we show how to design and analyze efficient PCPs for NP-problems. We describe how an efficient PCP using small amounts of randomness can be turned into an inapproximability result for a maximum constraint satisfaction problem where each constraint corresponds to the acceptance criterion of the PCP. We then discuss how to design efficient PCPs with perfect completeness in some interesting cases like proving the hardness of approximating satisfiable instances of 3-Sat.

We go on to discuss gadget construction and how to obtain optimal reductions between approximation problems. We present Chan’s result on how to take products of PCPs to get hardness for very sparse CSPs and give some preliminary new results using these predicates as a basis for a gadget reduction.

Finally we discuss approximation in a different measure, and in particular the following problem: given a (2k+1)-CNF formula which admits an assignment that satisfies k literals in each clause, is it possible to efficiently find a standard satisfying assignment?

Analytic Methods for Supervised Learning​ (4 lectures)
Adam Klivans (University of Texas, Austin)

(Lecture I, Lecture II, Lecture III, Lecture IV) There is an additional related lecture by Adam on Moment matching polynomials.

In this mini-course we will show how to use tools from analysis and probability (e.g., contraction, surface area and limit theorems) to develop efficient algorithms for supervised learning problems with respect to well-studied probability distributions (e.g., Gaussians). One area of focus will be understanding the minimal assumptions needed for convex relaxations of certain learning problems (thought to be hard in the worst-case) to become tractable.

Introduction to Analysis on the Discrete Cube (4 lectures)
Krzysztof Oleszkiewicz (University of Warsaw)

(Lecture I, Lecture II, Lecture III, Lecture IV) Here are the slides for the lecture which contain material for 1-2 additional lectures.

The basic notions and ideas of analysis on the discrete cube will be discussed, in an elementary and mostly self-contained exposition. These include the Walsh-Fourier expansion, random walk and its connection to the heat semigroup, hypercontractivity and related functional inequalities, influences, the invariance principle and its application to the Majority is Stablest problem. The mini-course will also contain some other applications and examples, as well as several open questions.

Analysis of Boolean Functions – week 4

Lecture 6

Last week we discussed two applications of the Fourier-Walsh plus hypercontractivity method, and in this lecture we discussed one additional application:

The lecture was based on a 5-page paper by Ehud Friedgut and Jeff Kahn: On the number of copies of one hypergraph in another, Israel Journal of Mathematics 105 (1998), 251-256.

In this application our method has nice but not optimal consequences; another method, applying Shearer’s inequality, gives the optimal result.

1) The question: Given a hypergraph H with k vertices, what is the maximum number of labeled copies of H inside a hypergraph G with \ell edges?

There are cases where the answer is known with great precision. The Kruskal-Katona theorem gives the answer when H is the hypergraph whose edges are all the r-subsets of a vertex set V of size k.

In our study we fix H and care only about the asymptotic behavior as a function of \ell. We have a simple upper bound of \ell^k, and the question is to identify the correct exponent.

2) Stable sets and fractional stable sets. A stable set S in a hypergraph H (also called an independent set) is a set of vertices such that every edge contains at most one vertex from S. The stable number \alpha(H) of a hypergraph H is the maximum size of a stable set.

Now comes an idea which is very important in graph theory: considering the linear programming relaxation of combinatorial parameters. A fractional stable set is an assignment of nonnegative weights to the vertices, so that the sum of the weights in every edge is at most one. So this is a kind of “fuzzy set”: the membership of a vertex in the set is described by a number in the interval [0,1] rather than by an element of {0,1}. The size of a fractional stable set is the sum of the weights of all vertices. The fractional stable number \alpha^*(H) is the maximum size of a fractional stable set.
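
As a small illustration (not from the lecture), \alpha^* is a linear program and can be computed with an off-the-shelf LP solver; the toy hypergraph below is an arbitrary example, and scipy is assumed to be available.

```python
# Fractional stable number alpha*(H): maximize the total vertex weight
# subject to a weight sum of at most 1 on every edge, weights >= 0.
from scipy.optimize import linprog

n = 5                                    # vertices 0..4 (toy example)
edges = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

c = [-1.0] * n                           # linprog minimizes, so negate
A_ub = [[1.0 if v in e else 0.0 for v in range(n)] for e in edges]
b_ub = [1.0] * len(edges)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
print("alpha* =", -res.fun)              # 2.0 for this example
print("optimal weights:", res.x.round(3))
```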

3) Cover and fractional cover numbers

We next described covering and fractional covering (of vertices by edges) in hypergraphs, the covering number \rho(H) of a hypergraph H, and the fractional covering number \rho^*(H). Linear programming duality gives that the fractional stable number is equal to the fractional covering number.

4) The answer:

Friedgut-Kahn theorem: The number of copies of H in a hypergraph G with \ell edges is O(\ell^{\rho^*(H)}), and this bound is sharp.

The case of graphs was proved (in a different language) by Noga Alon in his M.Sc. thesis, which became Noga’s first publication: On the number of subgraphs of prescribed type of graphs with a given number of edges, Israel J. Math. 38 (1981), 116-130. Part of the challenge was to find the right extension of Alon’s theorem.

5) The lower bound.

Next we explained the nice and simple construction giving the lower bound, which is based on the weights realizing the fractional stable number of H.

6) Bonami’s inequality in a dual form

Our next step was to state a consequence of Bonami’s hypercontractive inequality which is a direct extension of the Khinchine inequality. Then we showed a weaker upper bound than the actual theorem, based on Bonami’s inequality.

It is an interesting open question to apply harmonic analysis to the general case. (I believe it is tractable.)

7) Traces and Shearer’s lemma

Next we defined the trace of a hypergraph on a subset W of the vertex set V and stated Shearer’s lemma.
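
For reference, here is Shearer’s lemma in its entropy form, together with the counting corollary used for bounds of this kind (notation mine):

```latex
% Shearer's lemma: if W_1,\dots,W_m \subseteq [n] cover every coordinate
% at least k times, then for any random vector X = (X_1,\dots,X_n),
H(X) \;\le\; \frac{1}{k}\sum_{j=1}^{m} H\bigl(X_{W_j}\bigr),
% and consequently, for a family \mathcal{F} of subsets of [n] with
% traces \mathcal{F}|_{W_j},
|\mathcal{F}|^{k} \;\le\; \prod_{j=1}^{m}\bigl|\mathcal{F}|_{W_j}\bigr|.
```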

8) More about traces and a little more extremal combinatorics

Not having enough time to complete the proof of the Friedgut-Kahn theorem using Shearer’s lemma, we proved the fundamental Sauer-Shelah inequality (see this post) and stated Frankl’s conjecture (see this post, and this one (sec 3d)).

Lecture 7

We started with the proof of the Friedgut-Kahn bound using Shearer’s lemma. Then we explained the simple connection between influences and traces and mentioned the connection of Shearer’s lemma (and the Loomis-Whitney theorem) to edge-isoperimetry.

Our next application: first passage percolation. We gave a short introduction to models of percolation and started to discuss our fourth application of the Fourier+hypercontractivity method: an upper bound for the variance of first passage percolation. Here the method gives the best known result, but unlike KKL’s theorem the result is not sharp. We are a third of the way through the proof, so I may write about it next time. The discussion of first passage percolation is based on the paper First Passage Percolation Has Sublinear Distance Variance by Benjamini, Kalai and Schramm.

Analysis of Boolean Functions – Week 3

Lecture 4

In the third week we moved directly to the course’s “punchline”: the use of the Fourier-Walsh expansion of Boolean functions and the use of hypercontractivity.

Before that we started with a very nice discrete isoperimetric question on a graph which is closely related to the graph of the discrete cube.

Consider the graph G whose vertices are the 0-1 vectors of length n with precisely r ‘1’s, and with edges connecting vertices at Hamming distance two. (This is the minimal Hamming distance between distinct vertices.) Given a set A of m vertices, how small can E(A, \bar A) be? (We already talked about intersecting families of sets of constant size, namely the Erdos-Ko-Rado theorem, and in general it is a nice challenge to extend some of the ideas/methods of the course to the constant-weight situation.)

And now for the main part of the lecture.

1) Basic harmonic analysis on the discrete cube. We considered the vector space of real functions on the discrete cube \Omega_n and defined an inner product structure on it. We also defined the p-th norm for 1\le p\le \infty. Next we defined the Fourier-Walsh functions and showed that they form an orthonormal basis. This leads to the Fourier-Walsh expansion of an arbitrary real function f on the discrete cube, f=\sum_{S\subset[n]}\hat f(S)W_S, and we could easily verify Parseval’s formula.

2) Influences and Fourier. Let f be a real function on \Omega_n and let f=\sum \hat f(S)W_S be its Fourier-Walsh expansion. We showed that I_k(f)=\sum_{S:k \in S}\hat f^2(S). It follows that I(f)=\sum_S\hat f^2(S)|S|. The Fourier-theoretic proof of I(f) ≥ 4 t (1-t), where t=μ(f), now follows easily.
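
These identities are easy to check by brute force. Here is a small sketch (assuming the ±1-valued convention, under which the formulas above hold verbatim, with I_k(f) the probability that flipping the k-th coordinate changes f) for majority on three variables:

```python
# Brute-force check of Parseval and I_k(f) = sum_{S: k in S} fhat^2(S)
# for MAJ on 3 variables, in the {-1,1} convention.
from itertools import product

n = 3
cube = list(product((-1, 1), repeat=n))
f = {x: 1 if sum(x) > 0 else -1 for x in cube}      # majority

def walsh(S, x):                  # W_S(x) = prod_{i in S} x_i
    p = 1
    for i in S:
        p *= x[i]
    return p

subsets = [tuple(i for i in range(n) if b >> i & 1) for b in range(2 ** n)]
fhat = {S: sum(f[x] * walsh(S, x) for x in cube) / len(cube) for S in subsets}

print("Parseval:", sum(c * c for c in fhat.values()))   # = E[f^2] = 1

def flip(x, k):
    return x[:k] + (-x[k],) + x[k + 1:]

for k in range(n):
    infl = sum(f[x] != f[flip(x, k)] for x in cube) / len(cube)
    fourier = sum(fhat[S] ** 2 for S in subsets if k in S)
    print(f"I_{k}(f) = {infl}, Fourier side = {fourier}")  # both 0.5
```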

3) Khinchine, hypercontractivity and the discrete isoperimetric inequality.

Next we discussed what it will take to prove the better estimate I(f) ≥ K t log(1/t). We stated the Khinchine inequality, and explained why the Khinchine inequality is relevant: for Boolean functions the p-th power of the p-norm does not depend on p (it always equals t). Therefore if t is small, the p-th norms themselves must be well apart! After spending a few moments on the history of the inequality (as told to me by Ron Blei), we discussed what kind of extension we need, and stated the Bonami-Gross-Beckner inequality. We used Bonami’s inequality to prove the inequality I(f) ≥ K t log(1/t) and briefly talked about what more it gives.

Lecture 5

1) Review and examples. We reviewed what we did in the previous lecture and considered a few examples: dictatorship; the AND function (an opportunity to mention the uncertainty principle); and MAJORITY on three variables. We also mentioned another connection between influences and Fourier-Walsh coefficients: for a monotone (non-decreasing) Boolean function f, I_k(f) = -2\hat f(\{k\}).

2) KKL’s theorem

KKL’s theorem: There is an absolute constant K so that for every Boolean function f, with t=μ(f), there exists k, 1 ≤ k ≤ n, such that

I_k(f) \ge K t (1-t) \log n/n.

To prove KKL’s theorem we repeat, to a large extent, the steps from Lecture 4 (of course, the proof of KKL’s theorem is where this line of argument came from). We showed that if all individual influences are below 1/\sqrt n then I(f) \ge K t(1-t) \log n.

We mentioned one corollary: for a Boolean function invariant under a transitive group of permutations, all individual influences are equal and therefore I(f) \ge K t (1-t)\log n.

3) Further problems

In the last part of the lecture we mentioned seven problems regarding influences of variables and KKL’s theorem (and I added two more here):

1) What can be said about balanced Boolean functions with small total influence?

2) What can be said about Boolean functions for which I(f) ≤ K t log (1/t), for some constant K, where t=μ(f)?

3) What can be said about the connection between the symmetry group and the minimum total influence?

4) What can be said about Boolean functions (with 1/3 ≤ μ(f) ≤ 2/3, say) for which \max I_k(f) \le K \log n/n?

5) What more can be said about the vector of influences (I_1(f), I_2(f), \dots, I_n(f))?

6)* What is the sharp constant in KKL’s theorem?

7)* What about edge expansions of (small) sets of vertices in general graphs?

8) Under what conditions is I(f) \ge n^\beta for some β > 0?

9) What about influences of larger sets? In particular, what is the smallest t (as a function of n) such that if \mu(f)=t there is a set S of variables, |S| ≤ 0.3n, with I_S(f) \ge 0.9?

(This post  is a short version, I will add details later on.)

Analysis of Boolean functions – week 2

Post on week 1; home page of the course analysis of Boolean functions

Lecture II:

We discussed two important examples that were introduced by Ben-Or and Linial: Recursive majority and  tribes.

Recursive majority (RM): F_m is a Boolean function on 3^m variables defined by F_{m+1}(x,y,z) = F_1(F_m(x),F_m(y),F_m(z)), where x, y and z each consist of 3^m variables. For the base case we use the majority function F_1(x,y,z)=MAJ(x,y,z).
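
A direct implementation sketch (names are mine):

```python
# Recursive majority of threes on 3^m bits.
def recursive_majority(bits):
    if len(bits) == 3:                       # base case: MAJ(x, y, z)
        return 1 if sum(bits) >= 2 else 0
    third = len(bits) // 3
    return recursive_majority(tuple(
        recursive_majority(bits[i * third:(i + 1) * third])
        for i in range(3)))

print(recursive_majority((1, 0, 1, 0, 0, 1, 1, 1, 0)))   # m = 2: prints 1
```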

Tribes: Divide your n variables into s pairwise disjoint sets (“tribes”) of cardinality t. We let f=1 if for some tribe all variables equal one, and thus f=0 if in every tribe there is a variable with value ‘0’.

We noted that this is not an odd function, i.e., it is not symmetric with respect to switching ‘0’ and ‘1’. To have \mu=1/2 we need to set t=\log_2n - \log_2\log_2n+c. We computed the influence of every variable to be C \log n/n. The tribes function is a depth-two formula of linear size, and we briefly discussed what Boolean formulas and Boolean circuits are (these notions can be found in many places, including this post).
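
For small parameters the influence computation can be checked by brute force. The exact influence of a single variable is 2^{-(t-1)}(1-2^{-t})^{s-1} (its own tribe must be all ones except possibly itself, and no other tribe may be all ones); the sketch below compares this with direct enumeration for toy parameters.

```python
# Tribes with s tribes of size t: f = 1 iff some tribe is all ones.
from itertools import product

s, t = 3, 2
n = s * t

def tribes(x):
    return int(any(all(x[i * t:(i + 1) * t]) for i in range(s)))

cube = list(product((0, 1), repeat=n))
flip0 = lambda x: (1 - x[0],) + x[1:]      # flip the first variable
influence = sum(tribes(x) != tribes(flip0(x)) for x in cube) / len(cube)
closed_form = 2 ** -(t - 1) * (1 - 2 ** -t) ** (s - 1)
print(influence, closed_form)              # both 0.28125 here
```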

I stated several conjectures and questions that Ben-Or and Linial raised in their 1985 paper:

Conjecture 1: For every balanced Boolean function with n variables there is a variable k whose influence is \Omega (\log n/n).

Conjecture 2: For every balanced Boolean function with n variables there is a set S of n/log n variables whose influence I_S(f) is 1-o(1).

Question 3: To what extent can the bound in Conjecture 2 be improved if the function f is odd? (Namely, f(1-x_1,1-x_2,\dots, 1-x_n)=1-f(x_1,x_2,\dots, x_n).)

Our next theme was discrete isoperimetric results. I noted the connection between total influence and edge expansion and proved the basic isoperimetric inequality: if μ(f)=t then I(f) ≥ 4 t(1-t). The proof uses the canonical paths argument.

Lecture III:

Using “compression” we proved the sharp bound on I(f) as a function of t=μ(f). We drew an analogy between compression and Steiner symmetrization, a classic method for proving the classical isoperimetric theorem. We discussed similar results on the vertex boundary and on the Talagrand-Margulis boundary (to be elaborated later in the course).

Then we proved the Harris-Kleitman inequality and showed how to deduce from it the fact that an intersecting family of subsets of [n], with the property that the family of complements is also intersecting, has at most 2^{n-2} sets.
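
For reference, here are the inequality and the deduction in outline (my notation):

```latex
% Harris-Kleitman: for an up-set U and a down-set D in 2^{[n]}, under
% the uniform measure,
\mu(U \cap D) \;\le\; \mu(U)\,\mu(D).
% Deduction: if F is intersecting, so is its up-closure U, whence
% \mu(U) \le 1/2; if the complements of members of F are intersecting
% (no two members of F have full union), the down-closure D of F also
% has \mu(D) \le 1/2.  Since F \subseteq U \cap D,
|F| \;\le\; 2^{n}\,\mu(U)\,\mu(D) \;\le\; 2^{n-2}.
```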

Our next topic was spectral graph theory. We proved the Hoffman bound for the largest size of an independent set in a graph G.
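
For reference, in the d-regular case the bound reads as follows, with \lambda_{\min} the smallest eigenvalue of the adjacency matrix:

```latex
% Hoffman's (ratio) bound for a d-regular graph G on n vertices:
\alpha(G) \;\le\; n\cdot\frac{-\lambda_{\min}}{\,d-\lambda_{\min}\,}.
```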

I mentioned graph Laplacians and the spectral bound for expansion (Alon-Milman, Tanner).

The proofs mentioned above are so lovely that I will add them on this page, but sometime later.

Next week I will introduce harmonic analysis on the discrete cube and give a Fourier-theoretic explanation for  the additional log (1/t) factor in the edge isoperimetric inequality.

Important announcement: The real analysis ‘boot camp’ at the Simons Institute for the Theory of Computing is part of the program in Real Analysis and Computer Science. It takes place next week, September 9-13, and has three lecture series. All lecture series are related to the topic of the course and especially:

Analysis of Boolean Functions – week 1

Home page of the course.

In the first lecture I defined the discrete n-dimensional cube and Boolean functions. Then I moved on to discuss five problems in extremal combinatorics dealing with intersecting families of sets.

1) The largest possible intersecting family of subsets of [n];

2) The largest possible intersecting family of subsets of [n] so that the family of complements is also intersecting;

3) The largest possible family of graphs on v vertices such that every two graphs in the family share a common triangle;

4) Chvatal’s conjecture regarding the maximum size of an intersecting family of sets contained in an ideal of sets;

5) Erdos-Ko-Rado Theorem.

Exercise: Prove one of the following

a) The Harris-Kleitman inequality

b) (from the H-K inequality) Every family of subsets of [n] with the property that every two sets have a non-empty intersection and no two sets have full union contains at most 2^{n-2} sets.

More reading: this post: “Extremal combinatorics I: extremal problems on set systems“. Spoiler: the formulation of Chvatal’s conjecture, and also the answer to the second exercise, can be found in this post: Extremal combinatorics III: some basic theorems. (See also problem 25 in the 1972 paper Selected combinatorial research problems by Chvatal, Klarner and Knuth.)


I moved on to discuss the problem of collective coin flipping and the notion of influence as defined by Ben-Or and Linial. I mentioned the baton-passing protocol, the Alon-Naor result, and Feige’s two-rooms protocol.

More reading: this post: “Nati’s influence“. The original paper of Ben-Or and Linial: Collective coin flipping, M. Ben-Or and N. Linial, in “Randomness and Computation” (S. Micali, ed.), Academic Press, New York, 1989, pp. 91-115.

BosonSampling and (BKS) Noise Sensitivity

Update (Nov 2014): Noise sensitivity of BosonSampling and the computational complexity of noisy BosonSampling are studied in this paper by Guy Kindler and me. Some of my predictions from this post turned out to be false. In particular, noisy BosonSampling is not flat and it does depend on the input matrix. However, when the noise level is a constant, noisy BosonSampling is in P, and when it is above 1 over the number of bosons, we cannot expect robust experimental outcomes.

 

—–

 

Following are some preliminary observations connecting BosonSampling, an interesting computational task that quantum computers can perform (we discussed it in this post), and noise sensitivity in the sense of Benjamini, Schramm, and myself (discussed here and here).

BosonSampling and computational-complexity hierarchy-collapse

Suppose that you start with n bosons, each of which can occupy one of m locations. The i-th boson is in a superposition, occupying the j-th location with complex weight a_{ij}. The bosons are indistinguishable, which makes the weight of a given occupation pattern proportional to the permanent of a certain n by n submatrix of the n by m matrix of weights.
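
For the small n relevant to current experiments, the permanent can be computed exactly by Ryser’s formula in O(2^n n^2) time rather than n!; here is a sketch (numpy assumed):

```python
# Ryser's formula: per(A) = (-1)^n * sum_{S nonempty} (-1)^{|S|}
#                           * prod_i (sum_{j in S} a_ij).
import numpy as np

def permanent(A):
    n = A.shape[0]
    total = 0.0
    for mask in range(1, 2 ** n):                     # nonempty column sets
        cols = [j for j in range(n) if mask >> j & 1]
        row_sums = A[:, cols].sum(axis=1)
        total += (-1) ** len(cols) * np.prod(row_sums)
    return (-1) ** n * total

print(permanent(np.ones((3, 3))))   # all-ones 3x3: permanent = 3! = 6
```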

BosonSampling is a task that a quantum computer can perform. As a matter of fact, it only requires a “boson machine”, which represents only a fragment of quantum computation. A boson machine is a quantum computer which only manipulates indistinguishable bosons, with gates described by phaseshifters and beamsplitters.

BosonSampling and boson machines were studied in a recent paper, The Computational Complexity of Linear Optics, by Scott Aaronson and Alex Arkhipov (AA). They proved (Theorem 1 in the paper) that if (exact) BosonSampling can be performed by a classical computer then the computational-complexity polynomial hierarchy (PH, for short) collapses. This result adds to a similar result achieved independently by Michael J. Bremner, Richard Jozsa, and Dan J. Shepherd (BJS) in the paper “Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy,” and to older results by Barbara Terhal and David DiVincenzo (TD) in the paper Adaptive quantum computation, constant depth quantum circuits and Arthur-Merlin games, Quant. Inf. Comp. 4 (2004), 134-145.

Since universal quantum computers can achieve BosonSampling (and the other related computational tasks considered by TD and BJS), this is a very strong indication of the computational-complexity advantage of quantum computers, which arguably brings quantum computers to the “cozy neighborhood” of NP-hardness.

Noisy quantum computers with quantum fault-tolerance are also capable of exact BosonSampling, so this strong computational-complexity quantum superiority applies to them as well.

Realistic BosonSampling and Gaussian Permanent Estimation (GPE)

Aaronson and Arkhipov considered the following question, which they referred to as Gaussian Permanent Estimation.

Problem (Problem 2 from AA’s paper) (|GPE|_{\pm}^2): Given as input a matrix X of i.i.d. Gaussians drawn from {\cal N}(0,1)_{\bf C}^{n \times n}, together with error bounds ε, δ > 0, estimate |Per(X)|^2 to within additive error \pm \epsilon\cdot n!, with probability at least 1-δ over X, in poly(n,1/\epsilon,1/\delta) time.

They conjectured that such Gaussian Permanent Estimation is computationally hard and showed (Theorem 3) that this would imply that sampling with respect to states achievable by boson machines cannot even be approximated by classical computers (unless PH collapses). They regarded questions about approximation as more realistic in the context of decoherence, where we cannot expect exact sampling.

Scott Aaronson also expressed guarded optimism that even without quantum fault-tolerance BosonSampling can be demonstrated by boson machines for 20-30 bosons, leading to strong experimental evidence for the computational advantage of quantum computers (or, if you wish, against the efficient Church-Turing thesis).

Is it so?

More realistic BosonSampling and Noisy Gaussian Permanent Estimation (NGPE)

Let us consider the following variation that we refer to as Noisy Gaussian Permanent Estimation:

Problem 2′ (|NGPE|_{\pm}^2): Given as input a matrix M of i.i.d. Gaussians drawn from {\cal N}(0,1)_{\bf C}^{n \times n} and a parameter t>0, let NPER(M) be the expected value of the permanent of \sqrt{1-t^2}M+E, where E is drawn from {\cal N}(0,t)_{\bf C}^{n \times n}. Given the input matrix M together with error bounds ε, δ > 0, estimate NPER(M) to within additive error \pm \epsilon\cdot n!, with probability at least 1-δ over M, in poly(n,1/\epsilon,1/\delta) time.
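
Here is a naive Monte Carlo sketch of the quantity NPER(M) just defined, for toy sizes; the complex-Gaussian normalization below is one common convention and should be treated as an assumption.

```python
# Monte Carlo estimate of NPER(M) = E[per(sqrt(1-t^2) M + E)].
import numpy as np
from itertools import permutations

def perm(A):
    # naive O(n!) permanent; fine for the toy sizes used here
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

def complex_gaussian(n, sigma, rng):
    # one convention for N(0, sigma)_C; an assumption, not AA's verbatim
    return sigma * (rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

def nper(M, t, samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    vals = [perm(np.sqrt(1 - t ** 2) * M + complex_gaussian(n, t, rng))
            for _ in range(samples)]
    return sum(vals) / samples

rng = np.random.default_rng(1)
M = complex_gaussian(4, 1.0, rng)
print(perm(M), nper(M, t=0.3))
```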

Problem 2′ seems more relevant for noisy boson machines (without fault-tolerance). The noisy state of the computer reflects every single boson being slightly mixed, and the permanent computation averages over these individual mixtures. We can take the relevant value of t to be a small constant. Can we expect Problem 2′ to be hard?

The answer to Problem 2′ is surprising. I expect that even when t is very, very tiny, namely t=n^{-\beta} for \beta <1, the value of NPER(M) (essentially) does not depend on M at all!

Noise Sensitivity a la Benjamini, Kalai and Schramm

Noise sensitivity, in the sense described here for Boolean functions, was studied in a paper by Benjamini, Schramm and me in 1999. (A related notion was studied by Tsirelson and Vershik.) Lectures on noise sensitivity and percolation is a new beautiful monograph by Christophe Garban and Jeff Steif which contains a description of noise sensitivity. The setting extends easily to the Gaussian case; see this paper by Kindler and O’Donnell. In 2007, Ofer Zeitouni and I studied the noise sensitivity, in the Gaussian model, of the maximal eigenvalue of random Gaussian matrices (but did not write it up).


Noise sensitivity depends on the degrees appearing in the support of the Fourier expansion. For the determinant or permanent of an n by n matrix, the basic formulas as sums over generalized diagonals describe the Fourier expansion, so the Fourier coefficients are supported on degree-n monomials. This implies that the determinant and the permanent are very noise sensitive.
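
In Fourier terms this is a standard computation; here x_\epsilon rerandomizes each coordinate independently with probability \epsilon:

```latex
\mathbb{E}\bigl[f(x)\,f(x_\epsilon)\bigr]
 \;=\; \sum_{S}\hat f^{2}(S)\,(1-\epsilon)^{|S|},
% so a function whose Fourier weight sits entirely on degree-n
% monomials, like the determinant or the permanent, retains correlation
% only (1-\epsilon)^n, which tends to 0 for any fixed \epsilon > 0.
```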

Noisy Gaussian Permanent Estimation is easy

Noisy Gaussian Permanent Estimation is easy, even for a very small amount of noise, because the outcome does not depend at all on the input. It is an interesting question what the hardness of NGPE is when the noise is below the level of noise sensitivity.

Update (March 2014): Exploring the connection between BosonSampling and BKS noise sensitivity shows that the picture drawn here is incorrect. Indeed, the square of the permanent is not noise stable even when the noise is fairly small, and this shows that the noisy distribution does not approximate the noiseless distribution. However, the noisy distribution does depend on the input!

 

AA’s protocol and experimental BosonSampling

Scott and Alex proposed a simple experiment, described as follows: “An important motivation for our results is that they immediately suggest a linear-optics experiment, which would use simple optical elements (beamsplitters and phaseshifters) to induce a Haar-random m \times m unitary transformation U on an input state of n photons, and would then check that the probabilities of various final states of the photons correspond to the permanents of n \times n submatrices, as predicted by quantum mechanics.”
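
As an aside, the Haar-random unitary in such an experiment can be sampled classically (for simulation purposes) by QR-decomposing a complex Ginibre matrix and correcting the phases; a sketch, with numpy assumed:

```python
# Haar-random m x m unitary via QR of a complex Ginibre matrix.
import numpy as np

def haar_unitary(m, seed=None):
    rng = np.random.default_rng(seed)
    Z = (rng.standard_normal((m, m))
         + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))          # fix column phases

U = haar_unitary(5)
print(np.allclose(U.conj().T @ U, np.eye(5)))   # True: U is unitary
```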

Recently, four groups carried out interesting BosonSampling experiments with 3 bosons (thus for permanents of 3 by 3 matrices). (See this post on Scott’s blog.)

BKS noise sensitivity gives simple predictions on how things will behave as a function of the number of bosons, and this can be tested even with experiments with a very small number of bosons. When you increase the number of bosons, the distribution will quickly become uniform over the various final states. The correlation between the probabilities and the values corresponding to permanents will rapidly go to zero.

Some follow-up questions

Here are a few interesting questions that deserve further study.

1) Does Problem 2′ capture the general behavior of noisy boson machines? To what generality does noise sensitivity apply to the functions described by BosonSampling distributions?

(There are several versions of photon-based quantum computers, including an important model by Knill, Laflamme, and Milburn that supports universal quantum computation. The relevance of BKS noise sensitivity should be explored carefully for the various versions.)

2) Is the connection with noise sensitivity relevant to the possibility of having boson machines with fault tolerance?

3) What is the Gaussian-quantum analog of BKS’s theorem asserting that noise sensitivity is the law unless we have substantial correlation with the majority function?

4) What can be said about noise-sensitivity of measurements for other quantum codes?

A few more remarks:

More regarding noisy boson machines and quantum fault tolerance

Noisy boson machines and BosonSampling are related to various other issues regarding quantum fault-tolerance. See my recently added remarks (about the issue of synchronization, and possible modeling using smoothed Lindblad evolutions) on my old post on AA’s work.

Noise sensitivity and the special role of the majority function


The main result of Itai, Oded, and me was that a Boolean function which is not noise sensitive must have substantial correlation with the majority function. Noise sensitivity, and the special role of majority in it, gave me some motivation to look at quantum fault-tolerance in 2005 (see also this post), and this is briefly discussed in my first paper on the subject, but until now I have not found an actual connection between quantum fault-tolerance and BKS noise sensitivity.

Censorship

It is an interesting question which bosonic states are realistic; it came up in some of my papers and in the discussion with Aram Harrow and Steve Flammia (and their paper on my “Conjecture C”).

A sort of conclusion

BosonSampling was offered as a way to prove that quantum mechanics allows a computational advantage without using that advantage for actual algorithmic purposes. If you wish, AA’s protocol is offered as a sort of zero-knowledge proof that the efficient Church-Turing thesis is false. It is a beautiful idea that attracted interest and good subsequent work from theoreticians and experimentalists. If the ideas described here are correct, BosonSampling and boson machines may give a clear understanding, based on BKS noise sensitivity, of why quantum mechanics does not offer computational superiority (at least not without the magic of quantum fault-tolerance).

Avi’s joke and common sense

Here is a quote from AA referring to a joke by Avi Wigderson: “Besides bosons, the other basic particles in the universe are fermions; these include matter particles such as quarks and electrons. Remarkably, the amplitudes for n-fermion processes are given not by permanents but by determinants of n×n matrices. Despite the similarity of their definitions, it is well-known that the permanent and determinant differ dramatically in their computational properties; the former is #P-complete while the latter is in P. In a lecture in 2000, Wigderson called attention to this striking connection between the boson and fermion dichotomy of physics and the permanent-determinant dichotomy of computer science. He joked that, between bosons and fermions, ‘the bosons got the harder job.’ One could view this paper as a formalization of Wigderson’s joke.”

While sharing the admiration for Avi in general and Avi’s jokes in particular, if we do want to take Avi’s joke seriously (as we always should), then the common-sense approach would be first to try to understand why it is that nature treats bosons and fermions quite equally, with the dramatic computational distinction not manifested at all. The answer is that a crucial ingredient of a computational model is the modeling of noise/errors, and that noise sensitivity makes bosons and fermions quite similar physically and computationally.

Eigenvalues, determinants, and Yuval Filmus

It is an interesting question (which I asked on MathOverflow) to understand the Fourier expansion of the set of eigenvalues, the maximum eigenvalue, and related functions. At a later point, last May, I was curious about the Fourier expansion of the determinant, and for the Boolean case I noticed remarkable properties of its Fourier expansion. So I decided to ask Yuval Filmus about it:

Dear Yuval

I am curious about the following. Let D be the function defined on {-1,1}^{n^2} which associates to every +1/-1 matrix its determinant. What can be said about the Fourier transform of D? It looks to me that easy arguments show that the Fourier transform is supported only on subsets of the entries such that every row and every column contains an odd number of entries. Likely there are even further restrictions that I would be interested to explore. Do you know anything about it?

best, Gil

Yuval’s answer came a couple of hours later, like a cold shower:

Hi Gil,

The determinant is a sum of products of generalized diagonals.
Each generalized diagonal is just a Fourier character, and they are all different.

In other words, the usual formula for the determinant *is* its Fourier transform.
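
Spelled out:

```latex
% On \{-1,1\}^{n^2} the determinant
\det(X) \;=\; \sum_{\sigma\in S_n} \mathrm{sgn}(\sigma)
              \prod_{i=1}^{n} x_{i,\sigma(i)}
% is already a Fourier-Walsh expansion: each generalized diagonal
% \{(i,\sigma(i)) : i \in [n]\} indexes a distinct character, with
% coefficient \mathrm{sgn}(\sigma); all the Fourier weight sits in
% degree n.
```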

This reminded me of a lovely story of how I introduced Moni Naor to himself that I should tell sometime.

What else can a quantum computer sample?

The ability of quantum computers to sample (exactly) random complex Gaussian matrices according to the value of their permanents is truly amazing! If you are not too impressed by efficient factoring, but still do not believe that QC can reach the neighborhood of NP-hard problems, you may find this possibility too good to be true.

I am curious whether #P reductions give us further results of this nature. For example, can a QC sample random 3-SAT formulas (by the uniform distribution, or by a certain other distribution coming from #P reductions) according to the number of their satisfying assignments?

Can a QC sample integer polytopes by their volume or by the number of integer points in them? Graphs by the number of their 4-colorings?