In this post I would like to report about an important paper (posted Dec. 2021) by Xun Gao, Marcin Kalinowski, Chi-Ning Chou, Mikhail D. Lukin, Boaz Barak, and Soonwon Choi. (I am thankful to Xun Gao and Boaz Barak for helpful discussions).
Limitations of Linear Cross-Entropy as a Measure for Quantum Advantage
Here is the abstract:
Demonstrating quantum advantage requires experimental implementation of a computational task that is hard to achieve using state-of-the-art classical systems. One approach is to perform sampling from a probability distribution associated with a class of highly entangled many-body wavefunctions. It has been suggested that this approach can be certified with the Linear Cross-Entropy Benchmark (XEB). We critically examine this notion. First, in a “benign” setting where an honest implementation of noisy quantum circuits is assumed, we characterize the conditions under which the XEB approximates the fidelity. Second, in an “adversarial” setting where all possible classical algorithms are considered for comparison, we show that achieving relatively high XEB values does not imply faithful simulation of quantum dynamics. We present an efficient classical algorithm that, with 1 GPU within 2s, yields high XEB values, namely 2-12% of those obtained in experiments. By identifying and exploiting several vulnerabilities of the XEB, we achieve high XEB values without full simulation of quantum circuits. Remarkably, our algorithm features better scaling with the system size than noisy quantum devices for commonly studied random circuit ensembles. To quantitatively explain the success of our algorithm and the limitations of the XEB, we use a theoretical framework in which the average XEB and fidelity are mapped to statistical models. We illustrate the relation between the XEB and the fidelity for quantum circuits in various architectures, with different gate choices, and in the presence of noise. Our results show that XEB’s utility as a proxy for fidelity hinges on several conditions, which must be checked in the benign setting but cannot be assumed in the adversarial setting. Thus, the XEB alone has limited utility as a benchmark for quantum advantage. We discuss ways to overcome these limitations.
I. Three parameters for noisy quantum circuits:
- F – The fidelity. If $|\psi\rangle$ is the ideal output state and $\rho$ is the noisy output state, then the fidelity is defined by $F = \langle \psi | \rho | \psi \rangle$.
- XEB – The linear cross-entropy estimator for the fidelity.
- P(No err) – The probability of no errors.
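Since the XEB will come up repeatedly, let me also record the standard formula for the linear cross-entropy benchmark as it is usually stated in the random circuit sampling literature (the paper's normalization may differ slightly). If $p(x)=|\langle x|\psi\rangle|^2$ is the ideal output distribution on $n$ qubits and $q(x)$ is the distribution the noisy device actually samples from, then

$$\mathrm{XEB} \;=\; 2^{n}\sum_{x\in\{0,1\}^{n}} q(x)\,p(x)\;-\;1 ,$$

which in practice is estimated by averaging $2^{n}p(x_i)-1$ over the observed samples $x_1,\dots,x_N$.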
II. Some issues:
a) The fidelity F cannot be read off from the distribution of the samples produced by the quantum computer. Suppose we are given an unlimited number of samples (or a number of samples large enough that the empirical distribution is a good approximation to the noisy distribution); what is the best way to estimate the fidelity?
b) Suppose we have a number of samples that is polynomial in 1/F. What are the best ways to estimate the fidelity?
c) What in general are the relations between F, XEB, and P(No err)?
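Regarding (b), here is a small numerical illustration (my own toy sketch, not taken from the paper) of why the number of samples matters: the XEB is estimated by averaging the ideal probabilities of the sampled bitstrings, the statistical error of this estimator scales like $1/\sqrt{N}$, and so resolving a fidelity of size F requires on the order of $1/F^2$ samples. The "noisy" distribution below is a global-depolarizing caricature, $q = F\,p + (1-F)\cdot\mathrm{uniform}$, not a simulation of a real device.

```python
import numpy as np

rng = np.random.default_rng(1)
n, F_true = 12, 0.02            # toy values: 12 qubits, true fidelity of 2%
D = 2 ** n

# Toy "ideal" output distribution: Porter-Thomas-like (exponential) weights, normalized.
p_ideal = rng.exponential(size=D)
p_ideal /= p_ideal.sum()

# Toy "noisy" distribution: a global-depolarizing caricature of noise (not a circuit simulation).
q_noisy = F_true * p_ideal + (1 - F_true) / D

for N in (10**3, 10**5, 10**6):
    samples = rng.choice(D, size=N, p=q_noisy)       # bitstrings sampled from the noisy "device"
    xeb_hat = D * p_ideal[samples].mean() - 1.0      # linear cross-entropy estimator
    print(f"N = {N:>8}: estimated XEB = {xeb_hat:+.4f} (target ~ {F_true})")
```

With a thousand samples the estimate is buried in statistical noise; only with roughly $(1/F)^2$ or more samples does the estimated XEB settle near the target value.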
III. A basic observation:
A basic observation of the paper is that when you apply depolarizing noise to the gates, the resulting distribution still has a positive correlation with the ideal distribution. (Hence this leads to a positive XEB value.) The basic idea (as I see it) is simple: let us consider a 1-gate, namely a certain unitary operator U.
The space of such operators is spanned by I, X, Y, and Z. Let us assume that applying to U a unitary error, say Y, leads to a new quantum circuit which gives uncorrelated samples and a fidelity estimator of 0. However, applying the average of I, X, Y, and Z, which replaces the qubit with a maximally mixed state, is a very basic form of noise (depolarizing noise on the qubit on which the gate acts), and this form of noise is expected to cut the fidelity estimator by a factor of four, not send it to zero, since the I component leaves the computation intact. (For 2-gates the story is similar, but this time there are 16 basic possibilities for the unitary error, so we can expect depolarizing noise to cut the linear cross-entropy estimator to 1/16 of its value, and not to zero.)
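This observation is easy to check numerically. The following sketch (my own toy illustration, not code from the paper) builds a small random circuit with numpy, inserts a single-qubit depolarizing error at one location by averaging the output distribution over the four Pauli "error branches" I, X, Y, Z, and compares the resulting XEB with the ideal one. All sizes and gate choices are made up for illustration; the point is only that the XEB drops by roughly a factor of four rather than to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 6, 8                      # toy sizes, chosen only for illustration

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

def haar_1q():
    """A Haar-random single-qubit unitary (QR of a complex Gaussian matrix)."""
    z = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def apply_1q(psi, U, qubit):
    """Apply a single-qubit gate U to `qubit` (qubit 0 is the most significant bit)."""
    psi = psi.reshape(2**qubit, 2, 2**(n - qubit - 1))
    return np.einsum('ab,ibj->iaj', U, psi).reshape(-1)

def apply_cz(psi, q1, q2):
    """Apply a CZ gate between qubits q1 and q2 (a diagonal phase flip)."""
    idx = np.arange(2**n)
    b1, b2 = (idx >> (n - 1 - q1)) & 1, (idx >> (n - 1 - q2)) & 1
    return psi * np.where((b1 & b2) == 1, -1.0, 1.0)

# Fix one random circuit: layers of Haar single-qubit gates followed by a brick pattern of CZs.
layers = [[haar_1q() for _ in range(n)] for _ in range(depth)]

def output_dist(error_pauli=None, error_layer=depth // 2, error_qubit=2):
    """Output distribution of the circuit, optionally with one Pauli error inserted mid-circuit."""
    psi = np.zeros(2**n, dtype=complex)
    psi[0] = 1.0
    for layer_index, gates in enumerate(layers):
        for qubit, U in enumerate(gates):
            psi = apply_1q(psi, U, qubit)
        for qubit in range(layer_index % 2, n - 1, 2):
            psi = apply_cz(psi, qubit, qubit + 1)
        if error_pauli is not None and layer_index == error_layer:
            psi = apply_1q(psi, error_pauli, error_qubit)
    return np.abs(psi) ** 2

def xeb(q_dist, p_ideal):
    return 2**n * np.sum(q_dist * p_ideal) - 1.0

p_ideal = output_dist()
# Depolarizing the qubit mid-circuit = averaging over the four Pauli error branches.
p_depol = (output_dist(I) + output_dist(X) + output_dist(Y) + output_dist(Z)) / 4.0

print("ideal XEB:      ", xeb(p_ideal, p_ideal))   # roughly 1 for a well-scrambling circuit
print("depolarized XEB:", xeb(p_depol, p_ideal))   # roughly 1/4: the identity branch keeps the correlation
```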
IV. Additional observations
The paper by Gao et al. describes various additional reasons why gate errors lead to positive correlations with the ideal distribution, and in general to the strict inequalities
(1) XEB > P(No err)
and
(2) F > P(No err).
First, it turns out that even a single unitary gate error can contribute to an increase of the XEB (and also of F), and, moreover, the effect of two (or more) gate errors can also lead to an increased XEB (and an increased F). In expectation we can actually expect
(3) XEB > F > P(No err)
This is demonstrated by Figure 1 of the paper.
V. The idea behind the spoofing algorithm
The way Gao, Kalinowski, Chou, Lukin, Barak, and Choi used this observation is by applying such depolarizing noise to a set of 2-gates whose removal splits the circuit into two parts. This leads to a sort of "patched" circuit for which the computation can be carried out separately on each patch, which gives much quicker classical algorithms.
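Here is a toy numpy illustration of the patching idea (again my own sketch, with made-up sizes and gate pattern, and with the severed boundary gates simply dropped rather than handled as in the paper's actual algorithm). The circuit has only a couple of 2-gates crossing the cut; the spoofer simulates the two patches separately on $2^{n/2}$-dimensional state vectors and outputs the product distribution. Its XEB against the full ideal distribution typically remains clearly positive, though at these toy sizes the numbers fluctuate from instance to instance; the cost structure (two small simulations instead of one big one) is the main point.

```python
import numpy as np

rng = np.random.default_rng(3)
n, depth = 10, 8                 # toy sizes; left patch = qubits 0-4, right patch = qubits 5-9
CUT = 4                          # the patch boundary sits between qubits 4 and 5

def haar_1q():
    """A Haar-random single-qubit unitary."""
    z = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def apply_1q(psi, U, qubit, nq):
    psi = psi.reshape(2**qubit, 2, 2**(nq - qubit - 1))
    return np.einsum('ab,ibj->iaj', U, psi).reshape(-1)

def apply_cz(psi, q1, q2, nq):
    idx = np.arange(2**nq)
    b1, b2 = (idx >> (nq - 1 - q1)) & 1, (idx >> (nq - 1 - q2)) & 1
    return psi * np.where((b1 & b2) == 1, -1.0, 1.0)

# One random circuit: per layer, Haar 1-qubit gates plus a brick pattern of CZs.
one_q_layers = [[haar_1q() for _ in range(n)] for _ in range(depth)]
cz_layers = []
for layer in range(depth):
    pairs = [(q, q + 1) for q in range(layer % 2, n - 1, 2) if (q, q + 1) != (CUT, CUT + 1)]
    if layer in (2, 5):                      # the only layers with a gate crossing the cut
        pairs.append((CUT, CUT + 1))
    cz_layers.append(pairs)

def simulate(qubits, drop_cross):
    """Simulate the circuit restricted to `qubits`; optionally sever the gates crossing the cut."""
    nq = len(qubits)
    psi = np.zeros(2**nq, dtype=complex)
    psi[0] = 1.0
    for layer in range(depth):
        for local, q in enumerate(qubits):
            psi = apply_1q(psi, one_q_layers[layer][q], local, nq)
        for (a, b) in cz_layers[layer]:
            if drop_cross and (a, b) == (CUT, CUT + 1):
                continue                     # the severed boundary gate
            if a in qubits and b in qubits:
                psi = apply_cz(psi, qubits.index(a), qubits.index(b), nq)
    return np.abs(psi) ** 2

def xeb(q_dist, p_ideal):
    return 2**n * np.sum(q_dist * p_ideal) - 1.0

p_ideal = simulate(list(range(n)), drop_cross=False)        # full circuit: cost ~ 2^n
p_left = simulate(list(range(CUT + 1)), drop_cross=True)     # each patch costs only ~ 2^(n/2)
p_right = simulate(list(range(CUT + 1, n)), drop_cross=True)
p_patched = np.kron(p_left, p_right)                         # the spoofer's product distribution

print("XEB of the full ideal distribution:", xeb(p_ideal, p_ideal))     # printed for reference
print("XEB of the patched distribution   :", xeb(p_patched, p_ideal))   # typically clearly positive
```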
VI. The asymptotic behavior
One interesting aspect of the paper is that (as far as I could understand), as the number of qubits grows, the "quantum advantage", namely the advantage of the quantum algorithms over the classical algorithms, diminishes. As Gao et al. write:
“Remarkably, the XEB value of our algorithm generally improves for larger quantum circuits, whereas that of noisy quantum devices quickly deteriorates. Such scaling continues to hold when the number of qubits is increased while the depth of the circuit and the error-per-gate are fixed…”
Remark: This conclusion assumes that you need enough samples to verify the fidelity. Philosophically, one can claim that the quantum advantage may apply to producing *one* sample; after all, the argument is in any case based on extrapolation, and for supremacy experiments you cannot verify even the individual amplitudes. (I made this claim in a discussion with Daniel Lidar regarding the very nice paper by Zlokapa, Boixo, and Lidar, Boundaries of quantum supremacy via random circuit sampling, but I couldn't persuade Daniel.)
VII. Relevance to the statistical analysis and the concerns regarding the Google 2019 experiment
The paper is relevant to two important aspects of my Statistical Science paper with Yosi Rinott and Tomer Shoham, and to our work that puts the Google 2019 experiment under scrutiny.
- The fact that XEB is systematically larger than P(No err) may support the concern that the precise agreement of the XEB estimator with the P(No err) computation (Formula (77)) is “too good to be true,” namely, it fits too well with the researcher’s expectations rather than with physical reality.
- The basic observation (III) implies that the exponential decay of the Fourier coefficients with the degree, which we attributed only to readout errors, is also caused by gate errors. Consequently, the Fourier description of the data, which we regarded as providing confirmation of the Google claim (see Figure 2 in our recent paper), actually appears to show that the empirical data does not fit the theory.
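To recall what "exponential decay with the degree" refers to, here is the standard readout-noise calculation, stated in my own words rather than taken from either paper: if each output bit is flipped independently with probability $\epsilon$, then the Fourier–Walsh coefficient of the output distribution supported on a set $S$ of qubits is damped as

$$\widehat{q}(S) \;=\; (1-2\epsilon)^{|S|}\,\widehat{p}(S),$$

so the coefficients decay exponentially in the degree $|S|$. The point of observation (III) is that gate errors produce a qualitatively similar degree-dependent damping, so this decay by itself does not single out readout errors.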
Apropos Fourier methods and Xun Gao: the first time I heard of Xun Gao's work was in connection with his excellent early work with Duan, Efficient classical simulation of noisy quantum computation, which used Fourier methods to study quantum circuits.
Update (Dec. 2022):
1. There is a very interesting paper by Dorit Aharonov, Xun Gao, Zeph Landau, Yunchao Liu, and Umesh Vazirani, A polynomial-time classical algorithm for noisy random circuit sampling. Here the noise model is an arbitrarily small constant amount of depolarizing noise applied to each qubit at each step. The analysis builds on Xun Gao and Luming Duan's paper mentioned above and seems related to Fourier expansion. Let me note that under the assumption of fixed-rate readout errors, my simple algorithm with Guy Kindler applies and shows that approximate sampling can be achieved by low-degree polynomials, and hence lies in a very low-level computational subclass of P (and even in a lower-level class). I don't know whether this conclusion applies to the model of Aharonov et al.'s recent paper, and this is an interesting question.
Here is the abstract of Aharonov et al.'s paper:
We give a polynomial time classical algorithm for sampling from the output distribution of a noisy random quantum circuit in the regime of anti-concentration to within inverse polynomial total variation distance. This gives strong evidence that, in the presence of a constant rate of noise per gate, random circuit sampling (RCS) cannot be the basis of a scalable experimental violation of the extended Church-Turing thesis. Our algorithm is not practical in its current form, and does not address finite-size RCS based quantum supremacy experiments.
2. There is a nice blog post by Scott Aaronson on several Sycamore matters. On quantum supremacy, Scott expresses an optimistic view:
“Stepping back, what is the current status of quantum supremacy based on Random Circuit Sampling? I would say it’s still standing, but more precariously than I’d like—underscoring the need for new and better quantum supremacy experiments. In more detail, Pan, Chen, and Zhang have shown how to simulate Google’s 53-qubit Sycamore chip classically, using what I estimated to be 100-1000X the electricity cost of running the quantum computer itself (including the dilution refrigerator!). Approaching the problem from a different angle, Gao et al. have given a polynomial-time classical algorithm for spoofing Google’s Linear Cross-Entropy Benchmark (LXEB)—but their algorithm can currently achieve only about 10% of the excess in LXEB that Google’s experiment found.”
Since Scott and I agree that classical algorithms are now ten orders of magnitude quicker than those used by the Google team in 2019, I don’t see much reason to debate the extra 2-3 orders of magnitude of supremacy in terms of electricity costs. But there is one technical matter worth noting regarding the 10% LXEB value that Gao et al.’s method achieved. (Above we use XEB for LXEB.)
As we explained in the post, the basic observation of Gao et al. is that when we apply depolarizing noise to a set of gates that separates the circuit into two parts, there is still a positive correlation with the ideal distribution, and this yields a better classical algorithm, which allows Gao et al. to achieve an asymptotically better LXEB for a given running time. In the Google 2019 paper, the Google team talked about patch circuits, obtained by deleting the 2-gates whose removal separates the full circuit into two parts, and also about elided circuits, where only part of these “boundary” 2-gates are deleted and quick classical algorithms are still available. It is plausible that using these elided circuits (or, alternatively, leaving even a single 2-gate alive between the two parts) will allow quick classical algorithms that bridge the 10% gap that Scott mentioned.
Further update: After a long break I took part in the interesting discussion over at SO. Here is, from that discussion, a brief summary of one of the pillars of my argument against quantum computers:
“Computational supremacy not supported by asymptotic analysis cannot be reached in practice”
This principle applies to NISQ systems, for which there are by now various results (including the recent result by Aharonov et al.) showing that they represent asymptotically efficient classical computation.
3. Regarding the other piece of news, the Sycamore wormhole simulation, my general view is that a demonstration of a quantum state or evolution on a few qubits could be impressive even if the classical computations required for confirming the claims are not computationally hard. For example, it would be very impressive to reach a “dense” cloud of quantum states with 3-6 qubits. (I once asked about it here.) Putting the wormholes aside, I would be very interested to hear what the precise circuit that Sycamore ran was, and how the outcomes were classically verified.
The distinction between classical simulation and quantum simulation that plays a role in Scott’s post was discussed several times in my 2012 debate with Aram Harrow. In fact our very last round of comments (Nov. 2012) was about this issue:
(Aram) (emphasis added): The same story can be told about the exponential-time classical emulation, except that now the encoding is no longer a unitary, or indeed linear, map.
I agree that the quantum emulation feels closer to the original physics than the classical emulation does. But this seems like a rather imprecise statement, and not something that can be formalized.
(Gil) This is great, Aram, in the last paragraph you say that the distinction cannot be formalized and in the previous paragraph you offer a way to formalize it!
(Aram) good point.
A little more (Dec 9): Regarding the Sycamore wormhole simulation, here are links to the paper, a related commentary paper (both from Nature), articles in Quanta Magazine and the NYT, and a nice video produced by Quanta Magazine. Peter Woit also wrote several critical blog posts about it. The scientific issue involves the Sachdev–Ye–Kitaev (SYK) model and the well-known principle “ER=EPR” proposed by Maldacena and Susskind a decade ago. Here is an article by Gali Weinstein with a historical perspective on Einstein’s work on ER and EPR. The ER=EPR conjecture is part of an ambitious effort to relate quantum gravity and quantum information (“it from qubit”), or, as Wikipedia puts it, “a grander conjecture that the geometry of space, time and gravity is determined by entanglement.” Let me mention that Peter Shor (and others) had early ideas relating quantum gravity to quantum error correction, and let me also mention a grand conjecture of my own, that “the impossibility of quantum error correction and quantum fault tolerance is precisely what enables time and geometry to emerge in our physical world.”
It is unfortunate that the discussion regarding the new Sycamore experiment is centered on “meta” matters and “hype,” with almost no discussion of the technical issues themselves.