Questions and Concerns About Google’s Quantum Supremacy Claim

Yosi Rinott, Tomer Shoham, and I have written our third paper in our long-term statistical study of the Google 2019 quantum supremacy experiment. The paper presents statistical analysis that may shed light on the quality and reliability of the data and on the statistical methods of the Google experiment. Comments, corrections, and discussion are most welcome.

The title of the Google 2019 paper was “Quantum supremacy using a programmable superconducting processor”. As readers may remember, the supremacy claim has largely been refuted by several research groups; see this post, this one, and this one. The calibration process of the Google experiment weakens the claim of a “programmable processor”, and some of our findings from the second paper, as well as a few findings from the new paper, further weaken this claim (see below).

Questions and Concerns About Google’s Quantum Supremacy Claim

Abstract:  In October 2019, Nature published a paper [5] describing experimental work performed at Google. The paper claims to demonstrate quantum (computational) supremacy on a 53-qubit quantum computer. Since then we have been involved in a long-term project to study various statistical aspects of the Google experiment.

In [32] we studied Google’s statistical framework, which we found to be very sound, and offered some technical improvements. This document describes three main concerns (based on statistical analysis) about the Google 2019 experiment. The first concern is that the data do not agree with Google’s noise model (or with any other specific model). The second concern is that a crucial formula for the a priori estimation of the fidelity is surprisingly simple and seems to involve an unexpected independence assumption, and yet it gives very accurate predictions. The third concern is about surprising statistical properties of the calibration process.

Some findings of this paper

1. There is a large gap between the samples of the Google experiment and the Google noise model, as well as any other specific noise model. The gap between the empirical distribution and the model is asymmetric.

2. There are large fluctuations in the empirical behavior that are not understood. Consequently, there is evidence that the distance between the Google noise model and the uniform distribution is smaller (when the number of qubits is n > 16) than the distance between the experimental samples and the Google noise model.

3. The empirical behavior of the samples is not stationary.

4. While the empirical distribution is not stationary, the XEB fidelity is stable along the samples. Moreover, the “high energy events” that lead to abrupt increases in errors, which are reported for later experiments with the Sycamore quantum computer, cannot be detected in the 2019 quantum supremacy experiment.

5. The predictive power of Formula (77) for the XEB fidelity estimates is statistically surprising: the implicitly assumed independence between the components of a system such as a quantum computer, which is known to be sensitive to noise and to errors caused by interactions with its environment, is striking.

6. The systematic bias of the predictions of Formula (77) for patch circuits seems statistically surprising, and Google’s explanation is not convincing.

7. The behavior of the fidelities of the two patches in patch circuits is very different; this appears to be in tension with Formula (77). (We have not yet been provided with the data needed to check this matter.)

8. The success of the experiments depends entirely on the very large effect of the calibration adjustments. There are large differences between the effects of the calibration adjustments for different 2-gates, and even for different appearances of the same 2-gate.

9. The calibration adjustments are surprisingly effective, especially given the stated local nature of the calibration. Mathematically speaking, we witness a local optimization process reaching a critical point of a function depending on hundreds of parameters.

A few problems of general interest

Scrutinizing a scientific work necessarily involves punctiliousness and nitpicking, but several of the issues that arise are of general interest.

(a) What is the statistical methodology for analyzing samples obtained from noisy quantum computers and for finding appropriate models to describe the empirical data?

(b) We find the statistical independence assumption in a certain predictive model very surprising. What could be the scientific framework and methodology for studying this matter?

(c) We find it surprising that a local optimization process (namely, a process that separately optimizes each variable) of a function of many variables reaches a critical point. What further tools could be used to study this matter?

(d) What are the tools for studying whether an empirical behavior is non-stationary, and perhaps even inherently unpredictable?

(e) What are the appropriate methodology and ethics for scrutinizing major scientific works, and how can the gap between theoreticians (like us) and experimentalists be bridged?

What’s next

We are in the process of writing a paper on readout errors, gate errors, and Fourier expansion. On the theoretical side, we will study the effect of gate errors on the Fourier expansion, which is of interest in and of itself and could serve as a sanity check for various aspects of the Google 2019 experiment. (The effect of readout errors is well understood – it is essentially the noise operator that I have been studying extensively since the mid-1980s.) On the technical side, this will be our first work in which we use simulators for noisy quantum circuits; we currently use both the Google and the IBM simulators. Then, we may apply some of our statistical tools to other NISQ experiments, and we will even try to reproduce, using an IBM quantum computer, a certain random circuit experiment with 6 qubits.

Pictorial Summary

Concern I

There is a large gap between the samples of the Google quantum supremacy experiments and the Google noise model. In fact, the samples are far away from any noise model we are aware of. There is evidence that the distance between the Google noise model and the uniform distribution is smaller (when the number of qubits is n > 16) than the distance between the experimental samples and the Google noise model. We studied other properties of the empirical distribution, such as its behavior at different scales, its non-stationary nature, and its Fourier behavior, and there is more to be done, mainly for data coming from other NISQ experiments and data from simulators of noisy circuits.

Fig2-KTS2

The left-hand scatterplots display theoretical vs. empirical frequencies for the Google sample (file 0) with n = 12. The right-hand scatterplots display theoretical frequencies vs. empirical frequencies that we simulated according to Google’s noise model (1) with φ = 0.3701.
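The noise model (1) referred to in the caption is, roughly speaking, a mixture of the ideal output distribution with the uniform distribution, with mixing weight φ. A minimal sampling sketch under that mixture assumption (the 4-outcome “ideal” distribution below is made up for illustration; this is not the full Google noise model):

```python
import random

def sample_mixture(ideal_probs, phi, rng):
    """Draw one outcome index from phi * P_ideal + (1 - phi) * Uniform.

    A toy version of the white-noise mixture model: with probability phi
    sample from the ideal distribution, otherwise sample uniformly.
    """
    if rng.random() < phi:
        return rng.choices(range(len(ideal_probs)), weights=ideal_probs)[0]
    return rng.randrange(len(ideal_probs))

# Made-up 4-outcome "ideal" distribution, with the fitted phi from the caption
ideal = [0.4, 0.3, 0.2, 0.1]
rng = random.Random(0)
samples = [sample_mixture(ideal, 0.3701, rng) for _ in range(20000)]
freq0 = samples.count(0) / len(samples)  # should be near 0.3701*0.4 + 0.6299*0.25
```

Comparing such simulated frequencies with the empirical ones is the kind of comparison the scatterplots above display.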

Concern II

The remarkable predictive power of Formula (77) is statistically surprising: the implicitly assumed independence between components of the quantum computer is striking. The experimental XEB fidelities of the patch circuits show an unexplained systematic deviation from the predictions of Formula (77). On the other hand, confirmations [13, 23] and replications [33, 35] lend support to the claims in the Google paper. Various matters remain to be explored. For example, (i) using simulations of noisy circuits, the magnitude of the difference between the two sides of Formula (77) (Gao et al. [11]) could be estimated, and (ii) the individual values in Formula (77) could be used to study the different XEB fidelities of the two patches in patch circuits.
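For context, Formula (77) of the Google paper predicts the XEB fidelity as a product of per-component fidelities, one factor per gate and per qubit readout; the product form is exactly where the independence assumption enters. A minimal sketch (the counts and error rates below are illustrative, roughly of the magnitudes Google reported; they are not the actual per-gate data, which we did not receive):

```python
def predicted_fidelity(gate_errors, readout_errors):
    """Predicted XEB fidelity as a product of per-component fidelities:
    F = prod(1 - e_gate) * prod(1 - e_readout).
    The product form encodes the independence assumption discussed above."""
    f = 1.0
    for e in list(gate_errors) + list(readout_errors):
        f *= 1.0 - e
    return f

# Illustrative counts and error rates (made up for this sketch):
single_qubit_errors = [0.0016] * 100  # ~0.16% per 1-gate
two_qubit_errors = [0.0062] * 40      # ~0.62% per 2-gate
readout_errors = [0.038] * 12         # ~3.8% per qubit readout

F = predicted_fidelity(single_qubit_errors + two_qubit_errors, readout_errors)
```

The statistical surprise is not the arithmetic, of course, but that such a product of hundreds of independently measured factors matches the observed XEB fidelity so closely.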

FIG9-KRS

The striking predictive power of Formula (77): a comparison between the average XEB fidelities for full, elided, and patch circuits and the predictions of (77).

Concern III

The calibration process accounts for systematic errors in 2-gates and applies certain adjustments to the definition of the circuits. These adjustments are local, namely, the adjustments for a 2-gate involving qubits x and y depend primarily on outcomes of 1- and 2-qubit circuits on these qubits. Some statistical findings regarding the calibration process are: (i) the effect of the calibration is large even for a single 2-gate; (ii) there are large differences between the effects for different 2-gates, and even for different appearances of the same 2-gate; and (iii) the effectiveness of the 2-gate calibrations is remarkable. We note that these findings heighten the tension between the calibration process and Google’s claim of a “programmable quantum computer.” The effectiveness of the calibration process is especially surprising in view of the local nature of the calibration: mathematically speaking, we witness a local optimization process reaching a critical point of a function depending on hundreds of parameters.
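As a toy illustration of what “a local optimization process reaching a critical point” means (this is not Google’s calibration procedure, and the two-variable convex objective below is made up): coordinate-wise updates use only one coordinate’s partial derivative at a time, loosely analogous to calibrating each gate separately. In a benign convex example the local process does reach the critical point; the surprise in the calibration setting is that an analogous local process appears to succeed for a complicated function of hundreds of parameters.

```python
def grad(v):
    """Gradient of the toy objective f(x, y) = (x-1)**2 + (y+2)**2 + 0.5*x*y."""
    x, y = v
    return [2.0 * (x - 1.0) + 0.5 * y, 2.0 * (y + 2.0) + 0.5 * x]

def coordinate_descent(grad, v, sweeps=200, lr=0.1):
    """Update one coordinate at a time, each step using only that
    coordinate's partial derivative (a 'local' update rule)."""
    for _ in range(sweeps):
        for i in range(len(v)):
            v[i] -= lr * grad(v)[i]
    return v

# For this convex toy function the local process reaches the unique
# critical point, which is (1.6, -2.4).
v = coordinate_descent(grad, [0.0, 0.0])
```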

calibA

The remarkable effectiveness of the calibration. The effect of removing the calibration for the kth 2-gate of a circuit for the first Google file (file 0) with n = 12. Note that the same 2-gate occurs periodically along the circuit, indicated by the vertical black dashed lines. Here we remove all ingredients of the calibration: both the 1-gate rotations and 2-gate adjustments.

calibB

The remarkable effectiveness of the calibration II. The effect of removing the 2-gate adjustments involving the kth 2-gate of a circuit. (The same 2-gate occurs periodically along the circuit.) The 2-gate involving the qubits (3,3) and (3,4) has a consistently large effect.

To make matters clear, the calibration is not about tightening the screws in Sycamore; rather, it is about a change in the program. We can think of the calibration process as a change in the model that greatly reduces certain systematic forms of noise. For example, if we discovered that a certain 1-gate that is supposed to apply a 90-degree rotation systematically performs an 80-degree rotation, then rather than changing the engineering of the 1-gate, we would change the definition of the circuit.
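A toy sketch of this kind of compensation in software (the 8/9 miscalibration factor and the real 2×2 rotation stand-in are hypothetical, chosen to match the 90-vs-80-degree example above):

```python
import numpy as np

def rotation(theta_deg):
    """2x2 real rotation matrix by theta degrees (a stand-in for a 1-gate)."""
    t = np.radians(theta_deg)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

# Hypothetical systematic error: the device applies only 8/9 of the
# requested angle, so a requested 90-degree rotation acts as 80 degrees.
MISCALIBRATION = 80.0 / 90.0

def hardware_gate(requested_deg):
    """What the (hypothetical) hardware actually does for a requested angle."""
    return rotation(requested_deg * MISCALIBRATION)

# "Calibration" here is purely a change in the program: request a larger
# angle so that the effective rotation is the intended 90 degrees.
compensated_request = 90.0 / MISCALIBRATION  # 101.25 degrees
```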

(no) Conclusion

“We laid out the dry facts and findings, and we let the readers make their own interpretation, or rather take note of our concerns and wait for more experimental data from future experiments.”

Was the Google experiment a “Programmable quantum computer?”

The title of the Google 2019 paper was “Quantum supremacy using a programmable superconducting processor”. As we mentioned, the supremacy claim has largely (but not fully) been refuted. There are also doubts regarding the claim that the Sycamore 2019 experiment represents a “programmable processor”, as the calibration process and other matters weaken this claim. (This was pointed out in an October 2019 comment by Craig Gidney of the Google team, and also a few months later by a commentator, “Till”. Till’s comment led to an interesting discussion regarding the nature of the calibration, which many had earlier believed to represent physical changes in the device.)

Some findings of our papers further weaken Google’s claim of a “programmable device”. The Google paper describes about 1,000 experiments on various circuits, but, as it turns out, all of these experiments depend on the random choices made for the ten largest circuits, and this fact is also in tension with the “programmable” claim. (It would have been quite possible to choose a different random circuit in every case.) A related concern is that improvements to the calibration process were interlaced with the experiment, and that the last-minute calibration procedure for the EFGH circuits represented a substantial improvement. We note that while the general principles of the calibration process are publicly available, the precise details are a commercial secret. Of course, concerns regarding the Google calibration process may reflect on other Sycamore experiments.

Our earlier papers

We wrote two earlier papers:

  1. Y. Rinott, T. Shoham, and G. Kalai, Statistical Aspects of the Quantum Supremacy Demonstration, Statistical Science (2022).
  2. G. Kalai, Y. Rinott, and T. Shoham, Google’s 2019 “Quantum Supremacy” Claims: Data, Documentation, & Discussion (see this post).

Data

The question of the appropriate methodology, ethics, and culture for scrutinizing major scientific works is related to the replication crisis that we mentioned in an earlier post. There we described our policy regarding data requests. Our experience was, overall, rather positive. (Things went rather slowly, but we were slow as well.) We still have not received the individual terms of Formula (77) (namely, the error rates for individual 1-gates and 2-gates), but the Google team promised to try to push toward getting us this information.

Disclaimer

(From our paper:) “A few months after the publication of the Google paper we initiated what has become a long-term project to study various statistical aspects of the Google experiment and to scrutinize the Google paper. This is a good place to mention that Google’s quantum supremacy claim appeared to refute Kalai’s theory regarding quantum computation ([15, 16, 18]) and Kalai’s specific prediction that NISQ systems cannot demonstrate `quantum supremacy.’ This fact influenced and may have biased Kalai’s assessment of Google’s quantum supremacy claim. (Recent improved classical algorithms have largely refuted Google’s quantum supremacy claim and therefore the Google results no longer refute Kalai’s theory.)”

For my argument see this post and this one.

A few more figures

Fig5-KTS2

Zooming in on the empirical frequencies of bitstrings with amplitudes between the median and the 0.55 quantile. The left plot shows the empirical occurrences of the bitstrings in Google file 0, n = 12. The right plot is based on a simulation with φ = 0.3862. The red line describes the expected number of bitstrings under the Google noise model, and the blue dashed line is 3 standard deviations from the expectation. (We plan to test whether these fluctuations are present in samples from IBM quantum computers.)

FIG6-KRS

Comparing the two halves of the Google samples: the black vertical lines are the ℓ1 distances between the occurrence counts of bitstrings when we partition the samples into two halves according to the sampling order. The histograms give the ℓ1 distances between the occurrence counts of bitstrings for random partitions of the samples into two halves.

FIG7-KRS

Drifts in the fractions of ones. We divided the 500,000 bitstrings into 250 groups of 2,000 bitstrings each, according to the sampling order. For each group we calculated the fraction of bitstrings having a “1” bit at a given location in the bitstring. The figure shows the fractions of ones at locations 11 and 12 for one circuit (file 0) with n = 12. (The red lines are linear regression fits to the data points.) The trend is consistent across the different circuits and differs between locations in the bitstrings.

FIG8-KRS

A histogram of differences between the empirical distribution and the values given by the Google noise model, n = 12, Google file 0, φ = 0.3701. What can explain the apparent asymmetry in the gaps between the empirical distribution and the model? Is an explanation at all necessary?
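The half-split comparison in the FIG6 caption can be sketched as a simple permutation test. This is a toy version (samples represented as a plain list of bitstring values), not our actual analysis code:

```python
import random
from collections import Counter

def l1_split_distance(samples, first_half_idx):
    """l1 distance between the bitstring occurrence counts of two halves."""
    first_set = set(first_half_idx)
    first = Counter(s for i, s in enumerate(samples) if i in first_set)
    second = Counter(s for i, s in enumerate(samples) if i not in first_set)
    return sum(abs(first[k] - second[k]) for k in set(first) | set(second))

def permutation_test(samples, n_perms=200, seed=0):
    """Compare the time-ordered half split against random half splits."""
    rng = random.Random(seed)
    half = len(samples) // 2
    observed = l1_split_distance(samples, range(half))
    idx = list(range(len(samples)))
    null = []
    for _ in range(n_perms):
        rng.shuffle(idx)
        null.append(l1_split_distance(samples, idx[:half]))
    # fraction of random splits at least as far apart as the ordered split
    p_value = sum(d >= observed for d in null) / n_perms
    return observed, p_value
```

A small p-value means the two time-ordered halves differ more than random halves do, which is evidence of non-stationarity.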

Table3-KRS


6 Responses to Questions and Concerns About Google’s Quantum Supremacy Claim

  1. Gil Kalai says:

    There are several new papers with relevant NISQ experiments, and it would be interesting to statistically examine the data. 1. Phase transition in Random Circuit Sampling (the Google team), https://arxiv.org/abs/2304.11119 (this seems closest to the 2019 experiment). 2. Non-Abelian braiding of graph vertices in a superconducting processor (the Google team), https://arxiv.org/abs/2210.10255. (There are a few more recent preprints from the Google team.) 3. Evidence for the utility of quantum computing before fault tolerance, https://www.nature.com/articles/s41586-023-06096-3 (IBM team; a 127-qubit quantum computer). The IBM paper uses a method for “error mitigation” which seems interesting. (There are some earlier papers about the method.)

    • Gil Kalai says:

      The notion of error mitigation is especially interesting. Ohad Lev told me that there are various methods and mentioned one: by adding gates that compose to the identity, you measure the quantity of interest as a function of the amount of noise, and then you extrapolate to the case of interest, namely zero noise. The obvious questions are: a) what are the assumptions on the noise that allow this method to work, and b) can it be implemented by classical algorithms?
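The extrapolation step described in the comment above (zero-noise extrapolation) can be sketched as follows; the noise scale factors and the measured expectation values are made up for illustration:

```python
import numpy as np

# Hypothetical noisy expectation values at noise scale factors 1, 2, 3
# (obtained, e.g., by inserting gate pairs that compose to the identity,
# which amplifies the effective noise without changing the ideal circuit).
scales = np.array([1.0, 2.0, 3.0])
values = np.array([0.81, 0.66, 0.54])  # made-up measurements

# Zero-noise extrapolation: fit a low-degree polynomial in the noise
# scale and evaluate it at scale 0.
coeffs = np.polyfit(scales, values, deg=2)
zero_noise_estimate = float(np.polyval(coeffs, 0.0))
```

Question (a) above amounts to asking when the true expectation value really is a smooth, low-degree function of the noise scale, so that this extrapolation is justified.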

  2. Matthew Cory says:

    Gil, I don’t see how QCs have a prayer just examining their thermodynamic behavior. It follows from basic physics that a QC needing N operations takes at least O(N^2) time to terminate. They scale linearly with applied force and entropy, so the constant error of the Threshold Theorem has no power. QFT is infinite-dimensional, as I point out here: https://www.physicsoverflow.org/45750/thermodynamic-speed-limit-of-quantum-computing

  3. Pingback: Random Circuit Sampling: Fourier Expansion and Statistics | Combinatorics and more
