
Quantum computational advantage via high-dimensional Gaussian boson sampling.

Abhinav Deshpande^1,2,3, Arthur Mehta^4,5, Trevor Vincent⁴, Nicolás Quesada^4,6, Marcel Hinsche⁷, Marios Ioannou⁷, Lars Madsen⁴, Jonathan Lavoie⁴, Haoyu Qi⁴, Jens Eisert^7,8,9, Dominik Hangleiter^1,7, Bill Fefferman¹⁰, Ish Dhand¹¹.

Abstract

Photonics is a promising platform for demonstrating a quantum computational advantage (QCA) by outperforming the most powerful classical supercomputers on a well-defined computational task. Despite this promise, existing proposals and demonstrations face challenges. Experimentally, current implementations of Gaussian boson sampling (GBS) lack programmability or have prohibitive loss rates. Theoretically, there is a comparative lack of rigorous evidence for the classical hardness of GBS. In this work, we make progress in improving both the theoretical evidence and experimental prospects. We provide evidence for the hardness of GBS, comparable to the strongest theoretical proposals for QCA. We also propose a QCA architecture we call high-dimensional GBS, which is programmable and can be implemented with low loss using few optical components. We show that particular algorithms for simulating GBS are outperformed by high-dimensional GBS experiments at modest system sizes. This work thus opens the path to demonstrating QCA with programmable photonic processors.


Year: 2022 PMID: 34985960 PMCID: PMC8730598 DOI: 10.1126/sciadv.abi7894

Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136

INTRODUCTION

We are arriving at an exciting era for quantum computing in which quantum experiments are pushing the limits of what is efficiently computable by the most powerful classical supercomputers. The first major goal for this era is the demonstration of a scalable quantum advantage or quantum computational advantage (QCA) (also termed “quantum computational supremacy”) over classical computers. QCA is important as a probe of the foundations of computer science, where it can be seen as an experimental violation of the extended Church-Turing thesis, and it also serves as an important benchmarking tool for comparing near-term experiments on different platforms in a fair and consistent manner. The recent groundbreaking demonstrations of QCA (, ) constitute the first notable experimental evidence against the extended Church-Turing thesis. Notwithstanding, multiple potential loopholes have been pointed out (–). QCA will not be marked by a single isolated experiment but rather will be established by gradually improving and scaling up “high complexity” experiments run over the course of many years, which improving classical algorithms will try to simulate. Our confidence that we have arrived in this new era will grow as multiple experiments, performed in different physical architectures, independently reach this conclusion in a comparable fashion. In this way, the goal may be seen as being analogous to Bell inequality violations, which were originally conducted in landmark experiments starting in the 1970s performed on a variety of different platforms but only much later were loopholes closed. In the same vein, theoretical results about QCA justify the classical hardness of simulating an experiment in the realm of asymptotically large system sizes. To interpret conclusions from experiments performed at a fixed system size, we should also consider the concrete cost of simulating these finite-size experiments using known algorithms. 
The two lines of inquiry are complementary to each other and support each other in a claim that any experiment is likely impossible to feasibly simulate with current hardware. Among different approaches to demonstrating QCA (, , , ), photonics provides a promising path as it enables room temperature operation, fast gate speeds, and notable potential for scalability (, ). Arguably, the most feasible approach to demonstrating QCA with photonics is to perform the Gaussian boson sampling (GBS) protocol (, ). This protocol is at the heart of the recent QCA demonstration performed by a team from University of Science and Technology of China (USTC) (), which used a GBS device with 100 modes and an average of around 45 photons. However, GBS has several important limitations. On the experimental side, current implementations of GBS either lack programmability () or have high loss rates, which could render the system classically simulable (12, 13). In addition, from a theoretical standpoint, there is a comparative lack of complexity-theoretic evidence for the hardness of GBS () and an understanding of the classical runtime of concrete algorithms to simulate GBS instances. In this work, we aim to address these challenges. We close important theoretical loopholes in the hardness argument for GBS and provide evidence for the hardness of classically simulating GBS even in the presence of loss. We moreover propose a new, programmable architecture for GBS that promises better robustness to loss in a near-term experiment and an asymptotic quantum speedup over classical algorithms. In addition, our proposed architecture is designed so that it is outside of known regimes where current algorithms can simulate finite-size GBS instances in feasible time, as we show through numerical benchmarking. We first address the open theoretical questions about GBS, namely, hardness in the regime with little overall noise in the form of optical loss. 
More specifically, to provide complexity-theoretic evidence for the hardness of approximately simulating GBS, we prove average-case hardness of computing output probabilities in the noise-free case, formulate the so-called “hiding property” () for GBS in terms of a random-matrix theory conjecture, and provide analytical and numerical evidence for this conjecture. These results bring GBS to the level of evidence shared by other QCA proposals such as random circuit sampling (RCS) and conventional boson sampling [see (, , )] up to a mild conjecture in random matrix theory. We then show that average-case hardness of computing output probabilities still holds in a regime of high loss rates, building on recent results (), and discuss the implications of this result on the noise-regimes in which one may still expect GBS to be hard to simulate on a classical computer. These results bolster the evidence for QCA in the USTC experiment and also any future GBS experiments. Given these theoretical results, we then address the programmability versus low-loss trade-off in current architectures. To this end, we introduce a new architecture, high-dimensional GBS, using a time-domain approach. This architecture can be implemented programmably with low overall loss while at the same time being hard to simulate for the known classical simulation algorithms. The hardness of this architecture is borne out by the hardness of computing output probabilities for the lossy, high-dimensional GBS setup. These results provide evidence of classical hardness for asymptotic system sizes. In the realm of finite system sizes, we take care to avoid regimes where the experiment can be tractably simulated (12, 16), such as when the linear-optical network has limited connectivity [such as one-dimensional (1D) network topology] or when the system is too lossy. 
Our proposed high-dimensional GBS architecture avoids these algorithms by taking advantage of the enhanced connectivity available in more than one dimension. To this end, we perform benchmarking simulations to estimate the cost of high-dimensional GBS against state-of-the-art algorithms for simulating GBS and for simulating high-dimensional quantum many-body systems (17, 18). These simulations give evidence that classically intractable instances of high-dimensional GBS can be built in the laboratory with a small number of optical components. These advantages make high-dimensional GBS an ideal near-term architecture for demonstrating QCA with a programmable photonic device. Thus, by addressing the abovementioned shortcomings of GBS from the theoretical and experimental perspectives and by understanding the limits of its classical simulability through both asymptotic analysis and finite-size benchmarking, this work paves the way toward more “loophole-free” demonstrations of QCA with a programmable photonic quantum device.

Hardness of approximate GBS

We begin by reviewing and strengthening the hardness argument for the task of simulating GBS as introduced in (, ). We first introduce the model of GBS and then examine the evidence for the hardness of approximate boson sampling. Two properties are required for establishing complexity-theoretic hardness of sampling using the standard QCA arguments, namely, hiding and average-case hardness of approximating probabilities. Here, we strengthen the results in (, ) by providing strong evidence for these properties in GBS. Specifically, we reduce the hiding property to a highly plausible conjecture in random matrix theory, for which we provide analytical and numerical evidence. In addition, we provide evidence for approximate average-case hardness by proving approximate worst-case hardness and near-exact average-case hardness of computing the output probabilities. Thereby, up to a random-matrix theory conjecture, we bring the hardness argument for GBS to the same standard as that of boson sampling. We then extend the latter results to the case of computing output probabilities of noisy GBS, which can be well motivated when the noise model describing the experimental data is trusted. These results show that the evidence of a quantum “signal” remains in the output distribution even in the presence of noise. Last, we discuss the implications of these results on the complexity of simulating GBS in the presence of noise.

Recap: GBS

GBS is the computational task of sampling the photon number statistics of a Gaussian state. Obtaining a sample from a typical GBS experiment involves the following steps. First, a general Gaussian state is prepared at the input, often taken to be M single-mode–squeezed vacuum states. These states are then interfered on an M-mode linear-optical interferometer containing beam splitters and phase shifters. Last, the Gaussian state at the output of the interferometer impinges on M photon number–resolving (PNR) detectors. The resulting pattern of photon number outcomes from the detectors is the required sample. Because single-mode–squeezed states can be generated and interfered deterministically at room temperature with high rates, GBS is experimentally feasible on large scales already today, as evidenced by the recent experiment from USTC ().

In more detail, a typical GBS experiment involves interfering M single-mode–squeezed vacuum states with squeezing parameters $r_1, \ldots, r_M$ at an interferometer specified by an M × M linear-optical unitary matrix U. Note that some of the modes can optionally be prepared in the vacuum state by setting their squeezing parameter to zero. The probability of detecting $n_1$ photons in the first mode, $n_2$ in the second, and so on, denoted by $n = (n_1, \ldots, n_M)$, is

$$\Pr(n) = \frac{1}{\mathcal{Z}}\, \frac{|\mathrm{Haf}(A_n)|^2}{n_1! \cdots n_M!}, \qquad \mathcal{Z} = \prod_{i=1}^{M} \cosh r_i. \tag{1}$$

Here, $A = U \left( \bigoplus_{i=1}^{M} \tanh r_i \right) U^T$ is the so-called adjacency matrix of the (pure, zero displacement) Gaussian state (), and $A_n$ is the symmetric matrix of size $N = \sum_i n_i$ (i.e., the total photon number) obtained by repeating the ith column and row of A a total of $n_i$ times. In particular, if $n_i = 0$, then the corresponding row and column are deleted. Last, the Hafnian Haf(·) of a symmetric N × N matrix B is given by

$$\mathrm{Haf}(B) = \sum_{\mu \in \mathrm{PMP}(N)} \prod_{j=1}^{N/2} B_{\mu(2j-1),\,\mu(2j)},$$

where PMP(N) is the set of perfect matching permutations of N elements for even N, i.e., permutations μ : [N] → [N] satisfying μ(2k − 1) < μ(2k) and μ(2k − 1) < μ(2k + 1). Equivalently, this is the set of all $N!/(2^{N/2}\,(N/2)!) = (N-1)!!$ ways of partitioning the set {1, 2, …, N} into N/2 subsets of size 2.
The Hafnian of a 0 × 0 matrix is defined to be 1, and that of an odd-size matrix is defined to be 0, which is a manifestation of the fact that squeezed states are supported on even photon number states only. By allowing for arbitrary linear-optical unitaries and arbitrary squeezing parameters on each squeezer, an arbitrary symmetric matrix A can be encoded (up to scaling prefactors) into a Gaussian state. For generic instances, the best-known algorithms to calculate Hafnians have a runtime scaling as $N^3 2^{N/2}$, where N is the size of the matrix ().
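The definitions above can be made concrete with a small numerical sketch. The code below evaluates the Hafnian by direct summation over perfect matchings and then evaluates an outcome probability, assuming the standard pure-state form $\Pr(n) = |\mathrm{Haf}(A_n)|^2 / (\mathcal{Z}\, n_1! \cdots n_M!)$ with $A = U (\bigoplus_i \tanh r_i) U^T$ and $\mathcal{Z} = \prod_i \cosh r_i$. The function names are illustrative and not taken from the paper, and the brute-force sum costs $(N-1)!!$ terms, so this is not the $N^3 2^{N/2}$ algorithm mentioned in the text.

```python
from math import factorial

import numpy as np


def perfect_matchings(indices):
    """Yield every way to split the list `indices` (even length) into pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def hafnian(B):
    """Hafnian of a symmetric matrix by brute-force sum over perfect matchings."""
    B = np.asarray(B)
    size = B.shape[0]
    if size == 0:
        return 1.0  # convention: Hafnian of the empty matrix is 1
    if size % 2 == 1:
        return 0.0  # no perfect matching exists for an odd number of elements
    total = 0.0
    for matching in perfect_matchings(list(range(size))):
        term = 1.0
        for i, j in matching:
            term = term * B[i, j]
        total = total + term
    return total


def gbs_probability(U, r, n):
    """Probability of photon pattern n for squeezing parameters r and unitary U
    (pure, lossless, zero-displacement case; assumed formula, see lead-in)."""
    U = np.asarray(U, dtype=complex)
    r = np.asarray(r, dtype=float)
    A = U @ np.diag(np.tanh(r)) @ U.T            # adjacency matrix
    # Repeat index i exactly n_i times; modes with n_i = 0 are dropped.
    idx = np.array([i for i, ni in enumerate(n) for _ in range(ni)], dtype=int)
    A_n = A[np.ix_(idx, idx)]
    Z = np.prod(np.cosh(r))
    norm = np.prod([factorial(ni) for ni in n])
    return abs(hafnian(A_n)) ** 2 / (Z * norm)
```

As a sanity check, a single squeezed mode with no interferometer should reproduce the known two-photon probability $\tanh^2 r / (2 \cosh r)$ of a squeezed vacuum state, and the all-ones N × N matrix should have Hafnian $(N-1)!!$.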

Recap: Approximate sampling hardness of boson sampling

Before we state our technical results, we review the main steps of the hardness argument for conventional boson sampling as given by Aaronson and Arkhipov (). These steps provide context for the hardness results of GBS that we present below. In a standard boson sampling experiment, instead of interfering single-mode–squeezed states at an interferometer as done in GBS, an N-photon M-mode Fock state is prepared, evolved under a linear-optical unitary, and then measured in the photon-number basis. The boson sampling task is, given a linear-optical unitary as input, to output samples from the output distribution of the corresponding boson sampling experiment. Aaronson and Arkhipov showed that a classical computer cannot perform this task efficiently unless certain complexity-theoretic conjectures are false. In particular, they reduced the task of approximating the probabilities of outputs to the task of efficient sampling, making use of an approximate counting algorithm due to Stockmeyer (). This probability estimation can in turn be related to approximating the permanent of a certain submatrix of the linear-optical unitary, which is provably hard for a class known as #P (). While the Stockmeyer reduction is not efficient, the existence of a classical efficient sampling algorithm would imply that #P-hard problems could be solved using fewer computational resources than expected, amounting to an argument by contradiction. The main difficulty in the hardness argument for boson sampling arises when extending it to the setting of approximate sampling. Here, the task is to sample from any distribution that is within constant-size total variation distance from a given ideal boson sampling distribution. This additional constraint takes into account that actual devices are bound to achieve only some finite and typically additive precision. In this setting, one may therefore argue for a separation of computational power between quantum and classical devices.
Given this constraint, the hardness argument for the task of approximate sampling must take into account that the constant error budget on the distribution can be distributed arbitrarily across all outcome probabilities. In particular, this means that any specific outcome probability of the actually sampled distribution might have a large (constant-size) error when compared to the ideal distribution, which would imply that the sampler cannot be used to estimate the true outcome probabilities. To get around this issue, the argument is extended to random problem instances: Via a property of the distribution over problem instances called hiding, one can translate typical outcomes of fixed instances to fixed outcomes of random instances. This enforces that, with high probability, the overall constant error budget for the entire distribution manifests as small errors on the individual probabilities that are proportional to the inverse size of the sample space. Technically, in standard boson sampling, showing the hiding property boils down to showing that the distribution of any small enough submatrix of a Haar-random unitary is approximately (in total variation distance) an entry-wise complex normal distribution. This implies that all collision-free outcomes are (approximately) equally distributed. In particular, Aaronson and Arkhipov () show that when $M \in \omega(N^5)$, we can “hide” a random Gaussian matrix in a small enough submatrix of the large Haar-random unitary by an appropriate procedure because all of these submatrices are indistinguishable from random Gaussian matrices. For the approximate sampling task to remain computationally intractable, it remains to show that estimating the outcome probabilities up to inverse exponentially small error is #P-hard for any large-enough fraction of the problem instances, a property called approximate average-case hardness.
More precisely, given a random problem instance, approximating the probability of a given outcome must be #P-hard with high probability. As evidence toward this property, it has been shown that exactly computing those output probabilities is, in fact, #P-hard on average (and this was a motivation for boson sampling in the first place), and it is known that estimating them to the required robustness level is worst-case hard. However, the hardness of computing those probabilities to a sufficiently large robustness level on average is still unknown. We now state our results concerning the hardness of general GBS, followed by our proposal for an architecture to perform high-dimensional GBS.

RESULTS

Hiding for arbitrarily many squeezers in GBS

As mentioned above, the property of hiding in boson sampling can be translated into a property of the distribution of submatrices of random linear-optical unitaries chosen from some distribution. We now show that a similar property about the distributions of submatrices occurring in the evaluation of outcome probabilities also holds in GBS, provided that a plausible random-matrix theory conjecture holds. We focus on the paradigmatic setting in which the linear-optical unitary is drawn from the Haar measure, and we fix the input state such that the first K out of M modes are prepared in single-mode–squeezed states with identical squeezing parameter r, and the remaining M − K modes are prepared in the vacuum state. Furthermore, we restrict to collision-free outcomes n, for which $n_i \in \{0, 1\}$, giving rise to a total photon number $N = \sum_i n_i$. The probability of obtaining such an outcome n can be written as

$$\Pr(n) = \frac{\tanh^{N} r}{\cosh^{K} r}\, \left| \mathrm{Haf}\big( (U I_K U^T)_n \big) \right|^2 .$$

Here, $I_K$ denotes the matrix $I_K = \mathbb{1}_K \oplus 0_{M-K}$, where $\mathbb{1}_K$ is a K-dimensional identity matrix, $0_{M-K}$ is an (M − K)-dimensional all-zero matrix, and as before, the notation $A_n$ stands for the submatrix of A corresponding to the entries of n [see below Eq. (1)]. The task of estimating output probabilities of GBS, hence, corresponds to estimating $|\mathrm{Haf}((U I_K U^T)_n)|^2$. To show the GBS hiding property, we need to characterize the distribution of the matrices $(U I_K U^T)_n$, of which the Hafnian is computed, as induced by the Haar-random choice of U and depending on the scaling relations between K, N, and M. To ensure that, for every choice of K, we can restrict to collision-free outcomes, we choose the squeezing parameter r such that the average photon number is sufficiently small compared with the number of modes (). This condition ensures that the collision-free outcomes dominate the probability weight. Here, we formulate the hiding property in GBS in terms of random matrix theory and provide strong numerical and analytical evidence that it holds regardless of the fraction of squeezed input modes so long as the collision-free condition is satisfied.
Observe that the matrix $(U I_K U^T)_n$ can be expressed as the symmetric product $U_{n,K} U_{n,K}^T$, where $U_{n,K}$ is the N × K submatrix of U obtained by choosing rows according to n and the first K columns. To show the hiding property, we need to relate this distribution over matrices to the distribution of the symmetric product $XX^T$ of a complex Gaussian N × K matrix X with mean 0 and variance 1/M, denoted as X ∼ 𝒢(0,1/M). We provide analytical and numerical evidence for the conjecture that these distributions are indistinguishable for any number of squeezers K satisfying N ≤ K ≤ M.

Conjecture 1 [hiding in GBS (informal)]

For any K such that N ≤ K ≤ M and any photon number N that is sufficiently small relative to M (the collision-free regime), the distribution of the symmetric product $U_{n,K} U_{n,K}^T$ of N × K submatrices $U_{n,K}$ of a Haar-random U ∈ U(M) closely approximates the distribution of the symmetric product $XX^T$ of a Gaussian matrix X ∼ 𝒢(0,1/M) in total variation distance.

We provide a formal statement of the conjecture in the Supplementary Materials. There, we also discuss regimes in which the conjecture is known to be partially true (, ) and provide numerical evidence for it. Proving this conjecture is an open research problem in random matrix theory. Conjecture 1 characterizes the distribution of the symmetric product of N × K submatrices of Haar-random unitaries. In turn, the Hafnian of such symmetric products determines the output distribution of GBS. While in standard boson sampling, the hiding property amounts to hiding a small N × N Gaussian matrix in a large M × M Haar-random unitary matrix, in GBS, it amounts to hiding a small N × N symmetric Gaussian matrix $XX^T$ in a large symmetric unitary matrix $U I_K U^T$ (with $I_K$ the projector onto the first K modes) for any K ≥ N. This means that any particular submatrix cannot be distinguished from any other such submatrix of the same size, forcing the constant error budget to be roughly equally distributed across all outcomes. In particular, the conjecture implies that the hiding property can be achieved with any number K of input squeezers as long as the average total photon number is sufficiently small. In turn, the average total photon number is determined by the total amount of squeezing across all input squeezers. Intuitively, this is because the output of a Haar-random unitary does not depend on any fixed input state. The average output state is a product of identical thermal states whose average photon number is determined by the total input squeezing. However, the number K is still crucial for the estimation task as it determines the rank of the matrix $(U I_K U^T)_n$.
Since the complexity of computing the Hafnian of a matrix depends on the rank of that matrix (), K should be chosen such that it is at least N. Note that the USTC experiment () used K = M/2 squeezers, so our results are directly applicable there, strengthening the arguments for their QCA demonstration. More generally, we consider three regimes of interest and provide evidence for Conjecture 1 in the Supplementary Materials. First, the highly sparse regime, in which the total number of modes scales as $M = \omega(K^5)$ and the number of photons is equal to the number of squeezers, N = K, features provable hiding results due to (). Second, realistic experiments and proposals today operate in the regime K = cM, meaning that a constant fraction c of the input modes is squeezed. In this regime, the result in () provides analytical evidence for hiding in the asymptotic limit as long as the input squeezing is such that the collision-free condition is satisfied. Last, we also consider the intermediate regime of how M scales with K between these two extremes and give numerical evidence for hiding in this general case. Let us note that we do not expect Conjecture 1 to hold when the photon number N is too large for the collision-free condition to hold. In this case, it is known that hiding fails for standard boson sampling (, ).
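A quick numerical sanity check of the kind of statement made in Conjecture 1 can be sketched as follows. This is only an illustration under stated assumptions, not the numerical evidence reported in the Supplementary Materials: it compares a single summary statistic (the mean squared modulus of the entries of the symmetric product) between submatrices of Haar-random unitaries and Gaussian matrices with variance 1/M, which is far weaker than closeness in total variation distance. The helper names and the parameter values are hypothetical.

```python
import numpy as np


def haar_unitary(M, rng):
    """Haar-random M x M unitary via QR of a complex Ginibre matrix,
    with the usual phase correction on the columns of Q."""
    Z = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))  # scales column j of Q by the phase of R[j, j]


def gaussian_matrix(N, K, M, rng):
    """N x K complex Gaussian matrix with mean 0 and variance 1/M per entry."""
    return (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * M)


def mean_abs2_of_symmetric_product(sample_X, trials):
    """Average of |(X X^T)_{ij}|^2 over all entries and over `trials` samples."""
    acc = 0.0
    for _ in range(trials):
        X = sample_X()
        acc += np.mean(np.abs(X @ X.T) ** 2)
    return acc / trials


def compare(M=64, N=4, K=8, trials=400, seed=0):
    """Return the statistic for Haar submatrices and for Gaussian matrices."""
    rng = np.random.default_rng(seed)
    rows = list(range(N))  # a fixed collision-free outcome: photons in the first N modes
    haar_stat = mean_abs2_of_symmetric_product(
        lambda: haar_unitary(M, rng)[rows, :K], trials)
    gauss_stat = mean_abs2_of_symmetric_product(
        lambda: gaussian_matrix(N, K, M, rng), trials)
    return haar_stat, gauss_stat
```

Calling `compare()` returns two numbers that should be close when N and K are small compared with M; repeating the comparison with N approaching M is one simple way to see the agreement degrade.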

Average-case hardness of computing GBS probabilities

As outlined earlier, the question of hardness of approximate sampling boils down to whether it is #P-hard to approximate most output probabilities. We now show the average-case hardness of this task when the allowed additive approximation error is exponentially small, using techniques from (). We have established that the output probabilities of GBS are given in terms of $|\mathrm{Haf}((U I_K U^T)_n)|^2$, where $I_K = \mathbb{1}_K \oplus 0_{M-K}$. By virtue of the previous discussion and, more precisely, Conjecture 1, the distribution over the N × N matrices $(U I_K U^T)_n$ for Haar-random U is well approximated by complex, symmetric Gaussian matrices $XX^T$. Hence, to show the average-case hardness of computing output probabilities of GBS, it suffices to consider the following problem.

(δ, ϵ)-Squared-Hafnians-of-Gaussians.

Input: A matrix $XX^T$ with X ∼ 𝒢(0,1/M).

Output: $|\mathrm{Haf}(XX^T)|^2$ to additive error ϵ, with probability ≥ δ over the distribution 𝒢(0,1/M).

To complete the argument that an efficient classical approximate sampling algorithm for GBS cannot exist, it remains to prove the #P-hardness of (δ, ϵ)-Squared-Hafnians-of-Gaussians as formalized by the following approximate average-case hardness conjecture.

Conjecture 2

The (δ, ϵ)-Squared-Hafnians-of-Gaussians problem is #P-hard for any ϵ ≤ O(exp[−N log N − Ω(N)]) and any constant δ > 3/4.

A proof of Conjectures 1 and 2 would imply that approximate sampling from a random, general GBS instance is hard on average. Let us see how. Assume that there exists a classically efficient sampler O that samples from a distribution whose output probability for outcome i is $q_i$. From the promise that this distribution is ε-close in total variation distance to the target distribution, we have $\sum_i |p_i - q_i| \le 2\varepsilon$, where $p_i$ is the corresponding output probability of the target distribution. Choose a photon number N so that Conjecture 1 is satisfied. Among the space of all outcomes with N total photons, for a randomly chosen outcome i, Markov's inequality implies that $|p_i - q_i|$ exceeds k times its average value, which is at most 2ε divided by the number of such outcomes, with probability at most 1/k. Assuming Conjectures 1 and 2, with probability at least 3/4, $p_i$ is #P-hard to compute to additive error O(exp[−N log N − Ω(N)]). Therefore, with probability at least 3/4(1 − 1/k), it is also #P-hard to compute $q_i$ to within an error of the same order, assuming $M = \Theta(N^2)$. On the flip side, the Stockmeyer algorithm () allows us to compute the output probability $q_i$ of an arbitrary outcome to within inverse-polynomial multiplicative precision. Furthermore, by the Markov inequality, most outcome probabilities $q_i$ cannot be much larger than their average value, which is Pr(N) divided by the number of outcomes with N total photons, where Pr(N) is the probability of seeing N total photons. This means that with probability at least 1 − 1/l, $q_i$ can be computed to additive error O(l exp[−N log N − Ω(N)]) using a $\mathsf{BPP}^{\mathsf{NP}}$ machine running the Stockmeyer algorithm. Therefore, setting l = 4k, a PH algorithm can solve with high probability a problem that is average-case #P-hard. This collapses the polynomial hierarchy. Note that since we have phrased Conjecture 2 in terms of additive error instead of multiplicative error, we do not explicitly need an anticoncentration condition (a lower bound, by some constant γ > 0, on the probability that an output probability is sufficiently large), as is often conjectured for permanents (). Nevertheless, it is possible that Conjecture 2 already implies a weak form of anticoncentration.
Informally, an anticoncentration condition states that on a large fraction of the instances, the output probabilities are large enough so that a trivial algorithm for computing the probabilities that outputs “0” is not sufficient to solve the (δ, ϵ)-Squared-Hafnians-of-Gaussians problem. This is because in order for Conjecture 2 to be true, it is necessary for the trivial algorithm to fail with high probability. As in all other known proposals for demonstrating QCA, this approximate average-case hardness conjecture remains open. Nonetheless, just like in other proposals, it turns out that one can give evidence for Conjecture 2. Namely, we can prove a weaker version of the conjecture with a smaller robustness level ϵ = O(exp [−6N log N − Ω(N)]) as opposed to ϵ = O(exp[−N log N − Ω(N)]) in Conjecture 2.
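The two averaging steps in the argument following Conjecture 2 can be written out explicitly. The block below is a sketch under assumptions that are not taken from the text: the symbol $D$ is introduced here for the number of outcomes with N total photons, and the last line additionally assumes that $D$ is of the order of $\binom{M}{N}$ (the number of collision-free outcomes) with $M = cN^2$ for a constant $c > 0$.

```latex
\begin{align}
\sum_i |p_i - q_i| \le 2\varepsilon
  \quad &\Longrightarrow \quad
  \mathbb{E}_i\,|p_i - q_i| \le \frac{2\varepsilon}{D}, \\
\text{(Markov)} \quad
  \Pr_i\!\left[\, |p_i - q_i| \ge \frac{2\varepsilon k}{D} \,\right] &\le \frac{1}{k}, \\
\sum_i q_i = \Pr(N)
  \quad &\Longrightarrow \quad
  \Pr_i\!\left[\, q_i \ge \frac{l \,\Pr(N)}{D} \,\right] \le \frac{1}{l}, \\
\left(\frac{M}{N}\right)^{N} \le \binom{M}{N} \le \left(\frac{eM}{N}\right)^{N},
  \quad M = cN^2
  \quad &\Longrightarrow \quad
  \log \binom{M}{N} = N \log N + O(N).
\end{align}
```

Under these assumptions, both $2\varepsilon k / D$ and $l \Pr(N) / D$ are of the form $\exp[-N \log N - O(N)]$ up to the factors $k$ and $l$, which is the scale at which the robustness level of Conjecture 2 and the additive error of the Stockmeyer estimate are compared.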

Theorem 3

The (δ, ϵ)-Squared-Hafnians-of-Gaussians problem is #P-hard under PH reductions for any ϵ ≤ O(exp[−6N log N − Ω(N)]) and any constant δ > 3/4.

We provide a detailed proof of Theorem 3 in the Supplementary Materials. The technique we use in the proof is a worst-to-average case reduction [see ()]. That is, by assuming access to an oracle for the (δ, ϵ)-Squared-Hafnians-of-Gaussians problem, we show that one can in fact approximate $\mathrm{Haf}(XX^T)$ for any matrix $X \in \mathbb{C}^{N \times K}$. This latter task is #P-hard in the worst case, as we show in the Supplementary Materials. At a high level, the worst-to-average case reduction relies on the fact that $|\mathrm{Haf}(XX^T)|^2$ is a low-degree polynomial (of degree 2N) in the entries of the matrix X. This allows us to use the oracle to perform polynomial interpolation. Therefore, by combining this observation with the techniques in (, , ), we obtain a worst-to-average case reduction for exactly computing the output probabilities. Together, our results on the hiding property and the approximate average-case conjecture in GBS strengthen the evidence for the hardness of approximately simulating GBS in terms of the total variation distance to the ideal output distribution. Given our results, GBS is now on par with the other leading QCA proposals in terms of complexity-theoretic evidence for approximate sampling hardness (, , , , , ), up to a plausible conjecture in random matrix theory, for which we provided theoretical and numerical evidence. To achieve a demonstration covered by those complexity-theoretic results, however, the loss rate at every element of the linear-optical circuit must scale inversely with the total number of these elements, which is a daunting challenge from an experimental perspective.
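The polynomial structure behind the worst-to-average case reduction can be illustrated with a toy example. The sketch below is not the proof technique from the Supplementary Materials: it assumes an exact, noiseless "oracle" (here simply a direct Hafnian evaluation), a straight-line path $X(t) = (1 - t) X_0 + t W$ between a random matrix $X_0$ and a fixed target matrix $W$, and plain Vandermonde interpolation, whereas the actual reduction must tolerate oracle errors and failures. All function names are hypothetical. Along a real parameter t, $|\mathrm{Haf}(X(t) X(t)^T)|^2$ is a polynomial of degree 2N, so 2N + 1 evaluations determine its value at t = 1.

```python
import numpy as np


def hafnian(B):
    """Hafnian via expansion along the first row (exponential time, small sizes only)."""
    B = np.asarray(B)
    size = B.shape[0]
    if size == 0:
        return 1.0
    if size % 2 == 1:
        return 0.0
    if size == 2:
        return B[0, 1]
    rest = list(range(1, size))
    total = 0.0
    for j in rest:
        keep = [k for k in rest if k != j]  # remove rows/columns 0 and j
        total = total + B[0, j] * hafnian(B[np.ix_(keep, keep)])
    return total


def squared_hafnian_on_line(X0, W, t):
    """Stand-in for the oracle: |Haf(X X^T)|^2 at X = (1 - t) X0 + t W."""
    X = (1 - t) * X0 + t * W
    return abs(hafnian(X @ X.T)) ** 2


def extrapolate_to_target(X0, W, nodes):
    """Fit the degree-2N polynomial in t from oracle values and evaluate it at t = 1."""
    N = X0.shape[0]
    if len(nodes) != 2 * N + 1:
        raise ValueError("need exactly 2N + 1 interpolation nodes")
    values = np.array([squared_hafnian_on_line(X0, W, t) for t in nodes])
    coeffs = np.linalg.solve(np.vander(nodes, 2 * N + 1), values)
    return np.polyval(coeffs, 1.0)


def demo(N=4, K=4, seed=0):
    """Return (interpolated value, directly computed value) at the target matrix W."""
    rng = np.random.default_rng(seed)
    X0 = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
    W = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
    nodes = np.linspace(-0.9, 0.9, 2 * N + 1)  # all nodes differ from t = 1
    return extrapolate_to_target(X0, W, nodes), squared_hafnian_on_line(X0, W, 1.0)
```

The two values returned by `demo()` should agree up to floating-point error. In the actual reduction, the evaluation points must stay close enough to the random instance for the oracle's average-case guarantee to apply, and extrapolating from there amplifies errors, which is one reason the provable robustness level in Theorem 3 is smaller than the one required by Conjecture 2.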

Hardness of computation of output probabilities for noisy GBS

We now go one step further and assess how the complexity-theoretic argument for sampling hardness is affected by more realistic noise levels, particularly in terms of photon loss. In terms of scaling, any constant loss rate of the individual optical elements can lead to the output distribution rapidly approaching a classical distribution. We now show that, nonetheless and unexpectedly, evidence of a quantum signal remains even in the presence of significant loss. We then discuss to what extent and in which regimes such a quantum signal might lead to the hardness of simulating a lossy GBS experiment. One of our main results is the average-case hardness of computing the noisy output probability of a random GBS instance, which we obtain by using similar arguments to recent work of Bouland et al. () but now extended to the GBS setting. Our results are valid for any noise model that is local, stochastic, and error detectable using linear optics. More specifically, we consider a setting where the noise acts locally after every gate and is of the form

$$\mathcal{E}_i = (1 - \eta_i)\, \mathcal{I} + \eta_i\, \mathcal{E}'_i ,$$

where $\mathcal{I}$ is the identity channel and stochasticity requires $\mathcal{E}'_i$ to itself be a valid channel (i.e., a completely positive trace-preserving map) with no identity component. Consider the following problem.

(ϵ, η)-NoisyGBS-Probability.

Input: A noisy GBS instance, consisting of the linear-optical unitary U on M modes chosen from the Haar measure H, the squeezing parameters at the input, a description of the noise channels with parameters $\eta_i$, and a description of a collision-free outcome n with N total photons. Let $\eta = \max_i \eta_i$.

Output: With probability δ over instances, an estimate of the quantity Pr(n) to additive error ϵ, where Pr(n) is the probability of obtaining outcome n. With probability 1 − δ, an arbitrary output.

In the above definition, we take δ = 1 to mean the worst-case problem. We prove the following statement of average-case hardness of computing noisy probabilities.

Theorem 4

There exists a noise threshold η* and a sufficiently large polynomial such that the problem (ϵ, η)-NoisyGBS-Probability is #P-hard under PH reductions for any constant δ > 3/4, any η ≤ η*, and any sufficiently small additive error ϵ, where the bound on ϵ is set by that polynomial.

There are two parts to the proof. The first part is a proof of worst-case hardness of the problem (when δ = 1), and the second is a worst-to-average case equivalence. For worst-case hardness, it turns out that because of a result of Fujii (), it suffices for the noise channel to be a convex combination of the lossless and lossy channels and to be error detectable. These conditions are both met for optical loss since it is a convex combination of the channels corresponding to no photon loss, single-photon loss, and so on (). Moreover, optical loss can also be detected and corrected using only linear-optical operations and photodetection with high thresholds (). In Fujii’s argument, one postselects on the error-free outcome of an error-detection code and obtains noiseless universal gates for the class of postselected quantum computation, PostBQP. This argument applies to the optical case as well since linear optics with postselection is universal for quantum computing (). For the worst-to-average case equivalence, all we need is for the polynomial structure in the problem to be preserved. This can be satisfied for any local noise model. Preserving the polynomial structure of the output probability enables us to continue to use the same proof techniques as earlier. Before moving on, we again remind the reader that we considered the hardness of computing output probabilities. While these are not tasks that are feasible for any realistic quantum device, our results nevertheless indicate that there is a computationally intractable (but exponentially small) “quantum signal” present in the system.
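The statement that optical loss is a convex combination of no photon loss, single-photon loss, and so on can be made concrete in the simplest case. The sketch below assumes a single mode holding exactly n photons and a pure-loss channel in which each photon is lost independently with probability eta; the function name and parameters are illustrative and not taken from the paper. The weight of the branch with k lost photons is then binomial.

```python
from math import comb


def loss_branch_weights(n, eta):
    """Weights of the branches 'exactly k photons lost', k = 0..n, when each of
    n photons is lost independently with probability eta (assumed loss model)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be a probability")
    return [comb(n, k) * eta**k * (1.0 - eta) ** (n - k) for k in range(n + 1)]


def no_loss_weight(n, eta):
    """Weight of the error-free branch, which postselection would try to isolate."""
    return loss_branch_weights(n, eta)[0]
```

The weights are nonnegative and sum to 1, so they define a valid convex combination, and the error-free branch carries weight $(1 - \eta)^n$, which shrinks exponentially with the photon number at any fixed loss rate.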

The complexity of noisy and approximate GBS

We now discuss the implications of the hardness result for computing noisy GBS probabilities on the complexity of sampling from the output distribution of noisy GBS. An immediate implication of this result is that it is classically hard to exactly sample from the noisy distribution of a worst-case GBS experiment. This is because the quantum signal is still present in the distribution, so the argument based on Stockmeyer’s algorithm is valid. Thus, in the idealized situation in which loss is the only source of noise of an experimental system and the exact loss rate is known, simulating a worst-case GBS experiment is classically intractable. Note that loss rates can be inferred from standard optical tomography procedures such as that in (). Given that this result links the hardness of simulating the noisy experiment to an exponentially small quantum signal in the form of output probabilities, it is crucial that the noise model accurately captures the working of the device. We remark that an alternative proof establishing the classical hardness of exact sampling could possibly be made using a postselection argument similar to the one outlined in section 4.2 of (). As noted in (), however, this approach has not been shown to provide a viable path toward the goal of showing hardness of approximate sampling. By establishing the average-case hardness of approximating output probabilities, Theorem 4 takes a substantive step toward establishing the hardness of approximate sampling, even in the presence of noise. We now discuss the more realistic situation in which loss is the predominant, but not the sole, source of noise in a photonic experimental system. What can we say about the hardness of approximate sampling in such a situation? To begin with, let us draw on some intuition from RCS schemes acting on n qubits. 
Here, the additive error incurred in estimating output probabilities using the Stockmeyer algorithm is O(2^{−n}) with high probability (since this is the size of a typical output probability in an RCS experiment). In the presence of uncorrected noise, an error of O(2^{−n}) in the noisy output probability can be too large for hardness. For example, there is evidence that with gate-wise depolarizing noise, the probabilities will deviate from uniform by merely O(2^{−m}), where m [typically ω(n)] is the total number of gates (). This means that approximate-sampling hardness cannot be shown using these techniques since it is no longer hard to approximate the noisy probabilities. In this regime, the noisy distribution is exponentially close in total variation distance to the uniform distribution, rendering the approximate sampling task for the noisy distribution classically simulable. In the case of noisy GBS, the dominant noise model, namely, loss, leads to the vacuum state for a sufficiently deep network, which is again a distribution that is easy to classically sample from (similar to the uniform distribution in qubit RCS schemes). However, if we postselect on a certain minimal number of photons surviving, the distribution need not be easy to simulate. This postselection is efficient when the depth of the circuit scales poly-logarithmically in the number of modes. In this case, the quantum signal will be large enough so that even with an inverse exponential error, deviations from the easy distribution can be detected. This excludes the simulation algorithm that samples from an easy-to-simulate distribution such as the one uniform on every photon number sector with every sector sampled according to the ideal photon number distribution. Ruling out trivial algorithms is a necessary condition for approximate average-case hardness to hold. 
In summary, our results indicate that there might be ‘room in the middle’ in terms of gate depth and noise rates, where hardness of sampling might hold. This intuition lies at the heart of the high-dimensional architecture (presented below). This architecture is designed in such a way that only as few gate applications as necessary for hardness are executed, so that the leeway for noise to ruin the hardness of sampling is minimized. We stress, however, that at the moment, existing proof techniques do not suffice to make a claim of this nature. In certain regimes of noisy GBS, approximate sampling is known to be classically efficient ().

High-dimensional GBS and evidence for hardness

The discussion thus far here and in the literature has focused on the hardness of GBS with unitary transformations drawn randomly from the Haar measure. This requires implementing arbitrary unitary transformations, an onerous requirement experimentally. Reference () did not meet this requirement of being able to implement arbitrary unitary transformations because its interferometer was a fixed nonprogrammable device. Furthermore, there is reason to believe that in the absence of error-correction methods for linear optics, scaling arbitrary programmable interferometers to large numbers of modes is infeasible. This is because implementing an arbitrary unitary transformation requires decomposing it into beam splitters and phase shifters, and assuming that they are all applied locally, this leads to a deep optical circuit whose depth scales linearly with its size. Since photon loss scales exponentially with the circuit depth, these models necessarily become efficiently simulable classically for sufficiently large numbers of modes (, ). On the other hand, naively reducing the depth without giving up gate locality is not an option for QCA either. This is because shallow 1D circuits comprising local interactions with logarithmically scaling depths can be efficiently simulated classically, as these do not generate enough long-range entanglement (–). These results motivate a demonstration of QCA on random optical circuits with shallow depth but with gates that are long-range in 1D, for example, on circuits with local interactions in more than one dimension. In such a setting, a potentially reduced amount of complexity due to the reduced depth would be compensated by the large amount of long-range entanglement generated thanks to the inclusion of long-range interactions. Therefore, such an architecture would suffer less noise buildup but still remain intractable for classical computers. 
Models with shallow depth but with long-range (in 1D) interactions provide a natural approach to demonstrating QCA in qubit systems (). We address the challenge of the low-loss versus depth trade-off by introducing high-dimensional GBS, where programmable nonlocal gates are exploited to generate entanglement between distant modes. We show how high-dimensional GBS can be implemented scalably using optical delay lines. Before presenting the new architecture, let us recall the relevant notation on GBS and discuss its physical implementation.

A programmable architecture for high-dimensional GBS

Now, we are ready to introduce high-dimensional GBS: a sampling task that retains the programmability of the photonic device, that can reduce decoherence to a level that prevents classical simulability, and in which large amounts of multipartite entanglement can be generated. The last two requirements are to some extent at odds with each other. Specifically, achieving long-range interactions in fixed linear 1D geometries requires finding intermediary quantum systems to mediate interactions between far-separated regions, which can lead to information leaking into the environment and require more challenging experimental conditions than all-optical experiments. A way around this challenge is to consider two- or higher-dimensional geometries where quantum systems can interact with each other in more than one direction. While the Google QCA experiment () involved interactions in 2D, our proposal can leverage photonics to implement distant nonlocal interactions, which can be equivalently considered as interactions in two or even more than two dimensions. More specifically, we show how the idea of using local interactions in high-dimensional spaces to generate large amounts of multipartite entanglement can be naturally imported into photonic quantum computing by using optical delay lines and fast, programmable optical switches. Before formally stating the problem of high-dimensional GBS, we provide intuition for how to construct high-dimensional lattices using minimal optical resources. For the sake of concreteness and ease of visualization, we consider the generation of a lattice of size a = 3 in D = 2 dimensions where the vertices represent modes and the edges represent two-body gates. A quantum circuit to achieve this connectivity and a representation of the obtained lattice are shown in Fig. 1 (A and B), respectively. Note that when the bosonic modes are represented as wires in a usual quantum circuit diagram, the gates needed to prepare the state are highly nonlocal. 
This is because circuit diagrams provide a representation where the modes are arranged linearly (in this case, in the vertical direction of the page). To show how optical delays provide a natural way to program short and long ranged interactions, consider first our temporal modes (pulses) prepared in squeezed vacuum states arranged one after the other, traveling along a single spatial mode, as schematically shown in Fig. 1C.

Fig. 1.

Different representations of a D = 2D optical delay GBS instance with lattice size a = 3.

(A) Circuit representation. The vertical lines with dots at the end represent beam splitters. (B) Bidimensional lattice representation. The vertices of the lattice represent the modes, while edges represent beam splitters. (C) Optical circuit representation. The modes are defined by time-bins traveling in a waveguide. The horizontal gray slabs at the bottom of the delays represent the beam splitters. The number of cycles C in a high-dimensional GBS instance is the number of times the gates contained in the green-dotted box in (A) are applied. This action physically corresponds to concatenating C copies of the delays encircled in the green box in (C). Note that for simplicity, we have not shown the photon-number detectors used to probe the quantum state at the end of the circuit.

We first consider how to achieve nearest-neighbor interactions using a delay line whose length equals the separation between the pulses. As mode i is in the delay line about to exit it, it will interfere with mode i + 1, which is about to enter the delay line. The beam splitter mediating the interaction between these two modes can be programmed, allowing us to effect two-mode gates between nearest neighbors. Such programmable and fast (i.e., with less than 50 ns spacing) beam splitters have been demonstrated using electro-optic modulation and have been used in the application of photonic quantum walks in the time domain (33, 34). Now, consider the second delay line, whose length is a = 3 times the separation between the pulses. In this case, as mode i is getting ready to exit the delay line, it will interfere with mode i + a at the beam splitter gatekeeping the delay line. This configuration allows interactions with range a = 3 between the modes in the quantum circuit diagram in Fig. 1. Note that this construction generalizes in a natural way to D dimensions. 
In particular, nearest-neighbor interactions in a D-dimensional space with a lattice points per dimension (corresponding to gates with range up to a^{D−1} in a circuit diagram) can be implemented using a circuit with D optical delay lines implementing delays by amounts {1, a, a^2, …, a^{D−1}}. If the light is made to pass through D such delay lines, with C passes, then the effective transformation is composed of C cycles of local interactions in a D-dimensional lattice or, equivalently, C cycles of gates with range up to a^{D−1} in a circuit diagram. Having provided a quantum optical implementation of high-dimensional GBS, we are now ready to formalize it by specifying four quantities: the squeezing parameter r, the lattice dimension D, the lattice size a, and the number of cycles C. An (r, a, D, C)–high-dimensional GBS instance is constructed as follows:
1) Prepare M ≔ a^D single-mode–squeezed vacua ∣r⟩^{⊗M}.
2) For τ = 1, apply a beam-splitter V to mode i and i + τ, where i ∈ [0, M − τ].
3) Repeat step 2 for τ = a^d for d = 0, …, D − 1.
4) Repeat steps 2 and 3 for a total of C times.
Having a physical architecture to implement high-dimensional GBS, we can now write down a loss budget to account for the bulk of the decoherence affecting our system. Assume that the photon-number detectors used to probe our quantum state are limited by a rate of ν detections per second, for example, as a result of the detectors' dead times. From this time scale, we deduce a length scale 𝓁 = v/ν, where v is the speed of light in the delay lines. We associate with the length scale an energy transmission constant η_unit-length, which is simply the total energy transmission resulting from a propagation over a total length of 𝓁. Let us first study the case C = 1. In this case, every mode will traverse D beam splitters (to access the D different delay lines) and will propagate a total distance of approximately a^{D−1}𝓁 = M^{1−1/D}𝓁 if a ≫ 1. 
We can approximate the total transmission to scale roughly as η ≈ η_BS^D · η_unit-length^{a^{D−1}}, where η_BS is the beam-splitter transmissivity for programmable beam splitters based on electro-optic modulation. Note that in this case, the loss scales subexponentially with the total number of modes. To allow two or more circulations, one can consider C ≥ 2 copies of the original D delay lines, giving now an updated loss budget in which the modes traverse a length proportional to C a^{D−1} = C M^{1−1/D} and will pass through CD beam splitters, still leading to subexponential loss accumulation. An alternative to these C copies of the delay lines is to consider a recirculation loop similar to that proposed in (), which reroutes the output of the last delay line into the input of the first one. The delay line used to implement the recirculation loop holds any modes that are not interfering inside the delay lines. If the recirculator has a loss per unit length η_unit-recirc, the net loss scales as where . Thus, depending on the exact setting, for a fixed C, the losses scale either exponentially (using recirculators) or subexponentially (considering multiple copies of the D loops) with the number of modes. We note that with current fiber-optic and photon number–resolving technology, η_unit-length can be as high as 0.998; η_BS values of 0.9 are expected or are observed in state-of-the-art experiments such as in (). With these, the transmission of an interferometer with parameters (a = 15, D = 2, C = 2) can be above 0.70 and above 0.74 for (a = 6, D = 3, C = 1). These values promise an improvement of an order of magnitude or more in loss as compared to the values expected in fully programmable GBS devices (). As noted in (), interferometers implemented using loops will typically have unbalanced losses. The numbers quoted above assume the lossiest interferometer implementable in a loop-based system, which is precisely the one in which each and every mode is fully transmitted into each loop. 
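The rough scaling of this loss budget can be tabulated with a few lines of Python. This is a minimal sketch, assuming only what the text states for the multiple-copies setting: each mode passes C·D beam splitters and travels a length of about C·a^{D−1} unit lengths (valid for a ≫ 1). The function names and argument names are illustrative, recirculation loops and all other loss sources are ignored, and the estimate is therefore not expected to reproduce the transmission figures quoted above.

```python
def num_modes(a, D):
    """Number of modes M = a**D of an (a, D) lattice."""
    return a ** D


def rough_transmission(a, D, C, eta_bs, eta_unit_length):
    """Rough per-mode energy transmission (assumed scaling, see lead-in).

    Each mode is assumed to cross C*D beam splitters and to propagate
    over C * a**(D-1) unit lengths of delay line.
    """
    if min(a, D, C) < 1:
        raise ValueError("a, D and C must be positive integers")
    n_beam_splitters = C * D
    path_in_unit_lengths = C * a ** (D - 1)
    return eta_bs ** n_beam_splitters * eta_unit_length ** path_in_unit_lengths


if __name__ == "__main__":
    # Parameter values taken from the text; the printed numbers are only
    # the rough scaling estimate, not a full experimental loss budget.
    for a, D, C in [(6, 3, 1), (15, 2, 2)]:
        eta = rough_transmission(a, D, C, eta_bs=0.9, eta_unit_length=0.998)
        print(f"a={a}, D={D}, C={C}, M={num_modes(a, D)}: eta ~ {eta:.3f}")
```

Because the path length grows as M^{1−1/D} rather than as M, the logarithm of this estimate grows sublinearly in the number of modes, which is the subexponential loss accumulation referred to in the text.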
From the formal description of high-dimensional GBS, the covariance matrix of the generated Gaussian state can be calculated in the usual manner. In particular, we only need to specify the unitary matrix describing steps (2) to (4) above. This unitary matrix is the ordered product of the beam-splitter matrices applied in those steps, where each factor B_{i,j}(V) is an M × M unitary matrix that acts like the locally Haar-random beam-splitter V in the subspace of modes i and j and like the identity elsewhere. We denote by 𝒰 the ensemble of linear-optical unitaries applied this way. In Fig. 2, we show heatmaps of the unitary matrices associated with two typical instances from the distribution 𝒰 over high-dimensional GBS instances. Note that the structure of circuits considered allows for light from the first mode to be observed in any of the later modes, which leads to a large light cone that is somewhat different from the efficiently simulable circuits considered recently in (). From the description of the unitary matrices and the squeezing parameters, we obtain that the complex-valued adjacency matrix (as defined above) of the Gaussian state is dense, full-rank, and given by A = tanh(r) U U^T.
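The construction of the unitary and of the adjacency matrix can be sketched in plain Python. The following is an illustrative sketch, not the authors' code, and it rests on several assumptions: each cycle passes once through D delay lines with delays 1, a, …, a^{D−1}; within a delay line the beam splitters act sequentially on the pairs (i, i + τ) for i = 0, …, M − τ − 1 (zero-based indexing); and each beam splitter is an independent Haar-random 2 × 2 unitary drawn from a standard parametrisation. The function names are hypothetical.

```python
import cmath
import math
import random


def haar_u2(rng):
    """Haar-random 2x2 unitary: |V00|^2 uniform on [0, 1], phases uniform."""
    phi, alpha, beta = (rng.uniform(0.0, 2.0 * math.pi) for _ in range(3))
    xi = rng.random()
    c, s = math.sqrt(xi), math.sqrt(1.0 - xi)
    g = cmath.exp(1j * phi)
    return [[g * cmath.exp(1j * alpha) * c, g * cmath.exp(1j * beta) * s],
            [-g * cmath.exp(-1j * beta) * s, g * cmath.exp(-1j * alpha) * c]]


def high_dim_unitary(a, D, C, rng):
    """M x M unitary (M = a**D) built from C cycles of D delay lines."""
    M = a ** D
    U = [[1 + 0j if i == j else 0j for j in range(M)] for i in range(M)]
    for _ in range(C):
        for d in range(D):
            tau = a ** d                       # delay of the d-th line
            for i in range(M - tau):           # mode i meets mode i + tau
                j = i + tau
                V = haar_u2(rng)
                row_i, row_j = U[i], U[j]
                # Left-multiply U by the two-mode gate acting on modes i, j.
                U[i] = [V[0][0] * x + V[0][1] * y for x, y in zip(row_i, row_j)]
                U[j] = [V[1][0] * x + V[1][1] * y for x, y in zip(row_i, row_j)]
    return U


def adjacency_matrix(U, r):
    """A = tanh(r) * U U^T for identical squeezing r in every mode."""
    M = len(U)
    t = math.tanh(r)
    return [[t * sum(U[i][k] * U[j][k] for k in range(M)) for j in range(M)]
            for i in range(M)]


if __name__ == "__main__":
    U = high_dim_unitary(a=3, D=2, C=1, rng=random.Random(0))
    A = adjacency_matrix(U, r=0.8)
    print(len(U), abs(U[-1][0]), abs(A[0][-1]))
```

With the a = 3, D = 2 lattice of Fig. 1 this gives a 9 × 9 unitary in which light entering the first mode has a nonzero amplitude to reach the last mode after a single cycle, illustrating the large light cone mentioned above.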

Fig. 2.

Absolute values of the entries of the unitary matrices associated with two high-dimensional GBS instances drawn from 𝒰.

On the left, we show an (a = 6, D = 3, C = 1) instance, and on the right, we show an (a = 15, D = 2, C = 2) instance. Note that we explicitly color the zero entries of the unitary white; thus, the color scale is discontinuous at this end. While implementing a time-domain reconfigurable loop architecture as described above is not a straightforward task, several groups have performed experiments with tens of modes interfering in time-domain multiplexed configurations. These include time-bin () and temporal-to-spatial–encoded () boson sampling experiments () and controllable photonic random walk over multiple time-bins (, ). Moreover, recent experiments have shown that it is possible to operate with very high phase stability (), high quantum-efficiency photon-number detection (), and very low loss reconfigurable interferometric elements (). Last, for the purpose of calculating outcome probabilities, squeezed states can be considered in the Fock basis as qudits that are entangled by the beam-splitter operations. This process, as with any other quantum circuit, can be represented as networks of tensors (). In more detail, here, the qudits are initially single-index tensors (vectors) that are contracted with four-index tensors representing the beam splitters to build an open tensor network (TN), which can then be contracted to obtain the tensor of the final state. The TN representing the state can be used to calculate probability amplitudes of measurements when the output indices of the TN are contracted with vectors representing measurement outcomes. Similar TN-based techniques have been successful at delineating the QCA frontier in the context of RCS, and together with Hafnian-based methods, these will serve a similar purpose for high-dimensional GBS.
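The tensor-network picture described above can be made concrete for the smallest possible case: two squeezed modes and one beam splitter. This is a minimal sketch under stated assumptions. The beam-splitter convention exp[θ(a†b − ab†)], the Taylor-series matrix exponential, the Fock cutoff of 4, and all function names are choices made here for illustration. Real squeezing is assumed, and the truncated input vectors are left unnormalised.

```python
import math


def squeezed_vacuum(r, cutoff):
    """Single-index tensor: Fock amplitudes <n|r> for n < cutoff."""
    amps = [0.0] * cutoff
    for n in range(0, cutoff, 2):          # only even photon numbers
        k = n // 2
        amps[n] = ((-math.tanh(r)) ** k * math.sqrt(math.factorial(n))
                   / (2 ** k * math.factorial(k) * math.sqrt(math.cosh(r))))
    return amps


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def beam_splitter_tensor(theta, cutoff, terms=60):
    """Four-index tensor T[out_a][out_b][in_a][in_b] on the truncated space."""
    dim = cutoff * cutoff

    def idx(na, nb):
        return na * cutoff + nb

    G = [[0.0] * dim for _ in range(dim)]  # truncated generator
    for na in range(cutoff):
        for nb in range(cutoff):
            if na + 1 < cutoff and nb >= 1:    # term  a^dagger b
                G[idx(na + 1, nb - 1)][idx(na, nb)] += theta * math.sqrt((na + 1) * nb)
            if na >= 1 and nb + 1 < cutoff:    # term -a b^dagger
                G[idx(na - 1, nb + 1)][idx(na, nb)] -= theta * math.sqrt(na * (nb + 1))
    E = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    term = [row[:] for row in E]
    for k in range(1, terms):                  # E = sum_k G^k / k!
        term = [[x / k for x in row] for row in matmul(term, G)]
        E = [[e + t for e, t in zip(re, rt)] for re, rt in zip(E, term)]
    return [[[[E[idx(oa, ob)][idx(ia, ib)] for ib in range(cutoff)]
              for ia in range(cutoff)] for ob in range(cutoff)]
            for oa in range(cutoff)]


def amplitude(outcome, psi_a, psi_b, T):
    """Contract the inputs and the outcome (basis vectors) with the gate."""
    ma, mb = outcome
    return sum(T[ma][mb][ia][ib] * psi_a[ia] * psi_b[ib]
               for ia in range(len(psi_a)) for ib in range(len(psi_b)))


if __name__ == "__main__":
    r, cutoff = 0.8, 4
    psi = squeezed_vacuum(r, cutoff)
    T = beam_splitter_tensor(math.pi / 4, cutoff)
    for outcome in [(0, 0), (1, 1), (2, 0), (1, 0)]:
        print(outcome, amplitude(outcome, psi, psi, T) ** 2)
```

Larger instances follow the same pattern with one four-index tensor per beam splitter. The cost is then set by the order in which the indices are contracted, which is what dedicated TN contraction algorithms optimise.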

Hardness for computing noisy probabilities in high-dimensional GBS

We now argue for the hardness of computing output probabilities for the noisy, high-dimensional GBS setup. In particular, we show that hardness is present even in shallow-depth noisy high-dimensional GBS architectures. This is in contrast to the results discussed earlier, where no restriction is made on the depth. To do this, we simply observe that the previous argument for worst-case hardness, which depends on the noise being local and error detectable, continues to hold for the limited-depth setup (). For average-case hardness of computing noisy probabilities, we again use a worst-to-average case reduction. However, the polynomial interpolation in this case is different since a random instance is no longer Haar-distributed but rather distributed according to 𝒰, the distribution over random instances of high-dimensional GBS. To explain further, consider the usual interpolation X(t) = (1 − t)X + tY, where X(0) = X is drawn from 𝒰, and X(1) = Y is the matrix corresponding to a worst-case high-dimensional GBS instance. In this case, there is no guarantee that the interpolated matrices X(t) also correspond to high-dimensional GBS instances of small depth. We get around this issue by choosing a gate-wise interpolation that is similar to that seen in RCS (, ). We first define the problem of computing output probabilities of a restricted-depth high-dimensional GBS architecture.

(ϵ, η)-HighDimensional-NoisyGBS-Probability.
Input: A noisy GBS instance drawn from 𝒰 that can be implemented in D dimensions with a constant number of cycles C = O(1) with noise parameter η, and a description of a collision-free outcome n with N = poly(M) photons.
Output: With probability δ over instances, an estimate of Pr(n) to additive error ϵ. With probability 1 − δ, an arbitrary output.

Similar to the previous results, we can again obtain an average-case hardness result that we state here and prove in the Supplementary Materials.

Theorem 5

There exists a noise threshold η* and a sufficiently large polynomial such that the problem (ϵ, η)-HighDimensional-NoisyGBS-Probability is -hard under PH reductions for any constant δ > 3/4, η ≤ η*, and .

QCA frontier for high-dimensional GBS

The evidence presented above for the hardness of high-dimensional GBS comes from complexity-theoretic arguments, which are asymptotic in nature, i.e., they only specify how the hardness of a certain computation scales as the problem size is increased. For a finite-sized device, we now address a complementary but more immediate question: How much actual computational power would a classical adversary need to generate samples similar to those from finite-sized noisy GBS devices? This question can be addressed with different assumptions about the classical adversary. The experiment can be benchmarked either against simulations that try to match a reasonable model of the experiment (constrained adversary) or against simulations that merely try to spoof a given test (unconstrained adversary). The latter approach would be more rigorous as it requires making fewer assumptions, but coming up with good spoofing methods is a problem beyond the scope of this work and should be seen as an ongoing community effort (). Similar to the approach of the Google and USTC supremacy experiments (46, 47), we focus on the former approach, with a classical adversary producing samples according to a noisy model distribution, because these samples are likely to perform no worse than the actual device in suitable verification tests. In other words, we assume a specific model of the imperfect GBS device, and we demand that the classical adversary generate samples that have a probability distribution that is sufficiently close in total variation distance to the probability distribution of this model. We note, however, that the chosen model might not have been verified against the actual experiment, as sample-efficient noise-model verification of QCA experiments is a challenging problem, especially for boson sampling and GBS. We perform this benchmarking by simulating high-dimensional GBS with state-of-the-art algorithms on the current best supercomputers. 
In particular, we consider the fastest algorithms based on computing probability amplitudes via Hafnians and via TN contractions. The former, Hafnian-based algorithms have been optimized for simulating GBS and are not restricted to high-dimensional GBS (). The latter TN algorithms are well-suited for high-dimensional qubit circuits with shallow depth (). We note that Qi et al. () also provide a path to simulating lossy GBS if the losses scale exponentially with the system size, but these results are not applicable for high-dimensional GBS, where the losses can scale subexponentially. By benchmarking against these algorithms, we demonstrate that high-dimensional GBS experiments feasible with current optical technology are well beyond the reach of the biggest supercomputers.

DISCUSSION

In this work, we have proposed a new experimental architecture for GBS and provided asymptotic evidence for the hardness of GBS in this specific context, bridging the gap between theory and experiment. We have also benchmarked today’s best-known algorithms at simulating such an experiment, obtaining complementary evidence that a reasonably sized setup would outperform classical supercomputers at this task. Still, some theoretical questions are outstanding. 1) We have been able to show that two plausible conjectures in random matrix theory allow us to obtain the hiding property for a noiseless GBS setup, without restrictions on the number of active modes. Can we obtain a similar hiding property for the high-dimensional GBS setup introduced here? Is this also possible in the presence of noise? Answering these questions is crucial for extending the hardness of computing output probabilities to the hardness of approximate sampling from experimentally realizable distributions. 2) Informally, the anticoncentration conjecture for boson sampling (or GBS) states that the output probability of a random instance is unlikely to be very small. If this conjecture was true, then now-standard arguments can show that the output probability corresponding to an approximate sampler is, with high probability, a good multiplicative estimate to the ideal output probability. Proving this conjecture true, in either the case of boson sampling or GBS, would give increased evidence to support the goal of proving QCA via photonics. A proof of such a conjecture is challenging due to the fact that tools of unitary designs () are presumably unavailable in the bosonic setting (). 3) Notwithstanding, it would be insightful to compute the second moments for the distribution we have found to characterize GBS problem instances. 
These moments thus characterize the so-called collision probability of seeing the same outcome twice in an experiment, which in turn can be related not only to anticoncentration but also the verifiability of approximate GBS from samples, thus shedding some light on the structure of the GBS output distribution. 4) An important task in demonstrating QCA is to verify that the performed experiment indeed contains a nontrivial quantum signal that cannot be efficiently spoofed. The Google QCA demonstration relied on linear cross entropy benchmarking fidelity, and the USTC experiment used a heavy-output generation (HOG) ratio test as an alternative path to verifiable hardness. Whether the HOG-ratio test can be spoofed efficiently by a classical adversary such as the algorithms considered in (, ) is an open problem. 5) The recent result in () presents a classical algorithm for the simulation of high-dimensional boson sampling experiments in certain regimes. As described, this algorithm is not applicable to the architecture we propose here. Extending the algorithm to be relevant to the present architecture is an open problem. 6) With current optical technology, loss is the dominant source of noise in any GBS experiment. Consequently, we were motivated to obtain hardness results for computing the output probabilities of a GBS experiment in the presence of significant photon loss. It is natural to investigate whether similar hardness results can be obtained in the presence of other possible sources of experimental noise, such as mode mismatch, multiple Schmidt modes, interferometer phase drift, and detector dark counts. 7) It is a challenge to the community, after all, to relate boson sampling closer to practically important computational tasks and to identify new applications. In summary, this work brings the demonstration of QCA on a programmable photonics device closer to reality. 
It addresses previously outstanding theoretical challenges in the field by providing stronger evidence for the hardness of GBS. Crucially, we have presented a novel architecture for high-dimensional GBS using optical delay lines that promises low levels of noise without compromising on its programmability. We benchmarked this architecture against the best available classical simulation algorithms and found that experiments involving even a moderate number of modes are already far beyond reach for those algorithms. We close by briefly commenting on the experimental prospects of realizing high-dimensional GBS. Since high-dimensional GBS can be implemented in the time domain according to the scheme presented in Fig. 1, only a single squeezer and a single detector are required. If multiple detectors are available, these can be demultiplexed using optical switches to increase the effective repetition rate of the experiment and reduce the length of the delay lines. Especially promising is the case of D = 3, a = 6, C = 1, which can be implemented with only three optical delay lines and three each of reprogrammable beam splitters and phase shifters. Assuming reasonable values of squeezer out-coupling losses, free space to fiber coupling loss, and detector efficiency (–, ), we estimate that such a setup can be built using current optical technology with around 40% transmission, higher than that enabled by the ultralow-loss nonprogrammable interferometer in the USTC experiment. Such a setup would enable the largest demonstration of QCA yet, with a mean detected photon number of 80 in a programmable device with 216 total modes. We hope that this work stimulates these developments.

MATERIALS AND METHODS

Computational task: Sampling from lossy GBS with finite Fock cutoff

Before looking into concrete strategies for the simulation of GBS, we detail the computational task performed by the GBS device and discuss some differences between the task and our simulation. The experimental device samples from a lossy GBS distribution with a finite Fock-basis cutoff, which results from detector limitations. To identify a range of parameters where this task is hard to simulate classically, we benchmark it against classical simulations. The simulations that we compare are somewhat different from the exact task performed by the experiment but in a way that is advantageous to the classical simulations, thus providing stronger evidence for the large computational cost of high-dimensional GBS. We now discuss these differences. The first point of difference is the Fock cutoff, i.e., the number of Fock or photon-number levels considered in each mode. Both Hafnian and TN simulations are performed in the Fock basis, and their performance is thus sensitive to the Fock cutoff. This cutoff must be chosen carefully because the squeezed state inputs in GBS have nonzero support on high Fock numbers (which could be infinite in the ideal case) (). For Hafnian-based simulations, the Fock cutoff c will lead to a constant prefactor 2 (2) in the runtime for calculating mixed-state (pure state) probabilities that would appear in sampling methods. Similarly, for TN simulations, this cutoff sets the qudit dimension in the calculation, which is also the base of the exponential function describing the time and space cost of contracting the TN. Note that squeezed states of light require that we use local Hilbert spaces with at least dimension 3 since truncating a squeezed state to the first two levels of the Fock ladder will project it into the vacuum as 〈1 ∣ r〉 = 0. Furthermore, using a Fock cutoff of 3 in the beam-splitter gates leads to highly inaccurate simulations as the beam-splitter transformations on a limited Fock subspace no longer preserve photon numbers. 
In other words, choosing higher Fock cutoffs will lead to more accurate but more expensive simulations. Hence, we use a cutoff of 4 to give a conservative estimate on the computational cost, although this cutoff would lead to inaccurate classical simulations. A second point of difference is that our simulations deal with the case of simulating pure states with photon numbers equal to the lossy distribution. This is a reasonable simplification since, as shown in (), simulating pure or mixed state GBS has the same complexity as calculating a number of pure-state probability amplitudes proportional to the number of modes in the system. Before describing the effect of loss on the two simulation methods, we discuss the effect of loss on the number of detected photons. In Fig. 3, we plot the lossless and lossy (transmission η = 0.5, i.e., ≈3 dB loss) distribution for M = 216 modes and squeezing parameter r = 0.8. These parameters have been chosen to correspond with an (r = 0.8, a = 6, D = 3, C = 1) high-dimensional GBS instance with experimentally reasonable loss budgets. The squeezing parameter r = 0.8 is chosen to be within reach of current sources of single-Schmidt mode degenerate squeezed light (). Note that the lossy distribution has smaller mean and variance than the lossless one (), indicating that it becomes easier to simulate a lossy distribution as the transmission η is decreased. For example, the outcome with the highest probability in the lossless distribution has a probability of 7.28 × 10^{−8} under the lossy distribution. The leftward shift of this distribution will, in general, be present whenever loss acts on a pure state. 
For M identical squeezers (with squeezing parameter r) undergoing loss by energy transmission η, the mean and variance contract at least proportionally to ηconfirming our intuition and, moreover, showing that the prevailing sources of decoherence in photonic sampling problems behave differently from the ones in RCS implemented in superconducting circuits, where noise makes the output probability distribution become uniform ().
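The contraction of the mean and variance under loss can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: it assumes the standard photon-number moments of a single-mode squeezed vacuum (mean sinh² r, variance 2 sinh² r cosh² r), models uniform loss as binomial thinning of the photon number, and treats the M modes as independent. The function name `total_photon_stats` is hypothetical.

```python
import math

def total_photon_stats(num_modes, r, eta):
    """Mean and standard deviation of the total photon number for
    num_modes identical single-mode squeezed vacua after uniform loss.

    Assumptions (not taken from the paper's code): independent modes and
    pure loss modeled as binomial thinning with transmission eta.
    """
    nbar = math.sinh(r) ** 2            # mean photons per lossless squeezer
    var = 2.0 * nbar * (nbar + 1.0)     # photon-number variance per lossless squeezer
    # Binomial thinning: mean -> eta * mean,
    # variance -> eta^2 * variance + eta * (1 - eta) * mean
    mean_lossy = eta * nbar
    var_lossy = eta ** 2 * var + eta * (1.0 - eta) * nbar
    # For independent modes, means and variances add.
    return num_modes * mean_lossy, math.sqrt(num_modes * var_lossy)

lossless_mean, lossless_std = total_photon_stats(216, 0.8, 1.0)
lossy_mean, lossy_std = total_photon_stats(216, 0.8, 0.5)
print(f"lossless: {lossless_mean:.1f} +/- {lossless_std:.1f}")
print(f"lossy:    {lossy_mean:.1f} +/- {lossy_std:.1f}")
```

Under these assumptions, the lossy mean is exactly η times the lossless one, and the lossy variance is at most η times the lossless variance, in line with the statement above.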

Fig. 3.

Distribution of the total photon number for M = 216 single mode–squeezed states with squeezing parameter r = 0.8.


We assume a total transmission of η = 0.5 (corresponding to roughly 3 dB of loss) for the lossy distribution. Note that the lossless distribution has no support on odd numbers of photons, which explains why visually it looks as if it has more area under the curve.

We now focus on the case of Hafnian-based algorithms. The cost of calculating the relevant probabilities depends only on the number of photons detected. Calculating a photon-number probability of a mixed state is roughly quadratically more expensive than calculating a pure-state probability of an event with the same number of photons (). However, the cost of sampling pure and mixed states is similar. This is because lossy GBS states are classical mixtures, over a displacement parameter, of pure Gaussian states. Therefore, it is possible to sample from a lossy state by first sampling the displacement parameter from this convex mixture and then sampling from the corresponding pure state. Thus, sampling lossy GBS states has a similar computational cost to sampling pure states with the same number of photons.

Likewise, for the TN-based algorithms, the cost of mixed-state calculations would scale at least quadratically worse than that of pure-state calculations. This is because twice as many tensors are involved in a mixed-state calculation, analogous to the quadratic overhead of keeping track of the density matrix as compared to a pure state. Note that for noisy RCS of qubits, one can trade fidelity for sampling speed (). As opposed to GBS, this improvement is possible because in RCS, the amplitudes of the different Feynman-like paths that appear when slicing through two-body gates in the circuit are comparable. Moreover, this improvement is useful as long as the Schmidt rank of the two-body gates used to generate entanglement is small, which is not the case for the beam splitter. Furthermore, the state vectors associated with two different paths are approximately orthogonal.
A final point of difference between our simulations and the actual experiment is that while our runtime estimates are for the calculation of the GBS probabilities, an actual experiment samples from this distribution. Despite this difference, our simulations allow a fair benchmarking of the quantum device because current state-of-the-art algorithms have similar complexities of sampling and calculating probabilities. We moreover give the classical adversary an extra advantage in that we allow it to assume that only pure-state output probabilities need to be calculated for sampling, as opposed to the quadratically slower mixed-state output probabilities since, as explained above, mixed-Gaussian states are convex mixtures of pure ones. In summary, we provide maximal advantage to a classical adversary by choosing a low Fock cutoff, by performing pure state simulations with low photon numbers, and by estimating time for computation rather than sampling (which is at most polynomially slower using currently known methods). This advantage ensures that despite improvements in the classical algorithms, the space of parameters that are hard to simulate classically remains so.

Hafnian-based algorithms

We now consider the calculation of probability amplitudes of n-photon events by evaluating the Hafnian. Similar benchmarks have been performed in the past for the calculation of permanents () (relevant to boson sampling) and Torontonians () (relevant to GBS with threshold detectors). For either of these two tasks, the time complexity of calculating a probability corresponding to an n-photon event scales like O(poly(n) 2^n), which is quadratically worse than for GBS, which scales as O(poly(n) 2^{n/2}). For the case of boson sampling, this difference stems from the fact that any probability amplitude with n photons maps exactly to a GBS instance with 2n photons. For the case of threshold detection, it stems from the fact that one cannot assign probability amplitudes to a measurement that is not rank-1, like the positive operator-valued measure representing a “click,” which is a coarse-graining of all the projectors with nonzero photons. In any case, for either of these tasks, benchmarks up to n = 50 have been carried out, requiring on the order of 2 hours for boson sampling using Tianhe-2 () and on the order of 20 hours for GBS with threshold detectors on Sunway TaihuLight (). If the matrix has no special property, such as being low rank, nonnegative, banded, or sparse, then the best known algorithms to calculate the Hafnian will scale like O(n^3 2^{n/2}) for a matrix of size n × n. The adjacency matrices generated in high-dimensional GBS do not have any of these properties. In Fig. 4, we show the results of our benchmarking by implementing the Hafnian algorithm from () using a task-based approach implemented in (). Even for shared-memory CPU architectures, our new task-based implementation achieves a speedup of about 5× with respect to the current OpenMP implementation described in ().
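For readers unfamiliar with the Hafnian, the sketch below evaluates it directly from its definition as a sum over perfect matchings. This is a naive recursion written for illustration only; it is not the O(n^3 2^{n/2}) algorithm benchmarked in this section and is practical only for very small matrices. The function name `hafnian` and the test matrices are illustrative assumptions.

```python
def hafnian(matrix):
    """Hafnian of a symmetric matrix of even size, computed by recursive
    expansion over perfect matchings (illustration only, exponentially slow)."""
    n = len(matrix)
    if n == 0:
        return 1      # the empty matching contributes 1
    if n % 2 == 1:
        return 0      # no perfect matching on an odd number of indices
    total = 0
    # Pair index 0 with every other index j, then recurse on the remainder.
    for j in range(1, n):
        if matrix[0][j] == 0:
            continue
        rest = [k for k in range(1, n) if k != j]
        minor = [[matrix[a][b] for b in rest] for a in rest]
        total += matrix[0][j] * hafnian(minor)
    return total

# For an all-ones matrix the Hafnian counts perfect matchings: (n - 1)!!
ones4 = [[1] * 4 for _ in range(4)]
print(hafnian(ones4))  # 3
```

The recursion visits (n − 1)!! matchings, which is why specialized algorithms with 2^{n/2} scaling are needed at the photon numbers discussed here.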

Fig. 4.

The time cost of calculating a Hafnian of size n in double precision.


The stars indicate actual sizes computed on the Niagara supercomputer (). The blue line is a fit to t_Niagara(n) = c_Niagara n^3 2^{n/2} with the only fitting parameter c_Niagara = 5.42 × 10⁻¹⁵ s. The standard deviation of the fitting parameter c_Niagara is 1.2 × 10⁻¹⁶ s, which would give error bands thinner than the width of the line. We find an equivalent expected time on Fugaku, among the most powerful supercomputers, by considering the ratio of their Rmax scores (maximal LINPACK performance achieved), which give their performance in floating point operations per second. The conversion factor between the left scale for Niagara and the right scale for Fugaku is the ratio of the Rmax values of Fugaku and Niagara, or equivalently, c_Niagara/c_Fugaku = 122.8. Note that since the computation of Hafnians can be broken into the independent calculation of an exponential number of summands (known as an embarrassingly parallel computation), this scaling is expected to be quite accurate.

On the basis of these benchmarks, we estimate that Fugaku, among the current most powerful supercomputers in the world, would require around 14 hours to compute the Hafnian of a 100 × 100 matrix. Thus, if the total photon-number distribution of a given GBS setup has significant support past 100 photons, there will be a proportionally significant number of probability amplitudes that will require at least 14 hours on Fugaku to be computed. We can estimate the average time it would take to generate a sample by averaging the time it takes to generate a sample with n photons over the probability distribution of n photons. Using the same averaging procedure but applied to clicks instead of photons and assuming an overhead of 100 between computing probabilities and generating samples, the authors of () estimate that Fugaku would require around 1.9 × 10¹⁶ s to generate roughly the number of samples that their experiment produces in 200 s at megahertz clock speeds.

For the lossy instance considered in Fig. 3, we find that, on average, Fugaku would require s to generate one sample. In this estimate, we do not extend the sum to all possible photon numbers but only up to those that have a chance of more than 10⁻⁷ to occur, which happens at n_max = 166, and we moreover assume a reasonable overhead of F_{amplitudes/samples} = 100 for the calculation of probability amplitudes versus samples. As noted earlier, the complexity of generating a sample for a mixed or pure Gaussian state is proportional to that of calculating a probability amplitude () and to the number of modes (in our case, 216); thus, using a factor of 100 is likely an underestimate. To match the number of samples generated in seconds in a quantum device operating at 10 kHz would require 6.8 × 10¹⁵ s. Thus, the computational cost of an (r = 0.8, a = 6, D = 3) high-dimensional GBS instance with 3 dB of loss is on par with the expected classical complexity of the USTC experiment (), with the added advantage of being programmable and much closer to the collision-free regime. Besides lacking programmability, the USTC experiment is much farther away from the collision-free regime, in which computational complexity theoretic results guarantee the intractability of GBS. For example, if the USTC experiment had been performed with PNR detectors, we would find that its photon-number distribution has mean and standard deviation 83.3 ± 20.1 over 100 modes [where we assume the squeezing parameters quoted in () and a net transmission of η = 0.3]. Note that even within one standard deviation of the mean, the photon number already exceeds the total number of modes. This should be contrasted with a distribution such as the one in Fig. 3, for which we find 85.2 ± 13.9 over 216 modes.
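The 14-hour figure quoted above for a 100 × 100 Hafnian follows from the Fig. 4 fit and the Niagara-to-Fugaku conversion factor. The sketch below reproduces that arithmetic, assuming the fitted scaling has the form c n^3 2^{n/2} and extrapolates to n = 100; the constant and function names are hypothetical, and the numerical values are the ones quoted in the text.

```python
C_NIAGARA = 5.42e-15        # fitted prefactor in seconds (value quoted for Fig. 4)
NIAGARA_TO_FUGAKU = 122.8   # ratio of Rmax scores quoted in the text

def hafnian_time_fugaku(n):
    """Estimated time in seconds to compute one n x n Hafnian on Fugaku,
    assuming the fitted Niagara scaling t(n) = c * n^3 * 2^(n/2)."""
    t_niagara = C_NIAGARA * n ** 3 * 2 ** (n / 2)
    return t_niagara / NIAGARA_TO_FUGAKU

hours = hafnian_time_fugaku(100) / 3600.0
print(f"n = 100: about {hours:.1f} hours on Fugaku")
```

Because the cost doubles for every two additional photons, an event with 120 photons would already take roughly a thousand times longer under the same assumptions, apart from the polynomial factor.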

TN methods

Another promising method for calculating the probability amplitudes of high-dimensional GBS is TN contraction. This has been the strategy of choice for classical adversaries to superconducting circuits performing RCS (, ). In this section, we find that TN algorithms can simulate 2D lossy GBS experiments on 200 modes in a reasonable amount of time. This motivates going to a higher dimension, D = 3. We find that after making several allowances to the classical algorithm and accounting for tremendous improvements in classical hardware, one of the fastest supercomputers in the world, Fugaku, would take ∼10²⁰ s to simulate a 3D experiment on 216 modes running for 200 s.

Any given quantum circuit can be written as a network of tensors such that each input quantum state is a rank-1 tensor, each gate acting on 𝓁 components is a rank-2𝓁 tensor, and each measurement operator is a rank-1 tensor (). The probability amplitude for the quantum circuit can then be calculated by contracting the TN, i.e., by summing over all the indices of the TN. However, there are multiple different orderings (paths) in which the indices of a TN can be contracted, and the choice of path influences the contraction runtime. For some of the first classical benchmarking proposals of random circuits, the contraction paths were handpicked by the researchers (). More recently, excellent randomized algorithms have been introduced to find contraction paths that have been shown to improve on previous results (). A second important practical consideration for TN contraction is that there is a trade-off between space and time complexity. That is, one can substantially speed up the contraction of a TN at the expense of assuming access to large amounts of memory. A systematic way to reduce the memory footprint of a TN contraction (at the expense of decreasing the speed of the computation) is to use a technique known as slicing, also known as variable projection or bond cutting ().

Unlike for Hafnian methods, where one does not need to specify much of the structure of the circuit, this information is vital for understanding the performance and limitations of TN simulations. As before, we fix the squeezing parameter r = 0.8 and assume a net end-to-end transmission of η = 0.5. With these parameters and first assuming D = 2, we need at least a = 14 lattice sites per dimension to get to a mean photon number at the detectors (i.e., after loss) of . For a single cycle, C = 1, we use a TN contraction algorithm called cotengra () together with Fugaku’s LINPACK benchmark to find that this supercomputer would require less than 100 μs to contract the TN. Thus, for 2D instances up to this size, it is necessary to consider more than one cycle, which requires either constructing D extra delay lines or adding a circulator, both of which will adversely affect the net transmission. This motivates considering the next dimension, D = 3. For this case, and fixing the number of cycles to C = 1, we find that we need at least a = 6 to have a mean photon number on the order of 80 at the detectors, which would provide nontrivial support on photon numbers that are beyond the reach of the Hafnian algorithms described above. In Table 1, we show the time it would take Fugaku to contract different 3D GBS circuits for different lattice sizes.
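The influence of the contraction path on runtime can be seen already in the simplest TN, a chain of three matrices. The sketch below counts scalar multiplications for the two possible contraction orders. It is a toy illustration with hypothetical names and made-up dimensions; it does not use cotengra or any path optimizer from the cited works.

```python
def chain_contraction_cost(dims, order):
    """Count scalar multiplications needed to contract a chain of matrices.

    dims:  [d0, d1, ..., dk], so matrix i has shape dims[i] x dims[i + 1].
    order: positions to contract, each an index into the *current* list of
           matrices; entry i contracts matrix i with matrix i + 1.
    """
    shapes = [(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]
    cost = 0
    for i in order:
        (rows, inner), (inner_right, cols) = shapes[i], shapes[i + 1]
        assert inner == inner_right, "shared bond dimensions must match"
        cost += rows * inner * cols          # cost of one pairwise contraction
        shapes[i:i + 2] = [(rows, cols)]     # replace the pair by its product
    return cost

dims = [10, 1000, 10, 1000]                            # A: 10x1000, B: 1000x10, C: 10x1000
left_first = chain_contraction_cost(dims, [0, 0])      # (A B) C
right_first = chain_contraction_cost(dims, [1, 0])     # A (B C)
print(left_first, right_first)  # 200000 20000000
```

Both orders give the same result, but the second needs 100 times more arithmetic and a 1000 × 1000 intermediate tensor instead of a 10 × 10 one, which also illustrates the space-time trade-off mentioned above.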

Table 1.

Benchmarks for a D = 3 high-dimensional GBS instance with minimal Fock space cutoff c = 4.

Number of lattice points (a)	Expected time in Fugaku (s)	Size of the largest tensor
4	1.65 × 10⁻¹	4.39 × 10¹²
5	4.56 × 10⁵	4.61 × 10¹⁸
6	2.11 × 10¹⁴	7.92 × 10²⁸


The first column gives the number of lattice points, from which the number of modes follows as M = a³. The second column is the expected run time on Fugaku. This time is obtained by estimating the number of floating point operations required to contract the tensor using cotengra () and converting this into a time by using the Rmax floating point operations per second score for Fugaku. Note that cotengra implements randomized algorithms; thus, for each problem size, we run it 200 times and confirm that, after the first 100 runs, there is no significant variation in the best score found. The last column gives the number of elements of the largest tensor ever needed to be stored in memory during the contraction. Note that this places restrictions on the random access memory (RAM) available in each of the nodes of a supercomputer. In particular, the nodes in Fugaku have up to 32 gigabytes of RAM, which allows storing on the order of 4 × 10⁹ 64-bit floating point numbers; thus, an a = 6 instance will far exceed the capacity of a single node, requiring distributed storage and thus incurring a hit in efficiency due to communication complexity.

Note that even allowing for a hypothetical scenario in which the RAM of each of its nodes has been expanded by about 19 orders of magnitude, it would take Fugaku on the order of 2.11 × 10¹⁴ s to calculate a contraction with a minimal (and highly inaccurate) cutoff of 4. In reality, it is infeasible to fit the computation in the memory or even the hard disks of individual nodes, so slicing would be required, which can lead to astronomical overheads over this idealized estimate. Even without this overhead and assuming that generating a sample is as expensive as calculating a probability, simulating a 200-s 10-kHz experiment would require over 4 × 10²⁰ s.

Of course, we remind the reader once more that a direct calculation of output probabilities is not what the experiment does but only one model of the experiment, and there may be more efficient methods for simulating a verifiable experiment. On the basis of the evidence presented above, a high-dimensional GBS instance with squeezing parameter r = 0.8, in D = 3 dimensions, with a = 6 modes per dimension or a total of 216 modes and a single cycle C = 1 is well beyond the capabilities of current simulation methods based either on Hafnian calculations or TN contractions, even when losses of around 3 dB (η ∼ 0.5) are present. This immense computational gap persists even though we allow the classical computer to ignore substantial overheads in terms of cutoff, number of modes, and samples-to-amplitudes conversion. The experimental parameters we propose are within the reach of current photonics technology, and their implementation using time-domain multiplexing can be achieved with a significantly reduced number of components. Note that after this submission, we became aware of a recent work () on an upgraded version of the experiment done in ().
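The memory and runtime figures quoted for Table 1 can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text and the table (32 gigabytes per node, 64-bit floats, a 200-s experiment at 10 kHz) and assumes, as the text does, that one sample costs one amplitude calculation. The variable and function names are hypothetical.

```python
import math

NODE_RAM_BYTES = 32e9       # RAM per Fugaku node, as quoted in the text
BYTES_PER_ELEMENT = 8       # one 64-bit floating point number
SAMPLES = 200 * 10_000      # a 200-s experiment at 10 kHz

# a: (expected time on Fugaku in s, elements in the largest tensor), from Table 1
TABLE_1 = {4: (1.65e-1, 4.39e12), 5: (4.56e5, 4.61e18), 6: (2.11e14, 7.92e28)}

node_capacity = NODE_RAM_BYTES / BYTES_PER_ELEMENT   # elements that fit in one node

def summarize(a):
    """Return (modes, orders of magnitude beyond one node's RAM, total time in s)."""
    time_per_amplitude, largest_tensor = TABLE_1[a]
    excess_orders = math.log10(largest_tensor / node_capacity)
    total_time = time_per_amplitude * SAMPLES   # assumes one amplitude per sample
    return a ** 3, excess_orders, total_time

for a in sorted(TABLE_1):
    modes, excess, total = summarize(a)
    print(f"a = {a}: M = {modes}, largest tensor exceeds node RAM by "
          f"10^{excess:.1f}, total time {total:.2e} s")
```

For a = 6 this gives a largest tensor about 19 orders of magnitude beyond a single node's capacity and a total time slightly above 4 × 10²⁰ s, consistent with the estimates in the text.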
