Jacob F F Bulmer1, Bryn A Bell2, Rachel S Chadwick1,3, Alex E Jones1, Diana Moise4, Alessandro Rigazzi4, Jan Thorbecke5, Utz-Uwe Haus6, Thomas Van Vaerenbergh7, Raj B Patel2,8, Ian A Walmsley2, Anthony Laing1. 1. Quantum Engineering Technology Labs, University of Bristol, Bristol, UK. 2. Ultrafast Quantum Optics Group, Department of Physics, Imperial College London, London, UK. 3. Quantum Engineering Centre for Doctoral Training, University of Bristol, Bristol, UK. 4. Hewlett Packard Enterprise, Zurich, Switzerland. 5. Hewlett Packard Enterprise, Amstelveen, Netherlands. 6. HPE HPC/AI EMEA Research Lab, Wallisellen, Switzerland. 7. Hewlett Packard Labs, HPE Belgium, Diegem, Belgium. 8. Department of Physics, University of Oxford, Oxford, UK.
Abstract
Identifying the boundary beyond which quantum machines provide a computational advantage over their classical counterparts is a crucial step in charting their usefulness. Gaussian boson sampling (GBS), in which photons are measured from a highly entangled Gaussian state, is a leading approach in pursuing quantum advantage. State-of-the-art GBS experiments that run in minutes would require 600 million years to simulate using the best preexisting classical algorithms. Here, we present faster classical GBS simulation methods, including speed and accuracy improvements to the calculation of loop hafnians. We test these on a ∼100,000-core supercomputer to emulate GBS experiments with up to 100 modes and up to 92 photons. This reduces the simulation time for state-of-the-art GBS experiments to several months, a nine-orders of magnitude improvement over previous estimates. Last, we introduce a distribution that is efficient to sample from classically and that passes a variety of GBS validation methods.
A quantum advantage is typically considered to be achieved when a quantum experiment outperforms a classical computer at a computational task, with strong evidence of an exponential separation between quantum and classical run times. On the basis of plausible complexity conjectures, boson sampling (, ) is a class of photonic experiments with potential to deliver quantum advantage. Measurement of correlated photon detection events constitutes sampling from a distribution with probabilities that correspond to classically intractable matrix functions. In Gaussian boson sampling (GBS) (), squeezed states are injected into an interferometer, with subsequent photon detection producing correlation events that are related to matrix loop hafnians (, ). A major advancement in experimental photonics was recently reported, in which a GBS experiment composed of 100 optical modes, named Jiŭzhāng (), observed up to 76 photon detection events and claimed a quantum advantage. Once assembled, Jiŭzhāng ran in 200 s, while the best available classical algorithms running on the most powerful contemporary supercomputer would require 600 million years to simulate Jiŭzhāng.

While the theoretical proposal for GBS assumed the use of photon number–resolving detectors (PNRDs), experimental implementations frequently use threshold detectors, which click to distinguish between 0 and at least 1 photon. This does not affect the complexity of GBS provided that collisions (multiple photons arriving at the same detector) are unlikely (). These events were assumed improbable and were neglected in the original proposal (). Jiŭzhāng both uses threshold detectors and operates in a regime where there is a high probability of collisions between photons. The classical complexity for GBS with PNRDs in regimes with high collisions has recently been studied, albeit in a restricted setting ().
However, we are not aware of any attempts at benchmarking in this regime or of any work investigating the complexity of collisions when measured using threshold detectors. This obfuscates the boundary for quantum advantage.

Here, we present classical algorithms that calculate exact, correlated photon detection probabilities for GBS simulations with PNRDs, in the presence of collisions, faster than existing methods. Furthermore, we introduce a new classical method to generate samples for GBS simulations with threshold detectors, which runs orders of magnitude faster than classical methods to generate samples with PNRDs, when collisions dominate. Both of these results provide quadratic speedups to prior methods in the high-collision regime. We apply these results to two sampling algorithms: a probability chain rule method () and Metropolis independence sampling (MIS) (). We report a nine-orders-of-magnitude reduction in the time taken to simulate idealized Jiŭzhāng-type GBS experiments with threshold detectors. This enabled us to classically simulate, on a ∼100,000-core supercomputer, GBS experiments with 100 modes and up to 60 click detection events. Replacing threshold detectors with PNRDs in this simulation allows us to generate a 92-photon sample but increases the run time significantly. This is a substantial improvement over previous attempts to simulate GBS (–), which reached around 20 photons. A review of GBS simulation methods can be found in the Supplementary Materials.

We find that simulating a 60-mode experiment with PNRDs is of comparable complexity to simulating a 100-mode experiment with the same density of photons and threshold detectors. Last, we develop and investigate a classically tractable distribution that passes a variety of canonical GBS verification tests, highlighting the importance of verifying GBS experiments against the most stringent adversarial tests available.
By providing algorithms that exploit photon collisions to reduce the complexity of GBS and benchmarking these algorithms at large scales, we provide a reference point against which all experiments wishing to claim quantum advantage can be compared. These results significantly sharpen the boundary of quantum advantage in GBS.
RESULTS
Loop hafnian algorithms
A particular detection event can be described by a photon number pattern n̄ = (n1, n2, …, nM), where ni is the number of photons in mode i. The probability of obtaining some n̄ from a GBS experiment is

P(n̄) = P0 lhaf(A_n̄) / ∏i ni!

where P0 is the probability of measuring vacuum, lhaf(·) is the loop hafnian function, and A_n̄ is a matrix that can be derived from n̄ and the covariance matrix and displacement vector of the Gaussian state (see Materials and Methods). A_n̄ is a 2N × 2N matrix, where N = ∑i ni. However, for a pure Gaussian state, A_n̄ is block diagonal, with blocks B_n̄ and B_n̄*, in which case B_n̄ is an N × N matrix, so it is considerably faster to calculate its loop hafnian compared to that of A_n̄. While a realistic GBS experiment will not produce a pure state, a Gaussian mixed state can be expressed as a statistical ensemble of pure states with differing displacement vectors (, ); so, for the purposes of a sampling algorithm, it is generally possible to randomly choose a complex displacement vector from the correct distribution and then sample from the corresponding pure state. Hence, the computational complexity of generating a sample is set by the calculation of an N × N loop hafnian, lhaf(B_n̄).

The fastest known algorithms for the loop hafnian run in exponential time, using an inclusion/exclusion formula similar to the Ryser algorithm for the permanent (). In boson sampling with Fock state inputs, Ryser can be generalized to take advantage of collisions, reducing the number of inclusion/exclusion terms to calculate from 2^N to ∏i(ni + 1) (–). The repeated-moment formula for the loop hafnian achieves the same scaling for GBS (). However, there is a much faster formula for general loop hafnians: the eigenvalue trace algorithm performs inclusion/exclusion on pairs of photons and so requires only 2^(N/2) terms (, ). Here, we generalize eigenvalue trace to take advantage of collisions, reducing the number of terms to ∏j(ηj + 1), where ηj is the number of times a particular pairing of photons is repeated. This is lower-bounded by √(∏i(ni + 1)) and upper-bounded by 2^(N/2).
The grouping of photons into pairs is arbitrary, so we make use of a greedy algorithm to choose repeated pairings, reducing the number of inclusion/exclusion steps to as close to the lower bound as possible (see the Supplementary Materials). Figure 1 (A to C) shows how, when collisions occur in more than one mode, repeated pairs can be formed. In this example, the six detected photons are arranged into two distinct pairings, one of which is repeated. This gives a sum over (2 + 1)(1 + 1) = 6 terms, reduced from 2^3 = 8 using eigenvalue trace (), and compared to 24 using the repeated-moment algorithm ().
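The greedy pairing itself is specified in the Supplementary Materials; as an illustration, the sketch below uses one simple heuristic (repeatedly pair the two modes holding the most unpaired photons, pairing within a mode when only one remains) and counts the resulting inclusion/exclusion terms ∏j(ηj + 1). The function name and the heuristic are ours, not necessarily the paper's exact algorithm.

```python
from heapq import heapify, heappop, heappush
from collections import Counter
from math import prod

def greedy_pairing_terms(pattern):
    """Count inclusion/exclusion terms prod_j (eta_j + 1) when photons are
    paired greedily: always pair the two modes holding the most unpaired
    photons (pairing within one mode if only one mode remains).
    Requires an even total photon number."""
    assert sum(pattern) % 2 == 0
    # max-heap of (-remaining photons, mode index)
    heap = [(-n, i) for i, n in enumerate(pattern) if n > 0]
    heapify(heap)
    pairs = Counter()  # (mode_a, mode_b) -> repetition count eta
    while heap:
        n1, i = heappop(heap)
        if heap:
            n2, j = heappop(heap)
            pairs[tuple(sorted((i, j)))] += 1
            if n1 + 1 < 0:
                heappush(heap, (n1 + 1, i))
            if n2 + 1 < 0:
                heappush(heap, (n2 + 1, j))
        else:
            pairs[(i, i)] += 1  # pair two photons of the same mode
            if n1 + 2 < 0:
                heappush(heap, (n1 + 2, i))
    return prod(eta + 1 for eta in pairs.values())
```

For the pattern (3, 2, 1) this gives 6 terms, matching the example above, while six photons in six distinct modes give the collision-free count 2^3 = 8.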
Fig. 1.
Conceptual schematic of repeated-row loop hafnian algorithm and threshold detector sampling method.
(A) A GBS outcome with collisions, measured with PNRDs. (B) To calculate the associated probability, we group the photons into pairs (red lines) to maximize the number of repeated, identical pairs. (C) An inclusion/exclusion formula, or a finite-difference sieve, can then operate on the resulting pairs, with repeated pairs leading to a speedup. (D) The same event measured with threshold detectors, with “clicks” shown as green ticks. (E) We consider a fan-out to an array of subdetectors, with none likely to receive >1 photon. We can ignore the outcomes of all but the first detector to see a photon. x is introduced as the relative position of the detected photon and is also the fraction of the subdetectors that are ignored. (F) The probability of detecting the first photon at position x can be expressed as a loss followed by single-photon detection.
We also present a loop hafnian formula that uses a finite-difference sieve instead of an inclusion/exclusion formula, like the Glynn formula for the permanent (–). This significantly improves the numerical accuracy with only a minor time penalty. Whereas accuracy is an issue for eigenvalue trace when N > 50 (), the finite-difference sieve method has relative error <10^−8 when tested up to N = 60 (see the Supplementary Materials). This allows us to maintain accuracy for large loop hafnians while using a conventional 128-bit complex floating point data type, which is desirable for speed and portability. We therefore use the finite-difference sieve formula for all benchmarking results presented in the "Benchmarking" section.
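As a correctness reference for small matrices, the loop hafnian can be computed directly from its combinatorial definition: a sum over all partitions of the indices into pairs and singletons, where a singleton picks up the diagonal "loop" weight. This brute force is exponentially slower than the algorithms above and is intended only for testing; the function name is ours.

```python
def loop_hafnian(A):
    """Loop hafnian by brute force: sum over all partitions of the index
    set into pairs and singletons; a pair (i, j) contributes A[i][j] and
    a singleton i contributes the diagonal 'loop' weight A[i][i].
    Exponentially slow; only for small test matrices."""
    def rec(idx):
        if not idx:
            return 1
        i, rest = idx[0], idx[1:]
        total = A[i][i] * rec(rest)            # index i forms a loop
        for k in range(len(rest)):             # or index i pairs with j
            total += A[i][rest[k]] * rec(rest[:k] + rest[k + 1:])
        return total
    return rec(tuple(range(len(A))))
```

With a zero diagonal this reduces to the ordinary hafnian; for the n × n all-ones matrix it counts the involutions of n elements (10 for n = 4), a convenient check.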
Threshold detectors
When threshold detectors are used, the detection probabilities can be calculated using the Torontonian matrix function (), which involves a sum over 2^Nc terms, where Nc is the number of clicks (outputs with one or more photons). However, calculating this quantity is not necessarily the fastest approach to sampling threshold detection patterns. For a sufficiently low density of photons, it may be faster to simulate PNRDs and then simply reduce each nonzero photon number to a click. We show that it is possible to improve this, for any density of photons, to the level of an Nc × Nc loop hafnian, containing 2^(Nc/2) terms.

We consider the detection system depicted in Fig. 1E. The mode is uniformly fanned out to many PNRD subdetectors such that the probability of a collision in any one subdetector can be neglected. This system provides a conceptual bridge between threshold detection and number-resolved detection (). If these subdetectors within a mode are sampled sequentially, then once a single photon is seen, that mode registers a click. The remaining subdetectors, which have not yet been sampled, can be ignored because no more information is required about that mode. Hence, the number of detected single photons to simulate is Nc, which sets the size of the loop hafnian calculation. x is introduced as an additional variable giving the position of the single photon within the fan-out, normalized to vary between 0 and 1. As a result, a fraction x of the subdetectors are ignored; this can be related to applying a loss of x to the mode before detecting a single photon, as shown in Fig. 1F.
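The fan-out picture can be checked with a toy Monte Carlo: scatter a mode's photons uniformly over many subdetector bins, scan the bins in order, and stop at the first occupied one. The click statistics then match an ordinary threshold detector, while the bins after the first photon are never examined. The Poisson photon source below is purely illustrative (a GBS mode is not Poisson distributed), and all names are ours.

```python
import math
import random

random.seed(7)

K = 1000        # subdetectors in the fan-out
MU = 1.3        # mean photon number of the illustrative Poisson source
TRIALS = 20000

def poisson(mu):
    """Knuth's Poisson sampler (stdlib only)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

clicks, first_positions = 0, []
for _ in range(TRIALS):
    n = poisson(MU)
    # scatter the n photons uniformly over the K subdetector bins, then
    # scan the bins in order and stop at the first occupied one
    occupied = sorted(random.randrange(K) for _ in range(n))
    if occupied:
        clicks += 1
        first_positions.append(occupied[0] / K)  # relative position x

# empirical click rate should approach 1 - P(0 photons)
empirical_click_prob = clicks / TRIALS
```

The recorded positions x of the first photon carry exactly the information needed to treat the ignored fraction of subdetectors as a loss applied before single-photon detection.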
Sampling algorithms
Chain rule sampling
These methods can be applied directly to the chain rule for simulating GBS described in () and in Materials and Methods. Here, the photon number in each mode is sampled sequentially, conditioned on the photon numbers in the previous modes. Finding the conditional probability distribution for mode j requires calculating the joint probabilities of (n1, …, nj) for all values of nj up to ncut, where ncut is some cutoff such that the probability of having a greater number of photons can be neglected. Because ncut should generally be several times larger than the expected number of photons, the speedup for calculating collision probabilities is especially applicable here. Furthermore, we make use of a batched method for simultaneously calculating all of the loop hafnians required for different values of nj, with approximately the same run time as calculating the largest loop hafnian, where nj = ncut (see the Supplementary Materials). When simulating threshold detectors, we choose to reduce ncut for each subdetector to 1. We again use a batching method to more efficiently sample different subdetectors within the same mode, which largely offsets the additional overhead from sampling several subdetectors per mode.
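The mechanics of the chain rule sampler can be sketched generically: only joint probabilities of a prefix are ever needed, and normalizing them over the candidate photon numbers 0 to ncut yields the conditional distribution for the next mode. The toy joint distribution below stands in for the Gaussian-state calculation and is our own illustration.

```python
import random
from math import prod

random.seed(0)

def chain_rule_sample(joint, num_modes, ncut):
    """Build a sample mode by mode: at each mode, evaluate the joint
    probability of extending the current prefix by every photon number
    0..ncut, normalize those weights, and draw the next photon number
    from the resulting conditional distribution."""
    prefix = ()
    for _ in range(num_modes):
        weights = [joint(prefix + (k,)) for k in range(ncut + 1)]
        r, acc = random.random() * sum(weights), 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                prefix += (k,)
                break
        else:                       # float edge case: keep the cutoff value
            prefix += (ncut,)
    return prefix

# toy stand-in for the Gaussian-state joint probability:
# independent modes with P(n) proportional to q**n
q = 0.4
toy_joint = lambda ns: prod(q ** n for n in ns)

samples = [chain_rule_sample(toy_joint, 2, 10) for _ in range(5000)]
```

For this independent toy joint, the conditional of each mode equals its marginal, so the sampled frequencies can be checked against the geometric weights directly.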
Metropolis independence sampling
We also investigate MIS, a Markov chain Monte Carlo method, for generating GBS samples. Here, samples si are drawn from a proposal distribution, where si is the ith sample in the chain. They are then accepted with probability

p_accept = min[1, P(si)Q(si−1) / (P(si−1)Q(si))]

where P(si) is the target probability distribution, in this case that of ideal GBS, while Q(si) is the proposal probability distribution, i.e., the probability of proposing a particular si. If a proposed sample is rejected, then the previous sample is repeated, si = si−1. This update rule ensures that the chain will converge toward the target distribution, which is its equilibrium state (, ). Usually, some burn-in time, τburn, is used to allow the chain to converge. As sequential samples are not independent, some thinning interval, τthin, can also be used to suppress the probability of seeing repeated samples, keeping only one in every τthin samples. These parameters are critical to the efficiency of MIS and can generally be improved by choosing a proposal distribution that is close to the target distribution.

We expand our sample space so that each si contains the photon number pattern and the complex displacement vector. For the proposal samples, we draw the displacement vector from the correct distribution for the desired mixed state and then generate a photon number pattern from an "independent pairs and singles" (IPS) distribution based on the resulting pure state (see the Supplementary Materials). This distribution, which we introduce in this work, can be sampled from efficiently and has probabilities given by N × N loop hafnians of positive matrices. As an aside, we observe that the IPS distribution is already sufficient to pass many GBS verification methods (see the Supplementary Materials).
The run time per sample is dominated by the two loop hafnians in P(si) and Q(si), with P(si−1) and Q(si−1) already calculated in the previous step.

When simulating threshold detectors with MIS, we take the continuum limit of a large number of subdetectors and introduce x as an additional continuous random variable that gives the position of the "first" detected photon within each mode with nonzero photons. Given a proposed photon number pattern, these positions can be sampled efficiently from their conditional distribution. P(si) and Q(si) are then calculated with Nc × Nc loop hafnians, tracing out the unused subdetectors. One subtlety is that tracing out reintroduces mixture into the quantum state, so it is necessary to sample a further adjustment to the displacement vector to obtain a pure state. This is only used in the calculation of P(si). Details are given in the Supplementary Materials.

While we have chosen here to investigate MIS, other Monte Carlo methods may also be relevant, such as rejection sampling (). These methods may improve the run time, but we do not believe that they would change the asymptotic complexity.
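A minimal sketch of the MIS update rule on a toy discrete target (our own illustration; the GBS version replaces P and Q with the loop hafnian probabilities discussed in the text):

```python
import random

random.seed(1)

def mis_chain(target, proposal_prob, propose, steps):
    """Metropolis independence sampling: proposals are drawn independently
    of the current state and accepted with probability
    min(1, P(s_i) Q(s_(i-1)) / (P(s_(i-1)) Q(s_i)));
    on rejection the previous sample is repeated."""
    state = propose()
    chain = [state]
    for _ in range(steps):
        cand = propose()
        accept = (target(cand) * proposal_prob(state)) / (
            target(state) * proposal_prob(cand))
        if random.random() < min(1.0, accept):
            state = cand
        chain.append(state)
    return chain

# toy target over {0, 1, 2}, deliberately different from the uniform proposal
P = {0: 0.5, 1: 0.3, 2: 0.2}
chain = mis_chain(lambda s: P[s], lambda s: 1 / 3,
                  lambda: random.randrange(3), 60000)
kept = chain[1000:]  # discard a burn-in, as in the text
```

With the uniform proposal, the acceptance ratio reduces to P(cand)/P(state), and the chain's long-run frequencies converge to the target weights.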
Benchmarking
To benchmark these methods, we choose parameters similar to those of Zhong et al. (), while varying the system size by choosing the number of modes M. For the interferometer transformation, we sample a Haar random unitary matrix (). The interferometer is fed with M/4 sources of two-mode squeezed vacuum, for which we choose a uniform squeezing parameter r = 1.55 and overall transmission η = 0.3 (70% loss). To demonstrate the correctness of our methods, we first test them on an M = 8 example, which is small enough that the results can be compared to the exactly calculated distributions. Figure 2 shows the accumulated distribution from 10^6 samples with total photon number N = 6, generated by MIS for PNRD, along with the exactly calculated distribution. The total variation distance (TVD) from the exact distribution is consistent with statistical uncertainties. With threshold detectors, we find that the TVD for the Nc = 3 distribution is 2.9 × 10^−3, which benefits from the smaller statistical uncertainty due to the smaller number of possible outcomes. For the chain rule algorithm, we produce 10^6 samples with both PNRD and threshold detectors. For PNRD with a cutoff of 12 photons, there were 74,973 samples with N = 6 from 10^6 total samples, giving a TVD = 0.0554. For threshold detectors with 12 subdetectors, there were 195,150 Nc = 3 samples, and these gave a TVD = 0.0138. The larger TVDs are explained by the smaller sample size of the postselected distributions.
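The Haar random unitary for the interferometer can be drawn with a standard construction: orthonormalize the columns of a complex Ginibre matrix, where normalizing by the positive column norms fixes the phase freedom that would otherwise bias the distribution. This stdlib-only sketch is our own; the paper does not specify its implementation.

```python
import math
import random

random.seed(5)

def haar_unitary(m):
    """Haar-random m x m unitary: modified Gram-Schmidt applied to the
    columns of a complex Ginibre matrix. Dividing by the (positive) norms
    plays the role of the phase correction in the usual QR construction."""
    cols = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(m)]
            for _ in range(m)]
    ortho = []
    for v in cols:
        w = list(v)
        for u in ortho:
            # subtract the projection of w onto u:  w -= <u, w> u
            ip = sum(uc.conjugate() * wc for uc, wc in zip(u, w))
            w = [wc - ip * uc for wc, uc in zip(w, u)]
        norm = math.sqrt(sum(abs(wc) ** 2 for wc in w))
        ortho.append([wc / norm for wc in w])
    # assemble a row-major matrix whose columns are the orthonormal vectors
    return [[ortho[j][i] for j in range(m)] for i in range(m)]

U = haar_unitary(6)
```

Unitarity can be verified by checking that the columns are orthonormal.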
Fig. 2.
Theoretical and sample-estimated GBS probability distributions.
Probability distribution for all six-photon detection outcomes for an eight-mode PNRD GBS simulation (A) and an eight-mode, three-click threshold detector GBS simulation (B). Blue bars show estimated probabilities using MIS; orange bars show exact probabilities.
For large-scale tests, we make use of an internal HPE Cray EX benchmarking system consisting of 1024 nodes. A typical node is equipped with two AMD EPYC 7742 64-core processors clocked at 2.25 GHz, and the nodes are interconnected with the Cray Slingshot 10 high-performance network. We first benchmark our loop hafnian formula on proposed IPS samples for an M = 60 example. The run time as a function of N is shown in Fig. 3, along with timings for the basic formula without speedup due to collisions. Making use of collisions generally improves the run time by one to two orders of magnitude for this range and allows 80-photon probabilities to be calculated in comparable time to a 60-photon probability without collisions. However, there is a large variation in run time between samples with the same N, depending on the amount of collisions in any particular configuration of the sampled photons. On the other hand, the run time for a loop hafnian without speedup from collisions shows little variation from 𝒪(N^3 2^(N/2)) scaling, at least for N > 40.
Fig. 3.
Loop hafnian run time benchmarking.
Run time using the HPE benchmarking system, comparing eigenvalue trace loop hafnian algorithm on N × N matrices with and without speedup due to collisions (orange and blue dots). The blue line is an exponential fitted to the blue points. Collisions are determined by generating 39 samples for each N from the IPS distribution on 60 modes.
Using chain rule sampling, we simulate an M = 60 experiment with PNRDs, setting ncut = 12 and an additional global cutoff of 80 photons. We generate 4200 samples in ∼3 hours. The global cutoff has no effect on the probability distribution of samples below the cutoff and is used to keep the run time per sample constrained. Figure 4A shows a histogram of the number of samples against number of photons, which is in good agreement with the calculated distribution. Figure 4B shows the corresponding run times of the samples. Below ∼45 photons, the sample time appears approximately constant, suggesting that the problem size is not large enough to take full advantage of the system. Beyond that, the run time increases rapidly, although there is a wide range of variation depending on the particular configuration of output photons. We provide a rough fit line to this scaling, equal to (0.15 + 1.59 × 10^−9 × N^3 e^(0.147N)) s. Using this to extrapolate to photon numbers >80, we estimate that the average time per sample is ∼10 s. With the ∼66 times larger number of central processing units available in Fugaku, the world's top-ranked supercomputer, this could be reduced to 130 ms.
Fig. 4.
Chain rule benchmarking on 60-mode PNRD GBS.
Chain rule simulation of M = 60 experiment with PNRDs. (A) Number of samples as a function of photon number, with the theoretically calculated distribution (red line) and (B) run time versus number of photons fitted with an exponential plus a constant (red line).
We then test chain rule simulation of an M = 100 experiment with threshold detectors, using 12 subdetectors per mode, and a global cutoff of 60 clicks. We generate 1600 samples in ∼3.5 hours. Figure 5A shows the histogram of click numbers, and Fig. 5B shows the corresponding run times of the samples. Beyond ∼45 clicks, the sample time increases approximately exponentially, from which we extrapolate to click numbers >60. The run times are fitted with a line (0.58 + 3.15 × 10^−7 × 2^(Nc/2)) s. From this, we predict that the mean time per sample is 8.4 s. On Fugaku, this could be reduced to around 127 ms.
Fig. 5.
Chain rule benchmarking on 100-mode threshold detector GBS.
Chain rule simulation of M = 100 experiment with threshold detectors. (A) Number of samples as a function of number of clicks, fitted with a Gaussian (red line). (B) Run time versus number of clicks, fitted with an exponential plus a constant (red line).
On the basis of the scaling of the loop hafnian calculation and the distribution of samples over number of clicks, the estimated average time per MIS step is 0.45 s for an M = 100 system with threshold detectors. On Fugaku, this could be reduced to 7 ms, which is somewhat faster than generating a sample through the chain rule. However, the raw MIS chain will contain a high frequency of repeated samples due to rejections of the proposal sample; for some applications, this may be unimportant, but it would provide a clear difference from a true GBS experiment, where repeated samples are highly unlikely. In the Supplementary Materials, we investigate the τthin required to suppress repeated samples and find that it increases rapidly with system size, such that for M = 100, it is likely to be in excess of 600. Hence, if independently distributed samples are required, then the chain rule method is most likely preferable.
DISCUSSION
Our results provide a new reference point for classical run times of GBS, an improved understanding of the classical complexity, and could improve verification techniques by making it practical to generate small numbers of samples from the distribution of much larger-scale experiments. Our IPS proposal distribution generates samples in polynomial time and is a better approximation than the standard adversarial models in the verification of GBS. As reported in the Supplementary Materials, the IPS distribution is largely able to pass the quantitative tests of GBS used in (), which suggests a need for stronger verification methods, at the least, using IPS as a classically simulable adversary.

For GBS with threshold detectors, we have shown that the complexity can be reduced quadratically from 2^Nc to 2^(Nc/2). Comparing to the experiment of Zhong et al. (), where 50 million samples were accumulated in 200 s, our 100-mode chain rule simulation implies that the classical run time can be reduced to ∼73 days. With announced plans for exascale computing on the horizon, for example, the planned system Frontier at the Oak Ridge National Laboratory, this could be further reduced to around 3 weeks. This does not diminish the experimental achievement of large-scale GBS from Zhong et al. (), which remains faster than classical methods on supercomputers, if the time required for circuit programming (or, in the present case, fabricating a new, fixed interferometer) is not included. However, it has previously been reported that in boson sampling with Fock state inputs, at least 50 photon events are required to extend beyond the reach of an exact classical simulation in a reasonable time scale (); for collision-free GBS, this threshold has been reported as being around 100 photons (). We have now demonstrated that for GBS with threshold detectors, the number of correlated detector clicks should also be around 100.
For GBS with PNRDs, the number of photons required to surpass this classical threshold will depend on the amount of collisions but must be ≥100.

Future claims to quantum advantage in GBS experiments might include increasing the level of programmability () and including PNRDs, which, our results suggest, adds significantly to the complexity, thus providing an alternative route to a larger quantum advantage than increasing the size of threshold detector experiments. For example, our 60-mode PNRD chain rule simulation ran in comparable time to the 100-mode threshold detector simulation. Meanwhile, a 100-mode PNRD simulation proved impractically slow even on the HPE Cray EX benchmarking system; we generated a single 92-photon event in 82 min.

Our algorithms are able to simulate the ideal operation of an experiment; they are near exact, down to Fock-basis truncation and finite precision errors. Any real experiment will feature noise and imperfections such as loss and nonideal interference. We already include a realistic level of loss in our benchmarking, and nonideal interference could be included if needed. Recent work has shown that efficient classical algorithms that approximate GBS can outperform the experiment because of its relatively high level of noise (). However, it is important to note that these efficient algorithms cannot continue to outperform the experiment if sufficient reductions to experimental imperfections can be achieved. Furthermore, some classical simulation methods for boson sampling actively exploit errors such as loss and imperfect distinguishability to improve their run time (–). In future work, it may be possible to combine our methods with approximations that exploit errors to allow a faster simulation of an imperfect experiment.
MATERIALS AND METHODS
Gaussian boson sampling
This section outlines the methods for calculating photon number statistics in GBS. A derivation for the zero-displacement version of these formulae is given in (). For displaced Gaussian states, we use the methods in (, ). The Wigner function of an M-mode Gaussian state can be efficiently represented by using the 2M-length mean vector R and the 2M × 2M covariance matrix V of the canonical position and momentum operators q̂ and p̂. Equivalently, it can be represented in terms of creation and annihilation operators as a complex-valued displacement ᾱ and covariance matrix σ (). Transformations between these bases are linear, as the annihilation operators are related to the quadrature operators by âj = (q̂j + ip̂j)/√2. For details on how to construct these covariance matrices, we recommend reading (, , ) and the tutorial by Olivares on "Quantum optics in the phase space" ().

We further define σQ = σ + I/2 as the complex-valued covariance matrix of the state's Husimi Q function,

A = X(I − σQ^−1), where X = [[0, I], [I, 0]], and γ = σQ^−1 ᾱ

Probabilities of measuring photon number patterns with PNRDs are now given by

P(n̄) = exp(−½ ᾱ† σQ^−1 ᾱ) / (√(det σQ) ∏i ni!) × lhaf(Ãn̄)

where lhaf(·) is the loop hafnian function. An̄ is formed from A by repeating the ith and (i + M)th rows and columns ni times, and similarly, the ith and (i + M)th entry in γ is repeated ni times to form γn̄. Then, the diagonal elements of An̄ are replaced by the elements of γn̄, giving Ãn̄, because the weights of the loops are given on the diagonal of the matrix.

For pure states, Ãn̄ can be written in block form as

Ãn̄ = [[B̃n̄, 0], [0, B̃n̄*]]

Here, B̃n̄ is a symmetric N × N matrix, with N the total photon number. As a result,

lhaf(Ãn̄) = lhaf(B̃n̄) lhaf(B̃n̄*) = |lhaf(B̃n̄)|²

so probabilities from a pure state can be calculated using loop hafnians of matrices of half the size compared to a mixed state.
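As a numerical sanity check of the vacuum term, note that in the quadrature basis (with the convention that the vacuum covariance is I/2, and zero displacement) the vacuum probability is P0 = 1/√det(V + I/2); for a single-mode squeezed vacuum this reduces to the textbook value 1/cosh r. The function below is our own sketch under those conventions:

```python
import math

def vacuum_probability_2x2(V):
    """P0 = 1 / sqrt(det(V + I/2)) for a zero-mean, single-mode Gaussian
    state with quadrature covariance matrix V (vacuum = I/2 convention)."""
    a = V[0][0] + 0.5
    b = V[1][1] + 0.5
    det = a * b - V[0][1] * V[1][0]
    return 1.0 / math.sqrt(det)

r = 1.55  # squeezing parameter used in the benchmarking section
V_squeezed = [[0.5 * math.exp(-2 * r), 0.0],
              [0.0, 0.5 * math.exp(2 * r)]]
p0 = vacuum_probability_2x2(V_squeezed)
```

The determinant is unchanged (in modulus) by the linear change to the creation/annihilation basis, so this agrees with the √det σQ normalization used above.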
Sampling pure Gaussian states from mixed Gaussian states
Using the Williamson decomposition, we can write the covariance matrix as

V = S Vth S^T

Here, Vth is a diagonal covariance matrix describing a thermal state in each mode, and S defines a symplectic transformation. Hence, any mixed Gaussian state can be written as a pure channel acting on thermal states.

By defining Vp = (1/2) S S^T, a covariance matrix of a pure Gaussian state, and Vn = S (Vth − I/2) S^T, a covariance matrix describing the Gaussian classical noise added to the state, we can now write the original covariance matrix as V = Vp + Vn (, ).

For the purposes of sampling the state, we can choose a pure state with vector of means R′ sampled from the multivariate normal distribution described by covariance matrix Vn and means R. This results in a pure state with covariance matrix given by Vp and means given by R′. To sample from the multivariate normal distribution, we use the Cholesky decomposition of the covariance matrix ().
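The sampling step above reduces to drawing from a multivariate normal via the Cholesky factor L of the noise covariance, R′ = R + Lz with z a vector of standard normals. A stdlib-only 2 × 2 sketch (the helper names are ours):

```python
import math
import random

random.seed(3)

def cholesky_2x2(S):
    """Lower-triangular L with S = L L^T for a 2x2 symmetric
    positive-definite matrix."""
    l00 = math.sqrt(S[0][0])
    l10 = S[1][0] / l00
    l11 = math.sqrt(S[1][1] - l10 * l10)
    return [[l00, 0.0], [l10, l11]]

def sample_mvn(mean, S, count):
    """Draw from N(mean, S) as mean + L z, with z standard normal."""
    L = cholesky_2x2(S)
    out = []
    for _ in range(count):
        z0, z1 = random.gauss(0, 1), random.gauss(0, 1)
        out.append((mean[0] + L[0][0] * z0,
                    mean[1] + L[1][0] * z0 + L[1][1] * z1))
    return out

S = [[2.0, 0.6], [0.6, 1.0]]
pts = sample_mvn((1.0, -2.0), S, 50000)
```

The sample mean and covariance can be checked against the requested mean and S.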
Chain rule GBS sampler
In this section, we describe our chain rule sampling algorithm. Probability chain rule methods were first shown to be useful for simulating boson sampling by Clifford and Clifford () and then later applied to GBS (). We build upon the methods introduced in ().

Sampling using the chain rule for probability proceeds by choosing part of the sample (in this case, e.g., the number of photons in the first mode) from its marginal probability distribution, fixing this, and then choosing the next part (e.g., the number of photons in the second mode) from its conditional probability distribution given the first part. This is expressed as

$$P(n_1, \ldots, n_M) = P(n_1)\, P(n_2 \mid n_1) \cdots P(n_M \mid n_1, \ldots, n_{M-1}).$$

This allows samples to be built up from distributions with very large numbers of possible outcomes, without calculating the probability of every possible outcome. In GBS, a difficulty is that the marginal probabilities are equivalent to probabilities from a mixed quantum state, and these are quadratically harder to calculate than for a pure state. This is because A is not block diagonal for mixed states, so its loop hafnian must be calculated directly.

To circumvent this, we simulate a measurement on all the modes in the coherent state basis. From this, we obtain a set of coherent state amplitudes (β_1, …, β_M) as measurement outcomes. We are free to ignore any given mode's measurement outcome and choose instead to simulate its measurement in the photon number basis, conditional on the other modes' measurement outcomes. Using this, we progressively replace each coherent state amplitude β_m by a photon number n_m. Sampling in the coherent state basis does not add to our complexity, as this is a Gaussian operation and so can be efficiently simulated using the Gaussian state formalism.
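The chain rule mechanics can be demonstrated on a toy two-mode joint distribution with made-up numbers (no Gaussian states involved): sample the first outcome from its marginal, then the second from the conditional given the first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution P(n1, n2) over two modes, outcomes 0..2 each.
# (Illustrative numbers only; the real sampler computes these entries
# from loop hafnians of a Gaussian state.)
P = np.array([[0.10, 0.05, 0.05],
              [0.20, 0.25, 0.05],
              [0.10, 0.15, 0.05]])

def chain_rule_sample(P, rng):
    # Sample n1 from its marginal P(n1) = sum_{n2} P(n1, n2).
    p1 = P.sum(axis=1)
    n1 = rng.choice(len(p1), p=p1)
    # Sample n2 from the conditional P(n2 | n1) = P(n1, n2) / P(n1).
    p2 = P[n1] / p1[n1]
    n2 = rng.choice(len(p2), p=p2)
    return n1, n2

samples = [chain_rule_sample(P, rng) for _ in range(20000)]
```

The empirical frequencies of the samples converge to the entries of P, while each draw only ever evaluates one marginal and one conditional rather than the full joint table.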
Computing the conditional Gaussian states involves calculating the Schur complement of the state's covariance matrix (). The procedure is as follows ():

1) Sample modes 2 to M in the coherent state basis, obtaining a sample from the distribution P(β_2, …, β_M).

2) Sample the photon number in the first mode from the distribution P(n_1 ∣ β_2, …, β_M).

3) For m = 2 to M − 1:
(i) Begin with a sample from the intermediate distribution P(n_1, …, n_{m−1}, β_m, …, β_M).
(ii) Discard the coherent state amplitude β_m, and replace it with a photon number n_m drawn from the distribution P(n_m ∣ n_1, …, n_{m−1}, β_{m+1}, …, β_M).
(iii) This leaves a sample drawn from the distribution P(n_1, …, n_m, β_{m+1}, …, β_M) that can be used as a starting point for the next step.

4) Discard β_M, and replace it with n_M, drawn from P(n_M ∣ n_1, …, n_{M−1}). This leaves a photon number sample drawn from the distribution P(n_1, …, n_M).

A proof showing that this algorithm samples from the GBS distribution is given in (). To sample from P(n_m ∣ n_1, …, n_{m−1}, β_{m+1}, …, β_M), the joint probabilities P(n_1, …, n_m, β_{m+1}, …, β_M) are calculated for all n_m between zero and some finite cutoff n_cut. Assuming that the probability that n_m > n_cut is small enough to be neglected, normalizing the joint probabilities to 1 provides a good approximation to the conditional distribution. Calculating these joint probabilities dominates the computational effort for sampling each mode and grows with the number of detected photons.

We find that the joint probabilities are given by

$$P(n_1, \ldots, n_m, \beta_{m+1}, \ldots, \beta_M) = \frac{\exp\left(-\tfrac{1}{2}\, \mu'^\dagger \Sigma^{-1} \mu'\right)}{\pi^{M-m}\, \sqrt{\det \Sigma}\, \prod_{i=1}^{m} n_i!}\; \left| \mathrm{lhaf}\left(\bar{B}_{\bar{n}}\right) \right|^2, \tag{11}$$

where B̄_{n̄} is formed from B by repeating the ith row and column n_i times and then, in the same manner, repeating the entries of γ′ along the diagonal of B̄_{n̄}, where γ′ is given by

$$\gamma' = \mu'^\dagger \Sigma^{-1}, \tag{12}$$

with μ′ the displacement of the state after the coherent state outcomes have been displaced away. Here, the repetition pattern n̄ is nonzero only for the modes that have already been sampled in photon number, and similarly, the entries of μ′ involving β_m are set to zero as the corresponding mode is sampled in photon number. We note that because n_cut should usually be several times greater than the expected number of photons, these calculations will often contain photon collisions. To arrive at Eqs.
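The cutoff-and-renormalize step can be illustrated with a stand-in distribution: here a Poisson shape replaces the loop-hafnian joint probabilities, and the cutoff, the mean, and all names are our illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

ncut = 12    # finite cutoff n_cut, chosen well above the expected photon number
mean = 2.0   # expected photon number in this mode (stand-in value)

# Stand-in for the joint probabilities P(n1, ..., n_m, beta_{m+1}, ..., beta_M)
# evaluated for n_m = 0 .. ncut; a Poisson shape replaces the loop hafnians.
joint = np.array([math.exp(-mean) * mean ** n / math.factorial(n)
                  for n in range(ncut + 1)])

# Assuming P(n_m > ncut) is negligible, renormalizing the truncated joint
# probabilities approximates the conditional distribution, which we sample.
cond = joint / joint.sum()
n_m = rng.choice(ncut + 1, p=cond)
```

The neglected tail mass 1 − Σ_n joint[n] quantifies the truncation error; in practice n_cut is raised until this is negligible compared with the target sampling accuracy.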
11 and 12, we note that the projection of the output state ∣ψ⟩ onto the state ∣n_1, …, n_m, β_{m+1}, …, β_M⟩ can be expressed as

$$\langle n_1, \ldots, n_m, \beta_{m+1}, \ldots, \beta_M \mid \psi \rangle = \langle n_1, \ldots, n_m, 0, \ldots, 0 \mid \hat{D}^\dagger(\beta) \mid \psi \rangle,$$

where D̂(β) is the displacement operator acting on modes m + 1 to M. Hence, we can apply a displacement of D̂(−β) to the state and calculate the probability from a projection onto a photon number state, with vacuum in the modes that are yet to be sampled.

In the Supplementary Materials, we describe algorithms to speed up loop hafnian calculations in the presence of photon collision events, and a method of batching together the calculations for different n_m such that the total run time is approximately equal to that of calculating the largest n_m. When simulating threshold detectors, we expand each mode into several subdetectors, as shown in Fig. 1E, and treat them as separate modes in the chain rule sampling algorithm, the only difference being that once a photon is detected, no further information is required from the remaining subdetectors within that mode. Hence, they can continue to be projected onto the coherent state basis, where they do not contribute to the complexity of calculating the probabilities. In the Supplementary Materials, we discuss a batched method of calculating the loop hafnians required for different subdetectors within the same mode, achieving a speedup by noting that only the diagonal entries of B̄_{n̄} change between subdetectors.

Because the order with which this algorithm progresses through the modes is arbitrary, we choose to go in order of increasing mean photon/click number. This slightly reduces the run time because photons are less likely to be detected in the earlier modes, so the size of the loop hafnians required in these stages is generally reduced. An implementation of the chain rule algorithm and all other algorithms presented in this work can be found in our online code repository ().
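The mode-ordering heuristic can be sketched as follows, assuming the quadrature convention used above with ħ = 1 and vacuum variances of 1/2 (so a vacuum mode has V_ii = V_{i+M,i+M} = 1/2); the function name is our own.

```python
import numpy as np

def mode_order(V, R):
    """Return mode indices sorted by increasing mean photon number.

    Assumes a 2M x 2M quadrature covariance matrix V (hbar = 1, vacuum
    variance 1/2) and a length-2M mean vector R, both ordered
    (q_1..q_M, p_1..p_M). The mean photon number of mode i is then
    (V_ii + V_{i+M,i+M} - 1)/2 + (R_i^2 + R_{i+M}^2)/2.
    """
    M = len(R) // 2
    d = np.diag(V)
    nbar = 0.5 * (d[:M] + d[M:] - 1.0) + 0.5 * (R[:M] ** 2 + R[M:] ** 2)
    return np.argsort(nbar)

# Mode 0: vacuum; mode 1: thermal state with mean photon number 1.
V = np.diag([0.5, 1.5, 0.5, 1.5])
R = np.zeros(4)
print(mode_order(V, R))  # → [0 1]
```

Processing modes in this order keeps the repeated matrices small during the early chain rule steps, since low-occupation modes are sampled first.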
C. S. Hamilton, R. Kruse, L. Sansoni, S. Barkhofen, C. Silberhorn, I. Jex, Gaussian boson sampling. Phys. Rev. Lett. 119, 170501 (2017).