Quantum computers are becoming increasingly accessible and may soon outperform classical computers for useful tasks. However, qubit readout errors remain a substantial hurdle to running quantum algorithms on current devices. We present a scheme to mitigate these errors more efficiently on quantum hardware and numerically show that our method consistently gives an advantage over previous mitigation schemes. Our scheme removes biases in the readout errors, allowing a general error model to be built with far fewer calibration measurements. Specifically, for reading out n qubits, we show a factor of 2^n reduction in the number of calibration measurements without sacrificing the ability to compensate for correlated errors. Our approach can be combined with, and simplify, other mitigation methods, allowing tractable mitigation even for large numbers of qubits.
Noisy-intermediate scale quantum (NISQ) computers () are running increasingly complicated algorithms on small to intermediate numbers of qubits (–). However, their usefulness continues to be limited by noise, leading to unreliable outputs. Error mitigation schemes (, ) compensate for errors through a combination of calibration measurements and postprocessing and have been applied to the benchmarking of NISQ hardware (–), quantum chemistry and solid-state physics problems (–), dynamical quantum simulations (, ), and demonstrations of quantum supremacy (). They have been proposed to bridge the gap between current devices and future fault-tolerant error correction (), which actively corrects errors in the quantum state. Error mitigation has already proven to be an important tool to reach previously unachieved benchmarks on existing hardware (, ).

Qubit readout is a significant source of error in quantum computing experiments. This is particularly true for the popular superconducting qubit architectures, which typically have per-qubit readout error probabilities of a few percent () [detailed information about readout error probabilities on current devices can be found through IBM’s Qiskit platform ()]. In practice, the measurement errors on transmon-based devices are additionally complicated by effects such as bias toward certain states and cross-talk–induced correlations (–). Furthermore, quantum experiments often involve measuring many qubits at a time (, ), compounding the impact of readout errors. Together, these effects make readout errors a significant hurdle to scaling up NISQ computation.

Readout error mitigation schemes combine an error model with calibration measurements. The calibrated model is then used to infer the “error-free” result of an experiment (, , –). The quality of the mitigation strongly depends on the choice of error model; however, there is a trade-off between model complexity and calibration cost.
Simple models, for example, those assuming qubit-wise–independent errors (, ), require fewer calibration measurements but may not capture the true error process. In contrast, using fewer assumptions leads to a more general error model but at the cost of potentially requiring a prohibitive number of calibration measurements (). Here, we present a scheme that addresses both these problems, giving a lossless reduction in error model complexity and introducing a single, model-agnostic, calibration step. This allows the most suitable model to be chosen a posteriori.

Here, we introduce bit-flip averaging (BFA), a scheme that uses random bit-flips to simplify the effective error process. We analytically show that averaging over these random bit-flips allows one to more efficiently parameterize and estimate readout errors. The error process under BFA admits convenient mathematical symmetries that greatly simplify the inference of error-free experimental results. We compare our approach to full mitigation and tensor product noise (TPN) models and show that BFA outperforms both. The bit-flips introduced by our method can be uniquely inverted, making the scheme compatible with mid-circuit measurement and feed-forward algorithms (, ) with negligible experimental overhead, requiring only a layer of single-qubit gates and classical postprocessing.

Imperfect multi-qubit measurements can be effectively modeled as a classical process (, , –). This can be understood as a probabilistic corruption of the error-free result. Assuming that the measurements (in the computational basis) will be performed across a constant number of qubits, this model is expressed in terms of a response matrix M such that M_σσ′ = p(σ ∣ σ′) is the probability of reading out σ given that the error-free outcome should have been σ′. The observed outcome probabilities p_obs are given by the action of the response matrix on the error-free probabilities p_true, i.e., p_obs = M p_true (Eq. 1). In general, the matrix M is not symmetric as readout on many devices is biased toward some states ().
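As a concrete illustration, the action of Eq. 1 can be sketched in a few lines of Python. The matrix entries below are hypothetical, chosen only to mimic the typical bias toward ∣0〉; the function name is ours.

```python
# A biased single-qubit response matrix M, with
# M[s][s_prime] = p(read out s | error-free outcome s_prime).
# Columns are probability distributions and sum to 1; the asymmetry
# (0.02 vs 0.05) models the typical bias toward |0>.
M = [
    [0.98, 0.05],  # p(0|0), p(0|1)
    [0.02, 0.95],  # p(1|0), p(1|1)
]

def apply_response(M, p_true):
    """Return p_obs = M @ p_true (Eq. 1) for list-of-lists matrices."""
    n = len(M)
    return [sum(M[i][j] * p_true[j] for j in range(n)) for i in range(n)]

p_true = [0.5, 0.5]
p_obs = apply_response(M, p_true)
# p_obs = [0.515, 0.485]: the bias toward |0> skews a uniform distribution.
```

Because M is not symmetric, the corruption of a uniform input is visibly lopsided, which is exactly the bias that BFA removes.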
Our protocol uses random bit-flips to symmetrize the response matrix, averaging out the biases. This markedly reduces the number of parameters required to define this matrix; this reduction is O(2^{2n}) → O(2^n) for n read-out qubits. It also simplifies the matrix inversion task required to find p_true.

The calibration step involves estimating M by preparing and measuring each of the computational basis states. The kth column of M is the vector of measurement outcome probabilities given an input computational basis state ∣k〉. This requires enough calibration shots to sufficiently determine 2^n (potentially) unique probabilities for each of the 2^n different ∣k〉, which is especially problematic if time-drifting errors necessitate frequent recalibrations. At worst, calibration costs scale as O(2^{2n}); however, in practice, many of the error probabilities will be negligibly small (i.e., those for simultaneous errors on many qubits) and can be safely approximated as zero. We show in the section on sample complexity and scaling of full BFA calibration that the number of calibration measurements needed to estimate a single distribution (column of M) typically scales at a rate ≪ O(2^n) (although still exponentially in n). Nevertheless, even if each distribution can be described with a markedly reduced set of probabilities, there are still exponentially many distributions (input states) to estimate.

Once M is estimated, readout errors are typically mitigated by either inverting M or by solving a constrained linear optimization problem [minimizing ‖p_obs − Mp‖² over p, subject to p containing physical probabilities]. We note that both problems quickly become intractable with increasing numbers of qubits.

In practice, there will be some underlying structure to the readout error distributions. Several proposals have taken advantage of this by making assumptions about the error process (, –).
A common and effective choice of simplified model assumes that the readout errors for each qubit are independent, yielding the so-called TPN model (, , ). This simplification allows the response matrix to be given in terms of 2n single-qubit error probabilities p^(i)(0 ∣ 1) and p^(i)(1 ∣ 0). For TPN, the response matrix M_TPN is the tensor product of single-qubit response matrices, M_TPN = ⊗_i M^(i), where M^(i)_ss′ = p^(i)(s ∣ s′) is the probability that the ith qubit reads out s given the error-free readout should have been s′. The TPN model can be calibrated more efficiently than the full scheme as p^(i)(1 ∣ 0) and p^(i)(0 ∣ 1) can be found by sampling only the input states ∣0…0〉 and ∣1…1〉, respectively. The inverse of M_TPN is now tractable and is simply the tensor product of inverse single-qubit response matrices.

On real devices, multi-qubit readout errors can be correlated (, ) (through cross-talk effects), limiting the accuracy of many simplified models. Alternative approaches have been proposed to deal with correlated errors in a scalable way, e.g., using continuous Markov processes () or via cumulant expansion (). These methods extend the TPN approximation by characterizing the readout errors in terms of single-qubit and pairwise (between physically/frequency-close qubits) correlated error terms. Although we do not consider these models directly here, our BFA proposal naturally extends to these correlation-extended models. Furthermore, as our scheme eliminates the bias in the readout errors toward certain states, it allows for these models to be expressed in terms of fewer parameters. This simplification allows these models to be calibrated with fewer measurements and thereby mitigate readout errors more efficiently.
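A minimal sketch of how a TPN response matrix is assembled from single-qubit matrices via a Kronecker product; the per-qubit error probabilities below are hypothetical and the helper name is ours.

```python
def kron(A, B):
    """Kronecker (tensor) product of two list-of-lists matrices."""
    return [
        [A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
        for i in range(len(A)) for k in range(len(B))
    ]

# Hypothetical single-qubit response matrices (columns sum to 1).
M0 = [[0.99, 0.04], [0.01, 0.96]]  # qubit 0
M1 = [[0.97, 0.06], [0.03, 0.94]]  # qubit 1

# Two-qubit TPN response matrix: M_tpn = M0 (x) M1.
M_tpn = kron(M0, M1)

# Each column of the resulting 4x4 matrix is still a valid distribution.
col_sums = [sum(M_tpn[i][j] for i in range(4)) for j in range(4)]
```

Because the model is a tensor product, its inverse is the tensor product of the 2 × 2 inverses, which is what makes TPN mitigation tractable at large n.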
RESULTS
Averaging out readout errors with bit-flips
Our BFA method yields a greatly simplified and more easily measurable response matrix without sacrificing the ability to capture correlated readout errors. By applying random pre- and postmeasurement bit-flips, we completely symmetrize the response matrix and remove readout biases. The process is qualitatively similar to the randomized benchmarking techniques () that are often used to efficiently quantify gate errors. Methods that tackle state-dependent bias have been proposed, e.g., the “Static Invert-and-Measure” scheme (); however, the scheme introduced here allows for more active and efficient response matrix–based mitigation to be used. Our scheme also provides computational advantages in applying the mitigation; we give an analytic formula for the inverse of the simplified response matrix under BFA that can be calculated using only vector-matrix multiplication.

Following the standard response matrix approach, we assume that every measurement will be performed across a fixed number of qubits. Mitigation happens in two stages: a calibration stage where the response matrix is measured, and an experimental stage in which readout errors are mitigated using this response matrix. In each shot of the experiment, we bit-flip random qubits before measuring them and then invert the bit-flip in the (classical) readout for the corresponding qubits (Fig. 1). We repeat and average over this process, randomly selecting different qubits to bit-flip each shot. As we will show, this simplifies the measured effective response matrix (measurement of this is shown in Fig. 1A). The bit-flipped qubits are chosen uniformly at random per shot, and the bit-flips are implemented with an X gate. Here and in the following, we assume the gate errors introduced by X are negligible. By continuing the random bit-flips and classical correction when performing a quantum experiment, the readout errors continue to be simplified, allowing for easier mitigation (Fig. 1B).
In the absence of any readout errors, BFA has no effect on the readout results. Last, if necessary, we can include another set of X gates applied to the flipped qubits after measurement, ensuring a consistent a posteriori state.
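The per-shot procedure above can be sketched as follows. The independent-error readout model is a stand-in for a real device, and the function names are ours; the key property checked at the end is the one stated in the text: with no readout errors, BFA leaves the result unchanged for every random mask.

```python
import random

def noisy_measure(state_bits, p_err):
    """Toy readout: each bit is independently misread with probability p_err."""
    return [b ^ (random.random() < p_err) for b in state_bits]

def bfa_shot(state_bits, p_err):
    """One BFA shot: random X gates before measurement, classical undo after."""
    n = len(state_bits)
    s = [random.randint(0, 1) for _ in range(n)]        # random bit-flip mask
    flipped = [b ^ si for b, si in zip(state_bits, s)]  # pre-measurement X(s)
    raw = noisy_measure(flipped, p_err)                 # physical readout
    return [r ^ si for r, si in zip(raw, s)]            # classical correction

random.seed(0)
# With p_err = 0, the corrected readout equals the prepared state
# regardless of which mask s was drawn.
assert all(bfa_shot([0, 1, 1, 0], 0.0) == [0, 1, 1, 0] for _ in range(100))
```

With p_err > 0, averaging the corrected readouts over many shots samples from the symmetrized response matrix derived below.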
Fig. 1.
Example of BFA for four qubits.
(A) Determining the response matrix requires applying X gates to randomly chosen qubits (red squares) while leaving others unflipped (cyan squares). The bit-flipping is inverted in the classical readout result (red and cyan circles), leaving the resulting measurement invariant under the full process. Averaging over many random choices of bit-flipped qubits means that the effective readout error process is uniquely determined by considering only the logical zero state. (B) To use the BFA response matrix to mitigate errors in an experiment, it is crucial that the experiment sees the same effective measurement process; this is simply achieved by applying BFA to the experimental measurement process. If required, the X gates can be inverted (transparent squares), leaving the quantum state invariant under BFA.
Without loss of generality, we describe the measurement operation and corresponding response matrix using Kraus operators. For n qubits, we do this in terms of a set of 2^n measurement Kraus operators {A_σ}, where A_σ = Σ_σ′ √p(σ ∣ σ′) ∣σ′〉〈σ′∣, which was chosen in such a way to recover Eq. 1. This operator A_σ corresponds to an uncertain measurement (with {∣σ′〉} being computational basis vectors), yielding a classical readout bit-string σ. We note that A_σ corresponds to a quantum noise–limited measurement operator. An additional incoherent classical assignment error can be included; however, this also gives a response matrix of the form in Eq. 1. Our BFA scheme yields the same result for any combination of these two processes.

Eq. 1 is recovered by considering the probability that a measurement of a quantum state ρ yields a readout σ, p_obs(σ) = Tr(A_σ ρ A_σ†) = Σ_σ′ p(σ ∣ σ′) p_true(σ′), where p_true(σ′) = 〈σ′∣ρ∣σ′〉 is the probability of an error-free measurement of ρ to yield the state ∣σ′〉. We identify this sum as the matrix equation p_obs = M p_true.

BFA can be taken into account by applying the relevant bit-flipping operations and adjusted readout results directly to the measurement operators.
We quantify a bit-flip in terms of a binary string s such that the unitary operator applied to the qubits is X(s) = ⊗_i X^{s_i}, a tensor product of Pauli X operators and identity operators, where s_i is the ith bit of s, e.g., X(010) = I ⊗ X ⊗ I. For an n-qubit measurement, we choose a random bit-string s with probability 1/2^n, and given a bit-string readout σ, the corresponding measurement operator is Ã_σ^(s) = X(s) A_{σ⊕s} X(s) = Σ_σ′ √p(σ ⊕ s ∣ σ′ ⊕ s) ∣σ′〉〈σ′∣, where σ′ ⊕ s is the bit-wise addition (XOR) of s and σ′. We now consider how averaging over s changes the readout error process. From here, averaging over s is denoted by a tilde. The (s-averaged) probability of observing σ is p̃_obs(σ) = 2^{−n} Σ_s Σ_σ′ p(σ ⊕ s ∣ σ′ ⊕ s) p_true(σ′), allowing us to identify a new response matrix M̃ (such that p̃_obs = M̃ p_true) that describes the readout errors under BFA. The elements of this new matrix are simply the s-averaged conditional probabilities under BFA and are given by M̃_σσ′ = 2^{−n} Σ_s p(σ ⊕ s ∣ σ′ ⊕ s) (Eq. 7). From this equation, we see that M̃_σσ′ = M̃_{(σ⊕s)(σ′⊕s)} for any s, and so, we have arrived at a far simpler, symmetrized error model with 2^n − 1 parameters, instead of the 2^n(2^n − 1) in M.

Thanks to this symmetry, we can express the whole response matrix under this bit-flipping protocol in terms of just the parameters in its first column. The conditional index in the response matrix can now be dropped as M̃_σσ′ = p̃_{σ⊕σ′} (Eq. 8). Physically, this is because any computational basis state is equally likely to be bit-flipped into any other basis state. As it is the bit-flipped state that is measured by the physical apparatus, the effective error probability is averaged across all inputs, removing any bias toward certain states. This removal of bias gives a huge practical advantage of BFA over normal response matrix error mitigation; calibrating the entire symmetrized error matrix only requires measurement of the probabilities in its first column, which is done with just one input state ∣0…0〉. As no assumptions are made about whether the errors are correlated, these correlations can be effectively dealt with using this scheme.
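The symmetrization in Eqs. 7 and 8 can be checked numerically. In the sketch below, a random column-stochastic matrix stands in for a measured response matrix; the `symmetrize` helper is ours and implements the s-average of Eq. 7 directly.

```python
import random

def symmetrize(M):
    """Mtilde[a][b] = mean over masks s of M[a^s][b^s] (Eq. 7)."""
    d = len(M)  # d = 2**n
    return [[sum(M[a ^ s][b ^ s] for s in range(d)) / d for b in range(d)]
            for a in range(d)]

random.seed(1)
d = 4  # two qubits

# Build a random column-stochastic M (each column is a distribution),
# standing in for a biased, correlated measured response matrix.
cols = [[random.random() for _ in range(d)] for _ in range(d)]
cols = [[x / sum(c) for x in c] for c in cols]
M = [[cols[b][a] for b in range(d)] for a in range(d)]

Mt = symmetrize(M)

# Eq. 8: Mtilde[a][b] depends only on the error syndrome a XOR b.
assert all(abs(Mt[a][b] - Mt[0][a ^ b]) < 1e-12
           for a in range(d) for b in range(d))
```

The assertion confirms that after averaging, the whole matrix is determined by its first column, i.e., by the 2^n − 1 independent syndrome probabilities.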
In particular, one can infer correlations by comparing the relative probabilities of different errors in the calibration data.

For practical reasons, readout errors are often biased toward certain states. With superconducting qubits, the readout of qubits in the state ∣0〉 is typically more reliable than for the state ∣1〉. Adaptive mitigation schemes (, ) have been proposed to more effectively compensate for readout errors by exploiting this bias. These involve applying X gates to certain qubits before measurement to maximize the proportion that are measured in the state ∣0〉, reducing the probability of readout errors and allowing the error-corrupted measurement distributions to be estimated more easily. However, these schemes only yield improvements for certain states and require the circuits for suitable states to be run twice (without and with adaptive gates) while also requiring the full response matrix to be found. By symmetrizing the readout errors, BFA increases the probability of some states (those with a high proportion of ∣0〉’s) being read out incorrectly, but this is balanced by the effective error probability for other states (with a high proportion of ∣1〉’s) being reduced. The balancing of readout errors coupled with the factor of 2^n reduction in the cost of estimating the response matrix and the lack of circuit-specific calibration measurements allow BFA to compete with these adaptive schemes while being applicable to a broader set of states.

The probabilities p̃_S have a convenient physical interpretation as the average probability that an error with syndrome S (the bit-string identifying which qubits are read out incorrectly) occurs; e.g., the element of p̃ whose syndrome has 1s on the 0th, 1st, and 3rd bits is the probability that readout errors occur simultaneously on the 0th, 1st, and 3rd qubits. The error matrix M̃ is symmetric about both its diagonal and antidiagonal, allowing it to be decomposed into a compact form, M̃ = Σ_S p̃_S X(S) (Eq. 9), where we have used the same notation for the matrix X(S) as for the operator. This is due to Eq. 7 being invariant under σ, σ′ → σ ⊕ s, σ′ ⊕ s (i.e., invariant under conjugation by X(s)). To take advantage of this sparse representation, one must continue to perform the bit-flipping and classical correction during experiments. This requirement adds negligible overhead as single-qubit bit-flips can typically be performed with very high fidelity or are completely free if combined with an existing gate. The corresponding bit-flip of the measurement output requires only classical Boolean logic.

The decomposition in Eq. 9 gives us an advantage in both mitigation strategies (response matrix inversion and constrained least-squares minimization). For the least-squares method, Eq. 9 tells us with what probability we expect a given readout string to be corrupted by the binary addition of S. If many of these probabilities are zero (or negligibly small and so can be set to zero), then this would allow us to use a sparse matrix representation of M̃, allowing the optimization problem to be solved more easily.

The matrix inverse mitigation strategy requires M̃^{−1} to be found. As M̃ contains only tensor products of the identity and Pauli X matrix, all these terms can be simultaneously diagonalized by the application of the Hadamard matrix H. As we show in the section on “Derivation of inverse BFA-simplified response matrix,” this gives the vector of eigenvalues λ = H_n p̃, where H_n is the 2^n × 2^n (unnormalized) Hadamard matrix with entries (H_n)_{kS} = (−1)^{k·S} and p̃ is the vector of the probabilities p̃_S (corresponding to the first column of M̃). Like M̃, the inverse must be symmetric about both its diagonal and antidiagonal, meaning that it can also be decomposed onto Pauli X matrices, i.e., the form given in Eq. 9. As shown in the section on “Derivation of inverse BFA-simplified response matrix,” the inverse BFA-simplified response matrix is given in terms of the vector of reciprocal eigenvalues λ^{−1} ≡ (1/λ_0, …, 1/λ_{2^n−1}) by M̃^{−1} = 2^{−n} Σ_S (H_n λ^{−1})_S X(S). This shows another clear advantage to bit-flipping over the full mitigation approach.
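A sketch of this diagonalization argument, using a fast Walsh-Hadamard transform: the eigenvalues of M̃ are the Hadamard transform of its first column p̃, and the inverse's first column is the (normalized) inverse transform of the reciprocal eigenvalues. The syndrome probabilities below are hypothetical, and the helper names are ours.

```python
def wht(v):
    """Unnormalized Walsh-Hadamard transform via an in-place butterfly."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

# Hypothetical 2-qubit syndrome probabilities p[S]; they sum to 1.
p = [0.90, 0.05, 0.04, 0.01]
d = len(p)

lam = wht(p)                                     # eigenvalues of Mtilde
q = [x / d for x in wht([1 / l for l in lam])]   # first column of the inverse

# Both Mtilde and its inverse are built from their first columns (Eq. 8).
Mt = [[p[a ^ b] for b in range(d)] for a in range(d)]
Mt_inv = [[q[a ^ b] for b in range(d)] for a in range(d)]
prod = [[sum(Mt[i][k] * Mt_inv[k][j] for k in range(d)) for j in range(d)]
        for i in range(d)]
assert all(abs(prod[i][j] - (i == j)) < 1e-9
           for i in range(d) for j in range(d))
```

No general matrix inversion is needed: the whole computation is two Hadamard transforms and an element-wise reciprocal, i.e., O(n 2^n) arithmetic operations.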
Like with M̃, we only need to find the elements of the inverse’s first column, and this can be done with simple matrix-vector multiplication (as opposed to a computationally costly general matrix inverse).

As the primary function of BFA is to average out bias in the readout errors toward certain measurement outcomes, we can also use it to further simplify other simplified error models. As an example of how BFA can simplify other approximate measurement error mitigation protocols, we can consider how the TPN model transforms under bit-flipping. Under bit-flipping, the biases of the qubit-wise readout errors are averaged out, meaning that the BFA-symmetrized TPN matrix is given by M̃_TPN = ⊗_i M̃^(i), where each M̃^(i) is the 2 × 2 symmetric matrix with diagonal entries 1 − p̃_i and off-diagonal entries p̃_i, with p̃_i the averaged readout error probability of the ith qubit. Combining the TPN model with BFA provides two main advantages: the first being that the number of parameters to estimate for the combined model on n qubits is n instead of 2n. The second comes in estimating the p̃_i; these probabilities can be measured by preparing the state ∣0…0〉. This is the same experimental procedure as is required for calibration of the full BFA matrix M̃, and so, a single set of calibration results can be used for both models. In this example, a TPN + BFA model could be calibrated first and its predictions for the different error probabilities checked against the calibration data. If this proves unsuitable, then a larger, more general model could be used without requiring any further calibration measurements.

The information contained in the BFA calibration measurements of ∣0…0〉 fully describes the response matrix, and so, it can be used to calibrate any response matrix–based approach. This means that one is not forced to make any assumptions about the model (e.g., independent errors, pairwise correlations, or a full model) before calibration.

This flexibility potentially allows for readout error mitigation to be performed even for large numbers of qubits, provided any correlations in the readout errors have some degree of locality.
If the qubits can be grouped into disjoint sets such that there are no intergroup correlations (for example, if correlations only occur between qubits coupled to the same readout cavity), then an expanded TPN-like model could be used in which each group has its own full response matrix. The response matrix for a measurement of all the qubits would then be given by a tensor product of those for each group. BFA would allow this model to be calibrated using only a single measured input state at a cost scaling at worst as O(2^k), where k < n is the number of qubits in the largest grouping. While the groupings could initially be chosen on the basis of some knowledge of the device (e.g., by readout cavity, operating frequency, or some spatial consideration), BFA would allow this grouping to be changed retroactively to match the calibration data. In the Supplementary Text, we show an example calibration process for a sparse response matrix that exhibits correlations between only a restricted number of qubits. We demonstrate how an appropriate choice of model can make both the calibration and subsequent mitigation more accurate.
Sample complexity and scaling of full BFA calibration
To use BFA most effectively, it should be combined with an error model that best balances the trade-off between model expressibility and calibration cost. However, it is helpful to estimate the worst-case cost to calibrate a BFA model using the full symmetrized response matrix as in Eq. 9, under some physically motivated assumptions. On real devices, we expect that even when correlations are taken into account, the probabilities of errors occurring on many qubits simultaneously are negligible and can be neglected. This effectively reduces the number of parameters that must be estimated to find p̃ and, thereby, M̃. The number of probabilities that give nonnegligible contributions to p̃ will provide an indication of the number of calibration measurements required to estimate M̃, and equivalently the cost to estimate a single column of M.

To give an idea of how many parameters must be retained in p̃, we consider a TPN model for an n-qubit readout with constant single-qubit readout error probability p_e for all qubits. While this neglects any correlations in the errors, we expect that these would act as a modest correction to the TPN model and so would not greatly affect the calibration cost. For a conservative scaling estimate, p_e could be the largest measured single-qubit readout error probability for the device in question. We then calculate the number N of error probabilities that must be retained so that their cumulative probability reaches above a threshold 1 − ϵ (where p̃ is sorted in descending order).

Under these assumptions, the number of qubits Q that experience a readout error is binomially distributed, Q ∼ B(n, p_e). From this, we can calculate the highest weight of error k (i.e., the largest number of qubits that are simultaneously read out incorrectly) that must be retained such that the cumulative probability up to k is greater than 1 − ϵ. This is given by k = min {k′ : S(k′; n, p_e) ≤ ϵ}, where S(k; n, p_e) = 1 − Pr (Q ≤ k).
Thus, N is the number of possible readout errors with weight less than or equal to k, N = Σ_{j=0}^{k} C(n, j). While this sum does not have a closed-form solution, for k/n ≤ 1/2 (expected for large n and p_e ≪ 1), it is bounded () by N ≤ 2^{nH(k/n)}, where H(x) = −x log₂(x) − (1 − x) log₂(1 − x) is the binary entropy function. This means the number of measurements needed to calibrate a completely general p̃ is expected to scale at worst as O(2^n). Although the number of outcomes that significantly contribute to the total probability is typically much less than the 2^n worst case, the required number of terms still scales exponentially with n.

For large n, we can examine the scaling of N by approximating Q with a normal distribution, Q ∼ 𝒩(np_e, np_e(1 − p_e)), giving k = min {k′ : S_N(k′ + 0.5) ≤ ϵ}, where S_N is the Gaussian survival function and the 0.5 is a continuity correction. Using this approximation, we see that in the limit of large n, the lower bound on N tends to 2^{nH(p_e)}/√(8np_e(1 − p_e)), and so, we require at least O(2^{nH(p_e)}) samples to calibrate p̃. As the upper bound on N tends to 2^{nH(p_e)}, we can also identify a very rough rule of thumb for the error rates under which full mitigation is tractable, e.g., nH(p_e) < 10.

We can further quantify the sample complexity of estimating a typical p̃ by applying bounds for the estimation of arbitrary discrete distributions. A useful bound for this is given in () and states that for an N-outcome discrete distribution with true probabilities p, the expected total variation distance between p and the empirical distribution p̂ after m samples is bounded by √(N/m). This shows that we can estimate p̃ to an expected accuracy ϵ using m = N/ϵ² shots. From (), we also have a concentration bound for this inequality.
The probability that we observe a distribution after m shots that differs from the true distribution by more than δ > ϵ = √(N/m) is bounded by Pr [d_TV(p̂, p) ≥ δ] ≤ e^{−m(δ−ϵ)²/2}. This implies that to be sure that we have estimated p̃ to within an accuracy ϵ with a failure probability less than γ, we need at least m = (√N + √(2 ln (1/γ)))²/ϵ² shots.

Therefore, estimating an arbitrary n-qubit response matrix using BFA to an accuracy ϵ with failure probability less than γ requires at least (√(2^n) + √(2 ln (1/γ)))²/ϵ², or O(2^n/ϵ²), shots. With full mitigation, all 2^n columns must be estimated independently, meaning a worst-case sample cost of 2^n times this and an O(2^{2n}/ϵ²) sample complexity.

These worst-case sample complexities correspond to a completely general M in which all 2^n outcomes in each column are significant. From Eq. 14, the typical number of significant probabilities N given a representative single-qubit readout error probability p_e is bounded by N ≤ 2^{nH(k/n)} (where k is defined as before). This yields a more typical estimate for the sampling cost under BFA of (√(2^{nH(k/n)}) + √(2 ln (1/γ)))²/ϵ² shots and, for large n, a sampling complexity of O(2^{nH(p_e)}/ϵ²) shots.

The exact sampling cost depends on the values of the various error probabilities. In the above analysis, we assume that the sampling cost under BFA can be reasonably well approximated by the cost of naively sampling a symmetric TPN response matrix with unbiased and equal per-qubit readout errors. While this is a reasonable approximation for BFA, it ignores the readout error biases encountered in full mitigation schemes. With the typical biases observed for superconducting qubits, the columns (output distributions) corresponding to input states with large numbers of qubits in the state ∣0〉 are more sharply peaked than for states with many qubits in ∣1〉 (as the former will have lower readout error probabilities). Sharper distributions [e.g., p(σ ∣ 0…0)] can be estimated with fewer samples than flatter ones [e.g., p(σ ∣ 1…1)], so the required number of samples to reach a given accuracy will vary across the columns of M.
Because of the averaging that occurs, the sample complexity of estimating the single distribution required for BFA will usually lie somewhere between the best-case cost of estimating p(σ ∣ 0…0) and the worst case of p(σ ∣ 1…1). While the cost of calibrating M̃ is lower bounded by the cost of estimating the cheapest column of M, the calibration will be significantly cheaper than for M’s most expensive column. Because full mitigation requires estimating ∼2^{n−1} columns that are more sampling expensive than the single column required for BFA and ∼2^{n−1} that are cheaper, we expect the total reduction in calibration cost brought by BFA to average out to a factor of ∼2^n, despite the variable column cost in M caused by readout error biases.

This analysis considers a naive estimation of M and M̃ using a full dense matrix model rather than making any assumptions about correlations present. Simplified models (e.g., TPN) that require estimation of fewer parameters will incur a smaller sampling cost than the general model considered here. However, because BFA ensures symmetry in the effective response matrix, thereby reducing the number of parameters needed to describe it, we expect that it will yield a reduction in calibration cost in practically all realistic cases. The bounds we use for the analysis of a generic M are valid for the estimation of arbitrary discrete probability distributions. Therefore, they are also applicable to the sampling cost of estimating the noisy (premitigation) measurement distribution for an arbitrary quantum circuit—i.e., this requires at most O(2^n/ϵ²) shots with or without BFA.
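The cutoff weight k and the retained-term count N from this section can be computed directly. The values of n, p_e, and ϵ below are illustrative only, and the function name is ours.

```python
import math

def retained_terms(n, p_e, eps):
    """Smallest error weight k with binomial tail below eps, and
    N = sum_{j<=k} C(n, j), the number of retained error syndromes."""
    cdf = 0.0
    for k in range(n + 1):
        # Pr(Q = k) for Q ~ Binomial(n, p_e)
        cdf += math.comb(n, k) * p_e**k * (1 - p_e)**(n - k)
        if cdf >= 1 - eps:
            break
    N = sum(math.comb(n, j) for j in range(k + 1))
    return k, N

k, N = retained_terms(n=20, p_e=0.03, eps=1e-3)

# N is far below the 2**20 worst case, but the binary-entropy bound
# N <= 2**(n*H(k/n)) still grows exponentially with n.
H = lambda x: -x * math.log2(x) - (1 - x) * math.log2(1 - x)
assert N <= 2 ** (20 * H(k / 20))
```

Sweeping n in such a script makes the rule of thumb nH(p_e) < 10 concrete for a given device's worst single-qubit error rate.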
Simulated measurement of response matrices
To obtain realistic readout error models for our simulations, we measured full response matrices on ibmq_manhattan () for one to eight qubit readouts (data taken on 1 December 2020). To minimize sampling error, we used 2^16 shots per computational basis state (per column of M). Taking these measured response matrices as “exact,” we used M to simulate the readout error process as described in Eq. 1. This effectively simulates a full on-device readout process from which we benchmark various BFA strategies. The different schemes used for our simulations are summarized in Table 1.
Table 1.
A summary of the mitigation schemes considered here, highlighting key characteristics.
Error mitigation scheme summary
Scheme          | Num. measured states | Num. free parameters | Correlated errors
Full mitigation | 2^n                  | 2^n(2^n − 1)         | Yes
TPN             | 2                    | 2n                   | No
BFA             | 1                    | 2^n − 1              | Yes
BFA + TPN       | 1                    | n                    | No
Figure 2 demonstrates the advantage of using BFA, comparing the exact and calibrated 4-qubit response matrices. For both schemes, a budget of 100 × 2^4 shots was used to estimate the response matrix. Here, the BFA advantage is immediately obvious: For full mitigation, this budget must be divided between the 2^4 input states that are measured, while for BFA, all 100 × 2^4 shots are used to measure ∣0000〉 (with shot-by-shot bit-flipping). For this budget, BFA produces an accurate estimate of its target response matrix M̃, in contrast to full mitigation’s poor estimate of M. A more accurate response matrix allows for more effective error mitigation in the final experiment.
Fig. 2.
Illustrating the sampling advantage of BFA over full mitigation.
Example response matrix plots from four qubits showing the exact matrix (left) and a finite-shot (100 × 2^4 shots) estimation (right). For full measurement (top), the bias is manifested by a lack of diagonal symmetry, and the many independent parameters increase the sampling error of its estimate. The BFA response matrix (bottom) exhibits many symmetries, and sampling error is nearly imperceptible. The exact response matrix for the error process (top left) is measured from ibmq_manhattan using 2^20 total shots (2^16 per input state), and the exact BFA response matrix (bottom left) is calculated using Eq. 8.
We stress that the calibration shown in Fig. 2 assumes a full dense response matrix rather than a TPN model. It is likely that the response matrix used for Fig. 2 (measured on-device) could be well approximated by a TPN model, allowing it to be sampled more efficiently (as only the marginal distribution of each qubit, rather than the full 2^n-outcome distribution, must be estimated). However, to showcase the advantage BFA brings over the most general approach for generic response matrices, we have used the full dense model here, which requires the 2^n-outcome measurement distributions to be measured for all 2^n computational basis state inputs.

Figure 2 shows that BFA greatly reduces the amount of calibration data required to obtain a faithful description of the response matrix. We can quantify this reduction by examining the number of shots required for the estimated response matrix to converge to that of the underlying error model. To measure the “closeness” of the estimated response matrix Mobs to the true response matrix M, we use the average of the column-wise classical fidelities (as each column is a distinct probability distribution), which for distributions p and q is defined as F(p, q) = (Σ_x √(p_x q_x))^2.
Our figure of merit F_M (which we will refer to as the “response matrix fidelity”) for the closeness of two response matrices M and N is then given by the average of F over their columns, F_M = (1/2^n) Σ_j F(M_j, N_j), where M_j denotes the jth column of M.

Figure 3A compares the fidelity between the exact and estimated response matrices for the different models considered. For a fair comparison, the same total number of shots is used to estimate each model, and the fidelities are averaged over 50 trials. The exact response matrix used here is one we directly measured for 5 qubits on ibmq_manhattan and so provides a realistic picture of how the schemes fare on current devices. For full mitigation and the TPN model, the target is the exact response matrix used to simulate the error process, while for BFA and the combination BFA + TPN, the target is the symmetrized version of the exact matrix (as in Eq. 9).
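The column-averaged classical fidelity described above is straightforward to compute. A minimal sketch (not the authors' code, and the function names are our own):

```python
import numpy as np

def classical_fidelity(p, q):
    """Classical fidelity (sum_x sqrt(p_x q_x))^2 for probability vectors."""
    return np.sum(np.sqrt(p * q)) ** 2

def response_matrix_fidelity(M, N):
    """Average column-wise classical fidelity between two response matrices,
    treating each column as a distinct probability distribution."""
    return float(np.mean([classical_fidelity(M[:, j], N[:, j])
                          for j in range(M.shape[1])]))
```

Identical matrices give a fidelity of 1, and matrices with disjoint column supports give 0, matching the behavior expected of a figure of merit bounded in [0, 1].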
Fig. 3.
Scaling of the response matrix fidelities with total calibration shots.
(A) Readout errors simulated using M measured for n = 5 qubits on ibmq_manhattan. The fidelity, Eq. 18, is calculated relative to the exact response matrix M (Full Mitigation and TPN) or its symmetrized counterpart (BFA and BFA + TPN). (B) As in (A), using the same response matrix but with artificially boosted correlations (with boosting factor γ = 20; see section on “Boosting correlations in readout errors”). For both plots, the fidelities are averaged over 50 repeats, with error bars showing the middle 95% of the data.
In comparison to the simplified schemes, full mitigation requires far more measurements to converge to a good recreation of the true response matrix. BFA converges very quickly to the maximum response matrix fidelity, requiring around two orders of magnitude fewer calibration shots than full mitigation to reach comparable fidelities. The schemes using TPN converge to F_M → 1, indicating that the TPN assumption (i.e., independent readout errors) is justified for this particular experimentally measured response matrix. The combination BFA + TPN yields the best fidelity with the fewest calibration shots, as it is the most parameter-efficient model that still captures the true readout error process.

As discussed in the section on “Averaging-out readout errors with bit-flips,” the same calibration measurements are required to infer the BFA and BFA + TPN response matrices. In this instance, BFA + TPN was the fastest model to yield a useful description of the error process. However, in situations where significant cross-talk leads to correlated readout errors, the TPN approximation of independent errors becomes invalid. It is therefore helpful to also consider the scaling of the different schemes in a case where the TPN assumption manifestly fails.
Response matrix measurement with cross-talk
To provide a toy model for cross-talk–induced correlated errors, we modify the experimentally measured response matrix M to artificially amplify correlations between readout errors on adjacent qubits. Specifically, we boost the probability of particular syndromes to get a new response matrix Mγ. The strength of the amplification is parameterized by a boosting factor γ (see section on “Boosting correlations in readout errors” for details). Figure 3B shows how F_M scales with the number of calibration shots. The 5-qubit response matrix used here is that of Fig. 3A but correlation-boosted with γ = 20. Here, γ = 20 was chosen to highlight the contrast between the TPN-based schemes and those capable of dealing with correlated errors.

We see that in the presence of correlated errors, BFA provides a clear advantage, converging to the optimal fidelity with far fewer shots (around a factor of 2^5 = 32) than required for full mitigation. The two TPN-based models saturate at fidelities well below the optimal value of 1, as they cannot account for correlated errors by construction. The fidelities obtained take longer to saturate than in Fig. 3A because the underlying readout error process with boosted correlations is more complicated than the original response matrix (which can be accurately described with TPN).

The advantage of BFA over full mitigation will become increasingly apparent as more qubits are measured, as demonstrated in Fig. 4. The plot shows the fidelity of (γ = 20) boosted response matrices taken on ibmq_manhattan, estimated for different numbers of qubits. At each (n-qubit) point, 2^n × 100 simulated shots were used to estimate the response matrix.
Fig. 4.
Demonstrating the advantage of BFA with increasing numbers of qubits n.
For each n, the response matrix was measured on ibmq_manhattan, and the correlations between neighboring qubits boosted with γ = 20. This matrix was then used to sample 2^n × 100 shots and estimate the response matrix fidelities. The error bars (sampling error) show the middle 95% of 50 fidelity estimates.
Again, the full mitigation scheme has to share the 2^n × 100 budget among the 2^n input basis states, leading to increasingly severe sampling errors. While the TPN (and BFA + TPN) schemes have much lower sampling error, they both suffer from their inability to express correlated errors. This limit is less of a problem for BFA + TPN, as the BFA symmetrization helps reduce the effect of correlations, and the fewer free parameters further reduce sampling errors. In particular, we note that the response matrix estimated by BFA + TPN is a better approximation to the γ-boosted response matrix than is managed by TPN alone. This hints at a further advantage of the BFA scheme: averaging over different error probabilities damps biased correlations in the error model, yielding an effective error model that is closer to TPN. Last, we note that full BFA obtains by far the best fidelity, as it most effectively balances sampling errors and (correlation) model expressibility.
Simulating graph state fidelity measurement
For direct comparison with previous work, we consider the example given in (). We demonstrate our BFA scheme in a practical context by considering the problem of measuring the fidelity of a linear graph state of varying numbers of qubits. Again, we compare combinations of full mitigation and TPN models with BFA on simulated measurements using the experimentally measured response matrices from ibmq_manhattan. For a linear array of n qubits with initial state ∣+〉^⊗n, a linear graph state ∣g〉 is created by applying controlled-Z gates to adjacent qubits. This graph state has a stabilizer group S generated by the set of Pauli operators K_i = Z_{i−1} X_i Z_{i+1} (dropping the Z operators that fall off the ends of the chain). The state fidelity can be measured by averaging the expectation values of the elements of S. As a simplification, and to ensure we are making consistent comparisons, we only measure the generators of the stabilizer group themselves, providing an approximation to the fidelity.

Figure 5A shows this approximation of the fidelity found for simulations in which graph states of different sizes are prepared noiselessly and then measured with readout errors. Readout error mitigation is then applied using the four schemes. The response matrices used to simulate the readout errors were those experimentally measured on ibmq_manhattan. For comparison, the fidelities with no mitigation applied and those for mitigation using the exact response matrix of the noise process are also shown. For the simulation of an n-qubit graph state, a budget of 100 × 2^n calibration shots was allowed for measurement of the model’s response matrix, while 10^5 shots were used for each of the two circuits used to measure the stabilizer generators.
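The stabilizer generators of a linear graph state can be written down directly. The sketch below assumes the generator convention Z_{i−1} X_i Z_{i+1} with qubit 0 leftmost in the Pauli string; both the function name and the string convention are our own:

```python
def linear_graph_generators(n):
    """Pauli strings K_i = Z_{i-1} X_i Z_{i+1} for an n-qubit linear graph
    state, dropping the Z operators that fall off the ends of the chain.
    Qubit 0 is the leftmost character of each string."""
    gens = []
    for i in range(n):
        ops = ["I"] * n
        ops[i] = "X"
        if i > 0:
            ops[i - 1] = "Z"
        if i < n - 1:
            ops[i + 1] = "Z"
        gens.append("".join(ops))
    return gens
```

For n = 3 this gives ["XZI", "ZXZ", "IZX"]. Note that the even-indexed and odd-indexed generators each act on every qubit with at most one of X or Z, which is why two measurement settings suffice to estimate all n generators.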
Fig. 5.
Demonstrating the effectiveness of readout error mitigation schemes for observable (graph state fidelity) estimation.
(A) Simulated estimates of graph state fidelity for varying numbers of qubits. For an n-qubit graph state, 100 × 2^n calibration shots were used to estimate the response matrix, and 10^5 shots were used for the circuits to measure the fidelities. The readout errors were generated from response matrices directly measured on ibmq_manhattan. For comparison, the mitigation-free and exactly mitigated (using the ibmq_manhattan response matrices) fidelities are also shown. (B) Root-mean-square fidelity error with respect to the mean of exact mitigation. For both plots, the data are averaged over 50 repeats, with error bars indicating the middle 95% of the population.
The two schemes incorporating BFA perform very well, particularly with increasing system size, giving fidelities close to those achieved when mitigation is performed using the exact response matrix (i.e., perfect knowledge of the error process). However, given this calibration budget, full mitigation performs significantly worse than the other schemes. As in previous simulations, the full scheme only gets 100 calibration shots for each of the 2^n input states, resulting in high sampling errors.

It is tempting to quantify the performance of the mitigation schemes in terms of how close their fidelity is to the optimal value of 1, as we are performing simulations without gate errors. However, because we are calibrating the response matrix and calculating expectation values on the basis of finite shots, noncorrectable sampling noise remains the limiting factor. Therefore, a scheme’s performance should be compared against exact mitigation with the response matrix used to simulate the error process. This gives the fairest estimate of the maximum improvement possible with these classical postprocessing-based mitigation strategies.

Figure 5B shows the average root-mean-square fidelity error of the different models with respect to the mean “exact mitigation” result, √(E[(F − F̄_exact)^2]), where the expectation is taken over the 50 trials.
Again, we see that the full mitigation scheme consistently underperforms the others, suffering from far greater errors relative to exact mitigation. The TPN model performs similarly to the two bit-flipping schemes until n > 5, where BFA + TPN outperforms TPN alone, indicating biased correlations that BFA averages out (cf. Figs. 3B and 4). For larger system sizes, both BFA-based schemes replicate the exact mitigation results with remarkable accuracy. The simplified schemes benefit from receiving exponentially many more measurement shots per calibration circuit than full mitigation, markedly reducing sampling error. A more accurate response matrix means the mitigation yields results closer to what would be found using the response matrix that exactly generates the error process.
DISCUSSION
We have presented BFA, an effective scheme for readout error mitigation on near-term qubit devices. We demonstrate that BFA can augment, and consistently outperform, other measurement error mitigation strategies, as it always simplifies the underlying error model. This simplification allows the response matrix to be measured using far fewer resources than would otherwise be required. Furthermore, all BFA-augmented error models are calibrated from the same measurements, allowing these measurements to inform the choice of model. In particular, BFA + TPN, BFA + full matrix mitigation, and all other combined schemes only require measurement of the state ∣0…0〉.

BFA works by applying bit-flips to random qubits (premeasurement) and subsequently undoing these bit-flips in the classical readout result. This greatly simplifies the observed measurement response matrix, removing all biases toward particular input states. This bias is separate from readout error correlations, and so BFA does not impose any assumptions about the error process. The resulting response matrix admits a highly symmetric form; we derive a general analytic expression for its inverse and show that it can be calculated with vector-matrix multiplication.

We benchmark BFA using numerical simulations estimating response matrix and quantum state fidelities and examine the role of correlated readout errors. The simulations are based on empirical response matrices measured on an IBM quantum device. Our results show that BFA can accurately estimate the response matrix with as many or fewer measurement shots than required by other schemes. Furthermore, when readout error correlations are artificially boosted, we show that BFA requires orders of magnitude fewer calibration shots to find an optimal error description. Last, we test the performance of BFA on a realistic task: measuring the (simulated) fidelity of a linear graph state in the presence of readout errors.
In each case, BFA results in a more accurate fidelity estimate compared to its non–bit-flipped counterparts.

Implementing our scheme on real devices is technically simple but is prohibitively impractical in the current version of Qiskit. Previous works have demonstrated that classical error models accurately describe on-device readout errors (, , ), which encouragingly suggests that BFA will continue to surpass other mitigation strategies on physical devices. We stress that practical implementation only requires minor changes to device access, and the BFA method itself only adds (effectively free) quantum bit-flips and classical postprocessing.

BFA is a useful tool for NISQ-era quantum computing, allowing noisy measurements to be mitigated even in the presence of significant readout error correlations. This provides more freedom in the fabrication of solid-state quantum devices, allowing more compact qubit layouts and greater connectivity. Efficiently mitigating correlated errors is particularly important for the current generation of quantum processors, where high-quality devices are in high demand but short supply. Being able to perform small tasks on lower-quality devices without being significantly disadvantaged by readout noise helps alleviate the throughput issues that currently limit the effectiveness of near-term quantum algorithms.
MATERIALS AND METHODS
Simulations
Our simulations were performed using Qiskit’s QasmSimulator (). The circuits were simulated noiselessly and sampled finitely many times. The results of these measurement shots were then fed into the classical model described in Eq. 1 to simulate the readout error process. If a computational basis state ∣j〉 (expressed in binary) is measured, then the final output is sampled from the probability distribution given by the jth column of the target response matrix.
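The column-sampling step described above can be written compactly. This is a minimal sketch of the classical readout error channel, not the authors' code; the function name is our own:

```python
import numpy as np

def apply_readout_errors(ideal_outcomes, M, rng=None):
    """Simulate the readout error process of Eq. 1: for each ideal outcome
    |j> (integer basis-state label), sample a noisy observed outcome from
    the probability distribution given by the jth column of M."""
    rng = np.random.default_rng(rng)
    dim = M.shape[0]
    return np.array([rng.choice(dim, p=M[:, j]) for j in ideal_outcomes])
```

With the identity response matrix the outcomes pass through unchanged, while a column-stochastic M with off-diagonal weight scrambles them with the corresponding probabilities.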
Derivation of inverse BFA-simplified response matrix
Starting from Eq. 9, we can diagonalize M̄ using the Hadamard matrix

H^⊗n M̄ H^⊗n = Σ_s c_s Z^(s)

where Z^(s) is defined in the same way as X^(s). While M̄ is a classical matrix, we note that its analytical form, Eq. 9, lends itself to simple manipulations using the Pauli algebra. As this is now a sum over diagonal Pauli Z matrices, the eigenvalues {λ_i} of M̄ are just the diagonal elements of this transformed matrix. The ith diagonal element of the Pauli operator Z^(s) is given by

(Z^(s))_ii = (−1)^(i·s)

where i · s is the dot product between binary-vector representations of the integers i and s. This can be rewritten in terms of the elements of the Hadamard operator H^⊗n, giving (Z^(s))_ii = √(2^n) (H^⊗n)_is. The diagonal elements of H^⊗n M̄ H^⊗n, and so the eigenvalues of M̄, are then given by

λ_i = Σ_s (−1)^(i·s) c_s.

Now that we have shown Eq. 10 and obtained the eigenvalues of M̄, we can find the inverse M̄^−1. We do this by finding the projections of the diagonalized inverse matrix H^⊗n M̄^−1 H^⊗n = Λ^−1 = diag(1/λ_0, …, 1/λ_(2^n − 1)) onto the different Pauli Z matrices {Z^(s)}: after the diagonalizing transformation is undone (sending Z^(s) → X^(s)), these projections give us the coefficients in Eq. 11. As Tr(Z^(s) Z^(s′)) = 2^n δ_ss′, these components can be found by taking the trace of the diagonalized inverse multiplied by the different Z^(s).
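Because the eigenvalues are a (−1)^(i·s)-weighted sum over the coefficients c_s, the whole inversion reduces to two Walsh-Hadamard transforms. The sketch below is ours, under the assumption (consistent with the syndrome picture) that c_s is the first column of M̄, since X^(s)∣0…0〉 = ∣s〉:

```python
import numpy as np

def hadamard_transform(v):
    """In-place-style unnormalized fast Walsh-Hadamard transform:
    returns w with w_i = sum_s (-1)^(i.s) v_s for a length-2^n vector."""
    v = np.array(v, dtype=float)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def invert_bfa_response(first_column):
    """Invert the BFA-symmetrized response matrix M-bar given its first
    column c (c_s = probability of error syndrome s). Returns the first
    column d of M-bar^{-1}; entry (j, k) of the inverse is d[j ^ k]."""
    eigenvalues = hadamard_transform(first_column)   # lambda_i = sum_s (-1)^(i.s) c_s
    return hadamard_transform(1.0 / eigenvalues) / len(first_column)
```

This realizes the vector-level inversion mentioned in the Discussion: no 2^n × 2^n matrix ever needs to be built or inverted, only length-2^n vectors transformed.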
Boosting correlations in readout errors
To amplify the errors on adjacent qubits, we multiply the probability that an error occurs with syndrome S by γ^n(S), where γ is a boosting factor and n(S) is the number of adjacent 1-valued bit pairs in the syndrome S. The response matrix is then renormalized. In our simulations, we take adjacent to mean pairs of qubits that can be acted on with a two-qubit gate. The response matrices measured on ibmq_manhattan were for qubits connected in a linear chain, meaning that, for example, the error probability p(01001∣10100), which has syndrome 01001 ⊕ 10100 = 11101, would be multiplied by γ^2, as there are two pairs of adjacent qubits that have undergone readout errors.
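The γ^n(S) boosting can be sketched directly from this description. This is our own minimal implementation (function names are ours), with column-wise renormalization as stated above:

```python
import numpy as np

def adjacent_pairs(syndrome, n_qubits):
    """Count adjacent 1-valued bit pairs n(S) in an integer-encoded syndrome."""
    bits = [(syndrome >> k) & 1 for k in range(n_qubits)]
    return sum(bits[k] & bits[k + 1] for k in range(n_qubits - 1))

def boost_correlations(M, gamma):
    """Multiply each error probability p(j|k), whose syndrome is s = j XOR k,
    by gamma^n(s), then renormalize each column of the response matrix."""
    n_qubits = int(np.log2(M.shape[0]))
    boosted = M.astype(float).copy()
    for k in range(M.shape[0]):
        for j in range(M.shape[0]):
            boosted[j, k] *= gamma ** adjacent_pairs(j ^ k, n_qubits)
        boosted[:, k] /= boosted[:, k].sum()
    return boosted
```

As a check, the syndrome 11101 from the example above has n(S) = 2, γ = 1 leaves any column-stochastic matrix unchanged, and the boosted matrix remains column stochastic.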