
What can be observed in real time PCR and when does it show?

Pavel Chigansky¹, Peter Jagers², Fima C Klebaner³.

Abstract

Real time, or quantitative, PCR typically starts from a very low concentration of initial DNA strands. During iterations the numbers increase, first essentially by doubling, later predominantly in a linear way. Observation of the number of DNA molecules in the experiment becomes possible only when it is substantially larger than the initial numbers, and it is then possibly affected by the randomness in individual replication. Can the initial copy number still be determined? This is a classical problem and, indeed, a concrete special case of the general problem of determining the number of ancestors, mutants or invaders of a population observed only later. We approach it through a generalised version of the branching process model introduced in Jagers and Klebaner (J Theor Biol 224(3):299-304, 2003. doi: 10.1016/S0022-5193(03)00166-8), and based on Michaelis-Menten type enzyme kinetical considerations from Schnell and Mendoza (J Theor Biol 184(4):433-440, 1997). A crucial role is played by the Michaelis-Menten constant being large, as compared to initial copy numbers. In a strange way, determination of the initial number turns out to be completely possible if the initial rate v is one, i.e. all DNA strands replicate, but only partly so when v < 1, and thus the initial rate or probability of successful replication is lower than one. Then, the starting molecule number becomes hidden behind a "veil of uncertainty". This is a special case of a hitherto unobserved general phenomenon in population growth processes, which will be addressed elsewhere.


Keywords: Branching processes; Initial number; Michaelis–Menten; PCR; Population dynamics; Population size dependence

Substances: DNA

Year: 2017 PMID: 28667372 PMCID: PMC5772144 DOI: 10.1007/s00285-017-1154-1

Source DB: PubMed Journal: J Math Biol ISSN: 0303-6812 Impact factor: 2.259

Introduction

In the polymerase chain reaction a molecule replicates with a probability p(z), which will be of the form

$$p(z) = \frac{C}{K+z}$$

under the assumption of Michaelis–Menten kinetics. Here, K is the Michaelis–Menten constant, large in terms of molecule numbers, z the number of DNA molecules at the actual round, and C a constant, which can be written as vK, where v is the maximal rate or speed of the reaction, corresponding to $z = 0$. Then, v, $0 < v \le 1$, is the probability of successful replication under the most benign circumstances, and the decrease of p(z), as the number z of DNA strands present increases, mirrors that the latter are being synthesized from DNA building blocks, which disappear as the number of DNA molecules increases.

As has been observed recently, though this is the general pattern, there are exceptions where the replication probability actually increases in the very first generation, due to impurities in templates (Ståhlberg et al. 2016). In this paper we disregard this and rely upon the Michaelis–Menten based approach in Jagers and Klebaner (2003), where it was used to explain the first exponential but later linear growth of molecule numbers, see also Best et al. (2015), Lalam et al. (2004), Lievens et al. (2012). For a statistical analysis, where PCR is modeled by branching processes without environmental change due to growth but with random effects and starting numbers, cf. Hanlon and Vidyashankar (2011).

Here we turn to the important task of determining the initial number, viewed as unknown but fixed, of molecules in a PCR amplification, i.e. classical quantitative PCR. In the literature, it has been treated under the simplifying assumption of constant replication probabilities p(z), cf. Olofsson (2003), Vikalo et al. (2007). For an experimental approach based on differentiation see Swillens et al. (2004), and for a mathematical paper, focussing however on mutations in an abstract formulation, see Piau (2005). Through the use of digital PCR (Vogelstein and Kinzler 1999) and barcoding (Best et al. 2015; Ståhlberg 2016, personal communication) new possibilities and techniques have been introduced. We hope to be able to treat such frameworks.

The present work should be suitable for calibration and interpolation of density values in real time PCR (Kubista 2016, personal communication) in the usual way. Observed values yield model parameter estimates. Thus specified, the model delivers predictions of missing values.

In our setup, the value of v turns out to be crucial, the cases $v < 1$ and $v = 1$ yielding quite different situations. If the starting efficiency $v < 1$, then individual molecules replicate randomly and essentially independently during an initial phase. By branching process theory their number will therefore, to begin with, grow like the product of a random factor and the famous exponential population growth. Randomness is therefore an essential part of the initial conditions of later phases with more of interaction with the environment but also more of deterministic structure, due to law of large numbers effects. It is in this sense that the original starting number has been hidden by a 'veil of uncertainty'. If, on the other hand, $v = 1$, the first observable process size can be inverted to yield the starting number.

This phenomenon is what we investigate, for PCR in the present paper and for populations in habitats with a finite carrying capacity in a companion paper (Chigansky et al. 2017), cf. also Barbour et al. (2015, 2016). For somewhat related early examples from epidemic processes and a recent one from population genetics, cf. Kendall (1956), Whittle (1955), Martin and Lambert (2015).

Mathematical setup

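Denote the number of molecules in the n-th PCR cycle by $Z_n$, $n = 0, 1, \ldots$, so that $Z_n$ can be viewed as generated by the recursion

$$Z_n = Z_{n-1} + \sum_{j=1}^{Z_{n-1}} \xi_{n,j} \qquad (1)$$

started at $Z_0$, where the $\xi_{n,j}$'s are Bernoulli random variables taking values 1 and 0 with complementary probabilities, and

$$P(\xi_{n,j} = 1 \mid \mathcal{F}_{n-1}) = p(Z_{n-1}) = \frac{vK}{K + Z_{n-1}},$$

where $\mathcal{F}_{n-1}$ denotes the sigma-algebra of the events observable before time n. Consider the process $X_n = Z_n/K$, which we shall call the density process. An important role in its behaviour is played by the function

$$f(x) = x + \frac{vx}{1+x}, \qquad (2)$$

which is, indeed, the conditional expectation of $X_n$ given $X_{n-1}$,

$$E(X_n \mid X_{n-1}) = f(X_{n-1}).$$

The following result is known, see Kurtz (1970), Klebaner (1993).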

Theorem 1

Suppose that $X_0 \to x_0$, as $K \to \infty$. Then, for any n,

$$X_n \to f_n(x_0) \quad \text{in probability},$$

where $f_n$ denotes the n-th iterate of f.

If the PCR starts from a fixed number $Z_0$ of molecules, clearly $X_0 = Z_0/K \to 0$. Since $f(0) = 0$, also $f_n(0) = 0$, for any n, and it follows that $X_n \to 0$ for any n. In other words, the limiting reaction is not observable at any fixed number of repetitions. The main result of this paper is that it becomes observable when the number of iterations is $\log_b K$, where $b = 1 + v$.

To arrive at the result we make use of a linear replication process $Y_n$, in which the probability of successful molecular replication is constant and equal to v. In each round each molecule is thus replaced by two with probability v, but remains there alone with probability $1 - v$. The expected number of successors is thus $1 + v = b$. Mathematically, this process is given recursively by [see e.g. Haccou et al. (2007), Harris (2002) or Jagers (1975)]

$$Y_n = Y_{n-1} + \sum_{j=1}^{Y_{n-1}} \eta_{n,j}, \qquad (3)$$

where the $\eta_{n,j}$ are independent Bernoulli random variables with

$$P(\eta_{n,j} = 1) = 1 - P(\eta_{n,j} = 0) = v.$$

Since the $Y_n/b^n$ constitute a uniformly integrable martingale, it has an a.s. limit

$$W = \lim_{n\to\infty} \frac{Y_n}{b^n}$$

with $E W = 1$, provided $Y_0 = 1$. If the process starts from $Z_0$ molecules, then in view of the branching property, the corresponding limit is

$$W(Z_0) = \sum_{j=1}^{Z_0} W_j, \qquad (4)$$

where the $W_j$ are i.i.d. with the same continuous distribution as W. As is well known from branching process theory (see e.g. Theorem 8.2 in Harris (2002)), the moment generating function of the latter, $\phi(s) = E e^{-sW}$, $s \ge 0$, is unique among moment generating functions satisfying the functional equation

$$\phi(ms) = h(\phi(s))$$

subject to $\phi'(0) = -1$, where $m = E Y_1$ and $h(s) = E s^{Y_1}$ for $Y_0 = 1$. In our case, it takes the form

$$\phi(bs) = v\,\phi(s)^2 + (1 - v)\,\phi(s).$$

The random variable $W(Z_0)$ appears in the main result as an argument of the deterministic function H obtained as the limit

$$H(x) = \lim_{n\to\infty} f_n\left(\frac{x}{b^n}\right). \qquad (5)$$

Its existence and some properties are studied in the next section. Here we formulate the main result and an important corollary.

Theorem 2

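Let $X_n = Z_n/K$ be the density process and start the PCR amplification from $Z_0$ molecules. Then $X_{\log_b K}$ converges in distribution,

$$X_{\log_b K} \xrightarrow{D} H(W(Z_0)) \quad \text{as } K \to \infty, \qquad (6)$$

along any subsequence, such that $\log_b K$ are integers.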
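The statement can be explored numerically. The following is a minimal sketch, assuming the replication probability p(z) = vK/(K + z) from the Introduction and the molecule-by-molecule recursion of the Mathematical setup. The function names and the parameter values (K = 2^12, v = 1, three starting molecules) are illustrative choices, not taken from the paper.

```python
import math
import random


def replication_probability(z, K, v):
    """Michaelis-Menten type replication probability p(z) = v*K / (K + z)."""
    return v * K / (K + z)


def simulate_pcr(z0, K, v, cycles, rng):
    """Return [Z_0, ..., Z_cycles] for the size-dependent replication process."""
    counts = [z0]
    z = z0
    for _ in range(cycles):
        p = replication_probability(z, K, v)
        # Each of the z molecules present replicates independently with probability p.
        z += sum(1 for _ in range(z) if rng.random() < p)
        counts.append(z)
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    v, z0 = 1.0, 3            # illustrative values
    b = 1.0 + v
    K = 2 ** 12               # chosen so that log_b K is an integer
    n_obs = round(math.log(K, b))
    for run in range(5):
        path = simulate_pcr(z0, K, v, n_obs + 3, rng)
        # Densities Z_n / K from cycle log_b K onwards, where they become visible.
        print(run, [round(z / K, 3) for z in path[n_obs:]])
```

Before cycle log_b K the densities Z_n/K are close to zero, which is why the sketch prints them only from that cycle onwards.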

Remark 1

With $v = 1$, the linear process $Y_n$ grows deterministically at the geometric rate $b = 2$, and in this case $W(Z_0) = Z_0$. As will be increasingly clear, there are, however, reasons to treat the case $v = 1$ separately.
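The contrast between the two cases can be seen in a small simulation of the linear replication process. This is a sketch under the assumption that the limit W(Z_0) is approximated by Y_n/b^n, with b = 1 + v, for a moderate number of generations n. The function names and the choice n = 12 are hypothetical.

```python
import random


def simulate_linear(z0, v, generations, rng):
    """Linear replication: every molecule doubles with constant probability v."""
    y = z0
    for _ in range(generations):
        y += sum(1 for _ in range(y) if rng.random() < v)
    return y


def sample_w(z0, v, generations, rng):
    """Approximate W(z0) by the normalised count Y_n / b**n, b = 1 + v."""
    b = 1.0 + v
    return simulate_linear(z0, v, generations, rng) / b ** generations


if __name__ == "__main__":
    rng = random.Random(7)
    # v = 1: every molecule replicates, so the normalised count equals z0.
    print([sample_w(3, 1.0, 12, rng) for _ in range(3)])
    # v < 1: the normalised count is random; this is the "veil of uncertainty".
    print([round(sample_w(3, 0.8, 12, rng), 3) for _ in range(3)])
```

With v = 1 every draw returns the starting number itself, whereas with v < 1 repeated runs from the same starting number give different values.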

Corollary 1

For and any fixed n where denotes the n-th iterate of f andThis assertion extends to weak convergence of the sequences regarded as random elements in :

Remark 2

The limits increase strictly with respect to n. If $v < 1$, their entries are continuous random variables with positive variance, whereas if $v = 1$ they are positive reals. If the limit in (6) is taken along an arbitrary subsequence K, then is asymptotic to the same limit up to a deterministic correction, which emerges in the rounding:

The limit function H(x)

Existence

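We have two expressions for f, namely (2) and

$$f(x) = bx - \frac{vx^2}{1+x}, \qquad (7)$$

where $b = 1 + v$. This expression is more suitable for analysis of iterates of f near zero. It is easy to establish that f is increasing, which yields that all $f_n$ are increasing. Since $f(x) \le bx$ for any $x \ge 0$,

$$f\left(\frac{x}{b^{n+1}}\right) \le \frac{x}{b^n}.$$

Hence

$$f_{n+1}\left(\frac{x}{b^{n+1}}\right) = f_n\left(f\left(\frac{x}{b^{n+1}}\right)\right) \le f_n\left(\frac{x}{b^n}\right),$$

and the sequence $f_n(x/b^n)$ is monotone decreasing in n for any positive x. Therefore the following limit in (5) exists,

$$H(x) = \lim_{n\to\infty} f_n\left(\frac{x}{b^n}\right).$$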

Continuity

We show next that the convergence in (5) is uniform on bounded intervals. First observe thatIt is now easy to see by induction, that for any n and x Next, by (7) the Taylor expansion readsfor an appropriate . Replace now x by to haveHence we obtainwhere we have used that . The bound (8) shows that the seriesconverges uniformly on compacts. As a consequence of uniform convergence, we have that H is continuous.
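Since H is defined as a limit of iterates, it can be approximated by stopping the iteration at a finite n. The sketch below assumes f(x) = x + vx/(1 + x) and b = 1 + v, and uses the n-th term f_n(x/b^n) as a stand-in for H(x). The cut-off n = 60 is an arbitrary choice that is adequate when v is not close to zero; for small v a larger n would be needed.

```python
def f(x, v):
    """One deterministic step of the density recursion, f(x) = x + v*x/(1 + x)."""
    return x + v * x / (1.0 + x)


def H(x, v, n=60):
    """Approximate H(x) = lim f_n(x / b**n) by its n-th term, b = 1 + v."""
    b = 1.0 + v
    y = x / b ** n
    for _ in range(n):
        y = f(y, v)
    return y


if __name__ == "__main__":
    for v in (0.5, 0.8, 1.0):
        print(v, [round(H(x, v), 4) for x in (0.05, 0.5, 1.0, 2.0, 5.0)])
```

Because the sequence f_n(x/b^n) decreases in n and starts at x, the computed values lie below the diagonal, and they stay close to it for small x.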

The functional equation

Further, since , by taking the limit as , we obtain that H solves Schröder’s functional equationHowever, since the zero function is a solution, we must show that H is not identically zero. is also a solution, it is however directly excluded, since convergence is from above, . To show that H is positive, use (7) to obtain the following formula for the n-th iteratewhere, as usual, . Replacing x with , we haveClearly, , and , thereforeandHence from (10), for any n which is strictly positive for . Therefore in this domain.

Monotonicity

We show next that H is increasing. Let . Then each is increasing and thus does not decrease. Further, recall thatand for all . Hence for any ,andTaking the limit , we see that H(x) is a strictly increasing function on an open vicinity of the origin. Suppose now that H is constant on an interval with . Then, by (9), for any integer and, since H(x) does not decrease, it must be constant on all the intervals . In particular, H(x) cannot be strictly increasing on any open vicinity of the origin. The obtained contradiction shows that H is strictly increasing everywhere on . Next, since we have shown that the converge uniformly,for any as . Thus we have the following corollary needed in the proofs to come.

Corollary 2

We shall also need the inverse $H^{-1}$. It is easy to see that it solves the functional equation

$$H^{-1}(f(x)) = b\,H^{-1}(x).$$

Proofs

Let us start with the fundamental recursive equation for the stochastic density process (cf. Klebaner 1993)withNote that is a martingale difference sequence andThe corresponding deterministic recursion, obtained by omitting the martingale difference term, is

Proof of Theorem 2

In what follows bar denotes the density processes, i.e., , . Consider first the case $v < 1$. Define times where is an arbitrary fixed constant and K is such that both and are integers. The crux of the proof is to approximate the density process in two steps. First, on the interval by the linear process , and then on the interval by the nonlinear deterministic recursion, however started from the random point , resulting from the first step. Denote by the flow, generated by the nonlinear deterministic recursion (13), i.e. its solution at time , when started from x at time k, . Further, write for the stochastic flow generated by the nonlinear process X, that is, the random map defined by the solution of the equation, cf. (1), subject to , at the terminal time . In particular, for any , and Let us stress that all the random objects here are defined on the same probability space and by construction coupled as described at the beginning of the proof. In the next steps we show that and By (4), with , we may write and hence Therefore, (14) follows from Corollary 2, To show (15) let for Subtracting the deterministic recursion (13) from the stochastic one (11) we have Thus the sequence satisfies where we have used (12) to bound . Note that , as both recursions start at the same point at time . Therefore since and (15) now follows. The proof of (16) is more delicate and is done by coupling. We construct the nonlinear and linear replication processes and on the same probability space as follows. Let be i.i.d. random variables with the uniform distribution on [0, 1]. Define Then and are realized by the formulae (1) and (3) with and as above. Since , we have for all n, j and therefore the linear process Y is always greater than the nonlinear process Z, Construct an auxiliary linear process , which bounds from below until gets larger than for . Actually we require that . Let and Then clearly, as long as . Hence It is also clear that for all n, j, hence .
Thus we obtain We show next that by using the inequality above. Since the moments of simple Galton–Watson processes are easily computed (Theorem 5.1 in Haccou et al. (2007), Harris (2002), or Jagers (1975)) Since also, the first term in (17) satisfies By the Cauchy–Schwarz inequality for the second term Since for all n, it takes longer for the former process to reach than the corresponding time for the latter, Therefore where the last bound is Doob's inequality for the martingale . Taking into account that , we obtain from the above estimates Recall that . It follows that the convergence to the limit in (18) holds in , and in probability. For the corresponding densities, we have by dividing through by K that Since $f'(0) = b$ and the function f is concave ($f'' < 0$), its derivative attains its maximum value at zero, and $f'(x) \le b$ for any $x \ge 0$. Therefore . For and , this and (19) yields and the proof of case $v < 1$ is complete.

Consider now the case $v = 1$. In this case, the probability of successful replication is

$$p(z) = \frac{K}{K+z}$$

and the function f is

$$f(x) = x + \frac{x}{1+x}.$$

Here $b = 2$ and

$$H(x) = \lim_{n\to\infty} f_n\left(\frac{x}{2^n}\right).$$

The proof is the same, except that the linear replication process is in fact deterministic, $Y_n = 2^n Z_0$, if it starts with $Z_0$ molecules, because the probability of replication is 1. Hence the limit $W(Z_0) = Z_0$. The theorem is proved.

Proof of Corollary 1

The result follows by induction on n from the fundamental representation (11). For $n = 0$ it is the statement of the main result. For $n = 1$ take limits as $K \to \infty$ in (11), and note that the stochastic term vanishes. Similarly, having proved it for n, it follows for $n + 1$. The functional limit theorem follows from finite dimensional convergence implying convergence in the sequence space, cf. Billingsley (1999, p. 19).

The relation to actual observations

Let denote the minimal observable concentration of DNA in the PCR experiment under consideration. Assume that the latter starts from $Z_0 = z$ initial templates, where z is an unknown number and . Our aim is to determine z for . Mathematically, we shall interpret this as . In PCR literature based on enzyme kinetic considerations, values of the Michaelis–Menten constant range at least from (Lalam 2006) up to (Gevertz et al. 2005), in terms of molecule numbers. There are then two cases, known or unknown rate v. In the latter situation, v will have to be estimated from the observed concentrations. Further, as pointed out, the cases $v < 1$ and $v = 1$ exhibit an intriguing disparity, viz. consider first $v < 1$. By Corollary 1 The limit process here has strictly increasing trajectories and its entries have continuous distributions, so with probability one none of them equals . The first hitting time being a discontinuous functional with respect to the locally uniform metric on the space of sequences, is however continuous almost surely under the limit law. Therefore converges weakly to If $v = 1$, the limit sequence is deterministic and strictly increasing. Provided no happens to coincide with , we have weak convergence . Otherwise, still exists and differs at most by 1 from . We disregard this nuisance and assume in both cases that we have observed concentration values strictly larger than from onwards: , and correspondingly for , , , (to ease notation, we omit the dependence of upon .) By (9) this simplifies to for and otherwise. Note that typically, since the experimenter would like to catch the density as early as possible, , which for example could be of the order of 0.05. Since H(x) is fairly close to the diagonal for (see Figure 1) and , we can conclude that as a rule .

Fig. 1

The function H(x) for several values of v

As well as assuming K and known it is easy to think of situations where so is v. Then we can proceed directly to determining z. For $v = 1$ this is straightforward: More generally, If there is variation between the z-values thus obtained we can of course take arithmetic means of the right hand side for the different observed j. Now, if $v < 1$, we obtain in the sense that the right hand side is an observed value of the random variable W(z). The initial number z of DNA molecules has now been hidden from direct calculation. What can be done is to estimate z from data, e.g. maximise the density at the first point of observation, where * denotes convolution power, is the density of W, which we know to have the moment generating function from Sect. 2, corresponding to v. In this, z is an unknown parameter and we obtain a maximum likelihood estimate , where and z ranges over natural numbers. Again we can also consider later -values and take averages, if this increases stability. Note that if z is large (but still much smaller than K), then by the local central limit theorem the ML problem is roughly the same as finding z maximizing the normal density with mean z and variance at the point , This yields the estimate or rather one of its neighboring integers. Now, if entities cannot be deduced a priori the question arises to what extent they can be estimated from our sequence of observations. Clearly, in the limit the relation between an observation x and its successor in the next round will be that the latter converges to f(x), as , by Corollary 1. Thus e.g., or These problems are fairly standard in statistical literature but certainly deserve a special investigation in the present context, if possible together with an experimental study of replication of single or few molecules, in order to determine the initial efficiency, v.
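The direct determination of z for v = 1 can be sketched numerically. The displayed formulas of this section are not reproduced here, so the code rests on stated assumptions rather than on the authors' expressions: that for v = 1 the density observed at cycle m is approximately H(2^m z / K), so that z is approximately K H^{-1}(x) / 2^m, and that successive densities satisfy x_{n+1} ≈ f(x_n). The inverse of H is computed from the relation H^{-1}(f(x)) = b H^{-1}(x), using the closed-form inverse of f. All function names are hypothetical.

```python
import math


def f(x, v):
    """One deterministic step of the density recursion, f(x) = x + v*x/(1 + x)."""
    return x + v * x / (1.0 + x)


def f_inverse(y, v):
    """Positive root of x**2 + (b - y)*x - y = 0, written in a cancellation-free form."""
    b = 1.0 + v
    return 2.0 * y / ((b - y) + math.sqrt((b - y) ** 2 + 4.0 * y))


def H(x, v, n=60):
    """Approximate H(x) by the n-th term f_n(x / b**n)."""
    y = x / (1.0 + v) ** n
    for _ in range(n):
        y = f(y, v)
    return y


def H_inverse(y, v, n=60):
    """Approximate the inverse of H by b**n times the n-fold inverse iterate of f."""
    x = y
    for _ in range(n):
        x = f_inverse(x, v)
    return x * (1.0 + v) ** n


def estimate_z_v1(density, cycle, K):
    """Assumed inversion for v = 1: z = K * H^{-1}(density) / 2**cycle."""
    return K * H_inverse(density, 1.0) / 2.0 ** cycle


def estimate_v(x_now, x_next):
    """Solve x_next = f(x_now) for v, assuming the deterministic limit relation."""
    return (x_next - x_now) * (1.0 + x_now) / x_now


if __name__ == "__main__":
    K, z, cycle = 2 ** 20, 5, 22          # illustrative values
    x_obs = H(2.0 ** cycle * z / K, 1.0)  # noise-free density under the assumption
    print(estimate_z_v1(x_obs, cycle, K))
    print(estimate_v(0.3, f(0.3, 0.8)))
```

For v < 1 the same inversion would return an observed value of the random variable W(z) rather than z itself, so a statistical step such as the maximum likelihood estimate described in the text is still required.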

References

1. Digital PCR.

Authors: B Vogelstein; K W Kinzler
Journal: Proc Natl Acad Sci U S A Date: 1999-08-03 Impact factor: 11.205

2. Instant evaluation of the absolute initial number of cDNA copies from a single real-time PCR curve.

Authors: Stéphane Swillens; Jean-Christophe Goffard; Yoann Maréchal; Alban de Kerchove d'Exaerde; Hakim El Housni
Journal: Nucleic Acids Res Date: 2004-03-29 Impact factor: 16.971

3. Mathematical model of real-time PCR kinetics.

Authors: Jana L Gevertz; Stanley M Dunn; Charles M Roth
Journal: Biotechnol Bioeng Date: 2005-11-05 Impact factor: 4.530

4. Estimation of the reaction efficiency in polymerase chain reaction.

Authors: Nadia Lalam
Journal: J Theor Biol Date: 2006-06-09 Impact factor: 2.691

5. Random variation and concentration effects in PCR.

Authors: Peter Jagers; Fima Klebaner
Journal: J Theor Biol Date: 2003-10-07 Impact factor: 2.691

6. Enzymological considerations for a theoretical description of the quantitative competitive polymerase chain reaction (QC-PCR).

Authors: S Schnell; C Mendoza
Journal: J Theor Biol Date: 1997-02-21 Impact factor: 2.691

7. A simple, semi-deterministic approximation to the distribution of selective sweeps in large populations.

Authors: Guillaume Martin; Amaury Lambert
Journal: Theor Popul Biol Date: 2015-02-24 Impact factor: 1.570

8. Enhanced analysis of real-time PCR data by using a variable efficiency model: FPK-PCR.

Authors: Antoon Lievens; S Van Aelst; M Van den Bulcke; E Goetghebeur
Journal: Nucleic Acids Res Date: 2011-11-18 Impact factor: 16.971

9. Computational analysis of stochastic heterogeneity in PCR amplification efficiency revealed by single molecule barcoding.

Authors: Katharine Best; Theres Oakes; James M Heather; John Shawe-Taylor; Benny Chain
Journal: Sci Rep Date: 2015-10-13 Impact factor: 4.379

10. Simple, multiplexed, PCR-based barcoding of DNA enables sensitive mutation detection in liquid biopsies using sequencing.

Authors: Anders Ståhlberg; Paul M Krzyzanowski; Jennifer B Jackson; Matthew Egyud; Lincoln Stein; Tony E Godfrey
Journal: Nucleic Acids Res Date: 2016-04-07 Impact factor: 16.971
