Philipp Thomas1. 1. Department of Mathematics, Imperial College London, London SW7 2AZ, UK p.thomas@imperial.ac.uk.
Abstract
Population growth is often ignored when quantifying gene expression levels across clonal cell populations. We develop a framework for obtaining the molecule number distributions in an exponentially growing cell population taking into account its age structure. In the presence of generation time variability, the average acquired across a population snapshot does not obey the average of a dividing cell over time, apparently contradicting ergodicity between single cells and the population. Instead, we show that the variation observed across snapshots with known cell age is captured by cell histories, a single-cell measure obtained from tracking an arbitrary cell of the population back to the ancestor from which it originated. The correspondence between cells of known age in a population with their histories represents an ergodic principle that provides a new interpretation of population snapshot data. We illustrate the principle using analytical solutions of stochastic gene expression models in cell populations with arbitrary generation time distributions. We further elucidate that the principle breaks down for biochemical reactions that are under selection, such as the expression of genes conveying antibiotic resistance, which gives rise to an experimental criterion with which to probe selection on gene expression fluctuations.
Population growth is often ignored when quantifying gene expression levels across clonal cell populations. We develop a framework for obtaining the molecule number distributions in an exponentially growing cell population taking into account its age structure. In the presence of generation time variability, the average acquired across a population snapshot does not obey the average of a dividing cell over time, apparently contradicting ergodicity between single cells and the population. Instead, we show that the variation observed across snapshots with known cell age is captured by cell histories, a single-cell measure obtained from tracking an arbitrary cell of the population back to the ancestor from which it originated. The correspondence between cells of known age in a population with their histories represents an ergodic principle that provides a new interpretation of population snapshot data. We illustrate the principle using analytical solutions of stochastic gene expression models in cell populations with arbitrary generation time distributions. We further elucidate that the principle breaks down for biochemical reactions that are under selection, such as the expression of genes conveying antibiotic resistance, which gives rise to an experimental criterion with which to probe selection on gene expression fluctuations.
Exploring the consequences of cell-to-cell variability is crucial to understand the functioning of endogenous and synthetic circuitry in living cells [1-4]. In clonal cell populations, differences between cells arise from several dynamical effects. An important source of heterogeneity is the intrinsic stochasticity in biochemical reactions [5-7]. Equally important, but often overlooked, is the substantial variability of individual cell division timings. In fact, two sister cells do not divide at the same time [8-10], which leads to more heterogeneous cell ages in clonal populations. Although both of these factors have been studied independently, their contributions cannot easily be separated when cells are growing and dividing [11,12]. Approaches that allow to investigate the interplay of these effects remain widely unexplored.Stochasticity in the levels of molecules per cell is commonly modelled using the stochastic simulation algorithm [5,13]. While this approach fares well for non-growing cells, it does not take into account the fact that molecule levels per cell need to double over the cell division cycle, at least on average. A number of studies therefore considered lineages that track the biochemical dynamics inside a single dividing cell over many generations [14-19]. Such approaches are well equipped to model the statistics observed in mother machines [20,21], for example, or using cell tracking [22] that allow monitoring intracellular biochemistry in isolated cells over time. Modelling approaches that also account for the substantial variations observed in cell division timing are only currently being developed [23,24]. Generation times of single Escherichia coli [10] and budding yeast cells [25], for example, vary up to 40% and 30% from their respective means, and similar values have been observed in mammalian cells [26].On the other hand, population snapshots are commonly used to quantify heterogeneity across clonal cell populations. Such data are obtained from flow cytometry [27] or smFISH [28], for instance. An important source of heterogeneity in these datasets stems from the unknown cell-cycle positions [29]. Sorting cells by physiological features—such as using cell-cycle markers, DNA content or cell size as a proxy for cell-cycle stage—are used to reduce this uncertainty [27,30,31]. It has also been suggested that simultaneous measurements of cell age, i.e. the time interval since the last division, could allow monitoring the progression of cells through the cell cycle from fixed images [30-33]. Presently, however, there exists no theoretical framework that addresses both cell-cycle variability and biochemical fluctuations measured across a growing cell population, and thus we lack the principles that allow us to establish such a correspondence.In applications, it is often assumed that the statistics observed over successive cell divisions of a single cell equals the average over a population with marked cell-cycle stages at a single point in time [34]. In statistical physics, such an assumption is referred to as an ergodic hypothesis, which once it is verified leads to an ergodic principle. Such principles certainly fare well for non-dividing cell populations, but it is less clear whether they also apply to growing populations, in particular, in the presence of fluctuating division times of single cells. While this relationship can be tested experimentally [35,36], we demonstrate that it is also amenable to theoretical investigation.In this article, we develop a framework to analyse the distribution of stochastic biochemical reactions across a growing cell population. We first note that the molecule distribution across a population snapshot sorted by cell ages disagrees with the statistics of single cells observed in isolation, similarly to what has been described for the statistics of cell-cycle durations [8,37,38]. We go on to show that a cell history, a single cell measure obtained from tree data describing typical lineages in a population [39-43], agrees exactly with age-sorted snapshots of molecule numbers. The correspondence between histories and population snapshots thus reveals an ergodic principle relating the cell-cycle progression of single cells to the population. The principle gives important biological insights because it provides a new interpretation to population snapshot data.In the results, we investigate the differences of the statistics of isolated cell lineages and population snapshots. Section 2.1 develops a novel approach to model the stochastic biochemical dynamics in a growing cell population. We derive the governing equations for an age-sorted population and formulate the ergodic principle. In §2.2, we demonstrate this principle using explicit analytical solutions for stochastic gene expression in forward lineages and populations of growing and dividing cells. Our results are compared with stochastic simulations directly sampling the histories of cells in the population. Finally, in §2.3, we elucidate using experimental fluorescence data of an antibiotic-resistance gene that testing the principle allows us to discriminate whether a biochemical process is under selection.
Results
Several statistical measures can be used to quantify the levels of gene expression in single cells and populations. Distributions obtained across a cell population, such as those taken from static images, represent the final state of a growing population (figure 1a, shaded green cells). Cells in isolation, by contrast, can be modelled as random paths in the lineage tree, denoted as forward lineages (figure 1a, black line), that follow either of the two daughter cells with equal probability. Stochastic simulations show that the distributions of these two statistics disagree (figure 1b), and thus, apparently, the population dynamics violates the ergodic hypothesis. In the following analysis, we provide a novel ergodic principle relating how single cells progress through the cell cycle to the distribution of a growing cell population. To formulate the principle, we introduce histories (figure 1a, red line), a single-cell statistic that has the same distribution as a population that is sorted by cell ages (figure 1c,d). Such histories are obtained by choosing an arbitrary cell in the population and tracing its evolution back to the ancestor from which the whole population originated.
Figure 1.
Ergodic principle between single cells and the population. (a) Lineage tree of a clonal population leading to heterogeneous distribution of molecule numbers per cell (green colour). Two types of lineage statistics characterize the tree: (i) forward lineages (black line) originate from a common ancestor, end at an arbitrary cell in the population, and (ii) histories start from an arbitrarily chosen cell in the population and end at a common ancestor (red line). The conceptual difference between these measures are the probabilities with which these lineages are selected. (b) Statistics of simulated forward lineages do not agree with snapshot distributions (see box 1 and caption of figure 5 for simulation details). (c) Statistics observed across the population and the statistics after sorting cells by age, which allows inferring the cell-cycle progression. (d) The ergodic principle states that the statistics of age-sorted cells equals the distribution observed along cell histories but not along forward lineages.
Ergodic principle between single cells and the population. (a) Lineage tree of a clonal population leading to heterogeneous distribution of molecule numbers per cell (green colour). Two types of lineage statistics characterize the tree: (i) forward lineages (black line) originate from a common ancestor, end at an arbitrary cell in the population, and (ii) histories start from an arbitrarily chosen cell in the population and end at a common ancestor (red line). The conceptual difference between these measures are the probabilities with which these lineages are selected. (b) Statistics of simulated forward lineages do not agree with snapshot distributions (see box 1 and caption of figure 5 for simulation details). (c) Statistics observed across the population and the statistics after sorting cells by age, which allows inferring the cell-cycle progression. (d) The ergodic principle states that the statistics of age-sorted cells equals the distribution observed along cell histories but not along forward lineages.
Figure 5.
Lineage and history characterization of the expression of a stable protein. (a) Representative time courses of forward lineages (blue) and histories (red) of a stable protein collected from a single population tree. (b) Simulated distributions of inherited molecules in forward lineages (blue) and histories (red) are shown as shaded areas. The analytical solution for a population (solid red line) agrees with the simulated histories but not with the lineage verifying the ergodic principle. For comparison, we also show the analytical solution of a lineage (solid blue line). (c) Distributions obtained irrespectively of cell age from forward lineages (shaded blue) and across the population (shaded blue), both of which are in excellent agreement with the simulations. In all panels, the distribution of division time is lognormal with unit mean and s.d. 3 resulting in a growth rate of approximately 1.25. Parameters for the simulation of the full gene expression model (2.26) are k0 = 20, k1 = 20 and k2 = 100.
Biochemical dynamics in cell populations
We model stochastic biochemical reactions in a population of growing and dividing cells. We assume that each cell contains a pool of interacting biochemical species X1, X2, …, X reacting via a network of R intracellular reactions of the formwhere r = 1, …, R and ν± are the stoichiometric coefficients. To model the effect of cell divisions, we associate to each cell an age τ measuring the time interval from cell birth. If cells divide with an age-dependent rate γ(τ), which is independent of the number of molecules x in the network under consideration, the division times τd of each cell are distributed according toEquivalently, for a given division-time distribution one obtains the corresponding rate function viaWhile the forward-lineage approach is summarized in appendix A, we here focus on the population dynamics. To this end, we associate with the state of the population the density of cells n(τ, x, t) with molecule count x and age between τ and τ + dτ at time t. This quantity represents the outcome of repeated snapshots of the population growth process, which averages out variations in the total number of cells (see appendix B for a detailed derivation). The mean number of cells in the population is then given bySince the probability for a cell to divide at age τ is given by γ(τ) dτ, the rate of change in the cell density obeyswhere is the transition matrix for the biochemical reactionsw is the propensity and (ν) = ν+ − ν− is the stochiometric vector of the r-th reaction. Equation (2.3) must be supplemented by a boundary condition accounting for cell divisions. Because the number of newborn cells equals twice the number of dividing cells, the condition readsThe division kernel B(x | x′) in equation (2.3) models the inheritance of x daughter molecules from x′ mother molecules and is given bywhere B1(x | x′) and B1(x′ − x | x′) are the marginal distributions of inherited molecules for the two daughter cells (see appendix B for details). Note that the total molecule numbers are conserved during cell division. For example, when molecules are partitioned with equal probability between daughter cells, B1 and B are binomial [16,44].A simple algorithm that enables simulating the biochemical dynamics in the population exactly, which we will refer to as the First Division Algorithm, is given in box 1. Step 2 simulates the transitions due to biochemical reactions, equation (2.3), while step 3 implements the boundary condition (2.3) for cell divisions. The density of cells with molecule count x and age τ obtained from several snapshots then obeys equations (2.3).1. Initialization. At time t = 0, initialize the cell population by assigning to each cell an age τ, a division time τd, and molecule count x.2. Biochemical reactions. Determine the next dividing cell from j = argmin (τd, − τ) and Δt = min (τd, − τ). Advance the molecule numbers of each cell independently from age τ to τ + Δt using the Gillespie algorithm and advance time from t to t + Δt.3. Cell division. Replace the dividing cell by two newborn daughter cells of zero age. Assign to one of these a molecule number distributed according to B1(x | xj), depending on the mother's molecule count x, and assign the remaining molecules to the other daughter. Assign to each daughter independently a division time distributed according to φ(τd).4. Repeat. Repeat from 2 until t = t.While stochastic simulations are relatively easy to carry out, equations (2.3) are generally difficult to solve because they represent an integro-partial differential equation. To allow for analytical progress, we consider the long-term behaviour in which the population grows exponentially with rate λ,and identify Π(τ, x) as the joint distribution of cell ages and intracellular molecule counts in the population. We treat this distribution synonymously with the population snapshot. Using ansatz (2.5) in equation (2.3), we obtainSimilarly, the corresponding boundary condition is given by equation (2.3) but with replacing n(τ, x) by Π(τ, x).
Age distribution in a population
The age distribution measures the fraction of cells in the population that reach a given age. It is obtained from marginalizing the population distribution over the molecule numbersSumming equation (2.6) and solving (see appendix C for details), we find that the age distribution obeyswhich recovers the results in [8,9,45]. The integral in the above equation denotes the probability that a cell has not divided before reaching age τ.The population growth rate λ needs to be determined from the boundary condition (appendix C), which leads to the characteristic equationalso known as the Euler–Lotka equation [8,9,45]. The largest root of this equation yields the population growth rate λ. For a discussion of the relation between the population growth rate and the mean division time, see [10,45].
Molecule-number distribution in an age-sorted population
Next, we consider the probability of observing x molecules in a cell of age τ. This conditional probability is given by the number of cells with age τ and molecule count x divided by the number of cells at that ageIt can be verified by plugging equation (2.10) into (2.6) that the distribution obeys the master equationSimilarly, inserting equation (2.10) with (2.5) into boundary condition (2.3), we deduce that the distribution of inherited molecules obeysWe identify the density ρ with the ancestral division-time distribution [8,40]describing the past division-times in a growing cell population. Taken together, equations (2.11) describe the distribution in an age-sorted population. Combining these equations with the age distribution given in the previous section they contain the full statistical information of the population distribution. An extension of this result to heritable division times is provided in appendix C.Since histories contain only ancestral cells, their division times are distributed according to the ancestral division-time distribution ρ, as has been pointed by Wakamoto et al. [40]. In figure 2a, we compare the division-time distributions in forward lineages and histories for variable division times. We observe that the distribution tails of φ are suppressed in cell histories due to exponential dependence on the growth rate λ. Specifically, we highlight the fact that cells with division times shorter than ln 2/λ are over-represented in histories compared with forward lineages, while cells with longer division times are over-represented.
Figure 2.
Generation time variability in forward lineages and histories. (a) Division times in forward lineages (dashed lines) following a gamma distribution with unit mean and coefficients of variation of 10% (red, scaled by factor ), 50% (blue) and 100% (yellow dashed line). The corresponding division time distributions in cell histories are shown by solid lines. (b) The corresponding age distributions in forward lineages (dashed), histories (solid lines) and the population (light dashed). Note that the age-distribution in a history is the same as the population for the exponential distribution (CV 100%, yellow line), and the distributions in histories and forward lineages are very similar for small cell-cycle variability (CV 10%, red lines).
Generation time variability in forward lineages and histories. (a) Division times in forward lineages (dashed lines) following a gamma distribution with unit mean and coefficients of variation of 10% (red, scaled by factor ), 50% (blue) and 100% (yellow dashed line). The corresponding division time distributions in cell histories are shown by solid lines. (b) The corresponding age distributions in forward lineages (dashed), histories (solid lines) and the population (light dashed). Note that the age-distribution in a history is the same as the population for the exponential distribution (CV 100%, yellow line), and the distributions in histories and forward lineages are very similar for small cell-cycle variability (CV 10%, red lines).
Ergodic principle for cell populations
We now explain how these quantities can be computed from population data. To this end, we use the approach presented by Nozoe et al. [42] enumerating all lineages of a given population tree. We denote the molecule numbers in lineage j by (x1,(τ1,), …, x(τd, )) with D completed cell divisions, where x(τ) is the time-course from birth to division of molecule numbers of the ith cell in that lineage. We then evaluate the average of a function f(x) over a lineage of cells at a given age τ,where is the number of cells in the lineage that reach age τ before dividing.Because forward lineages track either one of the two daughter cells with equal probability, the probability of choosing lineage j from the tree is 2−, which decreases with the number of cell divisions [42]. Because the division times in forward lineages follow the distribution φ, we haveThe distribution under the integral is the lineage-probability described in appendix A.Next, we consider the distribution along histories that track an arbitrary cell from the population and trace back its evolution. The probability of choosing such a history is the same for every lineage and therefore equals the inverse number of cells N(t) in the population. The statistics of histories can thus be interpreted as a typical lineage. We note that equations (2.11) can be understood as describing the distribution along a lineage, similar to the forward-lineage approach develop in appendix A (cf. equations (A 1)), but by replacing the division-time distribution φ with the one of the ancestral population ρ. Since histories are composed entirely of ancestral cells, their division times follow the ancestral division-time distribution ρ. It hence follows that histories posses the same statistics as in the age-sorted population, that isThe distribution Π(x | τ) is the lineage distribution given by the solution of equations (2.11) and equals the distribution in an age-sorted population, as we have shown in §2.1.2. We thus formulate the ergodic principle as the average of the histories of single cells in a population obtained over many cell divisions equals the average over an age-sorted cell population at every time point. A more intuitive way of stating the result is that the endpoints of all lineages have the same distribution Π(x | τ) as cells of τ in the population. Note that equations (2.13) and (2.14) are equal only for deterministic division times. In fact, most lineages are close to the history-average given by the right-hand side of equation (2.14), as we show in the following section through the use of examples (§2.2).Before proceeding, we investigate the age distribution that yields the fraction of cells reaching age τ in a history. Because division times in histories are distributed according to ρ and the probability that a cell has not divided before age τ is , we havewhere 〈τd〉 is the mean of the distribution ρ, which arises as a normalizing constant (see appendix D). The result is notably different from the age-structure in a population Π(τ), compare equation (2.8). For example, the corresponding age distributions for gamma-distributed division times are compared in figure 2b.It thus follows that the average over a history of molecule numbers is not generally the same as the population average irrespective of age. The only exception to the rule are memoryless division times, i.e. constant division rates resulting in exponentially distributed division timings. In this case, equation (2.8) and (2.15) agree, but interestingly, the population distribution differs from the forward-lineage distribution.
Analytical solutions for forward lineages, histories and snapshot distributions
To obtain analytical solutions we make use of the generating function approach. For this purpose, we restrict ourselves to binomial partitioning of moleculesThe boundary condition (2.11) for the probability generating function then simplifies toIn the following, we demonstrate how explicit solutions for the lineage, histories and snapshot distributions can be obtained for simple examples of stochastic gene expression but arbitrary division-time distributions.
Molecule synthesis and degradation
As a first application, we consider a birth–death process. Molecules are synthesized at a constant rate k0 and are degraded via a first-order reaction with rate k1,To test whether differences between forward lineages and histories could be significant in a single population, we used the First Division Algorithm (box 1) to simulate a population tree corresponding to figure 1a. We then select histories by choosing cells at random from the population and tracing their evolution back through the population tree. Similarly, we choose forward lineages by following the first cell in the population over many cell cycles selecting either daughter cell with equal probability. A qualitative comparison of few representative time-courses shows that forward lineages exhibit fewer cell divisions but higher molecule counts than histories (figure 3a), consistent with the predicted age-distributions, which contain more old cells in forward lineages than in histories (figure 2b).
Figure 3.
Histories represent typical lineages of finite population trees. (a) Representative forward lineages (blue) and histories (red lines) from a single population for the birth–death process (2.18) with rates k0 = 5 and k1 = 0.1 and division times are exponentially distributed with unit rate. Simulated time-courses show molecule synthesis by biochemical reactions (step 1 in box 1) interrupted by sudden decreases in molecule number due to binomial partitioning at cell division (step 2 in box 1). (b) Smoothed frequency distributions of the mean number of inherited molecules computed for each lineage of 10 population trees with N = 100 (teal), 1000 (blue) and 10 000 cells (magenta) at the final time and rates k0 = 50 and k1 = 1. The distribution mode is centred about the mean of histories (solid grey line) but not about the means of forward lineages (dashed grey line). (c) The corresponding second moment E[x2 | 0] is shown. The result indicates that the statistics of most lineages is close to the one of histories.
Histories represent typical lineages of finite population trees. (a) Representative forward lineages (blue) and histories (red lines) from a single population for the birth–death process (2.18) with rates k0 = 5 and k1 = 0.1 and division times are exponentially distributed with unit rate. Simulated time-courses show molecule synthesis by biochemical reactions (step 1 in box 1) interrupted by sudden decreases in molecule number due to binomial partitioning at cell division (step 2 in box 1). (b) Smoothed frequency distributions of the mean number of inherited molecules computed for each lineage of 10 population trees with N = 100 (teal), 1000 (blue) and 10 000 cells (magenta) at the final time and rates k0 = 50 and k1 = 1. The distribution mode is centred about the mean of histories (solid grey line) but not about the means of forward lineages (dashed grey line). (c) The corresponding second moment E[x2 | 0] is shown. The result indicates that the statistics of most lineages is close to the one of histories.To quantify whether histories represent typical lineages in a finite population, we simulate 40 lineage trees up to a population size of N = 100, 1000 and 10 000. We then calculate mean and second moments for each individual lineage in the tree and evaluate their distributions across all lineages (figure 3b,c). As the population size increases the distribution of first and second moments becomes centred about the moments of the history distribution which are smaller than the corresponding ones in forward lineages. In particular, the moments of histories correspond to the distribution modes verifying that histories are typical lineages, even for finite populations.Next, we investigate how to analytically characterize the distributions of both processes. The generating function of the corresponding master equation obeys the evolution equationBecause molecules are produced and degraded independently, their number x per cell can be separated into a sum of two independent contributions: (i) the amount of molecules newly produced and degraded up to age τ, and (ii) the amount of molecules inherited after cell division. Clearly, the first contribution is Poissonian and its mean is given by μ(τ) = (k0/k1)(1 − e−). The second contribution needs to be determined from the boundary condition as we show in the following.The solution to equation (2.19) can be written as a product of generating functions of the aforementioned contributionswhere the first factor is the generating function of newly produced and degraded molecules until age τ. Using this solution in equation (2.17) we find the conditionIf cell divisions occur at deterministic time intervals, the integral is straightforward. In this case, the distribution of inherited molecules p0 is Poissonian with mean μp(τ) = ((1 − )/(2 − ))(k0/k1) and depends on the division-time τd. This result has been obtained earlier by Schwabe et al. [29], and as a particular case by Johnston et al. [19].The solution turns out to be more involved in the presence of division-time variability. Certainly, equation (2.20) could be solved for particular choices of ρ(τd). Since, however, the division-time variability is difficult to characterize, in general, we aim to solve this equation for arbitrary distributions. To this end, we expandand write an equation for the series coefficients a. Plugging equation (2.22) into (2.20) and equating equal powers of z, we findwhere is the Laplace transform of the division-time distribution in equation (2.11). From the above formula all coefficients can be computed recursively using a0 = 1. It follows using G0((z − 1) e−) in equation (2.22) and differentiating at z = 1 that the probability of observing x molecules inherited after cell division isNote that the distribution depends on the cell age τ because the inherited molecules are degraded over the cell cycle. Because the number of molecules produced up to age τ is Poissonian, the total amount of molecules for a cell of age τ obeysThe distributions of forward lineages are obtained by substituting φ for ρ in equation (2.23) (appendix A).In figure 4a, we show the resulting distributions of inherited molecules (τ = 0) for different levels of cell-cycle variability (figure 4a,i) for the cases of an age-sorted population (solid lines) and for the lineage statistics (dashed lines). Division times are assumed to be gamma-distributed, which fits cell-cycle variability observed in many bacteria [8,10]. The molecule number distributions are obtained using either ρ or φ in equation (2.23) and truncating equation (2.24) after the first 150 terms. In agreement with the theoretical predictions, lineage and histories distributions are similar for small division time variability (red lines) but become significantly different as variability increases (yellow lines). Also shown are the corresponding distribution for varying degradation rates (figure 4a,ii). We observe that forward lineages and history statistics are comparable for unstable molecules (fast degradation) but not for stable ones (slow degradation). This finding is in line with the intuition that rapid degradation averages out timing fluctuations.
Figure 4.
Analytical solutions for a birth–death process. (a)(i) Analytical distributions of inherited molecules in a population and in forward lineages are depicted for gamma-distributed division times with 10% (red), 50% (blue) and 100% (yellow line) variations about the mean (as shown in figure 2). Simulated distributions of inherited molecules in histories are shown as shaded areas. Lineages and histories agree only for small division times variability. Parameters are k0 = 50 and k1 = 1. (a)(ii) Analytical solutions for different degradation rates, k1 = 0.1 (teal), k1 = 1 (blue) and k1 = 10 (magenta). The distribution of inherited molecules in a population (τ = 0, solid lines) coincides with the simulated histories (shaded ares), which verifies the ergodic principle, but not with the lineage distribution (dashed lines). Histories and forward lineages agree well only for large degradation rates. Parameters are k0 = 50k1 and exponentially distributed division times with unit mean (CV[τ] = 1). (b) The corresponding distributions are shown for the mid-cell cycle τ = 0.5. (c) Simulated distributions estimated irrespectively of cell age across the population (shaded areas) agree well with analytical solutions (solid lines). Also shown are the theoretical distributions across forward lineages (dashed lines).
Analytical solutions for a birth–death process. (a)(i) Analytical distributions of inherited molecules in a population and in forward lineages are depicted for gamma-distributed division times with 10% (red), 50% (blue) and 100% (yellow line) variations about the mean (as shown in figure 2). Simulated distributions of inherited molecules in histories are shown as shaded areas. Lineages and histories agree only for small division times variability. Parameters are k0 = 50 and k1 = 1. (a)(ii) Analytical solutions for different degradation rates, k1 = 0.1 (teal), k1 = 1 (blue) and k1 = 10 (magenta). The distribution of inherited molecules in a population (τ = 0, solid lines) coincides with the simulated histories (shaded ares), which verifies the ergodic principle, but not with the lineage distribution (dashed lines). Histories and forward lineages agree well only for large degradation rates. Parameters are k0 = 50k1 and exponentially distributed division times with unit mean (CV[τ] = 1). (b) The corresponding distributions are shown for the mid-cell cycle τ = 0.5. (c) Simulated distributions estimated irrespectively of cell age across the population (shaded areas) agree well with analytical solutions (solid lines). Also shown are the theoretical distributions across forward lineages (dashed lines).To support the numerical results, we evaluate the mean number of molecules at birth
. For fast degradation, the expression reduces to in both forward lineages and histories because it is independent of the division times. For slow degradation, however, we obtain E[x | 0] ∼ k0E(τd) in histories and Efw[x | 0] ∼ k0E(τd) in forward lineages. Since E(τd) ≤ E(τd) with equality for deterministic division times [10], we conclude that cells in histories and the population contain fewer molecules than in forward lineages. One can further show that the difference between population and forward lineages E[x | 0] − Efw[x | 0] = − k0ln 2 CV2[τ] + O(CV4) increases with the coefficient of variation of timing fluctuations.In figure 4b, we illustrate the corresponding distributions for τ = 0.5 corresponding to half of the mean generation time. We observe that both cell-cycle variation (figure 4b,i) and degradation (figure 4b,ii) mitigate the discrepancies between lineage and history statistics. Intuitively, this could also be concluded from the fact that equation (2.19), and similarly the distribution of molecules equation (2.11), approaches a steady state independent of the number of inherited molecules for large τ. For all cases, we verified the ergodic principle by stochastic simulations using the First Division Algorithm (box 1) of 40 population trees from which we chose histories at random. The corresponding distributions are shown as shaded areas in figure 4a,b, which are in excellent agreement with our analytical solutions (solid lines).To investigate the effect of the overall distribution in the population irrespectively of cell age, we numerically integrate the analytical solution (2.25) over the age-distribution Π(τ) of the population given by equation (2.8), ; the result is shown in figure 4c. The corresponding age distribution in a forward lineage is given in appendix D. We find that these predictions (solid lines) are in excellent agreement with the simulated distributions across populations (shaded areas) but not with the distributions across forward lineages (dashed lines).
Expression of a stable protein
Characteristic mRNA half-lives in E. coli are of the order of 5 min, while doubling times range from 20 min to several hours. Given the findings of the previous section, this suggests that generation time variability has little effect on mRNA distributions. Protein half-lives in living cells, however, occur on timescales much longer than the doubling time, meaning that many proteins are essentially stable and diluted mainly through cell divisions.To investigate protein variability in growing cell populations, we model the expression of such a protein including mRNA transcription via the reactionsUsing the First Division Algorithm (box 1), we simulate a population of dividing cells and collect representative forward lineages and histories (figure 5a). Given that empirical distributions of rescaled log-division times in E. coli are invariant and bell-shaped across conditions [46], we assume lognormal distributed division times. We observe that protein time-courses corresponding to forward lineages have higher protein levels and are noisier than histories.Lineage and history characterization of the expression of a stable protein. (a) Representative time courses of forward lineages (blue) and histories (red) of a stable protein collected from a single population tree. (b) Simulated distributions of inherited molecules in forward lineages (blue) and histories (red) are shown as shaded areas. The analytical solution for a population (solid red line) agrees with the simulated histories but not with the lineage verifying the ergodic principle. For comparison, we also show the analytical solution of a lineage (solid blue line). (c) Distributions obtained irrespectively of cell age from forward lineages (shaded blue) and across the population (shaded blue), both of which are in excellent agreement with the simulations. In all panels, the distribution of division time is lognormal with unit mean and s.d. 3 resulting in a growth rate of approximately 1.25. Parameters for the simulation of the full gene expression model (2.26) are k0 = 20, k1 = 20 and k2 = 100.Deriving analytical expressions for the distribution of proteins is more involved because the amount of produced protein generally depends on the amount of inherited mRNA. We use the simplifying assumption that the mRNA half-life is much shorter than the cell-cycle time, for which the reactions (2.26) reduce to a single reaction [47,48]with proteins being produced in stochastic bursts of size m with distribution ν. For the reactions (2.26), the burst size distribution ν is geometric with mean b = k2/k1 corresponding to the mean number of proteins produced per mRNA lifetime [48].The rate of change for the generating function of this process satisfieswhere . It can be verified using equation (2.28) and the boundary condition (2.17) that the exact generating function solution iswhere and is the Laplace transform of ρ. Marginalizing the above equation over the age distribution, we obtain the generating function for the protein number distribution in the populationAn explicit expression for the Laplace transform of the age-distribution is given in appendix D, equation (D 3).The distributions corresponding to the generating function, equation (2.29), are obtained numerically via the inverse transformNote that because the moment-generating function does not exist for the lognormal distribution, we computed the Laplace transform of the division-time distribution directly. In practise, it was sufficient to truncate the sum in K(z) after the first 10 terms. In figure 5b, we show the resulting distribution of inherited protein in a population (τ = 0, red line). We observe excellent agreement between the theoretical solution and the history statistics obtained from stochastic simulations of 40 population trees (shaded red area), which verifies the ergodic principle. Instead, the distribution obtained across forward cell lineages shows a broader distribution with significant distribution tails (blue line) due to larger division time variability, in agreement with simulations (shaded blue area). Similar to the previous example, cells in histories and the population inherit fewer molecules than in forward lineages. We also obtain good agreement with the snapshot distributions measured across the cell population shown in figure 5c (red line: theory, red shaded area: simulation) by instead using equation (2.30) in the transformation (2.31).
Breakdown of the ergodic principle under selection
So far, we considered traits that do not affect the cell division rate. To test the validity of the principle in the opposite case, we simulate the production of a biomolecule whose levels increase division rate. We find that the distributions in histories do not coincide with the age-sorted population (figure 6a). The history distribution exhibits significantly larger tails resulting in a higher mean number of molecules (21.6 in histories versus 12.4 in population). This breakdown suggests that one could in principle discriminate the factors that affect cell fitness.
Figure 6.
The ergodic principle breaks down in the presence of selection. (a) Simulated distributions of inherited molecules for a biomolecule that increases division rate. It is seen that histories and age-sorted distributions are different and thus the ergodic principle is violated. The ratio of these probabilities yields the reproductive value, ν(x | 0), which increases with the number of molecules in the presence of selection (inset). In the simulation molecules are produced with rate k0 and divisions occur after a period τd = 50K10/(K10 + x100), where k0 = 100, K = 10 and x0 is the molecule count of that cell at birth. (b) Statistics of tree data of an antibiotic-resistance gene in the absence of antibiotic taken from Nozoe et al. [42]. The history distribution agrees very well with the population distribution. The ergodic principle is obeyed and the reproductive value of cells is independent of the gene expression (inset). (c) By contrast, in the presence of the drug, the distributions of histories and the age-sorted population disagree. The increase of the reproductive value with fluorescence indicates selection on high protein levels (inset).
The ergodic principle breaks down in the presence of selection. (a) Simulated distributions of inherited molecules for a biomolecule that increases division rate. It is seen that histories and age-sorted distributions are different and thus the ergodic principle is violated. The ratio of these probabilities yields the reproductive value, ν(x | 0), which increases with the number of molecules in the presence of selection (inset). In the simulation molecules are produced with rate k0 and divisions occur after a period τd = 50K10/(K10 + x100), where k0 = 100, K = 10 and x0 is the molecule count of that cell at birth. (b) Statistics of tree data of an antibiotic-resistance gene in the absence of antibiotic taken from Nozoe et al. [42]. The history distribution agrees very well with the population distribution. The ergodic principle is obeyed and the reproductive value of cells is independent of the gene expression (inset). (c) By contrast, in the presence of the drug, the distributions of histories and the age-sorted population disagree. The increase of the reproductive value with fluorescence indicates selection on high protein levels (inset).To investigate this effect, we consider Fisher's reproductive value [49], denoted by ν, which counts the relative number of future offspring for a cell in a given phenotypic state. Interestingly, it can be defined as the ratio of the distribution of histories and the age-sorted population statistics [50]. For a given trait x observed in cells of age τ, we here use their conditional distributions that provide the following factorizationThe first factor is the reproductive value due expression of x, involving the distributions across the age-sorted population Π(x | τ) and across histories ρ(x | τ). The second factor, ν(τ) = Π(τ)/Π(τ), denotes the contribution from other factors affecting the division rate and is given by the ratio of the age distribution of histories Π(τ) and the age distribution of the population Π(τ), which are generally different [10,40]. According to the ergodic principle the first contribution equals 1, when x does not affect the division rate, and thus the reproductive value is solely determined by ‘other’ factors that are not explicitly known and modelled via a stochastic interdivision time. When x increases division rate, higher levels of x can be selected upon and the reproductive value ν(x | τ) increases with x (figure 6a, inset). Thus, when the molecule levels are selected upon, ancestral cells contain more molecules on average than cells of the present population (figure 6a).We investigate this dependence using recently acquired tree data of the antibiotic-resistance gene smR conveying resistance to the antibiotic streptomycin [42]. The protein is fused to a fluorescent reporter allowing direct observation. We obtain age-sorted distributions using measured total fluorescence after cell division (τ = 0), while lineage-weighted distributions are computed using equations (2.13) and (2.14). In the absence of antibiotic treatment, the distributions of new-born cells in the age-sorted, histories and forward lineages are essentially the same (figure 6b). In agreement with the ergodic principle, the reproductive value due to the expression of the protein is constant (figure 6b, inset). In the presence of sub-inhibitory antibiotic doses, the distribution in histories differ from the one acquired across the population (figure 6c) and, moreover, the reproductive value increases with fluorescence (inset), which indicates selection on higher expression levels. These results are in agreement with the known function of the gene and highlight that testing the proposed ergodic principle can be used to probe selection in cell populations.
Discussion
We presented a modelling approach for stochastic biochemical reactions in dividing cell populations. The theory enabled us to identify an ergodic principle between single cells and the population, which can be stated as the average of the histories of single cells in a population obtained over many cell divisions equals the average over an age-sorted cell population at any single time point. Cell histories capture the statistics of typical lineages in a population, which are obtained by choosing an arbitrary individual of a clonal population and tracking back its evolution to the ancestor from which the population originated.This principle provides an interpretation of population snapshot data because it identifies the sample paths that are associated with age-sorted snapshot data with typical cell histories. Histories thus allow us to characterize the progression of single cells in the population through the cell cycle. We emphasized that the statistical difference between forward lineages and histories arises from variable division times, cells dividing faster than the population growth rate being over-represented in histories as compared to forward lineages. These deviations are expected to be significant since generation times of single cells are highly variable in microbial cell populations.It is important to point out that the principle requires the cell age to be known. The exception to this rule is the case of exponential division times for which the time-average of histories equals the one across snapshots. In the general case, gating techniques as used in flow cytometry can be used to narrow the range of physiological differences and thus correspond to age-sorted distributions, which according to the principle agree with the histories of single cells. Mother machines, on the other hand, provide statistics of individual forward lineages assuming that cells divide symmetrically. Thus our findings suggest potential differences when studying the dynamics of gene expression in different experimental devices, even when cell-cycle positions are explicitly known.Different notions of ergodicity have been discussed in the literature. Rocco et al. [17], for instance, challenged the ergodic hypothesis between an ensemble of individual lineages and their time-averages. They found that the hypothesis only holds true for fast degrading molecules. We found a similar dependence for the ergodicity between snapshots of a growing population and individual histories, which is justified when studying the dynamics of short-lived molecules such as mRNAs. The majority of proteins in E. coli and yeast, however, have half-lives much longer than typical population doubling times [51,52], for which the present ergodic principle accounts for. We demonstrated this dependence using analytical solutions to lineage, history and population distributions of simple stochastic models of gene expression.Although we developed the approach for biochemical reactions, the general principle applies to heritable non-genetic traits, such as plasmid numbers or chromatin states. Specifically, it applies to any trait of interest that does not affect the cell division rate. A counterexample of such a trait is cell size to which the history analysis has recently been applied [43]. Here, we found the that the likelihood to find an ancestral cell with a positively selected trait value is higher than for cells of the present population. Thus testing the ergodic principle could be used to experimentally probe this effect on cell fitness for any trait of interest.
Authors: Andrew S Kennard; Matteo Osella; Avelino Javer; Jacopo Grilli; Philippe Nghe; Sander J Tans; Pietro Cicuta; Marco Cosentino Lagomarsino Journal: Phys Rev E Date: 2016-01-19 Impact factor: 2.529
Authors: John R S Newman; Sina Ghaemmaghami; Jan Ihmels; David K Breslow; Matthew Noble; Joseph L DeRisi; Jonathan S Weissman Journal: Nature Date: 2006-05-14 Impact factor: 49.962