Literature DB >> 33748708

Cell size distribution of lineage data: analytic results and parameter inference.

Chen Jia¹, Abhyudai Singh², Ramon Grima³.

Abstract

Recent advances in single-cell technologies have enabled time-resolved measurements of the cell size over several cell cycles. These data encode information on how cells correct size aberrations so that they do not grow abnormally large or small. Here, we formulate a piecewise deterministic Markov model describing the evolution of the cell size over many generations, for all three cell size homeostasis strategies (timer, sizer, and adder). The model is solved to obtain an analytical expression for the non-Gaussian cell size distribution in a cell lineage; the theory is used to understand how the shape of the distribution is influenced by the parameters controlling the dynamics of the cell cycle and by the choice of cell tracking protocol. The theoretical cell size distribution is found to provide an excellent match to the experimental cell size distribution of E. coli lineage data collected under various growth conditions.

Entities: Chemical Disease Species

Keywords: Cell Biology; In Silico Biology; Systems Biology

Year: 2021 PMID： 33748708 PMCID： PMC7961097 DOI： 10.1016/j.isci.2021.102220

Source DB: PubMed Journal: iScience ISSN： 2589-0042

Introduction

Cell size plays an important role in cellular processes; e.g. changes in cell volume or surface area have profound effects on metabolic flux and nutrient exchange Marshall et al. (2012), and therefore, it stands to reason that cell size should be actively maintained. In order for cells to achieve and maintain some characteristic size (size homeostasis), the amount of growth produced during the cell cycle must be controlled such that, on average, large cells at birth grow less than small ones. There are three popular phenomenological models of cell size control leading to size homeostasis Vargas-Garcia et al. (2018): (i) the timer strategy which implies a constant time between successive divisions, (ii) the sizer strategy which implies cell division upon attainment of a critical size, and (iii) the adder strategy which implies a constant size addition between consecutive generations. The timer strategy is not viable for exponentially growing cells; in this case, size fluctuations diverge as the square root of the number of consecutive cell divisions implying that the timer strategy cannot maintain stable size distributions Jun and Taheri-Araghi (2015). In contrast, if cells grow linearly, a timer strategy is viable as a means to maintain size homeostasis Conlon and Raff (2003). Several studies have proposed that the sizer and adder strategies can explain experimental data in bacteria, yeast, and mammalian cells Campos et al. (2014); Taheri-Araghi et al. (2015); Tanouchi et al. (2015); Soifer et al. (2016); Chandler-Brown et al. (2017); Cadart et al. (2018). Cell size control mechanisms likely vary depending on growth conditions, strains, and species; for instance, in Escherichia coli (E. coli), evidence suggests a sizer mechanism in slow growth conditions and an adder in fast growth conditions Wallden et al. (2016). Cell size statistics can be computed using data from cell lineages or population snapshots. To observe a single cell lineage, at each cell division event, one keeps track of only one of the newborn cells (daughter cells); thus, at an arbitrary time point, only a single cell is observed. To observe population snapshots, one tracks both daughters of each mother cell in the population and thus the evolution of the whole population over time. Recently, mathematical models have shown that cell size statistics calculated using lineage data, e.g. collected using mother machines, can vary considerably from those obtained from population snapshot data, e.g. collected using flow cytometry Thomas (2018); García-García et al. (2019); Totis et al. (2020a). In fact, differences between these two types of measurements are also observable in protein and mRNA count statistics Thomas (2017); Beentjes et al. (2020). Modeling has elucidated various other interesting insights into cell size statistics; however, to our knowledge, no study thus far has attempted to explain the complex shapes of cell size distributions computed from many generations of cell lineage measurements. This is because such high throughput data have become available only recently Tanouchi et al. (2017) and also since the majority of modeling approaches have analytically derived expressions for the first few moments of cell size statistics—these are not enough to characterize the highly non-Gaussian distribution of cell size computed from a cell lineage (see Figure 1A for a typical distribution for an E. coli lineage). These histograms are characterized by three features: a fast increase in the size count for small cells, a slow decay in the size count for moderately large cells, and a fast decay in the size count for large cells. We note that these distributions contain much more information than birth size distributions previously derived Amir (2014) since they reflect the full cell cycle dynamics.

Figure 1

Cell size dynamics and a stochastic model describing it

(A) Single-cell time course data of cell length along a typical cell lineage measured in E. coli at 37°C (upper) and the histogram of cell sizes along all cell lineages (lower). The data shown are published in Tanouchi et al. (2015). When plotting the histogram, we use the data of all cell lineages at that temperature (160 lineages), each of which is recorded every minute over 70 generations. The cell size distribution computed from cell lineage measurements has an uncommon shape that is characterized by three features: a fast increase in the size count for small cells, followed by a slow decay for moderately large cells and a fast decay for large cells.

(B) Schematic illustrating a detailed model of cell size dynamics describing cell growth, multiple effective cell cycle stages, cell size control, and symmetric or asymmetric partitioning at cell division (see inset graph). Each cell can exist in N effective cell cycle stages. The transition rate from one stage to the next at a particular time t is proportional to the αth power of the cell size V(t) with α > 0 being the strength of cell size control and a>0 being the proportionality constant. This guarantees that larger cells at birth divide faster than smaller ones to achieve size homeostasis. At stage N, a mother cell divides into two daughters that are typically different in size via asymmetric cell division. Symmetric division is the special case where daughters are equisized.

Cell size dynamics and a stochastic model describing it (A) Single-cell time course data of cell length along a typical cell lineage measured in E. coli at 37°C (upper) and the histogram of cell sizes along all cell lineages (lower). The data shown are published in Tanouchi et al. (2015). When plotting the histogram, we use the data of all cell lineages at that temperature (160 lineages), each of which is recorded every minute over 70 generations. The cell size distribution computed from cell lineage measurements has an uncommon shape that is characterized by three features: a fast increase in the size count for small cells, followed by a slow decay for moderately large cells and a fast decay for large cells. (B) Schematic illustrating a detailed model of cell size dynamics describing cell growth, multiple effective cell cycle stages, cell size control, and symmetric or asymmetric partitioning at cell division (see inset graph). Each cell can exist in N effective cell cycle stages. The transition rate from one stage to the next at a particular time t is proportional to the αth power of the cell size V(t) with α > 0 being the strength of cell size control and a>0 being the proportionality constant. This guarantees that larger cells at birth divide faster than smaller ones to achieve size homeostasis. At stage N, a mother cell divides into two daughters that are typically different in size via asymmetric cell division. Symmetric division is the special case where daughters are equisized. Here, we develop a complete analytical theory of the cell size distribution in cell lineages. We formulate and solve a piecewise deterministic Markov model describing the evolution of the cell size over many generations, for all three size homeostasis strategies (timer, sizer, and adder). The model takes into account the major features responsible for the underlying dynamics: cell birth following division (including the asymmetric case and partitioning noise), exponential cell growth (including the case of noisy growth rates), variability in the duration of the cell cycle, and the user-defined choice of single-cell tracking protocols when division occurs, e.g., tracking always the smaller daughters, tracking always the larger daughters, or randomly picking one of the two daughters. The analytical solutions for the cell size distribution enable us to understand how the highly non-Gaussian shape of the distribution emerges from the underlying biophysical processes. Finally, by matching the analytical to the experimental cell size and doubling time distributions, we infer the values of various model parameters in E. coli for three different growth conditions.

Results

Model specification

Here, we consider a detailed model of cell size dynamics across the cell cycle which is similar to the model proposed in Nieto et al. (2020a) but has more complicated cell division mechanisms such as asymmetric and stochastic partitioning (see Figure 1B for an illustration). The model is based on a number of assumptions that are closely tied to experimental data. The assumptions are as follows, and the specific meaning of all model parameters is listed in Table 1.

Table 1

Model parameters and their meaning

Parameters	Meaning
g	Exponential growth rate of cell size
N	Number of effective cell cycle stages
a	Proportionality constant for the transition rate between stages
α	Strength of cell size control
A	Mean generalized added size in each generation
p	Mean partition ratio of cell size at division

The size of each cell grows exponentially in each generation with growth rate g. This assumption is supported by experiments in many cell types Godin et al. (2010). Each cell can exist in N effective cell cycle stages, denoted by 1,2, …,N. The transition rate from one stage to the next at a particular time is proportional to the αth power of the cell size at that time, with a>0 being the proportionality constant Nieto et al. (2020a). In other words, the transition rate between stages at time t is equal to aV(t), where α > 0 is the strength of cell size control and V(t) is the cell size at that time. Under this assumption, larger cells at birth have larger transition rates between stages and thus, on average, have shorter cell cycle duration and lesser volume change than smaller ones; in the way size homeostasis is achieved. Model parameters and their meaning Examples of possible biophysical mechanisms that can explain the power law form of the transition rate have been discussed in Nieto et al. (2020a). For instance, recent studies have suggested the accumulation of certain proteins up to a critical threshold as a possible mechanism in bacterial and fission yeast cell division Ghusinga et al. (2016); Sekar et al. (2018); Si et al. (2019); Patterson et al. (2019). Suppose that the concentration c of this “division protein” remains constant as the cell grows Weart and Levin (2003). Then, the number of molecules n of this protein is proportional to the cell volume V, i.e., n = cV. If this division protein makes polymers of α subunits, then the production rate of polymers will be proportional to n, which is proportional to V. Biologically, the multiple cell cycle stages considered here may be interpreted as different levels of the division protein polymer and cell division occurs when a certain number of polymers is reached. Under this mechanism, the rate of moving from one stage to the next is proportional to V. We emphasize that while this power law is compatible with certain biophysical mechanisms, it could also simply be understood as a phenomenological means to model size homeostasis. Let V and V denote the cell sizes at birth and at division in a particular generation, respectively. Then, the increment in the αth power of the cell size across the cell cycle, , has an Erlang distribution with shape parameter N and mean A = Nαg/a (see Section 1 in transparent methods for the proof). The quantity Δ will be referred to as the generalized added size in what follows. In our model, the noise in the generalized added size, characterized by the coefficient of variation squared, is equal to 1/N. As N increases, the generalized added size, as well as V and V themselves, has smaller fluctuations. Since the cell cycle duration is given by T=(1/g)log(V/V), an increasing N also results in lesser fluctuations in the doubling time. Hence, our model allows the investigation of the influence of cell cycle duration variability on cell size dynamics. We next focus on three crucial special cases. When α→0, the transition rate between stages is a constant and thus the doubling time has an Erlang distribution that is independent of the birth size; this corresponds to the timer strategy. When α = 1, the added size V−V has an Erlang distribution that is independent of the birth size; this corresponds to the adder strategy. When α→∞, the αth power of the cell size at division, , has an Erlang distribution that is independent of the birth size; this corresponds to the sizer strategy. Intermediate strategies are naturally obtained for intermediate values of α; timer-like control is obtained when 0 < α < 1 and sizer-like control is obtained when 1<α<∞ Nieto et al. (2020a). Cell division occurs when the cell transitions from effective stage N to the next stage 1. At division, most previous papers assume that the mother cell divides into two daughters that are exactly the same in size via symmetric partitioning Amir (2014); Vargas-García and Singh (2018); Nieto-Acuna et al. (2019); Totis et al. (2020b); Nieto et al. (2020b); however, asymmetric cell division is common in biology. For instance, Saccharomyces cerevisiae divides asymmetrically into two daughters with different sizes. Escherichia coli may also undergo asymmetric division with old daughters receiving fewer gene products than new daughters Shi et al. (2020). Here, we follow the methodology that we devised in Jia and Grima (2020) and extend previous models by considering asymmetric partitioning at cell division: the mother cell divides into two daughters with different sizes. If the partitioning of the cell size is symmetric, we track one of the two daughters randomly after division Brenner et al. (2015); Robert et al. (2018); if the partitioning is asymmetric, we either track the smaller daughter or track the larger daughter after division Zopf et al. (2013); Crane et al. (2014). Hence, our model corresponds to cell lineage measurements performed using a mother machine. Let V and denote the cell sizes at division and just after division, respectively. If the partitioning is deterministic, then we have , where 0 < p < 1 is a constant with p = 1/2 corresponding to the case of symmetric division, p < 1/2 corresponding to smaller daughter tracking, and p > 1/2 corresponding to larger daughter tracking. However, in naturally occurring systems, the partitioning is appreciably stochastic. In this case, we assume that the partition ratio has a beta distribution with mean p Nieto-Acuña et al. (2020), whose probability density function is given as follows:where B is the beta function, q = 1−p, and ν > 0 is referred to as the sample size parameter. The reason behind this assumption is that the partition ratio should be a random number between 0 and 1, which is an important property of the beta distribution Nieto-Acuña et al. (2020). Then, the change in the logarithm of the cell size at division, , has the probability density function μ(w) = e−f(e−), which can be written more explicitly as follows: When ν→∞, the variance of the beta distribution tends to zero and thus stochastic partitioning reduces to deterministic partitioning, i.e., f(z) = δ(z−p) and μ(w) = δ(w + log p). We next describe our stochastic model of cell size dynamics across the cell cycle. The microstate of the cell can be represented by an ordered pair (k,y), where k is the cell cycle stage which is a discrete variable and y is the cell size which is a continuous variable. Let denote the probability density function of the cell size when the cell is in stage k. Note that the cell undergoes deterministic exponential growth in each stage and the system can hop between successive stages stochastically. Hence, the evolution of the cell size dynamics can be described by a piecewise deterministic Markov process whose Kolmogorov forward equation is given as follows:where f(z) is the function defined in Equation (1). Similar hybrid models have, for example, been used to describe demographic noise in ecosystems Realpe-Gomez et al. (2012) and single-cell stochastic gene expression Lin and Buchler (2018); Jia et al. (2019). In the first equation above, the first term on the right-hand side represents the exponential growth of the cell size with growth rate g, and the second and third terms represent the transition between stages whose transition rate is proportional to the α th power of the cell size y. In the second equation, the second term corresponds to the partitioning of the cell size at division. To solve Equation (3), the key step is to consider the dynamics of the logarithmic cell size, x = log y, rather than the original cell size y. This is because the dynamic equation for the former is easier to solve. Let p(x) denote the probability density function of the logarithmic cell size when the cell is in stage k. Since the probability density functions of the original and logarithmic cell sizes are related by , it follows from Equation (3) that the evolution of the logarithmic cell size is governed by the following master equation:where μ(w) is the function defined in Equation (2).

Analytical distribution of the cell size along a cell lineage under deterministic partitioning

Recall that any probability distribution is fully determined by its characteristic function. Let denote the probability density function of the logarithmic cell size. To obtain the analytical distribution of the cell size along a cell lineage, we introduce the characteristic function , which is nothing but the inverse Fourier transform of p(x). For simplicity, we first focus on deterministic partitioning at cell division, i.e., ν→∞. Despite the biological complexity described by our model, the characteristic function can still be solved exactly in steady-state conditions (see Section 2 in transparent methods for the proof):where C = k!/l!(k-l)! is the combinatorial number, a(u) = (1 + Au/N)− is a function of u, andis a normalization constant. Since the Fourier transform and the inverse Fourier transform are inverses of each other, taking the Fourier transform of the characteristic function gives the steady-state probability density function p(x) of the logarithmic cell size. Finally, the probability density function of the original cell size y = e along a cell lineage is given as follows: The analytical solution is ideal since it allows a fast exploration of large swathes of parameter space without performing stochastic simulations. To gain deeper insights into the cell size distribution, we next consider the limiting case of N→∞. In this case, the generalized added size Δ, as well as the cell cycle duration T, becomes deterministic, and thus, the system does not involve any stochasticity. As N→∞, we have a(u) = e−, and thus, the characteristic function can be simplified to a large extent as follows (see Section 2 in transparent methods for the proof):whereare two constants. Taking the Fourier transform of G(λ) shows that the logarithmic cell size has the uniform distributionand thus the original cell size y = e has the following distribution:where I(x) is the indicator function which takes the value of 1 when x∈B and the value of 0 otherwise. This indicates that when cell cycle duration variability is small, the cell size has a distribution that is concentrated on the finite interval , where and are the typical cell sizes at birth and at division, respectively. Figures 2A–2C illustrate the distribution of the original cell size as a function of the parameters N, α, and p. It can be seen that as cell cycle duration variability become smaller (N increases), the analytical distribution given in Equation (6) converges to the limit distribution given in Equation (9). The cell size distribution has a regular shape for small N. As N increases, the shape of the distribution becomes more complicated. In particular, the distribution has three apparent sections: an exponential increase for small sizes, a power law decay for moderate sizes, and an exponential decay for large sizes. As N→∞, the dynamics becomes deterministic and the distribution has a compact support, characterized by infinite slopes of the two shoulders. In addition, we find that the influence of α on the cell size distribution is similar to the influence of N. Finally, increasing p gives rise to a distribution that is more symmetric and more concentrated.

Figure 2

Influence of model parameters on the cell size distribution

(A) Cell size distribution as N increases. The red curve shows the analytical distribution given in Equation (6), and the red circles show the distribution obtained using the stochastic simulation algorithm proposed in Nieto et al. (2020b). The parameters are chosen as α = 2, p = 0.5.

(B) Cell size distribution as α varies. The parameters are chosen as N = 20, p = 0.5.

(D) Comparison of the cell size distributions for the model with stochastic partitioning (blue curve and red circles) and the model with deterministic partitioning (solid gray region). The blue curve shows the approximate distribution given in Equation (20), and the red circles show the distribution obtained from simulations.

(E) Comparison of the cell size distributions for the model with stochastic growth rate (blue curve and red circles) and the model with deterministic growth rate (solid gray region). In (D) and (E), the parameters are chosen as N = 30, α = 3, p = 0.5. In (A)-(E), the growth rate is chosen as g = 0.02 and the parameters A and a are chosen so that for the model with deterministic growth rate and deterministic partitioning. In (D), the standard deviation of the partition ratio is 10% of the mean; here, we assume that the partition ratios for different generations are i.i.d. normally distributed random variables. In (E), the standard deviation of the growth rate is 10% (red circles) or 50% (blue curve) of the mean; here, we assume that the growth rates for different generations are i.i.d. normally distributed random variables.

Influence of model parameters on the cell size distribution (A) Cell size distribution as N increases. The red curve shows the analytical distribution given in Equation (6), and the red circles show the distribution obtained using the stochastic simulation algorithm proposed in Nieto et al. (2020b). The parameters are chosen as α = 2, p = 0.5. (B) Cell size distribution as α varies. The parameters are chosen as N = 20, p = 0.5. (C) Cell size distribution as p varies. The parameters are chosen as N = 20, α = 2. (D) Comparison of the cell size distributions for the model with stochastic partitioning (blue curve and red circles) and the model with deterministic partitioning (solid gray region). The blue curve shows the approximate distribution given in Equation (20), and the red circles show the distribution obtained from simulations. (E) Comparison of the cell size distributions for the model with stochastic growth rate (blue curve and red circles) and the model with deterministic growth rate (solid gray region). In (D) and (E), the parameters are chosen as N = 30, α = 3, p = 0.5. In (A)-(E), the growth rate is chosen as g = 0.02 and the parameters A and a are chosen so that for the model with deterministic growth rate and deterministic partitioning. In (D), the standard deviation of the partition ratio is 10% of the mean; here, we assume that the partition ratios for different generations are i.i.d. normally distributed random variables. In (E), the standard deviation of the growth rate is 10% (red circles) or 50% (blue curve) of the mean; here, we assume that the growth rates for different generations are i.i.d. normally distributed random variables.

Moments, noise, and skewness of the cell size distribution

Our analytic results can also be used to derive explicit expressions for several other quantities of interest. Recall that the probability density function p(x) for the logarithmic cell size and the probability density function for the original cell size are related by Equation (6). For any real number λ, the λth moment of the original cell size is given as follows:where F(λ) is the moment generating function of p(x). This shows that the λth moment of the original cell size is exactly the moment generating function of the logarithmic cell size taken value at λ. Since the moment generating function F(λ) and the characteristic function G(λ) are related by G(λ) = F(iλ), replacing the variable iλ in Equation (5) by λ yields the moment generating function. Hence, the λth moment of the original cell size is given as follows: In single-cell experiments, the noise in the cell size, characterized by the coefficient of variation squared, is given as follows:where μ is the mean and σ2 is the variance. Figures 3A and 3B illustrate the noise η as a function of N, α, and p. Clearly, the fluctuations in the cell size become smaller with the increase of all the three parameters (see also Figure 2). This implies that small cell cycle duration variability and sizer-like strategy can lead to a more accurate control of the cell size.

Figure 3

Noise and skewness of the cell size distribution

(A) Heatmap of the noise η versus α and N.

(B) Heatmap of the noise η versus α and p.

(D) Heatmap of the skewness γ versus α and p. The parameters are chosen as p = 0.5 in (A) and (C) and N = 20 in (B) and (D). In (A)-(D), the parameter A is chosen so that the mean cell size .

Noise and skewness of the cell size distribution (A) Heatmap of the noise η versus α and N. (B) Heatmap of the noise η versus α and p. (C) Heatmap of the skewness γ versus α and N. (D) Heatmap of the skewness γ versus α and p. The parameters are chosen as p = 0.5 in (A) and (C) and N = 20 in (B) and (D). In (A)-(D), the parameter A is chosen so that the mean cell size . A special case occurs when the cell cycle duration variability is very small, i.e., . In this case, replacing the variable iλ in the characteristic function Equation (7) by λ yields the following equation: Thus, the noise in the cell size is given as follows:which is a decreasing function of p. Note that when N is small, the noise η is a function of both α and p (Figure 3B). However, when N is large, the noise only depends on p. It is easy to see that the noise in the cell size tends to infinity as p→0 and tends to zero as p→1. For the case of symmetric division (p = 0.5), the noise in the cell size is given by η ≈ 0.04, which shows that the standard deviation of the cell size is roughly 20% of the mean. Recall that the skewness of the cell size distribution is defined as follows: Figures 3C and 3D illustrate the skewness γ as a function of N, α, and p, from which we can see that the skewness increases with the decrease of all the three parameters. This implies that large cell cycle duration variability, timer-like division strategy, and tracking the smaller daughter at division lead to larger skewness of the cell size distribution. Moreover, we find that the skewness is always positive, which means that the cell size distribution is always right skewed. When , it follows from Equation (12) that the skewness only depends on p and is given as follows:which is also a decreasing function of p.

Analytical distribution of the cell cycle duration

In our model, the distribution of the doubling time can also be derived analytically in steady-state conditions. Actually, given that the birth size V is known, the conditional probability density of the cell cycle duration T has been obtained in Nieto et al. (2020a) as follows: Here, we compute the unconditional distribution of the cell cycle duration. To this end, we find that the Laplace transform of is given by the following equation (see Section 3 in transparent methods for the proof): Taking the inverse Laplace transform gives the probability density function of . Finally, the distribution of the cell cycle duration T is given as follows: A special case occurs when α is large (strong cell size control) or when p is small (smaller daughter tracking). Under the large α or small p approximation, the term p is negligible for n ≥ 2 and it suffices to keep only the first term in the infinite product given in Equation (14). In this case, the inverse Laplace transform has an explicit expression and the birth size distribution is given as follows: Inserting this equation into Equation (15) yields the doubling time distribution: We emphasize that in the special case of N = 1 and p = 0.5, our model reduces to the model in Osella et al. (2014) and the above two equations coincide with the results in that paper. Recent experiments Golubev (2016); Yates et al. (2017); Chao et al. (2019); Perez-Carrasco et al. (2020); Gavagnin et al. (2020) have shown that the cell cycle durations in various cell types are all well fitted by a gamma distribution. Therefore, it is natural to ask whether the doubling time in our model shares the same property. To see this, we illustrate the doubling time distribution and its approximation by the gamma distribution as N and α vary (Figure 4). It can be seen that the true distribution is in good agreement with its gamma approximation when α is small (Figures 4A and 4B). This is because a small α implies a timer-like size control, which leads to an approximately Erlang distributed doubling time due to the effect of multiple cell cycle stages and constant transition rates between them. When α is large, there are some slight differences between them for small N (Figure 4C); compared with the gamma approximation, the true distribution is more symmetric around its mean. However, when N is large, they are very close to each other and both well fitted by a normal distribution (any gamma distribution converges to the normal distribution as the shape parameter tends to infinity, see Figure 4D).

Figure 4

Distribution of the cell cycle duration and its approximation by the gamma distribution

We use the information of the sample mean and sample variance of the true distribution to determine the two parameters involved in the gamma approximation.

(A) Large cell cycle duration variability and small size control strength.

(B) Small cell cycle duration variability and small size control strength.

(D) Small cell cycle duration variability and large size control strength. In (A)-(D), the blue curve represents the analytical distribution given in Equation (15), the red circles represent the distribution obtained from simulations, and the gray region represents the gamma approximation. The parameters are chosen as p = 0.5, g = 0.02 and A and a are determined so that .

Distribution of the cell cycle duration and its approximation by the gamma distribution We use the information of the sample mean and sample variance of the true distribution to determine the two parameters involved in the gamma approximation. (A) Large cell cycle duration variability and small size control strength. (B) Small cell cycle duration variability and small size control strength. (C) Large cell cycle duration variability and large size control strength. (D) Small cell cycle duration variability and large size control strength. In (A)-(D), the blue curve represents the analytical distribution given in Equation (15), the red circles represent the distribution obtained from simulations, and the gray region represents the gamma approximation. The parameters are chosen as p = 0.5, g = 0.02 and A and a are determined so that .

Correlation between sizes at birth and sizes at division

The birth size distribution derived above can be applied to study the correlation between birth and division sizes. Let V and V be the birth and division sizes in a particular generation, respectively, and let and be the birth and division sizes in the next generation, respectively. Since the generalized added size is Erlang distributed and , it is easy to obtain from Equation (14) that (see Section 4 in transparent methods for the proof)where ρ(X,Y) denotes Pearson's correlation coefficient between random variables X and Y. This characterizes the correlation between sizes at birth and sizes at division, as well as the correlation between the birth/division sizes for mother and daughter cells. In particular, for the adder strategy (α = 1), we have the following: This implies that the size correlation for the adder only depends on p and is independent of other parameters. For the case of symmetric division (p = 0.5), this is consistent with the result obtained in Amir (2014), where the correlation coefficient between birth and division sizes is found to be approximately 0.5. In the presence of noise in partitioning, the formula for the correlation coefficient should be modified as follows (see Section 4 in transparent methods for the proof):where In this case, and are generally lower than due to fluctuations in partitioning.

Distribution of the cell size along a cell lineage under stochastic partitioning and stochastic growth rate

Thus far, the analytical distribution of the cell size is obtained when the partitioning at division is deterministic. In the presence of noise in partitioning, it is very difficult to obtain the explicit expression of the cell size distribution. Fortunately, in naturally occurring systems, the stochasticity in partitioning is often very small. For example, recent cell lineage data Tanouchi et al. (2015) suggested that the coefficient of variation of the partition ratio in E. coli is about 7%–9%. When noise in partitioning is small, we obtain an approximate expression for the cell size distribution, whose moment generating function is given by the following equation (see Section 5 in transparent methods for the proof):where K is a normalization constant and To see the effect of stochastic partitioning, we illustrate the cell size distributions under deterministic and stochastic partitioning in Figure 2D with the standard deviation of the partition ratio z being 10% of the mean for the latter. Clearly, the approximate solution given in Equation (20) matches the simulation results very well. In addition, it can be seen that noise in partitioning gives rise to larger fluctuations in the cell size, characterized by the smaller slope of the left shoulder of the cell size distribution. In addition to noise in partitioning, there is another important source of stochasticity, i.e., noise in the growth rate g. In many biological systems, such noise is also very small. For example, recent cell lineage data Tanouchi et al. (2015) suggested that the coefficient of variation of the growth rate g in E. coli is about 7%–8%. To see the influence of noise in the growth rate, we illustrate the cell size distributions under deterministic and stochastic growth rates in Figure 2E with the standard deviation of g being 10% or 50% of the mean for the latter (here we assume that the growth rates for different generations are i.i.d. normally distributed random variables). Interestingly, we find that noise in the growth rate has almost no effect on the cell size distribution, even when the noise is very large; this is in sharp contrast to noise in partitioning which has an apparent effect on the cell size distribution.

Random tracking protocol can lead to complex multimodal cell size distributions

If cell division is asymmetric, the two daughters are different in size and thus far we have assumed that the smaller/larger daughter (such as the bud/mother cell in budding yeast) is tracked after division Zopf et al. (2013); Crane et al. (2014). We have seen that whether the smaller or the larger daughter is tracked, the cell size distribution along a cell lineage is always unimodal and right skewed and larger daughter tracking yields lesser fluctuations in size than smaller daughter tracking. Next, we consider another tracking protocol, namely where we track one of the two daughters randomly with probability 1/2 after division Tanouchi et al. (2015); Brenner et al. (2015); Robert et al. (2018). Clearly, the three types of tracking protocols (tracking a random daughter, the smaller daughter, or the larger daughter) are exactly the same for symmetric cell division; however, they are remarkably different for asymmetric cell division. For the random tracking protocol, the probability density function of the partition ratio is given by the following equation (here the noise in partitioning is ignored):where 0 < p ≤ 1/2 is the ratio of the size of the smaller daughter to the size of the mother cell and q = 1−p. Figure 5 illustrates the simulated cell size distribution under the random tracking protocol. Interestingly, we find that the shape of the distribution undergoes two stochastic bifurcations as cell cycle duration variability becomes smaller (N increases). When N is small, the cell size distribution is in general unimodal (Figure 5A), as in the case of smaller/larger daughter tracking. When N is moderate, random tracking is capable of producing a bimodal cell size distribution (Figure 5B), where the two peaks can be attributed intuitively to the subpopulations of smaller and larger daughters, respectively. Surprisingly, when N is large, we find that random tracking can give rise to a complex cell size distribution that displays multiple peaks (Figure 5B), two major peaks and some minor peaks. Increasing cell cycle duration variability (decreasing N) smoothens the cell size distribution, by first removing the smaller peaks and then merging the two major peaks into one.

Figure 5

Cell size distribution for asymmetric cell division under the random tracking protocol

After division, one of the two daughters is randomly tracked with probability 1/2.

(A) Typical stochastic trajectory of the cell size (upper) and the cell size distribution (lower) in the case of large cell cycle duration variability (N = 2).

(B) Same as (A) but for moderate cell cycle duration variability (N = 20).

(C) Same as (A) but for small cell cycle duration variability (N = 200). In (A)-(C), the colored curve and the gray region show the cell size distributions obtained from two independently repeated stochastic simulations. The parameters are chosen as p = 0.3, α = 2, A = 25.

Cell size distribution for asymmetric cell division under the random tracking protocol After division, one of the two daughters is randomly tracked with probability 1/2. (A) Typical stochastic trajectory of the cell size (upper) and the cell size distribution (lower) in the case of large cell cycle duration variability (N = 2). (B) Same as (A) but for moderate cell cycle duration variability (N = 20). (C) Same as (A) but for small cell cycle duration variability (N = 200). In (A)-(C), the colored curve and the gray region show the cell size distributions obtained from two independently repeated stochastic simulations. The parameters are chosen as p = 0.3, α = 2, A = 25.

Parameter inference using synthetic data

Recent breakthroughs in microfluidic devices have made it possible to monitor the single-cell volume dynamics along a cell lineage over many generations. Given such cell lineage data, an important question is whether all the parameters involved in our model can be inferred accurately. Parameter inference is crucial since it provides insights on the strength of cell size control, as well as cell cycle duration variability in various cell types. The steps of our parameter estimation method are described as follows. First, the data of cell sizes at birth and at division in each generation, V and V, can be easily extracted from the cell lineage data. Since is Erlang distributed with shape parameter N and mean A, once the parameter α is determined, both the parameters N and A can be determined by fitting the data of to an Erlang distribution. In addition, since (assuming deterministic partitioning), once the parameter α is determined, the parameter p can also be inferred from the correlation coefficient between V and V. For clarity, let N(α), A(α), and p(α) denote the optimal estimates of N, A, and p given the value of α, respectively. They can be inferred from the data of birth and division sizes as follows: If the partitioning at division is stochastic, then Equation (23) should be replaced by Equation (19). Next, the parameter α is determined by an optimal fit of the experimental to the theoretical cell size distribution using the least square criterion. Specifically, we determine α by solving the following optimization problem:where p(x;α,p,N,A) is the theoretical cell size distribution given the parameters α,p,N,A, is the sample cell size distribution obtained from experiments, x are some reference points, and M is the number of bins chosen. Once α is estimated, the values of p, N, and A, are automatically determined. The reason why we do not estimate p directly as the mean of the partition ratio is that the cell size distribution is sensitive to the value of p. A comparatively small error in p will result in a comparatively large change in the cell size distribution. Since the cell size distribution is a function of A = Nαg/a, which depends on the ratio of g and a, it is impossible to infer the growth rate g from the cell size distribution. Finally, the growth rate g is determined by an optimal fit of the experimental to the theoretical/simulated doubling time distribution using the least square criterion. Once g is inferred, the last parameter a can be determined from the estimated α, N, and A as a = Nαg/A. To verify the effectiveness of our method, we use our model to generating synthetic data of cell size dynamics. To make the time course data better mimic real biological processes, we add some noise to both the growth rate g and the partition ratio z. We then perform parameter inference by fitting the noisy data to two models: the model with deterministic partitioning (model I) and the model with stochastic partitioning (model II). The parameters input to the synthetic data and the parameters estimated using the above method are given in Table 2, where three sets of input parameters are chosen to cover large swathes of parameter space and to include three types of control strategies (timer-like, adder, and sizer-like). It can be seen that fitting the noisy data to both models leads to an accurate estimation of p and g and a relatively accurate estimation of N. However, fitting the data to model I gives rise to a systematic underestimation of α and A and an overestimation of a due to stochasticity in partitioning. Fitting the data to model II can remarkably improve the accuracy of estimation of these three parameters.

Table 2

Parameter inference using synthesis data

	α	p	N	A	g	a
Input parameters	0.5	0.4	30	0.79	0.01	0.191
Estimated parameters using model I	0.44 ± 0.02	0.40 ± 0.0002	28.70 ± 0.82	0.65 ± 0.06	0.0100 ± 0.0001	0.195 ± 0.015
Estimated parameters using model II	0.50 ± 0.04	0.40 ± 0.0003	28.10 ± 1.52	0.79 ± 0.11	0.0100 ± 0.0002	0.179 ± 0.016
	α	p	N	A	g	α
Input parameters	1	0.5	20	2.08	0.02	0.192
Estimated parameters using model I	0.77 ± 0.06	0.50 ± 0.0003	21.15 ± 0.88	1.26 ± 0.18	0.0200 ± 0.0002	0.263 ± 0.026
Estimated parameters using model II	0.99 ± 0.08	0.50 ± 0.0003	19.20 ± 1.14	2.04 ± 0.35	0.0200 ± 0.0003	0.187 ± 0.027
	α	p	N	A	g	α
Input parameters	2	0.6	10	9.39	0.03	0.064
Estimated parameters using model I	1.29 ± 0.04	0.60 ± 0.0005	12.90 ± 0.32	2.73 ± 0.20	0.0301 ± 0.0003	0.183 ± 0.010
Estimated parameters using model II	1.92 ± 0.04	0.60 ± 0.0004	10.40 ± 0.52	8.29 ± 0.57	0.0300 ± 0.0004	0.072 ± 0.004

The cell lineage data are generated from the model with stochastic partitioning, where some noise is added to the growth rate g and the partition ratio z with the coefficients of variation of both parameters being chosen as 7% (here we assume that the growth rate is constant in each generation and the growth rates/partition ratios across different generations are i.i.d. normally distributed random variables). For each set of model parameters, we generate synthetic data simulating 50 cell lineages. For each cell lineage, the model parameters are estimated by fitting the synthetic data to both the model with deterministic partitioning at cell division (model I) and the model with stochastic partitioning (model II). The value in each cell shows the mean and standard deviation of the estimated parameter computed over 50 cell lineages.

Parameter inference using synthesis data The cell lineage data are generated from the model with stochastic partitioning, where some noise is added to the growth rate g and the partition ratio z with the coefficients of variation of both parameters being chosen as 7% (here we assume that the growth rate is constant in each generation and the growth rates/partition ratios across different generations are i.i.d. normally distributed random variables). For each set of model parameters, we generate synthetic data simulating 50 cell lineages. For each cell lineage, the model parameters are estimated by fitting the synthetic data to both the model with deterministic partitioning at cell division (model I) and the model with stochastic partitioning (model II). The value in each cell shows the mean and standard deviation of the estimated parameter computed over 50 cell lineages.

Experimental validation of the theory

To test our theory, we apply it to study the single-cell time course data of the cell size collected for E. coli in Tanouchi et al. (2017). In this data set, the time course data of the cell length were recorded every minute for 279 cell lineages over 70 generations using a mother machine microfluidic device under three different growth conditions (25°C, 27°C, and 37°C). At the three temperatures, there are a total of 65, 54, and 160 cell lineages measured, respectively. Based on such data, it is possible to estimate all the parameters involved in our model at each temperature by fitting the data to both model I and model II. The estimated parameters and the estimation errors are listed in Table 3 and are depicted by the box plots in Figure 6D.

Table 3

Model parameters estimated using E. coli cell lineage data at three different temperatures

Model I	25°C	27°C	37°C
α	0.690 ± 0.0332	0.751 ± 0.0374	0.868 ± 0.1037
p	0.442 ± 0.0020	0.451 ± 0.0016	0.448 ± 0.0039
N	13.761 ± 0.6831	33.147 ± 1.3734	36.768 ± 2.3979
A	1.1357 ± 0.0971	1.1508 ± 0.0995	1.7673 ± 0.4553
g	0.0123 ± 0.0001	0.0153 ± 0.0001	0.0249 ± 0.0002
a	0.1120 ± 0.0176	0.3418 ± 0.0223	0.4713 ± 0.0449
Model II	25°C	27°C	37°C
α	0.833 ± 0.0315	1.010 ± 0.0402	1.151 ± 0.0730
p	0.444 ± 0.0022	0.450 ± 0.0023	0.446 ± 0.0034
N	15.222 ± 0.6791	35.317 ± 1.4840	38.222 ± 1.6418
A	1.5820 ± 0.1152	1.9551 ± 0.1511	3.2246 ± 0.4863
g	0.0124 ± 0.0002	0.0153 ± 0.0002	0.0251 ± 0.0003
a	0.1074 ± 0.0074	0.2883 ± 0.0196	0.3561 ± 0.0430
ν	156.75 ± 5.0945	206.97 ± 7.4727	229.25 ± 9.8141

The parameters p, α, N, A are determined by fitting the experimental to the theoretical cell size distribution. The parameters g and a are determined by fitting the experimental to the theoretical doubling time distribution. Two theoretical models are used: the model with deterministic partitioning (model I) and the model with stochastic partitioning (model II). For model II, once the parameter p is estimated, the sample size parameter ν in Equation (1) can be inferred by fitting the partition ratio data to a beta distribution. The estimation error for each parameter was computed using bootstrap. Specifically, we performed parameter inference 50 times; for each estimation, the theoretical model was fitted to the data of 30 randomly selected cell lineages. The estimation error was then calculated as the standard deviation over 50 repeated samplings. Here, the bootstrap technique is used because each cell lineage was only measured for 70 generations, and the data of a single lineage are insufficient to estimate all the parameters accurately.

Figure 6

Fitting the experimental cell size and doubling time distributions to theory

(A) Experimental cell size distributions (blue bars) at the three temperatures and their optimal fitting to model II (red curve) where partitioning is stochastic. Here, the theoretical distribution is computed using Equation (20).

(B) Experimental doubling time distributions (blue bars) at the three temperatures and their optimal fitting to model II (red curve), where the theoretical distribution is computed using stochastic simulations.

(C) Five typical trajectories of cell size dynamics for cells at 37°C. The red dots show the cell sizes just after division, and the green dots show the minimal cell sizes in each generation. For over 60% generations, there is a small abrupt decline in the cell size after division, shown as the sharp drop from a red dot to a green dot.

(D) Means and standard deviations of all model parameters estimated by fitting the data to model II. The parameter values can be found in Table 3.

Model parameters estimated using E. coli cell lineage data at three different temperatures The parameters p, α, N, A are determined by fitting the experimental to the theoretical cell size distribution. The parameters g and a are determined by fitting the experimental to the theoretical doubling time distribution. Two theoretical models are used: the model with deterministic partitioning (model I) and the model with stochastic partitioning (model II). For model II, once the parameter p is estimated, the sample size parameter ν in Equation (1) can be inferred by fitting the partition ratio data to a beta distribution. The estimation error for each parameter was computed using bootstrap. Specifically, we performed parameter inference 50 times; for each estimation, the theoretical model was fitted to the data of 30 randomly selected cell lineages. The estimation error was then calculated as the standard deviation over 50 repeated samplings. Here, the bootstrap technique is used because each cell lineage was only measured for 70 generations, and the data of a single lineage are insufficient to estimate all the parameters accurately. Fitting the experimental cell size and doubling time distributions to theory (A) Experimental cell size distributions (blue bars) at the three temperatures and their optimal fitting to model II (red curve) where partitioning is stochastic. Here, the theoretical distribution is computed using Equation (20). (B) Experimental doubling time distributions (blue bars) at the three temperatures and their optimal fitting to model II (red curve), where the theoretical distribution is computed using stochastic simulations. (C) Five typical trajectories of cell size dynamics for cells at 37°C. The red dots show the cell sizes just after division, and the green dots show the minimal cell sizes in each generation. For over 60% generations, there is a small abrupt decline in the cell size after division, shown as the sharp drop from a red dot to a green dot. (D) Means and standard deviations of all model parameters estimated by fitting the data to model II. The parameter values can be found in Table 3. From the estimated parameters, it can be seen that both models lead to similar estimation of p, N, and g. However, the introduction of partitioning noise into the model leads to higher estimation of α and A and lower estimation of a; this is consistent with our observation for synthesis data. For model I, the strength of cell size control, α, is estimated to be 0.7–0.9 for the three growth conditions, which are all lower than 1. Incorporating partitioning noise into the model gives rise to a higher estimate of α; for model II, α is estimated to be 0.8–1.2 for the three temperatures, implying that the size control strategy in E. coli is close to the adder. Moreover, higher temperature leads to a higher strength α than lower temperature. Previous papers Tanouchi et al. (2015); Modi et al. (2017); Thomas (2018) have proposed an alternative approach to determining the regulation strength. This approach assumes that the birth size V and the division size V in each generation are related by the following:where 0 ≤ β ≤ 2 and γ ≥ 0 are two constants and ε is Gaussian white noise. Under this assumption, β characterizes the strength of cell size control with β = 0, β = 1, and β = 2 corresponding to the sizer, adder, and timer strategies, respectively. Using the data of birth and division sizes for different generations, the parameter β can be easily determined as the slope of the regression line of V on V. Based on the lineage data, β is estimated to be 1.08 for cells at 25°C, 1.02 for cells at 27°C, and 0.93 for cells at 37°C (Figure S1), all of which are close to 1. This implies a size control strategy close to the adder, which is consistent with the predictions of our model. Compared with this method, our distribution matching method seems to be more reliable since the linear relationships between V and V are actually very weak with numerous outliers and a low R2 around 0.3 (Figure S1). Figures 6A and 6B illustrate the experimental cell size and cell cycle duration distributions using the measurements of all cell lineages versus the theoretical distributions using the estimated parameters. Here, the theoretical distributions are plotted based on model II, but both models give rise to similar distribution shapes. Interestingly, both experimental distributions at all the three temperatures coincide perfectly with our model, which implies that our model can indeed reproduce the cell size dynamics in E. coli very well. In addition, it supports the main assumption of choosing the rate of moving from one effective cell cycle stage to the next to be a power law of the cell size. Based on the lineage data, the correlation coefficients between the birth and division sizes, V and V, for the three temperatures are 0.533 ± 0.099, 0.503 ± 0.121, and 0.431 ± 0.139, respectively, and the correlation coefficients between mother and daughter birth sizes, V and , are 0.475 ± 0.107, 0.445 ± 0.134, and 0.399 ± 0.146, respectively. Here, the errors represent the standard deviations computed over all cell lineages at that temperature. Stochastic simulations based on model II with the parameters estimated in Table 3 show that the correlation coefficients between V and V for the three growth conditions are 0.564, 0.546, and 0.495, respectively, and the correlation coefficients between V and are 0.506, 0.447, and 0.396, respectively. Clearly, our estimated parameters capture the size correlations between mother and daughter cells very well. To further evaluate the performance of our model, we also examine the doubling time correlations between mother and daughter cells. Experimentally, the correlation coefficients between two successive cell cycle durations for the three growth conditions are −0.109 ± 0.109, −0.117 ± 0.114, and −0.152 ± 0.111, respectively, whereas simulations based on model II with the estimated parameters predict the correlation coefficients to be −0.185, −0.160, and −0.173, respectively. Both theory and experiments reveal a negative correlation between successive doubling times, and their values do not differ too much. This again verifies the effectiveness of our model and our parameter inference approach. Typically, a mother cell divides into two daughters that are different in size due to stochasticity in partitioning and possible asymmetric cell division Nieto-Acuña et al. (2020). Note that the data of cell sizes just before division and just after division, V and , can be easily extracted from the cell lineage data and thus the parameter p can be estimated as the mean partition ratio . An interesting characteristic implied by the E. coli data is that at cell division, the smaller daughter is always tracked with the mean partition ratio p being estimated to be about 0.46 for all the three growth conditions (0.459 ± 0.040 for cells at 25°C, 0.461 ± 0.039 for cells at 27°C, and 0.464 ± 0.034 for cells at 37°C). Recall that in our estimation procedure, we do not use the information of V and to determine the parameter p; rather, we infer p by an optimal fit of the experimental to the theoretical cell size distribution. The estimate of p in Table 3 is 0.44–0.45 for the three temperatures, which is slightly lower than the value of 0.46 estimated using V and . The reason of this discrepancy is that after cell division, over 60% generations have a small but sharp decline in the cell size (see Figure 6C for the cell size dynamics of five typical cell lineages with the red dots being the cell sizes just after division and the green dots being the minimal cell sizes in each generation; the small sharp drop in the cell size after division is shown as the transition from a red dot to a green dot). Therefore, the realistic effective partition ratio should be computed using the green dots rather than the red dots. This explains why the parameter p estimated in Table 3 is lower than the mean partition ratio . A possible explanation of these abrupt declines is that the green dots are actually the true beginning of a new cell cycle and the red dots correspond to an intermediate time point during cell division. In our model, division is modeled as an instantaneous transition from stage N to stage 1. However, cell division is never instantaneous in real systems. This finite division time effect may cause the abrupt drops observed in the lineage data. To verify the estimate of p, we compute the ratio of the size associated with a green dot to the division size in the previous generation, which can be viewed as the realistic partition ratio. The mean of the realistic partition ratio is estimated to be 0.44–0.45 for all the three temperatures (0.445 ± 0.038 for cells at 25°C, 0.442 ± 0.034 for cells at 27°C, and 0.450 ± 0.032 for cells at 37°C), which are very close to the estimates of p in Table 3. A natural question is whether the lineage data used here can be described by simpler models with fewer parameters. In fact, the models in many previous papers can be viewed as special cases of our model. In Osella et al. (2014); Nieto-Acuna et al. (2019); Totis et al. (2020a), the authors focus on the special case of only one cell cycle stage (N = 1) and the majority of previous papers focus on the special case of symmetric division (p = 0.5) Kohram et al. (2021). Note that symmetric division mentioned here means that the mean partition ratio equals 0.5 and we allow the partitioning to be stochastic. To see whether the data studied here can be reproduced by simpler models, we fit the cell size distribution to the model with N = 1 as well as the model with p = 0.5 (Figure S2). Both models fail to capture the unusual shape of the cell cycle distribution. This suggests that our model seems to be minimal in order to describe real lineage data.

Discussion

In this work, we have analytically derived the cell size distribution of measurements obtained from a cell lineage. We have solved two models. The first model assumes that (i) the birth size is a fixed (generation independent) fraction of the division size in the last generation; (ii) the cell grows exponentially between birth and division events where the growth rate is a generation independent constant; (iii) the length of the cell cycle is stochastic; (iv) size homeostasis is enforced by timer-like, sizer-like, or adder strategies. A second model was also solved which relaxes the assumption (i) above, namely it allows for a stochastic ratio of the birth to division size. The main features of the experimental cell size distribution in E. coli, namely a fast increase in the size count for small cells, a slow decay for moderately large cells, and a fast decay for large cells, are reproduced by the analytical solution of both models when the parameters N and α are large enough; this implies that these features emerge when the variability in the cell cycle duration is not too large and when adder or sizer-like mechanisms enforce size homeostasis. We also find that noise in partitioning at cell division (noise in the ratio of the birth to division size) has a considerable influence on the shape of the cell size distribution, whereas noise in the growth rate hardly exerts any influence; this is in agreement with an earlier moment-based study Modi et al. (2017). Our theory predicts that large cell cycle duration variability, timer-like division strategy, and tracking the smaller daughter at division lead to larger skewness and coefficient of variation of the cell size distribution. We have furthermore shown that (i) the distribution of cell cycle duration that emerges from our model is well approximated by a gamma distribution that has been measured experimentally for many cell types Golubev (2016); (ii) if cells divide asymmetrically, they are tracked randomly after division, and if cell cycle duration variability is intermediate or low, then the cell size distribution is multimodal. Lastly, we have shown that the theoretical distributions provide an excellent fit to the experimental E. coli cell size and doubling time distributions reported in Tanouchi et al. (2017) for three different growth conditions. This match provides support for the implicit assumption of our model that the speed of the cell cycle (the transition rate between effective stages) monotonically increases with the cell volume and specifically has a power law dependence on the cell volume. Note that while this law is compatible with certain biophysical mechanisms (as discussed earlier in the model specification section), it can also be seen as phenomenological means to model cell size homeostasis; in fact more generally and beyond the context of our model, the usage of kinetic rates with power laws has found widespread applications in the effective modeling of complex biochemical kinetics in cells Savageau and Voit (1987). Finally, based on the matching of the experimental to the theoretical cell size and doubling time distributions, we have estimated all the model parameters directly from E. coli cell lineage data and found that the regulation strength α exhibits a weak increase with temperature. The estimated values of α (using model II, the most accurate model in this paper) ranging between 0.8 and 1.2 confirm the previous results that some E. coli strains use the adder strategy to achieve size homeostasis Tanouchi et al. (2015); Wallden et al. (2016). Simulations with the inferred parameters using distribution matching also captured the size and doubling time correlations between mother and daughter cells—this provides further evidence of the accuracy of the deduced model. In previous studies, simpler models of cell size regulation with fewer parameters have been proposed Amir (2014); Osella et al. (2014); Vargas-García and Singh (2018); Nieto-Acuna et al. (2019); Totis et al. (2020a); Nieto et al. (2020a); Lin and Amir (2020). Here, we have shown that inference using the constraints of previous models such as N = 1 or p = 0.5 do not reproduce the correct shape of the lineage cell size distribution (Figure S2). To gain more insights, we compare our model with the classical model proposed in Amir (2014). In that paper, the doubling time in a particular generation is assumed to be a function of the birth size. Under this assumption, the model is not Markovian; this is because given an arbitrary time t, the evolution of the system after time t not only depends on the cell size at that time but also depends on the cell sizes in the past (since the birth time is generally smaller than time t). In our work, by assuming multiple cell cycle stages and the coupling between the size and the rate of cell cycle progression, we are able to model cell size dynamics as a Markov process. By using the powerful tool of Markov processes, we manage to compute the exact solution of the cell size distribution of lineage measurements, which is difficult for a non-Markovian model like in Amir (2014). In addition, in Amir (2014), (i) the partitioning at cell division is assumed to be symmetric (the mean partition ratio p is assumed to be 0.5) and (ii) in one of the two models considered there, the doubling time distribution is assumed to be Gaussian. In naturally occurring systems, the partitioning at cell division is often asymmetric Shi et al. (2020). Here, we have shown that asymmetry and stochasticity in partitioning will greatly affect the distribution shape and assuming symmetric division will lead to large errors in data fitting. Note also that generally the doubling time distribution is not Gaussian (while it looks like being normal in Figure 6B for E. coli, it is not for most organisms Golubev (2016); Yates et al. (2017); Chao et al. (2019); Perez-Carrasco et al. (2020); Gavagnin et al. (2020)), rather it is right skewed and well approximated by a gamma or Erlang distribution which our model can capture but the method in Amir (2014) cannot. Concluding, the major advance in our work is the analytic derivation of the cell size distribution of lineage measurements, while previous studies focused more on population measurements Xia et al. (2020); Xia and Chou (2021) or moments of lineage measurements. The advantages of the analytical distribution are (i) the ease and speed with which one can explore the dependence of cell size statistics on parameters across large swathes of parameter space (compared to stochastic simulations) and (ii) the reliable estimation of parameters from data based on distribution matching which is generally much more robust than moment-based estimation Munsky et al. (2018); Öcal et al. (2019). The present model presents a framework onto which one can build further biological realism; current research work aims to extend the model to include gene expression and its correlation to cell size resulting in concentration homeostasis of mRNAs and proteins Padovan-Merhar et al. (2015); Shahrezaei and Marguerat (2015); Vargas-Garcia et al. (2018); Bertaux et al. (2018); Lin and Amir (2018); Cao and Grima (2020).

Limitations of the study

A major assumption of our model is that cell growth is exponential. While this is a common assumption and holds for various cell types, it is not universal. Hence, the present model cannot, for example, predict the cell size distributions in Schizosaccharomyces pombe (fission yeast) where the increase of cell size with time after birth is non-exponential Nobs and Maerkl (2014); Nakaoka and Wakamoto (2017).

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Ramon Grima (ramon.grima@ed.ac.uk).

Material availability

This study did not generate new unique reagents.

Data and code availability

All data needed to evaluate the conclusions in the paper are present in the paper and in Tanouchi et al. (2017).

Methods

All methods can be found in the accompanying Transparent Methods supplemental file.

48 in total

1. A noisy linear map underlies oscillations in cell size and gene expression in bacteria.

Authors: Yu Tanouchi; Anand Pai; Heungwon Park; Shuqiang Huang; Rumen Stamatov; Nicolas E Buchler; Lingchong You
Journal: Nature Date: 2015-06-03 Impact factor: 49.962

2. Demographic noise and piecewise deterministic Markov processes.

Authors: John Realpe-Gomez; Tobias Galla; Alan J McKane
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2012-07-30

3. The Synchronization of Replication and Division Cycles in Individual E. coli Cells.

Authors: Mats Wallden; David Fange; Ebba Gregorsson Lundius; Özden Baltekin; Johan Elf
Journal: Cell Date: 2016-07-28 Impact factor: 41.582

4. Cell-size control and homeostasis in bacteria.

Authors: Sattar Taheri-Araghi; Serena Bradde; John T Sauls; Norbert S Hill; Petra A Levin; Johan Paulsson; Massimo Vergassola; Suckjoon Jun
Journal: Curr Biol Date: 2014-12-24 Impact factor: 10.834

5. Cell size control and gene expression homeostasis in single-cells.

Authors: Cesar A Vargas-Garcia; Khem Raj Ghusinga; Abhyudai Singh
Journal: Curr Opin Syst Biol Date: 2018-02-02

6. Allocation of gene products to daughter cells is determined by the age of the mother in single Escherichia coli cells.

Authors: Chao Shi; Lin Chao; Audrey Menegaz Proenca; Andrew Qiu; Jasper Chao; Camilla U Rang
Journal: Proc Biol Sci Date: 2020-05-06 Impact factor: 5.349