Anel Mahmutovic1, Pia Abel Zur Wiesch1,2,3,4, Sören Abel1,2,4,5.
Abstract
Transposon insertion sequencing methods such as Tn-seq revolutionized microbiology by allowing the identification of genomic loci that are critical for viability in a specific environment on a genome-wide scale. While powerful, transposon insertion sequencing suffers from limited reproducibility when different analysis methods are compared. From the perspective of population biology, this may be explained by changes in mutant frequency due to chance (drift) rather than differential fitness (selection). Here, we develop a mathematical model of the population biology of transposon insertion sequencing experiments, i.e. the changes in size and composition of the transposon-mutagenized population during the experiment. We use this model to investigate mutagenesis, the growth of the mutant library, and its passage through bottlenecks. Specifically, we study how these processes can lead to extinction of individual mutants depending on their fitness and the distribution of fitness effects (DFE) of the entire mutant population. We find that in typical in vitro experiments few mutants with high fitness go extinct. However, bottlenecks of a size that is common in animal infection models lead to so much random extinction that a large number of viable mutants would be misclassified. While mutants with low fitness are more likely to be lost during the experiment, mutants with intermediate fitness are expected to be much more abundant and can constitute a large proportion of detected hits, i.e. false positives. Thus, incorporating the DFEs of randomly generated mutations in the analysis may improve the reproducibility of transposon insertion experiments, especially when strong bottlenecks are encountered.
Keywords: Bottleneck; DFE; Distribution of fitness effects; Drift; Multinomial random sampling; Population biology; Random birth-death process; Selection; Tn-seq; Transposon insertion sequencing
Year: 2020 PMID: 32280434 PMCID: PMC7138912 DOI: 10.1016/j.csbj.2020.03.021
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Schematic of the transposon insertion sequencing workflow. Description of the individual steps to create a transposon insertion library and/or define essential genes in a specific condition. Not all steps can be easily observed experimentally; we highlight routinely measured quantities (eye symbol). In the first step (1), transposons (colored rectangles) are delivered to recipient bacteria and integrated into the genome (rings) at different positions (i) out of all possible insertion sites (k; black rectangles), resulting in N mutant cells. Wild-type cells grow with a division rate β and a death rate δ. The transposons disrupt genes, which can result in altered division rates (w) and altered death rates (w) that are specific for the cells bearing a transposon at site i. Typically, experimental constraints lead to inadvertent (or sometimes intended) bacterial growth and death before the library can be analyzed (2AB). This typically serves to select against the wild-type (2B) (dead cells are marked by a red x) and leads to a distortion of the mutant frequencies present in the library created by mutagenesis. Sampling of cells (3) can lead to additional distortions. Sampling includes various experimental processes, for example, harvest of the cells, genomic DNA preparation, and the small amount of genomic DNA subjected to PCR amplification. During the last experimental steps (4), the transposon-genome junctions are prepared for sequencing (the exact protocol varies by technique) and then sequenced. Since sequencing capacity is typically limiting, the sequencing bottleneck is another sampling event. Finally, the sequencing data are analyzed (5) by mapping them to the genome and by quantifying the number of sequences per transposon insertion site (n) (green bars). The probability of no reads for a transposon insertion site i is given as q_i. The probability of no reads in m sites is q_j(m).
(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. Illustration of the mutagenesis model and the resulting correlation between library complexity and number of mutants. (A) Transposon mutagenesis leads to a pool of mutant cells that is uniformly distributed over the transposon insertion sites i. Illustration of the experimental mutagenesis and the multinomial random sampling model that leads to the same distribution. We show three examples of experiments in which three independent transposon mutants N were created. In this example, the transposon (colored rectangle) can insert in any of four potential transposon insertion sites k (black rectangles) on the genome (rings) and, depending on the insertion site, can lead to different library complexities C. This is equivalent to randomly picking N mutants from an infinite pool (triple dots) of uniformly distributed mutants. Both the experiment and the model lead to the same distribution of cells with mean µ and variance µ. Because the probability of a transposon inserting into any transposon insertion site is equal, the mean and the variance can be determined either over multiple repetitions of the same experiment or over the different transposon insertion sites within a single experiment (arrows). (B) The relationship between the number of mutants after mutagenesis, N, and the library complexity, C, for k potential insertion sites, where k was set to 10⁴ (solid), 10⁵ (dashed) and 10⁶ (dotted). To generate the plot, Eq. (1) was solved for the number of mutants, N = −k ln(1 − C), with C ranging from 0.001 to 0.999. The larger the number of potential insertion sites, the more mutants are needed to reach a given library complexity, shown as a shift of the graph to the right for larger k.
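The relationship in panel (B) can be checked numerically. The sketch below assumes that Eq. (1), which is not reproduced in this record, has the Poisson form C = 1 − exp(−N/k); this is the form implied by the inversion N = −k ln(1 − C) quoted in the legend. The function names are illustrative only and are not taken from the paper.

```python
import math
import random


def expected_complexity(n_mutants, k_sites):
    """Assumed form of Eq. (1): C = 1 - exp(-N/k)."""
    return 1.0 - math.exp(-n_mutants / k_sites)


def mutants_needed(complexity, k_sites):
    """Inverse of the assumed Eq. (1): N = -k * ln(1 - C)."""
    return -k_sites * math.log(1.0 - complexity)


def simulated_complexity(n_mutants, k_sites, rng):
    """Place N mutants uniformly at random on k sites and
    return the fraction of sites hit at least once."""
    hit = {rng.randrange(k_sites) for _ in range(n_mutants)}
    return len(hit) / k_sites


rng = random.Random(0)
k = 10_000
n = round(mutants_needed(0.9, k))  # mutants needed for 90% complexity
print(n, expected_complexity(n, k), simulated_complexity(n, k, rng))
```

Under this assumption, the simulated complexity should agree with the analytical value up to sampling noise, and the required N scales linearly with k, which corresponds to the rightward shift of the curves for larger k.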
A summary of the variables used in this work.
| Variable | Meaning | Comments |
|---|---|---|
| i | A potential transposon insertion site in the genome of the cell. | The range is |
| m | Number of insertion sites per gene/locus | This variable is used in the supplementary material to quantify extinctions for a subset of insertion sites, |
| k | Total number of potential transposon insertion sites in the genome. | The value of |
| N | Number of mutants in the mutagenesis step. | |
| β | Division rate of wild-type cells. | Defined as the inverse average time it takes for a wild-type cell to divide. |
| δ | Death rate of wild-type cells. | Defined as the inverse average time it takes for a wild-type cell to die. |
| w | Fitness coefficient. | Throughout the paper we assume that the effect of inserting a transposon into the genome of wild-type cells is to modify the net growth rate by |
| | Average number of mutants with a transposon insertion at site | |
| | Subset of insertion sites in one locus that are simultaneously extinct | |
| | Extinction probability of all mutants with transposon insertions in gene i in all m sites. | We use |
| µ | Number of mutants per potential number of transposon insertion sites. | |
| C | Library complexity after mutagenesis. | Defined as the number of transposon insertion sites with at least one transposon insertion divided by the potential number of transposon insertion sites. |
| r | Net growth rate of wild-type cells. | The net growth rate is the difference between the division rate |
| t | Time of growth | |
| s | Selection coefficient | |
| | Total average number of mutants at time t. | |
| f_i | The proportion of mutants with a transposon insertion at site i. | < |
| | Average number of zero reads over | |
| | Sampling size. | |
| | Bottleneck size | Sample size relative to the total mutant population size |
| C’ | The library complexity after growth/death or sampling. | |
| a | The base of Eq. (5) | Introduced for notational simplicity to express Eq. |
All cell numbers are implicitly expressed as per unit volume.
Averages over repetitions are denoted with angular brackets <>.
Fig. 3. Modelling growth of the mutant library. (A) Illustration of the distribution of fitness coefficients (distribution 1). We assume a gamma distribution for a mutant library created by transposon insertion [53] with a shape parameter of 10 and a scale parameter of 0.09. The fitness of the wild-type, w = 1, is highlighted by the red dashed line. We do not model lethal mutants with a fitness below 0 that would have a negative net growth rate. (B) Same as (A), illustrating the distribution of selection coefficients s = w − 1. (C) Illustration of the mutant composition of an arbitrary mutant library during exponential growth. The x-axis shows the time in bacterial doubling times; the y-axis shows the number of bacteria. There are six mutants with fitnesses w1 = w2 = 1 (violet and blue), w3 = w4 = w5 = 0.8 (green, yellow and orange), and w6 = 0.5 (red). At t = 0, the simulation starts with one mutant of each of the six genotypes and follows them for 10 generations.
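The quantities in this legend can be reproduced with a short sketch. It draws fitness coefficients from a gamma distribution with the stated shape and scale, converts them to selection coefficients s = w − 1, and follows the six example mutants of panel (C). The growth law used here, 2^(w·t) with t measured in wild-type doubling times and no stochasticity, is an assumption made for illustration; the legend does not state how the panel was simulated.

```python
import random
import statistics

SHAPE, SCALE = 10.0, 0.09  # distribution 1, as stated in the legend


def sample_fitness(n, rng):
    """Draw n fitness coefficients w from the gamma-distributed DFE."""
    return [rng.gammavariate(SHAPE, SCALE) for _ in range(n)]


def grow(fitnesses, generations):
    """Deterministic growth from one cell per genotype (assumption):
    a mutant with fitness w reaches 2**(w * t) cells after t
    wild-type doubling times."""
    return [2.0 ** (w * generations) for w in fitnesses]


rng = random.Random(1)
w = sample_fitness(100_000, rng)
s = [wi - 1.0 for wi in w]      # selection coefficients, s = w - 1
mean_w = statistics.fmean(w)    # should be close to SHAPE * SCALE = 0.9

panel_c = [1.0, 1.0, 0.8, 0.8, 0.8, 0.5]  # the six mutants of panel (C)
counts = grow(panel_c, 10)
total = sum(counts)
fractions = [c / total for c in counts]
print(mean_w, counts, fractions)
```

Even in this deterministic setting, the two neutral mutants come to dominate the library after 10 generations, which illustrates how selection alone shifts mutant frequencies before any sampling takes place.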
Supplementary figure 1
Supplementary figure 2
Supplementary figure 3
Fig. 4. Random birth/death process and extinctions for two different DFEs. In this graph we compare the dynamics of two mutant libraries with different DFEs. The DFEs are gamma-distributed with shape parameter 10 and scale parameter 0.09 (distribution 1, black) and 0.04 (distribution 2, red). The number of potential transposon insertion sites (k) was set to 10⁵ with the number of mutant cells set to 5 at t = 0 for i = 1, 2, …, 10⁵. In the top panel, the fitness coefficient w_i is shown on the x-axis and the number of mutants i with the corresponding fitness is shown on the y-axis. The fitness coefficients were binned using a bin width of 0.01 for w_i values between 0.01 and 2.5 for both distributions. The binned mean fitness values (magenta and green dashed vertical lines) were calculated by summing w_i f_i over the 250 bins i, where f_i is the proportion of mutant cells in bin i. (A) The number of mutants present at the start of a birth-death process. (B) The distribution of the number of mutants over the fitness coefficients after 2 h of growth (Eq. (2)) with a baseline division rate set to β = 0.03 min⁻¹ and a baseline death rate set to δ = 0.02 min⁻¹. (C) The distribution of the number of mutants over the fitness coefficients after 4 h of growth with the same rates as in (B). The bottom panel shows the extinction probability (y-axis) for all mutants corresponding to 1–4 insertion sites within a gene (x-axis). Eq. (5) was used to calculate the extinction probabilities, where w_i was either sampled from distribution 1 (black) or distribution 2 (red). All insertion sites within an essential gene have the same fitness cost, with w_j arbitrarily chosen and set to 0.15 to represent a gene with high fitness costs. (D) Extinction probabilities after 2 h of growth. (E) Same as (D), except that the mutant cells have been growing for 4 h.
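Eq. (5) is not reproduced in this record, so the following sketch uses the textbook extinction probability of a linear birth-death process as a stand-in. It additionally assumes that the fitness coefficient w scales the division and death rates by the same factor, so that the net growth rate becomes w·(β − δ); this scaling is a guess and is not confirmed by the legend. The function names are hypothetical.

```python
import math


def single_cell_extinction(birth, death, t):
    """Probability that the lineage founded by one cell is extinct at
    time t in a linear birth-death process (standard result)."""
    if birth == death:
        return birth * t / (1.0 + birth * t)
    x = math.exp(-(birth - death) * t)
    return death * (1.0 - x) / (birth - death * x)


def gene_extinction(w, n0, m_sites, t, beta=0.03, delta=0.02):
    """Probability that all m sites of a gene are extinct at time t
    (minutes), each site starting with n0 independent cells.
    Assumption: w scales both beta and delta."""
    p_one = single_cell_extinction(w * beta, w * delta, t)
    return p_one ** (n0 * m_sites)


# Settings taken from the legend: w_j = 0.15, five cells per site,
# 2 h and 4 h of growth, 1 to 4 insertion sites within a gene.
for hours in (2, 4):
    probs = [gene_extinction(0.15, 5, m, 60.0 * hours) for m in range(1, 5)]
    print(hours, probs)
```

Because lineages are treated as independent, the joint extinction probability falls geometrically with the number of sites per gene, which matches the qualitative trend that the bottom panels describe.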
The average extinction probability can be estimated by error propagation, for which the simplest estimate substitutes the proportion of mutants f_i with the average proportion of mutants <f_i> in Eq. (7). In the context of library complexity reduction (Section 2.4), we include higher-order terms that depend on the variances and covariances of the proportions of mutants to estimate the average extinction probability (Supplementary data S2).
Fig. 5. The effects of bottlenecks on mutant extinction. (A) The distributions of fitness effects (DFEs) of two mutant populations with high average (black, distribution 1) and low average (red, distribution 2) fitness at the start of the birth-death process and before passage through a bottleneck. This panel is equivalent to Fig. 4A. The binned mean fitness is indicated in dashed magenta for distribution 1 and dashed green for distribution 2. (B) The distribution of the number of mutants over the fitness coefficients after sampling 10⁶ mutants following 2 h of growth. The distribution of the pre-bottleneck population is shown in Fig. 4B. (C) The distribution of the number of mutants over the fitness coefficients after sampling 10⁶ mutants following 4 h of growth. The distribution of the pre-bottleneck population is shown in Fig. 4C. (D) The extinction probability (y-axis) for all mutants corresponding to 1–4 insertion sites within a gene (x-axis) after sampling 10⁶ mutants following 2 h of growth. Eq. (7) was used to calculate the extinction probabilities, where w_i was either sampled from distribution 1 (black) or distribution 2 (red). All insertion sites within a gene have the same fitness cost; we show an example with w_j arbitrarily chosen and set to 0.15. (E) Same as (D), except that the mutant cells have been growing for 4 h, after which 10⁶ mutant cells are sampled.
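The effect of a sampling bottleneck on a single gene can be illustrated with multinomial sampling. Eq. (7) is not reproduced in this record; the sketch below assumes sampling with replacement, in which case the probability that a set of sites with joint proportion F receives no cells in a sample of size S is (1 − F)^S. A small Monte Carlo run checks the formula. All names and the toy proportions are illustrative.

```python
import random


def zero_read_probability(proportions, sites, sample_size):
    """P(none of the given sites is drawn) for multinomial sampling
    with replacement: (1 - sum of their proportions) ** sample_size."""
    joint = sum(proportions[i] for i in sites)
    return (1.0 - joint) ** sample_size


def simulate(proportions, sites, sample_size, reps, rng):
    """Monte Carlo estimate of the same probability."""
    labels = range(len(proportions))
    wanted = set(sites)
    misses = 0
    for _ in range(reps):
        draw = rng.choices(labels, weights=proportions, k=sample_size)
        if wanted.isdisjoint(draw):
            misses += 1
    return misses / reps


f = [0.5, 0.3, 0.15, 0.05]  # toy mutant proportions (hypothetical)
rng = random.Random(0)
print(zero_read_probability(f, [3], 20), simulate(f, [3], 20, 20_000, rng))
print(zero_read_probability(f, [2, 3], 20), simulate(f, [2, 3], 20, 20_000, rng))
```

The formula makes explicit why low-fitness mutants are lost preferentially: growth before the bottleneck reduces their proportion, and the probability of drawing none of them rises steeply as that proportion shrinks relative to 1/S.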
due to a birth-death process, where a has been introduced for notational simplicity and is equal to the base of Eq. (5). For a fractional random sampling event, the average extinction probability reads,
Fig. 6. Library complexity reduction due to random birth-death events and random sampling events. The initial mutant population emerges from the process of mutagenesis with an average number of mutant cells per potential transposon insertion site µ (x-axis), related to the initial library complexity C through Eq. (1). (A) The mutant population grows on average with a baseline division rate β set to 0.03 min⁻¹ and a baseline death rate δ set to 0.02 min⁻¹, with fitness coefficients sampled from distribution 1 (shape = 10, scale = 0.09). The mutant subpopulations have a chance to go extinct as a consequence of stochastic fluctuations, shown as a reduction in library complexity, C’ (y-axis). The theoretical results were calculated using Eq. (9). The black solid line shows the library complexity reduction due to stochastic fluctuations after 2 h of growth and the black dashed line after 4 h of growth. Stochastic tau-leaping simulations were run for 20 iterations for each value of µ, where for each iteration the number of cells for site i = 1, 2, …, 10³ was drawn from a Poisson distribution with mean µ. The mean and the standard error of the library complexity were subsequently calculated and plotted as red circles for cells growing for 2 h and red squares for cells growing for 4 h. (B) Library complexity reduction after sampling 10% (black solid line) and 1% (black dashed line) of the initial mutant population that emerges from the process of mutagenesis. The theoretical results were calculated using Eq. (10). Multinomial random sampling simulations were run for 20 iterations, where for each iteration the number of cells for site i = 1, 2, …, 10⁵ was drawn from a Poisson distribution with mean µ. The mean and the standard error of the library complexity were subsequently calculated and plotted as red circles and red squares for the 10% bottleneck case and the 1% bottleneck case, respectively.
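A reduced version of the sampling simulation in panel (B) can be sketched as follows. Site occupancies are drawn from a Poisson distribution with mean µ, a fixed fraction of all cells is sampled without replacement, and the remaining complexity is compared with a Poisson-thinning estimate, C’ ≈ 1 − exp(−µ·b) for a bottleneck fraction b. This estimate is an approximation chosen for illustration and is not Eq. (10), which is not reproduced in this record. The number of sites is kept small so that the example runs quickly.

```python
import math
import random


def poisson(mu, rng):
    """Draw one Poisson(mu) variate (Knuth's method, fine for small mu)."""
    limit = math.exp(-mu)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def complexity_after_bottleneck(k_sites, mu, fraction, rng):
    """Simulate mutagenesis followed by a sampling bottleneck and
    return the fraction of sites still represented afterwards."""
    cells = []
    for site in range(k_sites):
        cells.extend([site] * poisson(mu, rng))
    sample = rng.sample(cells, round(fraction * len(cells)))
    return len(set(sample)) / k_sites


def thinning_estimate(mu, fraction):
    """Approximate C' if every cell survives independently with
    probability `fraction` (Poisson thinning, an assumption)."""
    return 1.0 - math.exp(-mu * fraction)


rng = random.Random(0)
for b in (0.10, 0.01):
    print(b, complexity_after_bottleneck(2000, 5.0, b, rng), thinning_estimate(5.0, b))
```

With µ = 5, a 10% bottleneck leaves roughly 40% of the sites represented under this approximation, and a 1% bottleneck leaves only a few percent.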
Fig. 7. Bottlenecks are a major source of false positives during Tn-seq experiments. (A) Distribution of fitness coefficients (DFE) for a library of random mutants. The DFE is the same as distribution 1 in Figs. 4 and 5, i.e. gamma-distributed with shape parameter 10 and scale parameter 0.09. Here, we scaled this distribution to a population size of 10⁶ cells after mutagenesis and assume five mutants per transposon insertion site at the beginning of the experiment. (B) Extinction probabilities (y-axis) of a single insertion site depending on the fitness coefficient (x-axis) and different bottleneck sizes (0.0001% black, 0.01% red, 0.1% green) as predicted by Eq. (12). We assume that bacteria grew for 4 h with a doubling time of 20 min (r = ln(2)/20 min⁻¹). (C) Fitness distribution of the population of mutants that went extinct, i.e. had zero sequence reads at the end of the experiment. This graph was obtained by multiplying the gamma distribution in (A) with the extinction probability in (B). (B) and (C) focus on a single transposon insertion site. The percentage of loci going extinct for a given bottleneck is specified in the graph. (D) Same as (B) for the simultaneous extinction of 30 insertion sites with the same fitness, i.e. located in the same gene. (E) Same as (C), also for 30 insertion sites.
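The construction of panels (C) and (E), weighting the DFE by a fitness-dependent extinction probability, can be sketched by Monte Carlo integration. Eq. (12) is not reproduced in this record, so the extinction probability below is a deliberately simplified stand-in: growth is treated as deterministic, each cell passes the bottleneck independently, and the zero-read probability is approximated as exp(−b·n), where n is the number of cells at the bottleneck. Stochastic birth-death losses are ignored, so the numbers are illustrative only.

```python
import math
import random

SHAPE, SCALE = 10.0, 0.09   # distribution 1 from the legend
R = math.log(2) / 20.0      # net growth rate per minute (20 min doubling)
T = 240.0                   # 4 h of growth, in minutes
N0 = 5                      # cells per insertion site at t = 0


def extinction_stand_in(w, bottleneck, m_sites=1):
    """Simplified zero-read probability (assumption, not Eq. (12)):
    exp(-b * n) for m sites of equal fitness w, each holding
    n = N0 * exp(w * R * T) cells before the bottleneck."""
    cells = N0 * math.exp(w * R * T)
    return math.exp(-bottleneck * cells * m_sites)


def fraction_lost(bottleneck, m_sites, n_draws, rng):
    """Average the stand-in extinction probability over the DFE,
    i.e. the expected fraction of loci with zero reads."""
    total = 0.0
    for _ in range(n_draws):
        w = rng.gammavariate(SHAPE, SCALE)
        total += extinction_stand_in(w, bottleneck, m_sites)
    return total / n_draws


# Bottleneck sizes from the legend, expressed as fractions.
for b in (1e-6, 1e-4, 1e-3):
    single = fraction_lost(b, 1, 20_000, random.Random(0))
    gene = fraction_lost(b, 30, 20_000, random.Random(0))
    print(b, single, gene)
```

Averaging over the DFE in this way shows how abundant intermediate-fitness mutants can dominate the set of loci with zero reads, even though each of them is individually less likely to be lost than a low-fitness mutant.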