Bérénice Batut, David P Parsons, Stephan Fischer, Guillaume Beslon, Carole Knibbe.
Abstract
Comparative genomics has revealed that some species have exceptional genomes compared to their closest relatives. For instance, some species have undergone a strong reduction of their genome, with a drastic reduction of their genic repertoire. Deciphering the causes of these atypical trajectories can be very difficult because many phenomena are intertwined during their evolution (e.g. changes of population size, environment structure and dynamics, selection strength, mutation rates). Here we propose a methodology based on synthetic experiments to test the individual effect of these phenomena on a population of simulated organisms. We developed an evolutionary model, aevol, in which evolutionary conditions can be changed one at a time to test their effects on genome size and organization (e.g. coding ratio). To illustrate the proposed approach, we used aevol to test the effects of a strong reduction in the selection strength on a population of (simulated) bacteria. Our results show that this reduction of selection strength leads to a genome reduction of ~35% with a slight loss of coding sequences (~15% of the genes are lost, mainly those whose contribution to fitness is the lowest). More surprisingly, under a low selection strength, genomes undergo a strong reduction of the noncoding compartment (~55% of the noncoding sequences being lost). These results are consistent with what is observed in reduced Prochlorococcus strains (marine cyanobacteria) when compared to close relatives.
Year: 2013 PMID: 24564457 PMCID: PMC3851946 DOI: 10.1186/1471-2105-14-S15-S11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Graphical representation of the aevol platform. The underlying algorithm iterates three main steps: (1) genome decoding and evaluation, (2) selection of the best individuals and (3) reproduction with mutations and rearrangements. See the main text for details. The lightning shapes correspond to mutations and rearrangements undergone during reproduction. Cells are colored according to g (the gap with the target), red cells having the lowest g and blue cells the highest.
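The three-step loop described in this caption can be illustrated with a deliberately simplified sketch. It is not the aevol implementation: genomes are plain bit lists, the gap g is a toy mismatch fraction, only point mutations are applied (no rearrangements), and selection weights of the form exp(-k · g) are an assumption made for illustration. All function names are hypothetical.

```python
import math
import random


def gap(genome, target):
    """Toy stand-in for g: fraction of positions that differ from the target."""
    return sum(a != b for a, b in zip(genome, target)) / len(target)


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate` (point mutations only)."""
    return [b ^ 1 if rng.random() < rate else b for b in genome]


def next_generation(population, target, k, rate, rng):
    # (1) Evaluation: the decoding step is skipped, g is computed directly.
    gaps = [gap(ind, target) for ind in population]
    # (2) Selection: weights exp(-k * g) are an assumed scheme, where k plays
    #     the role of a selection intensity (higher k, stronger selection).
    weights = [math.exp(-k * g) for g in gaps]
    parents = rng.choices(population, weights=weights, k=len(population))
    # (3) Reproduction with mutations.
    return [mutate(parent, rate, rng) for parent in parents]


def run(generations=200, pop_size=50, length=40, k=20.0, rate=0.01, seed=0):
    """Return the best (lowest) gap observed at each generation."""
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(length)]
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        history.append(min(gap(ind, target) for ind in population))
        population = next_generation(population, target, k, rate, rng)
    return history


if __name__ == "__main__":
    best = run()
    print(f"best gap at start: {best[0]:.3f}, at end: {best[-1]:.3f}")
```

In this toy setting, lowering `k` flattens the selection weights, which is the kind of single-parameter change the synthetic experiments rely on.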
Parameter values used for the runs detailed in the Results section
| Parameter | Symbol | Value |
|---|---|---|
| Population size | | 1,000 |
| Size of the initial (random) genome | | 5,000 base pairs |
| Promoter sequence | | 0101011001110010010110, |
| Terminator sequences | | |
| Initiation signal for the translation | | 011011****000 |
| Termination signal for the translation | | 001 |
| Genetic code | | See Figure 1 |
| Global set of "cellular processes" | Ω | [0,1] |
| Maximal pleiotropy of the proteins | w | 5 × 10⁻³ |
| Environmental target fluctuates around... | | See Figure 1 |
| Environmental fluctuations: characteristic time | τ | 2,500 |
| Environmental fluctuations: standard deviation | σ | 5 × 10⁻³ |
| Selection intensity | k | 750 initially, then 250 |
| Point mutation rate | | 5 × 10⁻⁶ per bp |
| Small insertion rate | | 5 × 10⁻⁶ per bp |
| Small deletion rate | | 5 × 10⁻⁶ per bp |
| Large deletion rate | | 5 × 10⁻⁵ per bp |
| Duplication rate | | 5 × 10⁻⁵ per bp |
| Inversion rate | | 5 × 10⁻⁵ per bp |
| Translocation rate | | 5 × 10⁻⁵ per bp |
| Length of small indels | | Uniform law between 1 and 6 bp |
These parameter values were chosen after preliminary analyses. Some parameters, like the structural signals, have been shown not to impact the genome structure. The impact of w has been studied [30], as well as the impact of mutation rates and particularly rearrangement rates [26]. Here, the mutation rates and w values were chosen to obtain a gene density close to bacterial gene density, with enough genes to allow experiments on reductive genome evolution. The intensity and frequency of environmental variations (σ and τ respectively) are currently under study; the effect of k is tested here.
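To give a sense of scale for the per-bp rates in the table, the short sketch below computes the expected number of events of each type in one replication of the initial 5,000 bp genome. It assumes that the rates read as 5 × 10⁻⁶ and 5 × 10⁻⁵ per bp and that the expected count is simply rate × genome size; the names `RATES_PER_BP` and `expected_events` are illustrative only.

```python
# Per-bp rates as read from the parameter table (assumed reading of "5.10-6").
RATES_PER_BP = {
    "point mutation": 5e-6,
    "small insertion": 5e-6,
    "small deletion": 5e-6,
    "large deletion": 5e-5,
    "duplication": 5e-5,
    "inversion": 5e-5,
    "translocation": 5e-5,
}


def expected_events(genome_size_bp, rates=RATES_PER_BP):
    """Expected number of events per replication, assuming count = rate * size."""
    return {name: rate * genome_size_bp for name, rate in rates.items()}


initial = expected_events(5_000)
for name, count in initial.items():
    print(f"{name:16s} {count:.3f}")
print(f"{'total':16s} {sum(initial.values()):.3f}")
```

Under these assumptions the initial genome undergoes about one event per replication on average (0.025 for each local mutation type, 0.25 for each rearrangement type), and the expected count scales linearly with genome size.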
Figure 2. Gap with target and genome size over time. The data shown are g and genome size for the best individual of the population every 2,000 generations (red: runs with k = 750, green: runs with k = 250). The blue line at t = 150,000 marks the moment at which k is changed in 4 of the 8 simulations. The insets zoom in on the interval from t = 150,000 to t = 300,000.
Figure 3. Several genome architecture characteristics. For each run, estimates of the genomic characteristics at equilibrium were computed by averaging the values of the best individuals of the last 10,000 generations. Each bar represents the mean of those equilibrium values over the 4 repetitions (red: runs with k = 750, green: runs with k = 250). The functional CDS are genes involved in cellular processes. The number of noncoding bases corresponds to bases that are not in any RNA with at least one functional gene. The percentage of coding bases is the ratio between the number of bases involved in functional genes and the genome size.
Figure 4. Distribution of triangle areas and number of triangles per cellular process. a. The area of a gene's triangle is a proxy for its impact on phenotype and fitness. For each run, the genes of the final best individual were binned into area classes. The red (resp. green) bar plot is the average of the four distributions obtained from the final best individuals of the four runs where k = 750 (resp. k = 250). b. Distribution of the number of triangles per cellular process for the best individual of one simulation with k = 750 (red) and one simulation with k = 250 (green). Under relaxed selection, the number of triangles per process is reduced.
Genome characteristics of Escherichia coli, Buchnera, Prochlorococcus and simulations.
| Genome characteristics | Escherichia coli (free-living) | Buchnera (endosymbiont) | Prochlorococcus, non-reduced | Prochlorococcus, reduced | Simulations, k = 750 | Simulations, k = 250 |
|---|---|---|---|---|---|---|
| Genome size (bp) | 5.13 × 10⁶ | 0.586 × 10⁶ | 2.545 × 10⁶ | 1.726 × 10⁶ | 11,836 ±261 | 7,746 ±652 |
| Coding bases | 4.54 × 10⁶ | 0.508 × 10⁶ | 2.136 × 10⁶ | 1.54 × 10⁶ | 6,797 ±320 | 5,453 ±729 |
| Noncoding bases | 5.882 × 10⁵ | 0.774 × 10⁵ | 4.112 × 10⁵ | 1.849 × 10⁵ | 5,038 ±168 | 2,293 ±329 |
| Coding bases (%) | 88.6 ±1 | 86.5 ±4 | 83.2 ±2.3 | 89.3 ±1.2 | 57.4 ±1.7 | 70.2 ±5 |
| Number of genes | 5,095 ±166 | 545 ±98 | 2,733 ±570 | 1,987 ±153 | 107 ±9 | 91 ±8 |
| Gene length (bp) | 896.2 ±4.7 | 931 ±14.2 | 797 ±85.6 | 769.1 ±36.3 | 66.6 ±6.2 | 67.5 ±14.1 |
Genomic data (genome size, genes) were obtained from the NCBI database. Coding and noncoding bases, gene length and percentage of coding bases were computed using custom Python scripts for Escherichia coli (O157:H7 str. Sakai, 55989, E24377A, O127:H6 str. E2348/69, S88), Buchnera aphidicola (str. APS, Cinara tujafilina, Bp, Sg), non-reduced Prochlorococcus (MIT9303, MIT9313), reduced Prochlorococcus (str. AS9601, MIT9211, MIT9215, MIT9301, MIT9312, MIT9515, NATL1A, NATL2A, CCMP1375, CCMP1986) and simulations with k = 750 and k = 250. For the simulated genomes, the absolute values are not meaningful and cannot be compared directly with those of real organisms. To compare the evolutionary scenarios, we show the percentages of reduction (resp. increase) of the different structural parameters. While the reductive evolution of Buchnera has affected all genomic compartments equally, in Prochlorococcus as well as in aevol the reductive evolution mainly affected the noncoding sequences, resulting in an increase in the coding proportion and a moderate loss of genes.
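The percentages of reduction for the simulations can be recomputed from the two simulation columns of the table. The sketch below assumes those columns list, in order, genome size, coding bases, noncoding bases and number of genes for k = 750 and then k = 250; the row assignment and the helper name `percent_change` are assumptions made for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = reduction)."""
    return 100.0 * (after - before) / before


# Mean values for the simulations: (k = 750, k = 250).
SIMULATIONS = {
    "genome size (bp)": (11836, 7746),
    "coding bases": (6797, 5453),
    "noncoding bases": (5038, 2293),
    "number of genes": (107, 91),
}

changes = {name: percent_change(before, after)
           for name, (before, after) in SIMULATIONS.items()}

for name, change in changes.items():
    print(f"{name:18s} {change:+.1f}%")
```

With these values the changes come out at about -34.6% for genome size, -19.8% for coding bases, -54.5% for noncoding bases and -15.0% for gene number, in line with the ~35%, ~55% and ~15% reductions quoted in the abstract.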