Oren Ish-Am, David M Kristensen, Eytan Ruppin.
Abstract
One of the basic postulates of molecular evolution is that functionally important genes should evolve more slowly than genes of lesser significance. Essential genes, whose knockout leads to a lethal phenotype, are considered of high functional importance, yet whether they are truly more conserved than nonessential genes has been the topic of much debate, fuelled by a host of contradictory findings. Here we conduct the first large-scale study utilizing genome-scale metabolic modeling and spanning many bacterial species, which aims to answer this question. Using the novel Media Variation Analysis (MVA), we examine the range of conservation of essential vs. nonessential metabolic genes in a given species across all possible media. We are thus able to obtain, for the first time, exact upper and lower bounds on the levels of differential conservation of essential genes for each of the species studied. The results show that bacteria do exhibit an overall tendency for differential conservation of their essential genes vs. their nonessential ones, yet this tendency is highly variable across species. We show that the model bacterium E. coli K12 may or may not exhibit differential conservation of essential genes depending on its growth medium, shedding light on previous experimental studies showing opposite trends.
Year: 2015 PMID: 25894004 PMCID: PMC4403854 DOI: 10.1371/journal.pone.0123785
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Algorithm for finding a medium with the maximal separation between dN/dS values of essential and nonessential genes.
Metabolic models (a) are first preprocessed and databases added to them, to support fast computation of essential genes (b). Simulated Annealing (c) is the first stage of optimization and the resulting medium found is further purified from redundant compounds (d), resulting in the two desired media: those which maximize the differential conservation of essential genes (e) and those which maximize the differential conservation of the nonessential genes (f).
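The two optimization stages in this caption, simulated annealing followed by removal of redundant compounds, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cooling schedule, the single-compound toggle move, and the toy objective are all assumptions made for illustration. In the study the objective would be the separation between dN/dS values of essential and nonessential genes computed from a metabolic model.

```python
import math
import random


def anneal_medium(compounds, score, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a subset of compounds (a medium) that maximizes score(medium).

    All parameters are hypothetical defaults, not values from the paper.
    """
    rng = random.Random(seed)
    pool = sorted(compounds)
    current = frozenset(c for c in pool if rng.random() < 0.5)
    current_score = score(current)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        # Propose a neighbor by toggling one compound in or out of the medium.
        candidate = current ^ {rng.choice(pool)}
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature *= cooling
    return best, best_score


def prune_medium(medium, score):
    """Drop compounds whose removal does not lower the score (redundant compounds)."""
    medium = set(medium)
    target = score(frozenset(medium))
    for compound in sorted(medium):
        trial = medium - {compound}
        if score(frozenset(trial)) >= target:
            medium = trial
    return frozenset(medium)


# Toy objective (hypothetical): only glucose and oxygen contribute to the score.
def toy_score(medium):
    return len(medium & {"glucose", "oxygen"})


compounds = {"glucose", "oxygen", "acetate", "citrate", "nitrate"}
best, best_score = anneal_medium(compounds, toy_score)
minimal = prune_medium(best, toy_score)
```

Running the search once with the score negated would give the opposite extreme, which mirrors the two output media (e) and (f) in the figure.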
Fig 2. Bacteria model KOR scores across all possible media.
Each horizontal bar represents a KOR score interval: the minimum and maximum scores attained by the model across all possible media. Each model is represented by two bars, one on the left and one on the right (bacteria names are presented in two columns for readability only). The left column of bars shows KOR scores testing the hypothesis that the essential genes are differentially conserved, while the right column of bars shows the Anti-KOR scores, that is, testing the hypothesis that nonessential genes are differentially conserved. Models with left bars extending left of the (left) significance line have a medium under which they follow the KOR hypothesis, and analogously, models with right bars extending right of the (right) significance line have a medium under which they follow the anti-KOR hypothesis. Both E. coli models used in the study are shown in orange (the upper one is the SEED model). KOR classes are marked by the blue text boxes. We did not find an organism whose essential genes were differentially conserved in one medium and whose nonessential genes were differentially conserved in another; accordingly, no bacterium has both its left and right bars crossing the significance lines.
Fig 3. Metabolic model KOR class distribution.
The distribution of metabolic models among KOR classes: The (normally-fitted) distribution tends towards the Strongly-KOR class, showing an overall mild tendency of the bacteria studied to conserve the sequence of their essential genes. No bacterial models were found to be Strongly-anti-KOR.
Fig 4. Experimental vs. MVA-derived KOR scores.
KOR scores were computed for several gene-essentiality datasets from DEG, which were experimentally determined on synthetic lab media. Each horizontal error bar marks the computationally derived KOR score bounds found by ECOEDS, and the small red rectangle marks the experimental DEG KOR score. Where available, both SEED and curated models were used. The DEG KOR scores for all organisms but one (B. thailandensis, not shown) fell within the predicted computational bounds.
Bacterial tendency to have differentially conserved essential genes.
| KOR score method | Number of models with a significant KOR score | Binomial p-value |
|---|---|---|
| AEt | 14 | 8.90E-15 |
| AEr | 6 | 6.36E-05 |
| APEt | 17 | 3.54E-19 |
| APEr | 10 | 1.88E-09 |
| AENEt | 13 | 2.20E-13 |
| AENEr | 7 | 5.79E-06 |
The middle column lists the number of bacterial models with a significant KOR score. The right column shows the binomial p-value, that is, the probability that such a number of significant models would be obtained by chance.
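As a rough illustration of how a binomial p-value of this kind can be computed, the sketch below evaluates an upper-tail probability. This record gives neither the total number of models nor the per-model chance probability, and it does not say whether the authors used a tail or a point probability, so `n`, `p`, and the choice of the upper tail are assumptions.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    k: observed number of models with a significant score
    n: total number of models tested (not stated in this record)
    p: chance that one model is significant under the null (assumed)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, `binomial_tail(1, 2, 0.5)` is the probability of at least one success in two fair trials, 0.75.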
Summary of metabolic genome partitions into essential and nonessential genes according to Essential Gene Sets.
| Partition Name | ‘essential’ set | ‘nonessential’ set |
|---|---|---|
| AE-partition | AE genes | All-but-AE genes |
| APE-partition | APE genes | NE genes |
| AENE-partition | AE genes | NE genes |
Three methods are presented for partitioning the genome into essential and nonessential genes according to Essential Gene Sets:
AE-partition is equivalent to the partition done in previous related experimental research, had the experimental essential gene set been determined on a rich medium.
APE-partition is similar to the partition done in previous related experimental research, had the experimental essential gene set been determined on a poor medium.
AENE-partition produces a marked separation, possibly helping overcome metabolic model inaccuracy. AE is the core group of essential genes with a higher probability of aligning with essential genes from experimental data. Similarly, NE are more likely to overlap with nonessential experimental genes. This partition does not cover all metabolic genes, leaving out the PE set.
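The three partitions can be written down as set operations. The sketch below assumes that the metabolic genes split into three disjoint sets AE, PE and NE, and that APE is the union of AE and PE. The latter is inferred from the table and from the remark that the AENE-partition leaves out the PE set; it is not stated explicitly in this record. The function name and gene identifiers are hypothetical.

```python
def egs_partitions(ae, pe, ne):
    """Return {partition name: (essential set, nonessential set)}."""
    ae, pe, ne = frozenset(ae), frozenset(pe), frozenset(ne)
    ape = ae | pe  # assumption: APE = AE union PE
    return {
        "AE": (ae, pe | ne),  # AE genes vs. all-but-AE genes
        "APE": (ape, ne),     # APE genes vs. NE genes
        "AENE": (ae, ne),     # AE genes vs. NE genes; PE genes are left out
    }


# Hypothetical gene identifiers, for illustration only.
parts = egs_partitions(ae={"g1"}, pe={"g2"}, ne={"g3", "g4"})
```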
Summary of methods for assigning KOR score with EGS partitions.
| KOR score method | Description |
|---|---|
| AEr | AE-partition with rank-sum test |
| AEt | AE-partition with t-test |
| APEr | APE-partition with rank-sum test |
| APEt | APE-partition with t-test |
| AENEr | AENE-partition with rank-sum test |
| AENEt | AENE-partition with t-test |
Three ways to partition the genome along with two different statistical tests for significance lead to six methods for assigning a KOR score to a metabolic model genome.
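The naming scheme follows directly from crossing the partitions with the tests. A small sketch, with hypothetical constant and function names, enumerates the six method labels used in the tables:

```python
from itertools import product

PARTITIONS = ("AE", "APE", "AENE")
TESTS = {"r": "rank-sum test", "t": "t-test"}  # suffix -> statistical test


def kor_methods():
    """Map each method label (e.g. 'AEr') to its description."""
    return {
        f"{partition}{suffix}": f"{partition}-partition with {TESTS[suffix]}"
        for partition, suffix in product(PARTITIONS, TESTS)
    }


methods = kor_methods()
```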