Jingyuan Deng, Shengchang Su, Xiaodong Lin, Daniel J Hassett, Long Jason Lu.
Abstract
Large-scale systematic analysis of gene essentiality is an important step toward unraveling the complex relationship between genotypes and phenotypes. Such analysis cannot be accomplished without unbiased and accurate annotations of essential genes. In current genomic databases, most essential gene annotations are derived from whole-genome transposon mutagenesis (TM), the most frequently used experimental approach for determining essential genes in microorganisms under defined conditions. However, there are substantial systematic biases associated with TM experiments. In this study, we developed a novel Poisson model-based statistical framework to simulate the TM insertion process and subsequently correct the experimental biases. We first quantitatively assessed the effects of major factors that potentially influence the accuracy of TM and then incorporated the relevant factors into the framework. By iteratively optimizing parameters, we inferred the insertion events that actually occurred and described each gene's essentiality as a probability. Evaluated against the definitive essential gene profile of Escherichia coli, our model significantly improved the accuracy of the original TM datasets, resulting in more accurate annotations of essential genes. Our method also showed encouraging results in improving subsaturation-level TM datasets. To test our model's broad applicability to other bacteria, we applied it to Pseudomonas aeruginosa PAO1 and Francisella tularensis novicida TM datasets. We validated our predictions against the literature as well as by allelic exchange experiments in PAO1. Our model was correct on six of the seven tested genes. Remarkably, in all three cases where our predictions contradicted the TM assignments, experimental validation supported our predictions. In summary, our method is a promising tool for improving genomic annotations of essential genes and enabling large-scale explorations of gene essentiality.
Our contribution is timely considering the rapidly increasing number of essential gene sets. A web server has been set up to provide convenient access to this tool. All results and source code are available for download upon publication at http://research.cchmc.org/essentialgene/.
Year: 2013 PMID: 23520492 PMCID: PMC3592911 DOI: 10.1371/journal.pone.0058178
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Using the PEC dataset as the gold standard to identify false essential and false non-essential genes in the TM dataset.
| TM assignment | PEC (Gold Standard): E (259) | PEC (Gold Standard): N (3574) |
| TmEs (615) | 186 (TETmEs) | 429 (FETmEs) |
| TmNs (3218) | 73 (FNTmNs) | 3145 (TNTmNs) |
E – Essential; N – Non-essential.
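The counts in the table above can be turned into standard confusion-matrix rates to show how biased the raw TM calls are. The following is a minimal sketch, not part of the original analysis; the function name `confusion_metrics` and the choice of metrics are illustrative assumptions, and only the four counts come from the table.

```python
# Counts taken from the TM vs. PEC comparison table.
tp = 186    # TETmEs: TM essential, PEC essential
fp = 429    # FETmEs: TM essential, PEC non-essential
fn = 73     # FNTmNs: TM non-essential, PEC essential
tn = 3145   # TNTmNs: TM non-essential, PEC non-essential


def confusion_metrics(tp, fp, fn, tn):
    """Return common rates for a 2x2 confusion matrix (illustrative helper)."""
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp),     # share of TmEs that are truly essential
        "recall": tp / (tp + fn),        # share of PEC essentials found by TM
        "specificity": tn / (tn + fp),   # share of PEC non-essentials kept as TmNs
        "accuracy": (tp + tn) / total,
    }


metrics = confusion_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these counts, fewer than one in three TM-essential calls (186 of 615) agrees with the gold standard, which is the bias the statistical model sets out to correct.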
Figure 1. Three factors have strong associations with false TM assignments.
(A) Gene length. TmEs genes are significantly shorter than the genes in the PEC dataset and than genes overall. Many of these short genes may be false essential genes. (B) Position of insertions. Essential genes mistakenly assigned as non-essential by TM often have insertions within the extreme 25% of the gene (the 5% at the 5′ end and the 20% at the 3′ end). These insertions do not completely disrupt a gene’s function. (C) Number of insertions. 75% of the essential genes mistakenly assigned as non-essential by TM have only one insertion.
Figure 2. Illustration of the statistical model.
In a TM experiment, if a gene has no observed insertions (that is, it is TM essential, or TmEs), what could it really be? There are two possibilities. (1) Part A: It never received an insertion and was missed by all transposons by chance. We then have no useful information about the gene; it is completely blind to us. For any blind gene, our best guess is that its chance of being essential equals the overall essential gene rate, Pr(overall essential), and its chance of being non-essential equals 1 − Pr(overall essential). (2) Part B: It did receive insertions, but all of the insertion mutants died. This means the gene is truly essential. In this way, the TM-assigned essential genes can be split into two parts, TETmE and FETmE. Similarly, if a gene is observed to have insertions in the TM experiment (that is, it is TM non-essential), what could it really be? There are again two possibilities. (1) Part C: All of the observed insertions are ineffective and did not disrupt gene function. We are again blind about this gene, so it has a certain chance of being essential and a certain chance of being non-essential. (2) Part D: At least one insertion was effective and did disrupt gene function. This means the gene is truly non-essential.
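The Part A / Part B split for a gene with no observed insertions can be illustrated numerically. The sketch below is a simplified reading of the caption, not the authors' full framework: it assumes insertions land uniformly at a single hypothetical density (`DENSITY`), that every insertion in an essential gene is lethal, and it ignores the ineffective insertions of Part C. The prior is taken from the PEC counts (259 essential genes out of 3833); the function names are illustrative.

```python
import math


def prob_missed_by_chance(gene_length_bp, insertions_per_bp):
    """Poisson probability that a gene receives zero insertions (Part A)."""
    expected_hits = gene_length_bp * insertions_per_bp
    return math.exp(-expected_hits)


def prob_essential_given_no_insertion(prior_essential, p_missed):
    """Combine Part A and Part B for a gene with no observed insertions.

    Part A: the gene was missed by chance, so it is 'blind' and is
            essential with probability prior_essential.
    Part B: the gene was hit but every mutant died, so it is essential.
    """
    part_a = p_missed                              # any gene can be missed
    part_b = (1.0 - p_missed) * prior_essential    # hit and lethal
    weight_a = part_a / (part_a + part_b)
    return weight_a * prior_essential + (1.0 - weight_a) * 1.0


PRIOR = 259 / 3833    # overall essential rate from the PEC gold standard
DENSITY = 1 / 250     # hypothetical: one insertion per 250 bp on average

for length in (300, 1500):   # hypothetical gene lengths in bp
    p_missed = prob_missed_by_chance(length, DENSITY)
    p_ess = prob_essential_given_no_insertion(PRIOR, p_missed)
    print(f"{length} bp: P(missed)={p_missed:.4f}, P(essential | no insertion)={p_ess:.4f}")
```

Under these assumptions a short gene is more likely to have been missed by chance, so the absence of insertions is weaker evidence of essentiality than it is for a long gene, which matches the gene-length bias described for Figure 1.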
Improvement of overlaps with the PEC dataset using our model.
| TM dataset (Gerdes set) | Our Statistical Model | PEC (Gold Standard): E (259) | PEC (Gold Standard): N (3574) |
| TmEs (615) | PETmEs (480) | 176 | 304 |
| | PNTmEs (135) | 10 | 125 |
| TmNs (3218) | PETmNs (12) | 5 | 7 |
| | PNTmNs (3206) | 68 | 3138 |
Figure 3. Enrichment of true essential genes using different thresholds of the confidence score.
Figure 4. Robustness of our model at subsaturation levels of transposon insertions.
The dashed line shows p-values of Fisher’s exact test examining whether the true essential rate in PNTmEs is significantly lower than that in the original TmEs set. Similarly, the solid line shows p-values of Fisher’s exact test examining whether the true essential rate in our PETmNs is significantly higher than that in the original TmNs set.
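A one-sided Fisher's exact test of this kind can be sketched with the standard library alone. The example below uses the full-saturation counts from the PEC overlap table rather than the subsaturation data behind Figure 4, and it compares the disjoint groups (PNTmEs vs. PETmEs, and PETmNs vs. PNTmNs); that choice of contingency table is an assumption, since the exact tables used by the authors are not given here. The function name and its `alternative` argument are illustrative.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d, alternative="less"):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. 'less' returns P(X <= a); 'greater' returns P(X >= a).
    """
    row1 = a + b            # size of the first group
    col1 = a + c            # total number of 'essential' genes
    n = a + b + c + d
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    if alternative == "less":
        ks = range(lowest, a + 1)
    elif alternative == "greater":
        ks = range(a, highest + 1)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    favourable = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in ks)
    return favourable / comb(n, row1)


# Rows: group of genes; columns: PEC essential, PEC non-essential.
p_lower = fisher_exact_one_sided(10, 125, 176, 304, alternative="less")     # PNTmEs vs PETmEs
p_higher = fisher_exact_one_sided(5, 7, 68, 3138, alternative="greater")    # PETmNs vs PNTmNs
print(f"PNTmEs essential rate lower:  p = {p_lower:.3g}")
print(f"PETmNs essential rate higher: p = {p_higher:.3g}")
```

Exact integer arithmetic with `math.comb` avoids factorial overflow; for routine use, an established statistics library would be the more practical choice.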
Validation using allelic exchange experiments in Pseudomonas aeruginosa PAO1. E – Essential; N – Non-essential.
| Gene | Length | Local Insertion Density | Assignments by TM | Ranks by Our Model | Assignments by Our Model | Assignments by Allelic Exchange Experiments |
| PA3746 | 1374 | 4.9521 | N | 8/4289 | E | E |
| PA4260 | 822 | 3.4102 | N | 103/4289 | N | E |
| PA0985 | 1497 | 7.5415 | N | 113/4289 | N | N |
| PA4238 | 1002 | 3.3564 | E | 46/678 | E | E (Positive Control) |
| PA0723 | 249 | 7.4446 | E | 414/678 | E | E |
| PA2954 | 570 | 2.0336 | E | 588/678 | N | N |
| PA2143 | 288 | 2.1479 | E | 663/678 | N | N |
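The agreement figures quoted in the abstract can be checked directly against the validation table. This is a small tallying sketch; the tuple layout and variable names are illustrative, and the assignments are copied from the table rows.

```python
# (gene, TM assignment, model assignment, allelic exchange result)
rows = [
    ("PA3746", "N", "E", "E"),
    ("PA4260", "N", "N", "E"),
    ("PA0985", "N", "N", "N"),
    ("PA4238", "E", "E", "E"),   # positive control
    ("PA0723", "E", "E", "E"),
    ("PA2954", "E", "N", "N"),
    ("PA2143", "E", "N", "N"),
]

model_correct = sum(model == truth for _, _, model, truth in rows)
tm_correct = sum(tm == truth for _, tm, _, truth in rows)

# Genes where the model contradicts the TM assignment.
disagreements = [row for row in rows if row[1] != row[2]]
model_right_in_disagreements = sum(model == truth for _, _, model, truth in disagreements)

print(f"Model correct: {model_correct}/{len(rows)}")    # Model correct: 6/7
print(f"TM correct:    {tm_correct}/{len(rows)}")       # TM correct:    3/7
print(f"Model vs. TM disagreements: {len(disagreements)}, "
      f"model supported in {model_right_in_disagreements}")
```

The tally reproduces the statement that the model was correct on six of the seven tested genes and that all three model-versus-TM disagreements (PA3746, PA2954, PA2143) were resolved in the model's favor.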
Figure 5. Interface of the EGTEC Web server.