Zheng Rong Yang1, Helen L Bullifent2, Karen Moore1, Konrad Paszkiewicz1, Richard J Saint2, Stephanie J Southern2, Olivia L Champion1, Nicola J Senior1, Mitali Sarkar-Tyson2,3, Petra C F Oyston2, Timothy P Atkins2, Richard W Titball1.
Abstract
Massively parallel sequencing technology coupled with saturation mutagenesis has provided new and global insights into gene functions and roles. At the simplest level, the frequency of mutations within genes can indicate the degree of essentiality. However, this approach does not take account of the positional significance of mutations: the function of a gene is less likely to be disrupted by a mutation close to its distal ends. A systematic bioinformatics approach to improve the reliability of essential gene identification is therefore desirable. We report here a parametric model which introduces a novel mutation feature together with a noise trimming approach to predict the biological significance of Tn5 mutations. We show improved performance of essential gene prediction in the bacterium Yersinia pestis, the causative agent of plague. This method should be broadly applicable to other organisms and to the identification of genes which are essential for competitiveness or survival under a broad range of stresses.
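The point about positional significance can be illustrated with a small sketch. This is not the paper's mutation feature (MF), whose definition is not given here; the function `interior_insertions` and the 5% / 10% end margins are hypothetical choices that simply discard insertions near a gene's distal ends before counting.

```python
def interior_insertions(start, end, strand, positions, trim_5p=0.05, trim_3p=0.10):
    """Count insertions inside a gene after discarding hypothetical end margins.

    start/end are inclusive coordinates with start <= end; trim_5p and trim_3p
    are assumed fractions of the gene length treated as the distal ends.
    """
    length = end - start + 1
    if strand == "+":
        lo, hi = start + trim_5p * length, end - trim_3p * length
    else:  # on the reverse strand the 5' end sits at the higher coordinate
        lo, hi = start + trim_3p * length, end - trim_5p * length
    return sum(1 for p in positions if lo <= p <= hi)


# A 100 bp gene: an insertion at position 92 lies in the 3' margin on "+",
# but in the interior when the gene is on the "-" strand.
print(interior_insertions(1, 100, "+", [92]), interior_insertions(1, 100, "-", [92]))
```

Discarding end insertions outright is the crudest form of positional weighting; a smoother alternative would weight each insertion by its relative position, but the margins here are illustrative only.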
Year: 2017 PMID: 28165493 PMCID: PMC5292949 DOI: 10.1038/srep41923
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Mutant analysis.
| Statistic | input1 | input2 | input3 |
|---|---|---|---|
| site.ALL | 330,017 | 251,995 | 330,050 |
| site.ORF | 252,548 | 191,425 | 253,102 |
| tn.follow.TA.ALL | 32,368 | 25,826 | 32,479 |
| tn.follow.TA.ORF | 23,077 | 18,341 | 23,248 |
| TA.ALL | 195,145 | 195,145 | 195,145 |
| TA.ORF | 69,656 | 69,656 | 69,656 |
“ALL” denotes counts across the whole genome; “ORF” denotes counts within ORFs, i.e. genes. “site” is the number of transposon insertion sites (mutants). “TA” is the number of TA dinucleotide sites in the genome, which is why it is identical for all three samples.
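The proportions implied by the Mutant analysis table can be checked with a few lines of arithmetic. The counts below are copied from the table; `orf_fraction` is a hypothetical helper name.

```python
# Counts copied from the "Mutant analysis" table.
table = {
    "input1": {"site.ALL": 330_017, "site.ORF": 252_548},
    "input2": {"site.ALL": 251_995, "site.ORF": 191_425},
    "input3": {"site.ALL": 330_050, "site.ORF": 253_102},
}
TA_ALL, TA_ORF = 195_145, 69_656  # genome property, same for every sample


def orf_fraction(sample):
    """Fraction of a sample's insertion sites that lie within ORFs."""
    s = table[sample]
    return s["site.ORF"] / s["site.ALL"]


for name in table:
    print(f"{name}: {orf_fraction(name):.1%} of insertion sites lie in ORFs")
print(f"TA sites in ORFs: {TA_ORF / TA_ALL:.1%}")
```

With these numbers, about 76% of insertion sites fall within ORFs in every sample, whereas only about 36% of TA sites do.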
Figure 1. Noise trimming of the input1 dataset.
The horizontal axis represents the log of the number of transposon insertions per gene. The vertical axis shows the frequency of each log insertion count. The vertical dotted line marks the threshold corresponding to a critical p value of 0.05. Genes whose insertion counts fell below this threshold were treated as Type II essential genes.
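A minimal sketch of the thresholding in Figure 1, under assumptions the caption does not state: the log insertion counts are modelled as a single normal distribution, a pseudocount of 1 avoids log(0), and the cut-off is the lower-tail quantile at alpha = 0.05. `noise_threshold` and `type_ii_candidates` are hypothetical names, and the gene counts are made up.

```python
import math
from statistics import NormalDist


def noise_threshold(insertions_per_gene, alpha=0.05):
    """Lower-tail cut-off on log(count + 1), assuming the log counts are normal."""
    logs = [math.log(c + 1) for c in insertions_per_gene]  # +1 avoids log(0)
    fitted = NormalDist.from_samples(logs)                  # mean and sample SD
    return fitted.inv_cdf(alpha)                            # the alpha quantile


def type_ii_candidates(counts_by_gene, alpha=0.05):
    """Genes whose log insertion count falls below the noise threshold."""
    cut = noise_threshold(counts_by_gene.values(), alpha)
    return sorted(g for g, c in counts_by_gene.items() if math.log(c + 1) < cut)


counts = {"g1": 0, "g2": 50, "g3": 55, "g4": 60, "g5": 52,
          "g6": 58, "g7": 54, "g8": 57, "g9": 53, "g10": 56}
print(type_ii_candidates(counts))  # ['g1']
```

The real distribution in Figure 1 may be better described by a mixture or a different parametric family; the normal fit is only the simplest choice for illustration.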
Figure 2. Prediction of essential genes for input1 using DEM.
The curve shows the relationship between log MF values and the corresponding false discovery rates (q values). The triangle marks the boundary separating essential from non-essential genes. Blue bars show the density of the log MF values. The horizontal axis is the log MF value; the vertical axis is the frequency or the q value.
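The caption reports false discovery rates (q values) without saying how they were computed. Assuming per-gene p values are available, a common way to obtain q values is the Benjamini-Hochberg procedure, sketched below; `essential_at_fdr` and its 0.05 cut-off are hypothetical and are not claimed to be how DEM draws its boundary.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q values, returned in the same order as `pvalues`."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so q values stay monotone in p.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q


def essential_at_fdr(pvalues_by_gene, fdr=0.05):
    """Hypothetical selection rule: call genes essential when q <= fdr."""
    genes = list(pvalues_by_gene)
    qs = bh_qvalues([pvalues_by_gene[g] for g in genes])
    return sorted(g for g, q in zip(genes, qs) if q <= fdr)


print(essential_at_fdr({"a": 0.01, "b": 0.04, "c": 0.03, "d": 0.20}))  # ['a']
```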
Figure 3. A Venn diagram of all essential genes predicted by our system for the three samples.
The predicted sets include all three types of essential genes.
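A three-way Venn diagram like Figure 3 partitions genes by the exact set of samples in which they were predicted. The sketch below computes those exclusive regions from per-sample gene sets; `venn_regions` is a hypothetical helper and the gene identifiers are made up.

```python
from itertools import combinations


def venn_regions(samples):
    """Map each non-empty combination of sample names to the genes found in
    exactly those samples (and in no other sample)."""
    names = list(samples)
    regions = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            inside = set.intersection(*(samples[n] for n in combo))
            outside = set().union(*(samples[n] for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions


samples = {"input1": {"a", "b", "c"}, "input2": {"b", "c", "d"}, "input3": {"c", "e"}}
for combo, genes in venn_regions(samples).items():
    print(combo, sorted(genes))
```

The regions are disjoint, so their sizes add up to the number of genes predicted in any sample.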
Figure 4. Locations of the 548 essential genes identified on the Y. pestis chromosome.
Moving out from the centre, the layers show: MF values; transposon insertion sites per gene for all genes; insertion counts per gene for all genes; and transposon insertion counts per base pair across the genome. Brown bars indicate Type I essential genes, red bars Type II essential genes and blue bars Type III essential genes.
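The "insertion sites per gene" and "insertion counts per gene" layers can be derived from a map of insertion positions to read counts. A minimal sketch, assuming inclusive gene coordinates and a hypothetical `{position: read_count}` input, and ignoring strand:

```python
import bisect


def per_gene_insertions(genes, insertions):
    """genes: {name: (start, end)} with inclusive coordinates.
    insertions: {position: read_count}.
    Returns {name: (unique_insertion_sites, total_insertion_count)}."""
    positions = sorted(insertions)
    result = {}
    for name, (start, end) in genes.items():
        lo = bisect.bisect_left(positions, start)   # first site >= start
        hi = bisect.bisect_right(positions, end)    # one past last site <= end
        hits = positions[lo:hi]
        result[name] = (len(hits), sum(insertions[p] for p in hits))
    return result


genes = {"g1": (1, 100), "g2": (150, 300)}
insertions = {5: 3, 50: 10, 120: 7, 150: 1, 300: 2, 301: 4}
print(per_gene_insertions(genes, insertions))  # {'g1': (2, 13), 'g2': (2, 3)}
```

Sorting the positions once and using binary search keeps the lookup per gene logarithmic, which matters for genome-scale sets of several hundred thousand insertion sites.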
Comparison of our DEM essential gene prediction algorithm with other prediction algorithms (TraDIS and ESSENTIALS). DEG: Database of Essential Genes.
| Method | Noise trim | Total predicted essential genes | Coincidence rate between samples | Coincidence rate with DEG |
|---|---|---|---|---|
| Our system | Yes | 548 | 85.1% | 82.0% |
| TraDIS | Yes | 1034 | 79.1% | 32.4% |
| ESSENTIALS | Yes | 499 | 75.8% | 51.7% |
| TraDIS | No | 342 | 75.7% | 32.5% |
| ESSENTIALS | No | 62 | 14.5% | 51.2% |
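The table does not define its two coincidence rates. As an illustration only, the sketch below assumes the between-sample rate is the share of genes predicted in all samples among genes predicted in any sample, and the DEG rate is the share of predicted genes that also appear in a reference set matched to DEG entries; both definitions and both function names are hypothetical and may differ from those used in the study.

```python
def coincidence_between_samples(*gene_sets):
    """Assumed definition: genes predicted in every sample / genes in any sample."""
    union = set().union(*gene_sets)
    if not union:
        return 0.0
    return len(set.intersection(*map(set, gene_sets))) / len(union)


def coincidence_with_reference(predicted, reference):
    """Assumed definition: share of predicted genes also in the reference set."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(reference)) / len(predicted)


a, b, c = {"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}
print(coincidence_between_samples(a, b, c))                          # 0.2
print(coincidence_with_reference({"a", "b", "c", "d"}, {"b", "d", "x"}))  # 0.5
```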
Genes validated as essential in this study.
| Gene predicted as essential | Phenotype in this study |
|---|---|
| | Essential in broth assay |
| | Essential in broth assay |
| | Essential in broth assay |
| | Essential in broth assay |
| | Essential in broth assay |
| | Essential on solid media, not in broth |
| | Essential on solid media, not in broth |
| | Essential on solid media, not in broth |