Qing-Feng Wen, Shuo Liu, Chuan Dong, Hai-Xia Guo, Yi-Zhou Gao, Feng-Biao Guo.
Abstract
Geptop has performed effectively in the identification of prokaryotic essential genes since its first release in 2013. It estimates gene essentiality for prokaryotes based on orthology and phylogeny. Genome-scale essentiality data are now available for more prokaryotic species, and this information has been collected in public essential gene repositories such as DEG and OGEE. A faster and more accurate toolkit is needed to keep pace with the growing volume of prokaryotic genome data. We updated Geptop by adding more validated essentiality data to the reference set (from 19 to 37 species) and by introducing multi-process technology to accelerate computation. Compared with Geptop 1.0 and other gene essentiality prediction models, Geptop 2.0 generates more stable predictions and finishes the computation in a shorter time. The software is available both as an online server and as a downloadable standalone application. We hope that the improved Geptop 2.0 will facilitate research on gene essentiality and the development of novel antibacterial drugs. The gene essentiality prediction tool is available at http://cefg.uestc.cn/geptop.
Keywords: bioinformatics; essential genes; prediction; prokaryotes; software
Year: 2019 PMID: 31214154 PMCID: PMC6558110 DOI: 10.3389/fmicb.2019.01236
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. AUC results for the 37 genomes obtained with Geptop 1.0 and Geptop 2.0. For the abbreviations of the 37 bacteria, please refer to Supplementary Table S1.
FIGURE 2. Prediction performance of the BLAST tool, CEG_Match, EGP, SVM, and Geptop 2.0 on the 23 genomes. (A) AUC scores of the gene essentiality predictions by the BLAST tool, CEG_Match, EGP, SVM, and Geptop 2.0, respectively, for each genome. (B) Box plot of the AUC scores of the five tools across the 23 genomes.
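The AUC scores compared in Figures 1 and 2 can be illustrated with a minimal sketch. It is not taken from the Geptop code base: the function name `auc_score`, the toy labels, and the toy scores are hypothetical, and the pairwise (Mann-Whitney) formulation is just one standard way to compute an AUC from per-gene essentiality scores.

```python
def auc_score(labels, scores):
    """AUC as the probability that a randomly chosen essential gene
    receives a higher score than a randomly chosen non-essential gene.
    Ties count as half a win. labels: 1 = essential, 0 = non-essential."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one gene of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six genes with known labels and predicted scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of genes, which is acceptable for a small illustration; a rank-based implementation would be preferable for whole genomes.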
FIGURE 3. Correlation between minimal species distance and prediction performance among the 37 genomes. In each panel, the horizontal axis represents the minimal distance between the query genome and the 36 reference genomes, and the vertical axis represents the prediction performance measured by AUC, sensitivity, specificity, F-measure, or MCC. The “Cor” value in each panel is the Pearson correlation coefficient. (A) Correlation between AUC score and minimal species distance. (B) Correlation between sensitivity and minimal species distance. (C) Correlation between specificity and minimal species distance. (D) Correlation between F-measure and minimal species distance. (E) Correlation between MCC and minimal species distance.
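The indexes named in Figure 3 have standard definitions, sketched below for readers who want to reproduce this kind of analysis. This is an illustration under stated assumptions, not the authors' code: the helper names `binary_metrics` and `pearson` and the confusion-matrix counts are hypothetical, and the F-measure is assumed to be the usual F1 score.

```python
import math


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F-measure (F1) and MCC from confusion counts.
    tp/fn count essential genes, tn/fp count non-essential genes."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return _ratio(cov, math.sqrt(var_x * var_y))


# Hypothetical confusion counts for one query genome.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Given one minimal species distance and one metric value per genome, `pearson(distances, values)` would yield a coefficient of the kind reported as “Cor” in each panel.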