Valentine U Nlebedim1, Roy R Chaudhuri2, Kevin Walters1.
Abstract
MOTIVATION: Probabilistic identification of bacterial essential genes using TraDIS data based on Tn5 libraries has received relatively little attention in the literature; most methods are designed for mariner transposon insertions. Analysis of Tn5 transposon-based genomic data is challenging due to the high insertion density and genomic resolution. We present a novel probabilistic Bayesian approach for classifying bacterial essential genes using transposon insertion density derived from transposon insertion sequencing data. We implement a Markov chain Monte Carlo sampling procedure to estimate the posterior probability that any given gene is essential. We implement a Bayesian decision theory approach to selecting essential genes. We assess the effectiveness of our approach via analysis of both simulated data and three previously published Escherichia coli, Salmonella Typhimurium and Staphylococcus aureus datasets. These three bacteria have relatively well-characterised essential genes, which allows us to test our classification procedure using receiver operating characteristic (ROC) curves and the areas under those curves (AUCs). We compare the classification performance with that of Bio-Tradis, a standard tool for bacterial gene classification.
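To make the decision-theoretic step concrete, the following is a minimal sketch of a two-class Bayes decision rule applied to posterior probabilities of essentiality. It is not the authors' implementation: the function name `classify_essential` and the loss parameters `loss_fp` and `loss_fn` are hypothetical, and the abstract does not state which loss function the method actually uses.

```python
def classify_essential(posterior, loss_fp=1.0, loss_fn=1.0):
    """Call a gene essential when that call has the lower expected loss.

    posterior: posterior probabilities that each gene is essential
               (e.g. estimated from MCMC samples).
    loss_fp:   assumed loss for calling a non-essential gene essential.
    loss_fn:   assumed loss for calling an essential gene non-essential.
    """
    calls = []
    for p in posterior:
        # Expected loss if we call the gene essential: we are wrong with prob. 1 - p
        loss_if_essential = (1.0 - p) * loss_fp
        # Expected loss if we call the gene non-essential: we are wrong with prob. p
        loss_if_nonessential = p * loss_fn
        calls.append(loss_if_essential < loss_if_nonessential)
    return calls


# Hypothetical posterior probabilities for three genes
posteriors = [0.95, 0.40, 0.10]
equal_loss_calls = classify_essential(posteriors)
# Penalising missed essential genes four times more lowers the threshold to 0.2
asymmetric_calls = classify_essential(posteriors, loss_fp=1.0, loss_fn=4.0)
```

Under this rule a gene is called essential when its posterior probability exceeds `loss_fp / (loss_fp + loss_fn)`, so changing the ratio of the two losses moves the classification threshold.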
Year: 2021 PMID: 34255819 PMCID: PMC8652038 DOI: 10.1093/bioinformatics/btab508
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Shape parameters of the gamma prior distributions of the parameters in
| Dataset | Shape parameters of | | | |
|---|---|---|---|---|
| | α | β | α | β |
| | 2 | 166 | 2 | 17 |
| | 1 | 182 | 11 | 83 |
| | 2 | 178 | 3 | 18 |
Numbers of true insertions, spurious insertions and exponential rate parameter, λ, for the four simulated data scenarios
| Dataset | Number of true insertions | Number of spurious insertions | λ |
|---|---|---|---|
| HH | 700 000 | 30 000 | 50 |
| HL | 700 000 | 30 000 | 10 |
| LH | 20 000 | 1000 | 50 |
| LL | 20 000 | 1000 | 10 |
Fig. 1. ROC curves and AUCs (in the legend) for the E. coli, S. Typhimurium and S. aureus datasets using the posterior probability of gene essentiality as the classifier
Fig. 2. The ROC curves for the simulated HH, LH, HL and LL datasets. The AUCs are given in the legend. The classification statistic is the posterior probability of gene essentiality. (a) E. coli, (b) S. Typhimurium and (c) S. aureus
Fig. 3. True and false positive rates for Bio-Tradis and INSDENS for four different values of R. Each point in ROC space is labelled with its numerical R value for INSDENS or with a solid circle for Bio-Tradis. For convenience, the R value shown is 100 times the actual value
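The ROC curves and AUCs referred to in the captions above can be computed from per-gene posterior probabilities and a reference set of known essential genes. The sketch below is a generic illustration, not the authors' code: the function names `roc_points` and `auc` and the example scores and labels are hypothetical, and the area is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from thresholding at each distinct score.

    scores: posterior probabilities of essentiality, one per gene.
    labels: 1 if the gene is essential in the reference set, else 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Rank genes from highest to lowest posterior probability
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all genes tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical example: four genes, two of them essential in the reference set
example_scores = [0.9, 0.8, 0.3, 0.2]
example_labels = [1, 0, 1, 0]
example_auc = auc(roc_points(example_scores, example_labels))
```

A single classification threshold corresponds to one point in ROC space, which is how a tool that returns only one set of calls can be compared against a curve traced out by a probabilistic classifier.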