Ilham Ayub Shahmuradov, Rozaimi Mohamad Razali, Salim Bougouffa, Aleksandar Radovanovic, Vladimir B Bajic.
Abstract
Motivation: The computational search for promoters in prokaryotes remains an attractive problem in bioinformatics. Despite the attention it has received for many years, the problem has not been addressed satisfactorily. In any bacterial genome, the transcription start site is chosen mostly by the sigma (σ) factor proteins, which control gene activation. The majority of published bacterial promoter prediction tools target σ70 promoters in Escherichia coli. Moreover, no σ-specific classification of promoters is available for prokaryotes other than E. coli.
Year: 2017 PMID: 27694198 PMCID: PMC5408793 DOI: 10.1093/bioinformatics/btw629
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Flow-chart of the algorithm implemented in the bTSSfinder program. T-10 is the threshold for the prediction of the -10 box (specific for every sigma class). Ttotal is the NN threshold for the selection of TSSs.
Testing results for five sigma classes of E. coli and cyanobacteria
| σ class | TP | FN | TN | FP | Sn, % | Sp, % | P1, % | P2, % | F1, % |
|---|---|---|---|---|---|---|---|---|---|
| σ70 | 180 | 20 | 377 | 23 | 90.0 | 93.3 | 94.0 | 90.4 | 92 |
| σ38 | 37 | 3 | 75 | 5 | 92.5 | 93.8 | 93.7 | 92.6 | 93.1 |
| σ32 | 46 | 4 | 95 | 5 | 92.0 | 95.0 | 94.9 | 92.2 | 93.4 |
| σ28 | 31 | 4 | 69 | 1 | 88.6 | 98.6 | 98.4 | 89.6 | 93.2 |
| σ24 | 86 | 14 | 193 | 7 | 86.0 | 96.5 | 96.1 | 87.3 | 90.7 |
| σA | 921 | 79 | 1921 | 79 | 92.1 | 96.1 | 95.9 | 92.4 | 94 |
| σC | 36 | 14 | 96 | 4 | 72 | 96 | 94.7 | 75.0 | 81.8 |
| σF | 80 | 20 | 193 | 7 | 80.0 | 96.5 | 95.8 | 82.8 | 87.2 |
| σG | 72 | 28 | 188 | 12 | 72.0 | 94.0 | 92.3 | 77.1 | 80.9 |
| σH | 38 | 12 | 92 | 8 | 76 | 92 | 90.5 | 74.2 | 82.6 |
Test experiments for every sigma class were repeated 10 times for randomly selected negative sets and the means were taken.
Comparison of available promoter prediction programs tested on E. coli’s experimentally validated σ70 promoter sequences and a negative dataset of 251 bp sequences.
| Promoter prediction tool | Genes with ≥1 TSSpr | Total number of TSSpr | TP | FP | FN | Sn, % | P1,% | F1,% |
|---|---|---|---|---|---|---|---|---|
| bTSSfinder | 197 | 355 | 143 | 212 | 57 | 71.5 | 40.3 | 51.5 |
| BPROM | 200 | 569 | 130 | 439 | 70 | 65.0 | 22.9 | 33.8 |
| NNPP2 | 175 | 460 | 109 | 351 | 91 | 54.5 | 23.7 | 33.0 |
| PromPredict | 74 | 149 | 0 | 149 | 200 | 0.0 | 0.0 | 0.0 |
A prediction is counted as true if the distance between the annotated and predicted TSSs is 50 bp or less. TSSpr: predicted TSS.
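The 50 bp matching rule and the derived columns can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the input layout (one annotated TSS per gene, a list of predicted TSSs per gene) and the rule that every prediction beyond the first match counts as a false positive are assumptions, chosen because they are consistent with the table's arithmetic (TP + FP equals the total number of TSSpr, and TP + FN equals the 200 genes).

```python
def count_hits(annotated, predicted, tolerance=50):
    """Count TP, FP and FN for per-gene TSS predictions.

    annotated: dict mapping gene -> annotated TSS coordinate (hypothetical layout)
    predicted: dict mapping gene -> list of predicted TSS coordinates
    A gene is a true positive if any prediction lies within `tolerance` bp
    of its annotated TSS. Assumption: all other predictions are false positives.
    """
    tp = fp = fn = 0
    for gene, tss in annotated.items():
        hits = predicted.get(gene, [])
        if any(abs(p - tss) <= tolerance for p in hits):
            tp += 1
            fp += len(hits) - 1   # one prediction explains the TSS, the rest are extra
        else:
            fn += 1
            fp += len(hits)       # no prediction is close enough
    return tp, fp, fn


def rates(tp, fp, fn):
    """Sensitivity, precision (the table's P1) and F1 as fractions, or None if undefined."""
    sn = tp / (tp + fn) if (tp + fn) else None
    p1 = tp / (tp + fp) if (tp + fp) else None
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else None
    return sn, p1, f1


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    annotated = {"geneA": 100, "geneB": 500, "geneC": 900}
    predicted = {"geneA": [60, 140, 300], "geneB": [600], "geneC": []}
    print(count_hits(annotated, predicted))
    # Counts taken from the bTSSfinder row of the table above.
    print(rates(143, 212, 57))
```

Feeding the bTSSfinder counts from the table (TP = 143, FP = 212, FN = 57) into `rates` gives 71.5 %, 40.3 % and 51.5 % after rounding, which matches the reported Sn, P1 and F1 values.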
Comparison of available promoter prediction programs assessed on the 1100 bp upstream region of 200 E. coli σ70 promoters with experimentally validated TSSs.
| Promoter prediction tool | Tp | Fn | Tn | Fp | Sn % | Sp % | F1-score | MCC |
|---|---|---|---|---|---|---|---|---|
| bTSSfinder | 183 | 17 | 189 | 11 | 91.5 | 94.5 | 0.93 | 0.86 |
| BPROM² | 152 | 48 | 166 | 34 | 76.0 | 83.0 | 0.79 | 0.59 |
| NNPP2² | 109 | 91 | 176 | 24 | 54.5 | 88.0 | 0.66 | 0.45 |
| PromPredict² | 0 | 200 | 200 | 0 | 0.0 | 100.0 | n.d. | n.d. |
¹ A prediction is counted as true if the annotated TSS is predicted exactly.
² A prediction is counted as true if the distance between the annotated and predicted TSSs is 50 bp or less.
n.d.: not determined.
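For readers who want to check the derived columns against the raw counts, the following Python sketch computes sensitivity, specificity, F1-score and the Matthews correlation coefficient (MCC) from a confusion matrix using the standard textbook definitions. The function name and the choice to return `None` when a denominator is zero are assumptions made for illustration; the source does not state how the authors handled undefined values beyond marking them n.d.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    Returns fractions (not percentages). A metric whose denominator is zero
    is returned as None, mirroring the "n.d." entries in the table.
    """
    def ratio(num, den):
        return num / den if den else None

    sn = ratio(tp, tp + fn)                      # sensitivity (recall)
    sp = ratio(tn, tn + fp)                      # specificity
    # F1 needs a defined precision, so it is undefined when nothing is predicted positive.
    f1 = ratio(2 * tp, 2 * tp + fp + fn) if (tp + fp) else None
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {"sn": sn, "sp": sp, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Counts from the bTSSfinder row of the table above.
    print(confusion_metrics(tp=183, fn=17, tn=189, fp=11))
    # Counts from the PromPredict row: no positive predictions at all.
    print(confusion_metrics(tp=0, fn=200, tn=200, fp=0))
```

With the bTSSfinder counts, the values round to Sn = 91.5 %, Sp = 94.5 %, F1 = 0.93 and MCC = 0.86, in agreement with the table. With the PromPredict counts, F1 and MCC come out undefined, which corresponds to the n.d. entries.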
Result of cross-phylum application of bTSSfinder on the positive dataset. Bold refers to sensitivity of the models applied to their intended species.
| Test sets | Sensitivity | |
|---|---|---|
| bTSSfinder models for E. coli | bTSSfinder models for cyanobacteria¹ | |
| σ70 | 66% | |
| σA | 87.5% | |
| σ38 | 35% | |
| σC | 68% | |
| σ32 | 54% | |
| σH | 58% | |
| σ28 | 42.6% | |
| σF | 37% | |
| σ24 | 31% | |
| σG | 15.8% | |
¹ Sensitivity values obtained with the species-specific bTSSfinder parameters are given in bold.
Fig. 2. The scoring landscape of experimentally validated TSSs in E. coli. Top line: the distribution of NN scores that are higher than the threshold for every 300 bp upstream and downstream of a TSSmap. Bottom line: cases where the TSSpred is the TSSmap.