Marco Di Salvo1, Eva Pinatel2, Adelfia Talà1, Marco Fondi3, Clelia Peano4,5, Pietro Alifano6.
Abstract
BACKGROUND: Over the last few decades, computational genomics has contributed tremendously to deciphering biology from genome sequences and related data. Considerable effort has been devoted to predicting transcription promoter and terminator sites, which are the essential "punctuation marks" of DNA transcription. Computational prediction of promoters in prokaryotes nevertheless remains far from solved. Most published bacterial promoter prediction tools rely on consensus-sequence searches and were designed specifically for vegetative σ70 promoters; they are therefore not suitable for promoter prediction in bacteria that encode many σ factors, such as actinomycetes.
Keywords: G-Quadruplex; G4PromFinder; GC-rich genomes; Motif; Promoter elements; Promoters
Year: 2018 PMID: 29409441 PMCID: PMC5801747 DOI: 10.1186/s12859-018-2049-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Method used for the prediction and the validation of putative promoters
Fig. 2 Boxplot of IRs length for S. coelicolor (a) and P. aeruginosa (b)
Statistics of promoters predicted by the G4PromFinder algorithm
| Bacterial genome | Positive dataset size | Regions with at least one prediction (%) | Regions with more than one prediction (%) | Total number of predictions |
|---|---|---|---|---|
| S. coelicolor A3(2) | 3570 | 91.2 | 13.8 | 3751 |
| P. aeruginosa PA14 | 2117 | 91.5 | 17.4 | 2305 |
Testing results of G4PromFinder^a
| Bacterial genome | TP | FN | FP | TN | Precision (%) | Recall (%) | Specificity (%) | Accuracy (%) | F1-score |
|---|---|---|---|---|---|---|---|---|---|
| S. coelicolor A3(2) | 384 | 164 | 324 | 224 | 54.3 | 70.1 | 40.8 | 55.5 | 0.61 |
| P. aeruginosa PA14 | 233 | 105 | 308 | 30 | 43.1 | 69.0 | 8.9 | 38.9 | 0.53 |
^a Test experiments were repeated 10 times on 548 and 338 randomly selected sequences from the positive sets of S. coelicolor A3(2) and P. aeruginosa PA14, respectively, and the means were taken
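The percentages in the testing table follow from the four confusion-matrix counts. The sketch below assumes the standard textbook definitions of precision, recall, specificity, accuracy and F1-score; the record does not spell out the formulas, and the function name `classification_metrics` is hypothetical. Because the table reports means over 10 repetitions, values recomputed from the averaged counts can differ from the table in the last digit.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return standard binary-classification metrics as fractions (0 to 1).

    Assumes the usual definitions; every denominator must be non-zero.
    """
    precision = tp / (tp + fp)            # share of predictions that are true promoters
    recall = tp / (tp + fn)               # share of true promoters that were predicted
    specificity = tn / (tn + fp)          # share of negative sequences correctly rejected
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Counts (TP, FN, FP, TN) copied from the testing table.
counts = {
    "S. coelicolor A3(2)": (384, 164, 324, 224),
    "P. aeruginosa PA14": (233, 105, 308, 30),
}

for genome, (tp, fn, fp, tn) in counts.items():
    metrics = classification_metrics(tp, fn, fp, tn)
    summary = ", ".join(f"{name}={value:.3f}" for name, value in metrics.items())
    print(f"{genome}: {summary}")
```

The low specificity for P. aeruginosa PA14 comes directly from the counts: only 30 of the 338 negative sequences were rejected.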
Some features of the validated promoters
| Bacterial genome | Mean GC content of validated promoters (%) | Mean AT content of the AT-rich element of validated promoters (%) | Validated promoters with “-35 consensus” (%) | Validated promoters with “-10 consensus” (%) |
|---|---|---|---|---|
| S. coelicolor A3(2) | 64.5 | 48.5 | 6.1 | 40.1 |
| P. aeruginosa PA14 | 59.6 | 53.3 | 28.2 | 7.4 |
Comparison between G4PromFinder, PePPER, PromPredict and bTSSfinder testing results^a
| Tool | Metric | S. coelicolor A3(2) | P. aeruginosa PA14 |
|---|---|---|---|
| G4PromFinder | Recall | 0.70 | 0.69 |
| | Precision | 0.54 | 0.43 |
| | F1-score | 0.61 | 0.53 |
| PePPER | Recall | 0.20 | 0.31 |
| | Precision | 0.78 | 0.67 |
| | F1-score | 0.32 | 0.42 |
| PromPredict | Recall | 0.51 | 0.56 |
| | Precision | 0.41 | 0.42 |
| | F1-score | 0.46 | 0.48 |
| bTSSfinder (for E. coli) | Recall | 0.45 | 0.41 |
| | Precision | 0.33 | 0.31 |
| | F1-score | 0.38 | 0.36 |
| bTSSfinder (for Cyanobacteria) | Recall | 0.29 | 0.30 |
| | Precision | 0.27 | 0.26 |
| | F1-score | 0.28 | 0.28 |
^a Test experiments were repeated 10 times on 548 and 338 randomly selected sequences from the positive sets of S. coelicolor A3(2) and P. aeruginosa PA14, respectively, and the means were taken
Comparison between G4PromFinder and available promoter prediction programs assessed on all the samples of the positive sets
| Bacterial genome | Program | TP | FN | FP | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|---|
| S. coelicolor A3(2) | G4PromFinder | 2850 | 870 | 901 | 0.76 | 0.76 | 0.76 |
| | PromPredict | 2075 | 1582 | 934 | 0.69 | 0.56 | 0.62 |
| | PePPER | 1538 | 2768 | 683 | 0.69 | 0.35 | 0.47 |
| | bTSSfinder (E. coli) | 1449 | 2121 | 974 | 0.59 | 0.40 | 0.48 |
| | bTSSfinder (Cyanob.) | 1166 | 2404 | 1151 | 0.50 | 0.32 | 0.39 |
| P. aeruginosa PA14 | G4PromFinder | 1682 | 563 | 623 | 0.73 | 0.74 | 0.74 |
| | PromPredict | 1351 | 813 | 549 | 0.71 | 0.62 | 0.66 |
| | PePPER | 2015 | 1383 | 954 | 0.67 | 0.59 | 0.63 |
| | bTSSfinder (E. coli) | 923 | 1194 | 497 | 0.65 | 0.43 | 0.52 |
| | bTSSfinder (Cyanob.) | 687 | 1430 | 685 | 0.50 | 0.32 | 0.39 |
Fig. 3 Distribution of validated promoters in S. coelicolor A3(2) (a) and P. aeruginosa PA14 (b) as a function of their distance from the TSSs obtained by dRNAseq experiments and used for validation. Predicted promoters are grouped based on distances between the AT-rich element 3′-end points and the annotated TSS. A: predicted promoters in S. coelicolor A3(2); B: predicted promoters in P. aeruginosa PA14