Wenying He, Cangzhi Jia, Yucong Duan, Quan Zou.
Abstract
BACKGROUND: A promoter is an important regulatory sequence element that controls the initiation of gene transcription. In prokaryotes, σ70 promoters regulate the transcription of most genes. Promoter recognition is a crucial part of gene structure recognition and a core issue in constructing gene transcriptional regulation networks. Although genome sequencing has been successfully completed for an increasing number of microbial species, accurately identifying σ70 promoter regions in DNA sequences remains difficult.
Keywords: PSTNPSS; PseEIIP; SVM; sigma70 promoter
Year: 2018 PMID: 29745856 PMCID: PMC5998878 DOI: 10.1186/s12918-018-0570-1
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Overview of the proposed 70ProPred predictor. The diagram mainly contains the datasets, the sequence descriptors and the 70ProPred prediction system. The optimal encoding combination, PSTNPSS and PseEIIP, is used as the input to train an SVM classifier. After optimization of the SVM parameters, the best SVM model is constructed based on the jackknife performance
Fig. 2 Motifs of σ70 promoter samples found by the MEME system. The three corresponding motif logos are visualized for the σ70 promoter samples (details in Table 1)
Conserved motifs of σ70 promoter samples identified by the MEME system
| Motif | Width | Best possible match | Sites count |
|---|---|---|---|
| 1 | 50 | YTKRMMWNNBNRGNVGVAMTSCGTATWATGCGCCYCCNYBVMCVCGKRVV | 47 |
| 2 | 21 | ATBGTTATCRATHWHATTDKC | 20 |
| 3 | 38 | KKATATTGMHGTTRRWATDAWTAGTMTWAATGCSGCTT | 10 |
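The "best possible match" strings above are written in IUPAC degenerate nucleotide codes (for example, R stands for A or G, and N for any base). As a minimal sketch of how such a consensus could be scanned against a sequence, the snippet below expands the codes into a regular expression. The helper names `consensus_to_regex` and `find_motif` are hypothetical, and this is a simplification: MEME itself scores sites with a position weight matrix, not with an exact consensus match.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regex source string."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        # Single bases stay literal; ambiguous codes become character classes.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(sequence, consensus):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping occurrences.
    lookahead = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Motif 2 from the table above (width 21).
MOTIF_2 = "ATBGTTATCRATHWHATTDKC"
```

For instance, `find_motif(some_sequence, MOTIF_2)` would list every position where a sequence is compatible with the motif 2 consensus; `some_sequence` is a placeholder for any DNA string.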
Fig. 3 promoter-1 converted into promoter-1 AC
EIIP values of nucleotides
| Nucleotide | EIIP (Ry) |
|---|---|
| A | 0.1260 |
| T | 0.1335 |
| G | 0.0806 |
| C | 0.1340 |
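The EIIP values above feed the 64-dimensional PseEIIP descriptor. The sketch below assumes the commonly used definition, in which the EIIP value of a trinucleotide is the sum of the EIIP values of its three nucleotides and each feature is that value multiplied by the trinucleotide's normalized frequency in the sequence. This definition is an assumption, since the record here does not spell it out, and the function names are hypothetical.

```python
from itertools import product

# EIIP values (in Ry) taken from the table above.
EIIP = {"A": 0.1260, "T": 0.1335, "G": 0.0806, "C": 0.1340}

# All 64 trinucleotides in a fixed order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_eiip(trimer):
    """Assumed definition: sum of the EIIP values of the three nucleotides."""
    return sum(EIIP[base] for base in trimer)


def pse_eiip(sequence):
    """Return a 64-dimensional vector: EIIP(trimer) * frequency(trimer)."""
    seq = sequence.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    # Slide a window of width 3; windows with non-ACGT symbols are skipped.
    for i in range(len(seq) - 2):
        trimer = seq[i:i + 3]
        if trimer in counts:
            counts[trimer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [trinucleotide_eiip(t) * counts[t] / total for t in TRINUCLEOTIDES]
```

The vector length of 64 matches the "PseEIIP (64)" label used in the performance tables.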
Jackknife test performance of PSTNPSS and PSTNPDS
| Features | Sn (%) | Sp (%) | Acc (%) | MCC | SVM parameter |
|---|---|---|---|---|---|
| PSTNPSS (79) | 90.82 | 96.57 | 94.58 | 0.8797 | -c 22.6274 |
| PSTNPDS (79) | 75.98 | 88.57 | 84.21 | 0.6493 | -c 1.4142 |
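To make the PSTNPSS descriptor more concrete, here is a minimal sketch under the usual definition of position-specific trinucleotide propensity: for every position, the frequency of each trinucleotide in the negative set is subtracted from its frequency in the positive set, and a sequence is encoded by looking up the value of the trinucleotide it carries at each position. The definition, the function names and the assumption of equal-length, uppercase A/C/G/T sequences are not stated in this record. A sequence of length L yields L - 2 features, so the 79 features in the table would correspond to 81-nt sequences (an inference, not a stated fact).

```python
from itertools import product

TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]


def position_frequencies(sequences):
    """Per-position trinucleotide frequencies for equal-length sequences."""
    n_pos = len(sequences[0]) - 2
    counts = [dict.fromkeys(TRIMERS, 0) for _ in range(n_pos)]
    for seq in sequences:
        for j in range(n_pos):
            counts[j][seq[j:j + 3]] += 1
    n = len(sequences)
    return [{t: c[t] / n for t in TRIMERS} for c in counts]


def pstnpss_encoder(positives, negatives):
    """Build an encoder from training positives and negatives."""
    f_pos = position_frequencies(positives)
    f_neg = position_frequencies(negatives)
    # z[j][t]: propensity of trinucleotide t at position j.
    z = [{t: f_pos[j][t] - f_neg[j][t] for t in TRIMERS}
         for j in range(len(f_pos))]

    def encode(seq):
        # One feature per position: the propensity of the observed trinucleotide.
        return [z[j][seq[j:j + 3]] for j in range(len(z))]

    return encode
```

In a jackknife or cross-validation setting, the propensity matrix would normally be rebuilt from the training part only, so that the held-out sequence does not influence its own encoding.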
Fig. 4 F-score values of trinucleotides in both PSTNPSS and PSTNPDS
Fig. 5 Entropy of trinucleotides in σ70 promoter and non-promoter sequences
Performance of our model on the jackknife test
| Features | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| PSTNPSS (79) | 90.82 | 96.57 | 94.58 | 0.8797 |
| PSTNPSS (79) + PseEIIP (64) | 93.12 | 96.86 | 95.56 | 0.9018 |
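The four metrics reported in these tables (Sn, Sp, Acc, MCC) follow from the confusion-matrix counts by their standard definitions. The sketch below is only illustrative: the function name is hypothetical, and the counts in the usage note are made up rather than taken from the study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                   # fraction of promoters recovered
    sp = tn / (tn + fp)                   # fraction of non-promoters rejected
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

With hypothetical counts `tp=90, fn=10, tn=95, fp=5`, this gives Sn = 0.90, Sp = 0.95, Acc = 0.925 and an MCC of roughly 0.851. Because MCC uses all four counts, it is less sensitive to class imbalance than accuracy alone.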
Fig. 6 ROC curves assessing the predictive performance of different sequence encoding schemes for σ70 promoters
Fig. 7 A heat map of the F-score values of the 64 trinucleotides with different EIIP values. The blue boxes indicate features with a lower effect on the recognition of σ70 promoters, while the red boxes indicate features that are useful for the recognition of σ70 promoters
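The F-score used to rank features in Figs. 4 and 7 is not defined in this record. The sketch below assumes it is the widely used feature-selection F-score, which compares the separation of the class means with the within-class variances, and not the F1 classification measure. The function name is hypothetical.

```python
def f_score(pos_values, neg_values):
    """Feature-selection F-score of one feature (assumed definition).

    pos_values / neg_values: values of the feature in the positive and
    negative samples; each list needs at least two entries.
    """
    n_pos, n_neg = len(pos_values), len(neg_values)
    mean_pos = sum(pos_values) / n_pos
    mean_neg = sum(neg_values) / n_neg
    mean_all = (sum(pos_values) + sum(neg_values)) / (n_pos + n_neg)
    # Numerator: how far each class mean lies from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sum of the unbiased within-class variances.
    var_pos = sum((x - mean_pos) ** 2 for x in pos_values) / (n_pos - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg_values) / (n_neg - 1)
    within = var_pos + var_neg
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return between / within
```

A larger value means the feature separates promoters from non-promoters more clearly, which is consistent with the red boxes in the heat map marking the more useful features.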
Comparison of different classifiers for identifying σ70 promoters
| Classifier | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| KNN (8) | 87.04 | 96.21 | 93.04 | 0.8450 |
| Naïve Bayes | 91.90 | 89.00 | 90.00 | 0.7891 |
| Random Forest (200) | 85.29 | 97.79 | 93.46 | 0.8548 |
| Ensembles for Boosting (200) | 89.88 | 95.29 | 93.41 | 0.8541 |
| LibD3C | 77.33 | 87.57 | 84.03 | 0.6478 |
| GBDT | 86.50 | 96.14 | 92.81 | 0.8397 |
| SVM | 93.12 | 96.86 | 95.56 | 0.9018 |
Performance of our model (70ProPred), Z-curve, PSTNPDS, PseZNC and IPMD on 5-fold cross-validation
| Methods | Sn (%) | Sp (%) | Acc (%) | MCC | AUC |
|---|---|---|---|---|---|
| Z-curve | 74.6 | 79.5 | 77.8 | 0.527 | 0.848 |
| PSTNPDS | 75.9 | 88.0 | 83.8 | 0.641 | 0.911 |
| PseZNC | 80.3 | 86.8 | 84.5 | 0.663 | 0.909 |
| IPMD | 82.4 | 90.7 | 87.9 | 0.731 | – |
| 70ProPred | 92.4 | 96.9 | 95.3 | 0.897 | 0.990 |