Abstract
Promoters are short regions at specific locations in DNA sequences that play key roles in directing gene transcription. They can be grouped into six types (σ24, σ28, σ32, σ38, σ54, σ70). Recently, a predictor called "iPromoter-2L" was constructed to predict promoters and their six types; it was the first approach to predict all six types. However, its predictive quality still needs to be improved to meet the requirements of real-world applications. In this study, we proposed the smoothing cutting window algorithm, which uses conservation scores to find the window fragments of DNA sequences that capture the sequence patterns of promoters. For each window fragment, discriminative features were extracted using kmer and PseKNC. These features were combined with support vector machines (SVMs) to construct different predictors, which were then clustered into several groups based on their distances. Finally, a new predictor called iPromoter-2L2.0 was constructed by ensemble learning over the key predictors selected from the cluster groups to identify promoters and their six types. The results showed that iPromoter-2L2.0 outperformed other existing methods for both promoter prediction and identification of their six types, indicating that iPromoter-2L2.0 will be helpful for genomics analysis.
Keywords: ensemble learning; promoter; smoothing cutting window algorithm
Year: 2019 PMID: 31536883 PMCID: PMC6796744 DOI: 10.1016/j.omtn.2019.08.008
Source DB: PubMed Journal: Mol Ther Nucleic Acids ISSN: 2162-2531 Impact factor: 8.886
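The kmer features mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a kmer feature vector is the normalized frequency of every length-k nucleotide word in a fragment and that per-fragment vectors are concatenated; the function names and the half-open `(start, end)` fragment format are hypothetical and are not taken from the paper.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Normalized k-mer frequencies of a DNA fragment, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k].upper()
        if word in counts:  # skip words containing ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def fragment_features(seq, fragments, k):
    """Concatenate the k-mer vectors of several (start, end) fragments.

    The half-open (start, end) fragment format is an assumption made
    for this illustration.
    """
    features = []
    for start, end in fragments:
        features.extend(kmer_frequencies(seq[start:end], k))
    return features
```

With this layout, each fragment contributes 4^k values, so two fragments with k = 2 give a 32-dimensional vector.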
A Comparison of iPromoter-2L2.0 with Other Predictors for Identifying Promoters (the First Layer) and Their Types (the Second Layer) via the 5-fold Cross-Validation on the Same Benchmark Dataset
| Method | Acc (%) | MCC | Sn (%) | Sp (%) |
|---|---|---|---|---|
| First layer | | | | |
| PCSF | 74.81 | 0.4980 | 78.92 | 70.70 |
| vw Z-curve | 80.28 | 0.6098 | 77.76 | 82.80 |
| Stability | 78.04 | 0.5615 | 76.61 | 79.48 |
| iPro54 | 80.45 | 0.6100 | 77.76 | 83.15 |
| iPromoter-2L1.0 | 81.68 | 0.6343 | 79.20 | 84.16 |
| iPromoter-2L2.0 | 84.98 | 0.6998 | 84.13 | 85.84 |
| Second layer, iPromoter-2L1.0 | | | | |
| σ24 promoter | 93.50 | 0.7338 | 72.52 | 96.93 |
| σ28 promoter | 96.82 | 0.5708 | 42.54 | 99.49 |
| σ32 promoter | 94.41 | 0.6524 | 52.58 | 99.14 |
| σ38 promoter | 94.69 | 0.2962 | 15.34 | 99.48 |
| σ54 promoter | 94.04 | 0.6459 | 53.19 | 99.57 |
| σ70 promoter | 80.66 | 0.6056 | 95.34 | 59.35 |
| Second layer, iPromoter-2L2.0 | | | | |
| σ24 promoter | 94.62 | 0.8053 | 81.82 | 97.22 |
| σ28 promoter | 97.94 | 0.7561 | 71.64 | 99.23 |
| σ32 promoter | 95.38 | 0.7361 | 71.82 | 98.05 |
| σ38 promoter | 94.58 | 0.2242 | 7.36 | 99.85 |
| σ54 promoter | 98.11 | 0.6714 | 59.57 | 99.42 |
| σ70 promoter | 85.94 | 0.7109 | 95.22 | 72.47 |
See Equation 1. Acc, accuracy; MCC, Matthews correlation coefficient; Sn, sensitivity; Sp, specificity.
The results reported in Liu et al.
iPromoter-2L2.0 is the predictor proposed in this study.
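The four metrics in the table can be computed from a confusion matrix as sketched below. Equation 1 is not reproduced in this record, so the sketch uses the standard confusion-matrix definitions and assumes they are equivalent to the paper's formulation; the function name and the fallback to 0.0 for an empty denominator are illustrative choices.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Acc, MCC, Sn and Sp from true/false positive and negative counts.

    Standard definitions; assumed (not confirmed) to match Equation 1.
    Any ratio with an empty denominator is reported as 0.0.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity (recall on negatives)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "MCC": mcc, "Sn": sn, "Sp": sp}
```

Because MCC uses all four cells of the confusion matrix, it can stay low when Acc is high on an imbalanced class, as in the σ38 rows, where Sn is low while Acc and Sp are high.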
Figure 1. A Screenshot of the Homepage of the Web Server for iPromoter-2L2.0
iPromoter-2L2.0 can be accessed at http://bliulab.net/iPromoter-2L2.0/.
Figure 2. A Flowchart Shows the Steps of the Proposed Smoothing Cutting Window Algorithm for the First-Layer Prediction
The standard deviations (SDs) shown in (A) are converted into the smooth SDs shown in (B), based on which the DNA sequences are divided into several fragments, as shown in (C).
Figure 3. A Flowchart Shows the Process of the Proposed Smoothing Cutting Window Algorithm for the Second-Layer Prediction
The SDs shown in (A) are converted into the smooth SDs as shown in (B), based on which the DNA sequences are divided into several fragments, as shown in (C).
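The smoothing and cutting steps described for Figures 2 and 3 can be sketched as follows. The captions do not specify the smoothing method or the cutting rule, so the centered moving average, the `radius` parameter, and the mean-value threshold used here are assumptions for illustration only, not the authors' procedure.

```python
def smooth(values, radius=2):
    """Centered moving average; the window is truncated at both ends.

    The moving average and `radius` are assumptions of this sketch.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def cut_fragments(smoothed, threshold=None):
    """Split positions into maximal runs lying on one side of a threshold.

    Returns half-open (start, end) index pairs. Using the mean as the
    default threshold is an assumption of this sketch.
    """
    if not smoothed:
        return []
    if threshold is None:
        threshold = sum(smoothed) / len(smoothed)
    fragments = []
    start = 0
    for i in range(1, len(smoothed)):
        # A cut is placed wherever the curve crosses the threshold.
        if (smoothed[i] >= threshold) != (smoothed[i - 1] >= threshold):
            fragments.append((start, i))
            start = i
    fragments.append((start, len(smoothed)))
    return fragments
```

Given per-position SD values for a set of sequences, `cut_fragments(smooth(sds))` would return the fragment boundaries. Smoothing first reduces the number of very short fragments caused by single-position fluctuations.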
Figure 4. A Flowchart Shows How iPromoter-2L2.0 Works
The Six Key Classifiers for the First-Layer Prediction
| Key Classifier | Feature Vector | Dimension | Parameters |
|---|---|---|---|
| | kmer | 768 | k = 1 |
| | kmer | 396 | k = 1 |
| | kmer | 2,880 | k = 2 |
| | kmer | 624 | k = 1 |
| | PseKNC | 1,080 | k = 1, λ = 2, w = 0.5 |
| | PseKNC | 11,880 | k = 3, λ = 2, w = 0.5 |
| | PseKNC | 46,440 | k = 4, λ = 2, w = 0.5 |
| | PseKNC | 1,566 | k = 2, λ = 2, w = 0.5 |
| | PseKNC | 2,808 | k = 2, λ = 2, w = 0.5 |
| | PseKNC | 729 | k = 1, λ = 5, w = 0.5 |
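The dimensions listed for the first-layer classifiers can be checked against the per-fragment feature length. The sketch below assumes that each table row pairs with the parameter note in the same position, that a kmer vector has 4^k components, and that a PseKNC vector has 4^k + λ components (the usual PseKNC length). Reading the quotient as the number of concatenated window fragments is a further assumption, not a statement from the source.

```python
def per_fragment_dim(feature, k, lam=0):
    """Feature length for one fragment: 4^k for kmer, 4^k + lambda for PseKNC."""
    if feature == "kmer":
        return 4 ** k
    if feature == "PseKNC":
        return 4 ** k + lam
    raise ValueError(f"unknown feature type: {feature}")


# (feature, k, lambda, listed dimension); the row-to-parameter pairing is assumed.
FIRST_LAYER_ROWS = [
    ("kmer", 1, 0, 768),
    ("kmer", 1, 0, 396),
    ("kmer", 2, 0, 2880),
    ("kmer", 1, 0, 624),
    ("PseKNC", 1, 2, 1080),
    ("PseKNC", 3, 2, 11880),
    ("PseKNC", 4, 2, 46440),
    ("PseKNC", 2, 2, 1566),
    ("PseKNC", 2, 2, 2808),
    ("PseKNC", 1, 5, 729),
]


def dimension_check(rows):
    """Return (quotient, remainder) of each listed dimension by the per-fragment length."""
    return [divmod(dim, per_fragment_dim(feature, k, lam))
            for feature, k, lam, dim in rows]


for (feature, k, lam, dim), (q, r) in zip(FIRST_LAYER_ROWS,
                                          dimension_check(FIRST_LAYER_ROWS)):
    print(f"{feature:7s} k={k} lambda={lam} dim={dim:6d} quotient={q} remainder={r}")
```

Every listed dimension divides evenly by its per-fragment length, which is consistent with features being extracted per window fragment and then concatenated.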
The 10 Key Classifiers for the Second-Layer Prediction
| Key Classifier | Feature Vector | Dimension | Parameters |
|---|---|---|---|
| | kmer | 1,584 | k = 2 |
| | kmer | 2,688 | k = 2 |
| | PseKNC | 11,880 | k = 3, λ = 2, w = 0.5 |
| | PseKNC | 1,008 | k = 1, λ = 2, w = 0.5 |
| | PseKNC | 3,528 | k = 2, λ = 5, w = 0.5 |
| | PseKNC | 1,566 | k = 2, λ = 2, w = 0.5 |
| | PseKNC | 2,808 | k = 2, λ = 2, w = 0.5 |
| | PseKNC | 729 | k = 1, λ = 5, w = 0.5 |
| | PseKNC | 1,296 | k = 1, λ = 5, w = 0.5 |