Abstract
Enhancers are cis-acting elements that play an important role in regulating gene expression by enhancing it. Recent studies of modifications have revealed that enhancers form a large group of functional elements with many subgroups, which differ in their biological activities and in their regulatory effects on target genes. Several computational methods have been proposed to distinguish enhancers from other regulatory elements, but only one method has attempted to cluster them into subgroups. In this study, we developed a predictor, EnhancerPred, to distinguish enhancers from nonenhancers and to determine enhancer strength. A two-step wrapper-based feature selection method was applied to the high-dimensional feature vector derived from bi-profile Bayes and pseudo-nucleotide composition. The combination of 104 features from bi-profile Bayes (BPB), 1 feature from nucleotide composition (NC) and 9 features from pseudo-nucleotide composition (PseNC) yielded the best performance for distinguishing enhancers from nonenhancers, with an overall accuracy (Acc) of 77.39%. The combination of 89 features from BPB and 10 features from PseNC yielded the best performance for distinguishing strong enhancers from weak enhancers, with an overall Acc of 68.19%. The feature optimization process showed that a separate model is needed to distinguish strong enhancers from weak enhancers.
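As a rough illustration of the bi-profile Bayes encoding named in the abstract, the sketch below builds position-specific nucleotide frequency profiles from a positive and a negative training set, then encodes a sequence as its per-position frequencies under both profiles. The function names, the plain frequency estimate and the toy sequences are assumptions made for illustration; the abstract does not specify these details.

```python
from collections import Counter

ALPHABET = "ACGT"

def position_profile(seqs):
    """Per-position nucleotide frequencies for equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        # Counter returns 0 for nucleotides never seen at this position
        profile.append({nt: counts[nt] / len(seqs) for nt in ALPHABET})
    return profile

def bpb_encode(seq, pos_profile, neg_profile):
    """Frequencies of seq's nucleotides under the positive profile,
    followed by the same under the negative profile (2 * len(seq) values)."""
    pos_part = [pos_profile[i][nt] for i, nt in enumerate(seq)]
    neg_part = [neg_profile[i][nt] for i, nt in enumerate(seq)]
    return pos_part + neg_part

# Toy sequences (hypothetical, not taken from the paper's dataset)
positives = ["ACGT", "ACGA", "ACTT"]
negatives = ["TTGA", "TCGA", "GTGA"]
pos_prof = position_profile(positives)
neg_prof = position_profile(negatives)
features = bpb_encode("ACGA", pos_prof, neg_prof)
```

Under this encoding a sequence of length n yields 2n features, which is why a later selection step to reduce the dimension is plausible.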
Year: 2016 PMID: 27941893 PMCID: PMC5150536 DOI: 10.1038/srep38741
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The prediction performance at different thresholds of F-score for layer I.
Figure 2. The prediction performance on different dimensions of BPB feature vector for layer I.
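The F-score thresholding referred to in the Figure 1 caption can be sketched as follows. This assumes the commonly used two-class F-score (squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances); the record does not state the formula, and the function names and toy values are hypothetical. It covers only the score-and-threshold filter, not the wrapper step.

```python
def f_score(pos_values, neg_values):
    """Two-class F-score of a single feature (assumed definition)."""
    n_pos, n_neg = len(pos_values), len(neg_values)
    mean_pos = sum(pos_values) / n_pos
    mean_neg = sum(neg_values) / n_neg
    mean_all = (sum(pos_values) + sum(neg_values)) / (n_pos + n_neg)
    # Sample variances (n - 1 in the denominator)
    var_pos = sum((x - mean_pos) ** 2 for x in pos_values) / (n_pos - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg_values) / (n_neg - 1)
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    denominator = var_pos + var_neg
    if denominator == 0:
        # No within-class spread: perfectly separating or completely flat
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator

def select_by_threshold(pos_rows, neg_rows, threshold):
    """Return indices of features whose F-score is at least `threshold`."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([r[j] for r in pos_rows], [r[j] for r in neg_rows])
        for j in range(n_features)
    ]
    kept = [j for j, s in enumerate(scores) if s >= threshold]
    return kept, scores

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not
pos_rows = [[1.0, 0.5], [0.9, 0.4], [1.1, 0.6]]
neg_rows = [[0.0, 0.5], [0.1, 0.6], [-0.1, 0.4]]
kept, scores = select_by_threshold(pos_rows, neg_rows, threshold=1.0)
```

Sweeping `threshold` and re-evaluating the classifier on the kept features at each value would give a curve of the kind the Figure 1 caption describes.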
The best performance of EnhancerPred in jackknife test.
| Layer | Features | Sn(%) | Sp(%) | Acc(%) | MCC |
|---|---|---|---|---|---|
| I | BPB(104) | 70.96 | 83.02 | 76.99 | 0.54 |
| I | BPB(104) + NC(1) | 71.02 | 83.02 | 77.02 | 0.54 |
| I | BPB(104) + NC(1) + PseNC(9) | 71.97 | 82.82 | 77.39 | 0.55 |
| II | BPB(89) | 69.41 | 65.23 | 67.32 | 0.35 |
| II | BPB(89) + PseNC(10) | 71.16 | 65.23 | 68.19 | 0.36 |
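The four columns in the table above (Sn, Sp, Acc, MCC) can be computed from a confusion matrix. The sketch below assumes the standard definitions of sensitivity, specificity, accuracy and Matthews correlation coefficient; the counts in the example are made up and are not the paper's.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sn = tp / (tp + fn)                      # true positive rate
    sp = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty;
    # returning 0.0 in that case is a convention chosen for this sketch
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}

# Hypothetical counts for illustration only
metrics = binary_metrics(tp=70, fn=30, tn=80, fp=20)
```

Multiplying Sn, Sp and Acc by 100 gives percentages in the form used in the table.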
Figure 3. F-score values of NC in both layer I and layer II.
Comparison of different classifiers for identifying enhancers and their strength.
| Layer | Classifier | Sn(%) | Sp(%) | Acc(%) | MCC |
|---|---|---|---|---|---|
| I | KNN(23) | 59.43 | 89.82 | 74.63 | 0.52 |
| I | Naïve Bayes | 75.27 | 76.42 | 75.84 | 0.52 |
| I | Random Forest | 73.25 | 76.75 | 75.00 | 0.50 |
| I | Ensembles for Boosting | 73.99 | 75.07 | 74.53 | 0.49 |
| I | GBDT | 75.81 | 73.45 | 74.63 | 0.49 |
| I | libD3C | 66.44 | 63.41 | 64.93 | 0.30 |
| I | SVM | 71.97 | 82.82 | 77.39 | 0.55 |
| II | KNN(45) | 67.79 | 64.56 | 66.17 | 0.32 |
| II | Naïve Bayes | 74.93 | 58.76 | 66.85 | 0.34 |
| II | Random Forest | 66.85 | 59.16 | 63.01 | 0.26 |
| II | Ensembles for Boosting | 69.68 | 61.05 | 65.36 | 0.31 |
| II | GBDT | 60.51 | 68.19 | 64.35 | 0.29 |
| II | libD3C | 55.53 | 54.18 | 54.85 | 0.10 |
| II | SVM | 71.16 | 65.23 | 68.19 | 0.36 |
Results of the comparison of EnhancerPred with the predictor iEnhancer-2L on the jackknife test.
| Layer | Methods | Sn(%) | Sp(%) | Acc(%) | MCC |
|---|---|---|---|---|---|
| I | iEnhancer-2L | 78.09 | 75.88 | 76.89 | 0.54 |
| I | Our method | 71.97 | 82.82 | 77.39 | 0.55 |
| II | iEnhancer-2L | 62.21 | 61.82 | 61.93 | 0.24 |
| II | Our method | 71.16 | 65.23 | 68.19 | 0.36 |
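The jackknife test used for these comparisons is a leave-one-out procedure: each sample is held out in turn, the model is trained on the rest, and the held-out sample is predicted. The sketch below shows the loop with a small k-nearest-neighbour classifier standing in for the real model. The classifier choice, function names and toy data are assumptions for illustration and do not reproduce the paper's SVM setup.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def jackknife_accuracy(xs, ys, k=1):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)

# Toy, well-separated data (hypothetical)
xs = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
ys = [0, 0, 0, 1, 1, 1]
acc = jackknife_accuracy(xs, ys, k=1)
```

Because every sample is used for testing exactly once, the result does not depend on a random split, which makes jackknife figures from different predictors directly comparable when they use the same dataset.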