Rianon Zaman1, Shahana Yasmin Chowdhury1, Mahmood A Rashid2,3, Alok Sharma3,4,5, Abdollah Dehzangi6, Swakkhar Shatabda1.
Abstract
DNA-binding proteins often play important roles in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to predict DNA-binding proteins. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM-profile-based features to the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as the classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.
Year: 2017 PMID: 29270430 PMCID: PMC5706079 DOI: 10.1155/2017/4590609
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: System diagram of HMMBinder.
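The monogram and bigram features named in the abstract can be illustrated with a small sketch. It assumes one common definition (column means for the monogram, averaged products of consecutive rows for the bigram, both normalised by sequence length); the exact normalisation used by HMMBinder is not given in this record, and the function names and the toy profile are hypothetical.

```python
from typing import List

# A profile is a list of L rows (one per residue); a real HMM or PSSM
# profile has 20 columns (one per amino acid type).
Profile = List[List[float]]


def monogram(profile: Profile) -> List[float]:
    """Column-wise mean of the profile: one feature per column."""
    length = len(profile)
    width = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(width)]


def bigram(profile: Profile) -> List[float]:
    """Flattened width x width matrix. Entry (m, n) sums, over consecutive
    residue pairs (i, i + 1), the product profile[i][m] * profile[i + 1][n],
    then divides by the sequence length (assumed normalisation)."""
    length = len(profile)
    width = len(profile[0])
    features = []
    for m in range(width):
        for n in range(width):
            total = sum(
                profile[i][m] * profile[i + 1][n] for i in range(length - 1)
            )
            features.append(total / length)
    return features


def mono_plus_bi(profile: Profile) -> List[float]:
    """Concatenate monogram and bigram features into one vector."""
    return monogram(profile) + bigram(profile)


# Toy profile: 3 residues, 2 columns (kept tiny for readability).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
features = mono_plus_bi(toy)
```

With a 20-column profile this yields 20 monogram and 400 bigram features, so the combined vector has 420 entries regardless of sequence length.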
Comparison of performances of different features and SVM kernels on the benchmark dataset using 10-fold cross validation.
| Features | Accuracy | Sensitivity | Specificity | auPR | MCC | auROC |
|---|---|---|---|---|---|---|
| HMM-Monogram | 76.77% | | 0.6976 | 0.6931 | 0.5367 | 0.8358 |
| PSSM-Monogram | 74.74% | 0.6636 | 0.8362 | 0.8368 | 0.5040 | 0.8105 |
| HMM-Bigram | 70.59% | 0.7071 | 0.7049 | 0.7060 | 0.4095 | 0.7511 |
| PSSM-Bigram | 62.20% | 0.6454 | 0.5973 | 0.6025 | 0.2502 | 0.6703 |
| HMM (Mono + Bi) | | 0.8150 | | | | |
| PSSM (Mono + Bi) | 72.40% | 0.7364 | 0.7120 | 0.7136 | 0.4486 | 0.8028 |
| | | | | | | |
| HMM-Monogram | | | 0.7559 | 0.7535 | | |
| PSSM-Monogram | 73.71% | 0.6890 | 0.7880 | 0.7903 | 0.4771 | 0.8121 |
| HMM-Bigram | 76.68% | 0.7052 | | 0.8253 | 0.5283 | 0.8318 |
| PSSM-Bigram | 74.92% | 0.7490 | 0.7495 | 0.7516 | 0.4966 | 0.8166 |
| HMM (Mono + Bi) | 77.43% | 0.7129 | 0.8324 | | 0.5440 | 0.8496 |
| PSSM (Mono + Bi) | 72.40% | 0.7363 | 0.7120 | 0.7136 | 0.4486 | 0.8028 |
| | | | | | | |
| HMM-Monogram | | | | | | 0.8243 |
| PSSM-Monogram | 66.14% | 0.7290 | 0.5895 | 0.5862 | 0.3173 | 0.7332 |
| HMM-Bigram | 72.19% | 0.7553 | 0.6903 | 0.6880 | 0.4400 | |
| PSSM-Bigram | 71.00% | 0.7854 | 0.6300 | 0.6305 | 0.4174 | 0.7833 |
| HMM (Mono + Bi) | 74.43% | | | 0.6931 | | 0.8218 |
| PSSM (Mono + Bi) | 72.68% | 0.7909 | 0.6589 | 0.6645 | 0.4557 | 0.7698 |
| | | | | | | |
| HMM-Monogram | 73.31% | 0.7013 | 0.7632 | 0.7603 | 0.4579 | 0.8026 |
| PSSM-Monogram | 67.07% | 0.7654 | 0.5703 | 0.5737 | 0.3448 | 0.7157 |
| HMM-Bigram | 73.97% | 0.7360 | 0.7432 | 0.7396 | 0.4762 | 0.8063 |
| PSSM-Bigram | 70.53% | 0.7436 | 0.6647 | 0.6708 | 0.4116 | 0.7710 |
| HMM (Mono + Bi) | | | | | | |
| PSSM (Mono + Bi) | 70.07% | 0.7327 | 0.6666 | 0.6687 | 0.4005 | 0.7887 |
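The threshold-based columns in the table above (accuracy, sensitivity, specificity, MCC) all follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true DNA-binding proteins that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding proteins correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=40, fp=20, tn=30, fn=10)
```

MCC uses all four counts, which is why it can rank methods differently from accuracy when sensitivity and specificity are unbalanced.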
Figure 2: Using monogram features. Receiver operating characteristic curves for (a) SVM linear kernel classifier using HMM-Monogram features, (b) SVM linear kernel classifier using PSSM-Monogram features, (c) SVM RBF kernel classifier using HMM-Monogram features, and (d) SVM RBF kernel classifier using PSSM-Monogram features.
Figure 3: Using bigram features. Receiver operating characteristic curves for (a) SVM linear kernel classifier using HMM-Bigram features, (b) SVM linear kernel classifier using PSSM-Bigram features, (c) SVM RBF kernel classifier using HMM-Bigram features, and (d) SVM RBF kernel classifier using PSSM-Bigram features.
Figure 4: Using (Mono + Bi)gram features. Receiver operating characteristic curves for (a) SVM linear kernel classifier using HMM-Mono + Bigram features, (b) SVM linear kernel classifier using PSSM-Mono + Bigram features, (c) SVM RBF kernel classifier using HMM-Mono + Bigram features, and (d) SVM RBF kernel classifier using PSSM-Mono + Bigram features.
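The auROC values that summarise curves like these can be computed without plotting, because the area under the ROC curve equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties counted as half). The sketch below uses that pairwise formulation; the function name and the example scores are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels use 1 for the positive class and 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores, for illustration only.
area = auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of examples; for large datasets a rank-based computation gives the same value more cheaply.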
Comparison of performance of the proposed method with other state-of-the-art predictors using jack-knife test on the benchmark dataset.
| Method | Accuracy | Sensitivity | Specificity | MCC | auROC |
|---|---|---|---|---|---|
| iDNAPro-PseAAC | 76.76% | 0.7562 | 0.7745 | 0.53 | 0.8392 |
| DNABinder (dimension 21) | 73.95% | 0.6857 | 0.7909 | 0.48 | 0.8140 |
| DNABinder (dimension 400) | 73.58% | 0.6647 | 0.8036 | 0.47 | 0.8150 |
| DNA-Prot | 72.55% | 0.8267 | 0.5976 | 0.44 | 0.7890 |
| iDNA-Prot | 75.40% | 0.8381 | 0.6473 | 0.50 | 0.7610 |
| iDNA-Prot\|dis | 77.30% | 0.7940 | 0.7527 | 0.54 | 0.8310 |
| PseDNA-Pro | 76.55% | 0.7961 | 0.7363 | 0.53 | — |
| Kmer1 + ACC | 75.23% | 0.7676 | 0.7376 | 0.50 | 0.8280 |
| Local-DPP | 79.20% | 0.8400 | 0.7450 | 0.59 | — |
| HMMBinder | | | | | |
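The jack-knife test used for the comparison above is leave-one-out evaluation: each protein is held out once, the model is trained on the rest, and the held-out prediction is scored. The sketch below shows only that loop. The nearest-centroid classifier is a stand-in so the example stays dependency-free; it is not the SVM used in HMMBinder, and all names and data are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Predictor = Callable[[Vector], int]
Trainer = Callable[[List[Vector], List[int]], Predictor]


def jackknife_accuracy(X: List[Vector], y: List[int], train: Trainer) -> float:
    """Leave-one-out accuracy: hold out each sample once and retrain."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = train(X_train, y_train)
        if predict(X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def train_nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Stand-in classifier (NOT the paper's SVM): assign the closest class mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


# Hypothetical, well-separated toy data.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
acc = jackknife_accuracy(X, y, train_nearest_centroid)
```

Any trainer with the same signature can be passed in, so swapping in an SVM wrapper would leave the evaluation loop unchanged.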
Comparison of performance of the proposed method with other state-of-the-art predictors on the independent dataset.
| Method | Accuracy | Sensitivity | Specificity | MCC | auROC |
|---|---|---|---|---|---|
| iDNAPro-PseAAC | 69.89% | 0.7741 | 0.6237 | 0.402 | 0.7754 |
| iDNA-Prot | 67.20% | 0.6770 | 0.6670 | 0.344 | — |
| DNA-Prot | 61.80% | 0.6990 | 0.5380 | 0.240 | — |
| DNABinder | 60.80% | 0.5700 | 0.6450 | 0.216 | 0.6070 |
| DNABIND | 67.70% | 0.6670 | 0.6880 | 0.355 | 0.6940 |
| DNA-Threader | 59.70% | 0.2370 | | 0.279 | — |
| DBPPred | 76.90% | 0.7960 | 0.7420 | 0.538 | 0.7910 |
| iDNA-Prot\|dis | 72.00% | 0.7950 | 0.6450 | 0.445 | |
| Kmer1 + ACC | 70.96% | 0.8279 | 0.5913 | 0.431 | 0.7520 |
| Local-DPP | | | 0.6560 | | — |
| HMMBinder | 69.02% | 0.6153 | 0.7634 | 0.394 | 0.6324 |