Yuran Jia1, Shan Huang2, Tianjiao Zhang1.
Abstract
A DNA-binding protein (DBP) is a protein with a special DNA-binding domain and is associated with many important molecular biological mechanisms. The rapid development of computational methods has made it possible to predict DBPs on a large scale; however, existing methods do not fully integrate DBP-related features, which leads to coarse prediction results. In this article, we develop a DNA-binding protein identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. The experimental results show a prediction accuracy of 81.22% on the independent test dataset PDB186, the highest among the methods compared.
Keywords: DNA-binding protein; feature extraction; multi-feature fusion; position specificity score matrix; random forest
Year: 2021 PMID: 34912382 PMCID: PMC8667860 DOI: 10.3389/fgene.2021.811158
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Framework of KK-DBP. Step A: Construction of Position Specificity Score Matrices for protein sequences. Step B: Extraction of three features (AADP-PSSM, PSSM-COMPOSITION, and RPSSM) as the initial feature set for a single sample. Step C: Feature ranking and selection using the MRMD algorithm. Step D: Identification of DBP using random forests.
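To make Step B more concrete, the following is a minimal sketch of one of the three features, PSSM-COMPOSITION. The record above does not give the formula, so the definition used here is an assumption based on how the feature is commonly described: for each of the 20 amino acid types, the PSSM rows at positions holding that residue are summed and divided by the sequence length, giving a 400-dimensional vector. The function name `pssm_composition`, the column order, and the plain list-of-lists PSSM format are illustrative choices, not the authors' implementation.

```python
# Assumed column order of a PSI-BLAST PSSM (20 standard amino acids).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_composition(sequence, pssm):
    """Return a 400-dimensional PSSM-composition vector (assumed definition).

    sequence: protein sequence as a string of one-letter codes.
    pssm: one row of 20 scores per residue, in AMINO_ACIDS column order.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    if not sequence:
        raise ValueError("sequence must not be empty")

    length = len(sequence)
    # One 20-dimensional accumulator per amino acid type.
    sums = {aa: [0.0] * 20 for aa in AMINO_ACIDS}

    for residue, row in zip(sequence, pssm):
        if residue not in sums:
            continue  # skip non-standard residues such as 'X'
        for j, score in enumerate(row):
            sums[residue][j] += score

    # Flatten to 20 x 20 = 400 features, normalised by sequence length.
    return [value / length for aa in AMINO_ACIDS for value in sums[aa]]


# Toy input: three residues with constant PSSM rows (not real data).
toy_features = pssm_composition("AAR", [[1] * 20, [3] * 20, [6] * 20])
```

In a full pipeline, vectors like this would be concatenated with the AADP-PSSM and RPSSM features before feature selection (Step C) and classification (Step D); those steps are not sketched here.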
Benchmark datasets used in this paper.
| Data set | PDB1075 | PDB186 |
| Positive | 525 | 93 |
| Negative | 550 | 93 |
| Total | 1075 | 186 |
FIGURE 2. ROC curves with different combinations of features on PDB1075.
FIGURE 3. Prediction accuracy curve of feature subset.
FIGURE 4. Performance of training set PDB1075 on different classifiers.
Performance of this method and other existing methods on PDB186.
| Methods | ACC (%) | MCC | SN (%) | SP (%) |
| iDNA-Prot\|dis | 72.0 | 0.445 | 79.5 | 64.5 |
| DBPPred | 76.9 | 0.538 | 79.6 | 74.2 |
| iDNA-Prot | 67.2 | 0.344 | 67.7 | 66.7 |
| DNA-Prot | 61.8 | 0.240 | 69.9 | 53.8 |
| DNAbinder | 60.8 | 0.216 | 57.0 | 64.5 |
| iDNAPro-PseAAC | 71.5 | 0.442 | 82.8 | 60.2 |
| Kmer1+ACC | 71.0 | 0.431 | 82.8 | 59.1 |
| Local-DPP | 79.0 | 0.625 | 92.5 | 65.6 |
| SVM-based method | 75.3 | 0.560 | 96.8 | 53.8 |
| KK-DBP | 81.2 | 0.661 | 97.8 | 64.5 |
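The four columns in the table above (ACC, MCC, SN, SP) are standard binary classification metrics. The sketch below shows how they are computed from confusion-matrix counts. The counts used for the KK-DBP row (TP = 91, FN = 2, TN = 60, FP = 33) are not stated in the source; they are back-calculated here from the reported SN and SP on the 93 positive and 93 negative samples of PDB186, so treat them as an assumption. The function name `binary_metrics` is illustrative.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute SN, SP, ACC, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Assumed counts for the KK-DBP row, inferred from SN and SP on PDB186.
kk_dbp = binary_metrics(tp=91, fn=2, tn=60, fp=33)
for name, value in kk_dbp.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the values round to the SN, SP, ACC, and MCC given in the KK-DBP row. Because PDB186 is balanced, ACC here is also the mean of SN and SP.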