Manish Kumar, Michael M Gromiha, Gajendra P S Raghava.
Abstract
BACKGROUND: Identification of DNA-binding proteins is one of the major challenges in genome annotation, as these proteins play a crucial role in gene regulation. In this paper, we developed various SVM modules for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins.
Year: 2007 PMID: 18042272 PMCID: PMC2216048 DOI: 10.1186/1471-2105-8-463
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The performance of SVM models developed using different types of compositions. These models were trained and tested on DNAset, a dataset of DNA-binding and non-binding protein domains/chains.
| Threshold | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| 0.3 | 78.11 | 80.80 | 79.80 | 0.58 |
| 0.0 | 73.35 | 77.60 | 76.01 | 0.50 |
| 0.0 | 77.45 | 77.60 | 77.53 | 0.54 |
| 0.1 | 86.32 | 86.80 | 86.62 | 0.72 |
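The columns of the table above are linked by simple arithmetic. The following is a minimal sketch, assuming the DNAset class sizes given in the Figure 2 caption (146 DNA-binding, 250 non-binding proteins) and treating the reported percentages as pooled rates. The function name `metrics_from_rates` is hypothetical and does not come from the paper. It recovers accuracy and the Matthews correlation coefficient (MCC) from sensitivity and specificity.

```python
import math


def metrics_from_rates(sn_pct, sp_pct, n_pos, n_neg):
    """Return (accuracy in %, MCC) implied by sensitivity and specificity."""
    tp = sn_pct / 100.0 * n_pos   # expected true positives
    fn = n_pos - tp               # expected false negatives
    tn = sp_pct / 100.0 * n_neg   # expected true negatives
    fp = n_neg - tn               # expected false positives
    acc = 100.0 * (tp + tn) / (n_pos + n_neg)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Last row of the table: Sn = 86.32, Sp = 86.80 on 146 + 250 proteins.
acc, mcc = metrics_from_rates(86.32, 86.80, n_pos=146, n_neg=250)
print(f"accuracy = {acc:.2f}%, MCC = {mcc:.2f}")
```

For the last row this gives an accuracy of 86.62% and an MCC of 0.72, which agrees with the tabulated values.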
Figure 1. Percentage composition of DNA-binding and non-binding proteins in the main dataset (DNAset).
Figure 2. Performance of SVM models on the DNAset dataset (146 DNA-binding and 250 non-binding proteins), shown as a ROC plot.
The performance of SVM models developed on DNAaset and evaluated using five-fold cross-validation.
| Threshold | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| 0.2 | 72.51 | 72.33 | 72.42 | 0.45 |
| 0.1 | 72.59 | 70.59 | 71.59 | 0.43 |
| 0.0 | 70.85 | 70.24 | 70.55 | 0.41 |
| -0.3 | 73.53 | 74.92 | 74.22 | 0.49 |
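Five-fold cross-validation splits the data into five parts, and each part serves once as the test set while the other four are used for training. The following is a minimal sketch of such a split. The helper `k_fold_indices`, the seed, and the sample count of 20 are illustrative assumptions, not the paper's actual partitioning procedure or dataset size.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k near-equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Illustrative run on 20 samples: every sample is tested exactly once.
splits = list(k_fold_indices(20, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Performance figures are then typically averaged over the five test folds.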
The performance of PSSM-based SVM models on proteins with and without PSI-BLAST hits at an e-value of 0.1 against DNAset. These models were trained and tested on DNAset.
| Threshold | Sn (%) | Sp (%) | Sn (%) | Sp (%) |
|---|---|---|---|---|
| -1 | 95.83 | 48.21 | 97.54 | 42.78 |
| -0.9 | 95.83 | 53.57 | 96.72 | 44.33 |
| -0.8 | 95.83 | 53.57 | 96.72 | 45.88 |
| -0.7 | 95.83 | 57.14 | 95.90 | 48.97 |
| -0.6 | 91.67 | 62.50 | 95.90 | 51.55 |
| -0.5 | 91.67 | 64.29 | 95.08 | 54.12 |
| -0.4 | 91.67 | 66.07 | 93.44 | 57.73 |
| -0.3 | 87.50 | 66.07 | 92.62 | 60.31 |
| -0.2 | 83.33 | 69.64 | 91.80 | 63.40 |
| -0.1 | 83.33 | 73.21 | 89.34 | 66.49 |
| 0 | 83.33 | 76.79 | 86.89 | 69.07 |
| 0.1 | 83.33 | 82.14 | 83.61 | 71.65 |
| 0.2 | 83.33 | 83.93 | 80.33 | 75.77 |
| 0.4 | 75.00 | 85.71 | 73.77 | 81.44 |
| 0.5 | 75.00 | 85.71 | 69.67 | 82.99 |
| 0.6 | 75.00 | 89.29 | 64.75 | 85.05 |
| 0.7 | 75.00 | 92.86 | 60.66 | 88.66 |
| 0.8 | 62.50 | 94.64 | 56.56 | 89.69 |
| 0.9 | 62.50 | 96.43 | 50.00 | 91.75 |
| 1 | 54.17 | 96.43 | 45.08 | 93.30 |
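The table above shows the usual trade-off: raising the decision threshold lowers sensitivity and raises specificity. The following is a minimal sketch of such a threshold sweep. The scores and labels are made-up toy values, not data from the paper, and `sweep_thresholds` is a hypothetical helper name.

```python
def sweep_thresholds(scores, labels, thresholds):
    """Return (threshold, sensitivity %, specificity %) for each threshold.

    A sample is predicted positive when its score is above the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rows = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        rows.append((t, 100.0 * tp / n_pos, 100.0 * tn / n_neg))
    return rows


# Toy decision values: 1 = DNA-binding, 0 = non-binding (illustrative only).
scores = [1.4, 0.9, 0.3, -0.2, 0.6, -0.1, -0.7, -1.3]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
thresholds = [round(-1 + 0.5 * i, 1) for i in range(5)]  # -1.0 ... 1.0

rows = sweep_thresholds(scores, labels, thresholds)
for t, sn, sp in rows:
    print(f"{t:+.1f}  Sn={sn:.2f}  Sp={sp:.2f}")
```

The threshold to report is then chosen according to which error type matters more for the application.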
The performance of SVM models using amino acid and PSSM profiles on a realistic dataset (DNArset).
| Threshold | Sn (amino acid) | Sp (amino acid) | Acc (amino acid) | MCC (amino acid) | Sn (PSSM) | Sp (PSSM) | Acc (PSSM) | MCC (PSSM) |
|---|---|---|---|---|---|---|---|---|
| -1.00 | 80.21 | 63.67 | 65.13 | 0.26 | 91.75 | 77.47 | 78.73 | 0.44 |
| -0.90 | 77.47 | 67.00 | 67.92 | 0.26 | 89.70 | 78.60 | 79.58 | 0.44 |
| -0.80 | 76.78 | 70.67 | 71.20 | 0.29 | 87.66 | 80.20 | 80.86 | 0.45 |
| -0.70 | 74.74 | 74.40 | 74.43 | 0.31 | 85.61 | 81.60 | 81.95 | 0.45 |
| -0.60 | 74.05 | 77.07 | 76.79 | 0.33 | 84.23 | 83.20 | 83.29 | 0.46 |
| -0.50 | 71.98 | 79.27 | 78.62 | 0.34 | 78.73 | 84.20 | 83.72 | 0.44 |
| -0.40 | 69.91 | 81.13 | 80.13 | 0.34 | 78.05 | 85.67 | 84.99 | 0.46 |
| -0.30 | 69.22 | 83.60 | 82.32 | 0.37 | 74.60 | 87.27 | 86.15 | 0.46 |
| -0.20 | 65.79 | 85.73 | 83.96 | 0.38 | 73.22 | 89.07 | 87.66 | 0.48 |
| -0.10 | 60.99 | 87.40 | 85.06 | 0.37 | 71.15 | 91.00 | 89.24 | 0.51 |
| 0.00 | 58.23 | 89.13 | 86.39 | 0.38 | 70.46 | 92.27 | 90.34 | 0.53 |
| 0.10 | 53.40 | 90.40 | 87.12 | 0.37 | 68.41 | 93.60 | 91.37 | 0.55 |
| 0.20 | 50.02 | 92.20 | 88.46 | 0.38 | 65.68 | 94.60 | 92.04 | 0.56 |
| 0.40 | 44.53 | 93.93 | 89.55 | 0.38 | 60.16 | 95.87 | 92.71 | 0.56 |
| 0.50 | 41.79 | 94.60 | 89.91 | 0.38 | 53.31 | 96.60 | 92.77 | 0.53 |
| 0.60 | 37.70 | 95.13 | 90.04 | 0.35 | 51.29 | 97.07 | 93.01 | 0.54 |
| 0.70 | 32.90 | 95.73 | 90.16 | 0.33 | 47.86 | 97.53 | 93.13 | 0.52 |
| 0.80 | 30.83 | 96.40 | 90.58 | 0.33 | 42.39 | 97.87 | 92.95 | 0.49 |
| 0.90 | 30.16 | 97.00 | 91.07 | 0.35 | 38.30 | 98.40 | 93.07 | 0.48 |
| 1.00 | 28.78 | 97.33 | 91.25 | 0.35 | 34.87 | 98.67 | 93.01 | 0.47 |
* Sn: Sensitivity; Sp: Specificity; Acc: Accuracy
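Because accuracy is a weighted mean of sensitivity and specificity, Acc = f·Sn + (1 − f)·Sp with f the fraction of positive examples, the class balance of a dataset can be estimated from any one row of the table. The following is a minimal sketch of that calculation using the first row (threshold -1.00). The function name `implied_positive_fraction` is hypothetical, and the result is only as precise as the rounding of the published values allows.

```python
def implied_positive_fraction(sn_pct, sp_pct, acc_pct):
    """Solve Acc = f*Sn + (1 - f)*Sp for f, the fraction of positives."""
    if sn_pct == sp_pct:
        raise ValueError("fraction is undetermined when Sn equals Sp")
    return (acc_pct - sp_pct) / (sn_pct - sp_pct)


# First row of the table, first and second group of columns.
f_first = implied_positive_fraction(80.21, 63.67, 65.13)
f_second = implied_positive_fraction(91.75, 77.47, 78.73)
print(f"{f_first:.3f} {f_second:.3f}")
```

Both column groups imply that roughly 9% of the proteins in this dataset are DNA-binding, which explains why accuracy tracks specificity so closely and why MCC is the more informative column here.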
Figure 3. Schematic representation of the algorithm used to convert the 21*N dimensional PSSM into PSSM-400.
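Figure 3 itself is not reproduced here, so the following is only a hedged sketch of one common way to collapse a per-residue PSSM into a fixed-length vector of 400 values: for each of the 20 residue types, sum the PSSM rows at the positions where that residue occurs. It assumes each PSSM row holds 20 scores in the order of `AMINO_ACIDS`. How the 21st dimension mentioned in the caption is handled, and whether scores are rescaled first, is not specified here and is left out. `pssm_400` and `AMINO_ACIDS` are hypothetical names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the PSSM


def pssm_400(sequence, pssm):
    """Collapse an N x 20 PSSM into a 400-value vector.

    Row i of the intermediate 20 x 20 matrix is the sum of all PSSM rows
    at positions where the sequence has amino acid i.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        i = index.get(residue)
        if i is None:            # skip non-standard residues (assumption)
            continue
        for j in range(20):
            matrix[i][j] += row[j]
    return [value for row in matrix for value in row]


# Toy input: three residues with constant score rows (illustrative only).
toy_sequence = "ACA"
toy_pssm = [[1.0] * 20, [2.0] * 20, [3.0] * 20]
vector = pssm_400(toy_sequence, toy_pssm)
print(len(vector))
```

The resulting vector has the same length for every protein, which is what a fixed-input classifier such as an SVM requires.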