Dong-Jun Yu, Jun Hu, Hui Yan, Xi-Bei Yang, Jing-Yu Yang, Hong-Bin Shen.
Abstract
BACKGROUND: Vitamins are typical ligands that play critical roles in various metabolic processes. Accurately identifying vitamin-binding residues from the protein sequence alone is important for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated.
Year: 2014 PMID: 25189131 PMCID: PMC4261549 DOI: 10.1186/1471-2105-15-297
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Compositions of the training datasets and the corresponding independent validation datasets for the 4 types of vitamin-interacting benchmark datasets
| Dataset | Training: No. of Sequences | Training: (numP, numN)* | Independent Validation: No. of Sequences | Independent Validation: (numP, numN)* | Total No. of Sequences |
|---|---|---|---|---|---|
| DVI | 187 | (3016, 62122) | 46 | (654, 11676) | 233 |
| DVAI | 31 | (538, 7376) | 15 | (181, 1441) | 46 |
| DVBI | 141 | (2219, 50179) | 27 | (419, 8947) | 168 |
| DPLPI | 71 | (1092, 26638) | 16 | (246, 5935) | 87 |
*numP and numN represent the numbers of positive (binding) and negative (non-binding) samples, respectively.
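The counts in this table imply a strong class imbalance between binding and non-binding residues. The following is a minimal sketch that copies the table's values into a Python dictionary, checks that the sequence totals add up, and computes the negative-to-positive ratio of each training set. The names `DATASETS` and `imbalance_ratio` are illustrative and do not come from the paper.

```python
# Values copied from the dataset-composition table.
# Layout per entry (an illustrative convention, not from the paper):
# (train_seqs, (train_numP, train_numN), valid_seqs, (valid_numP, valid_numN), total_seqs)
DATASETS = {
    "DVI":   (187, (3016, 62122), 46, (654, 11676), 233),
    "DVAI":  (31,  (538, 7376),   15, (181, 1441),  46),
    "DVBI":  (141, (2219, 50179), 27, (419, 8947),  168),
    "DPLPI": (71,  (1092, 26638), 16, (246, 5935),  87),
}


def imbalance_ratio(num_pos, num_neg):
    """Number of non-binding residues per binding residue."""
    return num_neg / num_pos


for name, (n_train, (num_p, num_n), n_valid, _, total) in DATASETS.items():
    # Training and validation sequences should sum to the reported total.
    assert n_train + n_valid == total, name
    ratio = imbalance_ratio(num_p, num_n)
    print(f"{name}: {ratio:.1f} non-binding residues per binding residue (training set)")
```

A ratio this skewed is one reason accuracy alone is a weak summary of performance on these datasets.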
Figure 1. Workflow of the proposed TargetVita. Predicted binding residues, the modelled 3D structure, and the vitamin are highlighted in red, green, and yellow, respectively. Blue arrows denote the workflow of the training stage, and black arrows denote the workflow of the prediction stage.
Performance comparisons between residue- and sequence-level five-fold cross-validations on DVI, DVAI, DVBI, and DPLPI under
| Dataset | Method | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | VitaPred* | 78.52 | 78.61 | 78.60 | 0.37 | 0.87 | - | - | - | - |
| | SVM-R◇ | 77.88 | 81.34 | 81.18 | 0.30 | 0.87 | 2349 | 50530 | 11592 | 667 |
| | SVM-S△ | 77.65 | 80.16 | 80.04 | 0.29 | 0.87 | 2342 | 49797 | 12325 | 674 |
| DVAI | VitaPred* | 72.70 | 76.89 | 76.51 | 0.32 | 0.83 | - | - | - | - |
| | SVM-R◇ | 73.98 | 77.94 | 77.67 | 0.30 | 0.85 | 398 | 5749 | 1627 | 140 |
| | SVM-S△ | 72.12 | 76.34 | 76.06 | 0.28 | 0.82 | 388 | 5631 | 1745 | 150 |
| DVBI | VitaPred* | 83.33 | 80.51 | 80.77 | 0.42 | 0.90 | - | - | - | - |
| | SVM-R◇ | 80.44 | 83.83 | 83.68 | 0.33 | 0.90 | 1785 | 42063 | 8116 | 434 |
| | SVM-S△ | 79.86 | 82.90 | 82.77 | 0.32 | 0.89 | 1772 | 41598 | 8581 | 447 |
| DPLPI | VitaPred* | 90.20 | 92.61 | 92.40 | 0.67 | 0.97 | - | - | - | - |
| | SVM-R◇ | 91.48 | 93.38 | 93.30 | 0.55 | 0.97 | 999 | 24874 | 1764 | 93 |
| | SVM-S△ | 90.38 | 92.62 | 92.53 | 0.52 | 0.96 | 987 | 24672 | 1966 | 105 |
*Data obtained from [30].
◇SVM-R: The re-implementation of VitaPred over residue-level cross-validation.
△SVM-S: The re-implementation of VitaPred over sequence-level cross-validation.
Performance comparisons between residue- and sequence-level five-fold cross-validations on DVI, DVAI, DVBI, and DPLPI under
| Dataset | Method | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | VitaPred* | 52.19 | 96.79 | 92.73 | 0.53 | 0.87 | - | - | - | - |
| | SVM-R◇ | 52.62 | 98.29 | 96.18 | 0.54 | 0.87 | 1586 | 61063 | 1059 | 1430 |
| | SVM-S△ | 52.29 | 98.32 | 96.19 | 0.54 | 0.87 | 1577 | 61076 | 1046 | 1439 |
| DVAI | VitaPred* | 42.75 | 97.51 | 92.54 | 0.48 | 0.83 | - | - | - | - |
| | SVM-R◇ | 43.49 | 96.39 | 92.80 | 0.41 | 0.85 | 234 | 7110 | 266 | 304 |
| | SVM-S△ | 40.15 | 96.39 | 92.57 | 0.39 | 0.82 | 216 | 7109 | 267 | 322 |
| DVBI | VitaPred* | 55.57 | 98.04 | 94.18 | 0.61 | 0.90 | - | - | - | - |
| | SVM-R◇ | 58.77 | 98.45 | 96.77 | 0.59 | 0.90 | 1304 | 49401 | 778 | 915 |
| | SVM-S△ | 58.18 | 98.40 | 96.69 | 0.58 | 0.89 | 1291 | 49373 | 806 | 928 |
| DPLPI | VitaPred* | 79.76 | 98.62 | 96.91 | 0.81 | 0.97 | - | - | - | - |
| | SVM-R◇ | 79.67 | 99.19 | 98.42 | 0.79 | 0.97 | 870 | 26422 | 216 | 222 |
| | SVM-S△ | 80.86 | 99.07 | 98.36 | 0.79 | 0.96 | 883 | 26391 | 247 | 209 |
*Data excerpted from [30].
◇SVM-R: The re-implementation of VitaPred over residue-level cross-validation.
△SVM-S: The re-implementation of VitaPred over sequence-level cross-validation.
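The difference between residue-level and sequence-level cross-validation lies in what gets assigned to a fold. The sketch below is one plausible reading, not the authors' procedure, which this excerpt does not describe: residue-level folds are drawn over individual residues, so one protein can contribute to both training and test folds, while sequence-level folds keep every residue of a protein together. All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def residue_level_folds(residues, k=5, seed=0):
    """Shuffle individual residues into k folds; a protein may span several folds."""
    idx = list(range(len(residues)))
    random.Random(seed).shuffle(idx)
    return [[residues[i] for i in idx[f::k]] for f in range(k)]


def sequence_level_folds(residues, k=5, seed=0):
    """Assign whole proteins to folds, so all residues of a protein share one fold."""
    proteins = sorted({pid for pid, _ in residues})
    random.Random(seed).shuffle(proteins)
    fold_of = {pid: i % k for i, pid in enumerate(proteins)}
    folds = [[] for _ in range(k)]
    for pid, pos in residues:
        folds[fold_of[pid]].append((pid, pos))
    return folds


def proteins_shared_across_folds(folds):
    """Count proteins whose residues appear in more than one fold."""
    seen = defaultdict(set)
    for f, fold in enumerate(folds):
        for pid, _ in fold:
            seen[pid].add(f)
    return sum(1 for fold_ids in seen.values() if len(fold_ids) > 1)


# Toy data: 10 hypothetical proteins with 30 residues each, as (protein_id, position).
toy = [(f"P{p}", r) for p in range(10) for r in range(30)]
print("residue-level, proteins split across folds:",
      proteins_shared_across_folds(residue_level_folds(toy)))
print("sequence-level, proteins split across folds:",
      proteins_shared_across_folds(sequence_level_folds(toy)))
```

Keeping proteins intact prevents neighbouring residues of the same sequence from appearing on both sides of a split, which is consistent with the slightly lower SVM-S numbers relative to SVM-R in the tables above.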
Performance comparisons between + and features on DVI, DVAI, DVBI, and DPLPI datasets over five-fold sequence-level cross-validation under
| Dataset | Feature | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | | 77.65 | 80.16 | 80.04 | 0.29 | 0.87 | 2342 | 49797 | 12325 | 674 |
| | | 78.55 | 82.02 | 81.86 | 0.31 | 0.88 | 2369 | 50951 | 11171 | 647 |
| DVAI | | 72.12 | 76.34 | 76.06 | 0.28 | 0.82 | 388 | 5631 | 1745 | 150 |
| | | 72.12 | 78.28 | 77.86 | 0.29 | 0.84 | 388 | 5774 | 1602 | 150 |
| DVBI | | 79.86 | 82.90 | 82.77 | 0.32 | 0.89 | 1772 | 41598 | 8581 | 447 |
| | | 80.71 | 85.14 | 84.96 | 0.35 | 0.90 | 1791 | 42724 | 7455 | 428 |
| DPLPI | | 90.38 | 92.62 | 92.53 | 0.52 | 0.96 | 987 | 24672 | 1966 | 105 |
| | | 91.48 | 93.09 | 93.03 | 0.54 | 0.97 | 999 | 24798 | 1840 | 93 |
Performance comparisons between + and on DVI, DVAI, DVBI, and DPLPI datasets over five-fold sequence-level cross-validation under
| Dataset | Feature | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | | 52.29 | 98.32 | 96.19 | 0.54 | 0.87 | 1577 | 61076 | 1046 | 1439 |
| | | 53.22 | 98.32 | 96.23 | 0.55 | 0.88 | 1605 | 61080 | 1042 | 1411 |
| DVAI | | 40.15 | 96.39 | 92.57 | 0.39 | 0.82 | 216 | 7109 | 267 | 322 |
| | | 44.24 | 96.61 | 93.05 | 0.43 | 0.84 | 238 | 7126 | 250 | 300 |
| DVBI | | 58.18 | 98.40 | 96.69 | 0.58 | 0.89 | 1291 | 49373 | 806 | 928 |
| | | 59.58 | 98.33 | 96.69 | 0.59 | 0.90 | 1322 | 49342 | 837 | 897 |
| DPLPI | | 80.86 | 99.07 | 98.36 | 0.79 | 0.96 | 883 | 26391 | 247 | 209 |
| | | 81.32 | 99.12 | 98.42 | 0.79 | 0.97 | 888 | 26403 | 235 | 204 |
Figure 2. ROC curves of the predictions with and + + features, respectively, on the DVI dataset over sequence-level five-fold cross-validation.
Performance comparisons between with- and without-ensemble on DVI, DVAI, DVBI, and DPLPI datasets over five-fold sequence-level cross-validation under
| Dataset | Ensemble | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | No | 78.55 | 82.02 | 81.86 | 0.31 | 0.88 | 2369 | 50951 | 11171 | 647 |
| | Yes | 78.45 | 84.17 | 83.90 | 0.34 | 0.89 | 2366 | 52285 | 9837 | 650 |
| DVAI | No | 72.12 | 78.28 | 77.86 | 0.29 | 0.84 | 388 | 5774 | 1602 | 150 |
| | Yes | 72.68 | 79.89 | 79.40 | 0.31 | 0.85 | 391 | 5893 | 1483 | 147 |
| DVBI | No | 80.71 | 85.14 | 84.96 | 0.35 | 0.90 | 1791 | 42724 | 7455 | 428 |
| | Yes | 81.34 | 85.49 | 85.31 | 0.36 | 0.91 | 1805 | 42898 | 7281 | 414 |
| DPLPI | No | 91.48 | 93.09 | 93.03 | 0.54 | 0.97 | 999 | 24798 | 1840 | 93 |
| | Yes | 91.30 | 93.65 | 93.56 | 0.56 | 0.97 | 997 | 24947 | 1691 | 95 |
Figure 3. ROC curves of the predictions with and without the ensemble, respectively, on the DVI dataset over sequence-level five-fold cross-validation.
Performance comparisons with SVM-S on DVI, DVAI, DVBI, and DPLPI datasets over five-fold sequence-level cross-validation under
| Dataset | Method | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | SVM-S△ | 77.65 | 80.16 | 80.04 | 0.29 | 0.87 | 2342 | 49797 | 12325 | 674 |
| | TargetVita | 78.45 | 84.17 | 83.90 | 0.34 | 0.89 | 2366 | 52285 | 9837 | 650 |
| DVAI | SVM-S△ | 72.12 | 76.34 | 76.06 | 0.28 | 0.82 | 388 | 5631 | 1745 | 150 |
| | TargetVita | 72.68 | 79.89 | 79.40 | 0.31 | 0.85 | 391 | 5893 | 1483 | 147 |
| DVBI | SVM-S△ | 79.86 | 82.90 | 82.77 | 0.32 | 0.89 | 1772 | 41598 | 8581 | 447 |
| | TargetVita | 81.34 | 85.49 | 85.31 | 0.36 | 0.91 | 1805 | 42898 | 7281 | 414 |
| DPLPI | SVM-S△ | 90.38 | 92.62 | 92.53 | 0.52 | 0.96 | 987 | 24672 | 1966 | 105 |
| | TargetVita | 91.30 | 93.65 | 93.56 | 0.56 | 0.97 | 997 | 24947 | 1691 | 95 |
△SVM-S: The re-implementation of VitaPred over sequence-level cross-validation.
Performance comparisons with existing predictors on DVI, DVAI, DVBI, and DPLPI datasets over five-fold sequence-level cross-validation under
| Dataset | Method | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | SVM-S△ | 52.29 | 98.32 | 96.19 | 0.54 | 0.87 | 1577 | 61076 | 1046 | 1439 |
| | TargetVita | 51.06 | 98.59 | 96.39 | 0.55 | 0.89 | 1540 | 61244 | 878 | 1476 |
| DVAI | SVM-S△ | 40.15 | 96.39 | 92.57 | 0.39 | 0.82 | 216 | 7109 | 267 | 322 |
| | TargetVita | 44.43 | 96.81 | 93.25 | 0.44 | 0.85 | 239 | 7141 | 235 | 299 |
| DVBI | SVM-S△ | 58.18 | 98.40 | 96.69 | 0.58 | 0.89 | 1291 | 49373 | 806 | 928 |
| | TargetVita | 56.21 | 98.81 | 97.02 | 0.60 | 0.91 | 1248 | 49582 | 597 | 971 |
| DPLPI | SVM-S△ | 80.86 | 99.07 | 98.36 | 0.79 | 0.96 | 883 | 26391 | 247 | 209 |
| | TargetVita | 74.05 | 99.61 | 98.60 | 0.80 | 0.97 | 812 | 26534 | 104 | 280 |
△SVM-S: The re-implementation of VitaPred over sequence-level cross-validation.
Performance comparisons with existing predictors on the independent validation datasets under
| Dataset | Method | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | VitaPred* | 73.70 | 71.98 | 72.07 | 0.22 | - | - | - | - | - |
| | SVM-S△ | 75.38 | 78.51 | 78.35 | 0.28 | 0.85 | 493 | 9167 | 2509 | 161 |
| | TargetVita | 80.73 | 81.05 | 81.03 | 0.33 | 0.89 | 528 | 9463 | 2213 | 126 |
| DVAI | VitaPred* | 73.48 | 72.87 | 72.93 | 0.31 | - | - | - | - | - |
| | SVM-S△ | 73.48 | 79.25 | 78.61 | 0.38 | 0.83 | 133 | 1142 | 299 | 48 |
| | TargetVita | 79.01 | 79.18 | 79.16 | 0.41 | 0.86 | 143 | 1141 | 300 | 38 |
| DVBI | VitaPred* | 83.05 | 68.76 | 69.40 | 0.23 | - | - | - | - | - |
| | SVM-S△ | 78.28 | 81.49 | 81.35 | 0.30 | 0.88 | 328 | 7291 | 1656 | 91 |
| | TargetVita | 81.38 | 81.69 | 81.68 | 0.32 | 0.90 | 341 | 7309 | 1638 | 78 |
| DPLPI | VitaPred* | 84.15 | 83.22 | 83.26 | 0.33 | - | - | - | - | - |
| | SVM-S△ | 85.77 | 90.18 | 90.00 | 0.44 | 0.95 | 211 | 5352 | 583 | 35 |
| | TargetVita | 89.02 | 89.30 | 89.29 | 0.44 | 0.96 | 219 | 5300 | 635 | 27 |
*Data excerpted from [30].
△SVM-S: The re-implementation of VitaPred.
Performance comparisons with existing predictors on the independent validation datasets under
| Dataset | Method | Sen (%) | Spe (%) | Acc (%) | MCC | AUC | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|---|
| DVI | VitaPred* | 41.74 | 96.63 | 93.72 | 0.38 | - | - | - | - | - |
| | SVM-S△ | 47.09 | 98.40 | 95.68 | 0.52 | 0.85 | 308 | 11489 | 187 | 346 |
| | TargetVita | 47.01 | 98.42 | 95.69 | 0.52 | 0.89 | 308 | 11491 | 185 | 346 |
| DVAI | VitaPred* | 30.39 | 97.22 | 89.77 | 0.37 | - | - | - | - | - |
| | SVM-S△ | 32.04 | 97.09 | 89.83 | 0.38 | 0.83 | 58 | 1399 | 42 | 123 |
| | TargetVita | 38.12 | 96.81 | 90.26 | 0.43 | 0.86 | 69 | 1395 | 46 | 112 |
| DVBI | VitaPred* | 49.40 | 94.49 | 92.47 | 0.35 | - | - | - | - | - |
| | SVM-S△ | 52.03 | 98.25 | 96.18 | 0.53 | 0.88 | 218 | 8790 | 157 | 201 |
| | TargetVita | 51.06 | 98.69 | 96.56 | 0.55 | 0.90 | 214 | 8830 | 117 | 205 |
| DPLPI | VitaPred* | 65.85 | 98.40 | 97.10 | 0.63 | - | - | - | - | - |
| | SVM-S△ | 72.76 | 99.11 | 98.06 | 0.74 | 0.95 | 179 | 5882 | 53 | 67 |
| | TargetVita | 74.39 | 99.07 | 98.09 | 0.75 | 0.96 | 183 | 5880 | 55 | 63 |
*Data excerpted from [30].
△SVM-S: The re-implementation of VitaPred.
Figure 4. Differences between the values over the cross-validation test and over the independent validation test for VitaPred, SVM-S, and TargetVita on the four considered vitamins under .