Hua Zhang, Tao Jiang, Guogen Shan.
Abstract
Residue fluctuations in protein structures have been shown to be highly associated with various protein functions. The Gaussian network model (GNM), a simple and representative coarse-grained model, has been widely adopted to reveal function-related protein dynamics. We directly utilized the high frequency modes generated by GNM and then applied Gaussian Naive Bayes (GNB) to identify hot spot residues. Two coding schemes for the feature vectors were implemented, with varying distance cutoffs for GNM and sliding window sizes for GNB, and evaluated by tenfold cross validation: one using only a single high frequency mode and the other combining multiple modes with the highest frequencies. In terms of overall performance as measured by F1, our proposed methods outperformed previous work that did not directly utilize the high frequency modes generated by GNM. Moreover, we found that including more high frequency modes in a GNB classifier can significantly improve the sensitivity. The present study provides additional insights into the relation between hot spots and residue fluctuations.
Year: 2016 PMID: 27882325 PMCID: PMC5110947 DOI: 10.1155/2016/4354901
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
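The abstract refers to the high frequency modes generated by GNM without spelling out the computation. The following is a minimal sketch of the standard GNM construction, not the authors' code: it builds the Kirchhoff (connectivity) matrix from C-alpha coordinates at a given distance cutoff and returns the modes with the largest eigenvalues. The function name `gnm_high_modes`, the use of squared eigenvector components as per-residue mode shapes, and the toy coordinates are assumptions made for illustration.

```python
import numpy as np

def gnm_high_modes(coords, cutoff=7.3, n_modes=3):
    """Return the n_modes highest-frequency GNM modes (hypothetical helper).

    coords  : (n_res, 3) array of C-alpha coordinates in angstroms
    cutoff  : contact distance threshold in angstroms
    Returns (eigenvalues, shapes), where shapes[:, k] holds the squared
    eigenvector components of the (k+1)-th highest mode.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances between all residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Kirchhoff matrix: -1 for residue pairs in contact, 0 otherwise.
    kirchhoff = -(dist <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    # Diagonal holds the number of contacts of each residue.
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # eigh returns eigenvalues in ascending order; the largest eigenvalues
    # correspond to the highest-frequency modes.
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    order = np.argsort(eigvals)[::-1][:n_modes]
    return eigvals[order], eigvecs[:, order] ** 2

# Toy chain of 12 pseudo-residues (random walk), for illustration only.
rng = np.random.default_rng(0)
toy_coords = np.cumsum(rng.normal(size=(12, 3)), axis=0) * 2.0
values, shapes = gnm_high_modes(toy_coords, cutoff=7.3, n_modes=3)
```

Each column of `shapes` sums to 1 because the eigenvectors are normalized, so a residue with a large entry is one where that mode is strongly localized.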
Top 20 F1 measures from tenfold cross validation of Gaussian Naive Bayes when the single ith highest mode (i = 1, 2, …, 20) is used as the feature vector. Cutoff is the distance threshold for the GNM computation, varied from 6.0 to 8.0 Å in steps of 0.1 Å; sw is the size of the sliding window around the central residue, ranging from 1 to 21 in steps of 2.
| Top | Cutoff (Å) | i | sw | sen | spe | pre | acc | F1 |
|---|---|---|---|---|---|---|---|---|
| 1 | 7.3 | 8 | 3 | 0.1930 | 0.9436 | 0.1250 | 0.9136 | 0.1517 |
| 2 | 7.1 | 8 | 9 | 0.2515 | 0.9095 | 0.1039 | 0.8831 | 0.1470 |
| 3 | 7.1 | 8 | 7 | 0.2456 | 0.9119 | 0.1042 | 0.8852 | 0.1463 |
| 4 | 7.1 | 8 | 5 | 0.2164 | 0.9263 | 0.1091 | 0.8979 | 0.1451 |
| 5 | 7.1 | 8 | 3 | 0.1696 | 0.9473 | 0.1184 | 0.9162 | 0.1394 |
| 6 | 7.3 | 8 | 5 | 0.1871 | 0.9354 | 0.1077 | 0.9054 | 0.1368 |
| 7 | 8.0 | 3 | 5 | 0.1930 | 0.9310 | 0.1044 | 0.9014 | 0.1355 |
| 8 | 7.3 | 8 | 7 | 0.2164 | 0.9163 | 0.0974 | 0.8883 | 0.1343 |
| 9 | 7.1 | 8 | 13 | 0.2281 | 0.9090 | 0.0947 | 0.8817 | 0.1338 |
| 10 | 7.1 | 8 | 11 | 0.2281 | 0.9071 | 0.0929 | 0.8799 | 0.1320 |
| 11 | 7.0 | 19 | 17 | 0.2456 | 0.8963 | 0.0899 | 0.8703 | 0.1317 |
| 12 | 6.7 | 13 | 3 | 0.1345 | 0.9619 | 0.1285 | 0.9288 | 0.1314 |
| 13 | 7.8 | 3 | 3 | 0.1520 | 0.9507 | 0.1140 | 0.9187 | 0.1303 |
| 14 | 7.0 | 14 | 21 | 0.2339 | 0.901 | 0.0897 | 0.8742 | 0.1297 |
| 15 | 7.0 | 19 | 19 | 0.2456 | 0.8934 | 0.0877 | 0.8674 | 0.1292 |
| 16 | 7.1 | 8 | 15 | 0.2281 | 0.9039 | 0.0901 | 0.8768 | 0.1291 |
| 17 | 7.0 | 4 | 7 | 0.2281 | 0.9022 | 0.0886 | 0.8752 | 0.1277 |
| 18 | 6.6 | 6 | 3 | 0.1520 | 0.9480 | 0.1088 | 0.9162 | 0.1268 |
| 19 | 6.9 | 15 | 21 | 0.2222 | 0.9046 | 0.0886 | 0.8773 | 0.1267 |
| 20 | 7.2 | 14 | 13 | 0.2456 | 0.8897 | 0.0850 | 0.8639 | 0.1263 |
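The table above ranks models by the F1 measure, the harmonic mean of precision (pre) and sensitivity (sen). As a small worked check, the sketch below recomputes F1 from the reported sen and pre values of rows 2 to 5 and confirms agreement within rounding. The helper names `f1_score` and `confusion_metrics` are illustrative, and the metric formulas are the standard textbook definitions, assumed here to match those used in the study.

```python
def f1_score(sen, pre):
    """Harmonic mean of sensitivity (recall) and precision."""
    total = sen + pre
    return 2.0 * sen * pre / total if total > 0 else 0.0

def confusion_metrics(tp, fp, tn, fn):
    """Compute sen, spe, pre, acc and F1 from raw confusion-matrix counts."""
    sen = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    spe = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    pre = tp / (tp + fp) if tp + fp else 0.0   # positive predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)
    return {"sen": sen, "spe": spe, "pre": pre, "acc": acc,
            "F1": f1_score(sen, pre)}

# (sen, pre, reported F1) copied from rows 2-5 of the table above.
rows = [
    (0.2515, 0.1039, 0.1470),
    (0.2456, 0.1042, 0.1463),
    (0.2164, 0.1091, 0.1451),
    (0.1696, 0.1184, 0.1394),
]
for sen, pre, reported in rows:
    # Inputs are rounded to four decimals, so allow a small tolerance.
    assert abs(f1_score(sen, pre) - reported) < 5e-4
```

Because hot spots are a small minority of residues, specificity and accuracy stay high even when precision is low, which is why F1 rather than accuracy is used for the ranking.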
Figure 1: Plots of sensitivity (a), precision (b), and F1 values (c) for the single ith highest mode (i = 1, 2, …, 20) at three sliding window sizes (sw = 1, 3, 5) for GNB classifiers. The ith highest mode is denoted hmi in the figure.
Top 20 F1 measures from tenfold cross validation of Gaussian Naive Bayes when the m modes with the highest frequency (m = 1, 2, …, 20) are combined in the feature vector. The distance cutoff in GNM varies from 6.0 to 8.0 Å in steps of 0.1 Å, and the sliding window size (sw) for multiple high modes ranges from 1 to 21 in steps of 2.
| Top | Cutoff (Å) | m | sw | sen | spe | pre | acc | F1 |
|---|---|---|---|---|---|---|---|---|
| 1 | 7.4 | 10 | 1 | 0.2924 | 0.8992 | 0.1080 | 0.8749 | |
| 2 | 7.4 | 11 | 1 | 0.3041 | 0.8873 | 0.1012 | 0.8639 | 0.1518 |
| 3 | 7.4 | 13 | 1 | 0.3275 | 0.8736 | 0.0976 | 0.8518 | 0.1503 |
| 4 | 7.2 | 20 | 1 | 0.4269 | 0.8207 | 0.0903 | 0.8049 | 0.1491 |
| 5 | 7.4 | 12 | 1 | 0.3099 | 0.8809 | 0.0980 | 0.8581 | 0.1489 |
| 6 | 7.2 | 19 | 1 | 0.4152 | 0.8239 | 0.0895 | 0.8075 | 0.1473 |
| 7 | 7.3 | 20 | 1 | 0.4152 | 0.8229 | 0.0891 | 0.8066 | 0.1467 |
| 8 | 7.1 | 11 | 1 | 0.2924 | 0.8870 | 0.0975 | 0.8632 | 0.1462 |
| 9 | 7.5 | 15 | 1 | 0.3450 | 0.8592 | 0.0928 | 0.8386 | 0.1462 |
| 10 | 7.1 | 9 | 3 | 0.3977 | 0.8312 | 0.0895 | 0.8138 | 0.1461 |
| 11 | 7.3 | 10 | 1 | 0.2690 | 0.8992 | 0.1002 | 0.8740 | 0.1460 |
| 12 | 7.4 | 15 | 1 | 0.3450 | 0.8585 | 0.0923 | 0.8379 | 0.1457 |
| 13 | 7.1 | 13 | 1 | 0.3158 | 0.8727 | 0.0937 | 0.8504 | 0.1446 |
| 14 | 7.5 | 14 | 1 | 0.3275 | 0.8663 | 0.0927 | 0.8447 | 0.1445 |
| 15 | 7.5 | 16 | 1 | 0.3509 | 0.8529 | 0.0905 | 0.8328 | 0.1439 |
| 16 | 7.4 | 14 | 1 | 0.3275 | 0.8653 | 0.0921 | 0.8438 | 0.1438 |
| 17 | 7.6 | 15 | 1 | 0.3333 | 0.8622 | 0.0916 | 0.8410 | 0.1438 |
| 18 | 7.5 | 10 | 1 | 0.2632 | 0.9000 | 0.0989 | 0.8745 | 0.1438 |
| 19 | 7.3 | 9 | 1 | 0.2456 | 0.9090 | 0.1012 | 0.8824 | 0.1433 |
| 20 | 7.1 | 14 | 1 | 0.3275 | 0.8641 | 0.0914 | 0.8426 | 0.1429 |
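The parameters in the table above (number of modes m, sliding window size sw, and GNM cutoff) define both the shape of each residue's feature vector and the size of the search grid. The sketch below shows one plausible way to build the windowed feature matrix and to enumerate the grid. The function name `window_features`, the zero padding at chain termini, and the column ordering are assumptions for illustration; the source does not state how terminal residues are handled.

```python
import itertools
import numpy as np

def window_features(mode_shapes, sw):
    """Stack each residue's mode values with those of its window neighbours.

    mode_shapes : (n_res, m) array, one column per high-frequency mode
    sw          : odd sliding window size centred on the residue
    Returns an (n_res, m * sw) feature matrix.
    """
    if sw < 1 or sw % 2 == 0:
        raise ValueError("sw must be a positive odd integer")
    mode_shapes = np.asarray(mode_shapes, dtype=float)
    n_res = mode_shapes.shape[0]
    half = sw // 2
    # Assumption: positions beyond the chain termini are filled with zeros.
    padded = np.pad(mode_shapes, ((half, half), (0, 0)), mode="constant")
    # Block k holds the values of the residue at offset (k - half).
    return np.hstack([padded[k:k + n_res] for k in range(sw)])

# Parameter grid as described in the caption: cutoff 6.0-8.0 in steps of 0.1,
# sw from 1 to 21 in steps of 2, and m from 1 to 20.
cutoffs = [round(6.0 + 0.1 * k, 1) for k in range(21)]
window_sizes = list(range(1, 22, 2))
mode_counts = list(range(1, 21))
grid = list(itertools.product(cutoffs, window_sizes, mode_counts))

# Toy example: 5 residues, m = 2 modes, sw = 3.
toy = np.arange(10, dtype=float).reshape(5, 2)
features = window_features(toy, sw=3)
```

Each row of `features` could then be passed to a Gaussian naive Bayes classifier (for example scikit-learn's `GaussianNB`) inside a tenfold cross-validation loop, once per grid point.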
Figure 2: Plots of sensitivity, precision, and F1 values against the m modes with the highest frequency for GNB classifiers, in five cases defined by distance cutoffs of 7.1 Å (a), 7.2 Å (b), 7.3 Å (c), 7.4 Å (d), and 7.5 Å (e).
Performance comparison of the proposed models with the work by Ozbek et al. [12] and the simulated methods proposed by Haliloglu et al. [13] and Demirel et al. [14], where hm1–i means that a total of i high frequency modes (hm1, hm2,…, hmi) are used together.
| Reference | GNM modes | Cutoff | sw | sen | spe | pre | acc | F1 |
|---|---|---|---|---|---|---|---|---|
| Ozbek et al. [12] | hm1 | 6.5 Å | 1 | 0.14 | 0.89 | 0.05 | 0.86 | 0.0737 |
| | hm2 | 6.5 Å | 1 | 0.16 | 0.80 | 0.05 | 0.85 | 0.0762 |
| | hm3 | 6.5 Å | 1 | 0.24 | 0.88 | 0.07 | 0.85 | 0.1084 |
| | hm1–3 | 6.5 Å | 1 | 0.25 | 0.86 | 0.07 | 0.83 | 0.1094 |
| | hm1–5 | 6.5 Å | 1 | 0.29 | 0.84 | 0.07 | 0.81 | 0.1128 |
| Haliloglu et al. [13] | hm1 | 7.0 Å | 1 | 0.1988 | 0.9019 | 0.0780 | 0.8738 | 0.1120 |
| | hm1–2 | 7.0 Å | 1 | 0.2690 | 0.8819 | 0.0868 | 0.8574 | 0.1312 |
| | hm1–3 | 7.0 Å | 1 | 0.3041 | 0.8580 | 0.0820 | 0.8358 | 0.1292 |
| | hm1–4 | 7.0 Å | 1 | 0.3275 | 0.8429 | 0.0800 | 0.8222 | 0.1286 |
| | hm1–5 | 7.0 Å | 1 | 0.3450 | 0.8339 | 0.0797 | 0.8143 | 0.1295 |
| Demirel et al. [14] | hm1 | 7.0 Å | 1 | 0.0468 | | 0.0792 | | 0.0588 |
| | hm1–2 | 7.0 Å | 1 | 0.0526 | 0.9697 | 0.0677 | 0.9330 | 0.0592 |
| | hm1–3 | 7.0 Å | 1 | 0.0409 | 0.9615 | 0.0424 | 0.9246 | 0.0417 |
| | hm1–4 | 7.0 Å | 1 | 0.0819 | 0.9573 | 0.0741 | 0.9222 | 0.0778 |
| | hm1–5 | 7.0 Å | 1 | 0.0936 | 0.9532 | 0.0769 | 0.9187 | 0.0844 |
| This work | hm8 | 7.3 Å | 3 | 0.1930 | 0.9436 | 0.1250 | 0.9136 | 0.1517 |
| | hm8 | 7.1 Å | 9 | 0.2515 | 0.9095 | 0.1039 | 0.8831 | 0.1470 |
| | hm1–10 | 7.4 Å | 1 | 0.2924 | 0.8992 | 0.1080 | 0.8749 | |
| | hm1–11 | 7.4 Å | 1 | 0.3041 | 0.8873 | 0.1012 | 0.8639 | 0.1518 |
| | hm1–13 | 7.4 Å | 1 | 0.3275 | 0.8736 | 0.0976 | 0.8518 | 0.1503 |
| | hm1–20 | 7.2 Å | 1 | 0.4269 | 0.8207 | 0.0903 | 0.8049 | 0.1491 |