Hongjie Wu, Haiou Li, Min Jiang, Cheng Chen, Qiang Lv, Chuang Wu.
Abstract
Background. One critical issue in protein three-dimensional structure prediction, whether by ab initio or comparative modeling, is the identification of high-quality protein structural models among the generated decoys. Clustering algorithms are currently widely used to identify near-native models; however, their performance depends on the conformational decoys, and, for some algorithms, accuracy declines as the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the second uses squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust than SPICKER alone: for 33 (59%) and 42 (75%) of 56 targets, respectively, they selected models with template modeling scores (TM-scores) better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm performed similarly to SPICKER, a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to the results from SPICKER and classical K-means.
Year: 2017 PMID: 28421198 PMCID: PMC5381204 DOI: 10.1155/2017/7294519
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Algorithm 1: SK-means (V, N, K).
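Only the caption of Algorithm 1 is reproduced here. As a rough illustration of the idea stated in the abstract (basic K-means started from externally supplied centroids), the following Python sketch runs Lloyd iterations from given seeds and then returns the member closest to the centroid of the largest cluster. Everything in it is an assumption rather than the authors' procedure: the function names, the use of plain feature vectors with Euclidean distance, and the reading of V, N, and K as the vectors, their count, and the number of clusters. In SK-means the seeds would come from SPICKER, which is not invoked here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_from_seeds(points, init_centroids, max_iter=100):
    """Basic K-means (Lloyd iterations) started from supplied centroids.

    points         -- list of feature vectors (hypothetical stand-in for V)
    init_centroids -- K starting centroids, e.g. taken from another method
    Returns (labels, centroids).
    """
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def largest_cluster_representative(points, labels, centroids):
    """Index of the point nearest the centroid of the most populated cluster."""
    k = len(centroids)
    sizes = [labels.count(j) for j in range(k)]
    big = max(range(k), key=sizes.__getitem__)
    members = [i for i, lab in enumerate(labels) if lab == big]
    return min(members, key=lambda i: sq_dist(points[i], centroids[big]))


# Toy data: three points near the origin and two far away.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
toy_labels, toy_centroids = kmeans_from_seeds(toy_points, [(0.0, 0.0), (10.0, 10.0)])
toy_best = largest_cluster_representative(toy_points, toy_labels, toy_centroids)
```

Returning an actual member of the largest cluster, rather than the averaged centroid, mirrors the table footnotes, which report the TM-score of the centroid model in the largest cluster.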
Figure 1: Algorithm flowcharts of SK-means and K-means++.
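The abstract describes K-means++ as using squared distance to optimize the initial centroids. A minimal sketch of that seeding step, in its standard textbook form, is given below. It assumes a precomputed symmetric matrix of pairwise distances between decoy models (for example, RMSD); the function name, the matrix input, and the fixed random seed are illustrative choices, not details taken from the paper.

```python
import random


def kmeanspp_seeds(dist, k, rng=None):
    """Choose k seed indices with K-means++-style sampling.

    dist -- square matrix (list of lists) of pairwise distances; assumed input
    k    -- number of seeds to pick
    rng  -- optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random(0)
    n = len(dist)
    seeds = [rng.randrange(n)]  # the first seed is drawn uniformly
    # Squared distance from every point to its nearest seed so far.
    d2 = [dist[i][seeds[0]] ** 2 for i in range(n)]
    while len(seeds) < k:
        if sum(d2) == 0:
            break  # every remaining point coincides with a chosen seed
        # Draw the next seed with probability proportional to d2, so points
        # far from all current seeds are favored.
        nxt = rng.choices(range(n), weights=d2, k=1)[0]
        seeds.append(nxt)
        d2 = [min(d2[i], dist[i][nxt] ** 2) for i in range(n)]
    return seeds


# Toy example: four points on a line, forming two tight pairs.
positions = [0.0, 0.1, 10.0, 10.1]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
two_seeds = kmeanspp_seeds(toy_dist, 2)
all_seeds = kmeanspp_seeds(toy_dist, 4)
```

Because a chosen seed has zero weight afterward, it is never drawn twice; the seeds would then be handed to ordinary K-means iterations.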
Comparison between SK-means, K-means++, and SPICKER on 56 protein decoys.
| Index | PDB | Len^a | Size^b | Best^c | K-means++^d | SK-means^e | K-means^f | SPICKER^g | Random^h |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1abv | 103 | 526 | 0.507 | 0.3701 |  |  | 0.3813 | 0.479 |
| 2 | 1af7 | 72 | 527 | 0.623 |  |  | 0.4820 | 0.4874 | 0.322 |
| 3 | 1ah9 | 63 | 510 | 0.696 |  |  |  | 0.4657 | 0.434 |
| 4 | 1aoy | 65 | 529 | 0.711 | 0.6482 | 0.6695 | 0.6695 | 0.6695 | 0.622 |
| 5 | 1b4bA | 71 | 460 | 0.473 | 0.3815 | 0.4279 | 0.4270 | 0.4501 | 0.379 |
| 6 | 1b72A | 49 | 534 | 0.697 |  | 0.3917 |  | 0.4923 | 0.562 |
| 7 | 1bm8 | 99 | 329 | 0.388 |  |  | 0.3320 | 0.3550 | 0.255 |
| 8 | 1bq9A | 53 | 573 | 0.465 | 0.3540 | 0.3459 |  | 0.3873 | 0.411 |
| 9 | 1cewI | 108 | 452 | 0.748 |  | 0.7154 |  | 0.7187 | 0.617 |
| 10 | 1cqkA | 101 | 284 | 0.885 | 0.8439 | 0.8539 | 0.8539 | 0.8539 | 0.815 |
| 11 | 1csp | 67 | 315 | 0.753 | 0.7158 | 0.7158 | 0.7158 | 0.7158 | 0.686 |
| 12 | 1cy5A | 92 | 273 | 0.893 | 0.8685 | 0.8839 | 0.8680 | 0.8839 | 0.876 |
| 13 | 1dcjA | 73 | 525 | 0.368 |  |  | 0.3170 | 0.3264 | 0.334 |
| 14 | 1di2A | 69 | 374 | 0.843 | 0.7622 | 0.7663 | 0.7620 | 0.7663 | 0.374 |
| 15 | 1dtjA | 74 | 285 | 0.814 | 0.7901 | 0.7581 | 0.7370 | 0.7901 | 0.705 |
| 16 | 1egxA | 115 | 352 | 0.827 | 0.7673 | 0.7673 | 0.7673 | 0.7673 | 0.768 |
| 17 | 1fadA | 92 | 514 | 0.652 | 0.5716 | 0.5755 | 0.5755 | 0.5755 | 0.553 |
| 18 | 1fo5A | 85 | 340 | 0.568 |  |  | 0.5230 | 0.5296 | 0.469 |
| 19 | 1g1cA | 98 | 307 | 0.787 | 0.7473 | 0.7732 |  | 0.7732 | 0.621 |
| 20 | 1gjxA | 77 | 525 | 0.515 | 0.2375 | 0.3807 | 0.3810 | 0.4298 | 0.191 |
| 21 | 1gnuA | 117 | 553 | 0.647 | 0.5353 | 0.5353 | 0.5350 | 0.5456 | 0.509 |
| 22 | 1gpt | 47 | 469 | 0.553 |  |  |  | 0.4927 | 0.517 |
| 23 | 1gyvA | 117 | 337 | 0.776 | 0.7406 | 0.7406 |  | 0.7406 | 0.753 |
| 24 | 1hbkA | 89 | 300 | 0.708 | 0.6633 | 0.6633 | 0.6633 | 0.6633 | 0.599 |
| 25 | 1itpA | 68 | 526 | 0.511 | 0.3069 |  |  | 0.3096 | 0.335 |
| 26 | 1jnuA | 104 | 269 | 0.768 |  | 0.7237 | 0.6980 | 0.7237 | 0.711 |
| 27 | 1kjs | 74 | 548 | 0.5 | 0.3728 | 0.3728 | 0.3580 | 0.3728 | 0.313 |
| 28 | 1kviA | 68 | 550 | 0.79 |  | 0.6774 |  | 0.6774 | 0.642 |
| 29 | 1mkyA3 | 81 | 285 | 0.552 | 0.4155 | 0.4155 | 0.4155 | 0.4155 | 0.384 |
| 30 | 1mla_2 | 70 | 335 | 0.775 |  | 0.6226 | 0.6226 | 0.6226 | 0.609 |
| 31 | 1mn8A | 84 | 545 | 0.457 | 0.2517 |  |  | 0.3285 | 0.310 |
| 32 | 1n0uA4 | 69 | 301 | 0.588 |  |  | 0.4524 | 0.4524 | 0.333 |
| 33 | 1ne3A | 56 | 566 | 0.453 | 0.2523 |  |  | 0.3724 | 0.344 |
| 34 | 1no5A | 93 | 426 | 0.419 | 0.3710 |  |  | 0.4054 | 0.500 |
| 35 | 1npsA | 88 | 469 | 0.800 | 0.7671 | 0.7671 | 0.2810 | 0.7671 | 0.283 |
| 36 | 1o2fB | 77 | 510 | 0.528 |  |  |  | 0.2690 | 0.379 |
| 37 | 1of9A | 77 | 507 | 0.585 |  | 0.494 |  | 0.4940 | 0.554 |
| 38 | 1ogwA | 72 | 520 | 0.890 | 0.7853 | 0.7853 | 0.7850 | 0.8622 | 0.78 |
| 39 | 1orgA | 118 | 442 | 0.816 | 0.7440 | 0.7339 | 0.7440 | 0.7440 | 0.693 |
| 40 | 1pgx | 59 | 562 | 0.551 |  | 0.3216 |  | 0.4446 | 0.51 |
| 41 | 1r69 | 61 | 291 | 0.824 | 0.7007 | 0.7255 | 0.7255 | 0.7255 | 0.827 |
| 42 | 1sfp | 111 | 308 | 0.758 | 0.7453 | 0.7453 | 0.7454 | 0.7454 | 0.749 |
| 43 | 1shfA | 59 | 536 | 0.836 |  | 0.5070 |  | 0.5070 | 0.408 |
| 44 | 1sro | 71 | 515 | 0.648 |  |  | 0.5820 | 0.6158 | 0.583 |
| 45 | 1ten | 87 | 294 | 0.851 | 0.8215 | 0.8215 | 0.7860 | 0.8215 | 0.781 |
| 46 | 1tfi | 47 | 339 | 0.592 | 0.5061 | 0.5576 | 0.5520 | 0.5576 | 0.550 |
| 47 | 1thx | 108 | 302 | 0.865 | 0.8000 | 0.8000 | 0.8000 | 0.8000 | 0.819 |
| 48 | 1tif | 59 | 542 | 0.340 |  | 0.2667 | 0.2660 | 0.3199 | 0.232 |
| 49 | 1tig | 88 | 565 | 0.585 |  |  |  | 0.4176 | 0.517 |
| 50 | 1vcc | 76 | 551 | 0.455 | 0.3973 | 0.4066 | 0.3970 | 0.4066 | 0.291 |
| 51 | 256bA | 106 | 506 | 0.814 |  | 0.7578 |  | 0.7578 | 0.723 |
| 52 | 2a0b | 118 | 282 | 0.838 | 0.8083 | 0.8083 | 0.8083 | 0.8083 | 0.768 |
| 53 | 2cr7A | 60 | 540 | 0.666 | 0.3589 | 0.5059 |  | 0.5136 | 0.365 |
| 54 | 2f3nA | 65 | 485 | 0.758 | 0.6403 |  | 0.6510 | 0.7132 | 0.626 |
| 55 | 2pcy | 99 | 435 | 0.637 | 0.6040 | 0.5795 |  | 0.6233 | 0.527 |
| 56 | 2reb_2 | 60 | 550 | 0.403 |  |  |  | 0.3174 | 0.416 |
^a The length of the protein sequence.
^b The number of models in the decoy set.
^c The best (maximum) TM-score among the models in the decoy set.
^d The TM-score of the centroid model in the largest cluster selected by K-means++ (bold indicates better than SPICKER).
^e The TM-score of the centroid model in the largest cluster selected by SK-means (bold indicates better than SPICKER).
^f The TM-score of the centroid model in the largest cluster selected by K-means (bold indicates better than SPICKER).
^g The TM-score of the centroid model in the largest cluster selected by SPICKER.
^h The TM-score of the centroid model selected at random.
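The abstract's headline counts (targets whose TM-score is better than or equal to that of SPICKER) amount to a per-target comparison of two columns of this table. The short Python sketch below shows that tally on four rows that are fully populated (indices 4, 5, 10, and 11: 1aoy, 1b4bA, 1cqkA, 1csp), with the method columns assigned according to footnotes d, e, and g. The helper names are hypothetical, and the four-row subset is only an illustration, not a recomputation of the published totals.

```python
def count_at_least(method_scores, reference_scores):
    """Count targets where the method's TM-score is >= the reference's."""
    return sum(m >= r for m, r in zip(method_scores, reference_scores))


def percent(count, total):
    """Share of targets as a whole-number percentage."""
    return round(100 * count / total)


# TM-scores copied from table rows 4, 5, 10, and 11.
spicker = [0.6695, 0.4501, 0.8539, 0.7158]
kmeans_pp = [0.6482, 0.3815, 0.8439, 0.7158]
sk_means = [0.6695, 0.4279, 0.8539, 0.7158]

kpp_hits = count_at_least(kmeans_pp, spicker)
sk_hits = count_at_least(sk_means, spicker)

# The abstract's percentages follow from its counts over 56 targets.
abstract_shares = (percent(33, 56), percent(42, 56))
```

The same `percent` helper reproduces the 59% and 75% figures quoted in the abstract from the counts 33 and 42.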
Figure 2: TM-score comparison between SPICKER and K-means++ on 12 CASP11 targets.
Figure 3: Superimposition of the 3D structures of SK-means model 1 (red), SPICKER model 1 (green), and the native structure (blue) for 1sro and 1tig. The black frames highlight the improvements of SK-means compared with SPICKER.
Figure 4: Visual comparison of all models in the decoy set and superimposition of 3D structures for T0837. (a) TM-score and RMSD across the whole decoy set. (b) The native structure and the model 1 structures identified by K-means++ and SPICKER are shown in blue, red, and green, respectively.