Abstract
Proteins have many functions, and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly, thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms converge on the same result there is every reason to take all of them into account so that a consensus result can be reached. It is shown here that applying different algorithms to protein structure prediction leads to results that do not converge as such; rather, they collude in a striking and useful way that has never been considered before.
Year: 2016 PMID: 26963911 PMCID: PMC4786192 DOI: 10.1371/journal.pone.0150769
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Selection of cutoff ranges for KOL.
The abscissa of each data point indicates the center of the range; all ranges have a width of ±0.05 on the KOL scale. The measurements were made at cutoff distances of 6 Å (KOL06) and 10 Å (KOL10). The total number of hits is shown in red.
Fig 2. Contact distance plots for the nonredundant set of 10 proteins for CMA, KOL, VRN, P2P and SVB (text colours here correspond to the colours in the figures).
The identities of the proteins for each plot are listed in Table 2 (column 1: PDB ID; column 2: working name for the protein; column 3: figure number).
Full results for all methods for the protein 1a4v (item 1 in Table 2).
| Algorithm | Cutoff (Å) | TP | FP | FN | TN | MCC | ACCY | PREC | SENSY | MCC corr |
|---|---|---|---|---|---|---|---|---|---|---|
| | 4 | 7 | 54 | 938 | 6504 | -0.0031 | 0.866 | 0.115 | 0.007 | 0.0032 |
| | 6 | 22 | 156 | 837 | 6488 | 0.0045 | 0.862 | 0.124 | 0.026 | 0.0248 |
| | 8 | 34 | 279 | 713 | 6477 | 0.0063 | 0.859 | 0.109 | 0.046 | 0.0441 |
| | 10 | 62 | 494 | 498 | 6449 | 0.0397 | 0.851 | 0.112 | 0.111 | 0.1073 |
| | 12 | 107 | 796 | 197 | 6403 | 0.1463 | 0.839 | 0.118 | 0.352 | 0.2589 |
| | 14 | 146 | 1123 | 130 | 6104 | 0.1876 | 0.794 | 0.115 | 0.529 | 0.3583 |
| | 16 | 198 | 1514 | 521 | 5270 | 0.0366 | 0.676 | 0.116 | 0.275 | 0.2831 |
| | 18 | 256 | 1870 | 877 | 4500 | -0.0537 | 0.566 | 0.120 | 0.226 | 0.2715 |
| | 20 | 310 | 2253 | 1260 | 3680 | -0.1564 | 0.449 | 0.121 | 0.197 | 0.2673 |
| | 4 | 7 | 61 | 23 | 7412 | 0.1499 | 0.987 | 0.103 | 0.233 | 0.1050 |
| | 6 | 7 | 162 | 78 | 7256 | 0.0432 | 0.966 | 0.041 | 0.082 | 0.0635 |
| | 8 | 9 | 286 | 202 | 7006 | 0.0029 | 0.933 | 0.031 | 0.043 | 0.0407 |
| | 10 | 13 | 501 | 417 | 6572 | -0.0374 | 0.874 | 0.025 | 0.030 | 0.0302 |
| | 12 | 23 | 802 | 718 | 5960 | -0.0835 | 0.791 | 0.028 | 0.031 | 0.0301 |
| | 14 | 32 | 1130 | 1046 | 5295 | -0.1417 | 0.701 | 0.028 | 0.030 | 0.0290 |
| | 16 | 38 | 1520 | 1436 | 4509 | -0.2217 | 0.596 | 0.024 | 0.026 | 0.0248 |
| | 18 | 43 | 1876 | 1792 | 3792 | -0.3030 | 0.500 | 0.022 | 0.023 | 0.0222 |
| | 20 | 47 | 2260 | 2176 | 3020 | -0.4026 | 0.396 | 0.020 | 0.021 | 0.0211 |
| | 4 | 13 | 47 | 331 | 7112 | 0.0733 | 0.946 | 0.217 | 0.038 | 0.0796 |
| | 6 | 35 | 148 | 229 | 7091 | 0.1339 | 0.940 | 0.191 | 0.133 | 0.1542 |
| | 8 | 58 | 272 | 106 | 7067 | 0.2258 | 0.934 | 0.176 | 0.354 | 0.2636 |
| | 10 | 102 | 487 | 109 | 6805 | 0.2561 | 0.893 | 0.173 | 0.483 | 0.3237 |
| | 12 | 153 | 788 | 410 | 6152 | 0.1259 | 0.800 | 0.163 | 0.272 | 0.2385 |
| | 14 | 200 | 1116 | 738 | 5449 | 0.0376 | 0.700 | 0.152 | 0.213 | 0.2083 |
| | 16 | 266 | 1506 | 1128 | 4603 | -0.0510 | 0.578 | 0.150 | 0.191 | 0.1955 |
| | 18 | 304 | 1862 | 1484 | 3853 | -0.1465 | 0.473 | 0.140 | 0.170 | 0.1787 |
| | 20 | 340 | 2246 | 1868 | 3049 | -0.2591 | 0.361 | 0.131 | 0.154 | 0.1646 |
| | 4 | 10 | 61 | 374 | 7058 | 0.0398 | 0.939 | 0.141 | 0.026 | 0.0461 |
| | 6 | 35 | 162 | 272 | 7034 | 0.1134 | 0.933 | 0.178 | 0.114 | 0.1337 |
| | 8 | 66 | 286 | 149 | 7002 | 0.2112 | 0.924 | 0.188 | 0.307 | 0.2490 |
| | 10 | 120 | 501 | 66 | 6816 | 0.3254 | 0.892 | 0.193 | 0.645 | 0.4200 |
| | 12 | 187 | 802 | 367 | 6147 | 0.1717 | 0.794 | 0.189 | 0.338 | 0.2843 |
| | 14 | 245 | 1130 | 695 | 5433 | 0.0757 | 0.691 | 0.178 | 0.261 | 0.2464 |
| | 16 | 306 | 1520 | 1085 | 4592 | -0.0260 | 0.571 | 0.168 | 0.220 | 0.2205 |
| | 18 | 350 | 1876 | 1441 | 3836 | -0.1241 | 0.465 | 0.157 | 0.195 | 0.2011 |
| | 20 | 387 | 2260 | 1825 | 3031 | -0.2406 | 0.352 | 0.146 | 0.175 | 0.1921 |
| | 4 | 0 | 61 | 88 | 7354 | -0.0099 | 0.980 | 0.000 | 0.000 | -0.0036 |
| | 6 | 3 | 162 | 13 | 7325 | 0.0522 | 0.976 | 0.018 | 0.188 | 0.0725 |
| | 8 | 5 | 286 | 137 | 7075 | -0.0026 | 0.942 | 0.017 | 0.035 | 0.0352 |
| | 10 | 7 | 501 | 352 | 6643 | -0.0430 | 0.884 | 0.014 | 0.019 | 0.0246 |
| | 12 | 8 | 802 | 653 | 6040 | -0.0960 | 0.804 | 0.010 | 0.012 | 0.0166 |
| | 14 | 9 | 1130 | 981 | 5383 | -0.1551 | 0.716 | 0.008 | 0.009 | 0.0156 |
| | 16 | 15 | 1520 | 1371 | 4597 | -0.2286 | 0.611 | 0.010 | 0.011 | 0.0179 |
| | 18 | 24 | 1876 | 1727 | 3876 | -0.3039 | 0.513 | 0.013 | 0.014 | 0.0382 |
| | 20 | 28 | 2260 | 2111 | 3104 | -0.4003 | 0.410 | 0.012 | 0.013 | 0.0234 |
In this table the following statistical checks were carried out, in accordance with recently established practice [1,21]:
MCC (Matthews correlation coefficient):
(TP×TN − FP×FN)/√((TP+FP)×(TP+FN)×(TN+FP)×(TN+FN))
Accuracy (ACCY): (TN − TP)/(TP+FP+FN+TN)
Precision (PREC): TP/(TP+FP)
Sensitivity (SENSY): TP/(TP+FN)
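The four measures listed above can be computed directly from the confusion-matrix counts. The following is a minimal Python sketch, not the authors' code: the function name `confusion_metrics` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration. ACCY is implemented exactly as printed, with TN − TP in the numerator, which is the form that reproduces the tabulated ACCY values; the conventional accuracy (TP + TN)/(TP+FP+FN+TN) is returned alongside it for comparison only.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Compute MCC, ACCY, PREC and SENSY from confusion-matrix counts."""
    total = tp + fp + fn + tn

    def ratio(num, den):
        # Assumption: report 0.0 when a denominator is zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
        # ACCY as printed in the text (TN - TP in the numerator).
        "ACCY": ratio(tn - tp, total),
        # Conventional accuracy, included only for comparison.
        "ACCY_conventional": ratio(tp + tn, total),
        "PREC": ratio(tp, tp + fp),
        "SENSY": ratio(tp, tp + fn),
    }


# First data row of the 1a4v table: cutoff 4 Å, TP=7, FP=54, FN=938, TN=6504.
row = confusion_metrics(tp=7, fp=54, fn=938, tn=6504)
print({name: round(value, 4) for name, value in row.items()})
```

Rounded to the precision used in the table, this call gives MCC = -0.0031, ACCY = 0.866, PREC = 0.115 and SENSY = 0.007, matching the first data row. The MCC corr column is not defined in this section, so it is not reproduced here.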
Summary of results for all 10 protein families (parent protein identified in columns 1 and 2). The CMA, KOL, VRN, P2P and SVB columns give the peaks in the contact distance vs MCC plots, with secondary peaks in parentheses.
| Protein domain (4-letter code) | Protein type | Figure (A-I in …) | CATH class | CMA | KOL | VRN | P2P | SVB |
|---|---|---|---|---|---|---|---|---|
| 1a4v_ | α-lactalbumin | 2 | 1.10.530.10 | 6.0 | (8.0) 10.0 | 8.0 12.0 | ~ | 14.0 |
| 5tim_ | TIM barrel | A | 3.20.20.70 | 5.0 (8.0) | 6.0 (5.0) | 10.0 (8.0) | 12.0 (10.0) | ~ |
| 1ewka | receptor ligand binding domain | B | 3.40.50.2300 | -5.5 (4.0) | 6.0 | -4.5 & -5.0 (5.5) | 10.0 | ~ |
| 1fw0a | receptor membrane domain | C | 3.40.190.10 | 5.5 | 5.5 | 5.5 | ~ | ~ |
| 1ulkb | lectin | D | 3.30.60.10 | 5.5 | 4.0 (6.0) | 5.5 | 6.0 | 10.0 |
| 1kx5e | histone | E | 1.10.20.10 | 5.5 | 5.0 | 8.0 | 12.0 | 4.5 |
| 2b4sb | insulin receptor TK domain | F | 2.60.40.1410 | 8.0 (5.0) | 8.0 | 10.0 | 14.0 | ~ |
| 1xcka | GroEL | G | 3.30.260.10 | 8.0 (6.0) | 8.0 | 8.0 | 12.0 | ~ |
| 1bpya | DNA β polymerase | H | 3.30.210.10 | 8.0 (5.5) | 3.8 (5.5) | 8.0 | (5.0) | ~ |
| 1n8ka | alcohol dehydrogenase | I | 3.40.50.720 | 8.0 (5.0 & 6.0) | 6.0 (5.0) | 8.0 | 8.0 | ~ |
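One plausible way to read a peak and its secondary peaks off a contact distance vs MCC curve is sketched below in Python. This is an illustration under stated assumptions, not the procedure used to build the table: the function name `find_peaks` is hypothetical, the primary peak is taken to be the cutoff with the highest MCC, and secondary peaks are taken to be any other strict local maxima. The example data are one block of nine rows (cutoffs 4 to 20 Å) from the 1a4v results table.

```python
def find_peaks(cutoffs, mcc):
    """Return (primary, secondary) cutoff distances for an MCC curve.

    Assumption: the primary peak is the global MCC maximum and the
    secondary peaks are the remaining strict local maxima.
    """
    if not mcc or len(cutoffs) != len(mcc):
        raise ValueError("cutoffs and mcc must be non-empty and equal in length")

    best = max(range(len(mcc)), key=lambda i: mcc[i])
    secondary = []
    for i, value in enumerate(mcc):
        if i == best:
            continue
        # Treat the ends of the curve as bounded by minus infinity.
        left = mcc[i - 1] if i > 0 else float("-inf")
        right = mcc[i + 1] if i < len(mcc) - 1 else float("-inf")
        if value > left and value > right:
            secondary.append(cutoffs[i])
    return cutoffs[best], secondary


# MCC values for one block of the 1a4v table, cutoffs 4 to 20 Å.
cutoffs = [4, 6, 8, 10, 12, 14, 16, 18, 20]
mcc = [0.0398, 0.1134, 0.2112, 0.3254, 0.1717, 0.0757, -0.0260, -0.1241, -0.2406]
primary, secondary = find_peaks(cutoffs, mcc)
print(primary, secondary)
```

For this block the curve rises to a single maximum at 10 Å and then falls, so the sketch reports a primary peak at 10 and no secondary peaks. The tabulated peaks include values such as 5.5 and 3.8, so they were evidently read from more finely sampled curves than the nine cutoffs used here.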
Fig 3. Principal component analysis of CMA, KOL, VRN, P2P and SVB for the entire set of protein domains.