Juyong Lee, Kiho Lee, InSuk Joung, Keehyoung Joo, Bernard R Brooks, Jooyoung Lee.
Abstract
BACKGROUND: In template-based modeling with a single template, inter-atomic distances of an unknown protein structure are assumed to follow Gaussian probability density functions whose peaks are centered at the distances between corresponding atoms in the template structure. The width of the Gaussian distribution, that is, the variability of a spatial restraint, is closely related to the reliability of the restraint information extracted from a template, and it must be estimated accurately for successful template-based protein structure modeling.
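The Gaussian assumption described in the abstract can be illustrated with a minimal sketch. The function names, the numeric values, and the use of the negative log density as a restraint penalty are assumptions made for illustration; they are not taken from the paper or from any modeling package.

```python
import math


def gaussian_restraint_density(d, d_template, sigma):
    """Gaussian density of a target distance d, centered at the template distance."""
    z = (d - d_template) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def restraint_penalty(d, d_template, sigma):
    """Negative log density, used here as an illustrative (assumed) penalty."""
    return -math.log(gaussian_restraint_density(d, d_template, sigma))


# Hypothetical numbers: a template distance of 10 A and a model distance of 12 A.
d_template = 10.0
d_model = 12.0

# A narrow width (a restraint treated as reliable) penalizes the same 2 A
# deviation much more strongly than a wide width (an unreliable restraint).
tight = restraint_penalty(d_model, d_template, sigma=0.5)
loose = restraint_penalty(d_model, d_template, sigma=2.0)
print(f"sigma=0.5: {tight:.3f}  sigma=2.0: {loose:.3f}")
```

The sketch shows why the width matters: an underestimated width forces the model toward a template distance that may be wrong, while an overestimated width discards useful information.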
Year: 2015 PMID: 25886990 PMCID: PMC4374281 DOI: 10.1186/s12859-015-0526-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
20 input features used for Sigma-RF are listed along with their importance estimates
| Feature | Definition | Importance |
|---|---|---|
| F1 | \| \| | 7.51 |
| F2 | \| \| | 2.91 |
| F3 | | 9.43 |
| F4 | | 2.55 |
| F5 | | 16.81 |
| F6 | | 1.91 |
| F7 | | 1.36 |
| F8 | 1/\| \| | 0.12 |
| F9 | 1/\| \| | 0.20 |
| F10 | | 0.37 |
| F11 | | 0.32 |
| F12 | 1/\| \| | 0.23 |
| F13 | 1/\| \| | 0.49 |
| F14 | | 0.16 |
| F15 | | 0.88 |
| F16 | | 0.53 |
| F17 | | 0.58 |
| F18 | | 3.62 |
| F19 | | 3.02 |
| F20 | | 4.22 |
I and J (> I) indicate the residue indices in the target sequence, and K and L (> K) indicate those in the template sequence. When two residue pairs [(I, K) and (J, L)] are aligned, we extract the distance information of d between two atoms in the template. N is the chain length of the target sequence. m is the match score of the aligned pair (I, K). In F5, δ(i,j) = 1 if residues i and j are aligned; otherwise δ(i,j) = 0. is the number of gaps between I and J in the target sequence. I′, J′, K′ and L′ represent the residue indices of the closest gaps of I, J, K and L, respectively. p(s) represents the PSI-PRED scores of the secondary structure elements: helix (H), strand (E) and coil (C). p(acc) represents the SANN scores of the solvent accessibility states: buried (B) and exposed (E).
Correlation coefficients between predicted values and actual error, | − |, are shown
| Target | Template | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| T0517 | 2qs7A | 0.2912 | | 0.2492 | | 0.2016 | | 0.2614 | |
| T0523 | 1ew0A | 0.3518 | | 0.3382 | | 0.2505 | | 0.1397 | |
| T0527 | 3f1pA | 0.2131 | | 0.1347 | | 0.3624 | | 0.4031 | |
| T0536 | 1ew0A | 0.1969 | | 0.2261 | | 0.4438 | | 0.1690 | |
| T0538 | 2kruA | 0.1363 | | 0.1573 | | -0.0174 | | | -0.1121 |
| T0539 | 1x4jA | 0.2608 | | 0.2179 | | 0.2578 | | -0.0061 | |
| T0545 | 1wywA | 0.1998 | | 0.1969 | | 0.2351 | | | -0.1385 |
| T0552 | 2q0zX | 0.1053 | | 0.1197 | | -0.0547 | | 0.0830 | |
| T0557 | 3lmmA | 0.2984 | | 0.3258 | | -0.0896 | | 0.4558 | |
| T0559 | 1qbjA | 0.1473 | | 0.0865 | | 0.2635 | | -0.2199 | |
| T0560 | 2fokA | 0.2076 | | 0.1677 | | 0.3255 | | -0.0587 | |
| T0566 | 1usuB | 0.3187 | | 0.3524 | | 0.2816 | | 0.3723 | |
| T0567 | 1ny5A | | 0.1997 | | 0.2678 | 0.0558 | | 0.1784 | |
| T0580 | 1iibA | 0.0710 | | 0.1195 | | -0.0505 | | 0.1281 | |
| T0586 | 3by6A | 0.1282 | | 0.0724 | | -0.0420 | | -0.1258 | |
| T0590 | 1l0qA | | 0.1218 | | 0.0431 | 0.3497 | | | -0.0593 |
| T0594 | 1x53A | 0.1894 | | 0.2257 | | 0.2010 | | | 0.1527 |
| T0598 | 2osoA | 0.2631 | | 0.3188 | | 0.2556 | | 0.3168 | |
| T0610 | 1wdjA | 0.2421 | | 0.2517 | | 0.2707 | | 0.3233 | |
| T0615 | 1vj7A | 0.3285 | | 0.3407 | | 0.1585 | | 0.2142 | |
| T0622 | 3c1aA | 0.3945 | | 0.4249 | | 0.2729 | | 0.2606 | |
| Average | | 0.2264 | 0.3547 | 0.2257 | 0.3809 | 0.1872 | 0.4580 | 0.1792 | 0.2748 |
Better values are shown in bold face.
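The correlation coefficients in the table above compare a predicted variability with an actual distance error for each restrained atom pair. A minimal sketch of such a calculation follows, assuming a Pearson correlation and assuming the actual error is the absolute difference between a native distance and the corresponding template distance. All distances and predicted widths below are hypothetical and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical distances (in angstroms) for five aligned atom pairs.
d_template = [5.1, 8.4, 12.0, 6.3, 9.9]
d_native = [5.4, 8.0, 13.5, 6.2, 11.0]

# Assumed definition of the actual error: absolute distance difference.
actual_error = [abs(n - t) for n, t in zip(d_native, d_template)]

# Hypothetical predicted variability for the same atom pairs.
sigma_pred = [0.4, 0.5, 1.2, 0.2, 0.9]

r = pearson(sigma_pred, actual_error)
print(f"correlation between predicted variability and actual error: {r:.4f}")
```

A coefficient near 1 would mean that pairs with large predicted widths are also the pairs where the template distance is far from the native one, which is the behavior a useful variability estimate should have.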
Figure 1. Predicted distance variability values are shown against actual distance errors for T0552 and T0598. The results for T0552 are shown in panels A and B, and those for T0598 are shown in panels C and D. The variability values predicted by Sigma-RF (green) correlate better with the true distance deviations (absolute distance differences) than those predicted by Modeller (red). The blue lines represent the line y = x.
Correlation coefficients between predicted values by Sigma-RF and the actual errors for CACA distances of 22 CASP9 targets are shown
| Target | Full 20 features | Top 10 features |
|---|---|---|
| T0517 | 0.5622 | 0.5774 |
| T0523 | 0.3923 | 0.3041 |
| T0527 | 0.3402 | 0.3355 |
| T0536 | 0.3138 | 0.3438 |
| T0538 | 0.2225 | 0.2998 |
| T0539 | 0.5197 | 0.5093 |
| T0545 | 0.3312 | 0.2289 |
| T0552 | 0.4061 | 0.4277 |
| T0557 | 0.4447 | 0.3720 |
| T0559 | 0.2589 | 0.2237 |
| T0560 | 0.4392 | 0.4080 |
| T0566 | 0.4536 | 0.3619 |
| T0567 | 0.1997 | 0.1960 |
| T0580 | 0.2354 | 0.2948 |
| T0586 | 0.3713 | 0.4038 |
| T0590 | 0.1218 | 0.0670 |
| T0594 | 0.3364 | 0.3330 |
| T0598 | 0.3145 | 0.2489 |
| T0602 | 0.5608 | 0.4723 |
| T0610 | 0.2756 | 0.2825 |
| T0615 | 0.4062 | 0.4177 |
| T0622 | 0.5028 | 0.4853 |
| Average | 0.3640 | 0.3452 |
Results using the full 20 features as well as using top 10 features are shown. On average, by using only half of the features, 95% of the prediction level is achieved.
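The 95% figure can be checked directly from the table above. The sketch below recomputes both column averages from the per-target values and takes their ratio. The assignment of the first value column to the full 20 features and the second to the top 10 features is inferred from the reported averages, and the variable names are illustrative.

```python
# Per-target correlation coefficients copied from the table above.
full_features = [
    0.5622, 0.3923, 0.3402, 0.3138, 0.2225, 0.5197, 0.3312, 0.4061,
    0.4447, 0.2589, 0.4392, 0.4536, 0.1997, 0.2354, 0.3713, 0.1218,
    0.3364, 0.3145, 0.5608, 0.2756, 0.4062, 0.5028,
]
top10_features = [
    0.5774, 0.3041, 0.3355, 0.3438, 0.2998, 0.5093, 0.2289, 0.4277,
    0.3720, 0.2237, 0.4080, 0.3619, 0.1960, 0.2948, 0.4038, 0.0670,
    0.3330, 0.2489, 0.4723, 0.2825, 0.4177, 0.4853,
]

mean_full = sum(full_features) / len(full_features)
mean_top10 = sum(top10_features) / len(top10_features)

# Fraction of the full-feature prediction level retained with half the features.
retained = mean_top10 / mean_full

print(f"mean (20 features): {mean_full:.4f}")
print(f"mean (top 10 features): {mean_top10:.4f}")
print(f"retained: {retained:.1%}")
```

The recomputed means agree with the reported averages to the precision shown in the table, and their ratio is about 0.948, consistent with the statement that roughly 95% of the prediction level is retained.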
Average model quality measures of homology modeling results of 46 benchmark targets obtained by ModellerCSA using , , and are shown
| | | | | | | |
|---|---|---|---|---|---|---|
| | 0.756 | 0.734 | 0.710 | 0.661 | 0.650 | 0.648 |
| | 0.730 | 0.722 | 0.707 | 0.636 | 0.630 | 0.626 |
| | 0.727 | 0.719 | 0.691 | 0.635 | 0.630 | 0.624 |
| No. of improved targets | 32/46 | 33/46 | 34/46 | 29/46 | 29/46 | 30/46 |
Figure 2. A comparison of TM-scores and lDDT-scores of 3D models generated by ModellerCSA using and from those using . The TM-score results are shown in panels A and B, and the lDDT-score results are shown in panels C and D. For all plots, X-axes represent the quality measure differences between models obtained by σ and σ . Y-axes represent the differences between models obtained by σ and σ . The green lines represent the line y = x, which corresponds to identical model quality. The number of dots above the green line corresponds to the targets that are improved by using σ .
Average model quality measures of homology modeling results of 46 benchmark targets obtained by original Modeller using , , and are shown
| | | | | | | |
|---|---|---|---|---|---|---|
| | 0.764 | 0.744 | 0.743 | 0.635 | 0.617 | 0.616 |
| | 0.741 | 0.719 | 0.719 | 0.609 | 0.595 | 0.593 |
| | 0.735 | 0.721 | 0.719 | 0.607 | 0.595 | 0.592 |
| No. of improved targets | 36/46 | 21/46 | 22/46 | 27/46 | 22/46 | 29/46 |
Figure 3. A comparison of TM-scores and lDDT-scores of 3D models generated by Modeller using and from those using . The TM-score results are shown in panels A and B, and the lDDT-score results are shown in panels C and D. For all plots, X-axes represent the quality measure differences between models obtained by σ and σ . Y-axes represent the differences between models obtained by σ and σ . The green lines represent the line y = x, which corresponds to identical model quality. The number of dots above the green line corresponds to the targets that are improved by using σ .
Figure 4. A comparison of template-based modeling results of T0517 and T0523 by the and values. The energy landscapes of template-based modeling results of (A) T0517 and (D) T0523 by σ , σ and σ . The representative structures of low and high TM-score results are superposed: (B) T0517 and (E) T0523. The average restraint energy differences, E −E , of the mirror-image structures of (C) T0517 and (F) T0523 evaluated by σ and σ are shown as 3D histogram plots. Positive z-axis values indicate that the corresponding distance restraints are favored by σ and disfavored by σ .