Mario Abdel Messih, Rosalba Lepore, Anna Tramontano.
Abstract
MOTIVATION: Predicting the structure of protein loops is very challenging, mainly because loops are not necessarily subject to strong evolutionary pressure. As a consequence, standard homology modeling techniques, which work well for the rest of the protein, are not very effective at modeling loop structure. However, loops are often involved in protein function, so inferring their structure is important for predicting both protein structure and function.
Year: 2015 PMID: 26249814 PMCID: PMC4653384 DOI: 10.1093/bioinformatics/btv438
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Loop number distribution. The bars represent the number of available training/template loops for each length range.
Fig. 2. Illustration of the geometric parameters defined in Marti-Renom
Fig. 3. Training procedure workflow.
Performance comparison between the LoopIng, DisGro (DG) and LoopWeaver (LW) methods on the CASP10 dataset
| Loop length (# of cases) | LoopIng mean (Å) (a) | LoopIng SD (Å) (a) | DG mean (Å) (b) | DG SD (Å) (b) | LW mean (Å) (c) | LW SD (Å) (c) | LoopIng ≤ 1 Å (%) (d) | DG ≤ 1 Å (%) (d) | LW ≤ 1 Å (%) (d) | LoopIng < DG (%) (e, I) | (DG – LoopIng) ≥ 1 Å (%) (e, II) | LoopIng < LW (%) (f, I) | (LW – LoopIng) ≥ 1 Å (%) (f, II) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 (79) | 0.74 | 0.63 | 1.43* | 0.68 | 0.93 | 0.64 | 70 | 27 | 47 | 75 | 38 | 52 | 16 |
| 5 (81) | 0.85 | 0.70 | 1.77* | 0.79 | 1.16* | 0.76 | 62 | 15 | 42 | 76 | 51 | 68 | 23 |
| 6 (57) | 1.06 | 0.75 | 2.06* | 0.86 | 1.8* | 0.87 | 58 | 12 | 21 | 83 | 52 | 79 | 38 |
| 7 (51) | 1.6 | 0.88 | 2.05* | 0.83 | 2.5* | 0.70 | 29 | 9 | 7 | 61 | 32 | 81 | 39 |
| 8 (35) | 1.88 | 0.98 | 2.47* | 0.88 | 2.6* | 0.74 | 24 | 4 | 4 | 72 | 40 | 76 | 40 |
| 9 (30) | 1.7 | 1.2 | 2.63* | 1.00 | 3.2* | 0.62 | 45 | 5 | 0 | 60 | 50 | 90 | 45 |
| 10 (19) | 2.4 | 1.23 | 3.45* | 1.62 | 3.4* | 0.85 | 22 | 0 | 0 | 76 | 45 | 67 | 56 |
| 11 (19) | 2.38 | 1.4 | 3.2 | 1.9 | 3.0 | 1.02 | 33 | 0 | 11 | 76 | 56 | 78 | 22 |
| 12 (23) | 1.8 | 1.65 | 3.55* | 1.6 | 2.69 | 1.4 | 46 | 0 | 15 | 77 | 69 | 77 | 46 |
| 13 (13) | 3.1 | 1.39 | 3.9 | 1.8 | 3.2 | 1.75 | 0 | 0 | 0 | 60 | 0 | 53 | 33 |
| Overall | 1.29 | 1.07 | 2.09 | 1.44 | 1.98 | 1.71 | 51 | 14 | 25 | 73 | 44 | 71 | 31 |
Asterisks indicate a statistically significant difference with respect to LoopIng (unpaired t-test, 95% confidence level). (a, b, c) Mean and standard deviation of the RMSD for LoopIng, DG and LW, respectively. (d) Percentage of cases in which LoopIng, DG and LW produced a prediction within 1 Å RMSD of the native loop. (e, f) Percentage of cases in which LoopIng was more accurate than DG and LW, respectively: (I) LoopIng had a lower RMSD; (II) LoopIng's RMSD was lower by at least 1 Å (ΔRMSD ≥ 1 Å).
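The footnote above defines the per-case percentages in columns (d) to (f) and the significance marks. The Python sketch below computes the percentages from per-loop RMSD lists and recomputes a t statistic from the reported means, SDs and case counts. It assumes the SDs are sample standard deviations and that the unpaired test pooled the variances (Student's form), neither of which the source states; all function names are hypothetical. If SciPy is available, `scipy.stats.ttest_ind_from_stats` returns the same statistic together with a p-value.

```python
import math
from typing import Sequence


def _check_paired(a: Sequence[float], b: Sequence[float]) -> None:
    # Both lists must hold per-loop RMSDs (Å) for the same loops, in the same order.
    if len(a) != len(b) or not a:
        raise ValueError("per-loop RMSD lists must be paired and non-empty")


def percent_within(rmsds: Sequence[float], cutoff: float = 1.0) -> float:
    """Column (d): % of loops whose RMSD to the native loop is <= cutoff (Å)."""
    return 100.0 * sum(r <= cutoff for r in rmsds) / len(rmsds)


def percent_lower(looping: Sequence[float], other: Sequence[float]) -> float:
    """Sub-column (I): % of loops where LoopIng has a strictly lower RMSD."""
    _check_paired(looping, other)
    return 100.0 * sum(a < b for a, b in zip(looping, other)) / len(looping)


def percent_lower_by(looping: Sequence[float], other: Sequence[float],
                     margin: float = 1.0) -> float:
    """Sub-column (II): % of loops where other - LoopIng >= margin (Å)."""
    _check_paired(looping, other)
    return 100.0 * sum(b - a >= margin for a, b in zip(looping, other)) / len(looping)


def pooled_t(m1: float, s1: float, n1: int,
             m2: float, s2: float, n2: int) -> tuple[float, int]:
    """Student's unpaired t statistic (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df


# Length-4 row: LoopIng 0.74 +/- 0.63 vs DG 1.43 +/- 0.68 and LW 0.93 +/- 0.64, n = 79 each.
t_dg, df = pooled_t(0.74, 0.63, 79, 1.43, 0.68, 79)
t_lw, _ = pooled_t(0.74, 0.63, 79, 0.93, 0.64, 79)
print(f"df={df}  t(DG)={t_dg:.2f}  t(LW)={t_lw:.2f}")
```

For the length-4 row this gives t ≈ 6.62 against DG and t ≈ 1.88 against LW with 156 degrees of freedom. With a two-sided 95% critical value of roughly 1.98, only the DG difference comes out significant, which is consistent with where the table places its asterisks.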
Fig. 4. Dependence of model performance, expressed as the average RMSD (x-axis), on the average target-template sequence identity (y-axis) for each loop length group (values inside the bubbles).
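Each bubble in Fig. 4 summarizes one loop length group by two averages. A minimal Python sketch of that aggregation, assuming a hypothetical per-loop record that already carries the loop length, the backbone RMSD and the target-template sequence identity (how identity was computed is not stated in the source):

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, NamedTuple


class LoopResult(NamedTuple):
    """Hypothetical per-loop record; field names are illustrative."""
    length: int           # number of residues in the loop
    rmsd: float           # backbone RMSD to the native loop (Å)
    seq_identity: float   # target-template sequence identity (%)


def bubble_points(results: Iterable[LoopResult]) -> dict[int, tuple[float, float]]:
    """Return {loop length: (mean RMSD, mean sequence identity)}, one bubble per length."""
    groups: dict[int, list[LoopResult]] = defaultdict(list)
    for r in results:
        groups[r.length].append(r)
    return {
        length: (mean(r.rmsd for r in rs), mean(r.seq_identity for r in rs))
        for length, rs in sorted(groups.items())
    }


# Toy records (not data from the paper).
print(bubble_points([LoopResult(4, 0.5, 40.0), LoopResult(4, 1.0, 60.0),
                     LoopResult(7, 2.0, 30.0)]))  # → {4: (0.75, 50.0), 7: (2.0, 30.0)}
```

The tuple order (RMSD, identity) follows the caption's axis assignment (x, y).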
Performance of the LoopIng method on the FREAD benchmark
| Loop length | Original FREAD | LoopIng | LEAP | |||
|---|---|---|---|---|---|---|
| Mean | SD | Mean | SD | Mean | SD | |
| 4 | 1.29* | 1.14 | 0.61 | 0.55 | ||
| 5 | 2.19* | 2.02 | 0.68 | 0.52 | ||
| 6 | 1.79* | 1.37 | 1.01 | 0.63 | ||
| 7 | 2.53* | 2.34 | 1.26 | 0.9 | ||
| 8 | 2.88* | 2.37 | 1.47 | 1.07 | ||
| 9 | 3.08* | 2.60 | 1.71 | 1.23 | ||
| 10 | 4.25* | 3.58 | 1.90 | 1.34 | ||
| 11 | 4.55* | 3.63 | 2.24 | 1.08 | ||
| 12 | 3.99* | 3.88 | 3.14 | 2.52 | ||
| 13 | 5.54* | 4.25 | 2.91 | 2.62 | ||
| 14 | 6.07* | 4.36 | 4.44* | 3.70 | ||
| 15 | 6.41* | 5.05 | 4.58 | 4.16 | ||
| 16 | 7.50* | 6.15 | 4.90* | 4.43 | ||
| 17 | 7.84* | 5.27 | 5.66* | 5.50 | ||
| 18 | 5.48 | 5.64 | 6.53* | 6.30 | ||
| 19 | 7.67* | 5.27 | 5.87 | 4.64 | ||
| 20 | 7.64* | 6.43 | 8.21* | 7.82 | ||
For each loop length, 30 loops were tested. The columns report the mean and standard deviation of the RMSD between the model and native loop backbone conformations. Asterisks indicate statistically significant differences with respect to the LoopIng model (unpaired t-test, 95% confidence level). Underlined values represent the best results between LoopIng and LEAP. The values reported in the FREAD and LEAP columns are taken from Choi and Deane (2010) and Liang, respectively.
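All accuracy figures here are backbone RMSDs between a modeled and a native loop. The Python sketch below shows the plain RMSD formula over matched backbone atoms. It assumes both loops are already in the same reference frame (no superposition) and that atoms are listed in the same order; the source specifies neither the atom set nor whether a superposition was applied, and the function name is hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]  # x, y, z in Å


def backbone_rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched backbone atoms, no superposition.

    Assumes both atom lists are in the same order (e.g. N, CA, C, O per residue)
    and in the same reference frame, as when loops are built onto one scaffold.
    """
    if len(model) != len(native) or not model:
        raise ValueError("model and native need the same, non-zero number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(model))


model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
native = [(x, y, z + 1.0) for x, y, z in model]  # every atom shifted by 1 Å
print(backbone_rmsd(model, native))  # → 1.0
```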
LoopIng performance using native and modeled protein structures
| Loop length (# of cases) | LoopIng_Native mean (Å) (a) | LoopIng_Native SD (Å) (a) | LoopIng_Model mean (Å) (b) | LoopIng_Model SD (Å) (b) | ΔRMSD ≤ 0.1 Å (%) (c) | ΔRMSD ≤ 0.5 Å (%) (d) |
|---|---|---|---|---|---|---|
| 4 (47) | 0.58 | 0.44 | 0.59 | 0.67 | 54 | 78 |
| 5 (33) | 0.40 | 0.39 | 0.48 | 0.42 | 79 | 92 |
| 6 (21) | 1.23 | 0.71 | 1.61 | 0.57 | 48 | 88 |
| 7 (20) | 1.44 | 1.00 | 1.77 | 0.65 | 45 | 65 |
| 8 (9) | 1.55 | 0.99 | 2.08 | 0.93 | 33 | 33 |
| Overall (130) | 0.97 | 0.80 | 1.1 | 0.75 | 55 | 73 |
Performance, measured as backbone RMSD with respect to the native loop conformation, when (a) the native structure is used for the remaining portion of the protein (LoopIng_Native) and (b) the best CASP10 predicted model is used instead (LoopIng_Model). (c) Percentage of cases in which the RMSD difference between LoopIng_Native and LoopIng_Model is ≤ 0.1 Å. (d) Percentage of cases in which the RMSD difference between LoopIng_Native and LoopIng_Model is ≤ 0.5 Å.
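Columns (c) and (d) compare, loop by loop, the RMSD obtained with the native protein context against the RMSD obtained with a predicted model as context. A minimal Python sketch, assuming the "RMSD difference" is taken as an absolute value (the source does not give a sign convention) and using a hypothetical function name:

```python
from typing import Sequence


def delta_rmsd_percentages(native_ctx: Sequence[float], model_ctx: Sequence[float],
                           cutoffs: Sequence[float] = (0.1, 0.5)) -> dict[float, float]:
    """Return {cutoff: % of loops whose |RMSD_model - RMSD_native| <= cutoff}.

    native_ctx[i] and model_ctx[i] are the RMSDs (Å) of the same loop modeled
    with the native and with the predicted protein context, respectively.
    """
    if len(native_ctx) != len(model_ctx) or not native_ctx:
        raise ValueError("per-loop RMSD lists must be paired and non-empty")
    deltas = [abs(m - n) for n, m in zip(native_ctx, model_ctx)]
    return {c: 100.0 * sum(d <= c for d in deltas) / len(deltas) for c in cutoffs}


# Toy values for four loops (not data from the paper).
print(delta_rmsd_percentages([0.50, 1.00, 1.20, 2.00], [0.55, 1.30, 2.00, 2.02]))
# → {0.1: 50.0, 0.5: 75.0}
```

Keeping the lists paired matters: the percentages are per loop, so they cannot be recovered from the mean and SD columns alone.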
Fig. 5. Model performance using native and modeled protein structures from the CASP10 dataset. The stem distance error (x-axis) is the difference in stem distance between the modeled and native stem structures. The RMSD error (y-axis) is the difference in backbone RMSD between LoopIng_Native and LoopIng_Model.
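The two axes of Fig. 5 are simple differences. The Python sketch below assumes that the stem distance is the CA–CA distance between the two residues flanking the loop and that both errors are taken as "model minus native"; the caption confirms neither the atom choice nor the sign, and the function names are hypothetical.

```python
import math

Coord = tuple[float, float, float]  # x, y, z in Å


def stem_distance(n_stem: Coord, c_stem: Coord) -> float:
    """Distance (Å) between the CA atoms of the residues flanking the loop (assumed definition)."""
    return math.dist(n_stem, c_stem)


def stem_distance_error(model_stems: tuple[Coord, Coord],
                        native_stems: tuple[Coord, Coord]) -> float:
    """x-axis of Fig. 5: modeled minus native stem distance (sign convention assumed)."""
    return stem_distance(*model_stems) - stem_distance(*native_stems)


def rmsd_error(rmsd_native_ctx: float, rmsd_model_ctx: float) -> float:
    """y-axis of Fig. 5: LoopIng_Model RMSD minus LoopIng_Native RMSD (sign assumed)."""
    return rmsd_model_ctx - rmsd_native_ctx


native = ((0.0, 0.0, 0.0), (6.0, 0.0, 0.0))
model = ((0.0, 0.0, 0.0), (0.0, 0.0, 7.5))
print(stem_distance_error(model, native))  # → 1.5
```

Plotting `rmsd_error` against `stem_distance_error` for each loop gives a scatter of the kind the caption describes: a distorted stem in the modeled context is one plausible reason for the loop RMSD to grow.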