Eric di Luccio, Patrice Koehl.
Abstract
BACKGROUND: The analysis of protein structures provides fundamental insight into most biochemical functions and consequently into the cause and possible treatment of diseases. Because the structures of most known proteins cannot be solved experimentally, for technical reasons or sometimes simply for lack of time, in silico protein structure prediction is expected to step in and generate a more complete picture of the protein structure universe. Molecular modeling of protein structures is a fast-growing field, and a tremendous amount of work has been done since the publication of the very first model. The growth of modeling techniques, and more specifically of those that rely on existing experimental knowledge of protein structures, is intimately linked to the development of high-resolution experimental techniques such as NMR, X-ray crystallography and electron microscopy. This strong connection between experimental and in silico methods is, however, not free of criticism and concern among modelers as well as experimentalists.
Year: 2011 PMID: 21291572 PMCID: PMC3213331 DOI: 10.1186/1471-2105-12-48
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A schematic flow chart of the homology modeling method. Example of the T0295 target (Pf-KsgA from Plasmodium falciparum) from CASP7. The sequence alignment between the sequences of T0295 and the template protein in PDB file 1ZQ9 was generated using ClustalW V.2.0.10 [87] and the model was built using MODELLER 9v5 [36]. The X-ray structure of T0295 (PDB 2H1R) is displayed within brackets for visual comparison with one of the T0295 models.
Figure 2. Impact of a misalignment in homology modeling. The target protein T0295 from CASP7 is modeled using the crystal structure of human dimethyladenosine transferase (PDB 1ZQ9) as a template. In panel (A), we show the structural alignment of the template 1ZQ9 with the best (yellow) and worst (blue) models generated by MODELLER 9v5 for T0295. The structural diversity between these models is low; Thr31, for example, superposes very well in the three structures. In panel (B), we show the effect of an error in the sequence alignment that serves as input to the modeling process. We shifted the alignment by one residue at the level of Thr31 and generated new models for T0295. The superposition of the template 1ZQ9 with both the best and the worst models shows local structural heterogeneity in the loop that contains Thr31. Thr31 is shifted by 3.4 Å because of a single error in the sequence alignment.
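The effect of a one-residue alignment shift on the target-to-template residue pairing can be illustrated with a minimal sketch. The sequences and the helper `residue_pairs` below are hypothetical and chosen only for illustration; they are not the T0295/1ZQ9 alignment, and the sketch does not reproduce the MODELLER model-building step.

```python
def residue_pairs(target_aln, template_aln):
    """Map each aligned target residue (1-based) to its template residue (1-based).

    Both arguments are rows of a pairwise alignment, with '-' marking gaps.
    Target residues aligned to a gap are left out of the mapping.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = {}
    t_idx = p_idx = 0
    for t_char, p_char in zip(target_aln, template_aln):
        if t_char != "-":
            t_idx += 1
        if p_char != "-":
            p_idx += 1
        if t_char != "-" and p_char != "-":
            pairs[t_idx] = p_idx
    return pairs


# Toy alignment (hypothetical sequences): every target residue i pairs with template residue i.
correct = residue_pairs("MKTAYIAKQR", "MKTAYIAKQR")

# Same sequences, but a gap is opened in the target after residue 3
# and compensated by a gap in the template after template residue 7.
shifted = residue_pairs("MKT-AYIAKQR", "MKTAYIA-KQR")

# Target residues whose template partner changed (or disappeared) after the shift.
moved = sorted(i for i in correct if shifted.get(i) != correct[i])
print(moved)
```

In this toy case the residues between the two gaps inherit coordinates from the wrong template positions, which is the kind of local error the figure illustrates.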
Figure 3. The side-chain positioning problem. We compare 10 models constructed for target T0295 with the native structure of T0295, available in the PDB as file 2H1R (shown in red). We show only the backbones of each model and the native structure, except for the side chain of Tyr64. The diversity observed for the conformation of this side chain emphasizes the difficulty of modeling side chains accurately.
Figure 4. Flowchart for computing the H-factor. A. Two of the scoring functions are sequence-based: one compares the secondary structure prediction for the target sequence with the actual secondary structure assignment of the template protein, while the other evaluates the sequence similarity between the target and template sequences. B. The other two scoring functions evaluate the structural models: one quantifies the structural diversity among the models, while the other identifies the Pfam domain in the target protein, collects the structures of these domains from the models to be tested, and compares these structures with those observed for the same domains in the PDB.
Test set based on CASP targets.
| Test case (CASP ID) | Template PDB (resolution, Å) | Sequence identity (%) | | Alignment | |
|---|---|---|---|---|---|
| T0295 | 1ZQ9 (1.90) | 46 | 1-275 | pairwise | 80-86; 163-166; 179- |
| T0375 | 2DCN (2.25); 1RKD (1.84); 1V1A (2.10); | 17 | 1-296 | multiple | 33-37; 75-78; 100-103 |
| T0287 | 1V55 (1.90) | 16 | 1-199 | pairwise | 33-37; 75-78; 100-103; |
Comparing the H-factor with cRMS, DOPE and QMEAN scores to assess models generated for CASP7 targets.
| CASP 7 target | Scoring functions (a) | | | | H-factor (%) | cRMS (Å) (b) | % ID (c) | DOPE (d) | QMEANnorm (e) |
|---|---|---|---|---|---|---|---|---|---|
| T0295 | 1.0 | 1.9 | 1.3 | 3.8 | 19 | 1.67 | 46 | -33940 | 0.735 |
| T0295* (f) | 1.0 | 1.9 | 1.9 | 3.8 | 21 | 1.71 | 46 | -24317 | 0.196 |
| T0375 | 1.1 | 8.6 | 2.9 | 3.8 | 41 | 3.16 | 17 | -28442 | 0.546 |
| T0287 | 7.0 | 8.1 | 7.5 | 7.5 | 75 | 5.81 | 16 | -18444 | 0.285 |
(a) The four scoring functions included in the H-factor measure the quality of the secondary structure prediction, the diversity of the sequence alignment, the structural diversity of the models generated, and the similarity of the predicted structures for the functional domains in the target to the structures of the same domains found in the PDB (see text for details).
(b) Average cRMS (over Cα) between the 20 different models generated and the actual experimental structure for the target.
(c) Sequence identity between the target sequence and the template sequence
(d) DOPE scores from MODELLER [94]
(e) QMEAN normalized score [95]
(f) T0295* is the toy experiment in which the sequence alignment between the target T0295 and its framework has been deliberately modified (see text for details).
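The cRMS of footnote (b) can be sketched as follows, assuming it denotes the coordinate root mean square deviation over Cα atoms after optimal rigid superposition. The superposition here uses the standard Kabsch algorithm; the source does not say which superposition method the authors used, and the function name `crms` and the `(N, 3)` array layout are assumptions made for illustration.

```python
import numpy as np


def crms(model, reference):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition (Kabsch)."""
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if model.shape != reference.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Remove the translation by centering both coordinate sets.
    p = model - model.mean(axis=0)
    q = reference - reference.mean(axis=0)

    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def average_crms(models, reference):
    """Mean cRMS of several models against one reference structure."""
    return float(np.mean([crms(m, reference) for m in models]))
```

With the Cα coordinates of the 20 models and of the experimental structure loaded as arrays over the same residues, `average_crms(models, native)` would give a value comparable to the cRMS column, under the assumptions stated above.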
Comparison of the H-factor with cRMS, DOPE and QMEAN scores to assess models generated for CASP7 targets from the free-modeling category
| CASP 7 target | H-Factor (%) | cRMS (Å) (a) | DOPE (b) | QMEANnorm (c) |
|---|---|---|---|---|
| T0356_D1 | 48 | 4.9 | -8908 | 0.242 |
| T0356_D2 | 48 | 4.9 | -13013 | 0.209 |
| T0316_D3 | 49 | 6.0 | -5488 | 0.176 |
| T0356_D3 | 49 | 4.9 | -9274 | 0.285 |
| T0316_D2 | 51 | 5.8 | -2980 | 0.110 |
| T0307_D1 | 62 | 5.5 | -12623 | 0.385 |
| T0307 | 63 | 6.0 | -13816 | 0.407 |
| T0309 | 65 | 5.5 | -4613 | 0.180 |
| T0314 | 65 | 5.4 | -9248 | 0.372 |
| T0296 | 67 | 5.6 | -35782 | 0.260 |
| T0299 | 67 | 5.9 | -17649 | 0.334 |
| T0306 | 68 | 4.8 | -7884 | 0.287 |
| T0316_D1 | 68 | 5.6 | -16102 | 0.214 |
| T0299_D2 | 69 | 6.0 | -7742 | 0.302 |
| T0316 | 69 | 3.8 | -31249 | 0.172 |
| T0299_D1 | 70 | 6.2 | -7641 | 0.250 |
(a) cRMS (over Cα) between the average model and the actual experimental structure for the target.
(b) DOPE scores from MODELLER [94]
(c) QMEAN normalized score [95]
H-factor applied to NMR structures with 20 models or more.
| PDB id | Scoring function (a) | H-factor (%) | cRMS (Å) (b) | |||
|---|---|---|---|---|---|---|
| 1.03 | n/a | 2.13 | 3.43 | |||
| 1.24 | n/a | 1.48 | 3.8 | |||
| 0.74 | n/a | 2.27 | 4.5 | |||
| 1.38 | n/a | 2.33 | 4.67 | |||
| 0.99 | n/a | 2.38 | 4.67 | |||
| 0.37 | n/a | 2.67 | 4.8 | |||
(a) See text for the description of the scoring functions. Only the two structure-based functions are used here; they measure the structural dispersion of the models considered, as well as their similarity to homologous structures in the PDB. The two sequence-based scores usually present in the H-factor are meaningless for this test set.
(b) Average RMS between each model and the "mean" structure computed from all the models.
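The quantity in footnote (b) can be sketched as below, assuming the models of an NMR ensemble are already superposed in a common frame and share the same atoms in the same order. The function name `mean_structure_rms` and the `(M, N, 3)` array layout are assumptions for illustration; the source does not describe how the authors superposed the models before averaging.

```python
import numpy as np


def mean_structure_rms(models):
    """Average RMS deviation of each model from the mean structure.

    `models` has shape (M, N, 3): M models, N atoms, already superposed.
    """
    models = np.asarray(models, dtype=float)
    if models.ndim != 3 or models.shape[2] != 3:
        raise ValueError("expected an array of shape (M, N, 3)")

    # "Mean" structure: per-atom average position over all models.
    mean_structure = models.mean(axis=0)

    # Squared distance of every atom in every model to its mean position.
    sq_dist = ((models - mean_structure) ** 2).sum(axis=2)

    # RMS per model, then the average over models.
    per_model_rms = np.sqrt(sq_dist.mean(axis=1))
    return float(per_model_rms.mean())
```

For example, two models that differ by a rigid 2 Å translation along one axis each lie 1 Å from the mean structure, so the function returns 1.0.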
Figure 5. Comparison between the H-factor and the experimental R-factor and R-free. A set of 445 randomly chosen X-ray crystallography structures with 6 or more identical chains was scored with the H-factor, and the results are compared to both the experimental R-factors and R-frees. R-values are scaled to the interval [0, 100%] to allow for direct comparison. Note that we computed the H-factor from only two of the scoring functions, since the other two scores that usually enter the calculation of the H-factor are meaningless for this test set.
Figure 6. Comparison between the H-factor and ProSA. ProSA-Web analysis of the average model generated for three CASP7 targets: A. T0287 and T0375. B. T0295 and T0295*. The test case T0295* corresponds to T0295 with a modified sequence alignment used to generate the models (see Figure 2 for details). The H-factors for the same models are given for comparison.