Ingolf Sommer, Stefano Toppo, Oliver Sander, Thomas Lengauer, Silvio C E Tosatto.
Abstract
BACKGROUND: In protein structure prediction, considerable effort has recently gone into developing Model Quality Assessment Programs (MQAPs), which distinguish high-quality protein structure models from inferior ones. Here, we propose a new method that uses an MQAP to improve the quality of models. Given a target sequence and a template structure, we construct a number of different alignments and the corresponding models for the sequence. Each model is scored with an MQAP, and the scores are used to choose the most promising model. We also suggest an SVM-based selection scheme that combines the MQAP partial potentials in order to improve model selection.
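The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained SVM is replaced by a fixed linear weighting of partial potentials, the potentials are assumed to be energy-like (lower is better), and every name and number below is hypothetical.

```python
from typing import Dict, Sequence


def combined_score(potentials: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of partial potentials (a hypothetical linear combination)."""
    if len(potentials) != len(weights):
        raise ValueError("potentials and weights must have the same length")
    return sum(p * w for p, w in zip(potentials, weights))


def select_model(models: Dict[str, Sequence[float]], weights: Sequence[float]) -> str:
    """Return the name of the model with the lowest combined score.

    Assumption: the potentials are energy-like, so a lower score is better.
    """
    return min(models, key=lambda name: combined_score(models[name], weights))


# Made-up partial potentials for three alternative models of one target.
candidates = {
    "default": [-1.0, 0.5, 0.2],
    "alt_a": [-1.4, 0.6, 0.1],
    "alt_b": [-0.8, 0.1, 0.3],
}
weights = [1.0, 0.5, 0.5]  # hypothetical weights standing in for a trained model
best = select_model(candidates, weights)
```

With these made-up values the alternative `alt_a` has the lowest combined score and would be chosen over the default model.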
Year: 2006 PMID: 16872519 PMCID: PMC1579234 DOI: 10.1186/1471-2105-7-364
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The parameters used in the different model generating procedures.
| Model Generating Procedure | Description |
| Arby default | Generates the default Arby models [23], exactly one for each target. Default parameters are: gap insertion 14.7, gap extension 0.37, substitution matrix Blosum62, Henikoff position-specific sequence weighting, relative weight of secondary structure is 0.24. |
| PVS | Generates models resulting from alignments with parameters multiplied by a factor varying from 0.5 to 1.5 with a step width of 0.25. This results in an ensemble of alternative models for each target. |
| PVH | Generates models resulting from alignments with parameters multiplied by a factor varying from 0.2 to 2.4 with a step width of 0.2. This results in an ensemble of alternative models for each target. |
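As a rough illustration of the two factor ranges in the table above, the sketch below enumerates the multiplication factors and applies one of them to the default alignment parameters. The table does not say which parameters are scaled or whether they are scaled jointly or one at a time, so scaling all three numeric defaults by the same factor is purely an assumption, and the function and key names are hypothetical.

```python
def factor_grid(start: float, stop: float, step: float) -> list:
    """Inclusive grid of multiplication factors, rounded to avoid float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]


# Default Arby parameters as listed in the table.
DEFAULTS = {
    "gap_insertion": 14.7,
    "gap_extension": 0.37,
    "secondary_structure_weight": 0.24,
}


def scaled_parameters(factor: float) -> dict:
    """Scale every default parameter by the same factor (an assumption)."""
    return {name: round(value * factor, 6) for name, value in DEFAULTS.items()}


small_grid = factor_grid(0.5, 1.5, 0.25)  # 5 factors
large_grid = factor_grid(0.2, 2.4, 0.2)   # 12 factors
half_scaled = scaled_parameters(0.5)
```

The narrower range yields 5 factors and the wider range 12, and a factor of 1.0 reproduces the default parameters.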
Summary statistics for the model generation procedures.
| Model Generation Procedure | Models per Target (min) | Models per Target (median) | Models per Target (mean) | Models per Target (max) | Overall Number of Models (∑) |
| Arby default | 1 | 1.0 | 1.0 | 1 | 1612 |
| PVS | 1 | 3.0 | 4.3 | 28 | 6980 |
| PVH | 1 | 7.0 | 12.6 | 136 | 20234 |
How good are the generated models? Description of the distributions of the TM-score quality.
| Model Generation Procedure | | | | | |
| PVS | 0.36 | 0.22 | -0.013 | 0.47 | 0.019 |
| PVH | 0.51 | 0.26 | -0.031 | 0.59 | 0.026 |
Figure 1. Overview of model quality improvement with respect to model difficulty. Left: PVS; right: PVH. Each dot corresponds to a model; the x-coordinate is the TM-score of the corresponding default Arby model and the y-coordinate is the TM-score improvement over this default model. Smoothed quantile lines are shown for the 10% (lower dashed), 50% (middle), and 90% (upper dashed) quantiles of the models within a sliding window of size 0.15. Black lines represent all models, red lines the models selected using FRST, and green lines the models selected using the SVM approach. For the smoothing, the quantiles are evaluated at 1000 equidistant points and the results are smoothed with a lowess function (local linear scatter plot smoother). Interpretation: The TM-score of the Arby default gives an indication of how difficult it is to find the right template for a target. For the selection methods random, FRST, and SVM, this plot shows the potential improvement with respect to the difficulty of the target. For PVH, more of the generated models score below the default. For both PVS and PVH, the SVM selection performs better than FRST selection, and FRST performs better than random.
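The sliding-window quantile lines described in the caption can be sketched as follows. This is an illustration under assumptions rather than the authors' plotting code: the window of size 0.15 is taken to be centred on each grid point, the grid spans the observed range of x values, and the final lowess smoothing step is omitted. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted, non-empty sequence."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def sliding_quantiles(
    xs: Sequence[float],
    ys: Sequence[float],
    q: float,
    window: float = 0.15,
    points: int = 1000,
) -> List[Tuple[float, float]]:
    """Quantile q of the y values whose x lies in a window centred on each grid point."""
    lo, hi = min(xs), max(xs)
    half = window / 2
    result = []
    for i in range(points):
        center = lo + (hi - lo) * i / (points - 1)
        inside = sorted(y for x, y in zip(xs, ys) if abs(x - center) <= half)
        if inside:  # grid points with an empty window are skipped
            result.append((center, quantile(inside, q)))
    return result


# Made-up data: default TM-score (x) and TM-score improvement (y) per model.
xs = [0.1, 0.2, 0.3, 0.4, 0.5]
ys = [0.0, 0.1, -0.1, 0.2, 0.05]
median_line = sliding_quantiles(xs, ys, 0.5, window=0.15, points=5)
```

Calling the function with q set to 0.1, 0.5, and 0.9 would give the three raw quantile curves, which could then be passed to a smoother of choice.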
How well does model selection work? Description of the distributions when selecting models according to the FRST potentials and the SVM. e is the relative frequency of targets for which a selection procedure suggests improved models. <, =, and > are the relative frequencies of selected models with decreased, equal, or increased TM-score quality, respectively. min qim and max qim are the minimal and maximal quality improvements achieved per target. The last two columns give the average quality improvement over all targets and the average quality improvement for the targets that the selection procedure suggests improved models for.
| PVS | e | < | = | > | min qim | max qim | mean qim (all targets) | mean qim (targets in e) |
| FRST | 0.51 | 0.23 | 0.50 | 0.27 | -0.51 | 0.35 | 0.0016 | 0.0031 |
| SVM | 0.40 | 0.14 | 0.61 | 0.25 | -0.21 | 0.29 | 0.0064 | 0.0160 |
| PVH | e | < | = | > | min qim | max qim | mean qim (all targets) | mean qim (targets in e) |
| FRST | 0.70 | 0.35 | 0.31 | 0.34 | -0.43 | 0.29 | 0.00047 | 0.00068 |
| SVM | 0.58 | 0.22 | 0.43 | 0.35 | -0.27 | 0.35 | 0.00774 | 0.01339 |
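The statistics in these tables can be computed from per-target TM-scores along the following lines. This is a sketch under assumptions: the quality improvement is taken to be the selected model's TM-score minus the default model's TM-score, and a target counts toward e when the selected model differs from the default (the hypothetical `changed` flag). The data are made up.

```python
from typing import Dict, Sequence


def selection_summary(
    default_tm: Sequence[float],
    selected_tm: Sequence[float],
    changed: Sequence[bool],
) -> Dict[str, float]:
    """Summarize the per-target quality improvement (selected minus default).

    Assumption: changed[i] is True when the selected model for target i
    is not the default model; these targets define e.
    """
    qim = [s - d for d, s in zip(default_tm, selected_tm)]
    n = len(qim)
    changed_qim = [q for q, c in zip(qim, changed) if c]
    return {
        "e": len(changed_qim) / n,
        "<": sum(q < 0 for q in qim) / n,
        "=": sum(q == 0 for q in qim) / n,
        ">": sum(q > 0 for q in qim) / n,
        "min_qim": min(qim),
        "max_qim": max(qim),
        "mean_all": sum(qim) / n,
        "mean_changed": sum(changed_qim) / len(changed_qim) if changed_qim else 0.0,
    }


# Made-up TM-scores for four targets.
summary = selection_summary(
    default_tm=[0.50, 0.40, 0.30, 0.60],
    selected_tm=[0.50, 0.45, 0.25, 0.60],
    changed=[False, True, True, True],
)
```

When unchanged targets contribute zero improvement, the mean over the changed targets equals the overall mean divided by e. The tabulated values are consistent with this, for example 0.0064 / 0.40 = 0.016 for the first SVM row.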
Figure 2. (Left) Average increase in TM-score for ranges of difficulty. Targets are binned according to the TM-score of the default Arby model, and the average increase in quality is plotted within each bin. Bins are enumerated horizontally; the two outer bins were merged with their neighbors because each contained fewer than 100 target samples. Models are selected from PVS using the SVM. For comparison, the average increase in quality obtained on this benchmark set by performing loop modeling is 0.003. (Right) Maximum increase in TM-score for the same ranges of difficulty. The maximum increase in quality, max qim, within each bin is shown as a line above the box representing the average increase (the same values as on the left, on a different scale).
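The binning behind this figure can be sketched as below. The caption does not give the bin edges, so the edges here are hypothetical. The threshold of 100 samples comes from the caption, but the example call lowers it so that the made-up data stay small.

```python
from typing import List, Sequence, Tuple


def binned_improvement(
    default_tm: Sequence[float],
    qim: Sequence[float],
    edges: Sequence[float],
    min_count: int = 100,
) -> List[Tuple[int, float, float]]:
    """Return (count, mean qim, max qim) per bin of default TM-score.

    Each outer bin is merged once into its neighbour if it holds fewer
    than min_count targets. The bin edges are an assumption.
    """
    bins: List[List[float]] = [[] for _ in range(len(edges) - 1)]
    for tm, q in zip(default_tm, qim):
        for i in range(len(bins)):
            is_last = i == len(bins) - 1
            # Half-open bins, except that the last bin includes its upper edge.
            if edges[i] <= tm < edges[i + 1] or (is_last and tm == edges[-1]):
                bins[i].append(q)
                break
    if len(bins) > 1 and len(bins[0]) < min_count:
        bins[1] = bins[0] + bins[1]
        bins.pop(0)
    if len(bins) > 1 and len(bins[-1]) < min_count:
        bins[-2] = bins[-2] + bins[-1]
        bins.pop()
    return [(len(b), sum(b) / len(b), max(b)) for b in bins if b]


# Made-up data: default TM-score and quality improvement for six targets.
stats = binned_improvement(
    default_tm=[0.1, 0.3, 0.35, 0.6, 0.65, 0.9],
    qim=[0.2, 0.0, 0.1, -0.1, 0.3, 0.05],
    edges=[0.0, 0.25, 0.5, 0.75, 1.0],
    min_count=2,
)
```

Each returned tuple holds the number of targets in a bin together with the average and maximum improvement, which correspond to the left and right panels respectively.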