Shweta B Shah, Nikolaos V Sahinidis.
Abstract
Protein structure alignment is the problem of determining an assignment between the amino-acid residues of two given proteins in a way that maximizes a measure of similarity between the two superimposed protein structures. By identifying geometric similarities, structure alignment algorithms provide critical insights into protein functional similarities. Existing structure alignment tools adopt a two-stage approach, decoupling the assignment evaluation and structure superposition problems and iterating between them. We introduce a novel approach, SAS-Pro, which addresses assignment evaluation and structure superposition simultaneously by formulating the alignment problem as a single bilevel optimization problem. The new formulation does not require sequentiality constraints, thus generalizing the scope of the alignment methodology to include non-sequential protein alignments. We employ derivative-free optimization methodologies to search for the global optimum of the highly nonlinear and non-differentiable RMSD function encountered in the proposed model. Alignments obtained with SAS-Pro have better RMSD values and larger lengths than those obtained from other alignment tools. For non-sequential alignment problems, SAS-Pro leads to alignments with a high degree of similarity to known reference alignments. The source code of SAS-Pro is available for download at http://eudoxus.cheme.cmu.edu/saspro/SAS-Pro.html.
Year: 2012 PMID: 22662161 PMCID: PMC3360771 DOI: 10.1371/journal.pone.0037493
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Contour plot of the landscape of the RMSD function for 1B00 and 1DBW proteins in the rotation angles plane.
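The bilevel idea in the abstract (an outer search over the superposition, with an inner assignment problem solved for every candidate superposition) can be illustrated with a toy sketch. This is not the SAS-Pro algorithm: it assumes a brute-force inner assignment over all one-to-one matchings (feasible only for a handful of points), Z-Y-X Euler angles for the rotation, centroid centering to handle translation, and a coarse grid followed by compass search as a stand-in derivative-free method. All function names are hypothetical.

```python
import itertools
import math


def rotation_matrix(a, b, c):
    """Rotation Rz(a) * Ry(b) * Rx(c) as a 3x3 nested list (assumed parameterization)."""
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    return [
        [ca * cb, ca * sb * sc - sa * cc, ca * sb * cc + sa * sc],
        [sa * cb, sa * sb * sc + ca * cc, sa * sb * cc - ca * sc],
        [-sb, cb * sc, cb * cc],
    ]


def centered(points):
    """Shift points so their centroid is at the origin."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - centroid[k] for k in range(3)) for p in points]


def rotate(points, rot):
    """Apply a 3x3 rotation matrix to every point."""
    return [tuple(sum(rot[i][k] * p[k] for k in range(3)) for i in range(3))
            for p in points]


def best_assignment(fixed, moved):
    """Inner problem: exhaustive one-to-one assignment minimizing RMSD.

    No ordering constraint is imposed, so the match may be non-sequential.
    """
    n = len(fixed)
    sq = [[sum((f[k] - m[k]) ** 2 for k in range(3)) for m in moved] for f in fixed]
    best_perm, best_sum = None, math.inf
    for perm in itertools.permutations(range(len(moved)), n):
        total = sum(sq[i][perm[i]] for i in range(n))
        if total < best_sum:
            best_perm, best_sum = perm, total
    return math.sqrt(best_sum / n), best_perm


def rmsd_at(angles, fixed, moving):
    """Outer objective: RMSD of the best assignment for a given rotation."""
    return best_assignment(fixed, rotate(moving, rotation_matrix(*angles)))


def align(fixed, moving, grid_step=math.pi / 6, tol=1e-4, max_rounds=200):
    """Derivative-free outer search: coarse grid, then compass refinement."""
    fixed, moving = centered(fixed), centered(moving)
    n_full = round(2 * math.pi / grid_step)
    n_half = round(math.pi / grid_step)
    grid_full = [-math.pi + i * grid_step for i in range(n_full)]
    grid_half = [-math.pi / 2 + i * grid_step for i in range(n_half + 1)]
    angles = min(itertools.product(grid_full, grid_half, grid_full),
                 key=lambda ang: rmsd_at(ang, fixed, moving)[0])
    value = rmsd_at(angles, fixed, moving)[0]
    step = grid_step / 2
    for _ in range(max_rounds):
        if step <= tol:
            break
        improved = False
        for k in range(3):
            for sign in (1.0, -1.0):
                trial = list(angles)
                trial[k] += sign * step
                trial_value = rmsd_at(tuple(trial), fixed, moving)[0]
                if trial_value < value:
                    value, angles, improved = trial_value, tuple(trial), True
        if not improved:
            step /= 2  # shrink the pattern when no neighbor improves
    rmsd, perm = rmsd_at(angles, fixed, moving)
    return rmsd, perm, angles
```

Here `perm[i]` is the index of the point in `moving` matched to `fixed[i]`. Because the inner minimum switches between assignments as the angles change, the outer objective is non-smooth, which is one reason a search that uses only function values is a natural fit for this kind of landscape.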
Average (standard deviation) RMSD value, SI score, SAS score, and match with reference alignments for the Sokol and Skolnick data sets for similar and dissimilar protein pairs.
| Measure | Sokol set, similar | Sokol set, dissimilar | Skolnick set, similar | Skolnick set, dissimilar |
| RMSD (Å) | 0.60 (0.4) | 2.9 (1.45) | 1.72 (0.78) | 3.94 (0.6) |
| SI (Å) | 1.17 (0.4) | 7.04 (1.45) | 3.15 (1.23) | 9.77 (3.9) |
| SAS (Å) | 1.61 (0.7) | 7.37 (1.78) | 2.19 (0.89) | 8.51 (2.9) |
| % agreement with optimal alignment | 96 | N.A. | 96 | N.A. |
Comparison of SAS-Pro with CE, SSM, and STSA for the similar protein pairs of the Sokol and Skolnick data sets using RMSD, SI, and SAS measures.
| % of problems where SAS-Pro is better or at par |
| Solver | Better: RMSD | Better: SI | Better: SAS | At par: RMSD | At par: SI | At par: SAS |
| CE | 57 | 51 | 51 | 12 | 12 | 12 |
| SSM | 47 | 36 | 36 | 12 | 12 | 12 |
| STSA | 44 | 40 | 40 | 21 | 21 | 21 |
| Average (standard deviation) improvement obtained by SAS-Pro (Å) |
| Solver | Better: RMSD | Better: SI | Better: SAS | At par: RMSD | At par: SI | At par: SAS |
| CE | 0.45 (0.46) | 0.3 (0.41) | 0.3 (0.42) | N.A. | N.A. | N.A. |
| SSM | 0.26 (0.2) | 0.2 (0.12) | 0.16 (0.1) | N.A. | N.A. | N.A. |
| STSA | 0.4 (0.15) | 0.4 (0.15) | 0.21 (0.1) | N.A. | N.A. | N.A. |
The table presents the percentage of problems where SAS-Pro performed better than, or at par with, CE, SSM, and STSA. In addition, it presents the average improvement in the RMSD, SI, and SAS scores on the problems where SAS-Pro performed better than the other solver.
Figure 2. Distribution of SAS values obtained by SAS-Pro for similar and dissimilar proteins in the Skolnick data set.
The means (standard deviations) for the similar and dissimilar protein pairs are 2.19 (0.89) and 8.51 (2.9) Å, respectively.
Comparison of performance of alignment tools for aligning 2LH3:A and 2HPD:A proteins.
| Alignment tool | RMSD (Å) | N | SAS |
| SAS-Pro | 3.17 | 126 | 2.5 |
| SARF2 | 3.05 | 108 | 2.8 |
| STSA | 3.37 | 117 | 2.9 |
| STRUCTAL | 2.27 | 56 | 4 |
| CE | 4.05 | 91 | 4.4 |
| DALI | 4.8 | 87 | 5.5 |
(All results, except SAS-Pro, taken from [33].)
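The SAS column in the table above can be related to the other two columns with a short calculation. The sketch below assumes the commonly used definition SAS = 100 × RMSD / N, with N taken to be the number of aligned residues; the source does not state the formula here, but this definition matches the tabulated SAS values to within rounding. The function name `sas_score` is hypothetical.

```python
def sas_score(rmsd, n_aligned):
    """Structural alignment score under the assumed definition 100 * RMSD / N."""
    if n_aligned <= 0:
        raise ValueError("number of aligned residues must be positive")
    return 100.0 * rmsd / n_aligned


# (RMSD in angstroms, N) copied from the 2LH3:A vs 2HPD:A comparison table.
results = {
    "SAS-Pro": (3.17, 126),
    "SARF2": (3.05, 108),
    "STSA": (3.37, 117),
    "STRUCTAL": (2.27, 56),
    "CE": (4.05, 91),
    "DALI": (4.8, 87),
}

# Lower SAS is better: it rewards low RMSD and long alignments together.
scores = {tool: sas_score(rmsd, n) for tool, (rmsd, n) in results.items()}
ranked = sorted(scores, key=scores.get)

for tool in ranked:
    print(f"{tool:9s} SAS = {scores[tool]:.2f}")
```

Normalizing by alignment length is what lets STRUCTAL's low RMSD over only 56 residues rank below SAS-Pro's slightly higher RMSD over 126 residues.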
Figure 3. Box and whisker plot for the performance of different alignment tools for the RIPC data set.
The red line represents the mean and the dot represents the median of the box. (All results, except for SAS-Pro and CE, were taken from [33].)
Figure 4. Alignments obtained by SAS-Pro for the RIPC data set.
These alignments are in 100% agreement with the reference alignments [45].