Ivaylo Ilinkin, Jieping Ye, Ravi Janardan.
Abstract
BACKGROUND: An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein that captures common substructures present in the given proteins. The algorithm represents each protein as a sequence of triples of coordinates of the alpha-carbon atoms along the backbone. It then iteratively computes a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins.
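The representation and objective described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard fixed-column PDB `ATOM` record layout for reading alpha-carbon coordinates. The function names (`read_ca_trace`, `sum_of_distances`) and the form of the residue correspondences (lists of index pairs) are hypothetical choices made for illustration, not the authors' implementation.

```python
import math

def read_ca_trace(pdb_lines):
    """Return the alpha-carbon trace as a list of (x, y, z) triples.

    Assumes fixed-column PDB ATOM records: atom name in columns 13-16,
    x, y, z in columns 31-38, 39-46 and 47-54. Alternate locations and
    multiple models are not handled in this sketch.
    """
    trace = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            trace.append((x, y, z))
    return trace

def sum_of_distances(consensus, traces, correspondences):
    """Sum the distances between matched consensus and protein residues.

    correspondences[k] is a list of (i, j) pairs meaning that consensus
    residue i is matched to residue j of traces[k] (an assumed encoding).
    Requires Python 3.8+ for math.dist.
    """
    total = 0.0
    for trace, pairs in zip(traces, correspondences):
        for i, j in pairs:
            total += math.dist(consensus[i], trace[j])
    return total
```

In this sketch the traces passed to `sum_of_distances` are taken to be already transformed; computing the rotations and translations themselves is left out.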
Year: 2010 PMID: 20122279 PMCID: PMC2829528 DOI: 10.1186/1471-2105-11-71
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Algorithm MAPSCI: Multiple Alignment of Protein Structures and Consensus Identification
| 1. Choose initial consensus structure |
| 2. Do |
| 3. if |
| 4. else use standard dynamic programming to align |
| 5. |
| 6. Compute correspondence |
| 7. Compute optimal translation matrix |
| 8. Post-process ℳ |
| 9. Compute new consensus structure |
| 10. Until |
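The loop structure of the algorithm box (align each protein to the consensus, derive a correspondence, update the consensus, repeat) can be sketched in Python as follows. Several parts are assumptions made for illustration only: the match score `cutoff - distance`, the zero gap penalty, the consensus update by averaging matched residues, and the stopping rule are not taken from the source, and the transformation and post-processing steps are omitted entirely.

```python
import math

def align(consensus, protein, cutoff=4.0):
    """Global dynamic-programming alignment of two coordinate sequences.

    Assumed scoring: a matched pair scores cutoff - distance; gaps cost 0.
    Returns the matched (consensus_index, protein_index) pairs in order.
    """
    n, m = len(consensus), len(protein)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + cutoff - math.dist(consensus[i - 1], protein[j - 1])
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        match = best[i - 1][j - 1] + cutoff - math.dist(consensus[i - 1], protein[j - 1])
        if best[i][j] == match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs

def update_consensus(consensus, proteins, correspondences):
    """Assumed update: average the residues matched to each consensus position."""
    matched = [[] for _ in consensus]
    for protein, pairs in zip(proteins, correspondences):
        for i, j in pairs:
            matched[i].append(protein[j])
    new = []
    for old, points in zip(consensus, matched):
        if points:
            new.append(tuple(sum(c) / len(points) for c in zip(*points)))
        else:
            new.append(old)  # keep positions that nothing was matched to
    return new

def iterate_consensus(consensus, proteins, max_iter=20, tol=1e-6):
    """Repeat align/update until the consensus moves less than tol."""
    for _ in range(max_iter):
        correspondences = [align(consensus, p) for p in proteins]
        new = update_consensus(consensus, proteins, correspondences)
        shift = max(math.dist(a, b) for a, b in zip(consensus, new))
        consensus = new
        if shift < tol:
            break
    return consensus
```

The initial consensus is passed in as an argument here because the choice of starting structure is a separate design decision.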
Figure 1. Web server screenshots. Screenshots from the web server: main page (top left), results page (bottom left), structure view (top right), sequence view (bottom right).
Remote access to the server
An example of using the programming language Python to retrieve the transformed coordinates (in PDB format) for the multiple alignment of the structures from the HOMSTRAD CUB family. Additional examples and the complete set of options for remote access can be found at the server web page (see the Availability section).
Figure 2. HOMSTRAD dataset comparison. Comparison based on the strict core metric (expressed as a percentage of the size of the shortest protein) and the strict core RMSD on the HOMSTRAD dataset.
Figure 3. SABmark dataset comparison. Comparison based on the strict core metric (expressed as a percentage of the size of the shortest protein) and the strict core RMSD on the SABmark dataset.
Benchmark datasets performance
| Method | HOMSTRAD: Average Core (%) | HOMSTRAD: Average Core RMSD | SABmark: Average Core (%) | SABmark: Average Core RMSD |
|---|---|---|---|---|
| MAPSCI | 70.99 | 0.83 | 48.89 | 1.00 |
| MAMMOTH | 66.74 | 0.83 | 44.55 | 0.99 |
| MATT | 63.79 | 0.85 | 47.88 | 0.99 |
Statistics for the performance of the three methods on the benchmark datasets. The subscripts in the Average Core RMSD columns indicate how many values were used in computing the statistics, since the algorithms failed to compute a core for some of the data sets. For the Average Core (%) columns all reported values were used and therefore n = 232 and n = 425 for the HOMSTRAD and SABmark datasets, respectively.
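The distinction between the two averages can be made concrete with a short Python sketch. It assumes each family is summarized as a hypothetical triple (core size, length of the shortest protein, core RMSD or `None` when no core was found); the numbers in the usage lines are made up for illustration and are not results from the paper.

```python
def summarize(families):
    """Average core percentage over all families; average RMSD only where a core exists.

    families: list of (core_size, shortest_length, core_rmsd_or_None).
    Returns (average_core_percent, average_core_rmsd, n_rmsd_values).
    """
    core_percents = [100.0 * core / shortest for core, shortest, _ in families]
    rmsds = [rmsd for _, _, rmsd in families if rmsd is not None]
    avg_core = sum(core_percents) / len(core_percents)
    avg_rmsd = sum(rmsds) / len(rmsds) if rmsds else None
    return avg_core, avg_rmsd, len(rmsds)

# Illustrative values only: the second family has no core, so it counts
# as 0% in the core average but is excluded from the RMSD average.
toy_families = [(50, 100, 0.8), (0, 80, None), (30, 60, 1.2)]
avg_core, avg_rmsd, n_rmsd = summarize(toy_families)
```

Here every family contributes to the core percentage, while the RMSD average uses a smaller n, which is what the subscripts in the table are described as recording.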
Figure 4. Execution time. The execution time of MAPSCI for all families in the benchmark datasets, plotted against the total number of residues per family.
Figure 5. Consensus choice comparison. Comparison between the sizes of the aligned cores for different choices of initial consensus protein.