Jimin Pei, Bong-Hyun Kim, Nick V Grishin.
Abstract
Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural information to guide sequence alignments constructed by our MSA program PROMALS. The resulting tool, PROMALS3D, automatically identifies homologs with known 3D structures for the input sequences, derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about input proteins and their homologs. PROMALS3D can also align sequences of multiple input structures, with the output representing a multiple structure-based alignment refined in combination with sequence constraints. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both sequences and structures of proteins. PROMALS3D outperforms a number of existing methods for constructing multiple sequence or structural alignments using both reference-dependent and reference-independent evaluation methods.
Year: 2008 PMID: 18287115 PMCID: PMC2367709 DOI: 10.1093/nar/gkn072
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flowchart of the PROMALS3D method.
Table 1. Tests on SABmark database
| Method | SABmark-twi (209/10667) Q-score | SABmark-twi (209/10667) GDT-TS | SABmark-sup (425/19092) Q-score | SABmark-sup (425/19092) GDT-TS |
|---|---|---|---|---|
| PROMALS3D (D + S) | 0.602 | | 0.805 | 0.417 |
| PROMALS3D (F + S) | 0.555 | 0.220 | 0.779 | 0.390 |
| PROMALS3D (T + S) | 0.540 | 0.249 | 0.766 | 0.412 |
| PROMALS3D (D + F + S) | 0.611 | 0.256 | | 0.414 |
| PROMALS3D (D + T + S) | 0.603 | | 0.805 | |
| PROMALS3D (F + T + S) | 0.595 | 0.251 | 0.800 | 0.413 |
| PROMALS3D (D + F + T + S) | | 0.260 | | 0.420 |
| 3DCoffee (D + S) | 0.574 | 0.252 | 0.802 | |
| 3DCoffee (SAP + S) | 0.553 | 0.222 | 0.786 | 0.390 |
| Expresso webserver | 0.508 | 0.206 | – | – |
| PROMALS3D (D/2 + S) | 0.475 | 0.198 | 0.716 | 0.364 |
| 3DCoffee (D/2 + S) | 0.261 | 0.100 | 0.573 | 0.294 |
| 3DCoffee (D/2 + SAP) | 0.255 | 0.095 | 0.572 | 0.289 |
| PROMALS | 0.393 | 0.154 | 0.665 | 0.336 |
| SPEM | 0.326 | 0.124 | 0.628 | 0.318 |
| MUMMALS | 0.196 | 0.081 | 0.522 | 0.278 |
| ProbCons | 0.166 | 0.058 | 0.485 | 0.246 |
| MAFFT-linsi | 0.184 | 0.070 | 0.510 | 0.264 |
| MUSCLE | 0.136 | 0.056 | 0.433 | 0.233 |
| T-Coffee | 0.134 | 0.048 | 0.429 | 0.223 |
| ClustalW | 0.127 | 0.057 | 0.390 | 0.221 |
| MUSTANG | 0.550 | 0.230 | 0.779 | 0.404 |
| PROMALS3D (D) | 0.594 | 0.252 | 0.802 | 0.415 |
The first 13 methods use both sequence and 3D structural information to construct MSAs. The last two methods assemble multiple alignments solely from structural constraints. The remaining methods construct multiple alignments using only sequence information (PROMALS and SPEM also use predicted secondary structures). The abbreviations in parentheses after the method names are: ‘D’, DaliLite structural constraints; ‘F’, FAST structural constraints; ‘T’, TM-align structural constraints; ‘S’, sequence information; ‘SAP’, SAP structural alignments; ‘D/2’, DaliLite alignments for half of the sequences; ‘SAP/2’, SAP alignments for half of the sequences. Q-score is the alignment quality score, defined as the number of correctly aligned residue pairs divided by the total number of residue pairs in the reference alignment. GDT-TS is a reference-independent measure of alignment quality based on the structural similarity of two structures superimposed according to a test alignment. ‘twi’ stands for the ‘twilight-zone’ set and ‘sup’ stands for the ‘superfamilies’ set. The numbers of multiple alignment tests and pairwise reference alignments are shown in parentheses. The best scores are in bold letters.
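The Q-score definition can be illustrated with a minimal Python sketch for a single pairwise alignment. The gapped-string representation and the helper names `aligned_pairs` and `q_score` are assumptions made for illustration; they are not part of PROMALS3D or of the benchmark software. Benchmarks may also restrict scoring to designated regions of the reference alignment, which this sketch does not model.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs aligned in two gapped rows.

    i and j are 0-based positions in the ungapped sequences (hypothetical
    convention chosen for this sketch).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))  # both columns hold residues: an aligned pair
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def q_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces.

    `test` and `reference` are (row_a, row_b) tuples of gapped strings for the
    same two sequences.
    """
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned residue pairs")
    test_pairs = aligned_pairs(*test)
    return len(ref_pairs & test_pairs) / len(ref_pairs)


# Toy data, invented for illustration only.
reference = ("ACDE-F", "AC-EGF")
test = ("ACDEF", "ACEGF")
print(q_score(test, reference))
```

In this toy case the reference contains four aligned residue pairs and the ungapped test alignment reproduces three of them. For a multiple alignment, the same count would be taken over the pairwise reference alignments and the scores averaged, as the tables report.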
Figure 2. The effect of using distant homolog3Ds on the SABmark ‘superfamilies’ set.
Table 2. Test on PREFAB database
| Method | Set 1 (0.121/420) | Set 2 (0.185/421) | Set 3 (0.248/420) | Set 4 (0.527/421) | All (0.270/1682) |
|---|---|---|---|---|---|
| PROMALS3D (D + S) | 0.817 | 0.879 | | 0.954 | 0.893 |
| PROMALS3D (F + S) | 0.745 | 0.850 | 0.896 | 0.947 | 0.859 |
| PROMALS3D (T + S) | 0.766 | 0.856 | 0.902 | 0.950 | 0.869 |
| PROMALS3D (D + F + S) | 0.818 | 0.886 | 0.919 | 0.952 | 0.894 |
| PROMALS3D (D + T + S) | 0.834 | 0.884 | 0.922 | 0.953 | 0.898 |
| PROMALS3D (F + T + S) | 0.794 | 0.875 | 0.909 | 0.952 | 0.883 |
| PROMALS3D (D + F + T + S) | | | 0.917 | | |
| PROMALS | 0.570 | 0.771 | 0.875 | 0.946 | 0.790 |
| SPEM | 0.536 | 0.756 | 0.865 | 0.940 | 0.774 |
| MUMMALS | 0.457 | 0.693 | 0.834 | 0.939 | 0.731 |
| ProbCons | 0.428 | 0.672 | 0.826 | 0.936 | 0.716 |
| MAFFT-linsi | 0.443 | 0.681 | 0.826 | 0.938 | 0.722 |
| MUSCLE | 0.372 | 0.631 | 0.787 | 0.930 | 0.680 |
| ClustalW | 0.299 | 0.536 | 0.726 | 0.906 | 0.617 |
The first seven methods (PROMALS3D variants) use both sequence and 3D structural information to construct MSAs. The other methods construct multiple alignments using only sequence information (PROMALS and SPEM also use predicted secondary structures). For the meaning of the abbreviations in parentheses after the method names, refer to Table 1. The average Q-score (see Table 1 for the definition) is reported. The 1682 PREFAB alignments are divided into four sets of nearly equal size according to the sequence identity of the reference alignment. The average sequence identity and the number of alignments are given in parentheses with the set names. The best scores are in bold letters.
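The binning described in the caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: the record layout `(identity, q_score)`, the function names and the rule for distributing leftover items are all hypothetical, so the resulting set sizes need not match those in the table header.

```python
from statistics import mean


def split_by_identity(records, n_sets=4):
    """Sort (identity, q_score) records by identity and cut them into
    n_sets contiguous groups of nearly equal size."""
    if len(records) < n_sets:
        raise ValueError("need at least one record per set")
    ordered = sorted(records, key=lambda r: r[0])
    base, extra = divmod(len(ordered), n_sets)
    sets, start = [], 0
    for k in range(n_sets):
        size = base + (1 if k < extra else 0)  # earlier sets absorb the remainder
        sets.append(ordered[start:start + size])
        start += size
    return sets


def summarize_by_identity(records, n_sets=4):
    """Return one (average identity, count, average Q-score) row per set."""
    rows = []
    for subset in split_by_identity(records, n_sets):
        avg_identity = mean(r[0] for r in subset)
        avg_q = mean(r[1] for r in subset)
        rows.append((avg_identity, len(subset), avg_q))
    return rows


# Toy data, invented for illustration only.
toy = [(0.10, 0.4), (0.50, 0.9), (0.20, 0.6), (0.30, 0.8)]
for avg_identity, count, avg_q in summarize_by_identity(toy, n_sets=2):
    print(f"identity={avg_identity:.3f} n={count} Q={avg_q:.3f}")
```

Each printed row corresponds to one column of the table: the two values in the header parentheses (average identity and number of alignments) and the average Q-score reported for a method on that set.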