Jimin Pei, Bong-Hyun Kim, Ming Tang, Nick V Grishin.
Abstract
Multiple sequence alignments are essential in homology inference, structure modeling, functional prediction and phylogenetic analysis. We developed a web server that constructs multiple protein sequence alignments using PROMALS, a progressive method that improves alignment quality by using additional homologs from PSI-BLAST searches and secondary structure predictions from PSIPRED. PROMALS shows higher alignment accuracy than other advanced methods, such as MUMMALS, ProbCons, MAFFT and SPEM. The PROMALS web server takes FASTA format protein sequences as input. The output includes a colored alignment augmented with information about sequence grouping, predicted secondary structures and positional conservation. The PROMALS web server is available at: http://prodata.swmed.edu/promals/
Year: 2007 PMID: 17452345 PMCID: PMC1933189 DOI: 10.1093/nar/gkm227
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Evaluation of alignment methods on SABmark and PREFAB benchmarks
| Method | SABmark-twi (209/7.7) | SABmark-sup (425/8.3) | PREFAB (1682/45.2) |
|---|---|---|---|
| PROMALS | | | |
| SPEM | 0.326 | 0.628 | 0.774 |
| MUMMALS | 0.196 | 0.522 | 0.731 |
| ProbCons | 0.166 | 0.485 | 0.716 |
| MAFFT-linsi | 0.184 | 0.510 | 0.722 |
| MUSCLE | 0.136 | 0.433 | 0.680 |
| ClustalW | 0.127 | 0.390 | 0.617 |
Average Q-scores for the two SABmark data sets (‘twi’ for the ‘twilight zone’ set, ‘sup’ for the ‘superfamily’ set) and the PREFAB 4.0 data set are shown. The Q-score is the number of correctly aligned residue pairs in the test alignment divided by the total number of aligned residue pairs in the reference alignment. For each data set, the two numbers in parentheses, separated by a slash, are the number of alignments tested and the average number of sequences per alignment, respectively. For each data set, PROMALS yields statistically higher accuracy (bold numbers) than any other method (P-value < 0.000001) according to the Wilcoxon signed-rank test. PROMALS and SPEM use secondary structure prediction and database homologs in the alignment process, while the other five methods use only the input sequences.
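The Q-score definition above can be illustrated with a minimal Python sketch. This is not the benchmark authors' evaluation code. It assumes that both alignments are given as lists of equal-length gapped strings with the sequences in the same order, that `-` marks a gap, and that every column of the reference counts (real benchmarks may score only a subset of reference columns). The function names `aligned_pairs` and `q_score` are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each pair is ((seq_index_a, residue_index_a), (seq_index_b, residue_index_b)),
    where residue indices count ungapped positions within each sequence.
    """
    lengths = {len(row) for row in alignment}
    if len(lengths) > 1:
        raise ValueError("all rows must have the same length")
    counters = [0] * len(alignment)  # next ungapped residue index per sequence
    pairs = set()
    for col in range(lengths.pop() if lengths else 0):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        # every two residues in the same column form one aligned pair
        pairs.update(combinations(present, 2))
    return pairs


def q_score(test, reference):
    """Correctly aligned pairs in `test` divided by aligned pairs in `reference`."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


reference = ["ACDE", "ACFE"]   # 4 aligned residue pairs
test = ["ACD-E", "AC-FE"]      # recovers 3 of them (D and F are no longer paired)
print(q_score(test, reference))  # 0.75
```

Averaging `q_score` over all alignments in a data set would give a per-method figure comparable in form to the table entries, under the assumptions stated above.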
Figure 1. Front page of the PROMALS server. The main section allows the user to paste or upload sequences and enter an email address for the results. Options to modify alignment parameters, PSI-BLAST searches and output format are provided. A brief description of each option is available by clicking on the option's name. A document with a detailed description of the server is provided. The stand-alone versions of PROMALS can be downloaded from this page.
Figure 2. An example of a colored alignment produced by the PROMALS server. These sequences are adenylate/guanylate cyclase catalytic domains selected from the PFAM database (Accession number: PF00211) (23). The first line in each alignment block begins with ‘Conservation:’ and shows conservation index numbers for conserved positions. The last line in each block begins with ‘Consensus_ss:’ and shows the consensus secondary structure predictions (‘h’: α-helix; ‘e’: β-strand). Each representative sequence has a magenta name and is colored according to PSIPRED secondary structure predictions (red: α-helix, blue: β-strand). A representative sequence and the sequences with black names immediately below it, if any, form a closely related group (determined by the option ‘Identity threshold’). Sequences within each group are aligned with a fast method. The groups are then aligned using profile consistency with enhanced information from database searches and secondary structure predictions.