Jimin Pei, Ming Tang, Nick V Grishin.
Abstract
Multiple sequence alignments are essential in computational sequence and structural analysis, with applications in homology detection, structure modeling, function prediction and phylogenetic analysis. We report the PROMALS3D web server for constructing alignments of multiple protein sequences and/or structures using information from available 3D structures, database homologs and predicted secondary structures. PROMALS3D shows higher alignment accuracy than a number of other advanced methods. Input to the PROMALS3D web server can be FASTA format protein sequences, PDB format protein structures and/or user-defined alignment constraints. The output page provides alignments in several formats, including a colored alignment augmented with useful information about sequence grouping, predicted secondary structures and consensus sequences. Intermediate results of sequence and structural database searches are also available. The PROMALS3D web server is available at: http://prodata.swmed.edu/promals3d/.
Year: 2008 PMID: 18503087 PMCID: PMC2447800 DOI: 10.1093/nar/gkn322
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Evaluation of alignment methods on SABmark and PREFAB benchmarks
| Method | SABmark-twi (209/7.7) | SABmark-sup (425/8.3) | PREFAB (1682/45.2) |
|---|---|---|---|
| PROMALS3D | 0.616 | 0.812 | 0.900 |
| PROMALS | 0.391 | 0.665 | 0.790 |
| SPEM | 0.326 | 0.628 | 0.774 |
| MUMMALS | 0.196 | 0.522 | 0.731 |
| ProbCons | 0.166 | 0.485 | 0.716 |
| MAFFT | 0.184 | 0.510 | 0.722 |
| MUSCLE | 0.136 | 0.433 | 0.680 |
| ClustalW | 0.127 | 0.390 | 0.617 |
Average Q-scores on two SABmark (25) data sets (‘twi’ for the ‘twilight zone’ set, ‘sup’ for the ‘superfamilies’ set) and the PREFAB 4.0 (5) data set are shown. The Q-score is the number of correctly aligned residue pairs in the test alignment divided by the total number of aligned residue pairs in the reference alignment. For each data set, the two numbers in parentheses separated by a slash are the number of alignments tested and the average number of sequences per alignment, respectively. For each data set, PROMALS3D yields statistically higher accuracy than any other method (P-value <0.000001) according to the Wilcoxon signed-rank test.
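The Q-score definition above can be illustrated with a minimal sketch. It assumes alignments are given as lists of equal-length gapped strings with `-` as the gap character and rows in the same order in both alignments; the function names `aligned_pairs` and `q_score` are hypothetical and are not part of PROMALS3D or of the benchmark tools. Benchmark software may also restrict scoring to selected reference columns, which this sketch does not do.

```python
from itertools import combinations
from typing import List, Set, Tuple


def aligned_pairs(row_a: str, row_b: str) -> Set[Tuple[int, int]]:
    """Return residue-index pairs (i, j) that share a column in two gapped rows."""
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def q_score(test: List[str], reference: List[str]) -> float:
    """Correctly aligned pairs in `test` divided by all aligned pairs in `reference`."""
    correct = 0
    total = 0
    for x, y in combinations(range(len(reference)), 2):
        ref_pairs = aligned_pairs(reference[x], reference[y])
        test_pairs = aligned_pairs(test[x], test[y])
        correct += len(ref_pairs & test_pairs)
        total += len(ref_pairs)
    return correct / total if total else 0.0


# Toy example (invented sequences, not benchmark data).
reference = ["AC-D", "ACED"]
test = ["ACD-", "ACED"]
score = q_score(test, reference)
```

In this toy case the reference aligns three residue pairs and the test alignment recovers two of them, so the score is 2/3.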
Figure 1. Deducing alignment constraints using homologs with 3D structures (homolog3Ds). S1 and S2 are two target sequences. T1 and T2 are their homolog3Ds. The alignment between the two sequences S1 and S2 is deduced from two sequence-based sequence-to-homolog3D alignments (S1 to T1 and S2 to T2) and one structure-based homolog3D-to-homolog3D alignment (T1 to T2). The three aligned residue pairs (A, B), (B, C) and (C, D), with A in S1, B in T1, C in T2 and D in S2, indicate that the pair (A, D) is aligned in the deduced alignment between the two targets.
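The transitive deduction described in Figure 1 can be sketched as a chain of lookups. This is only an illustration under stated assumptions: each pairwise alignment is represented as a dictionary from residue index to residue index, and the function name `deduce_pairs` is hypothetical rather than taken from the PROMALS3D code.

```python
from typing import Dict


def deduce_pairs(
    s1_to_t1: Dict[int, int],
    t1_to_t2: Dict[int, int],
    s2_to_t2: Dict[int, int],
) -> Dict[int, int]:
    """Chain (A, B), (B, C) and (C, D) to obtain deduced S1-to-S2 pairs (A, D)."""
    # Invert the S2-to-T2 alignment so that T2 residues can be looked up.
    t2_to_s2 = {t: s for s, t in s2_to_t2.items()}
    deduced = {}
    for a, b in s1_to_t1.items():
        c = t1_to_t2.get(b)  # structure-based T1-to-T2 pair
        if c is None:
            continue
        d = t2_to_s2.get(c)  # sequence-based T2-to-S2 pair
        if d is None:
            continue
        deduced[a] = d
    return deduced


# Toy indices (invented for illustration).
s1_to_t1 = {0: 5, 1: 6, 2: 7}
t1_to_t2 = {5: 10, 6: 11}
s2_to_t2 = {3: 10, 4: 12}
constraints = deduce_pairs(s1_to_t1, t1_to_t2, s2_to_t2)
```

Only residue 0 of S1 has a complete chain through T1 and T2 to S2, so the only deduced pair here is (0, 3); residues whose chain breaks at any step yield no constraint.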
Figure 2. (a) Output page of the PROMALS3D server. This page gives access to PROMALS3D alignments in three formats with various adjustable parameters. Links to inputs and intermediate results are also provided. (b) An example of a colored alignment. This alignment is for an input of four SH2 domain sequences selected from the PFAM database (23) and four SH2 structures (PDB IDs 1aya, 1jyr, 1lkk and 1mil). The first line in each alignment block begins with ‘Conservation:’ and shows conservation index numbers for conserved positions. The line in each block beginning with ‘Consensus_ss:’ shows the consensus secondary structure predictions (‘h’: α-helix; ‘e’: β-strand). The line in each block beginning with ‘Consensus_aa’ shows consensus amino acids. If the weighted frequency of a certain type of residue is above a certain threshold, the consensus symbol of that type is displayed. Symbols are provided for the following types: conserved amino acid residues: bold and uppercase letters; aliphatic residues (I, V, L): l; aromatic residues (Y, H, W, F): @; hydrophobic residues (W, F, Y, M, L, I, V, A, C, T, H): h; alcohol residues (S, T): o; polar residues (D, E, H, K, N, Q, R, S, T): p; tiny residues (A, G, C, S): t; small residues (A, G, C, S, V, N, D, T, P): s; bulky residues (E, F, I, K, L, M, Q, R, W, Y): b; positively charged residues (K, R, H): +; negatively charged residues (D, E): −; charged residues (D, E, K, R, H): c. Each representative sequence has a magenta name and is colored according to PSIPRED secondary structure predictions (red: α-helix, blue: β-strand). A representative sequence and the sequences immediately below it with black names, if there are any, form a closely related group and are aligned in the first stage.
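The residue-type symbols listed in the caption can be expressed as a lookup table. The sketch below is an assumption-laden illustration: the residue classes are copied from the caption, but the caption gives neither the threshold nor the sequence weighting, so the `threshold` default of 0.8 and the unweighted frequency are placeholders, and the function name `qualifying_symbols` is hypothetical. Because the caption does not say which symbol is shown when several types qualify, the function returns all of them.

```python
from typing import Dict, List

# Residue classes as listed in the Figure 2 caption.
# The caption prints a minus sign for negatively charged residues;
# an ASCII hyphen is used here as the dictionary key.
RESIDUE_CLASSES: Dict[str, str] = {
    "l": "IVL",          # aliphatic
    "@": "YHWF",         # aromatic
    "h": "WFYMLIVACTH",  # hydrophobic
    "o": "ST",           # alcohol
    "p": "DEHKNQRST",    # polar
    "t": "AGCS",         # tiny
    "s": "AGCSVNDTP",    # small
    "b": "EFIKLMQRWY",   # bulky
    "+": "KRH",          # positively charged
    "-": "DE",           # negatively charged
    "c": "DEKRH",        # charged
}


def qualifying_symbols(column: str, threshold: float = 0.8) -> List[str]:
    """Return symbols whose residue class exceeds `threshold` in an alignment column.

    Assumption: plain (unweighted) frequencies over non-gap residues; the
    server uses weighted frequencies and an unspecified threshold.
    """
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return []
    hits = []
    for symbol, members in RESIDUE_CLASSES.items():
        freq = sum(r in members for r in residues) / len(residues)
        if freq > threshold:
            hits.append(symbol)
    return hits


# Toy columns (invented for illustration).
aliphatic_column = qualifying_symbols("IVLL")
acidic_column = qualifying_symbols("DE-E")
```

With these placeholder settings, the column `IVLL` qualifies as both aliphatic and hydrophobic, and the column `DE-E` qualifies as polar, negatively charged and charged.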