Jing Tong, Jimin Pei, Nick V Grishin.
Abstract
BACKGROUND: Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate.
Year: 2015 PMID: 26335387 PMCID: PMC4558796 DOI: 10.1186/s12859-015-0711-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Flowchart of the SFESA web server. The sequence that is found to be the closest to the provided structure or the structure database is assigned as the Template (T). The other sequence is assigned as the Query (Q).
Fig. 2 An example showing the output of the SFESA server and its ability to improve the alignment.

(a) Output of the starting alignment and SFESA-refined alignment with secondary structure and colored alignment block. Predicted secondary structures for the query and the real secondary structures for the template are shown (“H”-Helix, “S”-Strand and “C”-Coil). “Number” shows the position number of the residue above the query and below the template, respectively. “Cm1” and “Cm2” represent the positional differences between the refined alignment and starting alignment. “Cm1” shows the sign of the query residue shifting (“+”: query residue shifted towards C-terminus; “-”: query residue shifted towards N-terminus) while “Cm2” shows the query residue shift number. If the query residue is aligned to a gap in both the starting and refined alignments, “Cm1” is left blank and “Cm2” shows the gap character “-”. If the query residue is aligned to one residue in the starting alignment but aligned to a gap in the refined alignment, “Cm1” is left blank and “Cm2” shows “*”. If a template residue is aligned to a gap in the starting alignment, both “Cm1” and “Cm2” are left blank. α-helix alignment blocks are shown alternately in red and orange. β-strand alignment blocks are shown alternately in blue and dark green. The refined alignment blocks are marked with underscores.

(b) A table summarizing refinement results for the evaluated alignment blocks. The alignment block number is ordered from N-terminus to C-terminus. The sixth column indicates the refinement results of this alignment block. If refined, a format of “Gap mode [shift number]” is shown. Rows of the refined alignment blocks are colored red.

(c) One example of the scoring details of shifts for alignment block number 4. This table contains the original alignment block and all alignment variants. The first column in the table is gap mode. There are three gap modes if there are gaps in this alignment block: Original (no change of the original alignment block), Left (residues in alignment blocks are aligned all the way to the left while all gaps are put to the opposite side before shifting) and Right (residues in alignment blocks are aligned all the way to the right while all gaps are put to the opposite side before shifting). The second column is the shift number. The third column indicates if such a variant is a unique one or the same as a variant shown previously. The fourth column shows the alignment variants with extended residues in both ends. The residues in the original alignment block are colored blue (query) and pink (template). The last four columns show the sequence score, structure score, combined score I and combined score II of each alignment variant. The row colored red corresponds to the alignment variant that is the final choice in the refined alignment.

(d) Structure superpositions of query structure models (light grey ribbon) and query real structure (dark grey ribbon). Structure models were generated by MODELLER based on the starting alignment (left panel) and the SFESA-refined alignment (right panel). The strand (“QLNYAFSR”) in alignment block number 4 is highlighted. This strand is shown in red and green in the structure model and the real structure, respectively. Blue spheres and yellow spheres mark the N-terminal boundary (“Q”) and the C-terminal boundary (“R”), respectively.
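
The three gap modes described for panel (c) can be illustrated with a minimal sketch. This is not the SFESA implementation: the function name `apply_gap_mode`, the choice to operate on one row of a block at a time, and the example string are assumptions made for illustration only.

```python
def apply_gap_mode(block_row: str, mode: str, gap: str = "-") -> str:
    """Rearrange one row of an alignment block according to a gap mode.

    Assumption: a block row is a string of residues and gap characters,
    and a gap mode only repositions the gaps before any shifting is done.
    """
    residues = block_row.replace(gap, "")      # residues in original order
    n_gaps = len(block_row) - len(residues)    # number of gaps in the row
    if mode == "Original":
        return block_row                       # block left unchanged
    if mode == "Left":
        return residues + gap * n_gaps         # residues left, gaps right
    if mode == "Right":
        return gap * n_gaps + residues         # gaps left, residues right
    raise ValueError(f"unknown gap mode: {mode!r}")


# Hypothetical block row with one internal gap (not taken from the paper).
for mode in ("Original", "Left", "Right"):
    print(mode, apply_gap_mode("QL-NY", mode))
```

Each resulting row would then serve as the starting point for the shifted variants listed in the table of panel (c); how shifts are generated and scored is not covered by this sketch.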
Evaluation of alignment methods on the SABmark benchmark
| Methods | SABmark_TWI (209) | SABmark_SUP (425) |
|---|---|---|
| PROMALS | 46.2 | 71.10 |
| SFESA (O) + PROMALS | 47.3 | 71.30 |
| SFESA (O + G) + PROMALS | 48.0 | 71.80 |
| SFESA (O + G + M) + PROMALS | 47.9 | 71.90 |
| SFESA (O + G + M + S) + PROMALS | | |
| HHpred | 40.7 | 68.9 |
| SFESA (O) + HHpred | 40.6 | 69.0 |
| SFESA (O + G) + HHpred | 41.3 | 69.1 |
| SFESA (O + G + M) + HHpred | | |
| SFESA (O + G + M + S) + HHpred | 41.3 | 69.4 |
| CNFpred | 41.5 | 66.1 |
| SFESA (O) + CNFpred | 41.6 | 66.4 |
| SFESA (O + G) + CNFpred | 42.3 | 67.0 |
| SFESA (O + G + M) + CNFpred | | |
| SFESA (O + G + M + S) + CNFpred | 42.2 | 66.9 |
Average Q-scores of two SABmark [25] data sets (‘TWI’ for ‘Twilight Zone’ set, ‘SUP’ for ‘Superfamilies’ set) are shown. The Q-score is the number of correctly aligned residue pairs in the test alignment divided by the total number of aligned residue pairs in the reference alignment. One pair of domains is selected randomly from each group in the SABmark sets. For each set, the number in the parentheses is the number of alignments tested. Bold numbers indicate the best performance in the subsection
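
The Q-score definition in the table note can be made concrete with a short sketch. It assumes each pairwise alignment is given as two equal-length gapped strings with "-" as the gap character; the helper names `aligned_pairs` and `q_score` and the toy sequences are hypothetical and not part of SFESA or SABmark. The function returns a fraction between 0 and 1, whereas the table values appear to be reported as percentages.

```python
def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> set:
    """Return the set of (i, j) residue-index pairs aligned to each other."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in each ungapped sequence
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def q_score(test: tuple, reference: tuple) -> float:
    """Correctly aligned pairs in the test alignment divided by the
    total number of aligned pairs in the reference alignment."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    test_pairs = aligned_pairs(*test)
    return len(test_pairs & ref_pairs) / len(ref_pairs)


# Toy example: the reference aligns four residue pairs; the test alignment
# reproduces three of them.
reference = ("ACDE", "ACDE")
test = ("AC-DE", "ACD-E")
print(q_score(test, reference))  # 0.75
```

Comparing residue-index pairs rather than alignment columns keeps the score independent of where gap-only differences place the columns.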