Igor B Kuznetsov, Michael McDuffie.
Abstract
BACKGROUND: Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. The selection of the amino acid substitution matrix best suited to a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities.
Year: 2015 PMID: 25947299 PMCID: PMC4477417 DOI: 10.1186/s13104-015-1152-6
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
The optimized property weights and gap penalties for the four default amino acid properties
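| Sequence identity | Hydrophobicity weight | Size weight | Coil propensity weight | Thiol group weight | Gap open penalty | Gap extension penalty | N pairs |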
|---|---|---|---|---|---|---|---|
| 0-10% | 0.7 | 0.15 | 0.1 | 0.05 | 0.8 | 0.2 | 1,282 |
| 10-20% | 0.3 | 0.2 | 0.15 | 0.35 | 0.6 | 0.1 | 2,023 |
| 20-30% | 0.3 | 0.2 | 0.15 | 0.35 | 0.7 | 0.1 | 1,674 |
| 30-40% | | | | | | | 1,100 |
| Above 40% | 0.2 | 0.2 | 0.25 | 0.35 | 0.6 | 0.1 | 705 |
The four default amino acid properties are hydrophobicity [29], size [30], coil propensity [31] and the presence of the thiol group. By default, PR2ALIGN uses the combination of property weights and gap penalties optimized for aligning sequences with pair-wise sequence identity between 30 and 40 percent (highlighted in boldface type). The column “N pairs” shows the total number of sequence pairs in each benchmark dataset.
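As a rough illustration of how per-property weights could be combined into a single similarity score, the sketch below uses the weights from the 0-10% row. Everything else in it is an assumption made for illustration: the scoring formula (one minus the weighted mean absolute property difference) is not taken from PR2ALIGN, the property values in `PROPERTIES` are hypothetical placeholders rather than the scales cited in [29-31], and the weights are assumed to apply to the properties in the order they are listed in the table note.

```python
# Hypothetical, normalized (0..1) property values for three amino acids.
# These are placeholders for illustration, NOT the scales used by PR2ALIGN.
PROPERTIES = {
    "A": {"hydrophobicity": 0.70, "size": 0.20, "coil": 0.30, "thiol": 0.0},
    "C": {"hydrophobicity": 0.80, "size": 0.30, "coil": 0.40, "thiol": 1.0},
    "G": {"hydrophobicity": 0.45, "size": 0.00, "coil": 0.90, "thiol": 0.0},
}

# Weights from the 0-10% row; the mapping of columns to properties is assumed.
WEIGHTS = {"hydrophobicity": 0.7, "size": 0.15, "coil": 0.1, "thiol": 0.05}


def property_similarity(a, b, weights=WEIGHTS, properties=PROPERTIES):
    """Return 1 minus the weighted mean absolute property difference.

    This formula is an assumed stand-in, not PR2ALIGN's documented scoring.
    """
    total_weight = sum(weights.values())
    distance = sum(
        weight * abs(properties[a][name] - properties[b][name])
        for name, weight in weights.items()
    )
    return 1.0 - distance / total_weight


if __name__ == "__main__":
    for pair in (("A", "A"), ("A", "C"), ("A", "G")):
        print(pair, round(property_similarity(*pair), 3))
```

Changing a single entry in `WEIGHTS` shifts how much that property contributes to every pairwise score, which is the kind of adjustment a fixed substitution matrix does not allow.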
The optimized gap penalties for the VTML200 matrix
| Sequence identity | Gap open penalty | Gap extension penalty | N pairs |
|---|---|---|---|
| 0-10% | −17 | −1 | 1,282 |
| 10-20% | −17 | −1 | 2,023 |
| 20-30% | −16 | −1 | 1,674 |
| 30-40% | −16 | −1 | 1,100 |
| Above 40% | −15 | −1 | 705 |
The column “N pairs” shows the total number of sequence pairs in each benchmark dataset.
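The table above can be read as a lookup from sequence identity to gap penalties. The following minimal sketch encodes it that way. It assumes the two numeric columns are the gap open and gap extension penalties, in that order, and that each range includes its lower bound and excludes its upper bound; the function name `vtml200_gap_penalties` is hypothetical.

```python
# Upper bound (exclusive, assumed) of each identity range, followed by the
# two penalty values from the table (assumed order: gap open, gap extension).
_VTML200_RANGES = [
    (10, -17, -1),
    (20, -17, -1),
    (30, -16, -1),
    (40, -16, -1),
]
_VTML200_ABOVE_40 = (-15, -1)


def vtml200_gap_penalties(identity_percent):
    """Return (gap_open, gap_extension) for a pair-wise identity in percent."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity_percent must be between 0 and 100")
    for upper, gap_open, gap_extension in _VTML200_RANGES:
        if identity_percent < upper:
            return gap_open, gap_extension
    return _VTML200_ABOVE_40


if __name__ == "__main__":
    for identity in (5, 25, 35, 60):
        print(identity, vtml200_gap_penalties(identity))
```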
Figure 1. Comparison of the PR2ALIGN alignments with the alignments obtained using the VTML200 substitution matrix. Results are reported for five ranges of the percentage of pair-wise sequence identity. Green bars show the percentage of sequence pairs on which PR2ALIGN performs better than VTML200. Yellow bars show the percentage of sequence pairs on which PR2ALIGN and VTML200 show identical performance. Red bars show the percentage of sequence pairs on which VTML200 performs better than PR2ALIGN. The comparison was performed using the entire super-family (SUP) subset of the SABmark database [35]: 3,761 sequence pairs for the 0-10% range dataset, 8,521 for 10-20%, 4,181 for 20-30%, 1,952 for 30-40%, and 884 for 40-50%.
Figure 2. The input page of the web-server implementation of PR2ALIGN.
Figure 3. An example of the PR2ALIGN output for the human prion protein and chicken prion protein.
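The better/identical/worse breakdown described in the Figure 1 caption can be tallied as in the sketch below. It assumes each sequence pair has one alignment quality score per method where higher is better; the scores in the usage example are made up, and the function name `tally_comparison` and the tolerance `tol` are hypothetical.

```python
def tally_comparison(pr2align_scores, vtml200_scores, tol=1e-9):
    """Return the percentage of pairs where each method wins or both tie.

    Scores are per-pair alignment quality values (higher is better, assumed).
    Two scores closer than `tol` are counted as identical performance.
    """
    if len(pr2align_scores) != len(vtml200_scores):
        raise ValueError("both methods need one score per sequence pair")
    if not pr2align_scores:
        raise ValueError("at least one sequence pair is required")

    better = identical = worse = 0
    for ours, theirs in zip(pr2align_scores, vtml200_scores):
        if abs(ours - theirs) <= tol:
            identical += 1
        elif ours > theirs:
            better += 1
        else:
            worse += 1

    n = len(pr2align_scores)
    return {
        "pr2align_better": 100.0 * better / n,
        "identical": 100.0 * identical / n,
        "vtml200_better": 100.0 * worse / n,
    }


if __name__ == "__main__":
    # Made-up scores for four sequence pairs, for illustration only.
    print(tally_comparison([0.9, 0.5, 0.3, 0.7], [0.8, 0.5, 0.6, 0.2]))
```

Running the tally once per identity range would give the three bar heights for that range.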