Marcus Stamm, René Staritzbichler, Kamil Khafizov, Lucy R Forrest.
Abstract
Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T) in addition to evolutionary information and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set.
Year: 2013 PMID: 23469223 PMCID: PMC3587630 DOI: 10.1371/journal.pone.0057731
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
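The idea in the abstract of favoring matches between predicted properties inside a standard dynamic programming alignment can be sketched as follows. This is a minimal illustration, not the AlignMe implementation: the function names, the weights `w_ss` and `w_tm`, the match/mismatch term standing in for a position-specific substitution matrix, and the single linear gap penalty are all assumptions made for the example.

```python
def pair_score(i, j, a, b, w_ss=1.0, w_tm=1.0):
    """Score for aligning residue i of protein a with residue j of protein b."""
    # Stand-in for a substitution-matrix (P) term: simple match/mismatch.
    subst = 2.0 if a["seq"][i] == b["seq"][j] else -1.0
    # Similarity of per-residue predictions in [0, 1]: +1 if equal, -1 if opposite.
    ss = 1.0 - 2.0 * abs(a["ss"][i] - b["ss"][j])   # secondary structure (S)
    tm = 1.0 - 2.0 * abs(a["tm"][i] - b["tm"][j])   # transmembrane (T)
    return subst + w_ss * ss + w_tm * tm


def global_align(a, b, gap=-2.0, w_ss=1.0, w_tm=1.0):
    """Needleman-Wunsch global alignment with a combined per-pair score."""
    n, m = len(a["seq"]), len(b["seq"])
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1, a, b, w_ss, w_tm),
                score[i - 1][j] + gap,   # gap in b
                score[i][j - 1] + gap,   # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diag = (i > 0 and j > 0 and score[i][j] ==
                score[i - 1][j - 1] + pair_score(i - 1, j - 1, a, b, w_ss, w_tm))
        if diag:
            out_a.append(a["seq"][i - 1]); out_b.append(b["seq"][j - 1])
            i -= 1; j -= 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            out_a.append(a["seq"][i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b["seq"][j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Toy input: two identical four-residue proteins with identical predictions.
prot = {"seq": "ACDE", "ss": [1.0, 1.0, 0.0, 0.0], "tm": [1.0, 1.0, 0.0, 0.0]}
aligned_a, aligned_b, total = global_align(prot, prot)
```

Setting `w_tm=0.0` in this sketch corresponds loosely to combining only the P and S terms, and a nonzero `w_tm` to adding the T term; the actual weighting and gap scheme used by the authors is described in their Methods and is not reproduced here.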
Figure 1. Comparison of alignment accuracy when using single input descriptors in AlignMe.
The total alignment accuracy score (AD score) for all α-helical proteins in the HOMEP2 dataset is plotted for each of the input descriptors using their optimized gap penalties, and arranged according to increasing score for different (a) substitution matrices, (b) hydrophobicity scales (with no smoothing), (c) other transmembrane predictions or (d) secondary structure predictions. Sequence segments with hydrophobic, helical or transmembrane scores above a given threshold could be assigned the same (gray bars; without threshold) or different (black bars; with threshold) gap penalty values from segments below that threshold (see Methods for definition of threshold values and abbreviations).
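The threshold-dependent gap penalties described in this legend can be pictured with a short sketch. The function name, the numerical penalties and the threshold below are hypothetical values chosen for illustration; the actual thresholds and optimized penalties are those defined in the paper's Methods.

```python
def positional_gap_penalties(profile, threshold, penalty_above, penalty_below):
    """Assign one gap penalty per residue, depending on a per-residue score.

    `profile` holds a hydrophobicity, helix or transmembrane score for each
    residue. Residues scoring above `threshold` receive `penalty_above`;
    all others receive `penalty_below`.
    """
    return [penalty_above if value > threshold else penalty_below
            for value in profile]


# Hypothetical helix-probability profile for a six-residue stretch.
helix_prob = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6]

# "With threshold": gaps cost more inside predicted helices.
with_threshold = positional_gap_penalties(helix_prob, 0.5, 10.0, 3.0)

# "Without threshold": the same penalty is used on both sides of the cutoff.
without_threshold = positional_gap_penalties(helix_prob, 0.5, 5.0, 5.0)
```

The resulting per-residue list could replace a single global gap penalty in a dynamic programming recursion, so that opening a gap inside a predicted helical or transmembrane segment is scored differently from opening one in a loop.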
Figure 2. Comparison of alignment accuracy when using multiple input descriptors in AlignMe.
Combinations included: (a) PSSMs with hydrophobicity descriptors or transmembrane predictions; (b) secondary structure prediction with PSSMs or transmembrane predictions; or (c) PSSMs, PSIPRED and OCTOPUS together. The scores obtained using PSSMs or PSIPRED alone are indicated with gray lines for reference. Gap penalties were assigned differently to sequence segments above or below a threshold (black bars), and the threshold was defined using the inputs marked by *. For example, in the PSIPRED* & OCTOPUS combination, the threshold was assigned using PSIPRED. See legend for Figure 1 for further details.
Table 1. Accuracy of alignments generated using different methods on the HOMEP2 data set.
| 0–15% (44) | 15–30% (71) | 30–85% (62) | ||||
| % correct | shift | % correct | shift | % correct | shift | |
| AlignMeP | […] | 4.31 | […] | 1.15 | […] | […] |
| AlignMePS | […] | 3.35 | […] | 1.16 | […] | 0.28 |
| AlignMePST | […] | […] | 70.4 | […] | 87.5 | […] |
| AlignMePST x-fold | 30.3 | 2.89 | 70.4 | 0.89 | 87.3 | 0.30 |
| MSAProbs | 28.3 | 7.22 | 68.6 | 1.08 | 85.7 | […] |
| HHalign | 17.3 | 10.50 | 61.8 | 1.75 | 86.5* | […] |
| HMAP | 24.9 | 7.00 | 68.6 | 1.27 | 85.3 | 0.32 |
| MUSCLE | 26.4 | 9.41 | 68.5 | 1.13 | 85.5 | 0.31 |
| MUSCLE profile-profile | 25.6 | 9.77 | 63.6 | 1.65 | 75.6 | 0.86 |
| ProbCons | 26.7 | 8.30 | 67.0 | 1.34 | 84.2 | 0.31 |
| T-Coffee | 25.3 | 7.55 | 66.5 | 1.27 | 83.4 | 0.32 |
| T-Coffee profile-profile | 14.5 | 35.22 | 55.9 | 2.25 | 70.7 | 1.09 |
Results are sorted according to the level of sequence similarity of the sequence pair, in percentage identity. The number of pairwise alignments is shown in parentheses. The percentage of correctly aligned residues (% correct) and average shift error size (shift) with respect to the structure-based reference alignments (see Methods) are reported. *Values marked with an asterisk in this and all other tables are not significantly different from those of AlignMePST (p-value >0.05) based on a pairwise Wilcoxon signed rank test. All other values are significantly different from those of AlignMePST. Entries in bold in this table, and all subsequent tables, indicate the highest or best scores in that column, including all values that are not significantly different from the best scores.
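As a rough illustration of the two measures reported in this table, the sketch below compares a test alignment with a reference alignment, both given as pairs of gapped strings. The exact definitions used by the authors are in their Methods; here the shift of a residue is assumed to be the absolute difference between the partner positions assigned by the two alignments, averaged over residues that are aligned in both. All function names are hypothetical.

```python
def residue_map(aln_a, aln_b):
    """Map residue indices of sequence A to the residue of B they align with."""
    mapping = {}
    i = j = 0
    for ca, cb in zip(aln_a, aln_b):
        if ca != "-" and cb != "-":
            mapping[i] = j
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return mapping


def alignment_accuracy(test, reference):
    """Return (% correctly aligned residues, mean shift error)."""
    ref = residue_map(*reference)
    tst = residue_map(*test)
    # A reference pair counts as correct if the test alignment reproduces it.
    correct = sum(1 for i, j in ref.items() if tst.get(i) == j)
    # Shift is only defined where the residue is aligned in both alignments.
    shifts = [abs(tst[i] - j) for i, j in ref.items() if i in tst]
    pct_correct = 100.0 * correct / len(ref)
    mean_shift = sum(shifts) / len(shifts) if shifts else float("nan")
    return pct_correct, mean_shift


reference = ("ACDE", "ACDE")
shifted = ("ACDE-", "-ACDE")   # every aligned residue is off by one position
pct, shift = alignment_accuracy(shifted, reference)
```

In this toy case none of the reference pairs is reproduced and every comparable residue is displaced by one position, which shows why the two columns carry complementary information: an alignment can have a low percentage correct and still be close to the reference.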
Table 2. Percentage of transmembrane segments in the HOMEP2 set that are correctly aligned by each method.
| 0–15% (44) | 15–30% (71) | 30–85% (62) | ||||
| f50 | f90 | f50 | f90 | f50 | f90 | |
| AlignMeP | 93.65 | 52.80 | 98.64 | 95.54 | […] | […] |
| AlignMePS | 97.00 | […] | […] | […] | […] | […] |
| AlignMePST | […] | […] | […] | […] | […] | […] |
| MSAProbs | 90.42 | 53.01 | […] | […] | […] | […] |
| HHalign | 70.50 | 28.61 | 97.05 | 76.97 | […] | […] |
| HMAP | 85.83 | 54.31 | […] | […] | […] | […] |
| MUSCLE | 82.92 | 49.59 | […] | 93.89 | […] | 99.04 |
| MUSCLE profile-profile | 82.20 | 48.90 | 98.08 | 86.30 | […] | 88.16 |
| ProbCons | 89.73 | 52.17 | […] | […] | […] | 99.04 |
| T-Coffee | 88.02 | 51.18 | […] | 95.42 | […] | 98.85 |
| T-Coffee profile-profile | 38.32 | 18.75 | 95.56 | 66.68 | 97.33 | 73.12 |
Transmembrane segment definitions are taken from the structures according to the PDB_TM database (see Methods); matching is defined as correct if 50% (f50) or 90% (f90) of the residues are aligned. Results are sorted according to the level of sequence similarity of the sequence pair. The number of pairwise alignments is shown in parentheses.
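The f50 and f90 criteria can be illustrated with a small sketch. It assumes that each residue has already been flagged as correctly or incorrectly aligned, that segments are given as half-open index ranges, and that "50%" and "90%" mean "at least" that fraction; the function name and the toy data are hypothetical.

```python
def segments_correct(segments, correct_flags, cutoff):
    """Percentage of segments in which at least `cutoff` of residues are correct.

    `segments` is a list of (start, end) residue index ranges, end exclusive.
    `correct_flags` holds one boolean per residue of the sequence.
    """
    n_ok = 0
    for start, end in segments:
        flags = correct_flags[start:end]
        if flags and sum(flags) / len(flags) >= cutoff:
            n_ok += 1
    return 100.0 * n_ok / len(segments)


# Toy protein of 20 residues with a misaligned stretch at positions 10 to 13.
flags = [True] * 10 + [False] * 4 + [True] * 6
tm_segments = [(0, 10), (8, 18)]   # second segment is 60% correct

f50 = segments_correct(tm_segments, flags, 0.5)
f90 = segments_correct(tm_segments, flags, 0.9)
```

Here both segments pass the 50% criterion but only the first passes the 90% criterion, which mirrors how f90 values in the table can fall well below f50 values for distantly related pairs.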
Table 3. Accuracy of homology models constructed based on HOMEP2 data set alignments from different methods.
| 0–15% (88) | 15–30% (142) | 30–85% (124) | ||||
| GDT_TS | AL4 | GDT_TS | AL4 | GDT_TS | AL4 | |
| AlignMeP | 34.74 | 73.97 | […] | 90.75 | […] | 97.65 |
| AlignMePS | […] | […] | […] | 90.52 | […] | 97.33 |
| AlignMePST | 36.30 | […] | […] | […] | […] | […] |
| MSAProbs | […] | 75.00 | […] | 90.81 | […] | 97.76 |
| HHalign | 25.08 | 59.06 | 61.38 | 87.71 | 83.12 | 97.63 |
| HMAP | […] | 74.97 | […] | 90.44 | 83.25 | 97.04 |
| MUSCLE | 32.95 | 69.02 | 66.00 | 90.66 | 82.89 | 97.31 |
| MUSCLE profile-profile | 32.56 | 69.35 | 62.19 | 88.82 | 75.75 | 94.24 |
| ProbCons | 35.28* | 72.78 | […] | 90.22 | 83.29 | 97.46 |
| T-Coffee | 35.30* | 72.20 | 66.78 | 90.42 | 83.38 | 97.57 |
| T-Coffee profile-profile | 18.27 | 37.85 | 59.30 | 86.58 | 73.03 | 92.95 |
| SKA structure-based | 46.38 | 85.42 | 71.12 | 93.99 | 85.51 | 98.18* |
The SKA structure-based row reports models built from reference alignments generated by the structure alignment program SKA. The number of models is shown in parentheses.
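For readers unfamiliar with the GDT_TS column, the score is commonly defined as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom in the model lies within that cutoff of its position in the reference structure. The sketch below is a simplification under stated assumptions: it takes per-residue distances from a single, already computed superposition as input, whereas standard GDT_TS programs search over many superpositions for each cutoff. The function name and the distances are hypothetical.

```python
def gdt_ts(distances, n_reference=None):
    """Simplified GDT_TS from per-residue C-alpha distances (in angstroms).

    `distances` lists model-to-reference distances for the modeled residues;
    `n_reference` is the number of residues in the reference structure and
    defaults to the number of distances supplied.
    """
    n = n_reference if n_reference is not None else len(distances)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    percentages = [100.0 * sum(d <= c for d in distances) / n for c in cutoffs]
    return sum(percentages) / len(cutoffs)


# Hypothetical four-residue model: 25%, 50%, 75% and 75% within the cutoffs.
score = gdt_ts([0.5, 1.5, 3.0, 9.0])
```

Because the score averages over several cutoffs, a model built from a slightly shifted alignment is penalized less severely than under a single strict distance threshold.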
Table 4. Percentage of residues that are correctly aligned in pairwise sequence alignments from the BAliBASE reference set 7, sorted by sequence identity of the protein families.
| ion | Nat | ptga | 7tm | dtd | acr | photo | msl | mean | |
| AlignMeP | 38.9 | 43.5 | 42.1 | 42.5 | 67.1 | 87.0 | […] | […] | 61.4 |
| AlignMePS | 45.2 | […] | […] | […] | […] | […] | 87.6 | […] | […] |
| AlignMePST | […] | 58.6 | 58.8 | 59.4 | 71.2 | 86.3 | 82.9 | 76.5 | 67.7 |
| MSAProbs | 24.5 | 53.3 | 45.9 | 54.7 | 64.4 | 89.0 | 73.4 | 70.6 | 59.5 |
| HHalign | 39.1 | 48.9 | 42.3 | 38.4 | 42.7 | 49.5 | 67.3 | 59.9 | 48.5 |
| HMAP | 32.8 | 61.9 | 54.9 | 61.4 | 65.3 | 87.6 | 83.4 | 78.5 | 65.7 |
| MUSCLE | 27.9 | 56.8* | 48.4 | 56.6 | 70.3 | 89.5 | 80.5 | 76.1* | 63.3 |
| MUSCLE profile-profile | 18.5 | 47.1 | 39.7 | 48.2 | 67.4 | 88.5 | 70.4 | 64.1 | 55.5 |
| ProbCons | 23.8 | 52.0 | 44.1 | 54.4 | 63.7 | 88.7 | 69.3 | 66.8 | 57.9 |
| T-Coffee | 25.5 | 50.6 | 44.2 | 55.1 | 63.7 | 88.8 | 67.5 | 67.5 | 57.9 |
| T-Coffee profile-profile | 10.8 | 14.5 | 27.0 | 40.2 | 52.9 | 86.2 | 52.1 | 53.0 | 42.1 |
| Number (a) | 1326 | 1711 | 1275 | 8128 | 1485 | 903 | 528 | 91 | |
| Sequence identity (%) (b) | 11.7±13.8 | 14.3±10.8 | 15.9±12.1 | 18.2±9.7 | 18.7±11.5 | 26.9±11.3 | 27.3±16.9 | 35.3±13.5 |
Mean = mean percentage of correctly-aligned residues over averages for eight families. (a) Number of pairwise alignments. (b) Mean (± standard deviation) of the percentage sequence identity between pairs of alignments in each family.
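The "mean" column is an unweighted average of the eight per-family values, so a large family such as 7tm does not count for more than a small one such as msl. As a worked check, the HMAP row of this table can be recomputed as follows (the variable names are illustrative only):

```python
from statistics import mean

# Per-family percentages for HMAP, in column order:
# ion, Nat, ptga, 7tm, dtd, acr, photo, msl
hmap_by_family = [32.8, 61.9, 54.9, 61.4, 65.3, 87.6, 83.4, 78.5]

# Unweighted mean over the eight family averages, rounded as in the table.
hmap_mean = round(mean(hmap_by_family), 1)
print(hmap_mean)  # 65.7
```

The result matches the value reported in the mean column for HMAP.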
Table 5. Percentage of residues that are correctly aligned in pairwise sequence alignments assigned to the same subgroup within the BAliBASE reference set 7, sorted by sequence identity of the alignments in each protein family.
| ion | ptga | 7tm | Nat | acr | msl | dtd | photo | mean | |
| AlignMeP | 62.8 | 83.4 | 67.6 | 80.6 | 93.4 | […] | […] | […] | 81.8 |
| AlignMePS | […] | […] | […] | […] | […] | […] | 89.6 | 94.0 | […] |
| AlignMePST | 62.9 | 81.7 | 68.4 | 79.3 | 92.4 | 78.3 | 86.9 | 91.4 | 80.2 |
| MSAProbs | 44.3 | 67.5 | 62.5 | 71.1 | 92.5* | 74.4 | 84.5 | 88.8 | 73.2 |
| HHalign | 51.6 | 52.0 | 43.9 | 64.8 | 56.0 | 58.6 | 66.4 | 84.3 | 59.7 |
| HMAP | 50.6 | 75.2 | 69.2* | 77.5* | 91.7 | […] | 82.8 | 90.6* | 77.3 |
| MUSCLE | 47.0 | 72.3 | 62.6 | 72.4 | 93.0 | 78.0* | 85.0 | 88.6 | 74.9 |
| MUSCLE profile-profile | 25.1 | 60.8 | 53.5 | 54.3 | 91.6 | 62.6 | 74.7 | 74.1 | 62.1 |
| ProbCons | 43.8 | 66.5 | 62.1 | 69.7 | 92.2 | 69.9 | 83.7 | 83.6 | 71.4 |
| T-Coffee | 45.9 | 69.8 | 64.7 | 72.5 | 92.2 | 76.8 | 85.2 | 87.0 | 74.3 |
| T-Coffee profile-profile | 45.3 | 66.3 | 63.5 | 70.4 | 92.1 | 71.4 | 84.1 | 83.6 | 72.1 |
| Number | 551 | 559 | 1082 | 282 | 420 | 51 | 84 | 122 | |
| Sequence identity (%) | 22.1±16.6 | 26.7±11.0 | 28.0±20.0 | 31.3±16.7 | 34.4±12.9 | 43.6±12.7 | 49.5±19.1 | 52.2±18.1 |
See legend to Table 4 for more details.
Table 6. Percentage of residues that are correctly aligned in pairwise sequence alignments assigned to different subgroups within the BAliBASE reference set 7, sorted by sequence identity of the alignments in each protein family.
| ion | ptga | Nat | 7tm | dtd | photo | acr | msl | mean | |
| AlignMeP | 21.9 | 9.9 | 36.2 | 38.6 | 65.7 | […] | 81.4 | […] | 52.9 |
| AlignMePS | 31.2 | […] | […] | […] | […] | […] | 86.0 | […] | […] |
| AlignMePST | […] | 41.0 | 54.5 | 58.0 | 70.3 | 80.3 | 81.0 | 74.2 | 62.1 |
| MSAProbs | 10.5 | 29.0 | 49.8 | 53.5 | 63.2 | 68.8 | 85.9 | 65.9 | 53.3 |
| HHalign | 30.2 | 34.8 | 45.8 | 37.6 | 41.3 | 62.2 | 43.8 | 61.6 | 44.6 |
| HMAP | 20.1 | 39.2 | 58.9 | 60.2 | 64.3 | 81.3 | 83.9 | 75.5* | 60.4 |
| MUSCLE | 14.3 | 29.8 | 53.7 | 55.7 | 69.4* | 78.1 | […] | 73.7* | 57.7 |
| MUSCLE profile-profile | 13.7 | 23.2 | 45.7 | 47.4 | 67.0 | 69.3 | 85.7 | 66.1 | 52.3 |
| ProbCons | 9.5 | 26.6 | 48.6 | 53.2 | 62.5 | 65.0 | 85.8 | 62.9 | 51.7 |
| T-Coffee | 13.5 | 34.3 | 46.5 | 55.3 | 63.6 | 72.8 | 86.1 | 69.7 | 55.2 |
| T-Coffee profile-profile | 11.5 | 26.9 | 46.7 | 53.8 | 62.5 | 62.6 | 85.9 | 62.7 | 51.6 |
| Number | 775 | 716 | 1429 | 7046 | 1401 | 406 | 483 | 40 | |
| Sequence identity (%) | 4.3±1.0 | 7.5±1.6 | 10.9±3.9 | 16.7±5.4 | 16.8±7.6 | 19.8±5.5 | 20.4±1.6 | 24.7±3.3 |
See legend to Table 4 for more details.
Table 7. Percentage of residues that are correctly aligned in the predicted transmembrane regions of pairwise sequence alignments from the BAliBASE reference set 7, sorted by protein family name.
| 7tm | acr | dtd | ion | msl | Nat | photo | ptga | mean | |
| AlignMeP | 54.6 | 96.0 | 76.5 | 36.1 | 96.7 | 44.6 | 91.8 | 40.3 | 67.1 |
| AlignMePS | 92.6 | […] | […] | 58.3 | […] | […] | […] | 67.2 | 84.1 |
| AlignMePST | 87.0 | 95.6 | 86.2 | 57.8 | 95.7 | 64.2 | 93.9 | 58.1 | 79.8 |
| MSAProbs | […] | 98.0 | 89.5 | 62.7 | […] | 69.5 | 91.7 | […] | […] |
| HHalign | 51.9 | 37.6 | 51.8 | 37.1 | 76.3 | 50.0 | 71.6 | 31.5 | 51.0 |
| HMAP | 95.1 | 97.6 | 82.8 | 61.5 | 96.0* | 72.4 | […] | 69.3 | 83.9 |
| MUSCLE | 89.5 | 97.6 | 89.1 | 49.7 | 95.0* | 64.9 | 91.7 | 57.2 | 79.3 |
| MUSCLE profile-profile | 79.9 | 97.4 | 89.0 | 30.2 | 92.9 | 53.9 | 85.8 | 47.6 | 72.1 |
| ProbCons | 95.7 | 97.9 | 89.6 | 61.6 | […] | 67.9 | 90.6 | 69.8 | 83.7 |
| T-Coffee | 95.8 | […] | 89.9 | […] | […] | 66.5 | 88.2 | 69.8 | 83.8 |
| T-Coffee profile-profile | 75.5 | 98.0 | 83.9 | 12.3 | 91.2 | 18.2 | 71.8 | 49.0 | 62.5 |
Mean = mean over averages for eight families.
Table 8. Average shift error in pairwise alignments of the BAliBASE reference set 7.
| ion | Nat | ptga | 7tm | dtd | acr | photo | msl | mean | |
| AlignMeP | 29.92 | 48.71 | 33.98 | 47.58 | 9.83 | 1.09 | […] | 0.59* | 15.38 |
| AlignMePS | 28.83 | 2.46 | […] | […] | […] | […] | 0.36 | […] | 5.11 |
| AlignMePST | […] | 3.24 | 5.39 | 11.82 | 3.46 | 0.42 | […] | 0.47 | […] |
| MSAProbs | 37.00 | 2.42* | 5.99 | 5.17 | 4.29 | 0.34 | 1.36 | 0.84 | 6.87 |
| HHalign | 15.89 | 4.81 | 7.96 | 9.91 | 6.37 | 1.61 | 0.84 | 1.78 | 6.15 |
| HMAP | 35.66 | […] | 6.18 | 4.61 | 6.84 | […] | 0.52 | 0.58 | 7.08 |
| MUSCLE | 49.39 | 6.01 | 12.97 | 10.42 | 3.31 | 0.34 | 0.73 | 0.64 | 10.48 |
| MUSCLE profile-profile | 57.33 | 11.53 | 18.23 | 22.06 | 3.86 | 0.40 | 1.28 | 1.20 | 14.49 |
| ProbCons | 41.46 | 3.20 | 7.91 | 5.60 | 4.78 | 0.35* | 1.70 | 1.09 | 8.22 |
| T-Coffee | 39.93 | 4.62 | 6.69 | 4.50 | 4.73 | 0.35* | 1.60 | 1.09 | 7.90 |
| T-Coffee profile-profile | 64.15 | 42.50 | 12.03 | 17.50 | 8.48 | 0.45 | 2.15 | 2.22 | 18.69 |
Families are sorted by the average sequence identity (see Table 4). Mean = mean over averages for eight families.
Table 9. Average shift error in pairwise alignments assigned to the same subgroup within the BAliBASE reference set 7.
| ion | ptga | 7tm | Nat | acr | msl | dtd | photo | mean | |
| AlignMeP | 12.35 | 0.79 | 16.19 | 1.45* | 0.16* | 0.72 | […] | 0.18 | 4.06 |
| AlignMePS | 6.69 | 0.73 | […] | […] | […] | […] | […] | […] | […] |
| AlignMePST | […] | […] | 8.44 | 1.45 | 0.16 | […] | […] | […] | 2.24 |
| MSAProbs | 21.91 | 2.97 | 3.90 | 1.85 | 0.19 | 0.70 | 1.25 | 0.48 | 4.16 |
| HHalign | 6.03 | 3.14 | 8.56 | 2.37 | 1.32 | 1.94 | 2.66 | 0.29 | 3.29 |
| HMAP | 17.91 | 2.03 | 2.93 | […] | 0.20 | […] | 3.96 | 0.26 | 3.66 |
| MUSCLE | 17.67 | 5.73 | 9.13 | 3.56 | 0.19 | 0.63 | 0.99 | 0.37 | 4.78 |
| MUSCLE profile-profile | 42.37 | 8.06 | 15.17 | 10.81 | 0.23 | 1.31 | 2.36 | 1.01 | 10.16 |
| ProbCons | 23.82 | 3.98 | 4.40 | 2.44 | 0.22 | 1.01 | 1.55 | 0.65 | 4.76 |
| T-Coffee | 19.90 | 1.98 | 3.42 | 2.23 | 0.21 | 0.58 | 1.08 | 0.56 | 3.74 |
| T-Coffee profile-profile | 23.86 | 3.11 | 3.62 | 2.66 | 0.22 | 0.79 | 1.13 | 0.62 | 4.50 |
Families are sorted by the average sequence identity (see Table 5). Mean = mean over averages for eight families.
Table 10. Average shift error in pairwise alignments assigned to different subgroups within the BAliBASE reference set 7.
| ion | ptga | Nat | 7tm | dtd | photo | acr | msl | mean | |
| AlignMeP | 42.41 | 59.90 | 58.04 | 52.40 | 10.38 | […] | 1.90 | […] | 28.23 |
| AlignMePS | 44.56 | […] | […] | […] | […] | 0.40 | 0.48 | […] | 7.39 |
| AlignMePST | […] | 9.10 | 3.60 | 12.34 | 3.61 | 0.35 | 0.65 | 0.50 | […] |
| MSAProbs | 47.73 | 8.35 | 2.53 | 5.37 | 4.47 | 1.62 | 0.47 | 1.01 | 8.94 |
| HHalign | 22.90 | 11.73 | 5.29 | 10.12 | 6.60 | 1.01 | 1.86 | 1.56 | 7.63 |
| HMAP | 48.28 | 9.43 | 2.06 | 4.87 | 7.01 | 0.60 | […] | 0.60* | 9.16 |
| MUSCLE | 71.94 | 18.63 | 6.49 | 10.61 | 3.45 | 0.83 | 0.48 | 0.65 | 14.13 |
| MUSCLE profile-profile | 67.96 | 26.17 | 11.67 | 23.11 | 3.95 | 1.36 | 0.55 | 1.05 | 16.98 |
| ProbCons | 54.01 | 10.98 | 3.35 | 5.81 | 4.97 | 2.01 | 0.46 | 1.20 | 10.35 |
| T-Coffee | 34.91 | 5.23 | 4.90 | 4.32 | 4.05 | 1.67 | 0.44 | 0.75 | 7.03 |
| T-Coffee profile-profile | 51.36 | 9.48 | 5.01 | 4.62 | 4.95 | 1.90 | 0.47 | 1.46 | 9.91 |
Families are sorted by the average sequence identity (see Table 6). Mean = mean over averages for eight families.