Abstract
Alpha-helical transmembrane (TM) proteins play important and diverse functional roles in cells. The ability to predict the topology of these proteins is important for identifying functional sites and inferring the function of membrane proteins. This paper presents a Hidden Markov Model (referred to as HMM_RA) that predicts the topology of alpha-helical transmembrane proteins with improved performance. HMM_RA adopts the same structure as the HMMTOP method, which has five modules: inside loop, inside helix tail, membrane helix, outside helix tail, and outside loop. Each module consists of one or more states. HMM_RA allows protein sequences to be encoded with reduced alphabets. Thus, each state of HMM_RA is associated with n emission probabilities, where n is the size of the reduced alphabet set. Direct comparisons on two standard data sets show that HMM_RA consistently outperforms HMMTOP and TMHMM in topology prediction. Specifically, on a high-quality data set of 83 proteins, HMM_RA outperforms HMMTOP by up to 7.6% in topology accuracy and 6.4% in alpha-helix location accuracy. On the same data set, HMM_RA outperforms TMHMM by up to 6.4% in topology accuracy and 2.9% in location accuracy. The comparison also shows that HMM_RA achieves performance comparable to that of Phobius, a recently published method.
Keywords: Alpha helical transmembrane; HMM; Reduced alphabet; Topology prediction
Year: 2008 PMID: 19812766 PMCID: PMC2735969 DOI: 10.4137/bbi.s358
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Reduced alphabet sets from Murphy et al. (2000).
| Alphabet | Letters (one representative per group) |
|---|---|
| 20 amino acids, grouped as in Murphy_15 | LVIM, C, A, G, S, T, P, FY, W, E, D, N, Q, KR, H |
| Murphy_15 | L, C, A, G, S, T, P, F, W, E, D, N, Q, K, H |
| Murphy_10 | L, C, A, G, S, P, F, E, K, H |
| Murphy_8 | L, A, S, P, F, E, K, H |
| Murphy_4 | L, A, F, E |
| Murphy_2 | L, E |
Each reduced alphabet set is given a name, which includes the author’s name followed by a number denoting the size of the alphabet.
Reduced alphabet sets from Li et al. (2003).
| Alphabet | Letters (one representative per group) |
|---|---|
| 20 amino acids, grouped as in Li_10 | C, FYW, ML, IV, G, P, ATS, NH, QED, RK |
| Li_10 | C, Y, L, V, G, P, S, N, E, K |
| Li_9 | C, L, V, G, P, S, N, E, K |
| Li_8 | C, L, G, P, S, N, E, K |
| Li_7 | C, L, G, P, S, N, K |
| Li_6 | C, L, G, P, S, N |
| Li_4 | C, L, G, N |
Each reduced alphabet set is given a name, which includes the author’s name followed by a number denoting the size of the alphabet.
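Encoding a sequence with a reduced alphabet amounts to replacing each residue with the representative letter of its group. The following is a minimal sketch, assuming the Li_10 grouping C, FYW, ML, IV, G, P, ATS, NH, QED, RK implied by the table above; the names `LI_10_GROUPS`, `build_encoder`, and `encode` are hypothetical and do not come from the paper.

```python
# Assumed Li_10 grouping: representative letter -> member amino acids.
# The grouping is read off the table above; all names here are illustrative.
LI_10_GROUPS = {
    "C": "C", "Y": "FYW", "L": "ML", "V": "IV", "G": "G",
    "P": "P", "S": "ATS", "N": "NH", "E": "QED", "K": "RK",
}


def build_encoder(groups):
    """Invert a {representative: members} mapping into {amino_acid: representative}."""
    table = {}
    for representative, members in groups.items():
        for amino_acid in members:
            if amino_acid in table:
                raise ValueError(f"{amino_acid} is assigned to more than one group")
            table[amino_acid] = representative
    return table


def encode(sequence, table, unknown="X"):
    """Map each residue to its group representative; unrecognized letters become `unknown`."""
    return "".join(table.get(residue, unknown) for residue in sequence.upper())


LI_10 = build_encoder(LI_10_GROUPS)

if __name__ == "__main__":
    print(encode("MKTAYIAKQR", LI_10))
```

Smaller alphabets such as Li_8 would use the same functions with a coarser `groups` mapping; how the paper merges groups at each size is not reproduced here.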
Figure 1. Architecture of HMM_RA. The model has five modules: inside loop, inside tail, membrane helix, outside tail, and outside loop. Each module consists of one or more states.
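To illustrate how the alphabet size n determines the number of emission probabilities per state, here is a rough sketch that lays out the five modules and attaches one emission distribution of size n to every state. The per-module state counts in `STATES_PER_MODULE` are hypothetical placeholders (the source only says each module has one or more states), and uniform initialization is an assumption, not the paper's training procedure.

```python
# The five modules named in the figure caption.
MODULES = ["inside loop", "inside tail", "membrane helix", "outside tail", "outside loop"]

# Hypothetical state counts per module, chosen only for illustration.
STATES_PER_MODULE = {
    "inside loop": 1,
    "inside tail": 15,
    "membrane helix": 25,
    "outside tail": 15,
    "outside loop": 1,
}


def init_emissions(alphabet):
    """Give every state a uniform emission distribution over `alphabet`."""
    n = len(alphabet)
    return {
        (module, index): {symbol: 1.0 / n for symbol in alphabet}
        for module in MODULES
        for index in range(STATES_PER_MODULE[module])
    }


def count_emission_probabilities(emissions):
    """Total number of emission probabilities across all states."""
    return sum(len(distribution) for distribution in emissions.values())


AA_20 = "ACDEFGHIKLMNPQRSTVWY"  # full amino acid alphabet
LI_8 = "CLGPSNEK"               # representative letters of the Li_8 set

full = init_emissions(AA_20)
reduced = init_emissions(LI_8)

if __name__ == "__main__":
    print(count_emission_probabilities(full), count_emission_probabilities(reduced))
```

With these placeholder counts, the number of emission probabilities scales linearly with n, which is the sense in which a reduced alphabet shrinks the model.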
Figure 2. Performance of HMM_RA. (A) Reduced alphabet sets from Murphy et al. (2000) were used to encode protein sequences; (B) reduced alphabet sets from Li et al. (2003) were used to encode protein sequences. Set_160 was used to evaluate the method in multiple-sequence mode. AA_20: the full 20-letter amino acid alphabet was used to encode protein sequences. Each reduced alphabet set is named with the author’s name followed by a number denoting the size of the alphabet set.
Figure 3. HMM_RA achieves better performance on the high-quality data set. (A) Reduced alphabet sets from Murphy et al. (2000) were used to encode protein sequences; (B) reduced alphabet sets from Li et al. (2003) were used to encode protein sequences. Set_160 and set_83 were used to evaluate the method in multiple-sequence mode. AA_20: the full 20-letter amino acid alphabet was used to encode protein sequences. Each reduced alphabet set is named with the author’s name followed by a number denoting the size of the alphabet set.
Comparisons of different methods using set_83.
| Input mode | Method | Topology Accuracy | Location Accuracy |
|---|---|---|---|
| Single Sequence | HMM_RA (using Li_8) | | |
| | HMMTOP | 74.9% | 77.4% |
| | TMHMM | 77.1% | 83.1% |
| Multiple Sequences | HMM_RA (using Li_8) | | |
| | HMMTOP | 80.0% | 83.6% |
Comparisons of different methods using set_160.
| Input mode | Method | Topology Accuracy | Location Accuracy |
|---|---|---|---|
| Single Sequence | HMM_RA (using Li_8) | | |
| | HMMTOP | 75.0% | 80.6% |
| | TMHMM | 76.9% | 83.8% |
| Multiple Sequences | HMM_RA (using Li_8) | | |
| | HMMTOP | 76.3% | 81.9% |