Timothy Nugent, David T Jones.
Abstract
BACKGROUND: Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated.
Year: 2009 PMID: 19470175 PMCID: PMC2700806 DOI: 10.1186/1471-2105-10-159
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
SVM per residue accuracy.
| SVM | Window size | Kernel | MCC |
| TM Helix/¬TM Helix | 33 | RBF | 0.80 |
| Inside Loop/Outside Loop | 35 | Polynomial* | 0.63 |
| Re-entrant Helix/¬Re-entrant Helix | 27 | RBF | 0.34 |
| Signal Peptide/¬Signal Peptide | 27 | RBF | 0.76 |
| TM Protein/Globular Protein | 33 | RBF | 0.78 |
Column 2: Window size – the size of the sliding window in residues. Column 3: Kernel – SVM kernel type. RBF = radial basis function. Column 4: MCC – Matthews correlation coefficient. * The Inside Loop/Outside Loop SVM was trained using a third-order polynomial kernel.
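The MCC values above summarise per-residue binary classification. As a minimal sketch of how such a score can be computed from predicted and observed residue labels, the following uses the standard MCC formula; the function names and the 0/1 label encoding are illustrative assumptions, not taken from the paper.

```python
import math


def confusion_counts(observed, predicted):
    """Count (tp, tn, fp, fn) for two equal-length sequences of 0/1 labels.

    The 0/1 encoding (1 = positive class, e.g. TM helix residue) is an
    assumption made for this sketch.
    """
    if len(observed) != len(predicted):
        raise ValueError("label sequences must have the same length")
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            tp += 1
        elif not obs and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The score ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect agreement), which makes it more informative than raw accuracy when the two classes are unbalanced, as is typical for re-entrant helix or signal peptide residues.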
Benchmark results for the SVM-based method ('MEMSAT-SVM') against a selection of leading topology predictors.
| Method | Algorithm | Correct helix count | Correct helix locations | Correct N-terminal | FP helix | FN helix | Correct SP topology | Correct RE topology | Correct topology |
| MEMSAT-SVM | SVM | 95% | 91% | 91% | 4% | 5% | 93% | 64% | 89% |
| OCTOPUS | NN + HMM | 86% | 83% | 84% | 14% | 2% | 21% | 73% | 79% |
| MEMSAT3 | NN | 84% | 76% | 84% | 8% | 8% | 57% | 64% | 76% |
| ENSEMBLE | NN + HMM | 77% | 76% | 79% | 18% | 5% | 7% | 55% | 67% |
| PHOBIUS | HMM | 75% | 76% | 79% | 9% | 16% | 93% | 36% | 63% |
| HMMTOP | HMM | 77% | 76% | 78% | 18% | 6% | 29% | 64% | 63% |
| PRODIV | HMM | 79% | 64% | 76% | 19% | 8% | 0% | 18% | 57% |
| SVMTOP | SVM | 66% | 64% | 66% | 22% | 22% | 0% | 55% | 53% |
| TMHMM | HMM | 75% | 68% | 72% | 14% | 20% | 29% | 55% | 53% |
| PHDhtm | NN | 75% | 54% | 55% | 23% | 30% | 29% | 18% | 45% |
Column 1: Method – Prediction method. Column 2: Algorithm – Underlying machine-learning algorithm. Column 3: Correct helix count – Fraction of sequences with the correct number of TM helices predicted. Column 4: Correct helix locations – Fraction of sequences with the correct number and locations of TM helices predicted. Column 5: Correct N-terminal – Fraction of sequences with the correct N-terminal location predicted. Column 6: FP helix – Fraction of sequences with at least one over-predicted TM helix. Column 7: FN helix – Fraction of sequences with at least one under-predicted TM helix. Column 8: Correct SP topology – Fraction of sequences containing signal peptides for which the correct overall topology is predicted. Column 9: Correct RE topology – Fraction of sequences containing re-entrant helices for which the correct overall topology is predicted. Column 10: Correct topology – Fraction of sequences for which the correct overall topology is predicted, requiring the correct number and location of TM helices and the correct location of the N-terminal. TM helices must overlap their defined positions by at least 5 residues.
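The "correct topology" criterion described in this legend can be sketched as follows. This is an illustrative reading, not the authors' evaluation code: helices are assumed to be inclusive (start, end) residue ranges, predicted and observed helices are assumed to be paired in sequence order, and the N-terminal location is assumed to be a simple label such as "in" or "out". All function and parameter names are hypothetical.

```python
def overlap(seg_a, seg_b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]) + 1)


def helices_match(predicted, observed, min_overlap=5):
    """True if the helix counts agree and each predicted helix overlaps its
    observed counterpart by at least min_overlap residues.

    Pairing helices in sequence order is an assumption of this sketch.
    """
    if len(predicted) != len(observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(overlap(p, o) >= min_overlap for p, o in pairs)


def topology_correct(pred_helices, pred_nterm, obs_helices, obs_nterm,
                     min_overlap=5):
    """Overall topology is correct only if both the N-terminal location and
    the number and locations of the TM helices are correct."""
    return pred_nterm == obs_nterm and helices_match(
        pred_helices, obs_helices, min_overlap
    )


def fraction_correct(results):
    """Fraction of sequences with correct topology.

    results is an iterable of
    (pred_helices, pred_nterm, obs_helices, obs_nterm) tuples.
    """
    results = list(results)
    if not results:
        return 0.0
    correct = sum(1 for r in results if topology_correct(*r))
    return correct / len(results)
```

Under this reading, a prediction with the right number of helices can still fail if a single helix is shifted so that it shares fewer than 5 residues with its annotated position, or if the N-terminal is placed on the wrong side of the membrane.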
Prediction performance using the Möller and TOPDB data sets.
| Method | Möller | TOPDB |
| MEMSAT-SVM | 78% | 67% |
| OCTOPUS | 69% | 64% |
| MEMSAT3 | 77% | 66% |
| ENSEMBLE | 61% | 51% |
| PHOBIUS | 67% | 62% |
| HMMTOP | 64% | 57% |
| PRODIV | 46% | 37% |
| SVMTOP | 70% | 42% |
| TMHMM | 60% | 56% |
| PHDhtm | 45% | 49% |
Column 1: Prediction method. Column 2: Results using the Möller data set. Column 3: Results using the TOPDB data set.
Results for TM/globular protein discrimination rates.
| Method | Algorithm | False positive rate | False negative rate |
| MEMSAT-SVM | SVM | 0.00% | 0.44% |
| MEMSAT3 | NN | 0.50% | 0.50% |
| SOSUI | Hydrophobicity analysis | 0.33% | 1.10% |
| OCTOPUS | NN + HMM | 0.00% | 2.51% |
| PHOBIUS | HMM | 2.72% | 0.44% |
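The table does not spell out how the two rates are defined. A minimal sketch under the usual reading, which is an assumption here, takes the false positive rate as the fraction of globular proteins predicted as TM and the false negative rate as the fraction of TM proteins predicted as globular. The function name and boolean encoding are hypothetical.

```python
def discrimination_rates(is_tm_observed, is_tm_predicted):
    """Return (false_positive_rate, false_negative_rate) as fractions.

    Assumed definitions (not stated in the source):
      false positive rate = globular proteins predicted TM / all globular
      false negative rate = TM proteins predicted globular / all TM
    Inputs are equal-length sequences of booleans (True = TM protein).
    """
    if len(is_tm_observed) != len(is_tm_predicted):
        raise ValueError("inputs must have the same length")
    n_tm = n_glob = fp = fn = 0
    for obs, pred in zip(is_tm_observed, is_tm_predicted):
        if obs:
            n_tm += 1
            if not pred:
                fn += 1
        else:
            n_glob += 1
            if pred:
                fp += 1
    fpr = fp / n_glob if n_glob else 0.0
    fnr = fn / n_tm if n_tm else 0.0
    return fpr, fnr
```

Multiplying each returned fraction by 100 gives percentages in the form reported in the table.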
The fraction of proteins predicted as transmembrane, and to contain re-entrant helices and signal peptides, in a number of complete genomes.
| Species | Fraction of genome predicted as TM proteins | Fraction of TM proteins predicted to contain re-entrant helices | Fraction of TM proteins predicted to contain signal peptides |
| Caenorhabditis elegans | 33% | 2% | 33% |
| Canis familiaris | 31% | 2% | 27% |
| Danio rerio | 29% | 2% | 26% |
| Drosophila melanogaster | 27% | 2% | 33% |
| Escherichia coli | 24% | 2% | 28% |
| Homo sapiens | 26% | 2% | 35% |
| Mus musculus | 29% | 2% | 30% |
| Pan troglodytes | 26% | 2% | 33% |
| Takifugu rubripes | 33% | 3% | 26% |
| Xenopus tropicalis | 31% | 2% | 23% |
Figure 1. Topology prediction results for a number of complete genomes. X-axis: Number of predicted TM helices. Y-axis: Fraction of all predicted TM proteins. Z-axis: Species.
Data set composition.
| Protein class | Number in set |
| Prokaryotic | 92 |
| Eukaryotic | 37 |
| Viral | 2 |
| Single-spanning TM segment | 57 |
| Multiple-spanning TM segments | 74 |
| Contains re-entrant helix | 11 |
| Contains signal peptide | 14 |
| Total | 131 |