Emma M Rath, Dominique Tessier, Alexander A Campbell, Hong Ching Lee, Tim Werner, Noeris K Salam, Lawrence K Lee, W Bret Church.
Abstract
BACKGROUND: Helical membrane proteins are vital for the interaction of cells with their environment. Predicting the location of membrane helices in protein amino acid sequences provides substantial understanding of their structure and function and identifies membrane proteins in sequenced genomes. Currently there is no comprehensive benchmark tool for evaluating prediction methods, and no publication compares all available prediction tools. The current benchmark literature is outdated, as it does not include recently determined membrane protein structures. It is also limited to global assessments, as specialised benchmarks for predicting specific classes of membrane proteins have not previously been carried out.

DESCRIPTION: We present a benchmark server at http://sydney.edu.au/pharmacy/sbio/software/TMH_benchmark.shtml that uses recent high resolution protein structural data to provide a comprehensive assessment of the accuracy of existing membrane helix prediction methods. The server also allows a user to upload predictions generated by novel methods and compare them against all existing methods assessed by the server. Benchmark metrics include sensitivity and specificity of predictions for membrane helix location and orientation, and many others. The server allows for customised evaluations such as assessing prediction method performance for specific helical membrane protein subtypes.

CONCLUSIONS: We report results for custom benchmarks which illustrate how the server may be used for specialised benchmarks. Which prediction method performs best depends on which measure is being benchmarked. The OCTOPUS membrane helix prediction method is consistently one of the highest performing methods across all measures in the benchmarks that we performed.
Year: 2013 PMID: 23530628 PMCID: PMC3620685 DOI: 10.1186/1471-2105-14-111
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Benchmark results showing prediction methods and their scores for benchmark measures
| Prediction method | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DAS-TMfilter | 81 | 77 | 91 | 96 | 51 | 88 | | | | 62 | 63 |
| DAS1997 (loose) | 82 | 78 | 67 | 89 | 55 | 72 | | | | 57 | 64 |
| DAS1997 (strict) | 89 | 84 | 39 | 85 | 42 | 37 | | | | 67 | 69 |
| DAS2002 | 82 | 78 | 92 | 56 | 90 | | | | 62 | 62 | |
| deltaG | 89 | 87 | 75 | 96 | 63 | 76 | | | | 72 | 68 |
| Eisen (11,10) | 59 | 57 | 11 | 58 | 10 | 5 | | | | 31 | 27 |
| Eisen (19,10) | 41 | 36 | 9 | 51 | 2 | 1 | | | | 17 | 14 |
| Eisen (7,10) | 74 | 71 | 13 | 60 | 16 | 7 | | | | 45 | 39 |
| ENSEMBLE (in MemPype) | 81 | 81 | 90 | 95 | 52 | 89 | 68 | 62 | 80 | 67 | 62 |
| HMM-TM | 63 | 43 | 82 | 60 | 85 | 76 | 77 | 66 | 53 | 53 | |
| HMMTOP (in TOPCONS-single) | 89 | 86 | 96 | 65 | 76 | 75 | 79 | 72 | 71 | ||
| HMMTOP2 | 90 | 86 | 84 | 96 | 66 | 86 | 79 | 77 | 81 | 72 | 72 |
| KyteD (11,10) | 75 | 70 | 21 | 71 | 26 | 16 | | | | 53 | 46 |
| KyteD (19,10) | 58 | 50 | 16 | 66 | 6 | 1 | | | | 30 | 24 |
| KyteD (7,10) | 84 | 78 | 25 | 75 | 36 | 23 | | | | 62 | 59 |
| MemBrain | 79 | 95 | 69 | 82 | | | | 75 | |||
| MEMSAT (in TOPCONS-single) | 87 | 86 | 93 | 94 | 57 | 92 | 69 | 71 | 79 | 71 | 71 |
| MEMSAT-SVM | 88 | 46 | 97 | 11 | 88 | 84 | 83 | ||||
| MEMSAT3 | 88 | 88 | 46 | 96 | 15 | 50 | 66 | ||||
| OCTOPUS | 83 | 85 | |||||||||
| OCTOPUS (in TOPCONS) | 90 | 88 | 92 | 70 | 91 | 75 | |||||
| OHM (11,10) | 72 | 65 | 16 | 69 | 16 | 10 | | | | 42 | 35 |
| OHM (19,10) | 51 | 46 | 12 | 62 | 7 | 2 | | | | 23 | 17 |
| OHM (7,10) | 83 | 79 | 22 | 73 | 23 | 16 | | | | 53 | 47 |
| PHDhtm (at PBIL) | 77 | 69 | 70 | 94 | 47 | 74 | | | | 58 | 56 |
| PHDThtm (at PBIL) | 82 | 78 | 93 | 95 | 48 | 91 | 54 | 55 | 69 | 66 | 67 |
| Philius | 90 | 86 | 69 | 79 | 79 | 83 | 77 | 73 | |||
| Phobius | 87 | 86 | 60 | 93 | 66 | 65 | 78 | 73 | 69 | ||
| PolyPhobius | 92 | 96 | 67 | 92 | 69 | 70 | 80 | 75 | |||
| PRED-TMR | 83 | 81 | 89 | 96 | 55 | 88 | | | | 67 | 68 |
| PRO-TMHMM (in TOPCONS) | 90 | 88 | 66 | 85 | 72 | 73 | |||||
| PRODIV-TMHMM (in TOPCONS) | 43 | 96 | 32 | ||||||||
| S-TMHMM (in TOPCONS-single) | 85 | 85 | 96 | 57 | 93 | 81 | 83 | 82 | 69 | 68 | |
| SCAMPI | 88 | 87 | 94 | 96 | 65 | 93 | 84 | 72 | 71 | ||
| SCAMPI-multi (in TOPCONS) | 90 | 88 | 92 | 70 | 91 | 75 | |||||
| SCAMPI-sequence (in TOPCONS) | 88 | 87 | 94 | 96 | 65 | 93 | 84 | 84 | 82 | 72 | 71 |
| SCAMPI-sequence (in TOPCONS-single) | 88 | 87 | 94 | 96 | 65 | 84 | 84 | 82 | 71 | 72 | |
| SOSUI | 82 | 83 | 93 | 96 | 49 | 90 | | | | 64 | 62 |
| SPLIT4 | 75 | 74 | 87 | 96 | 40 | 84 | | | | 60 | 56 |
| SVMtm | 81 | 84 | 56 | 92 | | | | 65 | 67 | ||
| SVMtop | 86 | 88 | 96 | 52 | 92 | 72 | 75 | 70 | 72 | 65 | |
| TMAP | 84 | 81 | 76 | 55 | 77 | | | | 59 | 52 | |
| TMHMM2 | 86 | 86 | 59 | 93 | 73 | 74 | 80 | 72 | 71 | ||
| TMMOD | 85 | 85 | 56 | 93 | 69 | 72 | 81 | 70 | 68 | ||
| TMpred | 86 | 81 | 64 | 94 | 52 | 65 | 61 | 65 | 69 | 70 | 67 |
| TOPCONS | 88 | 92 | 91 | 77 | |||||||
| TOPCONS-single | 89 | 87 | 96 | 66 | 81 | 80 | 82 | 75 | 75 | ||
| TOPPRED2 | 85 | 82 | 62 | 94 | 55 | 62 | 73 | 72 | 73 | 70 | 69 |
| VALPRED | 76 | 70 | 61 | 94 | 45 | 64 | | | | 42 | 33 |
| VALPRED2 | 54 | 80 | 40 | 65 | | | | 55 | 50 | ||
| waveTM | 88 | 84 | 79 | 96 | 52 | 80 | 62 | 58 | |||
Numbers in brackets refer to the subset of data used in the benchmark as specified in Table 3. The benchmark server metrics used are “Qhtm %obs” for sensitivity, “Qhtm %prd” for specificity, “Qok” for correctly predicted sequences, “Nterm” for N-terminal topology, “Qio %obs” for non-membrane topology segments, and “QHb %obs” for helix boundaries. The scores of the 5 highest scoring prediction methods are marked in bold; where several methods tie at the 5th highest score, all of them are marked in bold. Methods that do not predict topology do not have topology scores.
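To make the sensitivity and specificity measures concrete, the following minimal sketch scores predicted helix segments against observed ones on a per-helix basis. It is an illustration only: the overlap criterion (`min_overlap` residues), the function names, and the example coordinates are assumptions, and the server's exact definitions of “Qhtm %obs” and “Qhtm %prd” may differ.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue positions


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def helix_scores(observed: List[Segment], predicted: List[Segment],
                 min_overlap: int = 3) -> Tuple[float, float]:
    """Return (sensitivity %, specificity %) counted per helix.

    Assumption: a helix counts as matched when it shares at least
    `min_overlap` residues with any helix in the other set.
    """
    obs_hit = sum(any(overlap(o, p) >= min_overlap for p in predicted)
                  for o in observed)
    prd_hit = sum(any(overlap(p, o) >= min_overlap for o in observed)
                  for p in predicted)
    sensitivity = 100.0 * obs_hit / len(observed) if observed else 0.0
    specificity = 100.0 * prd_hit / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical example: 4 observed helices, 3 predicted helices.
observed = [(10, 30), (45, 65), (80, 100), (120, 140)]
predicted = [(12, 31), (47, 66), (200, 220)]
sens, spec = helix_scores(observed, predicted)
```

In this hypothetical case two of the four observed helices are found and two of the three predicted helices are correct, which shows why a method can rank differently on the two measures.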
Sensitivity benchmark results for predictions of families of membrane channels
| Prediction method | | | | | | | |
|---|---|---|---|---|---|---|---|
| DAS1997 (strict) | 100 | 83 | |||||
| HMMTOP (in TOPCONS-single) | 86 | 99 | 75 | 80 | 97 | ||
| HMMTOP2 | 86 | 99 | 75 | 80 | 97 | ||
| MemBrain | 93 | 89 | 92 | 75 | 92 | ||
| MEMSAT (in TOPCONS-single) | 86 | 99 | 84 | 74 | 95 | ||
| MEMSAT-SVM | 86 | 97 | 86 | 95 | |||
| MEMSAT3 | 86 | 74 | 70 | 92 | |||
| OCTOPUS | 86 | 77 | 89 | ||||
| OCTOPUS (in TOPCONS) | 86 | 97 | 75 | 81 | 97 | ||
| Philius | 86 | 75 | 83 | 75 | 89 | ||
| PolyPhobius | 86 | 73 | 86 | 97 | |||
| PRO-TMHMM (in TOPCONS) | 90 | 99 | 75 | 80 | |||
| PRODIV-TMHMM (in TOPCONS) | 86 | 75 | 91 | ||||
| SCAMPI-multi (in TOPCONS) | 86 | 97 | 75 | 81 | 97 | ||
| SVMtop | 86 | 81 | 76 | 75 | 92 | ||
| TMHMM2 | 86 | 99 | 75 | 75 | 89 | ||
| TMMOD | 86 | 75 | 79 | 75 | 84 | ||
| TOPCONS | 86 | 99 | 75 | 82 | 97 | ||
| TOPCONS-single | 86 | 99 | 75 | 76 | 97 | ||
| VALPRED2 | 95 | 97 | 91 | 75 | |||
Each prediction method's score is given for the channel structure family benchmark data subset as specified in Table 4. The benchmark server metric used is “Qhtm %obs”. The highest score for each family is marked in bold. Prediction methods that did not obtain the highest score in any of these specialised channel benchmarks other than the Gap Junction benchmark (for which all methods except OPM (19,0) scored 100%) have been omitted.
Characteristics of the subsets of data used for the benchmarks reported in this paper
| Data subset | TMH | ½MH | BB | Solb | All years | 2008 | OPM | PDBTM | #seqs | #MHs |
|---|---|---|---|---|---|---|---|---|---|---|
| (1) TMH_1/2MH_OPM | Y | Y | | | Y | | Y | | 101 | 483 |
| (2) TMH_1/2MH_2008_OPM | Y | Y | | | | Y | Y | | 24 | 191 |
| (3) TMH_1/2MH_BB_SOLB_OPM | Y | Y | Y | Y | Y | | Y | | 599 | 483 |
| (4) TMH_OPM | Y | | | | Y | | Y | | 86 | 372 |
| (5) TMH_BB_SOLB_OPM | Y | | Y | Y | Y | | Y | | 584 | 372 |
| (6) TMH_PDBTM | Y | | | | Y | | | Y | 86 | 464 |
All these data subsets were restricted to sequences having less than 30% similarity to each other, with similarity measured by EMBOSS global sequence alignment. Other parameters used to build the data subsets are marked with Y in the columns. For parameters not specified here the benchmark server default values were used. Legend: TMH: transmembrane helices; ½MH: half-membrane helices; BB: membrane β-barrels; Solb: soluble proteins; All years: sequences used in the benchmark were not restricted by the date that the PDB model was made available; 2008: benchmark was carried out restricting sequences to those belonging to PDB structures deposited in 2008 or after and not having any structures of similar sequence deposited before 2008; OPM: benchmark was carried out using OPM-adjusted membrane helix assignments; PDBTM: benchmark was carried out using PDBTM membrane helix assignments without including segments assigned as loops; #seqs: total number of sequences; #MHs: total number of membrane helices.
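The 30% similarity restriction can be pictured as a redundancy filter over the candidate sequences. The sketch below is one plausible way to apply such a cut-off, not the server's procedure: the greedy, order-dependent selection and the `toy_identity` function are assumptions, and a real run would plug in a similarity taken from a global alignment such as the EMBOSS one named above.

```python
from typing import Callable, List


def toy_identity(a: str, b: str) -> float:
    """Percent of matching positions over the longer sequence.

    Stand-in only (no alignment is performed); hypothetical helper.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def filter_redundant(seqs: List[str], threshold: float = 30.0,
                     similarity: Callable[[str, str], float] = toy_identity
                     ) -> List[str]:
    """Greedily keep sequences that are below `threshold` percent
    similarity to every sequence already kept."""
    kept: List[str] = []
    for s in seqs:
        if all(similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


# Hypothetical toy sequences.
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC", "AACCCCCCCC"]
subset = filter_redundant(seqs)
```

Because the selection is greedy, the result depends on input order; a different ordering of the same sequences can keep a different representative from each similar group.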
Counts of sequences and membrane helices of the data subsets used for the specialised benchmarks for channels reported in this paper
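| Membrane protein structure family | Data subset | #seqs | #TMH | #½MH | Total #MH |
|---|---|---|---|---|---|
| Channels: Formate Nitrate Transporter (FNT) Family | (1) FNT | 3 | 19 | 3 | 22 |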
| Channels: Amt/Rh proteins | (2) Amt/Rh | 6 | 67 | 0 | 67 |
| Channels: Aquaporins and Glyceroporins | (3) Aquaporin | 12 | 72 | 24 | 96 |
| Channels: Gap Junctions | (4) Gap junction | 1 | 4 | 0 | 4 |
| Channels: Potassium and Sodium Ion-Selective | (5) K+ channel | 25 | 80 | 25 | 105 |
| Channels: Urea Transporters | (6) Urea transport | 1 | 10 | 2 | 12 |
| Channels: Other Ion Channels | (7) Other | 14 | 37 | 0 | 37 |
The benchmark server's membrane protein structure family selections were used to restrict each benchmark to a membrane channel structure family. There was no restriction on similarity (the benchmark server's similarity level was set to 100%), and for all other options the server's default parameters were used. Legend : #seqs : number of sequences; #TMH : number of transmembrane helices that cross the membrane; #½MH : number of half-membrane helices; Total #MH : total number of membrane helices.
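As a quick consistency check on the counts above, the total number of membrane helices in each subset should equal the transmembrane plus half-membrane helix counts. The short sketch below verifies this and derives the mean number of helices per sequence; the values are copied from the table, while the short labels and variable names are illustrative.

```python
# (label, #seqs, #TMH, #half-membrane helices, total #MH) from the table above
rows = [
    ("FNT", 3, 19, 3, 22),
    ("Amt/Rh", 6, 67, 0, 67),
    ("Aquaporin", 12, 72, 24, 96),
    ("Gap junction", 1, 4, 0, 4),
    ("K+ channel", 25, 80, 25, 105),
    ("Urea transport", 1, 10, 2, 12),
    ("Other", 14, 37, 0, 37),
]

# Subsets whose reported total differs from #TMH + #half-MH
inconsistent = [label for label, _, tmh, half, total in rows
                if tmh + half != total]

# Mean number of membrane helices per sequence in each subset
helices_per_seq = {label: total / n_seqs
                   for label, n_seqs, _, _, total in rows}

# Overall counts across the seven channel subsets
total_seqs = sum(n_seqs for _, n_seqs, _, _, _ in rows)
total_helices = sum(total for _, _, _, _, total in rows)
```

Several subsets contain very few sequences (a single sequence for Gap Junctions and Urea Transporters), so one missed helix shifts the sensitivity score for those families by many percentage points.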