Michael Bernhofer, Burkhard Rost.
Abstract
BACKGROUND: Despite the immense importance of transmembrane proteins (TMPs) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4-5 times underrepresented compared to non-TMPs. Today's top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions.
Keywords: Protein language models; Protein structure prediction; Transmembrane protein prediction
Year: 2022 PMID: 35941534 PMCID: PMC9358067 DOI: 10.1186/s12859-022-04873-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Per-protein performance. *
| Method | β-TMP (57): Recall (%) | β-TMP (57): FPR (%) | α-TMP (571): Recall (%) | α-TMP (571): FPR (%) | Globular (5654): Recall (%) | Globular (5654): FPR (%) |
|---|---|---|---|---|---|---|
| TMbed | | | | | | 2.8 ± 1.2 |
| DeepTMHMM | 77.9 ± 12.7 | | 95.8 ± 1.3 | | | 5.9 ± 2.2 |
| TMSEG | – | – | 96.5 ± 1.0 | 2.3 ± 0.3 | 97.7 ± 0.3 | 3.5 ± 1.0 |
| TOPCONS2¹ | – | – | 94.2 ± 1.3 | 2.6 ± 0.3 | 97.4 ± 0.3 | 5.8 ± 1.3 |
| OCTOPUS¹ | – | – | 94.2 ± 1.9 | 9.1 ± 0.7 | 90.9 ± 0.7 | 5.8 ± 1.9 |
| Philius¹ | – | – | 92.5 ± 1.4 | 2.6 ± 0.2 | 97.4 ± 0.2 | 7.5 ± 1.4 |
| PolyPhobius¹ | – | – | 97.2 ± 1.1 | 5.3 ± 0.4 | 94.7 ± 0.4 | 2.8 ± 1.1 |
| SPOCTOPUS¹ | – | – | | 17.2 ± 0.8 | 82.8 ± 0.8 | |
| SCAMPI2 (MSA) | – | – | 94.2 ± 1.6 | 5.6 ± 0.3 | 94.4 ± 0.3 | 5.8 ± 1.6 |
| CCTOP² | | | 96.1 ± 2.1 | 3.7 ± 0.6 | 96.3 ± 0.6 | 3.9 ± 2.1 |
| HMM-TM (MSA)³ | – | – | 97.3 ± 1.6 | 21.4 ± 0.5 | 78.6 ± 0.5 | 2.7 ± 1.6 |
| BOCTOPUS2 | 84.0 ± 13.3 | 4.2 ± 0.5 | – | – | 95.8 ± 0.5 | 16.0 ± 13.3 |
| BetAware-Deep | 85.1 ± 9.3 | 4.7 ± 0.3 | – | – | 95.3 ± 0.3 | 14.9 ± 9.3 |
| PRED-TMBB2⁴ | 88.8 ± 12.1 | 7.1 ± 0.4 | – | – | 92.9 ± 0.4 | 11.2 ± 12.1 |
| PROFtmb | 91.9 ± 9.0 | 6.1 ± 0.5 | – | – | 93.9 ± 0.5 | 8.1 ± 9.0 |
*Evaluation of the ability to distinguish between 57 beta barrel TMPs (β-TMP), 571 alpha helical TMPs (α-TMP) and 5654 globular, water-soluble non-TMP proteins in our data set. Recall and false positive rate (FPR) were averaged over the five independent cross-validation test sets; error margins given for the 95% confidence interval (1.96*standard error); bold: best values for each column; italics: differences statistically significant with over 95% confidence (only computed between best and 2nd best, or all methods ranked 1 and those ranked lower)
¹ Evaluation missing for one of 5,654 globular proteins
² Evaluation missing for one of 571 α-TMPs and six of 5,654 globular proteins
³ Evaluation includes only 51 β-TMPs, 552 α-TMPs, and 5,524 globular proteins due to runtime errors
⁴ The local PRED-TMBB2 version did not include the pre-filtering step of the web server. This caused an FPR for β-TMP of almost 78%. Thus, we listed the statistics for the web server predictions, which did not include MSA input
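The per-protein recall and false positive rate (FPR) reported above can be illustrated with a minimal sketch. It assumes a simple one-vs-rest reading of the per-class columns; the function name `recall_and_fpr`, the class labels, and the toy data are hypothetical, and which proteins count as negatives for a given method in the actual evaluation is not specified here.

```python
def recall_and_fpr(true_labels, predicted_labels, positive_class):
    """Return (recall, FPR) in percent for one class, one-vs-rest.

    Assumption: every protein not belonging to `positive_class`
    counts as a negative. The published evaluation may restrict
    the negative set differently for some methods.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive_class:
            if pred == positive_class:
                tp += 1  # class member correctly recognised
            else:
                fn += 1  # class member missed
        else:
            if pred == positive_class:
                fp += 1  # non-member wrongly assigned to the class
            else:
                tn += 1  # non-member correctly rejected
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = 100.0 * fp / (fp + tn) if (fp + tn) else float("nan")
    return recall, fpr


# Hypothetical toy labels, not taken from the data set.
toy_truth = ["alpha", "alpha", "beta", "globular", "globular", "globular"]
toy_pred = ["alpha", "globular", "beta", "globular", "alpha", "globular"]

for cls in ("beta", "alpha", "globular"):
    recall, fpr = recall_and_fpr(toy_truth, toy_pred, cls)
    print(f"{cls}: recall={recall:.1f}%, FPR={fpr:.1f}%")
```

In a cross-validation setting, such a function would be applied to each test set separately before averaging.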
Per-segment performance for TMH (transmembrane helices). *
| TMH (571/2936) | Recall (%) | Precision (%) | Qok (%) | Qnum (%) | Qtop (%) |
|---|---|---|---|---|---|
| TMbed | |||||
| DeepTMHMM | 80.0 ± 2.4 | 80.5 ± 2.4 | 46.2 ± 4.8 | 85.7 ± 3.5 | 96.3 ± 2.2 |
| TMSEG | 74.5 ± 2.4 | 77.1 ± 1.7 | 35.6 ± 2.4 | 69.9 ± 2.7 | 83.8 ± 4.7 |
| TOPCONS2 | 76.4 ± 1.5 | 78.4 ± 0.8 | 41.0 ± 3.1 | 74.4 ± 3.3 | 91.7 ± 3.1 |
| OCTOPUS | 71.6 ± 1.5 | 75.7 ± 1.4 | 36.0 ± 2.8 | 67.6 ± 3.4 | 87.5 ± 3.1 |
| Philius | 70.8 ± 2.2 | 73.7 ± 0.8 | 34.2 ± 3.7 | 66.9 ± 3.4 | 87.5 ± 2.9 |
| PolyPhobius | 76.0 ± 2.1 | 76.4 ± 1.1 | 40.3 ± 3.5 | 74.5 ± 2.8 | 86.8 ± 2.7 |
| SPOCTOPUS | 71.5 ± 1.2 | 75.8 ± 1.2 | 35.7 ± 3.3 | 67.4 ± 5.5 | 87.2 ± 3.4 |
| SCAMPI2 (MSA) | 72.3 ± 2.7 | 74.1 ± 1.5 | 33.5 ± 3.0 | 72.2 ± 4.5 | 90.6 ± 3.5 |
| CCTOP¹ | 77.0 ± 1.7 | 79.4 ± 1.0 | 41.9 ± 3.6 | 82.6 ± 2.7 | 92.6 ± 2.6 |
| HMM-TM (MSA)² | 73.3 ± 1.7 | 72.5 ± 1.2 | 33.5 ± 1.4 | 72.1 ± 3.0 | 88.3 ± 4.2 |
*Segment performance for transmembrane helix (TMH) prediction based on 571 alpha helical TMPs (α-TMP) with a total of 2936 TMHs. Recall, Precision, Qok, Qnum, and Qtop were averaged over the five independent cross-validation test sets; error margins given for the 95% confidence interval (1.96*standard error); bold: best values for each column; italics: differences statistically significant with over 95% confidence (only computed between best and 2nd best).
¹ Evaluation missing for one of 571 α-TMPs.
² Evaluation includes only 552 of the 571 α-TMPs due to runtime errors of the method.
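The error margins in these tables are described as 95% confidence intervals, computed as 1.96 times the standard error over the five cross-validation test sets. A minimal sketch of that calculation follows. The function name `mean_with_ci95` and the fold values are hypothetical, and using the sample standard deviation (n - 1 in the denominator) for the standard error is an assumption, not something the footnotes state.

```python
import math
import statistics


def mean_with_ci95(fold_values):
    """Return (mean, margin) with margin = 1.96 * standard error.

    Assumption: standard error = sample standard deviation / sqrt(n),
    where n is the number of cross-validation test sets.
    """
    n = len(fold_values)
    mean = statistics.fmean(fold_values)
    standard_error = statistics.stdev(fold_values) / math.sqrt(n)
    return mean, 1.96 * standard_error


# Hypothetical per-fold recall values (percent), for illustration only.
fold_recall = [80.0, 82.0, 78.0, 81.0, 79.0]
mean, margin = mean_with_ci95(fold_recall)
print(f"{mean:.1f} ± {margin:.1f}")
```

With only five test sets, the margins are sensitive to a single outlying fold, which is consistent with the wide intervals seen for the small β-TMP set.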
Per-segment performance for TMB (transmembrane beta strands). *
| TMB (57/768) | Recall (%) | Precision (%) | Qok (%) | Qnum (%) | Qtop (%) |
|---|---|---|---|---|---|
| TMbed | |||||
| DeepTMHMM | 85.9 ± 6.6 | 92.5 ± 4.7 | 46.1 ± 7.6 | 74.3 ± 13.0 | 97.2 ± 4.4 |
| BOCTOPUS2 | 85.3 ± 9.2 | 96.6 ± 2.0 | 56.6 ± 18.9 | 71.2 ± 11.8 | 98.0 ± 2.0 |
| BetAware-Deep | 67.1 ± 6.5 | 62.2 ± 11.4 | 8.7 ± 5.3 | 60.9 ± 14.1 | 95.7 ± 5.4 |
| PRED-TMBB2 (MSA) | 85.4 ± 1.9 | 75.6 ± 4.8 | 18.4 ± 15.0 | 44.5 ± 26.7 | 95.9 ± 3.4 |
| PROFtmb | 78.2 ± 10.1 | 78.0 ± 6.9 | 20.2 ± 12.8 | 46.6 ± 11.7 | 97.2 ± 1.0 |
*Segment performance for transmembrane beta strand (TMB) prediction based on 57 beta barrel TMPs (β-TMP) with a total of 768 TMBs. Recall, Precision, Qok, Qnum, and Qtop were averaged over the five independent cross-validation test sets; error margins given for the 95% confidence interval (1.96*standard error); bold: best values for each column; italics: differences statistically significant with over 95% confidence (only computed between best and 2nd best)
Fig. 1 Potential transmembrane proteins in the globular data set. AlphaFold2 [11, 68] structures of extracellular serine protease (P09489) and Lipase 1 (P40601). Transmembrane segments (dark purple) predicted by TMbed correlate well with membrane boundaries (dotted lines: red = outside, blue = inside) predicted by the PPM [45] web server. Images created using Mol* Viewer [71]. Though our data set lists them as globular proteins, the predicted structures indicate transmembrane domains, which align with segments predicted by our method. The predicted domains overlap with autotransporter domains detected by the UniProtKB [46] automatic annotation system. Transmembrane segment predictions were made with the final TMbed ensemble model.
Fig. 2 New membrane proteins. PDB structures for probable flagellin 1 (Q9YAN8; 7TXI [73]), protein-serine O-palmitoleoyltransferase porcupine (Q9H237; 7URD [74]), choline transporter-like protein 1 (Q8WWI5; 7WWB [75]), S-layer protein SlpA (Q9RRB6; 7ZGY [76]), and membrane protein (P0DTC5; 8CTK [77]). Transmembrane segments (dark purple) predicted by TMbed; membrane boundaries (dotted lines: red = outside, blue = inside) predicted by the PPM [45] web server. Images created using Mol* Viewer [71].