Lei Wang1,2, Haolin Zhong1, Zhidong Xue2,3, Yan Wang1,2.
Abstract
Transmembrane proteins (TMPs) are essential for cell recognition and communication, and they serve as important drug targets in humans. The 3D structures of TMPs are critical for determining their functions and for drug design, but they are hard to determine even by experimental methods. Although some computational methods have been developed to predict transmembrane helices (TMHs) and orientation, there is still room for improvement. Considering that a pre-trained language model can make full use of massive unlabeled protein sequences to obtain latent feature representations for TMPs and reduce the dependence on evolutionary information, we proposed DeepTMpred, which uses a pre-trained self-supervised language model called ESM, convolutional neural networks, an attentive neural network, and conditional random fields for alpha-TMP topology prediction. Compared with the current state-of-the-art tools on a non-redundant dataset of TMPs, DeepTMpred demonstrated superior predictive performance in most evaluation metrics, especially at the TMH level. Furthermore, DeepTMpred could also obtain reliable prediction results in a few seconds for TMPs without much evolutionary information. A tutorial on how to use DeepTMpred can be found in the Colab notebook (https://colab.research.google.com/github/ISYSLAB-HUST/DeepTMpred/blob/master/notebook/test.ipynb).
Keywords: Topology prediction; Transfer learning; Transmembrane protein
Year: 2022 PMID: 35521551 PMCID: PMC9062415 DOI: 10.1016/j.csbj.2022.04.024
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Summary of residues and TMHs located in TM and non-TM regions in the training set and test set.
| | Training set (TM) | Training set (non-TM) | Test set (TM) | Test set (non-TM) |
| Residues | 55,949 | 78,724 | 3,126 | 3,718 |
| TMHs | 2,637 | – | 146 | – |
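The counts in the table above can be turned into two quick sanity figures: the fraction of residues that lie in TM regions and the average number of TM residues per helix. The sketch below assumes the four value columns are, in order, training TM, training non-TM, test TM, and test non-TM, which is inferred from the caption rather than stated in the table; the dictionary keys and the `summarize` helper are illustrative names only.

```python
# Counts copied from the table above. The column meaning (TM vs. non-TM,
# training vs. test) is an assumption inferred from the caption.
counts = {
    "training": {"tm_residues": 55_949, "non_tm_residues": 78_724, "tmhs": 2_637},
    "test": {"tm_residues": 3_126, "non_tm_residues": 3_718, "tmhs": 146},
}


def summarize(split):
    """Return (fraction of residues in TM regions, mean TM residues per TMH)."""
    tm = split["tm_residues"]
    non_tm = split["non_tm_residues"]
    return tm / (tm + non_tm), tm / split["tmhs"]


summary = {name: summarize(split) for name, split in counts.items()}

for name, (tm_fraction, mean_length) in summary.items():
    print(f"{name}: TM fraction = {tm_fraction:.3f}, "
          f"mean TMH length = {mean_length:.1f} residues")
```

Under that reading, both splits have a similar share of TM residues and an average helix length of roughly 21 residues, which suggests the test set is not skewed relative to the training set on these two measures.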
Fig. 1. The flowchart of DeepTMpred.
Performance comparison of DeepTMpred with representative state-of-the-art methods on the independent test set.
| MemBrain3.0 | 0.892 | 0.904 | 0.898 | 0.808 | 0.819 | 0.814 | 21 |
| TopGraph | 0.873 | 0.717 | 0.787 | 0.504 | 0.464 | 0.483 | 12 |
| TOPCONS | 0.918 | 0.722 | 0.808 | 0.573 | 0.530 | 0.551 | 12 |
| MEMSAT-SVM | 0.926 | 0.710 | 0.804 | 0.591 | 0.547 | 0.568 | 10 |
| CCTOP | 0.679 | 0.785 | 0.511 | 0.477 | 0.493 | 10 | |
| DeepTMpred-a | 0.874 | 0.895 | 0.857 | 0.863 | 0.860 | 26 | |
| DeepTMpred-b | 0.889 | 0.911 |
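The column headers of the table above did not survive, but in every complete row the third value of each triple matches the harmonic mean of the first two. The sketch below checks this, assuming each triple is (precision, recall, F1) at the residue level and then at the TMH level; that labelling is an assumption, although the check itself does not depend on which of the first two values is precision because the formula is symmetric. Rows with missing cells (CCTOP, DeepTMpred-a, DeepTMpred-b) are left out rather than guessed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Complete rows copied from the table above:
# (first value, second value, reported third value) for each triple.
rows = {
    "MemBrain3.0": [(0.892, 0.904, 0.898), (0.808, 0.819, 0.814)],
    "TopGraph": [(0.873, 0.717, 0.787), (0.504, 0.464, 0.483)],
    "TOPCONS": [(0.918, 0.722, 0.808), (0.573, 0.530, 0.551)],
    "MEMSAT-SVM": [(0.926, 0.710, 0.804), (0.591, 0.547, 0.568)],
}

# Largest difference between the recomputed and the reported third value.
max_gap = max(
    abs(f1_score(first, second) - reported)
    for triples in rows.values()
    for first, second, reported in triples
)
print(f"largest gap between recomputed and reported value: {max_gap:.4f}")
```

The small remaining gaps are what one would expect when the reported values were computed from unrounded inputs and then rounded to three decimals.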
Fig. 2. TMH properties of amino acids are represented in the output embeddings of the fusion layer (A) and convolutional layer (B), visualized here with t-SNE.
Performance of DeepTMpred, MEMSAT-SVM, and TOPCONS for TMP orientation prediction on the independent test set.
| DeepTMpred-a | | | |
| DeepTMpred-b | 0.915 | 0.875 | 0.680 |
| TOPCONS | 0.717 | 0.625 | 0.204 |
| MEMSAT-SVM | 0.769 | 0.700 | 0.406 |
Prediction performance of DeepTMpred and the other three methods on the SP1441 soluble protein test set (A: 113 proteins with signal peptides; B: 173 proteins without signal peptides).
| DeepTMpred | 0.920 | 0.971 |
| MEMSAT-SVM | 0.699 | 0.965 |
| TMHMM2.0 | 0.655 | |
| TOPCONS | 0.971 |
The time complexity comparison between DeepTMpred and four other methods on the 40 test proteins.
| TMHMM2.0 | 4 | – |
| MEMSAT-SVM | 12,557 | UniRef-50 |
| MemBrain3.0* | 38,303 | NR, UniClust-30 |
| TOPCONS | 869 | Pfam, CDD |
| DeepTMpred | 8/6 | – |
Fig. 3. Case study of bacteriorhodopsin-I. (A) Molecular structure of bacteriorhodopsin-I (PDB entry: 4pxk) annotated according to DeepTMpred-a prediction: TMHs in green. (B) Molecular structure of bacteriorhodopsin-I (PDB entry: 4pxk) annotated according to DeepTMpred-b prediction: TMHs in red. (C) Comparison between the real topology (OPM database) and prediction of DeepTMpred. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Comparison between the true topology of 4b4aA (OPM database) and that predicted by DeepTMpred, MEMSAT-SVM, MemBrain, and TOPCONS.
| OPM | (11–32) | (61–85) | (100–130) | (147–170) | (187–202) | (206–226) |
| MEMSAT-SVM | (15–40) | (59–87) | (101–130) | (150–173) | (190–205) | (210–225) |
| TOPCONS | (14–34) | (65–85) | (103–123) | (153–173) | (187–207) | (210–230) |
| MemBrain | (6–45) | (59–88) | (100–129) | (149–175) | (182–202) | (206–231) |
| DeepTMpred | (11–31) | (64–85) | (101–130) | (153–173) | (188–203) | (209–227) |
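One way to read the table above is to measure how far each predicted helix boundary lies from the OPM annotation. The sketch below pairs helices by position (every method predicts six) and reports the mean absolute endpoint deviation, plus the number of helices whose two endpoints both fall within a tolerance. The 5-residue tolerance and the helper names are illustrative assumptions, not the evaluation criterion used by the authors.

```python
# Segment boundaries copied from the table above (1-based residue numbers).
opm = [(11, 32), (61, 85), (100, 130), (147, 170), (187, 202), (206, 226)]
predictions = {
    "MEMSAT-SVM": [(15, 40), (59, 87), (101, 130), (150, 173), (190, 205), (210, 225)],
    "TOPCONS": [(14, 34), (65, 85), (103, 123), (153, 173), (187, 207), (210, 230)],
    "MemBrain": [(6, 45), (59, 88), (100, 129), (149, 175), (182, 202), (206, 231)],
    "DeepTMpred": [(11, 31), (64, 85), (101, 130), (153, 173), (188, 203), (209, 227)],
}


def endpoint_deviations(pred, ref):
    """Absolute start and end offsets for helices paired by position."""
    deviations = []
    for (p_start, p_end), (r_start, r_end) in zip(pred, ref):
        deviations.extend([abs(p_start - r_start), abs(p_end - r_end)])
    return deviations


def helices_within(pred, ref, tol=5):
    """Count helices whose start and end are both within tol residues.

    The default tolerance is a hypothetical choice for illustration.
    """
    return sum(
        abs(p_start - r_start) <= tol and abs(p_end - r_end) <= tol
        for (p_start, p_end), (r_start, r_end) in zip(pred, ref)
    )


mean_dev = {
    name: sum(endpoint_deviations(pred, opm)) / (2 * len(opm))
    for name, pred in predictions.items()
}

for name in sorted(mean_dev, key=mean_dev.get):
    within = helices_within(predictions[name], opm)
    print(f"{name}: mean endpoint deviation = {mean_dev[name]:.2f}, "
          f"helices within tolerance = {within}/{len(opm)}")
```

Pairing by position works here only because each method returns the same number of helices as OPM; a prediction with missing or extra helices would need an overlap-based matching step first.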