Laszlo Dobson, Gábor E Tusnády.
Abstract
Transmembrane proteins (TMPs) play important roles in cells, ranging from transport processes and cell adhesion to communication. Many of these functions are mediated by intrinsically disordered regions (IDRs), flexible protein segments without a well-defined structure. Although a variety of methods are available for predicting IDRs, their accuracy on TMPs is very limited because of the special physico-chemical properties of these proteins. We prepared a dataset containing membrane proteins exclusively, using X-ray crystallography data. MemDis is a novel prediction method that uses convolutional neural networks and long short-term memory networks to predict disordered regions in TMPs. In addition to attributes commonly used in IDR predictors, we defined several TMP-specific features to further enhance the accuracy of our method. MemDis achieved the highest prediction accuracy on a TMP-specific dataset among popular IDR prediction methods.
Keywords: bidirectional long short-term memory; convolutional neural network; deep learning; intrinsically disordered proteins; transmembrane proteins
Year: 2021 PMID: 34830151 PMCID: PMC8623522 DOI: 10.3390/ijms222212270
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. (A) Receiver operating characteristic of MemDis and other disorder prediction methods. (B) Average performance of membrane-distant predictors. (C) Average performance of membrane-proximal predictors.
Figure 2. Interpretation of MemDis results. (A) Phospholemman: solution NMR structure, and representation of the C-terminal region by the predictions of MemDis, CCTOP and FELLS (helical propensity: purple, coil propensity: grey). (B) Integrin beta-3: solution NMR structure, with MemDis and CCTOP predictions. The proposed NPxY endocytosis sorting signal is marked with a purple box, and the LIR autophagy motif is marked with an orange box. (C) Mucolipin-1: electron microscopy structure, with predictions from MemDis and CCTOP. Phosphoserines are marked with green cones below the sequence. The phosphorylation site is marked with a purple box, and di-leucine motifs are marked with orange boxes. Cysteines are colored blue. Topology is shown both on the structures and on the topology lines, colored blue, red, yellow and orange (extracellular, cytosolic, transmembrane, and re-entrant loop regions, respectively). Disordered regions predicted by MemDis are marked with green lines on the graphs. Note that only specific regions of the sequences are shown. (D) Detection rate of lipid-binding and non-lipid-binding disordered regions from the MemMoRF database.
Figure 3. Data preparation for the training of MemDis. First, we selected protein fragments based on the available PDB information. Residues from these fragments were divided into four groups: extracellular-distant (distance from membrane >15 AA), extracellular-proximal (<15 AA), intracellular-distant, and intracellular-proximal. Each group was fed into the appropriate CNN, which also considered information from residues within 5 AA of the residue of interest. The LSTM was trained on the full-length protein fragments, considering the preceding 10 AA.
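The routing and windowing described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the MemDis implementation: the function names, the `X` padding at sequence termini, and the handling of a distance of exactly 15 residues (treated here as proximal) are assumptions made for illustration, since the caption does not specify them. Only the 15 AA cutoff, the 5 AA flank and the 10 AA preceding context come from the caption.

```python
PROXIMAL_CUTOFF = 15  # AA from the membrane; threshold taken from the caption
CNN_FLANK = 5         # AA on each side of the residue of interest (caption)
LSTM_CONTEXT = 10     # preceding AA seen by the LSTM (caption)


def region_label(side: str, distance: int) -> str:
    """Name the CNN a residue would be routed to (hypothetical labels).

    `side` is 'extracellular' or 'intracellular'; `distance` is the number
    of residues separating the position from the membrane.
    """
    if side not in ("extracellular", "intracellular"):
        raise ValueError(f"unexpected side: {side!r}")
    # Assumption: a distance of exactly 15 counts as proximal.
    zone = "distant" if distance > PROXIMAL_CUTOFF else "proximal"
    return f"{side}-{zone}"


def cnn_window(seq: str, i: int, flank: int = CNN_FLANK, pad: str = "X") -> str:
    """Return residue i with `flank` neighbours on each side.

    Assumption: positions beyond the termini are padded with `pad`
    so that every window has length 2 * flank + 1.
    """
    left = max(0, i - flank)
    right = min(len(seq), i + flank + 1)
    left_pad = pad * (flank - (i - left))
    right_pad = pad * (flank - (right - 1 - i))
    return left_pad + seq[left:right] + right_pad


def lstm_context(seq: str, i: int, context: int = LSTM_CONTEXT) -> str:
    """Return residue i together with up to `context` preceding residues."""
    return seq[max(0, i - context): i + 1]


if __name__ == "__main__":
    fragment = "ACDEFGHIKLMNPQ"  # toy fragment, not real data
    print(region_label("extracellular", 20))
    print(cnn_window(fragment, 0))
    print(lstm_context(fragment, 12))
```

In this sketch each residue yields one fixed-length window for its CNN, while the LSTM input grows up to eleven residues as the position moves away from the N-terminus of the fragment.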