Xi Yin, Jing Yang, Feng Xiao, Yang Yang, Hong-Bin Shen.
Abstract
Membrane proteins are an important class of proteins embedded in cell membranes. They play crucial roles in living organisms as ion channels, transporters, and receptors. Because it is difficult to determine a membrane protein's structure by wet-lab experiments, accurate and fast computational methods based on the amino acid sequence are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting the transmembrane helices, residue-residue contacts, and relative accessible surface area of α-helical membrane proteins. For transmembrane helix prediction, MemBrain achieves 97.9% for A_TMH and 87.1% for A_P, with an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains top L/5 contact prediction accuracies of 62% and 64.1% on the training and independent datasets, respectively. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593. These predictions provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
Keywords: Contact map prediction; Machine learning; Relative accessible surface area; Structure prediction; Transmembrane α-helices
Year: 2017 PMID: 30393651 PMCID: PMC6199043 DOI: 10.1007/s40820-017-0156-2
Source DB: PubMed Journal: Nanomicro Lett ISSN: 2150-5551
Fig. 1 The gap between known protein sequences and structures is rapidly expanding
Fig. 2 A screenshot of the submission interface of the MemBrain web server (www.csbio.sjtu.edu.cn/bioinf/MemBrain/)
Fig. 3 The pipeline of MemBrain for predicting transmembrane α-helices. For a query sequence, the position-specific scoring matrix is generated as the input feature by searching the SWISS-PROT database with PSI-BLAST. The OET-KNN algorithm serves as the classifier, combining sliding windows of different sizes for feature extraction. A median filter smooths the profile of predicted probabilities. Finally, a dynamic threshold is applied to optimize the prediction results
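The smoothing and thresholding steps named in this caption can be sketched as follows. This is a minimal illustration, not the MemBrain implementation: the window size, the minimum segment length, and the helper names `median_filter` and `call_segments` are assumptions, and a fixed threshold stands in for the dynamic threshold, whose rule is not given here.

```python
from statistics import median


def median_filter(probs, window=3):
    """Smooth a per-residue probability profile with a sliding median.

    The window is truncated at the sequence ends (an assumed edge policy).
    """
    half = window // 2
    smoothed = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        smoothed.append(median(probs[lo:hi]))
    return smoothed


def call_segments(probs, threshold=0.5, min_len=3):
    """Return (start, end) pairs, 0-based and end-exclusive, for runs of
    residues whose probability is at or above the threshold.

    A fixed threshold is a simplification of the dynamic threshold.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(probs) - start >= min_len:
        segments.append((start, len(probs)))
    return segments


# Toy profile: one isolated spike (index 2) and one dip (index 7) inside a helix.
raw = [0.1, 0.2, 0.9, 0.1, 0.1, 0.8, 0.9, 0.2, 0.9, 0.8, 0.9, 0.1, 0.1]
smooth = median_filter(raw, window=3)
segments = call_segments(smooth, threshold=0.5, min_len=3)
print(segments)
```

In this toy profile the median filter suppresses the isolated spike and fills the single-residue dip, so thresholding yields one contiguous segment rather than several fragments.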
Fig. 4 The pipeline of MemBrain-Contact for predicting the TMH–TMH contact map. TMH locations and topologies are extracted from a protein database to build the training dataset. Coevolved-mutation analysis by PSICOV, run on a multiple sequence alignment generated by PSI-BLAST, is combined with the outputs of a machine learning-based algorithm to generate the final predictions
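The abstract reports contact prediction accuracy on the top L/5 predictions. A minimal sketch of that kind of metric is given below, assuming predictions arrive as a score per residue pair. The function name `top_l5_precision` and the data layout are hypothetical, and the definition of L (for example, full sequence length versus total helix length) is left to the caller because it is not specified here.

```python
def top_l5_precision(scores, true_contacts, length):
    """Fraction of the top L/5 highest-scoring pairs that are true contacts.

    scores: dict mapping (i, j) residue pairs to predicted contact scores.
    true_contacts: set of (i, j) pairs observed in the known structure.
    length: the value of L (its definition is an assumption of the caller).
    """
    k = max(1, length // 5)
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example with L = 10, so the top 2 pairs are evaluated.
scores = {(0, 5): 0.9, (1, 6): 0.8, (2, 7): 0.3, (3, 8): 0.1}
true_contacts = {(0, 5), (2, 7)}
precision = top_l5_precision(scores, true_contacts, length=10)
print(precision)
```

Ranking by score and keeping only the top L/5 pairs focuses the evaluation on the most confident predictions, which are the ones most useful as structural restraints.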
Fig. 5 The flowchart of the MemBrain-Rasa prediction protocol. For a protein sequence, six kinds of sequence-derived features are extracted and fed into a support vector regression (SVR) model. A segment template similarity-based prediction engine searches a locally constructed structure data pool for segments similar to the target sequence and uses them as templates. The outputs of the two engines are combined to improve the prediction of relative accessible surface area
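How two per-residue predictions might be combined and then scored with the two metrics quoted in the abstract (Pearson correlation coefficient and mean absolute error) can be sketched as below. The weighted average in `combine` is an assumed combination rule chosen for illustration; the actual rule used by MemBrain-Rasa is not stated here. The toy values are on a 0 to 100 scale, which is also an assumption.

```python
import math


def combine(svr_pred, template_pred, weight=0.5):
    """Weighted average of two per-residue predictions (assumed rule)."""
    return [weight * a + (1 - weight) * b for a, b in zip(svr_pred, template_pred)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_absolute_error(x, y):
    """Average absolute difference between predictions and observations."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Toy per-residue values (illustrative only, not real data).
svr_pred = [10.0, 40.0, 60.0, 20.0]
template_pred = [20.0, 30.0, 80.0, 10.0]
observed = [10.0, 30.0, 75.0, 20.0]

combined = combine(svr_pred, template_pred, weight=0.5)
pcc = pearson(combined, observed)
mae = mean_absolute_error(combined, observed)
print(combined, round(pcc, 3), mae)
```

The two metrics are complementary: the correlation measures whether buried and exposed residues are ranked correctly, while the mean absolute error measures how far the predicted values are from the observed ones.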