Prediction of Protein Solubility Based on Sequence Feature Fusion and DDcCNN.

Xianfang Wang, Yifeng Liu, Zhiyong Du, Mingdong Zhu, Aman Chandra Kaushik, Xue Jiang, Dongqing Wei.

Abstract

BACKGROUND: Predicting protein solubility is an indispensable prerequisite for pharmaceutical research and production. The objective of this work is to design a new model that predicts protein solubility using protein sequence feature fusion and a deep dual-channel convolutional neural network (DDcCNN), improving on the performance of existing prediction models.
METHODS: Redundancy in the raw protein data is reduced with CD-HIT. Four subsequences are built from each protein sequence: one global and three local. The global subsequence is the entire protein sequence; the local subsequences are obtained by moving a sliding window according to a set of rules. G-gap features are extracted from the four subsequences and assembled into a mixed matrix, which is the input to one channel composed of three convolutional layers. Additional features are extracted with the SCRATCH tool as the input to the other channel, which consists of a single convolutional layer, in order to find hidden relationships and improve the accuracy of the predictor. The outputs of the two parallel channels are concatenated as the input to the hidden layer, and the solubility prediction is obtained at the output layer. The best protein solubility prediction model was selected through comparative experiments on different frameworks.
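As a rough illustration of the G-gap step, the sketch below computes g-gap dipeptide frequencies (pairs of residues separated by g positions) for one global and three local subsequences and stacks them into a 4 x 400 matrix. The abstract does not state the gap value or the sliding-window rules, so the default `g=1` and the split into three equal-width windows in `four_subsequences` are assumptions made only for this example, not the authors' procedure.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, with a fixed index for each.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def g_gap_features(seq, g):
    """Return the 400-dim frequency vector of residue pairs with g residues between them."""
    counts = [0] * len(PAIRS)
    total = 0
    for i in range(len(seq) - g - 1):
        idx = PAIR_INDEX.get(seq[i] + seq[i + g + 1])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [c / total for c in counts]


def four_subsequences(seq):
    """Hypothetical split: the whole sequence plus three equal-width local windows."""
    third = max(1, len(seq) // 3)
    return [seq, seq[:third], seq[third:2 * third], seq[2 * third:]]


def mixed_matrix(seq, g=1):
    """Stack the g-gap vectors of the four subsequences into a 4 x 400 matrix."""
    return [g_gap_features(s, g) for s in four_subsequences(seq)]
```

For example, `mixed_matrix("ACDEFGHIKLMNPQRSTVWY")` returns four rows of 400 frequencies each, which could then be passed to a convolutional channel.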
RESULTS: The DDcCNN model designed in this work achieves an accuracy of 77.82%, a Matthews correlation coefficient of 0.57, a sensitivity of 76.13%, and a specificity of 79.32%. Comparative experiments show that the overall performance of the DDcCNN model is better than that of existing models (GCNN, LCNN and PCNN). The related models and data are publicly deposited at http://www.ddccnn.wang .
CONCLUSION: The satisfactory performance of the DDcCNN model shows that these features and flexible computational methodologies can strengthen existing models for protein solubility prediction. The model could be applied, for example, to preselect initial targets that are soluble or to alter the solubility of target proteins, and can thus help reduce production cost.

Keywords:  Deep dual-channel convolutional neural network; Deep learning; Feature fusion; Protein solubility

Year: 2021   PMID: 34236625   DOI: 10.1007/s12539-021-00456-1

Source DB: PubMed   Journal: Interdiscip Sci   ISSN: 1867-1462   Impact factor: 2.233

