
PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine.

Reda Rawi, Raghvendra Mall, Khalid Kunji, Chen-Hsiang Shen, Peter D Kwong, Gwo-Yu Chuang.

Abstract

Motivation: Protein solubility can be a decisive factor in both research and production efficiency, and in silico sequence-based predictors that can accurately estimate solubility outcomes are highly sought.
Results: In this study, we present a novel approach termed PRotein SolubIlity Predictor (PaRSnIP), which uses a gradient boosting machine algorithm as well as an approximation of sequence and structural features of the protein of interest. Based on an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods by more than 9% in accuracy and 0.17 in Matthews correlation coefficient, with an overall accuracy of 74% and a Matthews correlation coefficient of 0.48. Additionally, PaRSnIP provides importance scores for all features used in training. We observed that higher fractions of exposed residues associate positively with protein solubility, whereas tripeptide stretches containing multiple histidines associate negatively with it. The improved prediction accuracy of PaRSnIP should enable it to predict protein solubility with greater reliability and to screen for sequence variants with enhanced manufacturability.
Availability and implementation: PaRSnIP software is available for download on GitHub (https://github.com/RedaRawi/PaRSnIP).
Contact: gwo-yu.chuang@nih.gov.
Supplementary information: Supplementary data are available at Bioinformatics online.
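The two metrics reported in the Results, accuracy and the Matthews correlation coefficient (MCC), can both be computed from the counts of a binary confusion matrix. The following is a minimal sketch, not code from PaRSnIP: the function names are illustrative and the counts in the usage example are hypothetical, chosen only to show the arithmetic.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of predictions that match the true label."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction). By a common convention, 0.0 is
    returned when any marginal sum is zero and the ratio is undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for a soluble (positive) vs. insoluble
    # (negative) test set; these are NOT the paper's data.
    tp, tn, fp, fn = 70, 80, 20, 30
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the soluble and insoluble classes are unevenly represented.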

Year:  2018        PMID: 29069295      PMCID: PMC6031027          DOI: 10.1093/bioinformatics/btx662

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


