DeepSol: a deep learning framework for sequence-based protein solubility prediction.

Sameer Khurana, Reda Rawi, Khalid Kunji, Gwo-Yu Chuang, Halima Bensmail, Raghvendra Mall.

Abstract

Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence.
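To make the term "k-mer structure" concrete, the following is a minimal sketch of extracting overlapping k-mers from a protein sequence. It only illustrates what a k-mer is; the abstract does not say how DeepSol encodes k-mers or feeds them to the network. The function names `kmers` and `kmer_counts`, the default `k=3`, and the example sequence are assumptions made for illustration.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Return all overlapping substrings of length k, in order.

    Hypothetical helper for illustration; not taken from DeepSol.
    Returns an empty list when the sequence is shorter than k.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, k=3):
    """Count how often each k-mer occurs in the sequence."""
    return Counter(kmers(sequence, k))


if __name__ == "__main__":
    # Made-up toy sequence, not a real protein from the paper's data.
    seq = "MKVLAAGIV"
    print(kmers(seq, 3))
    print(kmer_counts(seq, 2).most_common(3))
```

A sequence of length n yields n - k + 1 overlapping k-mers, so short windows keep local residue context while remaining cheap to enumerate.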
Results: DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol makes it possible to screen for sequences with enhanced production capacity and to predict the solubility of novel proteins more reliably.
Availability and implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018).
Supplementary information: Supplementary data are available at Bioinformatics online.
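For readers unfamiliar with the two metrics reported in the Results, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The function names and the example counts are hypothetical and are not taken from the paper; returning 0.0 when the MCC denominator is zero is a common convention assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up counts for illustration only: soluble = positive class.
    tp, tn, fp, fn = 40, 40, 10, 10
    print(accuracy(tp, tn, fp, fn))
    print(matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for classification tasks where the classes may be imbalanced.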

Year:  2018        PMID: 29554211      PMCID: PMC6355112          DOI: 10.1093/bioinformatics/bty166

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


