Linear regression models for solvent accessibility prediction in proteins.

Michael Wagner, Rafał Adamczak, Aleksey Porollo, Jarosław Meller

Abstract

The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent-exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques, which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these neural network (NN)-based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders of magnitude fewer parameters, and are still competitive in terms of prediction quality. Specifically, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial for optimizing the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and can thus be used to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
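The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the feature count, the 0.25 threshold, and the epsilon value are arbitrary assumptions, and RSA is assumed to be scaled to the range [0, 1]. The sketch fits a linear least squares model, evaluates an SVR-style epsilon-insensitive loss (the "error insensitivity" metaparameter), and projects real-valued predictions onto two classes ("buried" vs. "exposed"). It does not train an SVR itself.

```python
import numpy as np

def fit_least_squares(X, y):
    """Fit y ~ X w + b by ordinary least squares; returns (w, b)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean SVR-style loss: absolute errors below epsilon cost nothing."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float(np.mean(np.maximum(0.0, residual - epsilon)))

def two_class_projection(rsa, threshold):
    """Project real-valued RSA onto labels: True = exposed, False = buried."""
    return np.asarray(rsa) >= threshold

# Synthetic stand-in data (hypothetical; not the paper's features or results).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([0.10, -0.05, 0.08, 0.00, 0.03])
y = np.clip(0.3 + X @ true_w + rng.normal(scale=0.05, size=200), 0.0, 1.0)

w, b = fit_least_squares(X, y)
y_pred = np.clip(X @ w + b, 0.0, 1.0)  # assumption: RSA lies in [0, 1]

threshold = 0.25  # arbitrary cutoff, chosen only for illustration
agreement = float(np.mean(
    two_class_projection(y, threshold) == two_class_projection(y_pred, threshold)
))
loss_eps = epsilon_insensitive_loss(y, y_pred, epsilon=0.05)
print(f"two-class agreement: {agreement:.3f}, eps-insensitive loss: {loss_eps:.4f}")
```

Thresholding after regression means that a single real-valued model can be evaluated at any cutoff, whereas a classifier would have to be retrained for each threshold.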

Year: 2005
PMID: 15857247
DOI: 10.1089/cmb.2005.12.355
Source DB: PubMed
Journal: J Comput Biol
ISSN: 1066-5277
Impact factor: 1.479

