
SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity.

Christophe N Magnan, Pierre Baldi.

Abstract

MOTIVATION: Accurately predicting protein secondary structure and relative solvent accessibility is important for the study of protein evolution, structure and function and as a component of protein 3D structure prediction pipelines. Most predictors use a combination of machine learning and profiles, and thus must be retrained and assessed periodically as the number of available protein sequences and structures continues to grow.
RESULTS: We present newly trained modular versions of the SSpro and ACCpro predictors of secondary structure and relative solvent accessibility, together with their multi-class variants SSpro8 and ACCpro20. We introduce a sharp distinction between the use of sequence similarity alone, typically in the form of sequence profiles at the input level, and the additional use of sequence-based structural similarity, which uses similarity to sequences in the Protein Data Bank to infer annotations at the output level, and we study their relative contributions to modern predictors. Using sequence similarity alone, SSpro's accuracy is between 79 and 80% (79% for ACCpro), and no other predictor seems to exceed 82%. However, when sequence-based structural similarity is added, the accuracy of SSpro rises to 92.9% (90% for ACCpro). Thus, by combining both approaches, these problems now appear to be essentially solved, as an accuracy of 100% cannot be expected for several well-known reasons. These results also point to several open technical challenges, including (i) achieving on the order of ≥ 80% accuracy without using any similarity with known proteins and (ii) achieving on the order of ≥ 85% accuracy using sequence similarity alone.
AVAILABILITY AND IMPLEMENTATION: SSpro, SSpro8, ACCpro and ACCpro20 programs, data and web servers are available through the SCRATCH suite of protein structure predictors at http://scratch.proteomics.ics.uci.edu.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
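
The distinction drawn in the abstract, between profile-based prediction at the input level and structural similarity applied at the output level, can be illustrated with a minimal Python sketch. This is not the SSpro/ACCpro implementation, whose combination rule is not described in the abstract. The function names (`q_accuracy`, `combine_at_output`), the "prefer the transferred label" rule and the toy strings are all assumptions made for illustration only.

```python
from typing import Optional, Sequence


def q_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted class matches the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def combine_at_output(ml_prediction: str,
                      transferred: Sequence[Optional[str]]) -> str:
    """Hypothetical output-level combination rule (an assumption, not SSpro's).

    Where a label could be transferred from a similar sequence of known
    structure, use it; otherwise keep the machine-learning prediction.
    """
    if len(ml_prediction) != len(transferred):
        raise ValueError("ml_prediction and transferred must have the same length")
    return "".join(t if t is not None else m
                   for m, t in zip(ml_prediction, transferred))


# Toy data, invented for this example (H = helix, E = strand, C = coil).
observed = "HHHHEEEECC"
ml_only = "HHHCEEECCC"  # stand-in for a profile-based prediction
# None marks residues with no similar sequence of known structure.
transferred = [None, None, None, "H", None, None, None, "E", None, None]

combined = combine_at_output(ml_only, transferred)
print(f"profile-based only: {q_accuracy(ml_only, observed):.2f}")
print(f"with transferred labels: {q_accuracy(combined, observed):.2f}")
```

The per-residue accuracy computed here corresponds to the kind of percentage quoted in the abstract; the toy numbers carry no meaning beyond showing that labels transferred at the output level can correct residues the profile-based predictor gets wrong.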

Year:  2014        PMID: 24860169      PMCID: PMC4215083          DOI: 10.1093/bioinformatics/btu352

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


