Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Badri Adhikari, Jie Hou, Jianlin Cheng.

Abstract

In this study, we report the evaluation of the residue-residue contacts predicted by our three methods in the CASP12 experiment, focusing on the impact of multiple sequence alignments, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) serves as a baseline: it uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments, from which coevolution-based features are derived and then integrated by a neural network to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for the top L/5 long-range contact predictions. The comparison of the three methods shows that the quality of predicted protein contacts is driven by the quality and effective depth of the multiple sequence alignments, by coevolution-based features, and by the machine learning integration of coevolution-based and traditional features. On the full CASP12 dataset, when the top L/5 predicted long-range contacts are evaluated, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%. Finally, the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in the alignments is 0.66.
© 2017 Wiley Periodicals, Inc.

Keywords:  CASP; coevolution; deep learning; machine learning; multiple sequence alignment; protein contact prediction

Year: 2017; PMID: 29047157; PMCID: PMC5820155; DOI: 10.1002/prot.25405

Source DB: PubMed; Journal: Proteins; ISSN: 0887-3585