
Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction.

Susann Vorberg, Stefan Seemayer, Johannes Söding.

Abstract

Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made in improving the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny.
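
The two score corrections named in the abstract can be illustrated with a small sketch. It is not the CCMpredPy implementation or its API. The abstract states only that the null expectation of the coupling score is proportional to the product of the square roots of the column entropies, so several details here are assumptions: Shannon entropy with the natural logarithm (every symbol, including gaps, counted as a state), a proportionality constant `alpha` fitted by least squares over all column pairs, and a symmetric score matrix supplied by the caller. The function names are hypothetical. The APC function uses the commonly cited form, in which each score is reduced by the product of its row and column means divided by the overall mean.

```python
import math
from collections import Counter


def column_entropies(msa):
    """Shannon entropy (natural log) of each column of an MSA given as equal-length strings."""
    n_seqs = len(msa)
    entropies = []
    for column in zip(*msa):
        counts = Counter(column)
        entropies.append(-sum((c / n_seqs) * math.log(c / n_seqs) for c in counts.values()))
    return entropies


def apc(scores):
    """Average product correction: c_ij - mean_i * mean_j / mean_all, using off-diagonal means."""
    size = len(scores)
    row_mean = [sum(scores[i][j] for j in range(size) if j != i) / (size - 1)
                for i in range(size)]
    total_mean = sum(row_mean) / size
    if total_mean == 0.0:
        # Nothing to subtract when the average score is zero.
        return [row[:] for row in scores]
    return [[0.0 if i == j else scores[i][j] - row_mean[i] * row_mean[j] / total_mean
             for j in range(size)] for i in range(size)]


def entropy_correction(scores, entropies):
    """Subtract alpha * sqrt(H_i * H_j) from each score.

    Assumption: alpha is the least-squares fit of the scores onto sqrt(H_i * H_j).
    Returns the corrected matrix and the fitted alpha.
    """
    size = len(scores)
    pairs = [(i, j) for i in range(size) for j in range(i + 1, size)]
    bias = {(i, j): math.sqrt(entropies[i] * entropies[j]) for i, j in pairs}
    denom = sum(b * b for b in bias.values())
    alpha = sum(scores[i][j] * bias[i, j] for i, j in pairs) / denom if denom else 0.0
    corrected = [[0.0] * size for _ in range(size)]
    for i, j in pairs:
        corrected[i][j] = corrected[j][i] = scores[i][j] - alpha * bias[i, j]
    return corrected, alpha


if __name__ == "__main__":
    # Toy alignment and made-up coupling scores, for illustration only.
    toy_msa = ["ACDA", "ACEA", "GCDA", "GCEC"]
    toy_scores = [
        [0.0, 0.1, 0.9, 0.3],
        [0.1, 0.0, 0.1, 0.1],
        [0.9, 0.1, 0.0, 0.4],
        [0.3, 0.1, 0.4, 0.0],
    ]
    entropies = column_entropies(toy_msa)
    corrected, alpha = entropy_correction(toy_scores, entropies)
    print("entropies:", entropies)
    print("fitted alpha:", alpha)
    print("EntC-style scores:", corrected)
    print("APC scores:", apc(toy_scores))
```

In this sketch a fully conserved column has zero entropy, so the entropy term leaves its scores unchanged, whereas APC still subtracts a share based on the column's mean score.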

Year:  2018        PMID: 30395601      PMCID: PMC6237422          DOI: 10.1371/journal.pcbi.1006526

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


  5 in total

1.  Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences.

Authors:  Andonis Gerardos; Nicola Dietler; Anne-Florence Bitbol
Journal:  PLoS Comput Biol       Date:  2022-05-16       Impact factor: 4.779

2.  Generating functional protein variants with variational autoencoders.

Authors:  Alex Hawkins-Hooker; Florence Depardieu; Sebastien Baur; Guillaume Couairon; Arthur Chen; David Bikard
Journal:  PLoS Comput Biol       Date:  2021-02-26       Impact factor: 4.475

3.  Extracting phylogenetic dimensions of coevolution reveals hidden functional signals.

Authors:  Alexandre Colavin; Esha Atolia; Anne-Florence Bitbol; Kerwyn Casey Huang
Journal:  Sci Rep       Date:  2022-01-17       Impact factor: 4.379

4.  Phylogenetic weighting does little to improve the accuracy of evolutionary coupling analyses.

Authors:  Adam J Hockenberry; Claus O Wilke
Journal:  Entropy (Basel)       Date:  2019-10-12       Impact factor: 2.524

5.  Efficient generative modeling of protein sequences using simple autoregressive models.

Authors:  Jeanne Trinquier; Guido Uguzzoni; Andrea Pagnani; Francesco Zamponi; Martin Weigt
Journal:  Nat Commun       Date:  2021-10-04       Impact factor: 14.919
