
Mi3-GPU: MCMC-based Inverse Ising Inference on GPUs for protein covariation analysis.

Allan Haldane, Ronald M Levy.

Abstract

Inverse Ising inference is a method for inferring the coupling parameters of a Potts/Ising model from observed site covariation; it has found important applications in protein physics for detecting interactions between residues in protein families. We introduce Mi3-GPU ("mee-three", for MCMC Inverse Ising Inference), software that solves the inverse Ising problem for protein-sequence datasets with few analytic approximations by running parallel Markov chain Monte Carlo (MCMC) sampling on GPUs. We also provide tools for analyzing and preparing protein-family multiple sequence alignments (MSAs) to account for finite-sampling effects, which are a major source of error or bias in inverse Ising inference. Our method is "generative" in the sense that the inferred model can be used to generate synthetic MSAs whose mutational statistics (marginals) can be verified to match those of the dataset MSA, up to the limits imposed by finite sampling. Our GPU implementation enables the construction of models that reproduce the covariation patterns of the observed MSA with a precision not achievable by more approximate methods. The main components of our method are a GPU-optimized algorithm that greatly accelerates MCMC sampling and a multi-step Quasi-Newton parameter-update scheme that uses a "Zwanzig reweighting" technique. We demonstrate that this software produces generative models for typical protein-family datasets with sequence length L ~ 300 and 21 residue types (the 20 amino acids plus the alignment gap), involving tens of millions of inferred parameters, in short running times.
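The abstract names three ingredients: MCMC sampling of a Potts model to generate synthetic MSAs, a parameter update driven by the mismatch between dataset and model marginals, and Zwanzig reweighting, which estimates the marginals under a trial parameter set from sequences already sampled under the current one. The following is a minimal CPU sketch in pure Python of how these pieces fit together on a toy model; it is not Mi3-GPU's code or API. All function names are hypothetical, the energy sign convention E(s) = -Σ h - Σ J is an assumption, each synthetic sequence comes from its own independent chain (a sequential stand-in for parallel MCMC sampling on GPUs), a plain gradient step replaces the paper's multi-step Quasi-Newton update, and the Kish effective sample size is a common reweighting diagnostic rather than necessarily the criterion Mi3-GPU uses. `num_potts_params` checks the parameter count quoted in the abstract, assuming no gauge fixing.

```python
import math
import random


def num_potts_params(L, q):
    """Raw count (assumed, no gauge fixing): L*q fields plus q*q couplings per site pair."""
    return L * q + (L * (L - 1) // 2) * q * q


def energy(seq, h, J):
    """Assumed convention: E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); P(s) ~ exp(-E(s))."""
    e = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), Jij in J.items():  # J is keyed by site pairs (i, j) with i < j
        e -= Jij[seq[i]][seq[j]]
    return e


def metropolis_msa(h, J, q, n_seqs, n_steps, rng):
    """Toy synthetic MSA: each sequence is the end state of its own independent Metropolis chain."""
    msa = []
    for _ in range(n_seqs):
        seq = [rng.randrange(q) for _ in range(len(h))]
        e = energy(seq, h, J)
        for _ in range(n_steps):
            i = rng.randrange(len(seq))
            old, seq[i] = seq[i], rng.randrange(q)  # propose a random residue at site i
            e_new = energy(seq, h, J)
            if e_new <= e or rng.random() < math.exp(e - e_new):
                e = e_new                           # accept the move
            else:
                seq[i] = old                        # reject: restore the old residue
        msa.append(tuple(seq))
    return msa


def pair_marginals(msa, q, weights=None):
    """Weighted bivariate frequencies f_ij(a, b) for every site pair i < j."""
    if weights is None:
        weights = [1.0 / len(msa)] * len(msa)
    L = len(msa[0])
    f = {(i, j): [[0.0] * q for _ in range(q)] for i in range(L) for j in range(i + 1, L)}
    for s, w in zip(msa, weights):
        for (i, j), fij in f.items():
            fij[s[i]][s[j]] += w
    return f


def zwanzig_weights(msa, h, J, h_new, J_new):
    """Normalized weights turning averages over samples of (h, J) into estimates under (h_new, J_new)."""
    dE = [energy(s, h_new, J_new) - energy(s, h, J) for s in msa]
    shift = min(dE)                                 # shift exponents to avoid overflow
    w = [math.exp(-(d - shift)) for d in dE]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """Kish estimate; a value far below len(weights) means the reweighted estimate is unreliable."""
    return 1.0 / sum(x * x for x in weights)


def gradient_step(J, f_data, f_model, lr):
    """Plain log-likelihood gradient ascent on couplings (not Mi3-GPU's Quasi-Newton step)."""
    return {k: [[Jk[a][b] + lr * (f_data[k][a][b] - f_model[k][a][b]) for b in range(len(Jk[a]))]
                for a in range(len(Jk))] for k, Jk in J.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    L, q = 4, 3
    h = [[0.0] * q for _ in range(L)]
    J = {(i, j): [[0.0] * q for _ in range(q)] for i in range(L) for j in range(i + 1, L)}
    data_msa = [(0, 0, 1, 2), (0, 0, 2, 1), (1, 1, 0, 0), (0, 0, 1, 1)]  # hypothetical toy dataset
    model_msa = metropolis_msa(h, J, q, n_seqs=500, n_steps=40, rng=rng)
    f_data, f_model = pair_marginals(data_msa, q), pair_marginals(model_msa, q)
    J_trial = gradient_step(J, f_data, f_model, lr=0.5)
    w = zwanzig_weights(model_msa, h, J, h, J_trial)  # reuse old samples, no new MCMC run
    print("target f_01(0,0):", f_data[(0, 1)][0][0])
    print("model before/after:", f_model[(0, 1)][0][0], pair_marginals(model_msa, q, w)[(0, 1)][0][0])
    print("effective sample size:", effective_sample_size(w))
    print("parameters for L=300, q=21:", num_potts_params(300, 21))
```

For L = 300 and q = 21, `num_potts_params` gives 19,785,150 (6,300 fields plus 19,778,850 couplings), roughly 2 × 10^7 raw parameters and in line with the abstract's "tens of millions". The design point the sketch illustrates is that reweighting lets several trial updates be evaluated against a single MCMC sample; once the effective sample size falls far below the number of sampled sequences, fresh sampling would be needed.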


Year:  2020        PMID: 33716309      PMCID: PMC7944406          DOI: 10.1016/j.cpc.2020.107312

Source DB:  PubMed          Journal:  Comput Phys Commun        ISSN: 0010-4655            Impact factor:   4.390


Cited by: 4 in total

1.  Unique features of different classes of G-protein-coupled receptors revealed from sequence coevolutionary and structural analysis.

Authors:  Hung N Do; Allan Haldane; Ronald M Levy; Yinglong Miao
Journal:  Proteins       Date:  2021-10-09

2.  Limits to detecting epistasis in the fitness landscape of HIV.

Authors:  Avik Biswas; Allan Haldane; Ronald M Levy
Journal:  PLoS One       Date:  2022-01-18       Impact factor: 3.240

3.  The generative capacity of probabilistic protein sequence models.

Authors:  Francisco McGee; Sandro Hauri; Quentin Novinger; Slobodan Vucetic; Ronald M Levy; Vincenzo Carnevale; Allan Haldane
Journal:  Nat Commun       Date:  2021-11-02       Impact factor: 14.919

4.  Efficient generative modeling of protein sequences using simple autoregressive models.

Authors:  Jeanne Trinquier; Guido Uguzzoni; Andrea Pagnani; Francesco Zamponi; Martin Weigt
Journal:  Nat Commun       Date:  2021-10-04       Impact factor: 14.919

