
The EVcouplings Python framework for coevolutionary sequence analysis.

Thomas A Hopf, Anna G Green, Benjamin Schubert, Sophia Mersmann, Charlotta P I Schärfe, John B Ingraham, Agnes Toth-Petroczy, Kelly Brock, Adam J Riesselman, Perry Palmedo, Chan Kang, Robert Sheridan, Eli J Draizen, Christian Dallago, Chris Sander, Debora S Marks.

Abstract

SUMMARY: Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The combination of an easy-to-use, flexible command-line interface and an underlying modular Python package makes the full power of coevolutionary analyses available to entry-level and advanced users.
AVAILABILITY AND IMPLEMENTATION: https://github.com/debbiemarkslab/evcouplings.
© The Author(s) 2018. Published by Oxford University Press.

Year:  2019        PMID: 30304492      PMCID: PMC6499242          DOI: 10.1093/bioinformatics/bty862

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Coevolutionary sequence analysis presents a promising new approach to the long-standing problem of de novo prediction of the 3D structure of proteins and RNAs. In this approach, pairwise graphical models are used to identify evolutionary couplings (ECs) between sites, which frequently correspond to physical contacts in the molecule’s 3D structure. ECs have been used to successfully predict the residue contacts (Balakrishnan; Ekeberg; Marks; Morcos) and full 3D structure of proteins (Hopf; Marks; Ovchinnikov), RNAs (Weinreb), complexes (Hopf; Ovchinnikov; Weigt), as well as effects of mutations (Figliuzzi; Hopf). However, these applications require integrating multiple tools, data sources and extensive data processing. Available software in this field provides high-performance reimplementations of EC inference tools (Kaján; Seemayer; Weinreb), integration of multiple signals to improve prediction accuracy (Jones; Skwark), and a library targeted at format conversion between the different approaches (Simkovic). To make these methods accessible to a general biological audience, we present a flexible, open-source application and Python package for end-to-end evolutionary coupling analysis. EVcouplings, making use of external tools, covers all necessary functionality, including alignment generation, EC calculation, de novo structure and mutation effect prediction, visualization of results, and comparison of predictions to experimental structures.
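To make the idea of scoring pairs of alignment columns concrete, the following is a minimal sketch that ranks column pairs of a toy alignment by mutual information. Mutual information is a simpler, local covariation statistic used here only as a stand-in: it is not the global pairwise graphical model that yields ECs, and unlike that model it cannot separate direct from indirect correlations. The function names and the four-sequence alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    f_i = Counter(col_i)               # single-column symbol counts
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))  # joint symbol-pair counts
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def rank_column_pairs(alignment):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(alignment[0])
    scores = {
        (i, j): column_mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 2 vary together,
# columns 1 and 3 are fully conserved.
toy_alignment = ["ACDA", "ACDA", "GCEA", "GCEA"]
ranked_pairs = rank_column_pairs(toy_alignment)
```

In this toy case the covarying pair (0, 2) ranks first, while every pair that involves a conserved column scores zero. Real analyses need deep alignments, sequence reweighting and a global model to obtain reliable couplings.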

2 EVcouplings framework

The EVcouplings framework integrates the functionality of the previously published methods EVfold (Hopf; Marks), EVcomplex (Hopf) and EVmutation (Hopf). It provides (i) an easy-to-use command-line application and (ii) a modular Python package containing all functions, data structures and pipelines that comprise the application.

Command-line application: The command-line application allows users to obtain predictions for their proteins and complexes of interest by running the respective EVcouplings pipelines (Fig. 1). Each pipeline consists of a series of modular stages that are configured through a YAML file, which aids reproducibility by documenting all parameters. The pipelines are parallelized, support local multi-process execution as well as commonly used cluster systems, and automatically handle job submission and monitoring. The stages of the prediction pipelines are: align, which generates and processes sequence alignments; concatenate, which pairs putatively interacting sequences for the protein complex pipeline; couplings, which calculates ECs; compare, which compares ECs to experimental structures; mutate, which predicts the effects of mutations; and fold, which generates de novo 3D models.
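The staged, configuration-driven design described above can be illustrated with a small sketch. It is not the EVcouplings implementation and does not reproduce its configuration schema: the dictionary stands in for a parsed YAML file, and all keys, function names and output file names are hypothetical.

```python
# Hypothetical stand-in for a parsed YAML configuration file. The keys and
# values are illustrative only and are not the EVcouplings schema.
config = {
    "global": {"prefix": "example"},
    "stages": ["align", "couplings", "compare", "mutate", "fold"],
    "align": {"database": "hypothetical_sequence_db"},
}


def make_stage(name, produces):
    """Build a placeholder stage that only reports the output it would write."""
    def stage(params, upstream_outputs):
        # A real stage would read upstream_outputs and run external tools here.
        return {produces: f"{params['prefix']}_{name}.out"}
    return stage


# Registry that maps stage names to the callables implementing them.
STAGES = {
    "align": make_stage("align", "alignment_file"),
    "couplings": make_stage("couplings", "ec_file"),
    "compare": make_stage("compare", "comparison_file"),
    "mutate": make_stage("mutate", "mutation_file"),
    "fold": make_stage("fold", "model_file"),
}


def run_pipeline(config):
    """Run the configured stages in order, passing outputs downstream."""
    outputs = {}
    executed = []
    for name in config["stages"]:
        # Stage-specific settings override the global ones.
        params = {**config["global"], **config.get(name, {})}
        outputs.update(STAGES[name](params, outputs))
        executed.append(name)
    return executed, outputs


executed, outputs = run_pipeline(config)
```

Keeping every parameter in a single configuration object means that the same file both drives the run and documents it, which is the reproducibility benefit the text attributes to the YAML file.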
Fig. 1.

The EVcouplings Python framework. (a) The protein monomer EVcouplings pipeline entails multiple sequence alignment generation (align stage), EC inference (couplings stage), de novo folding (fold stage), mutation effect prediction (mutate stage) and comparison to experimental structure (compare stage). (b) The protein complex pipeline extends the monomer pipeline to protein interactions by pairing putatively interacting homologs (concatenate stage) and providing restraints for molecular docking (dock stage)

EVcouplings Python package: The command-line application is built on the underlying evcouplings Python package, whose modular architecture and comprehensive documentation facilitate the development of new stages and pipelines. Additionally, the package serves as a toolbox for handling and analyzing EC-related data. Examples for interactive usage are provided in Jupyter notebooks (Kluyver) distributed with the package, and extensive documentation is available on the web (http://evcouplings.readthedocs.io).
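As an illustration of the kind of EC analysis such a toolbox supports, the sketch below computes the precision of the top-ranked long-range ECs against a set of known structural contacts, which is the basic idea behind comparing ECs to an experimental structure. It uses plain Python data structures rather than the evcouplings API. The function names, the minimum sequence separation of 6 and the toy scores and contacts are all assumptions made for illustration.

```python
def long_range_ecs(ecs, min_separation=6):
    """Keep ECs whose sites are at least min_separation positions apart.

    The default of 6 is an assumed threshold, not a value taken from the text.
    """
    return [(i, j, score) for i, j, score in ecs if abs(i - j) >= min_separation]


def precision_of_top_ecs(ecs, contacts, top_n, min_separation=6):
    """Fraction of the top_n long-range ECs that are structural contacts."""
    ranked = sorted(
        long_range_ecs(ecs, min_separation),
        key=lambda ec: ec[2],
        reverse=True,
    )[:top_n]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j, _ in ranked if (min(i, j), max(i, j)) in contacts)
    return hits / len(ranked)


# Hypothetical ECs as (site i, site j, score) and hypothetical contacts (i < j).
toy_ecs = [
    (1, 20, 0.9),
    (3, 4, 0.8),    # sequence neighbors, removed by the separation filter
    (5, 30, 0.7),
    (10, 40, 0.5),
    (2, 15, 0.3),
]
toy_contacts = {(1, 20), (10, 40), (2, 15)}

precision = precision_of_top_ecs(toy_ecs, toy_contacts, top_n=3)
```

Here two of the three top-ranked long-range pairs are contacts, so the precision is 2/3. In practice the contact set would be derived from inter-residue distances in an experimental structure.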

3 Conclusion

EVcouplings is an open-source, integrated pipeline for evolutionary couplings analyses. The underlying API serves as a modular basis for data analysis and will allow developers to rapidly create new workflows.
References (10 of 17 shown)

1.  Direct-coupling analysis of residue coevolution captures native contacts across many protein families.

Authors:  Faruck Morcos; Andrea Pagnani; Bryan Lunt; Arianna Bertolino; Debora S Marks; Chris Sander; Riccardo Zecchina; José N Onuchic; Terence Hwa; Martin Weigt
Journal:  Proc Natl Acad Sci U S A       Date:  2011-11-21       Impact factor: 11.205

2.  Identification of direct residue contacts in protein-protein interaction by message passing.

Authors:  Martin Weigt; Robert A White; Hendrik Szurmant; James A Hoch; Terence Hwa
Journal:  Proc Natl Acad Sci U S A       Date:  2008-12-30       Impact factor: 11.205

3.  Learning generative models for protein fold families.

Authors:  Sivaraman Balakrishnan; Hetunandan Kamisetty; Jaime G Carbonell; Su-In Lee; Christopher James Langmead
Journal:  Proteins       Date:  2011-01-25

4.  Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models.

Authors:  Magnus Ekeberg; Cecilia Lövkvist; Yueheng Lan; Martin Weigt; Erik Aurell
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2013-01-11

5.  Sequence co-evolution gives 3D contacts and structures of protein complexes.

Authors:  Thomas A Hopf; Charlotta P I Schärfe; João P G L M Rodrigues; Anna G Green; Oliver Kohlbacher; Chris Sander; Alexandre M J J Bonvin; Debora S Marks
Journal:  Elife       Date:  2014-09-25       Impact factor: 8.140

6.  Three-dimensional structures of membrane proteins from genomic sequencing.

Authors:  Thomas A Hopf; Lucy J Colwell; Robert Sheridan; Burkhard Rost; Chris Sander; Debora S Marks
Journal:  Cell       Date:  2012-05-10       Impact factor: 41.582

7.  Protein 3D structure computed from evolutionary sequence variation.

Authors:  Debora S Marks; Lucy J Colwell; Robert Sheridan; Thomas A Hopf; Andrea Pagnani; Riccardo Zecchina; Chris Sander
Journal:  PLoS One       Date:  2011-12-07       Impact factor: 3.240

8.  Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information.

Authors:  Sergey Ovchinnikov; Hetunandan Kamisetty; David Baker
Journal:  Elife       Date:  2014-05-01       Impact factor: 8.140

9.  CCMpred--fast and precise prediction of protein residue-residue contacts from correlated mutations.

Authors:  Stefan Seemayer; Markus Gruber; Johannes Söding
Journal:  Bioinformatics       Date:  2014-07-26       Impact factor: 6.937

10.  FreeContact: fast and free software for protein contact prediction from residue co-evolution.

Authors:  László Kaján; Thomas A Hopf; Matúš Kalaš; Debora S Marks; Burkhard Rost
Journal:  BMC Bioinformatics       Date:  2014-03-26       Impact factor: 3.169
