Power law tails in phylogenetic systems.

Chongli Qin, Lucy J Colwell.

Abstract

Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with the largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
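
The truncation step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the authors' procedure in detail, so this is an illustration under stated assumptions rather than their implementation: the function name `truncate_top_eigenmodes` and the cutoff `k` are hypothetical, and the input is assumed to be a symmetric covariance matrix computed elsewhere from an alignment.

```python
import numpy as np

def truncate_top_eigenmodes(cov, k):
    """Return `cov` with its k largest-eigenvalue modes removed.

    Hypothetical helper: `k` is an illustrative cutoff, not a value
    taken from the paper.
    """
    cov = np.asarray(cov, dtype=float)
    # eigh is for symmetric matrices; eigenvalues come back in ascending order.
    evals, evecs = np.linalg.eigh(cov)
    if not 0 <= k <= evals.size:
        raise ValueError("k must be between 0 and the matrix dimension")
    keep = evals.size - k
    # Rebuild the matrix from the retained (smaller-eigenvalue) modes only.
    evecs_kept = evecs[:, :keep]
    return (evecs_kept * evals[:keep]) @ evecs_kept.T

# Toy usage on random data standing in for an encoded alignment.
rng = np.random.default_rng(0)
toy_data = rng.normal(size=(200, 6))      # 200 "sequences", 6 "positions"
toy_cov = np.cov(toy_data, rowvar=False)  # 6 x 6 covariance matrix
filtered = truncate_top_eigenmodes(toy_cov, k=2)
```

Setting the top modes to zero is the simplest form of removal; a down-weighting variant would instead scale the largest eigenvalues by a factor between 0 and 1 before rebuilding the matrix.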

Keywords:  phylogeny; power law; protein; sequence covariance; structure prediction

Year: 2018; PMID: 29311320; PMCID: PMC5789915; DOI: 10.1073/pnas.1711913115

Source DB: PubMed; Journal: Proc Natl Acad Sci U S A; ISSN: 0027-8424; Impact factor: 11.205


