Lucas Czech¹, Alexandros Stamatakis¹,²
¹ Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
² Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Abstract
BACKGROUND: The exponential decrease in molecular sequencing costs generates unprecedented amounts of data, so scalable methods for analyzing these data are required. Phylogenetic (or Evolutionary) Placement methods identify the evolutionary provenance of anonymous sequences with respect to a given reference phylogeny. These increasingly popular methods are deployed to scrutinize metagenomic samples from environments such as water, soil, or the human gut. NOVEL METHODS: Here, we present novel and, more importantly, highly scalable methods for analyzing phylogenetic placements of metagenomic samples. More specifically, we introduce methods for (a) visualizing differences between samples and their correlation with associated metadata on the reference phylogeny, (b) clustering similar samples using a variant of the k-means method, and (c) finding phylogenetic factors using an adaptation of the Phylofactorization method. These methods make it possible to interpret metagenomic data in a phylogenetic context, to find patterns in the data, and to identify the branches of the phylogeny that drive these patterns. RESULTS: To demonstrate the scalability and utility of our methods, and to provide example interpretations of their results, we applied them to 3 publicly available datasets comprising 9782 samples with a total of approximately 168 million sequences. The results indicate that our methods can yield new biological insights.
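The abstract names a k-means variant for clustering samples but does not describe it. For orientation only, the following is a minimal sketch of plain Lloyd's k-means. It assumes, hypothetically, that each sample has already been reduced to a vector of placement mass per branch of the reference tree and that squared Euclidean distance is an acceptable dissimilarity. It is not the authors' variant. The function name `kmeans` and the per-branch vector representation are illustrative assumptions.

```python
import random


def kmeans(samples, k, iterations=100, seed=0):
    """Plain Lloyd's k-means (illustrative sketch, not the paper's variant).

    Assumption: each sample is a list of floats, one entry per branch of the
    reference tree, holding the placement mass on that branch.
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct samples chosen at random.
    centroids = [list(s) for s in rng.sample(samples, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each sample goes to its nearest centroid
        # under squared Euclidean distance.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(s, centroids[c])),
            )
            for s in samples
        ]
        # Stop once no sample changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its assigned samples;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [s for s, a in zip(samples, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

For example, two samples with most of their mass on the first branch and two with most of it on the third branch end up in separate clusters. The plain Euclidean distance here ignores the tree topology and branch lengths. A phylogeny-aware variant would presumably need a tree-based distance between samples, which is beyond this sketch.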