Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees.

Tom M W Nye, Xiaoxian Tang, Grady Weyenberg, Ruriko Yoshida.

Abstract

Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the kth principal component in Euclidean space: the locus of the weighted Fréchet mean of k + 1 vertex trees when the weights vary over the k-simplex. We establish some basic properties of these objects, in particular showing that they have dimension k, and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
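
As an illustration only, the following sketch traces the locus of a weighted Fréchet mean in the Euclidean plane, where the minimiser of the weighted sum of squared distances is simply the weighted average of the vertices. It is not the tree-space algorithm of the paper: tree space is not Euclidean, and the function names (`frechet_objective`, `weighted_frechet_mean`, `simplex_grid`) and the three vertex points are hypothetical choices made for this example.

```python
from itertools import product


def frechet_objective(point, vertices, weights):
    """Weighted sum of squared Euclidean distances from point to each vertex."""
    return sum(
        w * sum((p - v) ** 2 for p, v in zip(point, vertex))
        for w, vertex in zip(weights, vertices)
    )


def weighted_frechet_mean(vertices, weights):
    """Euclidean case only: the minimiser of frechet_objective is the weighted average."""
    total = sum(weights)
    dim = len(vertices[0])
    return tuple(
        sum(w * vertex[i] for w, vertex in zip(weights, vertices)) / total
        for i in range(dim)
    )


def simplex_grid(k, steps):
    """Yield weight vectors (w_0, ..., w_k) on a regular grid over the k-simplex."""
    for combo in product(range(steps + 1), repeat=k):
        if sum(combo) <= steps:
            last = steps - sum(combo)
            yield tuple(c / steps for c in combo) + (last / steps,)


# Three hypothetical vertices in the plane (k = 2), standing in for vertex trees.
vertices = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

# Sweep the weights over the 2-simplex; the resulting means fill out the triangle.
locus = [weighted_frechet_mean(vertices, w) for w in simplex_grid(2, 4)]
```

In this Euclidean setting the locus is the triangle spanned by the three vertices, a two-dimensional object, which matches the abstract's statement that the locus for k + 1 vertices has dimension k. In tree space the weighted Fréchet mean has no closed form and would have to be computed iteratively using geodesic distances.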

Keywords:  Fréchet mean; Phylogenetic tree; Principal component analysis; Tree space

Year:  2017        PMID: 29422694      PMCID: PMC5793493          DOI: 10.1093/biomet/asx047

Source DB:  PubMed          Journal:  Biometrika        ISSN: 0006-3444            Impact factor:   2.445


