Grady Weyenberg1, Peter M Huggins1, Christopher L Schardl1, Daniel K Howe1, Ruriko Yoshida1. 1. Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA.
Abstract
MOTIVATION: Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. RESULTS: We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy. AVAILABILITY AND IMPLEMENTATION: Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.
MOTIVATION: Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. RESULTS: We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy. AVAILABILITY AND IMPLEMENTATION: Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.
Authors: Scott V Edwards; Sally Potter; C Jonathan Schmitt; Jason G Bragg; Craig Moritz Journal: Proc Natl Acad Sci U S A Date: 2016-07-19 Impact factor: 11.205
Authors: Slawomir Michniewski; Tamsin Redgwell; Aurelija Grigonyte; Branko Rihtman; Maria Aguilo-Ferretjans; Joseph Christie-Oleza; Eleanor Jameson; David J Scanlan; Andrew D Millard Journal: Environ Microbiol Date: 2019-04-04 Impact factor: 5.491
Authors: Sean D Schoville; Sabrina Simon; Ming Bai; Zachary Beethem; Roman Y Dudko; Monika J B Eberhard; Paul B Frandsen; Simon C Küpper; Ryuichiro Machida; Max Verheij; Peter C Willadsen; Xin Zhou; Benjamin Wipfler Journal: Evol Appl Date: 2020-09-11 Impact factor: 5.183