
Fast Estimation of Recombination Rates Using Topological Data Analysis.

Devon P Humphreys, Melissa R McGuirl, Michael Miyagi, Andrew J Blumberg.

Abstract

Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (β1) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating β1 to population genetic models. Using simulations, we show that ψ and β1 are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE's efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
Copyright © 2019 by the Genetics Society of America.
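
As an illustration of the loop-counting idea behind the first Betti number mentioned in the abstract, the following is a minimal sketch and not the TREE implementation. It assumes haplotypes are given as equal-length strings, uses Hamming distance, builds a Vietoris-Rips complex at a single fixed scale `eps` (rather than computing persistent homology across scales), and takes coefficients over GF(2). The function names `hamming`, `gf2_rank`, and `first_betti_number` are hypothetical and chosen only for this example. The statistic ψ is not defined in the abstract and is not attempted here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are encoded as integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                break
            row ^= pivots[high]  # eliminate the leading bit and keep reducing
    return len(pivots)


def first_betti_number(seqs, eps):
    """First Betti number (over GF(2)) of the Rips complex at scale eps.

    Assumption for this sketch: vertices are sequences, an edge joins two
    sequences at Hamming distance <= eps, and a triangle is filled in
    whenever all three of its edges are present.
    """
    n = len(seqs)
    edges = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if hamming(seqs[i], seqs[j]) <= eps
    ]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [
        (i, j, k)
        for i, j, k in combinations(range(n), 3)
        if (i, j) in edge_index and (j, k) in edge_index and (i, k) in edge_index
    ]
    # Boundary of an edge: its two endpoints (one bit per vertex).
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle: its three edges (one bit per edge).
    d2 = [
        (1 << edge_index[(i, j)])
        | (1 << edge_index[(j, k)])
        | (1 << edge_index[(i, k)])
        for i, j, k in triangles
    ]
    # beta_1 = dim ker(d1) - rank(d2) = (#edges - rank(d1)) - rank(d2)
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


if __name__ == "__main__":
    # Four haplotypes carrying all four two-site gametes form a 4-cycle at eps = 1.
    four_gametes = ["00", "10", "11", "01"]
    print(first_betti_number(four_gametes, 1))
    # Three haplotypes on a simple path contain no loop at the same scale.
    print(first_betti_number(["00", "10", "11"], 1))
```

In this toy setting, a sample containing all four two-site gametes yields one loop at scale 1, while a tree-like sample yields none; at a larger scale the loop is filled in by triangles and disappears, which is why scale-aware (persistent) summaries are normally used on real data.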

Keywords:  coalescent theory; population genetics; recombination; topological data analysis

Year:  2019        PMID: 30787042      PMCID: PMC6456321          DOI: 10.1534/genetics.118.301565

Source DB:  PubMed          Journal:  Genetics        ISSN: 0016-6731            Impact factor:   4.562


