Jade Yu Cheng (1,2,3), Thomas Mailund (1), Rasmus Nielsen (2,3). Affiliations: 1. Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark. 2. Departments of Integrative Biology and Statistics, University of California, Berkeley, Berkeley, CA, USA. 3. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark.
Abstract
MOTIVATION: Structure methods are widely used population genetic methods for assigning individuals in a sample fractionally to discrete ancestry components. CONTRIBUTION: We introduce a new optimization algorithm for the classical STRUCTURE model in a maximum likelihood framework. Using analyses of real data, we show that the new method finds solutions with higher likelihoods than the state-of-the-art method in the same computational time. The optimization algorithm is also applicable to models based on genotype likelihoods, which can account for the uncertainty in genotype calling associated with next-generation sequencing (NGS) data. We also present a new method for estimating population trees from ancestry components using a Gaussian approximation. Using coalescence simulations of diverging populations, we explore the adequacy of the STRUCTURE-style models and the Gaussian assumption for identifying ancestry components correctly and for inferring the correct tree. In most cases, ancestry components are inferred correctly, although sample sizes and times since admixture can influence the results. We show that the popular Gaussian approximation tends to perform poorly under extreme divergence scenarios, e.g., with very long branch lengths, but the topologies of the population trees are accurately inferred in all scenarios explored. The new methods are implemented together with appropriate visualization tools in the software package Ohana. AVAILABILITY AND IMPLEMENTATION: Ohana is publicly available at https://github.com/jade-cheng/ohana. In addition to source code and installation instructions, we also provide example workflows on the project wiki. CONTACT: jade.cheng@birc.au.dk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
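As an illustration of the objective that a maximum likelihood treatment of the STRUCTURE model optimizes, the sketch below evaluates the standard binomial admixture log-likelihood for called genotypes. It is not taken from Ohana and does not implement the optimization algorithm introduced in this paper. The function name `admixture_log_likelihood`, the matrix names `Q` and `F`, and the clamping constant `eps` are assumptions made for this example.

```python
import math


def admixture_log_likelihood(genotypes, Q, F, eps=1e-9):
    """Log-likelihood of the standard admixture model (illustrative sketch).

    genotypes[i][j]: count of one allele (0, 1 or 2) for individual i at
                     marker j, or None when the genotype is missing.
    Q[i][k]:         fraction of individual i's ancestry from component k
                     (each row is assumed to sum to 1).
    F[k][j]:         frequency of the counted allele in component k at marker j.
    eps:             hypothetical clamp keeping log() away from 0 and 1.

    The constant binomial coefficient is omitted because it does not depend
    on Q or F.
    """
    total = 0.0
    n_components = len(F)
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is None:
                continue  # missing genotypes contribute nothing
            # Individual-specific allele frequency: ancestry-weighted mixture.
            p = sum(Q[i][k] * F[k][j] for k in range(n_components))
            p = min(max(p, eps), 1.0 - eps)
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


# Toy data: 2 individuals, 2 markers, 2 ancestry components.
genotypes = [[2, 0], [1, None]]
Q = [[1.0, 0.0], [0.5, 0.5]]
F = [[0.9, 0.1], [0.1, 0.8]]
ll = admixture_log_likelihood(genotypes, Q, F)
```

An optimizer would search over `Q` and `F`, subject to the row-sum and [0, 1] constraints, for the values that maximize this quantity. A genotype-likelihood variant would replace the per-genotype term with a sum over the three possible genotypes, each weighted by its likelihood given the sequencing data.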