Pier Francesco Palamara. Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
Abstract
MOTIVATION: Simulation under the coalescent model is ubiquitous in the analysis of genetic data. The rapid growth of real data sets from multiple human populations has led to increasing interest in simulating very large sample sizes at whole-chromosome scales. When the sample size is large, the coalescent model becomes an increasingly inaccurate approximation of the discrete-time Wright-Fisher model (DTWF). Analytical and computational treatment of the DTWF, however, is generally harder.
RESULTS: We present a simulator (ARGON) for the DTWF process that scales up to hundreds of thousands of samples and whole-chromosome lengths, with time and memory performance comparable or superior to that of currently available methods for coalescent simulation. The simulator supports arbitrary demographic history, migration, Newick tree output, variable mutation and recombination rates, and gene conversion, and it efficiently outputs pairwise identical-by-descent sharing data.
AVAILABILITY: ARGON (version 0.1) is written in Java, open source, and freely available at https://github.com/pierpal/ARGON
CONTACT: ppalama@hsph.harvard.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
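To illustrate why the coalescent can be a poor approximation of the DTWF for large samples, the following is a minimal sketch of a backward-in-time DTWF step. It is not ARGON's implementation: it assumes a single haploid population of constant size with no recombination, mutation, or migration, and the function names (`dtwf_generation`, `simulate_tmrca`) are hypothetical. In each generation every lineage picks a parent uniformly at random, and lineages that pick the same parent merge. The sketch counts generations in which more than one lineage is lost, which the standard coalescent does not allow because it permits only one pairwise merger at a time.

```python
import random


def dtwf_generation(num_lineages, pop_size, rng):
    """One backward DTWF step (illustrative assumption: haploid, constant size).

    Each lineage picks a parent uniformly at random among pop_size individuals.
    Returns the number of child lineages for each distinct parent chosen.
    """
    children_per_parent = {}
    for _ in range(num_lineages):
        parent = rng.randrange(pop_size)
        children_per_parent[parent] = children_per_parent.get(parent, 0) + 1
    return list(children_per_parent.values())


def simulate_tmrca(sample_size, pop_size, seed=0):
    """Trace sample_size lineages back to their most recent common ancestor.

    Returns (generations, non_coalescent_generations), where the second value
    counts generations in which more than one lineage was lost, i.e. multiple
    or simultaneous mergers that the standard coalescent excludes.
    """
    rng = random.Random(seed)
    lineages = sample_size
    generations = 0
    non_coalescent_generations = 0
    while lineages > 1:
        counts = dtwf_generation(lineages, pop_size, rng)
        lost = lineages - len(counts)
        if lost > 1:
            non_coalescent_generations += 1
        lineages = len(counts)
        generations += 1
    return generations, non_coalescent_generations


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    for n in (10, 1000):
        gens, violations = simulate_tmrca(n, pop_size=10000, seed=1)
        print(n, gens, violations)
```

With a sample that is small relative to the population, generations with more than one merger should be rare; as the sample size approaches the population size they become common, which is the regime in which the two models diverge.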