Bo Peng1, Xiaoming Liu. 1. Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA. bpeng@mdanderson.org
Abstract
OBJECTIVE: Simulated samples have been widely used in the development of efficient statistical methods identifying genetic variants that predispose to human genetic diseases. Although it is well known that natural selection has a strong influence on the number and diversity of rare genetic variations in human populations, existing simulation methods are limited in their ability to simulate multi-locus selection models with realistic distributions of the random fitness effects of newly arising mutants. METHODS: We developed a computer program to simulate large populations of gene sequences using a forward-time simulation approach. This program is capable of simulating several multi-locus fitness schemes with arbitrary diploid single-locus selection models with random or locus-specific fitness effects. Arbitrary quantitative trait or disease models can be applied to the simulated populations from which individual- or family-based samples can be drawn and analyzed. RESULTS: Using realistic demographic and natural selection models estimated from empirical sequence data, datasets simulated using our method differ significantly in the number and diversity of rare variants from datasets simulated using existing methods that ignore natural selection. Our program thus provides a useful tool to simulate datasets with realistic distributions of rare genetic variants for the study of genetic diseases caused by such variants.
OBJECTIVE: Simulated samples have been widely used in the development of efficient statistical methods identifying genetic variants that predispose to humangenetic diseases. Although it is well known that natural selection has a strong influence on the number and diversity of rare genetic variations in human populations, existing simulation methods are limited in their ability to simulate multi-locus selection models with realistic distributions of the random fitness effects of newly arising mutants. METHODS: We developed a computer program to simulate large populations of gene sequences using a forward-time simulation approach. This program is capable of simulating several multi-locus fitness schemes with arbitrary diploid single-locus selection models with random or locus-specific fitness effects. Arbitrary quantitative trait or disease models can be applied to the simulated populations from which individual- or family-based samples can be drawn and analyzed. RESULTS: Using realistic demographic and natural selection models estimated from empirical sequence data, datasets simulated using our method differ significantly in the number and diversity of rare variants from datasets simulated using existing methods that ignore natural selection. Our program thus provides a useful tool to simulate datasets with realistic distributions of rare genetic variants for the study of genetic diseases caused by such variants.
Authors: Alkes L Price; Gregory V Kryukov; Paul I W de Bakker; Shaun M Purcell; Jeff Staples; Lee-Jen Wei; Shamil R Sunyaev Journal: Am J Hum Genet Date: 2010-05-13 Impact factor: 11.025
Authors: Clive J Hoggart; Marc Chadeau-Hyam; Taane G Clark; Riccardo Lampariello; John C Whittaker; Maria De Iorio; David J Balding Journal: Genetics Date: 2007-10-18 Impact factor: 4.562
Authors: Gregory V Kryukov; Alexander Shpunt; John A Stamatoyannopoulos; Shamil R Sunyaev Journal: Proc Natl Acad Sci U S A Date: 2009-02-06 Impact factor: 11.205
Authors: Marc Chadeau-Hyam; Clive J Hoggart; Paul F O'Reilly; John C Whittaker; Maria De Iorio; David J Balding Journal: BMC Bioinformatics Date: 2008-09-08 Impact factor: 3.169
Authors: Adam R Boyko; Scott H Williamson; Amit R Indap; Jeremiah D Degenhardt; Ryan D Hernandez; Kirk E Lohmueller; Mark D Adams; Steffen Schmidt; John J Sninsky; Shamil R Sunyaev; Thomas J White; Rasmus Nielsen; Andrew G Clark; Carlos D Bustamante Journal: PLoS Genet Date: 2008-05-30 Impact factor: 5.917
Authors: Bo Peng; Huann-Sheng Chen; Leah E Mechanic; Ben Racine; John Clarke; Elizabeth Gillanders; Eric J Feuer Journal: Genet Epidemiol Date: 2014-12-13 Impact factor: 2.135