Biao Li (1), Gao T Wang (1), Suzanne M Leal (1)
(1) Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
Abstract
MOTIVATION: There is great interest in analyzing next-generation sequence data generated for pedigrees. However, unlike for population-based data, only a limited number of rare-variant methods are available for analyzing pedigree data. One limitation is the difficulty of evaluating type I and II errors for family-based methods, owing to a lack of software that can simulate realistic sequence data for pedigrees.

SUMMARY: We developed RarePedSim (Rare-variant Pedigree-based Simulator), a program that simulates region/gene-level genotype and phenotype data for complex and Mendelian traits for any given pedigree structure. Using a genetic model, sequence variant data can be generated either conditionally or unconditionally on pedigree members' qualitative or quantitative phenotypes. Additionally, qualitative or quantitative traits can be generated conditional on variant data. Sequence data can either be simulated using realistic population demographic models or obtained from sequence-based studies. Variant sites can be annotated with positions, allele frequencies and functionality. For rare variants, RarePedSim is the only program that can efficiently generate both genotypes and phenotypes, regardless of pedigree structure. Data generated by RarePedSim are in standard Linkage (.ped) and Variant Call Format (.vcf) files, ready to be used for a variety of purposes, including evaluation of type I error and power for association methods (including mixed models) and linkage analysis methods.

AVAILABILITY AND IMPLEMENTATION: bioinformatics.org/simped/rare

CONTACT: sleal@bcm.edu
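To make the idea of generating a trait conditional on variant data concrete, the following is a minimal Python sketch. It is not RarePedSim's implementation or interface. It assumes a hypothetical four-member nuclear family, independent founder alleles drawn from assumed minor allele frequencies, no recombination within the region, and a simple two-level penetrance model (one risk for carriers of any rare allele, another for non-carriers). All function names and parameter values are illustrative. Output rows follow the usual Linkage layout: family ID, individual ID, father, mother, sex, affection status, then one allele pair per variant site.

```python
import random

# Hypothetical pedigree: (individual, father, mother, sex).
# "0" marks an unknown parent (founder); sex is 1 = male, 2 = female.
PEDIGREE = [
    ("1", "0", "0", 1),
    ("2", "0", "0", 2),
    ("3", "1", "2", 1),
    ("4", "1", "2", 2),
]


def drop_genes(pedigree, mafs, rng):
    """Simulate two haplotypes per individual by gene dropping.

    Founders draw each allele independently from the given minor allele
    frequencies; each child inherits one whole haplotype from each parent
    (assumption: no recombination within the region). Parents must appear
    before their children in `pedigree`.
    """
    haps = {}
    for ind, father, mother, _sex in pedigree:
        if father == "0" and mother == "0":
            haps[ind] = tuple(
                [int(rng.random() < f) for f in mafs] for _ in range(2)
            )
        else:
            haps[ind] = (
                list(rng.choice(haps[father])),
                list(rng.choice(haps[mother])),
            )
    return haps


def assign_affection(haps, baseline, carrier_risk, rng):
    """Draw affection status conditional on genotype (1 = unaffected, 2 = affected)."""
    status = {}
    for ind, (h1, h2) in haps.items():
        carrier = any(h1) or any(h2)
        risk = carrier_risk if carrier else baseline
        status[ind] = 2 if rng.random() < risk else 1
    return status


def to_ped_lines(family_id, pedigree, haps, status):
    """Format one Linkage-style row per individual; alleles are coded 1 (major) or 2 (minor)."""
    lines = []
    for ind, father, mother, sex in pedigree:
        h1, h2 = haps[ind]
        alleles = " ".join(f"{a + 1} {b + 1}" for a, b in zip(h1, h2))
        lines.append(
            f"{family_id} {ind} {father} {mother} {sex} {status[ind]} {alleles}"
        )
    return lines


rng = random.Random(2015)
haps = drop_genes(PEDIGREE, mafs=[0.01, 0.005, 0.02], rng=rng)
status = assign_affection(haps, baseline=0.01, carrier_risk=0.5, rng=rng)
for line in to_ped_lines("FAM1", PEDIGREE, haps, status):
    print(line)
```

Generating genotypes conditional on observed phenotypes, which the summary also describes, is a harder problem that this sketch does not attempt.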