Simulating realistic genomic data with rare variants.

Yaji Xu, Yinghua Wu, Chi Song, Heping Zhang.

Abstract

Increasing evidence suggests that rare and generally deleterious genetic variants might have a strong impact on the risk of not only Mendelian diseases but also many common diseases. However, identifying such rare variants remains challenging, and novel statistical methods and bioinformatic software must be developed. Hence, we have to evaluate various methods extensively under reasonable genetic models. Although genomic data are abundant, they are of limited use for evaluating these methods because the underlying disease mechanism is unknown. Thus, it is imperative that we simulate genomic data that mimic real data containing rare variants and that enable us to impose a known disease penetrance model. Although resampling simulation methods have shown advantages in computational efficiency and in preserving important properties such as linkage disequilibrium (LD) and allele frequency, they still have limitations, as we demonstrate. We propose an algorithm that combines regression-based imputation with resampling to simulate genetic data with both rare and common variants. A logistic regression model was fit to the relationship between a rare variant and its nearby common variants in the 1000 Genomes Project data; the fitted model was then applied to the real data to fill in one rare variant at a time based on the common variants. Individuals were then simulated using the real data with the imputed rare variants. We compared our method with existing simulators and demonstrated that it performed well, qualitatively, in retaining properties of the real sample such as LD and minor allele frequency.
© 2012 WILEY PERIODICALS, INC.
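
The two-stage procedure described in the abstract (fit a logistic model for a rare variant on nearby common variants in a reference panel, fill in that rare variant in the study data, then resample individuals) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the haplotype-level 0/1 coding, the random draw from the fitted probability, the pairing of two resampled haplotypes into a genotype, the toy data, and all function names (`fit_logistic`, `impute_rare`, `resample_individuals`) are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit P(y=1 | x) = sigmoid(b0 + b.x) by batch gradient ascent.

    Returns [b0, b1, ..., bp]. A plain solver is used here only to keep
    the sketch dependency-free.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            err = yi - sigmoid(eta)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def impute_rare(beta, common_haps, rng):
    """Draw a rare allele (0/1) per haplotype from the fitted model.

    Assumption: the allele is sampled from the predicted probability
    rather than set to the most likely value.
    """
    out = []
    for x in common_haps:
        p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
        out.append(1 if rng.random() < p else 0)
    return out


def resample_individuals(haps, n_individuals, rng):
    """Build diploid genotypes (0/1/2) from two haplotypes drawn with replacement."""
    genotypes = []
    for _ in range(n_individuals):
        h1, h2 = rng.choice(haps), rng.choice(haps)
        genotypes.append([a + b for a, b in zip(h1, h2)])
    return genotypes


rng = random.Random(0)

# Toy "reference panel" (hypothetical data): 3 common variants plus one rare
# variant that is more likely on haplotypes carrying the first common allele.
reference = []
for _ in range(200):
    common = [1 if rng.random() < 0.3 else 0 for _ in range(3)]
    p_rare = 0.15 if common[0] == 1 else 0.005
    reference.append((common, 1 if rng.random() < p_rare else 0))

# Stage 1: fit the rare variant on its nearby common variants.
beta = fit_logistic([c for c, _ in reference], [r for _, r in reference])

# Stage 2: fill in the rare variant in "study" haplotypes that only have
# common variants, then resample individuals from the completed haplotypes.
study = [[1 if rng.random() < 0.3 else 0 for _ in range(3)] for _ in range(100)]
rare_imputed = impute_rare(beta, study, rng)
study_full = [c + [r] for c, r in zip(study, rare_imputed)]
genotypes = resample_individuals(study_full, 50, rng)

print("fitted coefficients:", [round(b, 3) for b in beta])
print("imputed rare-allele frequency:", sum(rare_imputed) / len(rare_imputed))
```

In a full simulator the imputation step would be repeated for each rare variant in turn, and a disease penetrance model would then be imposed on the resampled genotypes; both are omitted here for brevity.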

Year:  2012        PMID: 23161487      PMCID: PMC3543480          DOI: 10.1002/gepi.21696

Source DB:  PubMed          Journal:  Genet Epidemiol        ISSN: 0741-0395            Impact factor:   2.135
