USING LINEAR PREDICTORS TO IMPUTE ALLELE FREQUENCIES FROM SUMMARY OR POOLED GENOTYPE DATA.

Xiaoquan Wen, Matthew Stephens.

Abstract

Recently developed genotype imputation methods are a powerful tool for detecting untyped genetic variants that affect disease susceptibility in genetic association studies. However, existing imputation methods require individual-level genotype data, whereas in practice often only summary data are available. For example, this may occur because, for reasons of privacy or politics, only summary data are made available to the research community at large, or because only summary data are collected, as in DNA pooling experiments. In this article, we introduce a new statistical method that can accurately infer the frequencies of untyped genetic variants in these settings, and indeed substantially improve frequency estimates at typed variants in pooling experiments where observations are noisy. Our approach, which predicts each allele frequency using a linear combination of observed frequencies, is statistically straightforward and related to a long history of linear methods for estimating missing values (e.g., Kriging). The main statistical novelty is our approach to regularizing the covariance matrix estimates, and the resulting linear predictors, which is based on methods from population genetics. We find that, besides being both fast and flexible (allowing new problems to be tackled that cannot be handled by existing imputation approaches purpose-built for the genetic context), these linear methods are also very accurate. Indeed, imputation accuracy using this approach is similar to that obtained by state-of-the-art imputation methods that use individual-level data, but at a fraction of the computational cost.
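
As a rough illustration of the idea of predicting an untyped allele frequency from a linear combination of observed frequencies, the following Python sketch applies a Kriging-style conditional-mean predictor. It is not the authors' implementation: the function name `impute_frequencies`, the `shrink` parameter, and the toy reference panel are hypothetical, and the covariance regularization shown is a generic shrinkage of off-diagonal terms, standing in for the population-genetics-based regularization described in the abstract.

```python
import numpy as np

def impute_frequencies(panel, typed_idx, observed_freq, shrink=0.1):
    """Predict allele frequencies at untyped SNPs from frequencies at typed SNPs.

    panel         : (n_haplotypes, n_snps) 0/1 reference haplotypes (assumed input)
    typed_idx     : indices of SNPs whose frequencies were observed
    observed_freq : observed allele frequencies at the typed SNPs
    shrink        : generic off-diagonal shrinkage weight (an assumption,
                    NOT the regularization used in the paper)
    """
    panel = np.asarray(panel, dtype=float)
    typed_idx = np.asarray(typed_idx)
    observed_freq = np.asarray(observed_freq, dtype=float)
    untyped_idx = np.setdiff1d(np.arange(panel.shape[1]), typed_idx)

    # Panel allele frequencies and SNP-by-SNP covariance matrix.
    mu = panel.mean(axis=0)
    sigma = np.cov(panel, rowvar=False, bias=True)

    # Shrink off-diagonal entries toward zero to stabilise the estimate.
    sigma = (1.0 - shrink) * sigma + shrink * np.diag(np.diag(sigma))

    s_tt = sigma[np.ix_(typed_idx, typed_idx)]
    s_ut = sigma[np.ix_(untyped_idx, typed_idx)]
    # Tiny ridge term keeps the solve well-posed if a typed SNP is monomorphic.
    s_tt = s_tt + 1e-8 * np.eye(len(typed_idx))

    # Linear predictor: mu_u + S_ut S_tt^{-1} (observed - mu_t).
    weights = np.linalg.solve(s_tt, observed_freq - mu[typed_idx])
    predicted = mu[untyped_idx] + s_ut @ weights
    return untyped_idx, np.clip(predicted, 0.0, 1.0)

# Toy panel: SNP 2 is in perfect linkage disequilibrium with SNP 0.
toy_panel = [[0, 0, 0],
             [1, 0, 1],
             [1, 1, 1],
             [0, 1, 0]]
untyped, estimates = impute_frequencies(toy_panel, [0, 1], [0.7, 0.4])
```

In this sketch the predicted frequency at the untyped SNP moves away from the panel frequency in proportion to how far the observed frequencies at correlated typed SNPs deviate from their panel values; a larger `shrink` pulls the prediction back toward the panel frequency.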

Year: 2010  PMID: 21479081  PMCID: PMC3072818  DOI: 10.1214/10-aoas338

Source DB: PubMed  Journal: Ann Appl Stat  ISSN: 1932-6157  Impact factor: 2.083


