
Choosing Subsamples for Sequencing Studies by Minimizing the Average Distance to the Closest Leaf.

Jonathan T L Kang, Peng Zhang, Sebastian Zöllner, Noah A Rosenberg.

Abstract

Imputation of genotypes in a study sample can make use of sequenced or densely genotyped external reference panels consisting of individuals that are not from the study sample. It also can employ internal reference panels, incorporating a subset of individuals from the study sample itself. Internal panels offer an advantage over external panels because they can reduce imputation errors arising from genetic dissimilarity between a population of interest and a second, distinct population from which the external reference panel has been constructed. As the cost of next-generation sequencing decreases, internal reference panel selection is becoming increasingly feasible. However, it is not clear how best to select individuals to include in such panels. We introduce a new method for selecting an internal reference panel--minimizing the average distance to the closest leaf (ADCL)--and compare its performance relative to an earlier algorithm: maximizing phylogenetic diversity (PD). Employing both simulated data and sequences from the 1000 Genomes Project, we show that ADCL provides a significant improvement in imputation accuracy, especially for imputation of sites with low-frequency alleles. This improvement in imputation accuracy is robust to changes in reference panel size, marker density, and length of the imputation target region.
Copyright © 2015 by the Genetics Society of America.
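The ADCL criterion named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the leaf-to-leaf distances are already available as a symmetric matrix (for example, path lengths on a tree), takes the average over all leaves so that selected leaves contribute zero, and finds the best panel by exhaustive search, which is practical only for toy inputs. The function names and the distance values are hypothetical.

```python
from itertools import combinations


def adcl(dist, panel):
    """Average, over all leaves, of the distance to the closest leaf in `panel`."""
    n = len(dist)
    return sum(min(dist[i][j] for j in panel) for i in range(n)) / n


def min_adcl_panel(dist, k):
    """Try every size-k panel and return one with the smallest ADCL.

    Exhaustive search is an assumption made for clarity; it does not scale.
    """
    return min(combinations(range(len(dist)), k), key=lambda p: adcl(dist, p))


# Toy symmetric distance matrix for 4 leaves (hypothetical values):
# leaves 0 and 1 are close to each other, as are leaves 2 and 3.
dist = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 2],
    [6, 6, 2, 0],
]

best = min_adcl_panel(dist, 2)
print(best, adcl(dist, best))
```

In this toy case, a panel with one leaf from each close pair has a lower ADCL than a panel made of two leaves from the same pair, since every unselected leaf then has a nearby panel member.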


Keywords:  algorithms; imputation; polymorphic sites; sequencing; study design


Year: 2015; PMID: 26307072; PMCID: PMC4596665; DOI: 10.1534/genetics.115.176909

Source DB: PubMed; Journal: Genetics; ISSN: 0016-6731; Impact factor: 4.562


