
Estimating the number of unseen variants in the human genome.

Iuliana Ionita-Laza, Christoph Lange, Nan M Laird.

Abstract

The different genetic variation discovery projects (The SNP Consortium, the International HapMap Project, the 1000 Genomes Project, etc.) aim to identify as much as possible of the underlying genetic variation in various human populations. The question we address in this article is how many new variants are yet to be found. This is an instance of the species problem in ecology, where the goal is to estimate the number of species in a closed population. We use a parametric beta-binomial model that allows us to calculate the expected number of new variants with a desired minimum frequency to be discovered in a new dataset of individuals of a specified size. The method can also be used to predict the number of individuals necessary to sequence in order to capture all (or a fraction of) the variation with a specified minimum frequency. We apply the method to three datasets: the ENCODE dataset, the SeattleSNPs dataset, and the National Institute of Environmental Health Sciences SNPs dataset. Consistent with previous descriptions, our results show that the African population is the most diverse in terms of the number of variants expected to exist, the Asian populations the least diverse, with the European population in between. In addition, our results show a clear distinction between the Chinese and the Japanese populations, with the Japanese population being the less diverse. To find all common variants (frequency at least 1%) the number of individuals that need to be sequenced is small (approximately 350) and does not differ much among the different populations; our data show that, subject to sequence accuracy, the 1000 Genomes Project is likely to find most of these common variants and a high proportion of the rarer ones (frequency between 0.1 and 1%).
The data reveal a rule of diminishing returns: a small number of individuals (approximately 150) is sufficient to identify 80% of variants with a frequency of at least 0.1%, while a much larger number (more than 3,000 individuals) is necessary to find all of those variants. Finally, our results also show a much higher diversity in environmental response genes compared with the average genome, especially in African populations.
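
The kind of calculation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not reproduce their fitted results. It assumes that variant frequencies follow a Beta(a, b) density, that a sample of n diploid individuals contributes 2n independently drawn chromosomes, and that a variant is "found" if it appears on at least one of them. The function names and the parameter values a = 0.2 and b = 1.0 are hypothetical, chosen only for illustration.

```python
import math


def detected_fraction(n_individuals, min_freq, a, b, steps=4000):
    """Expected fraction of variants with frequency >= min_freq that are
    seen at least once in 2 * n_individuals chromosomes, assuming the
    variant frequencies follow a Beta(a, b) density (an assumption)."""
    chromosomes = 2 * n_individuals
    # Midpoint rule on a log scale: p = exp(u), so dp = p * du.
    lo = math.log(min_freq)
    du = (0.0 - lo) / steps
    seen = 0.0
    total = 0.0
    for i in range(steps):
        p = math.exp(lo + (i + 0.5) * du)
        # Unnormalised Beta density times dp; the normalising constant
        # cancels in the ratio returned below.
        weight = p ** a * (1.0 - p) ** (b - 1.0) * du
        total += weight
        # Probability that the variant shows up on at least one chromosome.
        seen += weight * (1.0 - (1.0 - p) ** chromosomes)
    return seen / total


def individuals_needed(target, min_freq, a, b, max_n=100_000):
    """Smallest number of individuals whose expected detected fraction
    reaches target. Uses binary search, which is valid because the
    detected fraction never decreases as the sample grows."""
    if detected_fraction(max_n, min_freq, a, b) < target:
        raise ValueError("target not reachable within max_n individuals")
    lo, hi = 1, max_n
    while lo < hi:
        mid = (lo + hi) // 2
        if detected_fraction(mid, min_freq, a, b) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    A, B = 0.2, 1.0  # hypothetical Beta parameters, not fitted to any data
    for n in (150, 350, 3000):
        frac = detected_fraction(n, 0.001, A, B)
        print(f"n={n}: expected fraction of variants (freq >= 0.1%) found = {frac:.3f}")
    print("individuals for 80%:", individuals_needed(0.80, 0.001, A, B))
```

Under these assumptions, every variant with frequency at least f is detected with probability at least 1 - (1 - f)^(2n), whatever the values of a and b, so the detected fraction for common variants approaches 1 quickly as n grows. The numbers the sketch prints depend on the hypothetical parameters and should not be compared directly with the estimates reported in the abstract.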

Year:  2009        PMID: 19276111      PMCID: PMC2664058          DOI: 10.1073/pnas.0807815106

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


