Alkes L Price, Johannah Butler, Nick Patterson, Cristian Capelli, Vincenzo L Pascali, Francesca Scarnicci, Andres Ruiz-Linares, Leif Groop, Angelica A Saetta, Penelope Korkolopoulou, Uri Seligsohn, Alicja Waliszewska, Christine Schirmer, Kristin Ardlie, Alexis Ramos, James Nemesh, Lori Arbeitman, David B Goldstein, David Reich, Joel N Hirschhorn.
Abstract
European Americans are often treated as a homogeneous group, but in fact form a structured population due to historical immigration of diverse source populations. Discerning the ancestry of European Americans genotyped in association studies is important in order to prevent false-positive or false-negative associations due to population stratification and to identify genetic variants whose contribution to disease risk differs across European ancestries. Here, we investigate empirical patterns of population structure in European Americans, analyzing 4,198 samples from four genome-wide association studies to show that components roughly corresponding to northwest European, southeast European, and Ashkenazi Jewish ancestry are the main sources of European American population structure. Building on this insight, we constructed a panel of 300 validated markers that are highly informative for distinguishing these ancestries. We demonstrate that this panel of markers can be used to correct for stratification in association studies that do not generate dense genotype data.
Year: 2007 PMID: 18208327 PMCID: PMC2211542 DOI: 10.1371/journal.pgen.0030236
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. The Top Two Axes of Variation of MS, BD, PD, and IBD Datasets
(A) MS dataset, (B) BD dataset, (C) PD dataset, (D) IBD dataset, (E) IBD dataset with samples labeled according to self-reported ancestry (see Methods): northwest European (IBD-NWreport), southeast European (IBD-SEreport) or Ashkenazi Jewish (IBD-AJreport), with individuals having unknown or mixed European ancestry and not self-reporting as Ashkenazi Jewish (IBD-noreport) not displayed.
Figure 2. The Top Two Axes of Variation of the Combined Dataset (MS, BD, PD, and IBD)
Samples from the IBD dataset are labeled according to self-reported ancestry, as in Figure 1E.
Inferred Ancestry of Individuals in the MS, BD, PD, and IBD Datasets
Values of Genome-Wide Inflation Factor (λ) for Two Comparisons of Genome-Wide Datasets, Correcting along 0, 1, 2, or 10 Eigenvectors Using EIGENSTRAT
Association Statistics between LCT Candidate Marker and Height in 368 European American Samples, before and after Stratification Correction Using Our Panel of 300 Markers
Figure 3. The Top Two Axes of Variation of a Dataset of Diverse European Samples
Results are based on (A) 583 putatively ancestry-informative markers, and (B) 300 validated markers.
Figure 4. The Top Two Axes of Variation of the Height Samples Together with European Samples
Results are based on the 299 markers from our marker panel that are unlinked to the LCT locus. Height samples are labeled according to self-reported grandparental origin: northwest European (Height-NWreport), southeast European (Height-SEreport) or four USA-born grandparents (Height-USAreport).