Timothy D O'Connor, Wenqing Fu, Josyf C Mychaleckyj, Benjamin Logsdon, Paul Auer, Christopher S Carlson, Suzanne M Leal, Joshua D Smith, Mark J Rieder, Michael J Bamshad, Deborah A Nickerson, Joshua M Akey.
Abstract
Understanding the genetic structure of human populations has important implications for the design and interpretation of disease mapping studies and reconstructing human evolutionary history. To date, inferences of human population structure have primarily been made with common variants. However, recent large-scale resequencing studies have shown an abundance of rare variation in humans, which may be particularly useful for making inferences of fine-scale population structure. To this end, we used an information theory framework and extensive coalescent simulations to rigorously quantify the informativeness of rare and common variation to detect signatures of fine-scale population structure. We show that rare variation affords unique insights into patterns of recent population structure. Furthermore, to empirically assess our theoretical findings, we analyzed high-coverage exome sequences in 6,515 European and African American individuals. As predicted, rare variants are more informative than common polymorphisms in revealing a distinct cluster of European-American individuals, and subsequent analyses demonstrate that these individuals are likely of Ashkenazi Jewish ancestry. Our results provide new insights into the population structure using rare variation, which will be an important factor to account for in rare variant association studies.
Entities:
Keywords: exome sequencing; information theory; recent demography
Year: 2014 PMID: 25415970 PMCID: PMC4327153 DOI: 10.1093/molbev/msu326
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Relative informativeness of rare and common variants. (a) Summary of the demographic model used. NA, NB, and NF denote the ancestral population size, the bottleneck population size, and the final population size, respectively. Ts and Te indicate the time to population splitting and bottleneck. Population expansion begins immediately when the bottleneck ends. (b) Ancestry proportions estimated by FRAPPE as a function of the time to population splitting. (c) The expected IG for common (blue) and rare (red) variants as a function of the time to population splitting. Black lines denote the ratio of rare to common IG. (d) Inset of panel (c) for the time to population splitting in the range of 0–20 kya.
Fig. 2. PCA of common and rare variation. (a) PCA results for the first two principal components of the 6,515 ESP individuals using common variants (MAF ≥ 10%). AA are in blue and EA are in red. (b) PCA results for the same individuals using rare variants (MAF ≤ 0.5%).
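The common/rare split described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes genotypes are coded as 0/1/2 alternate-allele counts in a NumPy array, and the function names (`minor_allele_freq`, `split_by_maf`, `pca_scores`) are hypothetical. Only the two MAF thresholds (≥ 10% and ≤ 0.5%) come from the caption; excluding monomorphic sites from the rare set is an added assumption.

```python
import numpy as np


def minor_allele_freq(genotypes):
    """Per-variant minor allele frequency.

    genotypes: (n_individuals, n_variants) array of 0/1/2 alt-allele counts
    (an assumed encoding, not stated in the source).
    """
    alt_freq = genotypes.mean(axis=0) / 2.0
    return np.minimum(alt_freq, 1.0 - alt_freq)


def split_by_maf(genotypes, common_min=0.10, rare_max=0.005):
    """Return (common, rare) genotype submatrices using the caption's thresholds."""
    maf = minor_allele_freq(genotypes)
    common = genotypes[:, maf >= common_min]
    # Assumption: monomorphic sites (MAF == 0) carry no signal and are dropped.
    rare = genotypes[:, (maf > 0) & (maf <= rare_max)]
    return common, rare


def pca_scores(genotypes, n_components=2):
    """Principal component scores via SVD of the column-centered matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Running `pca_scores` separately on the two submatrices gives one pair of leading components per frequency class, which is the comparison panels (a) and (b) display.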
Fig. 3. PCA and Procrustes analysis of the combined SNP data. (a) Procrustes projection (blue) using the longitude/latitude values (black) and the PCA values from the HGDP European samples (Ady, Adygei, Russia Caucasus; Fre, French, France; Nor, North Italian, Italy; Orc, Orcadian, Orkney Islands; Rus, Russian, Russia; Sar, Sardinian, Italy; and Tus, Tuscan, Italy). The predicted position of the Cluster 1 samples is shown in red. (b) Global PCA of the 1,337 individuals from 83 populations (including Cluster 1 representatives) labeled by major geographical or ethnic group. The inset highlights the position of Cluster 1 in red.
Fig. 4. Cluster 1 individuals are closely related to individuals of Ashkenazi Jewish ancestry. (a) The global FRAPPE analysis with K = 8, focused on European and Middle Eastern populations. The remaining populations are in supplementary figure S8, Supplementary Material online. (b) Neighbor-joining tree based on Nei's genetic distance. Green branches denote the main group of Jewish populations and Cluster 1 is highlighted by the red circle. Note that Cluster 1 groups first with Ashkenazi and then with Sephardic Jewish populations.
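The caption for Fig. 4(b) refers to Nei's genetic distance without saying which variant was used. As a hedged sketch, the code below assumes Nei's standard distance, D = -ln(J_XY / sqrt(J_X J_Y)), for biallelic loci, with each population summarized by its alternate-allele frequency per locus. The function name `nei_standard_distance` and the input layout are hypothetical.

```python
import numpy as np


def nei_standard_distance(p_x, p_y):
    """Nei's standard genetic distance between two populations.

    p_x, p_y: per-locus alternate-allele frequencies at the same biallelic
    loci (an assumed input format; the source does not describe one).
    """
    p_x = np.asarray(p_x, dtype=float)
    p_y = np.asarray(p_y, dtype=float)
    # Average homozygosity within each population.
    j_x = np.mean(p_x ** 2 + (1.0 - p_x) ** 2)
    j_y = np.mean(p_y ** 2 + (1.0 - p_y) ** 2)
    # Average probability that one allele drawn from each population matches.
    j_xy = np.mean(p_x * p_y + (1.0 - p_x) * (1.0 - p_y))
    return -np.log(j_xy / np.sqrt(j_x * j_y))
```

A matrix of pairwise distances computed this way is the kind of input a neighbor-joining routine takes to build a tree like the one in panel (b).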