Tesfaye M Baye, Hua He, Lili Ding, Brad G Kurowski, Xue Zhang, Lisa J Martin.
Abstract
Next-generation sequencing technologies now make it possible to genotype and measure hundreds of thousands of rare genetic variations in individuals across the genome. Characterization of high-density genetic variation facilitates control of population genetic structure on a finer scale before large-scale genotyping in disease genetics studies. Population structure is a well-known, prevalent, and important factor in common variant genetic studies, but its relevance in rare variants is unclear. We perform an extensive population structure analysis using common and rare functional variants from the Genetic Analysis Workshop 17 mini-exome sequence. The analysis based on common functional variants required 388 principal components to account for 90% of the variation in population structure. However, an analysis based on rare variants required 532 significant principal components to account for similar levels of variation. Using rare variants, we detected fine-scale substructure beyond the population structure identified using common functional variants. Our results show that the level of population structure embedded in rare variant data is different from the level embedded in common variant data and that correcting for population structure is only as good as the level one wishes to correct.
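The abstract's central quantity, the number of principal components needed to account for 90% of the variation, can be illustrated with a minimal sketch. The abstract does not say which software or standardization the authors used, so everything here is an assumption for illustration: the function name `n_components_for_variance`, the per-variant standardization, the 0/1/2 minor-allele-count coding, and the simulated genotypes (which are not the Genetic Analysis Workshop 17 data).

```python
import numpy as np


def n_components_for_variance(genotypes, threshold=0.90):
    """Smallest number of principal components whose cumulative share of
    variance reaches `threshold`.

    `genotypes` is an individuals x variants matrix of minor-allele counts
    (0, 1, or 2). This coding and the per-variant standardization are
    assumptions of this sketch, not details taken from the paper.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                 # center each variant
    sd = X.std(axis=0)
    keep = sd > 0                          # drop monomorphic variants
    X = X[:, keep] / sd[keep]              # scale each variant to unit variance

    # Squared singular values are proportional to the variance per component.
    s = np.linalg.svd(X, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)

    # First position where the cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))


# Simulated data only: 100 individuals, 500 variants per set.
rng = np.random.default_rng(0)
common = rng.binomial(2, 0.30, size=(100, 500))    # common variants
rare = rng.binomial(2, 0.005, size=(100, 500))     # rare variants

print("common:", n_components_for_variance(common))
print("rare:  ", n_components_for_variance(rare))
```

Because the simulated individuals carry no real population structure, the printed counts say nothing about the paper's figures of 388 and 532; the sketch only shows how such a count is obtained from a genotype matrix.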
Year: 2011 PMID: 22373300 PMCID: PMC3287920 DOI: 10.1186/1753-6561-5-S9-S8
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. Scatterplot of principal component axis one (PC1) and axis two (PC2) based on (a) common functional variants and (b) rare functional variants. CEPH, European-descended population (U.S. Caucasians).
Figure 2. Inferred genetic ancestry with even clusters from seven populations based on common (upper panel) and rare (lower panel) variants. Each individual is represented by a thin vertical line, which is partitioned into seven colored segments that represent the individual’s estimated ancestry coefficients in the seven clusters. Individuals (separated by solid lines) are represented by bars on the x-axis, and ancestry proportion is given on the y-axis. The proportion of ancestry is illustrated by the amount of different color in each individual. CEPH, European-descended population (U.S. Caucasians).
Figure 3. Predictive accuracy of common versus rare functional variants based on PC1 or PC2.
Figure 4. Scatterplot of the 697 individuals using allele frequency for the common and rare variants. CEPH, European-descended population (U.S. Caucasians).