Robert Makowsky, Qi Yan, Howard W Wiener, Michael Sandel, Brahim Aissani, Hemant K Tiwari, Sadeep Shrestha.
Abstract
Genome-wide association (GWA) studies have become a standard approach for discovering and validating genomic polymorphisms putatively associated with phenotypes of interest. Accounting for population structure in GWA studies is critical to obtain unbiased parameter estimates and to control Type I error. One common approach is to include several principal components derived from the entire autosomal dataset, which capture the population structure signal. However, knowing which components to include is subjective and generally not conclusive. We examined how well phylogenetic signal from mitochondrial DNA (mtDNA) and chromosome Y (chr:Y) markers agrees with principal component data based on autosomal markers, to determine whether mtDNA and chr:Y phylogenetic data can help guide principal component selection. Using HapMap and other original data from individuals of multiple ancestries, we examined the relationships of mtDNA and chr:Y phylogenetic signal with the autosomal PCA using best subset logistic regression. We show that while the two approaches agree at times, this agreement is independent of component order and is not completely represented in the eigenvalues. Additionally, we use simulations to demonstrate that our approach leads to a slightly reduced Type I error rate compared to the standard approach. This approach provides preliminary evidence to support the theoretical concept that mtDNA and chr:Y data can be informative in locating the PCs that are most associated with the evolutionary history of the populations being studied, although the utility of such information will depend on the specific situation.
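The selection step described in the abstract, regressing clade membership on autosomal principal component scores and keeping the best subset, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the data are simulated, the function names are hypothetical, and AIC is assumed as the selection criterion because the abstract does not say which criterion was used.

```python
import itertools
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, y, steps=200, lr=0.5):
    """Fit intercept + slopes by gradient ascent; return the log-likelihood."""
    n, p = len(rows), len(rows[0]) + 1
    beta = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for x, yi in zip(rows, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            resid = yi - sigmoid(z)
            grad[0] += resid
            for j, v in enumerate(x, start=1):
                grad[j] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    loglik = 0.0
    for x, yi in zip(rows, y):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        # log-likelihood term y*z - log(1 + exp(z)), written stably
        loglik += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return loglik


def best_subset(pc_scores, clade, max_size=2):
    """Return the PC index subset (size <= max_size) with the lowest AIC."""
    n_pcs = len(pc_scores[0])
    best, best_aic = None, float("inf")
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(n_pcs), k):
            rows = [[r[j] for j in subset] for r in pc_scores]
            aic = 2 * (k + 1) - 2 * fit_logistic(rows, clade)  # assumed criterion
            if aic < best_aic:
                best, best_aic = subset, aic
    return best, best_aic


# Simulated stand-in data (not from the study): 150 individuals, 5 PC scores,
# and a clade whose membership tracks the third component (index 2) plus noise.
rng = random.Random(0)
pc_scores = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(150)]
clade = [1 if row[2] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in pc_scores]

subset, aic = best_subset(pc_scores, clade)
print("selected PC indices:", subset)
```

In this toy setup the informative component is deliberately not the first one, mirroring the abstract's point that agreement between clade membership and PCs need not follow component order.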
Keywords: PCA; Y chromosome; mitochondria; phylogeny; population sub-structure
Year: 2012 PMID: 23267368 PMCID: PMC3527715 DOI: 10.3389/fgene.2012.00301
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. (A) First three principal component scores based on all autosomal markers, with post hoc groupings. (B) Scree plot depicting eigenvalues for the first 50 components of the analysis shown in (A).
Principal components based on autosomal markers and associated phylogenetic clades of mitochondrial DNA (mtDNA) and Y chromosome (chr:Y) markers.
| Principal component | X1 | X11 | X113 | X2 | X21 | X211 | X2111 | X2112 | X21121 | X21122 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1st | E−157 | E−41 | . | E−157 | E−85 | E−71 | E−7 | E−43 | E−31 | E−22 |
| 2nd | E−17 | . | . | E−17 | E−18 | E−20 | E−14 | . | E−26 | E−53 |
| 3rd | . | . | . | . | . | . | . | . | . | E−4 |
| 4th | . | . | . | . | . | . | E−11 | E−6 | . | E−15 |
| 5th | . | . | . | . | . | . | E−8 | E−3 | . | E−8 |
| 9th | . | E−3 | . | . | . | . | . | . | . | . |
| 11th | . | . | . | . | . | . | E−5 | . | . | E−6 |
| 12th | . | . | . | . | . | . | . | . | . | E−3 |
| 14th | . | E−3 | . | . | . | . | . | . | . | . |
| 18th | . | E−3 | . | . | . | . | . | . | . | . |
| 25th | . | E−5 | . | . | . | . | . | . | . | . |
| 27th | . | E−3 | . | . | . | . | . | . | . | . |
| 29th | . | E−3 | . | . | . | . | . | . | . | . |
| 33rd | E−15 | E−42 | E−4 | E−15 | . | . | . | . | . | E−3 |
| 34th | E−6 | E−21 | . | E−6 | E−19 | E−15 | . | E−9 | . | . |
| 35th | E−6 | E−30 | . | E−6 | . | . | . | . | . | . |
Components selected based on subset selection reported as the corresponding .
Figure 2. Maximum parsimony cladogram of mitochondrial DNA (mtDNA) and Y chromosome (chr:Y) single nucleotide polymorphisms. All nodes have >70% bootstrap proportion. Unique identifiers (n = 19) were assigned to each statistically significant node. Terminal branch tips are colored based on groupings assigned in Figure 1 using the same color coding (Green, Yoruba; Black, African-American; Red, Japanese; Purple, Chinese; Blue, Caucasian).
Figure 3. Example plots depicting how principal component scores are associated with phylogeny: (A) clade 21 with principal components 1, 2, and 34 and (B) clade 21121 with principal components 1 and 2.
Figure 4. Type I error rates and calculated using (A) the η statistic, (B) . For each correction type, a vertical line is drawn at the median value, although this is off the scale for the “No correction” approach in plots (B,C). Additionally, “Our methods” 1 and 2 have the same median value in plots (B,C). In (A), the x-axis scale is chosen to show that all of the correction methods vastly outperform the model without any correction. In (B,C), the x-axis is zoomed in on the three correction methods to better show the gain resulting from our approach.
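The kind of Type I error comparison summarized in Figure 4 can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: it assumes two hypothetical populations that differ in allele frequency and in mean phenotype, with no true SNP effect, and it uses within-population centering as a crude stand-in for adjusting on principal components. All parameter values and function names are illustrative assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rejects(x, y, crit=1.96):
    """Two-sided test of zero correlation (normal approximation, alpha ~ 0.05)."""
    r = pearson_r(x, y)
    t = r * math.sqrt((len(x) - 2) / (1.0 - r * r))
    return abs(t) > crit


def center_within(values, groups):
    """Subtract each group's mean; a crude stand-in for adjusting on structure."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n_reps=300, n_per_pop=100, seed=1):
    """Return (uncorrected, corrected) rejection rates under the null."""
    rng = random.Random(seed)
    freqs = {0: 0.2, 1: 0.8}   # hypothetical allele frequencies per population
    means = {0: 0.0, 1: 1.0}   # hypothetical phenotype means; no SNP effect
    naive = adjusted = 0
    for _ in range(n_reps):
        pop = [0] * n_per_pop + [1] * n_per_pop
        # Genotype = number of minor alleles out of two draws
        geno = [sum(rng.random() < freqs[p] for _ in range(2)) for p in pop]
        pheno = [rng.gauss(means[p], 1.0) for p in pop]
        naive += rejects(geno, pheno)
        adjusted += rejects(center_within(geno, pop), center_within(pheno, pop))
    return naive / n_reps, adjusted / n_reps


naive_rate, adjusted_rate = simulate()
print(f"Type I error, no correction:   {naive_rate:.3f}")
print(f"Type I error, with correction: {adjusted_rate:.3f}")
```

Because the genotype has no effect on the phenotype in this setup, every rejection is a false positive. The uncorrected test rejects far more often than the nominal rate because population membership confounds the association.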