Minsun Song, Wei Hao, John D Storey.
Abstract
We present a new statistical test of association between a trait and genetic markers, which we theoretically and practically prove to be robust to arbitrarily complex population structure. The statistical test involves a set of parameters that can be directly estimated from large-scale genotyping data, such as those measured in genome-wide association studies (GWAS). We also derive a new set of methodologies, called a 'genotype-conditional association test' (GCAT), shown to provide accurate association tests in populations with complex structures, manifested in both the genetic and non-genetic contributions to the trait. We demonstrate the proposed method on a large simulation study and on the Northern Finland Birth Cohort study. In the Finland study, we identify several new significant loci that other methods do not detect. Our proposed framework provides a substantially different approach to the problem from existing methods, such as the linear mixed-model and principal-component approaches.
Year: 2015 PMID: 25822090 PMCID: PMC4464830 DOI: 10.1038/ng.3244
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 2. Performance of association testing methods. One hundred quantitative-trait GWAS were simulated in each of the Balding-Nichols, HGDP, TGP, PSD (α = 0.1), and Spatial (a = 0.1) simulation scenarios (see Online Methods for definitions of each) to compare the Oracle, GCAT (proposed), LMM-EMMAX, LMM-GEMMA, and PCA testing methods. The variance contributions to the trait are genetic = 5%, non-genetic = 5%, and noise = 90%. For each simulated study, the difference between the observed and expected numbers of false positives is plotted against the expected number of false positives under the null hypothesis of no association (grey lines), together with the average of those differences (black line) and the middle 90% (blue lines). All simulations involved m = 100,000 SNPs, so the range of the x-axis corresponds to choosing a significance threshold of up to p-value ≤ 0.0025. The difference on the y-axis is the number of “spurious associations.” PCA is shown on a separate y-axis because its maximum is usually much larger than that of the other methods. In the Oracle method, the true population structure parameters are supplied to the proposed test (see Results), which we have theoretically proven always corrects for structure (see Supplementary Note).
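The quantity on the y-axis can be illustrated with a short sketch. The code below is not the authors' simulation code; it assumes, for simplicity, that every tested SNP is null, so that the expected number of false positives at threshold t is m × t. The function name `excess_false_positives` and the uniform stand-in p-values are hypothetical.

```python
import bisect
import random


def excess_false_positives(null_pvalues, thresholds):
    """Return (expected, observed - expected) pairs, one per threshold.

    null_pvalues: p-values for SNPs with no true association.
    thresholds: p-value cutoffs t; a null SNP with p <= t is a false positive.
    """
    m = len(null_pvalues)
    sorted_p = sorted(null_pvalues)
    curve = []
    for t in thresholds:
        observed = bisect.bisect_right(sorted_p, t)  # number of p-values <= t
        expected = m * t  # null p-values are Uniform(0, 1) for a valid test
        curve.append((expected, observed - expected))
    return curve


# Hypothetical stand-in for one simulated study: a perfectly calibrated test,
# whose null p-values are uniform. Real input would come from a testing method.
rng = random.Random(0)
m = 100_000
null_pvalues = [rng.random() for _ in range(m)]

# Thresholds up to 0.0025, i.e. up to about 250 expected false positives.
thresholds = [0.0025 * k / 10 for k in range(1, 11)]
for expected, excess in excess_false_positives(null_pvalues, thresholds):
    print(f"expected={expected:7.1f}  excess={excess:+8.1f}")
```

For a well-calibrated method the excess should fluctuate around zero; a method that fails to correct for structure would show a curve that climbs well above it.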
Number of significant loci at genome-wide significance (p-value < 7.2 × 10⁻⁸) for each of the 10 traits from the Northern Finland Birth Cohort data. Each method was followed by a genomic control inflation factor correction (denoted by +GC). The counts for LMM+GC, PCA+GC, and Uncorr+GC were obtained from Table 2 in Kang et al. (2010). Here, LMM refers to EMMAX-LMM.
| Trait | Abbreviation | GCAT+GC | LMM+GC | PCA+GC | Uncorr+GC |
|---|---|---|---|---|---|
| Body Mass Index | BMI | 0 | 0 | 0 | 0 |
| C-reactive Protein | CRP | 2 | 2 | 2 | 2 |
| Diastolic blood pressure | DBP | 0 | 0 | 0 | 0 |
| Glucose | GLU | 3 | 2 | 2 | 2 |
| HDL Cholesterol | HDL | 4 | 4 | 2 | 4 |
| Height | Height | 1 | 0 | 0 | 0 |
| Insulin | INS | 0 | 0 | 0 | 0 |
| LDL Cholesterol | LDL | 4 | 3 | 3 | 3 |
| Systolic blood pressure | SBP | 0 | 0 | 0 | 0 |
| Triglycerides | TG | 2 | 3 | 2 | 2 |
| Total | | 16 | 14 | 11 | 13 |
Result when the Box-Cox transformation was not applied to the CRP trait. The result is 1 when the transformation is applied.
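The "+GC" step in the table can be sketched as follows. This is a minimal illustration of the standard genomic control correction, assuming 1-degree-of-freedom chi-square test statistics; it is not taken from the paper or from Kang et al. (2010), and the helper names and example p-values are hypothetical. Only the threshold 7.2 × 10⁻⁸ comes from the table caption.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2
GENOME_WIDE_THRESHOLD = 7.2e-8  # threshold quoted in the table caption


def pvalue_to_chi2(p):
    """Convert a p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile, stable for tiny p
    return z * z


def chi2_to_pvalue(x):
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))


def genomic_control(pvalues):
    """Return (lambda_gc, adjusted p-values).

    lambda_gc is the median observed statistic divided by the null median;
    every statistic is divided by it. Some analyses apply the correction
    only when lambda_gc > 1; this sketch applies it unconditionally.
    """
    stats = [pvalue_to_chi2(p) for p in pvalues]
    lambda_gc = median(stats) / CHI2_1DF_MEDIAN
    return lambda_gc, [chi2_to_pvalue(s / lambda_gc) for s in stats]


# Hypothetical p-values for illustration only (not data from the study).
example_pvalues = [1e-12, 3e-9, 0.02, 0.2, 0.4, 0.45, 0.6, 0.9]
lambda_gc, adjusted = genomic_control(example_pvalues)
n_significant = sum(p < GENOME_WIDE_THRESHOLD for p in adjusted)
print(f"lambda_GC = {lambda_gc:.3f}, significant after GC = {n_significant}")
```

Counting loci rather than individual SNPs would additionally require grouping nearby significant SNPs, which this sketch does not attempt.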