Ming-Huei Chen, Martin G Larson, Yi-Hsiang Hsu, Gina M Peloso, Chao-Yu Guo, Caroline S Fox, Larry D Atwood, Qiong Yang.
Abstract
BACKGROUND: Genome-wide association (GWA) studies that use population-based association approaches may identify spurious associations in the presence of population admixture. In this paper, we propose a novel three-stage approach for GWA studies with family data that is computationally efficient, robust to population admixture, and more powerful than the family-based association test (FBAT).

METHODS: The first stage performs linear regression, ignoring phenotypic correlations among family members. SNPs with a first-stage p-value below a liberal cut-off (e.g. 0.1) are then analyzed in the second stage, which employs a linear mixed effects (LME) model that accounts for within-family correlations. Next, SNPs that reach genome-wide significance (e.g. 10^-6 for the 34,265 genotyped SNPs in this paper) are analyzed in the third stage using FBAT, with correction for multiple testing only over the SNPs that enter the third stage. Simulations are performed to evaluate the type I error and power of the proposed method compared with LME adjusting for 10 principal components (PC) of the genotype data. We also apply the three-stage approach to the GWA analyses of uric acid in the Framingham Heart Study's SNP Health Association Resource (SHARe) project.
Year: 2010 PMID: 20470424 PMCID: PMC2892427 DOI: 10.1186/1471-2156-11-40
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Figure 1. Flow chart of the three-stage approach. The p-value cut-offs are 0.1, 10^-6 and 0.05/n for the first, second and third stages, respectively, where n is the number of SNPs detected by the second-stage LME. When FBAT is not included, the three-stage approach reduces to a two-stage approach.
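The filtering logic of the flow chart can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the LM, LME and FBAT p-values have already been computed by other software, and the function name `three_stage_filter` and the input layout are hypothetical. In practice the LME and FBAT tests would be run only on the SNPs that survive the previous stage, which is where the computational saving comes from.

```python
from typing import Dict, List, Tuple


def three_stage_filter(
    pvalues: Dict[str, Tuple[float, float, float]],
    lm_cutoff: float = 0.1,
    lme_cutoff: float = 1e-6,
    alpha: float = 0.05,
    use_fbat: bool = True,
) -> List[str]:
    """Return the SNP ids that pass every stage.

    `pvalues` maps a SNP id to (LM p-value, LME p-value, FBAT p-value).
    This layout is an assumption made for the sketch.
    """
    # Stage 1: liberal screen with linear regression (family structure ignored).
    stage1 = [snp for snp, p in pvalues.items() if p[0] < lm_cutoff]

    # Stage 2: LME model at the genome-wide significance level.
    stage2 = [snp for snp in stage1 if pvalues[snp][1] < lme_cutoff]

    # Without FBAT the procedure reduces to the two-stage approach.
    if not use_fbat or not stage2:
        return stage2

    # Stage 3: FBAT, corrected only for the n SNPs that reached this stage.
    fbat_cutoff = alpha / len(stage2)
    return [snp for snp in stage2 if pvalues[snp][2] < fbat_cutoff]


# Hypothetical p-values, for illustration only.
demo = {
    "snp_a": (0.5, 1e-9, 1e-9),    # fails stage 1
    "snp_b": (0.01, 1e-8, 0.02),   # passes all three stages
    "snp_c": (0.001, 1e-7, 0.03),  # fails stage 3 (cut-off 0.05/2 = 0.025)
    "snp_d": (0.02, 1e-3, 1e-9),   # fails stage 2
}
print(three_stage_filter(demo))                  # ['snp_b']
print(three_stage_filter(demo, use_fbat=False))  # ['snp_b', 'snp_c']
```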
Type I error estimate at alpha = 0.05 with 10,000 replicates for quantitative phenotype and single SNP genotype data with MAF = 0.1.
| Phenotype Distribution | Polygenic Variance | LM | LME | FBAT | LM-LME 1 | LM-LME 2 |
|---|---|---|---|---|---|---|
| Normal | 0.3 | 0.084 | 0.051 | 0.050 | 0.049 | 0.051 |
| Normal | 0.6 | 0.115 | 0.052 | 0.049 | 0.046 | 0.050 |
| Abs Normal | 0.3 | 0.055 | 0.053 | 0.049 | 0.053 | 0.053 |
| Abs Normal | 0.6 | 0.064 | 0.050 | 0.049 | 0.050 | 0.050 |
| Chi-square(1) | 0.3 | 0.076 | 0.052 | 0.050 | 0.052 | 0.052 |
| Chi-square(1) | 0.6 | 0.104 | 0.049 | 0.051 | 0.045 | 0.047 |
| Lognormal | 0.3 | 0.059 | 0.054 | 0.046 | 0.054 | 0.054 |
| Lognormal | 0.6 | 0.066 | 0.049 | 0.047 | 0.049 | 0.049 |
The SNP explains 0% phenotype variation.
Normal: marginal phenotype follows the standard normal distribution
Abs Normal: marginal phenotype follows the absolute standard normal distribution
Chi-square(1): marginal phenotype follows a Chi-square distribution with 1 degree of freedom
Lognormal: marginal phenotype follows a Lognormal distribution
LM-LME 1: two-stage approach with LM (p-value cut-off 0.1) at first stage
LM-LME 2: two-stage approach with LM (p-value cut-off 0.2) at first stage
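To judge whether an estimated type I error differs from the nominal level, it helps to know the Monte Carlo sampling error of the estimate. The sketch below is not part of the paper; it assumes independent replicates and uses a normal approximation to the binomial, and the function name `type1_error_interval` is hypothetical.

```python
import math
from typing import Tuple


def type1_error_interval(
    alpha: float, n_replicates: int, z: float = 1.96
) -> Tuple[float, float]:
    """Approximate 95% range for an empirical rejection rate when the
    true type I error equals `alpha` (normal approximation, an assumption)."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_replicates)
    return alpha - z * se, alpha + z * se


low, high = type1_error_interval(0.05, 10_000)
print(f"{low:.4f} to {high:.4f}")  # 0.0457 to 0.0543
```

Under these assumptions, every LM entry in the table above exceeds the upper bound, while all LME and FBAT entries fall inside the range.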
Type I error estimate at alpha = 0.05 with 10,000 replicates of phenotype and a single SNP in linkage equilibrium (LE) with a QTL explaining 10% phenotype variation.
| MAF | Polygenic Variance | LM | LME | FBAT | LM-LME 1 | LM-LME 2 |
|---|---|---|---|---|---|---|
| 0.005 | 0.3 | 0.087 | 0.046 | 0.052 | 0.044 | 0.045 |
| 0.005 | 0.6 | 0.118 | 0.053 | 0.047 | 0.046 | 0.050 |
| 0.01 | 0.3 | 0.089 | 0.050 | 0.049 | 0.048 | 0.050 |
| 0.01 | 0.6 | 0.122 | 0.049 | 0.048 | 0.044 | 0.048 |
| 0.05 | 0.3 | 0.087 | 0.048 | 0.052 | 0.047 | 0.048 |
| 0.05 | 0.6 | 0.118 | 0.050 | 0.050 | 0.043 | 0.047 |
| 0.1 | 0.3 | 0.082 | 0.044 | 0.053 | 0.041 | 0.043 |
| 0.1 | 0.6 | 0.122 | 0.049 | 0.054 | 0.043 | 0.047 |
The SNP explains 0% phenotype variation.
A QTL is simulated to explain 10% phenotype variation and in LE with the SNP.
Marginal phenotype distribution follows the standard normal distribution.
LM-LME 1: two-stage approach with LM (p-value cut-off 0.1) at first stage
LM-LME 2: two-stage approach with LM (p-value cut-off 0.2) at first stage
Correlation coefficient between FBAT and LME statistics based on 10,000 replicates of no SNP association with a continuous phenotype (marginal phenotype distribution follows the standard normal distribution).
| MAF | Polygenic Variance | Correlation of FBAT and LME statistics at LME p-value level | | | | | |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.3 | 0.89 | 0.82 | 0.77 | 0.68 | 0.61 | 0.35 |
| 0.01 | 0.6 | 0.91 | 0.83 | 0.79 | 0.69 | 0.63 | 0.36 |
| 0.05 | 0.3 | 0.87 | 0.81 | 0.75 | 0.71 | 0.61 | 0.34 |
| 0.05 | 0.6 | 0.89 | 0.84 | 0.76 | 0.72 | 0.64 | 0.36 |
| 0.1 | 0.3 | 0.90 | 0.81 | 0.77 | 0.70 | 0.60 | 0.34 |
| 0.1 | 0.6 | 0.90 | 0.83 | 0.79 | 0.73 | 0.63 | 0.36 |
Type I error estimate at alpha = 1e-6 using 100 replicates of phenotype without QTL effect and real 550K genotype data of 34,265 SNPs (HWE p-value > 1e-6, call rate > 95%, MAF > 0.01) on chromosome 1.
| MAF | Polygenic Variance | LM | LME | FBAT | LM-LME 1 | Three-stage |
|---|---|---|---|---|---|---|
| 0.1 | 0.3 | 0.22 | 0.01 | 0.02 | 0.01 | 0.01 |
| 0.1 | 0.6 | 0.78 | 0.01 | 0.02 | 0.01 | 0 |
Marginal phenotype distribution follows the standard normal distribution.
LM-LME 1: two-stage approach with LM at first stage with p-value cut-off 0.1 and LME at second stage with p-value cut-off 1e-6.
Three-stage: LM at first stage with p-value cut-off 0.1, LME at second stage with p-value cut-off 1e-6 and FBAT at third stage with p-value cut-off 0.05/n, where n is the number of SNPs entering the third stage.
Power estimate at alpha = 1e-6 using 100 replicates of phenotype with 1% QTL (rs1570092 with MAF 0.31) effect and 30% of polygenic effect, and real 550K genotype data of 34,265 SNPs (HWE p-value > 1e-6, call rate > 95%, MAF > 0.01) on chromosome 1.
| LD group | # SNPs in LD group | LM | LME | FBAT | LM-LME 1 | Three-stage |
|---|---|---|---|---|---|---|
| 0 < | 34025 | 0.20 | 0 | 0.01 | 0 | 0 |
| 0.01 < | 209 | 0.02 | 0.01 | 0 | 0.01 | 0 |
| 0.1 < | 6 | 0.70 | 0.54 | 0 | 0.54 | 0.50 |
| 0.3 < | 15 | 1 | 1 | 0.39 | 1 | 0.99 |
| 0.8 < | 10 | 1 | 1 | 0.51 | 1 | 0.99 |
rs1570092 explains 1% phenotype variation.
Marginal phenotype distribution follows a normal distribution with variance 1. The additive genetic effect is sqrt(0.01/2/0.31/0.69) = 0.153.
LM-LME 1: two-stage approach with LM at first stage with p-value cut-off 0.1 and LME at second stage with p-value cut-off 1e-6.
Three-stage: LM at first stage with p-value cut-off 0.1, LME at second stage with p-value cut-off 1e-6 and FBAT at third stage with p-value cut-off 0.05/n, where n is the number of SNPs entering the third stage.
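The additive genetic effect quoted in the footnote follows from the variance a biallelic locus contributes under an additive model. The sketch below assumes Hardy-Weinberg equilibrium and a total phenotypic variance of 1, so that a SNP with minor allele frequency p and per-allele effect a explains 2p(1-p)a^2 of the variance; the function name `additive_effect` is hypothetical.

```python
import math


def additive_effect(variance_explained: float, maf: float) -> float:
    """Per-allele effect a solving 2 * p * (1 - p) * a**2 = variance explained.

    Assumes an additive model, Hardy-Weinberg equilibrium and unit
    phenotypic variance.
    """
    return math.sqrt(variance_explained / (2.0 * maf * (1.0 - maf)))


# QTL rs1570092: 1% of phenotype variation, MAF 0.31.
print(round(additive_effect(0.01, 0.31), 3))  # 0.153
```

The same function reproduces the value given for the admixture simulation, where a QTL with allele frequency 0.3 and variance 0.005 has an effect of about 0.109.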
Type I error and power estimates in the presence of population admixture at alpha = 1e-6 using 100 replicates.
| Scenario | Phenotype data | LM | LME | FBAT | LM-LME 1 | Three-stage |
|---|---|---|---|---|---|---|
| Type I error (r2 = 0) | original phenotypes | 0.63 | 0.08 | 0.02 | 0.08 | 0.02 |
| Type I error (r2 = 0) | residuals (1) | 0.59 | 0.07 | - | 0.07 | - |
| Type I error (r2 = 0) | residuals (2) | 0.13 | 0.01 | - | 0.01 | - |
| Power (r2 = 0.5) | original phenotypes | 0.43 | 0.39 | 0 | 0.39 | 0.31 |
| Power (r2 = 0.5) | residuals (1) | 0.45 | 0.37 | - | 0.37 | - |
| Power (r2 = 0.5) | residuals (2) | 0.17 | 0.14 | - | 0.14 | - |
| Power (r2 = 0.8) | original phenotypes | 0.66 | 0.63 | 0.05 | 0.63 | 0.56 |
| Power (r2 = 0.8) | residuals (1) | 0.66 | 0.63 | - | 0.63 | - |
| Power (r2 = 0.8) | residuals (2) | 0.57 | 0.47 | - | 0.47 | - |
| Power (QTL) | original phenotypes | 0.81 | 0.85 | 0.11 | 0.85 | 0.83 |
| Power (QTL) | residuals (1) | 0.82 | 0.84 | - | 0.84 | - |
| Power (QTL) | residuals (2) | 0.81 | 0.67 | - | 0.67 | - |
Marginal phenotype distribution follows a normal distribution with variance 1. The additive genetic effect in assessing power is sqrt(0.005/2/0.3/0.7) = 0.109.
LM-LME 1: two-stage approach with LM at first stage with p-value cut-off 0.1 and LME at second stage with p-value cut-off 1e-6.
Three-stage: LM at first stage with p-value cut-off 0.1, LME at second stage with p-value cut-off 1e-6 and FBAT at third stage with p-value cut-off 0.05/n, where n is the number of SNPs entering the third stage.
Three phenotype data sets are analyzed: the original simulated phenotypes, and residuals adjusted for 10 PCs obtained either (1) from all 34,265 SNPs on chromosome 1 together with the 100 admixture SNPs, or (2) from the 100 admixture SNPs only. The 100 simulated admixture SNPs have expected allele frequency 0.3, F = 0.025, and offset value δ = 0.25; the QTL variance is 0.005 and the polygenic variance is 0.3. When estimating power, the 34,265 SNPs are not used.
Top SNP in each gene identified from the three-stage analyses of uric acid levels in the FHS SHARe project.
| SNP | Chr | Position | Gene | MAF | LM pval | LME pval | FBAT pval | Direction | Three-stage pval | Three-stage pval for top SNPs |
|---|---|---|---|---|---|---|---|---|---|---|
| rs1165205 | 6 | 25978521 | SLC17A3 | 0.46 | 3.2E-11 | 5.6E-10 | 7.1E-03 | --- | 1.0E+00 | 2.1E-02 |
| rs2231142 | 4 | 89271347 | ABCG2 | 0.11 | 2.4E-23 | 9.0E-20 | 5.6E-11 | +++ | 8.3E-09 | 1.7E-10 |
| rs16890979 | 4 | 9531265 | SLC2A9 | 0.23 | 3.4E-88 | 1.6E-76 | 8.3E-23 | --- | 1.2E-20 | 2.5E-22 |
1) Direction: the three signs are the direction of changes in multivariable adjusted uric acid per minor allele in the LM, LME and FBAT analyses, respectively.
2) Three-stage pval: Three-stage p-values adjusting for the multiple testing of all SNPs with p-value < 5 × 10^-8 in the second stage.
3) Three-stage pval for top SNPs: Three-stage p-values adjusting for the multiple testing of only the top SNP in each gene among all SNPs with p-value < 5 × 10^-8 in the second stage.
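The relationship between the FBAT and three-stage p-values in the table can be checked with a short sketch. It assumes the adjustment is a Bonferroni multiplication by the number of SNPs tested in the third stage, capped at 1, which matches the 0.05/n cut-off described for the third stage but is an inference here rather than a statement from the source. The function name `bonferroni_adjust` is hypothetical.

```python
def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1 (assumed adjustment)."""
    return min(1.0, p_value * n_tests)


# FBAT p-values of the top SNP in each of the three genes, from the table.
fbat_pvals = {
    "rs1165205": 7.1e-03,
    "rs2231142": 5.6e-11,
    "rs16890979": 8.3e-23,
}

# Adjusting only for the three top SNPs (n = 3).
for snp, p in fbat_pvals.items():
    print(snp, f"{bonferroni_adjust(p, len(fbat_pvals)):.1E}")
# rs1165205 2.1E-02
# rs2231142 1.7E-10
# rs16890979 2.5E-22
```

With n = 3 the output agrees with the "Three-stage pval for top SNPs" column. The "Three-stage pval" column would use the larger number of SNPs that passed the second stage, which is not stated in this excerpt.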