Abstract
Combining data collected under different study designs, such as family trios and unrelated case-control samples, yields more power and is more cost-effective than analyzing each data set separately. However, population stratification (PS) among unrelated case-control samples is a potential concern, and analyses that integrate data should address this confounding effect. In this paper, we develop a simpler method, the haplotype generalized linear model (HGLM), that tests and estimates haplotype effects on disease risk and can be modified to account for PS when combining data. We propose to combine information by aggregating haplotype weighted-counts estimated separately from population case-control data and trio data, and then to perform a GLM analysis. Furthermore, we present an analysis-of-variance framework based on haplotype weighted-counts for assessing whether it is appropriate to combine the two data sources, as well as a modified HGLM with clustering methods for addressing PS. We evaluate the statistical properties in terms of accuracy, false positive rate (FPR) and empirical power using simulated data under various disease risks, sample sizes, multi-SNP haplotypes and the presence of PS. Our simulation results indicate that HGLM performs comparably to likelihood-based haplotype association analysis, particularly when the haplotype effects are moderate, but may not perform well with long haplotypes in small samples. In the presence of PS, the modified HGLM remains valid, maintaining the nominal level with small bias. Overall, HGLM appears to be successful in combining data and is simple to implement in standard statistical software.
Keywords: case-parent trios; combined association analysis; haplotype specific association test; haplotype weighted-count; population stratification
Year: 2014 PMID: 24860592 PMCID: PMC4028876 DOI: 10.3389/fgene.2014.00103
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
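The combination step described in the abstract can be illustrated with a minimal sketch. It assumes that trios contribute transmitted haplotypes as case counts and non-transmitted haplotypes as pseudo-control counts; that convention, the function names, and all counts below are illustrative assumptions, not details given in this record. For a logistic model with haplotype as the only categorical predictor, the fitted odds ratio against the reference equals the cross-product ratio of the aggregated counts, so no model-fitting library is needed here.

```python
import math


def combine(*sources):
    """Sum haplotype weighted-counts across data sources, haplotype by haplotype."""
    total = {}
    for counts in sources:
        for hap, value in counts.items():
            total[hap] = total.get(hap, 0.0) + value
    return total


def haplotype_odds_ratios(case_counts, control_counts, reference="1-1"):
    """Return {haplotype: (odds_ratio, se_of_log_or)} against the reference haplotype.

    The standard error is the usual Wald form for a 2x2 table. It treats the
    counts as independent, so with weighted counts it is only approximate.
    """
    a_ref = case_counts[reference]
    b_ref = control_counts[reference]
    results = {}
    for hap in case_counts:
        if hap == reference:
            continue
        a, b = case_counts[hap], control_counts[hap]
        odds_ratio = (a * b_ref) / (b * a_ref)
        se_log_or = math.sqrt(1 / a + 1 / b + 1 / a_ref + 1 / b_ref)
        results[hap] = (odds_ratio, se_log_or)
    return results


# Hypothetical weighted counts, made up for illustration only.
pop_cases = {"1-1": 80.0, "1-2": 30.0, "2-1": 60.0, "2-2": 30.0}
pop_controls = {"1-1": 100.0, "1-2": 30.0, "2-1": 45.0, "2-2": 25.0}
trio_transmitted = {"1-1": 40.0, "1-2": 20.0, "2-1": 25.0, "2-2": 15.0}
trio_untransmitted = {"1-1": 50.0, "1-2": 20.0, "2-1": 15.0, "2-2": 15.0}

cases = combine(pop_cases, trio_transmitted)
controls = combine(pop_controls, trio_untransmitted)
estimates = haplotype_odds_ratios(cases, controls)

for hap, (odds_ratio, se) in sorted(estimates.items()):
    print(f"{hap}: HOR = {odds_ratio:.3f}, SE(log HOR) = {se:.3f}")
```

In practice the weighted counts would come from a haplotype-phasing step, and a full GLM would allow covariates and an overall test; this sketch only shows how the two data sources enter one table.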
Figure 1. Schematic diagram of HGLM for combining data from unrelated case-control samples and family trios.
Figure 2. Schematic diagram of M-HGLM for combining data from clustered case-control samples and family trios when PS is present. HC1 and HC2 denote the haplotype weighted-counts obtained from clusters 1 and 2, respectively, when clustering methods are applied to mixed-population case-control genotype data.
False positive rate under all haplotype odds ratios (HOR) equal to 1.
| Power of the overall test | 0.055 | 0.057 | 0.056 | 0.055 | 0.055 | 0.056 | 0.053 | |||
| 1-2 | 1 | FPR | 0.044 | 0.047 | 0.050 | 0.049 | 0.047 | 0.043 | 0.036 | |
| Bias | 0.104 | 0.096 | 0.075 | 0.077 | 0.084 | 0.036 | 0.018 | |||
| MSE | 0.311 | 0.300 | 0.195 | 0.201 | 0.228 | 0.089 | 0.062 | |||
| 2-1 | 1 | FPR | 0.048 | 0.054 | 0.050 | 0.048 | 0.051 | 0.064 | 0.045 | |
| Bias | 0.047 | 0.045 | 0.028 | 0.026 | 0.030 | 0.019 | 0.009 | |||
| MSE | 0.081 | 0.078 | 0.067 | 0.065 | 0.073 | 0.035 | 0.025 | |||
| 2-2 | 1 | FPR | 0.045 | 0.043 | 0.050 | 0.045 | 0.043 | 0.050 | 0.041 | |
| Bias | 0.075 | 0.080 | 0.077 | 0.080 | 0.081 | 0.035 | 0.029 | |||
| MSE | 0.224 | 0.235 | 0.183 | 0.184 | 0.192 | 0.078 | 0.057 | |||
| Power of the overall test | 0.045 | 0.044 | 0.049 | 0.048 | 0.047 | 0.047 | 0.035 | |||
| 1-2 | 1 | FPR | 0.043 | 0.046 | 0.034 | 0.035 | 0.032 | 0.042 | 0.037 | |
| Bias | 0.006 | 0.006 | 0.018 | 0.019 | 0.019 | 0.006 | 0.005 | |||
| MSE | 0.017 | 0.017 | 0.025 | 0.026 | 0.026 | 0.010 | 0.008 | |||
| 2-1 | 1 | FPR | 0.050 | 0.051 | 0.047 | 0.047 | 0.048 | 0.044 | 0.036 | |
| Bias | 0.001 | 0.001 | 0.008 | 0.008 | 0.008 | 0.001 | 0.001 | |||
| MSE | 0.006 | 0.006 | 0.011 | 0.011 | 0.012 | 0.004 | 0.003 | |||
| 2-2 | 1 | FPR | 0.050 | 0.048 | 0.043 | 0.043 | 0.041 | 0.049 | 0.035 | |
| Bias | −0.001 | −0.001 | 0.009 | 0.009 | 0.010 | −0.002 | −0.003 | |||
| MSE | 0.014 | 0.014 | 0.026 | 0.027 | 0.027 | 0.009 | 0.007 | |||
The reference haplotype is 1-1.
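The FPR, bias and MSE rows above are summaries over simulation replicates. The following is a minimal sketch of how such summaries are conventionally computed, taking bias as estimate minus true HOR (as in the boxplot captions) and MSE as the mean squared deviation from the true value. The function name and the replicate values are hypothetical; they are not taken from the study.

```python
from statistics import mean


def summarize_replicates(estimates, p_values, true_hor, alpha=0.05):
    """Summarize simulation replicates for one haplotype.

    rejection_rate is the FPR when true_hor == 1 and the empirical power otherwise.
    """
    rejection_rate = mean(1.0 if p < alpha else 0.0 for p in p_values)
    bias = mean(estimates) - true_hor
    mse = mean((e - true_hor) ** 2 for e in estimates)
    return {"rejection_rate": rejection_rate, "bias": bias, "mse": mse}


# Hypothetical replicate results under the null (true HOR = 1).
hor_estimates = [0.9, 1.1, 1.2, 0.8]
p_values = [0.40, 0.03, 0.20, 0.60]

summary = summarize_replicates(hor_estimates, p_values, true_hor=1.0)
print(summary)
```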
FPR and Power (%) at 5% significance level for (1) 2-locus mild effect model and (2) moderate effect model under varying sample size.
| Mild | Overall | 25.6 | 26.2 | 27.2 | 27.1 | 27.1 | 50.2 | 38.6 | |||
| Effect | 1-2 | 1.207 | Haplotype | 6.6 | 7.1 | 7.3 | 7.4 | 6.7 | 9.1 | 9.9 | |
| Model | 2-1 | 1.421 | Haplotype | 27.3 | 28.1 | 27.4 | 27.8 | 27.6 | 50.8 | 41.5 | |
| 2-2 | 1.525 | Haplotype | 18.4 | 18.1 | 24.3 | 23.4 | 23.1 | 38.1 | 32.2 | ||
| Overall | 100 | 100 | 92.1 | 92.2 | 92.3 | 100 | 100 | ||||
| 1-2 | 1.207 | Haplotype | 27.7 | 27.2 | 22.0 | 21.6 | 22.5 | 42.8 | 38.2 | ||
| 2-1 | 1.421 | Haplotype | 99.4 | 99.3 | 90.4 | 90.2 | 90.4 | 99.8 | 99.8 | ||
| 2-2 | 1.525 | Haplotype | 95.3 | 95.4 | 79.4 | 78.4 | 78.4 | 99.5 | 98.9 | ||
| Moderate | Overall | 87.4 | 87.6 | 86.4 | 86.0 | 87.0 | 99.3 | 97.7 | |||
| Effect | 1-2 | 1 | Haplotype | 3.9 | 3.8 | 5.3 | 4.6 | 4.4 | 5.2 | 4.8 | |
| Model | 2-1 | 2.067 | Haplotype | 82.2 | 83.0 | 87.8 | 88.0 | 88.2 | 99.2 | 96.8 | |
| 2-2 | 2.067 | Haplotype | 46.5 | 47.7 | 54.7 | 54.5 | 53.6 | 84.1 | 75.7 | ||
| Overall | 100 | 100 | 100 | 100 | 100 | 100 | 100 | ||||
| 1-2 | 1 | Haplotype | 4.1 | 4.2 | 5.4 | 5.5 | 5.4 | 5.5 | 5.6 | ||
| 2-1 | 2.067 | Haplotype | 100 | 100 | 100 | 100 | 100 | 100 | 100 | ||
| 2-2 | 2.067 | Haplotype | 100 | 100 | 99.8 | 99.8 | 99.8 | 100 | 100 | ||
The power under true HOR = 1 represents the FPR.
The reference haplotype is 1-1.
Figure 3. Boxplots of estimation bias (estimate - true HOR) for haplotype odds ratios (HOR) under the mild effect model over 1000 replications. (A–C) HOR = 1.207, 1.421, 1.525 for n1 = n2 = n3 = 100; (D–F) HOR = 1.207, 1.421, 1.525 for n1 = 500, n2 = n3 = 1000.
Figure 4. Boxplots of estimation bias (estimate - true HOR) for haplotype odds ratios (HOR) under the moderate effect model over 1000 replications. (A–C) HOR = 1, 2.067, 2.067 for n1 = n2 = n3 = 100; (D–F) HOR = 1, 2.067, 2.067 for n1 = 500, n2 = n3 = 1000.
Power (%) at 5% significance level and bias for the 5-locus model under varying sample size. The reference haplotype is 1-2-1-2-2.
| Overall power | 98.1 | 98.0 | 98.2 | 98.5 | 98.3 | 100 | 100 | |||
| 1-1-1-1-1 | 1.467 | Power | 9.5 | 9.7 | 8.8 | 9.9 | 8.0 | 14.4 | 14.0 | |
| Bias | 0.502 | 0.384 | 0.310 | 0.350 | 0.488 | 0.157 | 0.057 | |||
| 1-1-2-1-1 | 3.811 | Power | 97.2 | 97.7 | 97.0 | 97.9 | 97.5 | 100 | 99.5 | |
| Bias | 0.355 | 0.215 | 0.493 | 0.571 | 1.311 | 0.160 | −0.904 | |||
| 1-2-2-2-2 | 1.528 | Power | 17.7 | 19.0 | 20.2 | 19.4 | 17.7 | 33.5 | 30.4 | |
| Bias | 0.178 | 0.149 | 0.149 | 0.158 | 0.252 | 0.075 | −0.031 | |||
| 2-2-2-2-1 | 1.309 | Power | 9.8 | 10.0 | 11.2 | 10.3 | 10.6 | 16.6 | 14.7 | |
| Bias | 0.120 | 0.096 | 0.101 | 0.126 | 0.176 | 0.045 | −0.003 | |||
| 2-2-2-2-2 | 1.501 | Power | 14.3 | 14.3 | 12.0 | 12.4 | 12.1 | 22.0 | 20.5 | |
| Bias | 0.364 | 0.311 | 0.255 | 0.284 | 0.418 | 0.137 | 0.028 | |||
| Overall power | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |||
| 1-1-1-1-1 | 1.467 | Power | 47.9 | 49.4 | 29.8 | 32.4 | 30.4 | 68.9 | 68.1 | |
| Bias | 0.052 | 0.042 | 0.043 | 0.060 | 0.081 | 0.026 | −0.032 | |||
| 1-1-2-1-1 | 3.811 | Power | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| Bias | −0.049 | −0.067 | 0.097 | 0.138 | 0.457 | −0.031 | −0.940 | |||
| 1-2-2-2-2 | 1.528 | Power | 92.3 | 93.4 | 75.3 | 74.4 | 74.0 | 99.4 | 98.2 | |
| Bias | 0.005 | −0.001 | 0.029 | 0.023 | 0.064 | 0.003 | −0.080 | |||
| 2-2-2-2-1 | 1.309 | Power | 64.4 | 65.5 | 38.6 | 38.1 | 37.6 | 83.3 | 78.2 | |
| Bias | 0.008 | 0.001 | 0.014 | 0.015 | 0.035 | 0.002 | −0.041 | |||
| 2-2-2-2-2 | 1.501 | Power | 71.7 | 73.8 | 44.4 | 44.9 | 43.6 | 89.2 | 86.9 | |
| Bias | 0.025 | 0.019 | 0.043 | 0.051 | 0.092 | 0.016 | −0.060 | |||
Power of the tests for checking the appropriateness of combining samples in the presence of population stratification (PS).
| overall | 0.958 | 0.705 | 0.130 | 0.058 | |||
| 1-2 | 1 | haplotype | 0.865 | 0.595 | 0.082 | 0.057 | |
| 2-1 | 1 | haplotype | 0.842 | 0.250 | 0.087 | 0.051 | |
| 2-2 | 1 | haplotype | 0.325 | 0.257 | 0.105 | 0.043 | |
| overall | 1 | 1 | 0.208 | 0.086 | |||
| 1-2 | 1 | haplotype | 1 | 1 | 0.097 | 0.067 | |
| 2-1 | 1 | haplotype | 1 | 0.956 | 0.102 | 0.072 | |
| 2-2 | 1 | haplotype | 0.932 | 0.939 | 0.157 | 0.059 | |
PS1: Population cases and controls are randomly ascertained from an admixed population consisting of two strata with disease prevalence (7%, 18%) and HFs for Haplotype 1-1, 1-2, 2-1, and 2-2 at (0.5, 0.1, 0.3, 0.1) and (0.4, 0.3, 0.15, 0.15), respectively. Trios are sampled from one ancestral population with HFs at (0.4, 0.3, 0.15, 0.15). PS2: Population cases, controls and trios are randomly ascertained from the same admixed population as PS1.
Represents the power of MANOVA.
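Why the PS1 setting distorts a pooled case-control analysis can be seen from a short calculation. The sketch below uses the prevalences and haplotype frequencies from the footnote above, and additionally assumes that the two strata contribute equally to the admixed population and that haplotypes are unrelated to disease within each stratum. The equal mixing share is an assumption of this sketch, so the resulting numbers are not meant to reproduce the tables.

```python
def pooled_haplotype_freqs(strata, affected):
    """Haplotype frequencies among cases (affected=True) or controls, pooled over strata.

    Under the within-stratum null, cases and controls in a stratum share that
    stratum's haplotype frequencies; only the stratum weights differ.
    """
    weights = [
        s["share"] * (s["prevalence"] if affected else 1.0 - s["prevalence"])
        for s in strata
    ]
    total = sum(weights)
    haplotypes = strata[0]["hf"].keys()
    return {
        h: sum(w * s["hf"][h] for w, s in zip(weights, strata)) / total
        for h in haplotypes
    }


strata = [
    # "share" = 0.5 is an assumed mixing proportion, not given in the source.
    {"share": 0.5, "prevalence": 0.07,
     "hf": {"1-1": 0.5, "1-2": 0.1, "2-1": 0.3, "2-2": 0.1}},
    {"share": 0.5, "prevalence": 0.18,
     "hf": {"1-1": 0.4, "1-2": 0.3, "2-1": 0.15, "2-2": 0.15}},
]

case_hf = pooled_haplotype_freqs(strata, affected=True)
control_hf = pooled_haplotype_freqs(strata, affected=False)

reference = "1-1"
spurious_or = {
    h: (case_hf[h] / case_hf[reference]) / (control_hf[h] / control_hf[reference])
    for h in case_hf
    if h != reference
}

for hap, value in sorted(spurious_or.items()):
    print(f"{hap}: pooled HOR under the within-stratum null = {value:.3f}")
```

The pooled odds ratios depart from 1 even though no haplotype affects risk within either stratum, which is the confounding that the appropriateness tests are designed to flag.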
False positive rates of the modified HGLM (M-HGLM) with clustering methods (Kmeans and Ward) in the presence of population stratification (.
| FPR of the overall test | 0.062 | 0.916 | 0.047 | 0.047 | 0.050 | |||
| PS1 | 1-2 | 1 | FPR | 0.044 | 0.659 | 0.041 | 0.049 | 0.051 |
| Bias | 0.041 | −0.352 | 0.009 | 0.010 | 0.012 | |||
| 2-1 | 1 | FPR | 0.046 | 0.270 | 0.045 | 0.037 | 0.045 | |
| Bias | 0.074 | 0.394 | 0.023 | 0.028 | 0.024 | |||
| 2-2 | 1 | FPR | 0.057 | 0.243 | 0.045 | 0.036 | 0.041 | |
| Bias | 0.068 | −0.238 | 0.028 | 0.023 | 0.025 | |||
| FPR of the overall test | 0.042 | 0.913 | 0.053 | 0.042 | 0.067 | |||
| PS2 | 1-2 | 1 | FPR | 0.039 | 0.781 | 0.055 | 0.042 | 0.055 |
| Bias | 0.031 | −0.415 | 0.004 | 0.006 | 0.021 | |||
| 2-1 | 1 | FPR | 0.047 | 0.178 | 0.049 | 0.049 | 0.045 | |
| Bias | 0.029 | 0.244 | 0.008 | 0.010 | 0.018 | |||
| 2-2 | 1 | FPR | 0.040 | 0.192 | 0.050 | 0.046 | 0.052 | |
| Bias | 0.063 | −0.225 | 0.027 | 0.021 | 0.041 | |||
PS1: Population cases and controls are randomly ascertained from an admixed population consisting of two strata with disease prevalence (7%, 18%) and HFs for Haplotype 1-1, 1-2, 2-1, and 2-2 at (0.5, 0.1, 0.3, 0.1) and (0.4, 0.3, 0.15, 0.15), respectively. Trios are sampled from one ancestral population with HFs at (0.4, 0.3, 0.15, 0.15). PS2: Population cases, controls and trios are randomly ascertained from the same admixed population as PS1.
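The idea behind analyzing clusters separately can be sketched with a stratified odds ratio. The example below is not M-HGLM: it substitutes the classical Mantel-Haenszel estimator as a simple stand-in, and it assumes the cluster labels have already been produced by a method such as k-means or Ward clustering. All counts are hypothetical and were chosen so that there is no haplotype effect within either cluster.

```python
def pooled_or(tables):
    """Odds ratio from the table obtained by ignoring cluster membership."""
    a = sum(t[0] for t in tables)
    b = sum(t[1] for t in tables)
    c = sum(t[2] for t in tables)
    d = sum(t[3] for t in tables)
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio across clusters.

    Each table is (a, b, c, d):
      a = case haplotypes of interest, b = case reference haplotypes,
      c = control haplotypes of interest, d = control reference haplotypes.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts for haplotype 1-2 versus reference 1-1 in two clusters.
cluster_tables = [
    (10, 50, 40, 200),   # cluster 1: within-cluster odds ratio is 1
    (90, 120, 60, 80),   # cluster 2: within-cluster odds ratio is 1
]

unadjusted = pooled_or(cluster_tables)
adjusted = mantel_haenszel_or(cluster_tables)
print(f"pooled HOR = {unadjusted:.3f}, cluster-adjusted HOR = {adjusted:.3f}")
```

Pooling across clusters gives an odds ratio well above 1, while the cluster-adjusted estimate returns to 1, which mirrors the contrast between the inflated and the near-nominal columns in the table above.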
False positive rates of the modified HGLM (M-HGLM) with clustering methods (Kmeans and Ward) in the presence of population stratification (.
| FPR of the overall test | 0.037 | 0.590 | 0.050 | 0.050 | 0.057 | |||
| PS1 | 1-2 | 1 | FPR | 0.054 | 0.390 | 0.048 | 0.037 | 0.056 |
| Bias | 0.009 | −0.112 | 0.001 | 0.002 | 0.002 | |||
| 2-1 | 1 | FPR | 0.051 | 0.146 | 0.055 | 0.046 | 0.050 | |
| Bias | 0.013 | 0.073 | 0.001 | 0.003 | 0.003 | |||
| 2-2 | 1 | FPR | 0.053 | 0.133 | 0.043 | 0.055 | 0.058 | |
| Bias | 0.015 | −0.069 | 0.006 | 0.004 | 0.004 | |||
| FPR of the overall test | 0.037 | 0.612 | 0.035 | 0.041 | 0.049 | |||
| PS2 | 1-2 | 1 | FPR | 0.030 | 0.457 | 0.046 | 0.038 | 0.048 |
| Bias | 0.008 | −0.127 | 0.001 | 0.002 | 0.002 | |||
| 2-1 | 1 | FPR | 0.044 | 0.105 | 0.038 | 0.048 | 0.043 | |
| Bias | 0.006 | 0.057 | <0.001 | 0.002 | 0.002 | |||
| 2-2 | 1 | FPR | 0.052 | 0.117 | 0.052 | 0.052 | 0.061 | |
| Bias | 0.014 | −0.067 | 0.006 | 0.003 | 0.004 | |||
PS1: Population cases and controls are randomly ascertained from an admixed population consisting of two strata with disease prevalence (7%, 18%) and HFs for Haplotype 1-1, 1-2, 2-1, and 2-2 at (0.5, 0.1, 0.3, 0.1) and (0.4, 0.3, 0.15, 0.15), respectively. Trios are sampled from one ancestral population with HFs at (0.4, 0.3, 0.15, 0.15). PS2: Population cases, controls and trios are randomly ascertained from the same admixed population as PS1.