James J Yang, L Keoki Williams, Anne Buu.
Abstract
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales, so that the empirical distribution of Fisher's combination statistic under the null hypothesis can be estimated efficiently. The simulation study shows that the proposed correlation estimation methods are highly accurate. More importantly, our approach conservatively estimates the variance of the test statistic, so the type I error rate is controlled. The simulation also shows that the proposed test maintains power at a level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches (dichotomizing all observed phenotypes or treating them as continuous variables) either reduce power or rely on a linear regression model that does not fit the data. Furthermore, the analysis of data from the Study of Addiction: Genetics and Environment (SAGE) demonstrates that a multivariate test on multiple phenotypes can increase the power to identify markers that might otherwise not be selected by marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies.
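For reference, Fisher's combination statistic for k p-values is T = −2 Σ log(p_i), which follows a chi-square distribution with 2k degrees of freedom when the p-values are independent. The sketch below computes T and, as one common way to handle correlated p-values, matches a scaled chi-square distribution to the first two moments of T (Brown's approximation). This is an illustrative assumption and not necessarily the null-distribution estimate used in the paper; the function names and the covariance inputs are hypothetical.

```python
import math


def fisher_statistic(p_values):
    """Fisher's combination statistic T = -2 * sum(log p_i)."""
    return -2.0 * sum(math.log(p) for p in p_values)


def scaled_chi2_parameters(k, pairwise_covariances):
    """Moment-match T to scale * chi-square(df) (Brown's approximation).

    pairwise_covariances holds cov[-2 log p_i, -2 log p_j] for each of the
    k * (k - 1) / 2 pairs i < j; all zeros recovers the independent case.
    """
    mean = 2.0 * k                                        # E[T]
    variance = 4.0 * k + 2.0 * sum(pairwise_covariances)  # Var[T]
    scale = variance / (2.0 * mean)
    df = 2.0 * mean ** 2 / variance
    return mean, variance, scale, df
```

With six phenotypes and all pairwise covariances equal to zero, the scale is 1 and the degrees of freedom are 12, which is the usual independent-case result. Positive covariances inflate the variance, so the scale rises above 1 and the effective degrees of freedom fall below 12.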
Year: 2017 PMID: 28081206 PMCID: PMC5231271 DOI: 10.1371/journal.pone.0169893
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Different types of correlation coefficients when the variables X and Y are continuous, binary, or ordinal.
| | Continuous | Binary | Ordinal |
|---|---|---|---|
| Continuous | Kendall | Biserial | Polyserial |
| Binary | | Tetrachoric | Polychoric |
| Ordinal | | | Polychoric |
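The table above can be read as a symmetric lookup keyed on the pair of measurement scales. A minimal sketch of that lookup follows; the dictionary and function names are hypothetical, and only the method names come from the table.

```python
# Correlation coefficient used for each pair of measurement scales,
# transcribed from the table above. A frozenset key makes the lookup
# independent of the order of the two variables.
CORRELATION_METHOD = {
    frozenset(["continuous"]): "Kendall",
    frozenset(["continuous", "binary"]): "biserial",
    frozenset(["continuous", "ordinal"]): "polyserial",
    frozenset(["binary"]): "tetrachoric",
    frozenset(["binary", "ordinal"]): "polychoric",
    frozenset(["ordinal"]): "polychoric",
}


def correlation_method(scale_x, scale_y):
    """Return the name of the coefficient for the given pair of scales."""
    key = frozenset([scale_x.lower(), scale_y.lower()])
    try:
        return CORRELATION_METHOD[key]
    except KeyError:
        raise ValueError(f"unknown scale pair: {scale_x!r}, {scale_y!r}") from None
```

For example, `correlation_method("binary", "continuous")` returns `"biserial"` regardless of argument order.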
Simulation results for the correlation estimation based on Kendall’s τ, biserial, polyserial, tetrachoric, or polychoric correlation.
The choice of correlation method depends on the measurement scales of the two variables. The true correlation ρ ranges from −0.9 to 0.9. The mean correlation estimates and their standard deviations (in parentheses) are calculated from 10,000 replications.
| ρ | Continuous-Continuous | Continuous-Binary | Continuous-Ordinal | Binary-Binary | Binary-Ordinal | Ordinal-Ordinal |
|---|---|---|---|---|---|---|
| −0.9 | −0.8999 (0.0066) | −0.9003 (0.0112) | −0.9009 (0.0082) | −0.8999 (0.0159) | −0.9007 (0.0149) | −0.9004 (0.0120) |
| −0.8 | −0.7997 (0.0124) | −0.8002 (0.0185) | −0.8006 (0.0136) | −0.8000 (0.0249) | −0.8007 (0.0213) | −0.8008 (0.0160) |
| −0.7 | −0.6999 (0.0172) | −0.7004 (0.0238) | −0.7007 (0.0183) | −0.7000 (0.0315) | −0.7009 (0.0269) | −0.7008 (0.0206) |
| −0.6 | −0.6000 (0.0218) | −0.6004 (0.0287) | −0.6004 (0.0225) | −0.6002 (0.0374) | −0.6009 (0.0312) | −0.6008 (0.0252) |
| −0.5 | −0.4999 (0.0253) | −0.5001 (0.0324) | −0.5005 (0.0263) | −0.5000 (0.0418) | −0.5009 (0.0350) | −0.5006 (0.0287) |
| −0.4 | −0.4000 (0.0281) | −0.4005 (0.0351) | −0.4005 (0.0289) | −0.3998 (0.0453) | −0.4005 (0.0380) | −0.4005 (0.0311) |
| −0.3 | −0.3001 (0.0302) | −0.3004 (0.0371) | −0.3004 (0.0310) | −0.3003 (0.0477) | −0.3005 (0.0407) | −0.3005 (0.0333) |
| −0.2 | −0.2004 (0.0317) | −0.2010 (0.0388) | −0.2005 (0.0327) | −0.2010 (0.0493) | −0.2004 (0.0418) | −0.2009 (0.0352) |
| −0.1 | −0.1006 (0.0330) | −0.1008 (0.0398) | −0.1008 (0.0337) | −0.1008 (0.0509) | −0.1010 (0.0428) | −0.1011 (0.0359) |
| 0.0 | −0.0000 (0.0331) | 0.0002 (0.0397) | 0.0001 (0.0339) | −0.0001 (0.0506) | −0.0004 (0.0428) | 0.0002 (0.0364) |
| 0.1 | 0.0998 (0.0324) | 0.0999 (0.0392) | 0.0998 (0.0331) | 0.1000 (0.0499) | 0.0999 (0.0420) | 0.1000 (0.0354) |
| 0.2 | 0.1996 (0.0317) | 0.2003 (0.0390) | 0.1998 (0.0323) | 0.1997 (0.0494) | 0.1995 (0.0414) | 0.2000 (0.0348) |
| 0.3 | 0.2999 (0.0298) | 0.3002 (0.0370) | 0.3003 (0.0304) | 0.3001 (0.0477) | 0.3004 (0.0398) | 0.3006 (0.0326) |
| 0.4 | 0.3993 (0.0279) | 0.3996 (0.0352) | 0.4000 (0.0287) | 0.3994 (0.0449) | 0.4003 (0.0372) | 0.4003 (0.0311) |
| 0.5 | 0.4997 (0.0249) | 0.5004 (0.0319) | 0.5001 (0.0259) | 0.5001 (0.0416) | 0.5005 (0.0347) | 0.5006 (0.0284) |
| 0.6 | 0.6000 (0.0217) | 0.6006 (0.0285) | 0.6004 (0.0226) | 0.6000 (0.0377) | 0.6003 (0.0314) | 0.6009 (0.0249) |
| 0.7 | 0.6999 (0.0172) | 0.7004 (0.0237) | 0.7006 (0.0184) | 0.6999 (0.0317) | 0.7005 (0.0269) | 0.7008 (0.0206) |
| 0.8 | 0.8000 (0.0124) | 0.8004 (0.0184) | 0.8008 (0.0137) | 0.8003 (0.0250) | 0.8006 (0.0215) | 0.8011 (0.0162) |
| 0.9 | 0.8997 (0.0066) | 0.9001 (0.0113) | 0.9007 (0.0082) | 0.8996 (0.0160) | 0.9005 (0.0152) | 0.8997 (0.0112) |
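To illustrate the continuous-continuous column, the sketch below simulates bivariate normal data with a known ρ, computes Kendall's τ, and maps it back to the ρ scale with the relation ρ = sin(πτ/2), which holds for bivariate normal data. The record does not state which conversion the authors used, so this mapping, the sample size, and all function names are assumptions made for illustration.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a by direct pair counting (O(n^2), no tie correction)."""
    n = len(x)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1      # concordant pair
            elif product < 0:
                score -= 1      # discordant pair
    return score / (n * (n - 1) / 2)


def tau_to_rho(tau):
    """Map Kendall's tau to the correlation of a bivariate normal."""
    return math.sin(math.pi * tau / 2.0)


def simulate_bivariate_normal(n, rho, rng):
    """Draw n pairs of standard normal variables with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return xs, ys


rng = random.Random(0)
x, y = simulate_bivariate_normal(400, 0.5, rng)
rho_hat = tau_to_rho(kendall_tau(x, y))
```

Repeating the last three lines over many seeds and summarizing `rho_hat` by its mean and standard deviation would give one cell of a table like the one above.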
Fig 1. The relationship between the covariance cov[−2log(p_i), −2log(p_j)] and the correlation ρ.
The title in each panel indicates the types of data simulated. The solid curve in each panel corresponds to our covariance estimates using Eq (3). The dotted curves are the true covariances calculated from the simulated data.
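A relationship of this kind can be approximated by simulation. The sketch below is a simplified stand-in, not the paper's Eq (3) and not its mixed-scale setup: it assumes two standard normal test statistics with correlation ρ under the null, converts each to a two-sided p-value, and estimates the covariance of the −2log(p) terms by Monte Carlo. Function names, the number of draws, and the seed are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulated_covariance(rho, n_draws=20000, seed=0):
    """Monte Carlo estimate of cov[-2 log p_i, -2 log p_j] under the null."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(-2.0 * math.log(two_sided_p(z1)))
        ys.append(-2.0 * math.log(two_sided_p(z2)))
    mean_x = sum(xs) / n_draws
    mean_y = sum(ys) / n_draws
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return cross / (n_draws - 1)
```

Under these assumptions the estimate is close to 0 at ρ = 0 and close to 4 at ρ = 1, because −2log(p) follows a chi-square distribution with 2 degrees of freedom, whose variance is 4. The covariance is also symmetric in ρ, since a two-sided p-value depends only on the absolute value of the statistic.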
Simulation results for the empirical power with varied correlations ρ and genetic effect sizes (e1, …, e6).
The Latent column is the power with the proposed method applied to 6 latent phenotypes. The Mixed column is the power with the proposed method applied to 6 observed phenotypes. The Dichotomous column is the power when the observed phenotypes are dichotomized. The Continuous column is the power when the phenotypes in mixed measurements are treated as continuous variables. The number of iterations is 10^6 when all genetic effect sizes are zero and 10^4 for other situations.
| ρ | e1 | e2 | e3 | e4 | e5 | e6 | Latent | Mixed | Dichotomous | Continuous |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00009 | 0.00008 | 0.00004 | 0.00009 |
| 0.35 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00057 | 0.00024 | 0.00005 | 0.00025 |
| 0.75 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00027 | 0.00007 | 0.00002 | 0.00012 |
| 0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.96 | 0.91 | 0.77 | 0.91 |
| 0.35 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.84 | 0.73 | 0.49 | 0.73 |
| 0.75 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.51 | 0.36 | 0.19 | 0.39 |
| 0 | 0 | 0.7 | 0 | 0.7 | 0 | 0.7 | 0.94 | 0.88 | 0.68 | 0.88 |
| 0.35 | 0 | 0.7 | 0 | 0.7 | 0 | 0.7 | 0.82 | 0.68 | 0.42 | 0.70 |
| 0.75 | 0 | 0.7 | 0 | 0.7 | 0 | 0.7 | 0.36 | 0.20 | 0.08 | 0.25 |
| 0 | 0.9 | 0.9 | 0 | 0 | 0 | 0 | 0.94 | 0.94 | 0.67 | 0.94 |
| 0.35 | 0.9 | 0.9 | 0 | 0 | 0 | 0 | 0.84 | 0.84 | 0.42 | 0.83 |
| 0.75 | 0.9 | 0.9 | 0 | 0 | 0 | 0 | 0.37 | 0.37 | 0.05 | 0.38 |
| 0 | 0 | 0 | 0.9 | 0.9 | 0 | 0 | 0.95 | 0.71 | 0.68 | 0.74 |
| 0.35 | 0 | 0 | 0.9 | 0.9 | 0 | 0 | 0.85 | 0.43 | 0.41 | 0.49 |
| 0.75 | 0 | 0 | 0.9 | 0.9 | 0 | 0 | 0.37 | 0.05 | 0.05 | 0.09 |
| 0 | 0 | 0 | 0 | 0 | 0.9 | 0.9 | 0.95 | 0.90 | 0.69 | 0.92 |
| 0.35 | 0 | 0 | 0 | 0 | 0.9 | 0.9 | 0.85 | 0.72 | 0.42 | 0.78 |
| 0.75 | 0 | 0 | 0 | 0 | 0.9 | 0.9 | 0.38 | 0.16 | 0.05 | 0.30 |
Fig 2. The distributions of phenotypes: FTND 1, FTND 2, …, FTND 6, FTND total, and FTND total (Binary).
FTND total (Binary) is derived from FTND total according to whether the FTND total score is less than 6 or not.
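The dichotomization described in this caption can be sketched as follows. The cutoff of 6 comes from the caption; the 0/1 coding and the function name are assumptions, since the caption does not say how the two groups are labeled.

```python
def dichotomize_ftnd_total(total_score, cutoff=6):
    """Return 0 if the FTND total is below the cutoff, 1 otherwise.

    The 0/1 coding is an assumption; the caption only states the cutoff.
    """
    if not 0 <= total_score <= 10:
        raise ValueError("FTND total scores range from 0 to 10")
    return int(total_score >= cutoff)


# Scores of 5 and 6 fall on opposite sides of the cutoff.
binary_scores = [dichotomize_ftnd_total(s) for s in (0, 5, 6, 10)]
```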
The correlations among the 6 FTND items.
| correlation | FTND 2 | FTND 3 | FTND 4 | FTND 5 | FTND 6 |
|---|---|---|---|---|---|
| FTND 1 | 0.6815 | 0.6758 | 0.7528 | 0.6446 | 0.7770 |
| FTND 2 | | 0.4579 | 0.5895 | 0.4403 | 0.6838 |
| FTND 3 | | | 0.4822 | 0.6394 | 0.5294 |
| FTND 4 | | | | 0.4452 | 0.6702 |
| FTND 5 | | | | | 0.5110 |
Fig 3. The Q-Q plots of observed p-values versus expected p-values based on the marginal tests.
Fig 4. The Q-Q plot of observed p-values versus expected p-values based on the multivariate test.