Marijana Vujkovic, Richard Aplenc, Todd A. Alonzo, Alan S. Gamis, Yimei Li.
Abstract
Regression analysis is commonly used in genome-wide association studies (GWAS) to test genotype-phenotype associations, but it restricts the phenotype to a single observation per individual. There is an increasing need for analytic methods for longitudinally collected phenotype data. Several methods have been proposed for longitudinal GWAS in family-based studies, but few have been described for unrelated populations. We compared the performance of three statistical approaches for longitudinal GWAS in unrelated subjects: (1) principal component-based generalized estimating equations (PC-GEE); (2) a principal component-based linear mixed effects model (PC-LMEM); and (3) a kinship coefficient matrix-based linear mixed effects model (KIN-LMEM). The comparison used a study of single-nucleotide polymorphisms (SNPs) and the duration of four courses of chemotherapy in 624 unrelated children with de novo acute myeloid leukemia (AML), genotyped on the Illumina 2.5 M OmniQuad, from the Children's Oncology Group (COG) studies AAML0531 and AAML1031. We observed an inflated type I error rate with PC-GEE for SNPs with minor allele frequencies (MAF) < 0.05, whereas KIN-LMEM produced more type II errors than expected. Compared with the competing approaches, PC-LMEM showed balanced type I and type II errors when observed P-values were compared with expected P-values. In general, P-values from the different approaches were strongly concordant, particularly for P < 0.01, where the between-method AUCs exceeded 99%. PC-LMEM accounts for genetic relatedness and for correlation among repeated phenotype measures, shows minimal genome-wide inflation of type I errors, and yields high power. We therefore recommend PC-LMEM as a robust analytic approach for GWAS of longitudinal data in unrelated populations.
Keywords: generalized estimating equations; genome wide association analysis; linear mixed effects model; longitudinal analysis; unrelated population
Year: 2016; PMID: 27547214; PMCID: PMC4974249; DOI: 10.3389/fgene.2016.00139
Source DB: PubMed; Journal: Front Genet; ISSN: 1664-8021; Impact factor: 4.599
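For orientation, the three models compared in the abstract can be written in a generic form. This is a sketch under stated assumptions, not the authors' exact specification: it assumes an additive genotype code $g_i \in \{0, 1, 2\}$ for patient $i$, course lengths $y_{ij}$ for courses $j = 1, \dots, 4$, $K$ principal components $\mathrm{PC}_{ik}$, a patient-level random intercept $b_i$ for the repeated courses, and a kinship-based relationship matrix $\boldsymbol{\Phi}$ (scaling conventions vary). Any additional covariates the study adjusted for are omitted. In each model the SNP test is $H_0\colon \beta_g = 0$.

```latex
% PC-LMEM (sketch): PCs as fixed covariates, random intercept for repeated courses
y_{ij} = \beta_0 + \beta_g g_i + \sum_{k=1}^{K} \gamma_k \,\mathrm{PC}_{ik} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

% PC-GEE (sketch): same mean model, no random effect; within-patient correlation
% handled by a working correlation matrix R(\alpha) with robust (sandwich) standard errors
E[y_{ij}] = \beta_0 + \beta_g g_i + \sum_{k=1}^{K} \gamma_k \,\mathrm{PC}_{ik},
\qquad \operatorname{Corr}(\mathbf{y}_i) = R(\alpha)

% KIN-LMEM (sketch): PCs replaced by a polygenic random effect with kinship covariance;
% a separate patient-level intercept b_i is kept here for the repeated measures
y_{ij} = \beta_0 + \beta_g g_i + u_i + b_i + \varepsilon_{ij},
\qquad \mathbf{u} \sim N(\mathbf{0}, \sigma_u^2 \boldsymbol{\Phi})
```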
Literature overview of longitudinal GWAS analysis.
| Study | Population | Outcome | Effect over time | Relatedness adjustment | Missing data | Method | Software |
|---|---|---|---|---|---|---|---|
| Sikorska et al. | Family | Linear | Time-varying | NA | MAR/MCAR | Reduced conditional LMEM + GWAS on random slopes | R |
| Choi et al. | Family | Binary | Stationary | Random effects | MAR | GEE, family-specific GLMEM | PLINK, SAS, R |
| Eu-Ahsunthornwattana et al. | Family, unrelated | Linear | Stationary | Variance-covariance structure | MAR | PHENO LIN REG, GWAS on residuals | PLINK, EMMAX, FaST-LMM, GenABEL |
| Hossain and Beyene | Family | Linear | Stationary | Variance-covariance structure | MAR | LMEM | R |
| Musolf et al. | Family, unrelated | Linear | Time-varying | (Q)TDT | NA | Cluster analysis, GWAS | SAS, TDT-HET |
| Tan et al. | Family | Linear | Stationary | Kinship matrix | NA | Two-level hierarchical linear model with random coefficients | |
| Vaitsiakhovich et al. | Unrelated | Delta | Stationary | NA | MAR, MCAR, MNAR + imputation | GWAS on mean change of imputed data | INTERSNP |
| Wang et al. | Family | Linear | Stationary | NA | MAR + imputation | LMEM and a two-stage approach: random intercept model + GWAS | R, PLINK |
| Xia and Lin | Family | Binary | Stationary | Inbreeding coefficient in Bayesian | MAR | Logistic Bayesian LASSO, B-spline, partial GLMEM | R, Hapassoc |
| Furlotte et al. | Unrelated | Linear | NA | Variance-covariance structure | NA | Modified LMEM | R |
| Chang et al. | NA | Binary | Time-varying | Random effects | NA | GMM | SAS |
| Fradin and Fallin | Family | Binary | Stationary | NA | NA | LOG REG, conditional on risk set | SAS |
| Kerner et al. | Family | Linear | Stationary | Random effects | NA | GMM, QTL analysis | Mplus, Golden Helix |
| Luan et al. | Family | Linear | Stationary | Random effects | NA | LMEM | Stata, SAS |
| Park et al. | Family | Binary | Stationary | Pedigree membership as covariate | NA | GEE | SAS |
| Roslin et al. | Unrelated | Linear | Stationary | NA | NA | MLGM, LIN REG | Mplus, PLINK |
| Yan et al. | Unrelated | Binary | Stationary | NA | NA | PAF | SAS |
| Zhu et al. | Family | Linear | Time-varying | Sibship group membership | NA | MASAL | MASAL |
| Aulchenko et al. | Family | Linear | NA | Random effects | NA | GWAS on LMEM phenotype residuals | GRAMMAR |
GEE, generalized estimating equations; GLMEM, generalized linear mixed model; GMM, growth mixture modeling; GRAMMAR, genome-wide rapid association using mixed model and regression; GWAS, genome-wide association studies; HWE, Hardy-Weinberg equilibrium; LASSO, least absolute shrinkage and selection operator; LMEM, linear mixed model; MAF, minor allele frequency; MAR, missing at random; MASAL, multivariate adaptive splines for the analysis of longitudinal data; MCAR, missing completely at random; MGR, missing genotype rate; MLGM, multivariate linear growth modeling; MNAR, missing not at random; PAF, population attributable risk fraction; PH, proportional hazard; QTL, quantitative trait locus; SNP, single-nucleotide polymorphism; (Q)TDT, (quantitative) transmission disequilibrium test.
Figure 1. Intra- and inter-individual variability in chemotherapy course length. Each line represents one patient's median and interquartile range (IQR); patients are sorted from the lowest to the highest median. The overall median course length is 36 days (IQR 32–42 days). Course lengths more than 1.5 × IQR below the 25th percentile or more than 1.5 × IQR above the 75th percentile are classified as outliers and shown as isolated points.
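The outlier rule in the Figure 1 caption is Tukey's 1.5 × IQR fence. Below is a minimal Python sketch applied to a hypothetical list of course lengths in days. Note that `statistics.quantiles` uses the "exclusive" interpolation method by default, which may not match the plotting software used for the figure, so borderline points could be classified differently.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical course lengths (days); not data from the study
course_lengths = [30, 33, 35, 36, 38, 41, 42, 70]
print(tukey_outliers(course_lengths))  # [70]
```

Here Q1 = 33.5 and Q3 = 41.75, so the upper fence is 54.125 and only the 70-day course is flagged.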
Figure 2. Pairwise principal component (PC) plots for the 624 Caucasian patients with AML.
Figure 3. Results of longitudinal GWAS analyses using three different methods. (A) PC-LMEM: MAF > 0.01; (B) PC-GEE: MAF > 0.01; (C) KIN-LMEM: MAF > 0.01.
Figure 4. Q-Q plots. (A) PC-LMEM: MAF > 0.01; (B) PC-GEE: MAF > 0.01; (C) KIN-LMEM: MAF > 0.01.
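Q-Q plots of this kind compare observed -log10 P-values with those expected under the null hypothesis, and they are often summarized by the genomic inflation factor λ (λ ≈ 1 indicates no genome-wide inflation). The following standard-library sketch assumes two-sided 1-df tests and P-values in (0, 1]; it uses (i - 0.5)/n plotting positions, one of several conventions, and is not taken from the paper.

```python
import math
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 df: (Phi^{-1}(0.75))^2, about 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def qq_points(pvalues):
    """Return (expected, observed) -log10 P pairs for a Q-Q plot."""
    n = len(pvalues)
    observed = sorted(pvalues)  # smallest P first
    # Expected P-values under the null are uniform quantiles (i - 0.5) / n
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def genomic_inflation(pvalues):
    """Genomic control lambda: median 1-df chi-square statistic / null median."""
    # Two-sided P -> chi-square via z = Phi^{-1}(P/2), chi2 = z^2;
    # using P/2 (not 1 - P/2) avoids rounding to 1.0 for very small P
    chi2 = [NormalDist().inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical, perfectly uniform P-values: no inflation expected
null_p = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(round(genomic_inflation(null_p), 2))  # 1.0
```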
Figure 5. Between-method accuracy and distribution of rare variants. The six off-diagonal ROC curves show how accurately each method predicts different P-value percentiles of a competing method (P < 0.5, P < 0.1, P < 0.01, P < 0.001, and P < 0.0001). The diagonal panels show the relative distribution of rare SNPs (MAF < 0.05) across P-value cut-offs for each method (e.g., P > 0.1, 0.01 < P < 0.1, 0.001 < P < 0.01, 0.0001 < P < 0.001, and P < 0.0001).
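One way such a between-method AUC could be computed is sketched below. It assumes (this is not stated in the source) that one method's P-values serve as the score and the label is whether the competing method's P-value falls below the cut-off. The AUC is the Mann-Whitney probability that a labeled SNP outranks an unlabeled one, with ties counted as one half. The pairwise loop is quadratic and only meant for illustration; a rank-based formula scales to genome-wide data.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def between_method_auc(p_predictor, p_reference, cutoff):
    """AUC of one method's P-values for flagging SNPs with reference P < cutoff."""
    scores = [-p for p in p_predictor]  # smaller P-value = higher score
    labels = [p < cutoff for p in p_reference]
    return auc(scores, labels)


# Hypothetical P-values for the same five SNPs from two methods
p_method_a = [0.0004, 0.003, 0.02, 0.4, 0.9]
p_method_b = [0.0009, 0.03, 0.004, 0.6, 0.7]
print(round(between_method_auc(p_method_a, p_method_b, 0.01), 3))  # 0.833
```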
Figure 6. Results of longitudinal GWAS analyses using three different methods, restricted to SNPs with MAF > 0.05 (SNPs with MAF < 0.05 were excluded). (A) PC-LMEM: MAF > 0.05; (B) PC-GEE: MAF > 0.05; (C) KIN-LMEM: MAF > 0.05.
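The MAF restriction used for Figures 6 and 7 can be illustrated with a small sketch. It assumes genotypes are stored as additive allele counts (0, 1, 2) per patient, with `None` for a missing call. The dictionary layout and the function names are hypothetical. Real pipelines would typically apply this filter in dedicated genotype-processing software rather than in pure Python.

```python
def minor_allele_frequency(genotypes):
    """MAF for one SNP from additive genotype codes (0, 1, 2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    allele_freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(allele_freq, 1 - allele_freq)


def snps_passing_maf(genotypes_by_snp, threshold=0.05):
    """Return IDs of SNPs whose MAF exceeds the threshold (MAF > 0.05 by default)."""
    return [snp for snp, g in genotypes_by_snp.items()
            if minor_allele_frequency(g) > threshold]


# Hypothetical genotypes for four patients at three SNPs
genotypes = {
    "snp_a": [0, 0, 1, 2],     # MAF 0.375
    "snp_b": [2, 2, 2, 1],     # MAF 0.125
    "snp_c": [0, 0, 0, None],  # MAF 0.0 among called samples
}
print(snps_passing_maf(genotypes))  # ['snp_a', 'snp_b']
```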
Figure 7. Q-Q plots, excluding SNPs with MAF < 0.05. (A) PC-LMEM: MAF > 0.05; (B) PC-GEE: MAF > 0.05; (C) KIN-LMEM: MAF > 0.05.