Jakris Eu-Ahsunthornwattana, E Nancy Miller, Michaela Fakiola, Selma M B Jeronimo, Jenefer M Blackwell, Heather J Cordell.
Abstract
Approaches based on linear mixed models (LMMs) have recently gained popularity for modelling population substructure and relatedness in genome-wide association studies. In the last few years, a bewildering variety of different LMM methods/software packages have been developed, but it is not always clear how (or indeed whether) any newly-proposed method differs from previously-proposed implementations. Here we compare the performance of several LMM approaches (and software implementations, including EMMAX, GenABEL, FaST-LMM, Mendel, GEMMA and MMM) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals (1972 genotyped). The implementations differ in precise details of methodology implemented and through various user-chosen options such as the method and number of SNPs used to estimate the kinship (relatedness) matrix. We investigate sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error-rate for both real and simulated phenotypes. We compare the LMM results to those obtained using traditional family-based association tests (based on transmission of alleles within pedigrees) and to alternative approaches implemented in the software packages MQLS, ROADTRIPS and MASTOR. We find strong concordance between the results from different LMM approaches, and all are successful in controlling the genome-wide error rate (except for some approaches when applied naively to longitudinal data with many repeated measures). We also find high correlation between LMMs and alternative approaches (apart from transmission-based approaches when applied to SNPs with small or non-existent effects). We conclude that LMM approaches perform well in comparison to competing approaches. 
Given their strong concordance, in most applications, the choice of precise LMM implementation cannot be based on power/type I error considerations but must instead be based on considerations such as speed and ease-of-use.
Year: 2014 PMID: 25033443 PMCID: PMC4102448 DOI: 10.1371/journal.pgen.1004445
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Table 1. Summary of methods/software packages investigated.
| Package/method and version | Approach | Kinship estimation method | Reference(s) |
| EMMAX emmax-intel-binary-20120210.tar.gz | LMM (approximate) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | |
| FaST-LMM v2.04 | LMM (approximate or exact) | Kinship matrix estimated internally using user-supplied set of SNPs, using SNPs selected through FaST-LMM-Select procedure, or set to theoretical/estimated values calculated externally | |
| GEMMA v0.91 | LMM (exact) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | |
| GenABEL v1.7-6 (FASTA) | LMM (approximate) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | |
| GenABEL v1.7-6 (GRAMMAR-Gamma) | LMM (approximate) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | |
| GTAM (implemented in MASTOR v0.3) | LMM (approximate) | Kinship matrix calculated externally (assumed to reflect ‘known’ (theoretical) pedigree relationships) | |
| Mendel v13.2 | LMM (approximate or exact) | Kinship matrix estimated internally using theoretical pedigree relationships, estimated within estimated pedigree clusters (using all SNPs), or fully estimated (using all SNPs) | |
| MMM v1.01 | LMM (approximate or exact) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | |
| FBAT v2.0.4 | Transmission of alleles within pedigrees | Method by definition uses ‘known’ (theoretical) pedigree relationships | |
| MASTOR v0.3 | Retrospective quantitative trait version of MQLS | Kinship matrix calculated externally (assumed to reflect ‘known’ (theoretical) pedigree relationships) | |
| MQLS v1.5 | Adjusted version of retrospective case/control test | Kinship matrix calculated externally (assumed to reflect ‘known’ (theoretical) pedigree relationships) | |
| ROADTRIPS v1.2 (RM test) | Adjusted version of retrospective case/control test | Kinship matrix calculated externally (assumed to reflect ‘known’ (theoretical) pedigree relationships). Further correction based on genome-wide set of SNPs applied internally. | |
Figure 1. Comparison of kinship estimates (pruned SNPs) using different software packages.
Plots above the diagonal show a comparison of kinship measures, with correlations between the kinship measures indicated below the diagonal. EM_BN = EMMAX (Balding-Nichols), EM_IBS = EMMAX (IBS method), FLMM_C = FaST-LMM using covariance matrix, FLMM_R = FaST-LMM using realised relationship matrix, GA = GenABEL, GMA_C = GEMMA using centred genotypes, GMA_S = GEMMA using standardised genotypes, KING_H = KING with homogeneous population assumption, KING_R = KING with robust estimation.
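The centred versus standardised distinction (GMA_C versus GMA_S) can be illustrated with a minimal sketch on a toy genotype matrix. It assumes the commonly used definitions: each SNP is mean-centred, and the standardised variant additionally divides each SNP's contribution by its sample variance. The function name `kinship` and the toy data are hypothetical, and this is not the code used by any of the packages compared here.

```python
from statistics import fmean, pvariance


def kinship(genotypes, standardise=False):
    """Return an n x n relatedness matrix from genotypes[individual][snp].

    Genotypes are allele counts (0, 1 or 2). Assumed definitions:
      centred:      K = (1/p) * sum over SNPs of (x - mean)(x - mean)^T
      standardised: as above, with each SNP's term divided by its variance
    Monomorphic SNPs (zero variance) carry no information and are skipped,
    so p is the number of polymorphic SNPs.
    """
    n = len(genotypes)
    k = [[0.0] * n for _ in range(n)]
    used = 0
    for column in zip(*genotypes):  # one tuple of n genotypes per SNP
        var = pvariance(column)
        if var == 0:
            continue
        mean = fmean(column)
        centred = [g - mean for g in column]
        weight = 1.0 / var if standardise else 1.0
        for i in range(n):
            for j in range(n):
                k[i][j] += weight * centred[i] * centred[j]
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs supplied")
    return [[value / used for value in row] for row in k]


# Toy data (hypothetical): 3 individuals x 3 SNPs; the middle SNP is monomorphic.
toy = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
k_centred = kinship(toy)
k_standardised = kinship(toy, standardise=True)
```

The nested loops keep the sketch dependency-free; for genome-wide data an array library would be the practical choice.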
Figure 2. Genomic control factors obtained using different software packages and different strategies for modelling kinships.
PLINK = analysis in PLINK with no adjustment made for relatedness. Other methods/software packages are listed in Table 1 (see Table 2 for abbreviated names of methods). Pedigree = theoretical kinships based on known pedigree relationships used to adjust for relatedness. Thinned = kinships based on 1900 ‘thinned’ SNPs used to adjust for relatedness. Pruned = kinships based on 50,129 ‘pruned’ SNPs used to adjust for relatedness. Full = kinships based on 545,433 SNPs used to adjust for relatedness.
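A genomic control factor is conventionally computed as the median observed 1-df chi-square test statistic divided by the median of the null chi-square distribution (about 0.455). The sketch below assumes that convention and assumes two-sided p-values as input; the function name `genomic_control_lambda` is hypothetical, and the source does not state exactly how each package's factor was derived.

```python
from statistics import NormalDist, median

_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def genomic_control_lambda(p_values):
    """Median observed 1-df chi-square statistic over its null median.

    Each two-sided p-value p (0 < p <= 1) is converted back to a 1-df
    chi-square statistic z**2, where z is the standard normal quantile
    at p / 2. Values near 1 suggest no inflation; values above 1 suggest
    inflated test statistics.
    """
    chi2 = [_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated test.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_control_lambda(uniform_p)

# Squaring the p-values makes them too small, mimicking inflation.
lambda_inflated = genomic_control_lambda([p * p for p in uniform_p])
```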
Table 2. Genomic control inflation factors achieved in real data or in a single replicate of the simulated data sets.
| | | | Trait analysed | | | |
| Method | Description | Kinships used | Real disease (VL) | Simulated strong (sim-D1) | Simulated weak (sim-D2) | Simulated quantitative (sim-Q) |
| Unadjusted | Standard linear or logistic regression | None | 1.23 | 1.12 | 1.04 | 1.43 |
| EM_BN | EMMAX (Balding-Nichols kinships) | Estimated | 0.99 | 0.99 | 1.00 | 0.99 |
| EM_IBS | EMMAX (IBS kinships) | Estimated | 0.99 | 0.99 | 1.00 | 1.00 |
| FLMM_A | FaST-LMM (approximate calculation) | Estimated | 0.99 | 0.99 | 1.00 | 1.00 |
| FLMM_E | FaST-LMM (exact calculation) | Estimated | 1.00 | 0.99 | 1.01 | 1.00 |
| GA_FA | GenABEL (FASTA) | Estimated | 0.99 | 0.99 | 1.00 | 0.99 |
| GA_GRG | GenABEL (GRAMMAR-Gamma) | Estimated | 0.99 | 0.99 | 1.00 | 1.00 |
| GMA_C | GEMMA using centred genotypes | Estimated | 1.00 | 0.99 | 1.01 | 1.00 |
| GMA_S | GEMMA using standardised genotypes | Estimated | 1.00 | 0.99 | 1.01 | 1.00 |
| GTAM | GTAM (implemented in MASTOR) | Pedigree | 1.20 | 1.00 | 0.99 | 0.99 |
| Mendel_T | Mendel with theoretical kinships | Pedigree | 1.11 | 1.00 | 0.99 | 0.99 |
| Mendel_P | Mendel with kinships estimated within estimated pedigree clusters | Estimated | 1.10 | 1.00 | 0.99 | 0.99 |
| Mendel | Mendel with fully estimated kinships | Estimated | 1.03 | 0.99 | 1.00 | 1.00 |
| MMM_E | MMM (exact calculation) | Estimated | 1.00 | 0.99 | 1.01 | 1.00 |
| MMM_G | MMM (GLS approximation) | Estimated | 0.99 | 0.99 | 1.00 | 0.99 |
| FBATaff | FBAT (transmissions to affecteds only) | Pedigree | 1.02 | 1.01 | 1.00 | – |
| FBATboth | FBAT (transmissions to all individuals) | Pedigree | 1.01 | 1.00 | 1.01 | 1.00 |
| MASTOR | MASTOR (implemented in MASTOR) | Pedigree | 1.15 | 1.00 | 0.99 | 0.99 |
| MQLS1972 | MQLS (using 1972 genotyped individuals) | Pedigree | 1.15 | 1.01 | 0.99 | – |
| MQLS3626 | | Pedigree | 1.16 | – | – | – |
| MQLS1972_E | MQLS using 1972 genotyped individuals and estimated kinships | Estimated | 0.94 | 0.90 | 0.91 | – |
| RT1972 | ROADTRIPS (using 1972 genotyped individuals) | Pedigree & estimated | 1.00 | 1.00 | 0.99 | – |
| RT3626 | ROADTRIPS (using all 3626 individuals with or without genotype data) | Pedigree & estimated | 1.00 | – | – | – |
FBATaff, MQLS and ROADTRIPS are only applicable to binary traits and so do not have results in the ‘Simulated quantitative’ column.
In the simulated data sets, MQLS and RT could only be based on the 1972 individuals with simulated phenotypes, and so no simulated trait results are displayed in the MQLS3626 and RT3626 rows.
Figure 3. Power and type 1 error of different methods.
Powers (left hand plots) are defined as the proportion of replicates (out of 1000) in which both simulated disease loci are detected, with ‘detection’ corresponding to any SNP within 40 kb of the simulated disease locus reaching the specified p-value threshold. Type 1 errors (right hand plots) are defined as the proportion of null SNPs (out of 20,000 = 20 null SNPs times 1000 simulation replicates) that reach the specified p-value threshold. Horizontal dashed lines indicate the target p-value thresholds (i.e. the expected type 1 error rates).
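The two definitions above can be sketched as follows. This is a simplified illustration: it assumes all SNPs and loci lie on one chromosome, treats "reaching" a threshold as p ≤ threshold, and uses hypothetical function names and toy data.

```python
def type1_error(null_p_values, threshold):
    """Proportion of null-SNP p-values (pooled over replicates, non-empty)
    that reach the threshold."""
    return sum(p <= threshold for p in null_p_values) / len(null_p_values)


def locus_detected(snps, locus_pos, threshold, window=40_000):
    """True if any SNP within `window` bp of the locus reaches the threshold.

    snps: iterable of (position_bp, p_value) pairs on the locus's chromosome.
    """
    return any(
        abs(pos - locus_pos) <= window and p <= threshold for pos, p in snps
    )


def power(replicates, locus_positions, threshold, window=40_000):
    """Proportion of replicates in which every simulated locus is detected."""
    hits = sum(
        all(locus_detected(snps, locus, threshold, window) for locus in locus_positions)
        for snps in replicates
    )
    return hits / len(replicates)


# Toy data (hypothetical): two replicates, two simulated disease loci.
loci = [120_000, 510_000]
replicate_1 = [(100_000, 1e-6), (500_000, 1e-7), (900_000, 0.5)]
replicate_2 = [(100_000, 1e-6), (500_000, 0.2)]
estimated_power = power([replicate_1, replicate_2], loci, threshold=1e-5)
estimated_type1 = type1_error([0.01, 0.2, 0.04, 0.9], threshold=0.05)
```

In the toy data only the first replicate has a qualifying SNP near both loci, so it alone counts as a detection.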
Figure 4. Manhattan plots for the real phenotype using FaST-LMM exact and alternative software packages.
The points marked in red denote the confirmed significant region from Fakiola et al. (2013). FLMM_E = FaST-LMM using exact calculation, MQLS1972 = MQLS using 1972 genotyped individuals, RT1972 = ROADTRIPS using 1972 genotyped individuals, FBATaff = FBAT using transmissions to affecteds only, FBATboth = FBAT using transmissions to both affecteds and unaffecteds. Results from all other LMM methods were indistinguishable from FLMM_E and so are not shown.
Table 3. Concordance between top SNPs identified by different methods.
| | | Mean (standard deviation) in 1000 replicates of proportion of top | | | | |
| Trait | Method | | | | | |
| sim-D1 | Unadjusted | 0.991 (0.042) | 0.990 (0.030) | 0.981 (0.033) | 0.975 (0.032) | 0.973 (0.027) |
| | EM_IBS | 0.999 (0.017) | 0.999 (0.009) | 0.997 (0.015) | 0.997 (0.013) | 0.996 (0.012) |
| | FLMM_A | 1.000 (0.009) | 1.000 (0.003) | 1.000 (0.007) | 1.000 (0.004) | 1.000 (0.003) |
| | FLMM_E | 0.998 (0.021) | 1.000 (0.005) | 0.999 (0.008) | 0.999 (0.005) | 1.000 (0.004) |
| | GA_FA | 0.998 (0.018) | 1.000 (0.005) | 0.999 (0.011) | 0.999 (0.008) | 0.998 (0.008) |
| | GA_GRG | 0.998 (0.021) | 0.999 (0.011) | 0.996 (0.017) | 0.998 (0.010) | 0.998 (0.008) |
| | GMA_C | 0.998 (0.021) | 1.000 (0.004) | 0.999 (0.009) | 0.999 (0.005) | 1.000 (0.004) |
| | GMA_S | 0.998 (0.021) | 1.000 (0.005) | 0.999 (0.008) | 0.999 (0.005) | 1.000 (0.004) |
| | GTAM | 0.998 (0.022) | 0.995 (0.022) | 0.990 (0.025) | 0.988 (0.022) | 0.987 (0.020) |
| | Mendel | 0.997 (0.025) | 0.996 (0.019) | 0.991 (0.024) | 0.989 (0.021) | 0.989 (0.018) |
| | MMM_E | 0.991 (0.041) | 1.000 (0.004) | 0.999 (0.009) | 0.999 (0.005) | 1.000 (0.004) |
| | MMM_G | 0.993 (0.036) | 1.000 (0.003) | 1.000 (0.007) | 1.000 (0.005) | 0.999 (0.005) |
| | FBATaff | 0.684 (0.253) | 0.790 (0.115) | 0.773 (0.090) | 0.771 (0.080) | 0.760 (0.072) |
| | FBATboth | 0.859 (0.130) | 0.844 (0.084) | 0.811 (0.078) | 0.795 (0.075) | 0.777 (0.071) |
| | MASTOR | 0.993 (0.038) | 0.994 (0.024) | 0.989 (0.027) | 0.985 (0.024) | 0.985 (0.022) |
| | MQLS | 0.978 (0.062) | 0.981 (0.040) | 0.960 (0.043) | 0.951 (0.041) | 0.941 (0.038) |
| | RT | 0.981 (0.059) | 0.984 (0.037) | 0.962 (0.042) | 0.952 (0.041) | 0.942 (0.038) |
| sim-D2 | Unadjusted | 0.982 (0.060) | 0.984 (0.041) | 0.979 (0.039) | 0.974 (0.040) | 0.973 (0.036) |
| | EM_IBS | 0.997 (0.029) | 0.997 (0.024) | 0.995 (0.025) | 0.994 (0.028) | 0.994 (0.024) |
| | FLMM_A | 0.998 (0.027) | 0.998 (0.024) | 0.997 (0.025) | 0.997 (0.029) | 0.997 (0.026) |
| | FLMM_E | 0.995 (0.035) | 0.997 (0.025) | 0.997 (0.025) | 0.996 (0.030) | 0.997 (0.026) |
| | GA_FA | 0.992 (0.044) | 0.998 (0.024) | 0.997 (0.026) | 0.996 (0.030) | 0.996 (0.026) |
| | GA_GRG | 0.994 (0.038) | 0.997 (0.026) | 0.996 (0.027) | 0.995 (0.030) | 0.996 (0.026) |
| | GMA_C | 0.995 (0.035) | 0.997 (0.025) | 0.997 (0.025) | 0.996 (0.030) | 0.997 (0.026) |
| | GMA_S | 0.995 (0.035) | 0.997 (0.025) | 0.997 (0.025) | 0.996 (0.030) | 0.997 (0.026) |
| | GTAM | 0.988 (0.050) | 0.990 (0.036) | 0.983 (0.037) | 0.982 (0.036) | 0.982 (0.032) |
| | Mendel | 0.988 (0.051) | 0.992 (0.033) | 0.986 (0.035) | 0.984 (0.036) | 0.987 (0.031) |
| | MMM_E | 0.995 (0.037) | 0.997 (0.025) | 0.997 (0.025) | 0.996 (0.030) | 0.997 (0.026) |
| | MMM_G | 0.998 (0.028) | 0.998 (0.024) | 0.997 (0.025) | 0.997 (0.029) | 0.997 (0.026) |
| | FBATaff | 0.413 (0.255) | 0.571 (0.201) | 0.614 (0.157) | 0.639 (0.128) | 0.651 (0.102) |
| | FBATboth | 0.664 (0.246) | 0.718 (0.146) | 0.699 (0.111) | 0.691 (0.099) | 0.686 (0.088) |
| | MASTOR | 0.971 (0.075) | 0.988 (0.038) | 0.981 (0.038) | 0.978 (0.039) | 0.979 (0.033) |
| | MQLS | 0.934 (0.107) | 0.962 (0.056) | 0.942 (0.053) | 0.928 (0.051) | 0.917 (0.047) |
| | RT | 0.943 (0.099) | 0.965 (0.055) | 0.943 (0.053) | 0.930 (0.052) | 0.919 (0.047) |
| sim-Q | Unadjusted | 0.987 (0.049) | 0.983 (0.038) | 0.962 (0.040) | 0.963 (0.034) | 0.954 (0.033) |
| | EM_IBS | 0.998 (0.020) | 0.998 (0.016) | 0.993 (0.020) | 0.994 (0.017) | 0.993 (0.015) |
| | FLMM_A | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.004) | 1.000 (0.005) | 1.000 (0.004) |
| | FLMM_E | 1.000 (0.009) | 0.999 (0.008) | 1.000 (0.005) | 1.000 (0.005) | 0.999 (0.005) |
| | GA_FA | 1.000 (0.006) | 0.999 (0.010) | 0.998 (0.010) | 0.998 (0.010) | 0.996 (0.012) |
| | GA_GRG | 0.994 (0.034) | 0.999 (0.010) | 0.995 (0.018) | 0.996 (0.014) | 0.996 (0.012) |
| | GMA_C | 1.000 (0.009) | 1.000 (0.007) | 1.000 (0.004) | 1.000 (0.004) | 1.000 (0.004) |
| | GMA_S | 1.000 (0.009) | 0.999 (0.008) | 1.000 (0.005) | 1.000 (0.005) | 0.999 (0.005) |
| | GTAM | 0.995 (0.032) | 0.991 (0.028) | 0.984 (0.030) | 0.985 (0.024) | 0.984 (0.022) |
| | Mendel | 0.998 (0.021) | 0.996 (0.020) | 0.987 (0.027) | 0.988 (0.022) | 0.988 (0.019) |
| | MMM_E | 0.899 (0.100) | 0.999 (0.008) | 1.000 (0.004) | 1.000 (0.004) | 1.000 (0.004) |
| | MMM_G | 0.903 (0.100) | 1.000 (0.003) | 1.000 (0.003) | 1.000 (0.004) | 1.000 (0.003) |
| | FBAT | 0.906 (0.101) | 0.896 (0.067) | 0.869 (0.059) | 0.844 (0.067) | 0.814 (0.066) |
| | MASTOR | 0.998 (0.020) | 0.992 (0.027) | 0.984 (0.030) | 0.984 (0.025) | 0.983 (0.023) |
See Table 2 for description of methods.
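One plausible way to compute a concordance measure of this kind is the fraction of one method's N most significant SNPs that also appear among another method's N most significant SNPs. The sketch below assumes that reading; the column headings of the table above did not survive, so the reference method and the values of N used by the authors are not confirmed here. The function name and SNP identifiers are hypothetical.

```python
def top_n_overlap(p_a, p_b, n):
    """Fraction of method A's n most significant SNPs also in method B's top n.

    p_a, p_b: dicts mapping SNP identifier -> p-value for the same SNP set.
    Ties are broken by dictionary insertion order (the sort is stable).
    """
    if not 0 < n <= min(len(p_a), len(p_b)):
        raise ValueError("n must be between 1 and the number of SNPs")
    top_a = set(sorted(p_a, key=p_a.get)[:n])
    top_b = set(sorted(p_b, key=p_b.get)[:n])
    return len(top_a & top_b) / n


# Toy p-values (hypothetical SNP identifiers) from two methods.
method_a = {"snp1": 1e-8, "snp2": 1e-6, "snp3": 0.01, "snp4": 0.5}
method_b = {"snp1": 1e-7, "snp2": 0.3, "snp3": 1e-5, "snp4": 0.6}
overlap_top2 = top_n_overlap(method_a, method_b, 2)
overlap_top4 = top_n_overlap(method_a, method_b, 4)
```

Averaging such a proportion over simulation replicates would give a mean and standard deviation of the kind tabulated.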
Table 4. Genomic control factors achieved in naive analysis of a single replicate of the simulated longitudinal data sets.
| | Trait analysed | |
| Method | Longitudinal (sim-L20) | Longitudinal polygenic (sim-P20) |
| Unadjusted | 20.82 | 21.53 |
| EM_BN | 1.01 | 1.01 |
| EM_IBS | 0.99 | 0.97 |
| FLMM_A | 1.01 | 1.01 |
| FLMM_E | 1.01 | 1.01 |
| GA_FA | 1.06 | 2.39 |
| GA_GRG | 0.66 | 0.47 |
| GMA_C | 1.01 | 1.01 |
| GMA_S | 1.01 | 1.01 |
| MMM_E | 1.01 | 3.52 |
| MMM_G | 1.01 | 3.52 |
See Table 2 for description of methods.