Christiaan A de Leeuw, Joris M Mooij, Tom Heskes, Danielle Posthuma.
Abstract
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
Year: 2015 PMID: 25885710 PMCID: PMC4401657 DOI: 10.1371/journal.pcbi.1004219
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
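The abstract states that the gene analysis is based on a multiple regression model, and the Fig 1 caption refers to a PC regression model. The following is a minimal sketch of that general idea, not MAGMA's implementation: it projects a gene's SNP genotypes onto principal components, regresses the phenotype on them, and computes an F statistic for their joint effect. The function name `gene_f_statistic`, the `var_kept` pruning fraction, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def gene_f_statistic(genotypes, phenotype, var_kept=0.999):
    """F statistic for the joint effect of one gene's SNPs via PC regression.

    genotypes: (n_individuals, n_snps) array of allele counts.
    phenotype: (n_individuals,) array of a continuous trait.
    var_kept:  fraction of genotype variance retained (illustrative choice).
    """
    X = genotypes - genotypes.mean(axis=0)  # centre each SNP
    y = phenotype - phenotype.mean()        # centring absorbs the intercept
    n = X.shape[0]

    # Left singular vectors of the centred SNP matrix are orthonormal PCs.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(explained, var_kept)) + 1, len(s))
    pcs = U[:, :k]

    # With orthonormal regressors, the OLS coefficients are simply pcs.T @ y.
    beta = pcs.T @ y
    ss_model = float(beta @ beta)
    ss_resid = float(y @ y) - ss_model
    df_model, df_resid = k, n - k - 1
    f_stat = (ss_model / df_model) / (ss_resid / df_resid)
    return f_stat, df_model, df_resid


# Simulated example (hypothetical data): 5 SNPs, the first one is causal.
rng = np.random.default_rng(0)
n = 500
genotypes = rng.binomial(2, 0.3, size=(n, 5)).astype(float)
phenotype = 1.0 * genotypes[:, 0] + rng.normal(size=n)
f_stat, df1, df2 = gene_f_statistic(genotypes, phenotype)
print(f_stat, df1, df2)
```

A gene p-value would then be the upper tail of the F distribution with `df1` and `df2` degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df1, df2)`. Because the SNPs enter the model jointly, correlated markers (linkage disequilibrium) are handled by the regression itself rather than by permutation.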
Overview of Crohn’s Disease analyses.
| Name | Analysis | Input | Settings |
|---|---|---|---|
| MAGMA-main | gene, self-cont., comp. | Raw data | Multiple regression model (per gene) |
| MAGMA-mean | gene | Raw data | Mean SNP |
| MAGMA-top | gene | Raw data | Top SNP |
| MAGMA-pval | gene | SNP p-values, HapMap data | Mean SNP |
| MAGMA-pval-1K | gene, self-cont., comp. | SNP p-values, 1,000 Genomes data | Mean SNP |
| VEGAS | gene | SNP p-values, HapMap data | Mean SNP |
| PLINK-avg | gene, self-contained | Raw data | Mean SNP |
| PLINK-prune | gene, self-contained | Raw data | Mean SNP |
| PLINK-top | gene | Raw data | Top SNP |
| ALIGATOR | competitive | SNP p-values | 4 SNP p-value cut-offs |
| INRICH | competitive | SNP p-values | 4 SNP p-value cut-offs |
| MAGENTA | competitive | SNP p-values | 2 gene score quantile cut-offs |
Number of significant genes at different p-value thresholds.
| Method | 0.05 | 0.01 | 0.001 | 0.0001 | Bonf. | Total genes |
|---|---|---|---|---|---|---|
| MAGMA-main | 1203 | 379 | 95 | 32 | 10 | 13172 |
| MAGMA-mean | 917 | 250 | 70 | 16 | 5 | 13172 |
| MAGMA-top | 934 | 244 | 61 | 16 | 5 | 13172 |
| MAGMA-pval | 927 | 241 | 64 | 16 | 5 | 12797 |
| MAGMA-pval-1K | 901 | 245 | 61 | 13 | 5 | 13075 |
| PLINK-avg | 944 | 239 | 56 | 16 | 4 | 13172 |
| PLINK-top | 903 | 242 | 64 | 13 | 5 | 13172 |
| PLINK-prune | 973 | 257 | 58 | 16 | 4 | 13172 |
| VEGAS | 915 | 225 | 61 | 17 | 6 | 12455 |
| | | | | | | |
| MAGMA-main | 1141 | 352 | 89 | 28 | 8 | 13172 |
| MAGMA-mean | 897 | 240 | 62 | 14 | 4 | 13172 |
| MAGMA-top | 934 | 230 | 63 | 12 | 4 | 13172 |
| | | | | | | |
| MAGMA-main | 1611 | 505 | 126 | 45 | 13 | 16970 |
| MAGMA-mean | 1215 | 377 | 97 | 25 | 7 | 16970 |
| MAGMA-top | 1247 | 337 | 89 | 16 | 8 | 16970 |
‘Total genes’ gives the number of genes analysed. This was lower for the summary statistics analyses because some genes contained no SNPs present in both CD data and reference data and because VEGAS does not analyse the X chromosome. As such, those genes effectively have a p-value of 1 by default. For permutation-based methods, p-values were based on up to 1,000,000 permutations. No stratification correction was used in the analyses except the three under the ‘Strat. Correction’ header.
Fig 1. Comparison of gene analysis results for different test-statistics.
Gene -log10 p-values from the CD data gene analysis in MAGMA for three different gene test-statistics, comparing (A) the mean χ² statistic with the top χ² statistic, (B) the mean χ² statistic with the PC regression model and (C) the top χ² statistic with the PC regression model. P-values below 10⁻⁸ are truncated to 10⁻⁸ (grey points) to preserve the visibility of the other points.
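The truncation described in the caption can be sketched as follows. This is an illustrative helper, not the code used to draw the figure; the function name `neg_log10_truncated` and the returned flag are assumptions.

```python
import math


def neg_log10_truncated(p, floor=1e-8):
    """Return (-log10 of p after truncation at `floor`, whether p was truncated)."""
    truncated = p < floor
    return -math.log10(max(p, floor)), truncated


# Hypothetical p-values: the second one falls below the 1e-8 floor.
points = [neg_log10_truncated(p) for p in (0.05, 1e-12)]
print(points)
```

Points flagged as truncated correspond to the grey points in the figure: they are drawn at the floor value so that they do not stretch the axes.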
Number of significant gene sets at different p-value thresholds.
| Method | 0.05 | 0.01 | 0.001 | FWER | Tested gene sets |
|---|---|---|---|---|---|
| Self-contained | | | | | |
| MAGMA-main | 448 | 253 | 120 | 39 | 1320 |
| MAGMA-pval-1K | 257 | 108 | 28 | 4 | 1320 |
| PLINK-avg | 329 | 160 | 67 | 19 | 1320 |
| PLINK-prune | 361 | 181 | 86 | 27 | 1320 |
| Competitive | | | | | |
| MAGMA-main | 85 | 25 | 9 | 1 | 1320 |
| MAGMA-main (no size correction) | 105 | 33 | 9 | 3 | 1320 |
| MAGMA-pval-1K | 80 | 11 | 3 | 1 | 1320 |
| ALIGATOR (cut-off = 0.01) | 94 | 38 | 12 | 0 | 653 |
| ALIGATOR (cut-off = 0.005) | 85 | 23 | 7 | 0 | 508 |
| ALIGATOR (cut-off = 0.001) | 59 | 34 | 10 | 0 | 149 |
| ALIGATOR (cut-off = 0.0001) | 28 | 24 | 6 | 0 | 35 |
| INRICH (cut-off = 0.01) | 79 | 22 | 3 | 0 | 777 |
| INRICH (cut-off = 0.005) | 74 | 23 | 7 | 0 | 602 |
| INRICH (cut-off = 0.001) | 66 | 39 | 15 | 0 | 213 |
| INRICH (cut-off = 0.0001) | 41 | 22 | 8 | 3 | 57 |
| MAGENTA (cut-off = 5th quant.) | 83 | 20 | 4 | 0 | 952 |
| MAGENTA (cut-off = 1st quant.) | 50 | 25 | 6 | 0 | 389 |
The FWER column gives the number of gene sets with p-values below 0.05 after family-wise error correction, using Bonferroni correction for MAGMA, PLINK and MAGENTA and the built-in FWER methods for INRICH and ALIGATOR. The ‘Tested gene sets’ column shows the number of gene sets for which p-values were computed. This number was lower for INRICH, ALIGATOR and MAGENTA because some gene sets contained too few SNPs/intervals/genes with a p-value below the chosen cut-off. Such gene sets remain part of the analysis and count towards the total number of tests conducted; their p-values are effectively set to 1.
a In this analysis (MAGMA-main, no size correction), the default correction for gene size and gene density was turned off.
Fig 2. Comparison of self-contained gene-set analysis results.
Gene-set -log10 p-values from the CD data self-contained gene-set analysis for MAGMA and PLINK. Panel (A) shows the PLINK-avg (no pruning) results compared with the MAGMA-main analysis, panel (B) the PLINK-prune results compared with the MAGMA-main analysis and panel (C) the two PLINK analyses compared to each other. P-values below 10⁻⁸ are truncated to 10⁻⁸ (grey points) to preserve the visibility of the other points.
Competitive gene-set p-values for MAGMA and INRICH significant gene-sets.
| Gene-set | MAGMA-main (size correction) | MAGMA-main (no correction) | MAGMA-pval | INRICH (cut-off = 0.0001) | INRICH (cut-off = 0.01) |
|---|---|---|---|---|---|
| Regulation of AMPK activity via LKB1 | | | 0.059 | 1 | 0.37 |
| ECM receptor interaction | 0.000094 | | 0.00052 | 1 | 0.08 |
| Cell adhesion molecules | 0.0001 | | 0.012 | 1 | 0.11 |
| Cytokine receptor interaction | 0.004 | 0.01 | | 0.0007 | 0.091 |
| TCR calcium pathway | 0.034 | 0.024 | 0.11 | | 0.074 |
| NKT pathway | 0.052 | 0.073 | 0.034 | | 0.0022 |
| IL27 pathway | 0.3 | 0.36 | 0.22 | | 0.123 |
Significant p-values are highlighted in bold. MAGMA p-values are compared against a Bonferroni-corrected threshold of 0.05/1320 = 0.000038. For INRICH, corrected p-values (not shown) are compared against a threshold of 0.05; the corrected p-value for all three significant gene-sets is 0.049.
a p-values were not computed because fewer than two genes in the set overlapped with an associated interval; p-values are therefore effectively equal to 1
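The Bonferroni threshold quoted in the table note is simple arithmetic, sketched below. The variable names are illustrative, and only a few size-corrected MAGMA-main p-values legible in the table above are used as inputs.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
alpha = 0.05
n_gene_sets = 1320
threshold = alpha / n_gene_sets
print(f"{threshold:.6f}")  # 0.000038

# Size-corrected MAGMA-main p-values copied from the table above.
magma_main_size_corrected = {
    "ECM receptor interaction": 0.000094,
    "Cell adhesion molecules": 0.0001,
    "Cytokine receptor interaction": 0.004,
    "TCR calcium pathway": 0.034,
}

# A gene set is significant only if its p-value falls below the threshold.
significant = [
    name for name, p in magma_main_size_corrected.items() if p < threshold
]
print(significant)  # []
```

None of these four values passes the corrected threshold, even though all are nominally significant at 0.05. The denominator is the full 1320 gene sets, since the Table 3 note says that untested sets still count towards the number of tests.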
Fig 3. Comparison of competitive gene-set analysis results.
Gene-set -log10 p-values from the CD data competitive gene-set analysis for MAGMA, ALIGATOR, INRICH and MAGENTA. Results for ALIGATOR and INRICH are each shown for the SNP p-value cut-off that yielded the highest observed power (0.01 and 0.0001, respectively); results for MAGENTA are shown at the advised 5th percentile cut-off. P-values for gene sets not evaluated by one of the methods are shown in grey. The correlations shown are for the -log10 p-values of gene sets evaluated by both methods.
Fig 4. Comparison of competitive gene-set analysis results at different SNP cut-offs.
Comparison of gene set -log10 p-values from the CD data competitive gene-set analysis at different SNP p-value cut-offs for ALIGATOR (top row), INRICH (middle row) and MAGENTA (bottom row). The highest cut-off on the horizontal axis is compared to each of the lower cut-offs. P-values for gene sets not evaluated at the lower cut-off are shown in grey. The shown correlations are for the -log10 p-values for gene-sets evaluated at both cut-offs. Horizontal and vertical grey dotted lines demarcate the p = 0.05 nominal significance threshold.
Computation times for gene and gene-set analyses.
| Method | Computation time (hh:mm:ss) | Factor | Type |
|---|---|---|---|
| Gene analysis | | | |
| MAGMA-main | 00:00:44 | 1 | Raw data |
| MAGMA-mean | 00:01:00 | 1.4 | Raw data |
| MAGMA-top | 00:25:18 | 34.5 | Raw data |
| MAGMA-pval | 00:00:10 | 0.3 | Summary |
| MAGMA-pval-1K | 00:00:54 | 1.2 | Summary |
| PLINK-avg | 11:35:05 | 947.8 | Raw data |
| PLINK-prune | 08:55:13 | 729.8 | Raw data |
| PLINK-top | 10:59:26 | 899.2 | Raw data |
| VEGAS | 03:14:05 | 264.7 | Summary |
| MAGMA-main (10 covariates) | 00:00:58 | 1.3 | Raw data |
| PLINK-avg (1 covariate) | 160:39:03 | 13144.2 | Raw data |
| PLINK-avg (10 covariates) | > 857:54:57 | > 70193.1 | Raw data |
| Gene-set analysis | | | |
| MAGMA-main | 00:01:56 | 1 | Raw data |
| MAGMA-pval-1K | 00:01:09 | 0.6 | Summary |
| PLINK-avg | 44:20:40 | 1376.2 | Raw data |
| PLINK-prune | 62:35:24 | 1942.4 | Raw data |
| ALIGATOR total (4 cut-offs) | 02:37:11 | 81.3 | Summary |
| ALIGATOR, cut-off = 0.01 | 01:23:15 | 43.1 | Summary |
| ALIGATOR, cut-off = 0.0001 | 00:07:54 | 4.1 | Summary |
| INRICH total (4 cut-offs) | 01:09:22 | 35.9 | Summary |
| INRICH, cut-off = 0.01 | 00:33:41 | 17.4 | Summary |
| INRICH, cut-off = 0.0001 | 00:05:16 | 2.7 | Summary |
| MAGENTA | 00:24:35 | 12.7 | Summary |
‘Factor’ indicates the increase in computation time relative to MAGMA-main. MAGMA computation times for gene-set analysis include both self-contained and competitive tests. All analyses were run on the same system.
a up to 100,000 permutations
b up to 10,000 permutations
c covariates are PCs used for stratification correction
d 1,000 permutations
e did not complete
f 5,000 permutations, 1,000 replications
g 10,000 replicates, 10,000 bootstraps
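The ‘Factor’ column in the computation-time table can be reproduced from the listed times as a ratio to the MAGMA-main time in the same block. The sketch below uses hypothetical helper names (`to_seconds`, `factor`) and a few rows copied from the table.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def factor(method_time, baseline_time):
    """Computation time relative to the baseline, rounded to one decimal."""
    return round(to_seconds(method_time) / to_seconds(baseline_time), 1)


# Gene analysis: baseline is MAGMA-main at 00:00:44.
print(factor("00:25:18", "00:00:44"))   # MAGMA-top  -> 34.5
print(factor("11:35:05", "00:00:44"))   # PLINK-avg  -> 947.8

# Gene-set analysis: baseline is MAGMA-main at 00:01:56.
print(factor("44:20:40", "00:01:56"))   # PLINK-avg  -> 1376.2
```

Rows with very short run times may not reproduce exactly from the displayed whole-second values. For example, MAGMA-pval (00:00:10 against 00:00:44) gives 0.2 here, whereas the table lists 0.3.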