Ayellet V Segrè, Leif Groop, Vamsi K Mootha, Mark J Daly, David Altshuler.
Abstract
Mitochondrial dysfunction has been observed in skeletal muscle of people with diabetes and insulin-resistant individuals. Furthermore, inherited mutations in mitochondrial DNA can cause a rare form of diabetes. However, it is unclear whether mitochondrial dysfunction is a primary cause of the common form of diabetes. To date, common genetic variants robustly associated with type 2 diabetes (T2D) are not known to affect mitochondrial function. One possibility is that multiple mitochondrial genes contain modest genetic effects that collectively influence T2D risk. To test this hypothesis we developed a method named Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA; http://www.broadinstitute.org/mpg/magenta). MAGENTA, in analogy to Gene Set Enrichment Analysis, tests whether sets of functionally related genes are enriched for associations with a polygenic disease or trait. MAGENTA was specifically designed to exploit the statistical power of large genome-wide association (GWA) study meta-analyses whose individual genotypes are not available. This is achieved by combining variant association p-values into gene scores and then correcting for confounders, such as gene size, variant number, and linkage disequilibrium properties. Using simulations, we determined the range of parameters for which MAGENTA can detect associations likely missed by single-marker analysis. We verified MAGENTA's performance on empirical data by identifying known relevant pathways in lipid and lipoprotein GWA meta-analyses. We then tested our mitochondrial hypothesis by applying MAGENTA to three gene sets: nuclear regulators of mitochondrial genes, oxidative phosphorylation genes, and approximately 1,000 nuclear-encoded mitochondrial genes. The analysis was performed using the most recent T2D GWA meta-analysis of 47,117 people and meta-analyses of seven diabetes-related glycemic traits (up to 46,186 non-diabetic individuals). 
This well-powered analysis found no significant enrichment of associations to T2D or any of the glycemic traits in any of the gene sets tested. These results suggest that common variants affecting nuclear-encoded mitochondrial genes have at most a small genetic contribution to T2D susceptibility.
Year: 2010 PMID: 20714348 PMCID: PMC2920848 DOI: 10.1371/journal.pgen.1001058
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Description of the Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA) method.
(A) Step 1: Map genetic variants and their association scores onto genes. MAGENTA takes as input the association z-scores or p-values of DNA sequence variants across the entire genome. In this work, we used association p-values of single-nucleotide polymorphisms (SNPs; circles) from a genome-wide association study or meta-analysis, denoted $P_i$ for SNP i. Gene boundaries (vertical dashed lines) are defined here as predetermined physical distances added upstream and downstream of the most extreme transcript start and end sites of the gene (red arrow), respectively. Linkage-based distances can also be used. Each gene is assigned the set of SNPs that fall within its gene region boundaries. Two genes are shown for simplicity.
(B) Step 2: Score genes based on their local SNP association p-values. Here the most significant $P_i$ of all SNPs i that lie within the extended gene boundaries is assigned to each gene g in the genome ($P_g$).
(C) Step 3: Correct for confounding effects on the gene score, in the absence of genotype data. In this study we used step-wise multivariate linear regression analysis to regress out of $P_g$ the confounding effects of several physical and genetic properties of genes (listed in Table 1); $P_g^{\mathrm{corr}}$ refers to the corrected gene p-value for gene g. In cases where two genes are assigned the same best SNP p-value, $P_g^{\mathrm{corr}}$ tends to be more significant for small genes than for large genes.
(D) Step 4: Calculate a gene set enrichment p-value for each biological pathway or gene set of interest. We used a non-parametric statistical test to test whether the $P_g^{\mathrm{corr}}$ values of all genes in gene set gs are enriched for highly ranked gene scores more than would be expected by chance, compared to randomly sampled gene sets of identical size from the genome. $P_{gs}$ refers to the nominal gene set enrichment p-value for gene set gs.
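Steps 1 and 2 above can be illustrated with a minimal Python sketch. It is not the MAGENTA implementation: the `Gene` class, the function name `best_snp_pvalues`, the tuple layout of `snps`, and the boundary arguments `upstream_bp` and `downstream_bp` are all hypothetical names chosen for illustration, and the sketch assumes that `start` is the lower genomic coordinate of a gene.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Gene:
    name: str
    chrom: str
    start: int         # lowest transcript coordinate (bp); assumed start <= end
    end: int           # highest transcript coordinate (bp)
    strand: str = "+"  # "+" or "-"


def best_snp_pvalues(
    genes: List[Gene],
    snps: List[Tuple[str, int, float]],  # (chromosome, position, p-value)
    upstream_bp: int,
    downstream_bp: int,
) -> Dict[str, Optional[float]]:
    """Assign to each gene the smallest SNP p-value within its extended boundaries."""
    scores: Dict[str, Optional[float]] = {}
    for gene in genes:
        if gene.strand == "+":
            low = gene.start - upstream_bp
            high = gene.end + downstream_bp
        else:
            # On the minus strand, "upstream" lies at higher coordinates.
            low = gene.start - downstream_bp
            high = gene.end + upstream_bp
        in_region = [
            p for chrom, pos, p in snps
            if chrom == gene.chrom and low <= pos <= high
        ]
        # Genes without any SNP in their region get no score.
        scores[gene.name] = min(in_region) if in_region else None
    return scores
```

Genes that receive `None` here correspond to the "genes without SNPs in vicinity" that are dropped before the enrichment test.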
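Table 1. Correlation between type 2 diabetes gene association scores and potential gene score confounders.
| Gene property | Mean across 1,000 permuted DGI GWA datasets: correlation with unadjusted gene z-scores | DGI GWA study: correlation with unadjusted gene z-scores | DGI GWA study: correlation with regression-corrected gene z-scores | DGI GWA study: correlation with permutation-corrected gene z-scores |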
| Gene size, kilobase (kb) | 0.26 | 0.25 | −0.03 | 0.01 |
| # SNPs per kb | 0.38 | 0.39 | −0.05 | −0.02 |
| # independent SNPs per kb | 0.32 | 0.31 | −0.07 | −0.001 |
| # recombination hotspots per kb | 0.17 | 0.14 | −0.04 | 0.01 |
| Linkage disequilibrium units per kb | 0.22 | 0.19 | −0.06 | 0.02 |
| Genetic distance, centi-Morgan per kb | 0.19 | 0.16 | −0.05 | 0.03 |
Pearson's correlation coefficients were calculated between $Z$, $Z^{\mathrm{corr}}$ or $Z^{\mathrm{perm}}$ and six different physical and genetic properties of genes. $Z$ is a vector of the unadjusted best SNP per gene z-scores for all genes in the genome, $Z^{\mathrm{corr}}$ is a vector of corrected gene z-scores using regression analysis for all genes, and $Z^{\mathrm{perm}}$ is a vector of corrected gene z-scores using phenotype permutation analysis for all genes. This was computed for 1,000 phenotype permutation data sets of the Diabetes Genetics Initiative (DGI) GWA study and the actual DGI GWA study. Aside from gene size, all gene properties were converted to per kilobase (kb) units for each gene by dividing by gene region size using the extended physical boundaries. All correlations between $Z$ and the six variables were statistically significant (mean p<2e-70 across 1,000 DGI permutations and p<1e-74 for the actual DGI study). Similar correlations were obtained for the five latter variables in Table 1 before normalizing to gene region size (data not shown).
†: These gene properties were significant in almost all 1,000 DGI GWA permutations tested under a step-wise multivariate linear regression model of the unadjusted gene z-scores regressed against the six gene properties (see Table S3).
*The linkage disequilibrium units per kb variable was significant under the regression model for about half of the permutations tested (Table S3).
Figure 2. Regression analysis corrects for the majority of confounding effects on gene association scores in a genotype-independent manner.
The performance of a step-wise regression analysis approach in correcting for confounders on the gene p-values was evaluated against permutation analysis correction, since the latter corrects for all confounders without requiring a priori knowledge of them. T2D gene association p-values were plotted for all genes g in the genome (A) before gene score adjustment ($P_g$) and (B) after correction for confounders using regression analysis ($P_g^{\mathrm{corr}}$), as a function of corrected gene p-values using phenotype permutation analysis ($P_g^{\mathrm{perm}}$). The Diabetes Genetics Initiative (DGI) GWA study was used for the analysis, since we had access to all individuals' genotypes. $P_g$ is the association p-value of the best regional SNP for gene g before correction (y-axis in A). To compute $P_g^{\mathrm{corr}}$ (y-axis in B), step-wise multivariate linear regression analysis was applied to $P_g$ against the first four confounders listed in Table 1 (this approach does not require genotype data). The Pearson's correlation coefficient (calculated between p-value vectors before log transformation) increased significantly following the regression-based correction (from r = 0.69 to r = 0.95). The spread around the diagonal (red line) also decreased following the regression correction (from a coefficient of variation (std/mean) of 1.13 to 0.56). The minimum $P_g^{\mathrm{perm}}$ is $10^{-4}$, as the p-values were calculated based on either 1,000 or 10,000 permutations, depending on the gene. Some of the variation in the low p-value tail is due to having done only 10,000 permutations, and some to limitations of the linear regression method. Note that four of the dots in (A) comprise ten overlapping dots that correspond to four sets of 2–3 genes, each set assigned the same best SNP p-value. Gene association p-values are plotted on a −log10(p-value) scale.
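The idea behind the regression-based correction can be sketched as follows. This is a simplified illustration, not the authors' procedure: it fits a plain ordinary least-squares model with all confounders at once (no step-wise variable selection), works on gene z-scores, and returns the residuals as corrected scores. The function names `ols_residuals` and `pearson` are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_residuals(y: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Residuals of y after a least-squares fit on the columns of X plus an intercept."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Build the augmented normal equations [X'X | X'y].
    M = [
        [sum(rows[i][a] * rows[i][b] for i in range(n)) for b in range(k)]
        + [sum(rows[i][a] * y[i] for i in range(n))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("confounders are collinear")
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    beta = [M[a][k] / M[a][a] for a in range(k)]
    # Corrected score = observed score minus the part explained by confounders.
    return [y[i] - sum(b * x for b, x in zip(beta, rows[i])) for i in range(n)]
```

By construction, the residuals are uncorrelated with every confounder included in the fit, which mirrors the near-zero correlations reported for the corrected scores in Table 1.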
Figure 3. Estimating power of the GSEA algorithm in MAGENTA using computer simulations.
We used simulations to assess the power (sensitivity) of the gene set enrichment analysis (GSEA) algorithm in MAGENTA to detect enrichment of genes with modest effect sizes that are hard to detect with single SNP analysis. Power is plotted as a function of fraction (A) or number (B) of causal genes of modest effect in gene sets of 25 (triangles), 100 (squares), or 1,000 (circles) genes. The modest effect size spiked into genes is equivalent to 1% power of detecting an association at genome-wide significance using single SNP analysis. A total of 100 causal genes in the genome were assumed here. Randomized vectors from case/control permutations of the DGI study were used as the background association values. Simulations were repeated 1,000 times for each unique set of parameters. Power was calculated as the fraction of times the simulated gene set received a nominal gene set enrichment p-value <0.01. For specificity estimations we used SNPs with no effect size, sampled from a null distribution that assumes no association. The false positive rate of the method (1-specificity) was comparable to the p-value cutoff used (0.3–1.7%). Note that the x-axis in both panels is on a log10 scale.
Table 2. Top GSEA results for lipid-related pathways using LDL cholesterol, HDL cholesterol, and triglyceride GWA meta-analyses.
| Database | Gene set | # genes analyzed by GSEA | Nominal GSEA p-value | Nominal GSEA p-value excluding genes near validated lipid SNPs | Genes near validated lipid SNPs |
| LDL cholesterol | | | | | |
| GO, BP | LIPID TRANSPORT | 27 | 0.0001* | 0.0352 | APOE, LDLR |
| GO, BP | LIPID HOMEOSTASIS | 14 | 0.0005* | 0.0204 | APOE, PCSK9 |
| GO, BP | LIPOPROTEIN METABOLIC PROCESS | 31 | 0.0010* | 0.0038 | LDLR |
| GO, BP | LIPID METABOLIC PROCESS | 291 | 0.0013* | 0.0046 | APOC1, APOC2, APOC4, LDLR |
| GO, BP | FATTY ACID METABOLIC PROCESS | 58 | 0.0019* | 0.0024 | - |
| GO, BP | LIPID CATABOLIC PROCESS | 36 | 0.0079 | 0.0078 | - |
| GO, MF | LIPID TRANSPORTER ACTIVITY | 27 | 0.0090 | 0.0352 | APOC4 |
| GO, MF | LIPOPROTEIN BINDING | 18 | 0.0106 | 0.0466 | LDLR |
| PANTHER | FATTY ACID METABOLISM | 88 | 0.0120 | 0.0112 | - |
| GO, BP | REGULATION OF LIPID METABOLIC PROCESS | 11 | 0.0140 | 0.0143 | - |
| HDL cholesterol | | | | | |
| GO, BP | TRIACYLGLYCEROL METABOLIC PROCESS | 9 | 1e-6* | 8.3e-5* | APOC3, CETP, LPL, APOA5 |
| GO, BP | LIPID TRANSPORT | 27 | 1e-6* | 0.0023 | ABCA1, APOA1, APOA4, APOC3, CETP, LCAT |
| GO, MF | LIPID BINDING | 79 | 1.8e-5* | 0.0036* | APOA1, APOA4, CETP, APOA5 |
| GO, BP | LIPID HOMEOSTASIS | 14 | 1e-5* | 0.0012* | ABCA1, APOA1, APOA4, CETP, LCAT |
| GO, MF | PHOSPHOLIPID BINDING | 43 | 2.8e-5* | 0.012 | APOA1, APOA4, CETP, APOA5 |
| PANTHER | LIPID AND FATTY ACID TRANSPORT | 99 | 4e-5* | 0.0162 | ABCA1, APOA1, APOA4, APOC3, CETP, PLTP, APOA5 |
| GO, BP | LIPID METABOLIC PROCESS | 287 | 6e-5* | 0.0179 | APOA1, APOA4, APOA5, APOC3, CETP, HNF4A, LCAT, FADS1, FADS2, LPL, MVK, PLTP |
| GO, BP | CELLULAR LIPID METABOLIC PROCESS | 229 | 0.0003* | 0.0548 | APOA1, APOC3, CETP, LCAT, FADS1, LPL |
| GO, MF | STEROL BINDING | 9 | 0.0004* | 0.0435 | APOA1, CETP |
| GO, BP | LIPID CATABOLIC PROCESS | 36 | 0.0006* | 0.0068 | APOA4, APOA5 |
| GO, BP | CELLULAR LIPID CATABOLIC PROCESS | 33 | 0.005 | 0.0206 | APOA5 |
| GO, BP | LIPID BIOSYNTHETIC PROCESS | 87 | 0.0110 | 0.2327 | APOA1, LCAT, FADS1, FADS2, MVK |
| Triglycerides | | | | | |
| GO, BP | LIPID HOMEOSTASIS | 14 | 0.0001* | 0.0974 | APOA1, APOA4, ANGPTL3 |
| GO, BP | TRIACYLGLYCEROL METABOLIC PROCESS | 9 | 0.0008* | 0.307 | APOC3, LPL, APOA5 |
| GO, MF | LIPID TRANSPORTER ACTIVITY | 25 | 0.0012* | 0.3238 | APOA1, APOA4 |
| GO, BP | LIPID TRANSPORT | 26 | 0.0023 | 0.3154 | APOA1, APOC3, ANGPTL3, APOA4 |
| GO, BP | LIPOPROTEIN METABOLIC PROCESS | 31 | 0.0044 | 0.4123 | APOA1, APOA4, ANGPTL3 |
| GO, BP | PHOSPHOLIPID METABOLIC PROCESS | 69 | 0.0081 | 0.0061 | APOA1, FADS1, LPL |
| GO, BP | LIPID CATABOLIC PROCESS | 36 | 0.0083 | 0.0811 | APOA4, APOA5, ANGPTL3 |
| GO, BP | GLYCEROPHOSPHOLIPID METABOLIC PROCESS | 42 | 0.0149 | 0.0036 | APOA1 |
The most significant lipid-related biological gene sets with a gene set enrichment p-value of <0.015 are presented using GWA meta-analyses of LDL cholesterol, HDL cholesterol and triglyceride blood levels across a total of 19,840 individuals. Complete results for all 51 lipoprotein and lipid related pathways are presented in Tables S5, S6, S7. GSEA p-values marked with an asterisk are significant under a conservative Bonferroni correction (each database was corrected separately due to considerable overlap between gene sets across the different databases). The number of genes per gene set analyzed with MAGENTA in column three is after removing genes without SNPs in their extended gene boundaries and after adjusting for chromosomal proximity between subsets of genes in a gene set (see Materials and Methods). The fifth column contains GSEA p-values following exclusion of genes near validated SNPs for the relevant lipid trait (19 genes for LDL cholesterol, 20 genes for HDL cholesterol and 19 genes for triglyceride levels; taken from Table 2 in [34]). The sixth column lists all genes near validated lipid SNPs (as of [34]) that fall in a given gene set, including the genes removed due to adjustment for physical proximity in the genome. GO stands for Gene Ontology, BP for Biological Process, and MF for Molecular Function.
Table 3. Mitochondria-related gene sets are not enriched for associations with type 2 diabetes.
| Gene set | Total # genes | # genes without SNPs in vicinity | # genes removed due to physical clustering in genome* | Effective # genes‡ | Nominal GSEA p-value |
| Nuclear regulators of mitochondrial genes | 16 | 0 | 0 | 16 | 0.1889 |
| Oxidative phosphorylation genes | 91 | 0 | 0 | 91 | 0.4722 |
| Nuclear-encoded mitochondrial genes | 966 | 11 | 70 | 885 | 0.9125 |
The nominal GSEA p-value, $P_{gs}$, is the gene set enrichment p-value for a given gene set gs, calculated here using the DIAGRAM+ T2D GWA study meta-analysis and an enrichment cutoff that equals the 95th percentile of all gene p-values.
‡: The effective number of genes is the number of genes analyzed after removing genes with no SNPs in their extended gene boundaries, and after correcting for chromosomal clustering of subsets of genes in a gene set, i.e. removing all but one gene of each subset of genes assigned the same best local SNP p-value (*).
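The enrichment test behind these nominal p-values can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the MAGENTA code: the function name `gsea_enrichment_pvalue` and its parameters are hypothetical, the cutoff is taken as the p-value of the most significant 5% of genes, and the `(hits + 1) / (n_random + 1)` estimate is a common convention that the source does not specify.

```python
import random
from typing import Dict, Iterable


def gsea_enrichment_pvalue(
    gene_p: Dict[str, float],     # corrected gene p-value for every gene in the genome
    gene_set: Iterable[str],      # gene names in the set of interest
    top_fraction: float = 0.05,   # "95th percentile" cutoff, i.e. top 5% of genes
    n_random: int = 10_000,       # number of randomly sampled gene sets
    seed: int = 0,
) -> float:
    """Fraction of random gene sets with at least as many top-ranked genes as gene_set."""
    names = sorted(gene_p)
    ranked = sorted(gene_p.values())
    n_top = max(1, int(round(top_fraction * len(ranked))))
    cutoff = ranked[n_top - 1]  # p-value of the last gene inside the top fraction

    # Only genes that actually have a score can be tested.
    members = [g for g in gene_set if g in gene_p]
    observed = sum(gene_p[g] <= cutoff for g in members)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_random):
        # Random gene set of identical size, drawn from the whole genome.
        sample = rng.sample(names, len(members))
        if sum(gene_p[g] <= cutoff for g in sample) >= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)
```

A gene set with no genes above the cutoff always returns 1.0 under this scheme, since every random set has at least zero top-ranked genes.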
Table 4. Mitochondria-related gene sets are not enriched for associations with type 2 diabetes-related glycemic traits.
| Glycemic trait | Nuclear-encoded mitochondrial genes | OXPHOS genes | Nuclear regulators of mitochondrial genes |
| Fasting glucose | 0.1255 | 0.8354 | 0.5568 |
| Fasting insulin | 0.2489 | 0.9490 | 0.1878 |
| 2 hour glucose | 0.3026 | 0.6696 | 1.0000 |
| 2 hour insulin | 0.2900 | 0.9462 | 1.0000 |
| HOMA-IR | 0.6567 | 0.9429 | 0.1855 |
| HOMA-B | 0.7678 | 0.8375 | 0.5661 |
| HbA1c | 0.0179‡ | 0.9901 | 1.0000 |
Values are the nominal gene set enrichment p-value, $P_{gs}$, for gene set gs, computed for each glycemic trait separately. The enrichment cutoff calculated for each phenotype is the 95th percentile of all gene p-values computed from the corresponding GWA study meta-analysis. HOMA-IR is an index for insulin resistance, HOMA-B is an index for β-cell function, and HbA1c represents glycated hemoglobin concentrations, which are a measure of long-term plasma glucose concentrations.
‡: Not significant after Bonferroni correction (the most stringent cutoff is p<0.002, given 3 gene sets and 8 traits; a less stringent cutoff is p<0.0083, correcting for 3 gene sets and 2 traits because the glucose- and insulin-related traits are correlated).
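The two Bonferroni cutoffs quoted above can be reproduced with simple arithmetic, assuming a family-wise significance level of 0.05 (an assumption; the level is not stated here, but it matches both quoted cutoffs). The helper name `bonferroni_cutoff` is hypothetical.

```python
def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance cutoff under a Bonferroni correction."""
    return alpha / n_tests


alpha = 0.05  # assumed family-wise significance level

# Most stringent: 3 gene sets x 8 traits = 24 tests.
strict_cutoff = bonferroni_cutoff(alpha, 3 * 8)

# Less stringent: 3 gene sets x 2 effective traits = 6 tests.
relaxed_cutoff = bonferroni_cutoff(alpha, 3 * 2)

# Nominal p-value for HbA1c and nuclear-encoded mitochondrial genes (from the table).
hba1c_p = 0.0179
hba1c_significant = hba1c_p < relaxed_cutoff
```

Under either cutoff the HbA1c result stays above the threshold, consistent with the footnote.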