Hong-Hee Won, Pradeep Natarajan, Amanda Dobbyn, Daniel M Jordan, Panos Roussos, Kasper Lage, Soumya Raychaudhuri, Eli Stahl, Ron Do.
Abstract
Large genome-wide association studies (GWAS) have identified many genetic loci associated with risk for myocardial infarction (MI) and coronary artery disease (CAD). Concurrently, efforts such as the National Institutes of Health (NIH) Roadmap Epigenomics Project and the Encyclopedia of DNA Elements (ENCODE) Consortium have provided unprecedented data on functional elements of the human genome. In the present study, we systematically investigate the biological link between genetic variants associated with this complex disease and their impacts on gene function. First, we examined the heritability of MI/CAD according to genomic compartments. We observed that single nucleotide polymorphisms (SNPs) residing within nearby regulatory regions show significant polygenicity and contribute 59–71% of the heritability for MI/CAD. Second, we showed that the polygenicity and heritability explained by these SNPs are enriched in histone modification marks in specific cell types. Third, we found that the 45 MI/CAD-associated SNPs identified by large-scale GWAS are significantly overrepresented within certain functional elements of the genome, particularly active enhancer and promoter regions. Finally, we observed significant heterogeneity of this signal across cell types, with strong signals in adipose nuclei as well as in brain and spleen cell types. These results suggest that the genetic etiology of MI/CAD is largely explained by tissue-specific regulatory perturbation within the human genome.
Year: 2015 PMID: 26509271 PMCID: PMC4625039 DOI: 10.1371/journal.pgen.1005622
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Contributions of three genomic compartments to the polygenicity of MI/CAD.
Polygenic risk score analysis was performed across three genomic compartments. The top bar plot shows the strength of association from the polygenic risk score analysis, whereas the bottom bar plot shows the number of SNPs within each compartment. The strongest polygenic association signals were within noncoding regions adjacent to protein-coding genes (“genic noncoding”). MI, myocardial infarction; CAD, coronary artery disease; SNP, single nucleotide polymorphism. Genic coding: variants that encode amino acid sequence within ±10 kilobases of the 3′ or 5′ untranslated regions of a gene. Genic noncoding: variants that do not encode amino acid sequence within ±10 kilobases of the 3′ or 5′ untranslated regions of a gene. Intergenic: variants located more than 10 kilobases beyond the 3′ or 5′ untranslated regions of a gene.
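The compartment definitions and a per-compartment polygenic score can be sketched together. This is a minimal illustration under stated assumptions, not the authors' pipeline: each gene is represented by a hypothetical `(chrom, start, end)` span covering its 3′ and 5′ UTRs, each SNP carries a hypothetical `is_coding` flag plus a discovery-GWAS weight and P value, and the P<0.05 inclusion rule is borrowed from the Fig 2 legend. A compartment's score is the weighted sum of risk-allele dosages over the SNPs assigned to it; testing that score for association with case status in a target cohort is left out.

```python
from dataclasses import dataclass

WINDOW_BP = 10_000  # ±10 kb around each gene, per the compartment definitions


@dataclass
class Snp:
    snp_id: str
    chrom: str
    pos: int
    is_coding: bool  # hypothetical annotation: True if the variant encodes amino acid sequence
    weight: float    # per-allele effect (e.g. log odds ratio) from the discovery GWAS
    p_value: float   # discovery association P value


def classify_compartment(snp, genes, window=WINDOW_BP):
    """Return 'genic coding', 'genic noncoding' or 'intergenic' for one SNP.

    genes: list of (chrom, start, end) spans, each covering a gene's UTR-to-UTR extent.
    """
    near_gene = any(
        chrom == snp.chrom and start - window <= snp.pos <= end + window
        for chrom, start, end in genes
    )
    if not near_gene:
        return "intergenic"
    return "genic coding" if snp.is_coding else "genic noncoding"


def compartment_scores(dosages, snps, genes, p_threshold=0.05):
    """Per-compartment polygenic score for one individual.

    dosages: dict snp_id -> risk-allele count (0, 1 or 2).
    Only SNPs with discovery P < p_threshold contribute.
    """
    scores = {"genic coding": 0.0, "genic noncoding": 0.0, "intergenic": 0.0}
    for snp in snps:
        if snp.p_value < p_threshold and snp.snp_id in dosages:
            scores[classify_compartment(snp, genes)] += snp.weight * dosages[snp.snp_id]
    return scores


# Toy inputs (hypothetical coordinates and weights, not study data).
genes = [("chr1", 1_000_000, 1_100_000)]
snps = [
    Snp("snp_a", "chr1", 1_050_000, True, 0.20, 1e-4),   # inside the gene, coding
    Snp("snp_b", "chr1", 1_105_000, False, 0.10, 0.01),  # 5 kb past the gene end
    Snp("snp_c", "chr1", 2_000_000, False, 0.05, 0.03),  # far from any gene
    Snp("snp_d", "chr1", 3_000_000, False, 0.30, 0.40),  # dropped: P >= 0.05
]
scores = compartment_scores({"snp_a": 1, "snp_b": 2, "snp_c": 2, "snp_d": 2}, snps, genes)
print(scores)  # {'genic coding': 0.2, 'genic noncoding': 0.2, 'intergenic': 0.1}
```

In a real analysis the per-individual scores for each compartment would then be regressed on case/control status in an independent cohort, which is the association strength the top bar plot summarizes.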
Heritability of MI/CAD explained by three genomic compartment sets.
We calculated the SNP-heritability in three genomic compartment sets for MI/CAD in a meta-analysis of the MIGen and WTCCC CAD studies using the Genome-wide Complex Trait Analysis (GCTA) software. Both the “genic coding” and “genic noncoding” compartments explained more variance than expected from their share of SNPs (enrichment of variance > 1).
| Genomic compartment | Variance¹ | V-SE¹ | V-P¹ | Number of SNPs | % Variance of total | % SNPs of total | Enrichment of variance² | Deviation from expected variance (P)³ |
|---|---|---|---|---|---|---|---|---|
| Genic coding | 0.042 | 0.023 | 0.07 | 37,142 | 10.0 | 0.5 | 19.1 | 0.088 |
| Genic noncoding | 0.25 | 0.041 | 1×10−9 | 3,355,483 | 58.9 | 47.3 | 1.2 | 0.23 |
| Intergenic | 0.13 | 0.034 | 0.0001 | 3,703,319 | 31.1 | 52.2 | 0.6 | 0.0089 |
| Whole genome as sum | 0.42 | | | 7,095,944 | 100.0 | 100.0 | 1.0 | |
Heritability estimates were first inferred independently in MIGen and in WTCCC CAD from a single model involving three variance components (“genic coding”, “genic noncoding” and “intergenic”) using the GCTA software [22,23]. The estimates shown here come from a meta-analysis of the variance and standard error (V-SE) from these two models, weighted by inverse variance.
¹Variance and V-SE are estimates of the ratio of genetic variance to phenotypic variance for the specified variance component, whereas the P value (V-P) is from a likelihood ratio test of a reduced model, in which the specified genetic variance component is dropped, against the full model, using the restricted maximum likelihood method in the GCTA software [22,23].
²Enrichment of variance was calculated as % variance of total divided by % SNPs of total. MI, myocardial infarction; CAD, coronary artery disease; SNP, single nucleotide polymorphism.
³P value for the difference between the observed variance and the expected variance (whole-genome variance as sum multiplied by % SNPs of total). Genic coding: variants that encode amino acid sequence within ±10 kilobases of the 3′ or 5′ untranslated regions of a gene. Genic noncoding: variants that do not encode amino acid sequence within ±10 kilobases of the 3′ or 5′ untranslated regions of a gene. Intergenic: variants located more than 10 kilobases beyond the 3′ or 5′ untranslated regions of a gene.
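The arithmetic behind the table notes can be sketched as follows. `inverse_variance_meta` is a standard fixed-effect inverse-variance combination; the per-study inputs in its usage line are hypothetical, since the table reports only pooled values. `compartment_summary` recomputes % variance, % SNPs and the enrichment ratio from the tabulated variances and SNP counts (the total variance from the rounded components is 0.422 rather than 0.42). The notes do not name the test behind the deviation P value, so we assume a two-sided normal test of observed minus expected variance divided by the component's V-SE; with the rounded inputs this lands near, but not exactly on, the last column (about 0.08, 0.22 and 0.008 versus 0.088, 0.23 and 0.0089).

```python
import math


def inverse_variance_meta(estimates, standard_errors):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


def compartment_summary(components):
    """components: dict name -> (variance, v_se, n_snps).

    Returns % variance, % SNPs, enrichment of variance and an (assumed)
    two-sided normal-test P value for observed minus expected variance.
    """
    total_var = sum(var for var, _, _ in components.values())
    total_snps = sum(n for _, _, n in components.values())
    summary = {}
    for name, (var, se, n) in components.items():
        frac_var = var / total_var
        frac_snps = n / total_snps
        expected = total_var * frac_snps  # whole-genome variance x share of SNPs
        z = (var - expected) / se         # assumption: z-test using the component's V-SE
        summary[name] = {
            "pct_variance": 100 * frac_var,
            "pct_snps": 100 * frac_snps,
            "enrichment": frac_var / frac_snps,
            "p_deviation": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal P
        }
    return summary


# Pooled values copied from the table.
table = {
    "genic coding": (0.042, 0.023, 37_142),
    "genic noncoding": (0.25, 0.041, 3_355_483),
    "intergenic": (0.13, 0.034, 3_703_319),
}
summary = compartment_summary(table)
for name, row in summary.items():
    print(f"{name:16s} enrichment={row['enrichment']:.2f} P={row['p_deviation']:.3f}")

# Hypothetical per-study estimates (not from MIGen or WTCCC) to show the pooling step.
pooled, pooled_se = inverse_variance_meta([0.30, 0.20], [0.06, 0.05])
```

Inverse-variance weighting lets the study with the smaller standard error pull the pooled estimate toward itself, which is why the hypothetical pooled value sits closer to 0.20 than to 0.30.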
Fig 2. Polygenic, enrichment and heritability analyses of three histone modification marks across cell types.
We performed three analyses to test for cell type-specific effects on the genetic risk for MI/CAD. Analyses were conducted on SNPs residing in three histone marks (H3K27ac, H3K4me3 and H3K9ac) present in the different cell types. (A) Polygenic risk score analysis. We performed polygenic risk score association analysis using SNPs with discovery association P<0.05 in MIGen. The negative logarithm of the P value from association testing of the polygenic risk score in WTCCC CAD is shown. Cell types are sorted by the strength of polygenic association. The orange vertical line marks the significance threshold (α = 0.05). (B) Enrichment of association. Enrichment analyses compared the proportion of variants passing a given association P threshold in a variant set with the corresponding proportion in a baseline set. Three association P thresholds from the CARDIoGRAM study were tested: 5×10−7, 5×10−6 and 5×10−5 [24]. The variant sets were SNPs in the specified histone mark present in the indicated cell type. The baseline set consisted of SNPs outside these histone marks and within 10 kilobases (kb) of the protein-coding regions of the genome; to reduce the effects of linkage disequilibrium, baseline SNPs were required to be at least 5 kb from the histone marks. In the plot, each triangular point represents the strongest enrichment result for each mark in each cell type across the three association P thresholds. (C) Heritability analysis. Heritability analysis was performed within histone marks in the MIGen study. Each point represents the variance in liability estimated from a joint model with two variance components, fitted using the Genome-wide Complex Trait Analysis software [22,23]: 1) SNPs in the specified histone mark present in the indicated cell type and 2) all other SNPs outside these regions.
The variance in liability is the estimated ratio of genetic variance to phenotypic variance for the specified variance component (i.e., all SNPs within the specified histone mark), whereas the P value is from a likelihood ratio test of a reduced model, in which the specified genetic variance component is dropped, against the full model, using the restricted maximum likelihood method in the Genome-wide Complex Trait Analysis software [22,23]. MI, myocardial infarction; CAD, coronary artery disease; SNP, single nucleotide polymorphism.
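The baseline selection and enrichment comparison of panel B can be sketched on a single chromosome with integer positions (a simplifying assumption). `baseline_snps` keeps SNPs within 10 kb of a coding interval and at least 5 kb from every histone-mark interval; `fold_enrichment` divides the fraction of variant-set SNPs passing a P threshold by the same fraction in the baseline. The legend does not say which statistical test is applied or how the "strongest" result is chosen, so taking the largest fold across the three CARDIoGRAM thresholds is our assumption; the study may instead rank by an enrichment P value.

```python
import math

THRESHOLDS = (5e-7, 5e-6, 5e-5)  # CARDIoGRAM association P thresholds from the legend


def distance_to_interval(pos, start, end):
    """Base-pair distance from pos to [start, end]; 0 if pos lies inside."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def baseline_snps(positions, mark_intervals, coding_intervals, near_bp=10_000, gap_bp=5_000):
    """SNPs within near_bp of a coding interval and at least gap_bp from every mark."""
    kept = []
    for pos in positions:
        near_coding = any(distance_to_interval(pos, s, e) <= near_bp for s, e in coding_intervals)
        clear_of_marks = all(distance_to_interval(pos, s, e) >= gap_bp for s, e in mark_intervals)
        if near_coding and clear_of_marks:
            kept.append(pos)
    return kept


def fold_enrichment(test_p, baseline_p, threshold):
    """Fraction of test SNPs below threshold over the same fraction in the baseline."""
    frac_test = sum(p < threshold for p in test_p) / len(test_p)
    frac_base = sum(p < threshold for p in baseline_p) / len(baseline_p)
    return frac_test / frac_base if frac_base > 0 else math.inf


def strongest_enrichment(test_p, baseline_p, thresholds=THRESHOLDS):
    """(fold, threshold) with the largest fold across thresholds (assumed criterion)."""
    return max((fold_enrichment(test_p, baseline_p, t), t) for t in thresholds)


# Toy example (hypothetical positions and P values, not study data).
baseline = baseline_snps([100, 8_000, 12_000, 20_000],
                         mark_intervals=[(14_000, 15_000)],
                         coding_intervals=[(1_000, 2_000)])
best = strongest_enrichment([1e-8, 2e-8, 1e-5, 0.3],
                            [1e-8, 1e-6, 1e-5, 0.2, 0.4, 0.6, 0.8, 0.9])
print(baseline, best)
```

Here position 12,000 is dropped because it lies only 2 kb from the mark, and 20,000 because it is more than 10 kb from the coding interval.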
Fig 3. Hierarchical clustering of 45 MI/CAD GWAS SNPs and specific cell types for a histone modification mark (H3K27ac).
We mapped the 45 MI/CAD GWAS SNPs, together with SNPs in high linkage disequilibrium (r² ≥ 0.8), to H3K27ac in different cell types. Hierarchical clustering was based on the presence or absence of a SNP residing in H3K27ac in each cell type and was performed using the heatmap function in R (R Project for Statistical Computing). We observed distinct patterns across GWAS loci and cell types. For example, 12 of the 45 GWAS loci overlapped H3K27ac in more than 80% of the cell types, whereas 13 of the 45 did so in fewer than 20%. Red indicates that the lead SNP or a tag SNP (r² ≥ 0.8) resides in H3K27ac in the given cell type (see S9 and S10 Figs for H3K9ac and H3K4me3, respectively). MI, myocardial infarction; CAD, coronary artery disease; GWAS, genome-wide association study; SNP, single nucleotide polymorphism.
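A rough Python analogue of this analysis is sketched below on a toy 0/1 matrix (rows are loci, columns are cell types; 1 means the lead SNP or an r² ≥ 0.8 proxy lies in H3K27ac). `sharing_summary` applies the legend's more-than-80% and fewer-than-20% cut-offs, and `complete_linkage_order` is a naive agglomerative clustering with complete linkage on Euclidean distance, which matches the defaults of R's `hclust` and `dist` that `heatmap` calls. R's `heatmap` also reorders dendrogram branches by row means, so leaf orders from this sketch will not generally match the published figure; the loci names are hypothetical.

```python
import math
from itertools import combinations


def sharing_summary(presence, high=0.8, low=0.2):
    """Split loci by the fraction of cell types in which they overlap the mark.

    presence: dict locus -> list of 0/1 flags, one per cell type.
    Returns (loci above `high`, loci below `low`).
    """
    frac = {locus: sum(flags) / len(flags) for locus, flags in presence.items()}
    broad = [locus for locus, f in frac.items() if f > high]
    restricted = [locus for locus, f in frac.items() if f < low]
    return broad, restricted


def complete_linkage_order(presence):
    """Naive agglomerative clustering (complete linkage, Euclidean distance).

    Repeatedly merges the two clusters whose farthest members are closest,
    and returns the loci in the order produced by the merges.
    """
    clusters = [[locus] for locus in presence]

    def linkage(a, b):
        return max(math.dist(presence[x], presence[y]) for x in a for y in b)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters[0]


# Toy presence/absence matrix (hypothetical loci, four cell types).
presence = {
    "locus_A": [1, 1, 1, 1],
    "locus_B": [1, 1, 1, 0],
    "locus_C": [0, 0, 0, 0],
    "locus_D": [0, 0, 0, 1],
}
broad, restricted = sharing_summary(presence)
order = complete_linkage_order(presence)
print(broad, restricted, order)  # ['locus_A'] ['locus_C'] ['locus_A', 'locus_B', 'locus_C', 'locus_D']
```

The naive loop is cubic in the number of loci per merge step, which is fine for 45 loci; larger matrices would call for a library implementation.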
Significant enrichment of 45 MI/CAD-associated SNPs in specific cell types detected by enrichment analysis.
We examined whether the 45 MI/CAD-associated loci were enriched in regions of inferred strong enhancer chromatin states [27] in specific cell types, using NIH Roadmap data and two mammalian conservation algorithms, GERP and SiPhy-omega, as implemented in HaploReg v2 [26]. We observed significant enrichment of MI/CAD-associated SNPs in specific cell types, including adipose nuclei, spleen and brain tissue. MI, myocardial infarction; CAD, coronary artery disease; SNP, single nucleotide polymorphism.
| Cell types | Observed SNPs | Expected SNPs | Fold | P | Corrected P |
|---|---|---|---|---|---|
| Brain substantia nigra | 10 | 1.3 | 7.7 | 1.00×10−6 | 8.10×10−5 |
| Brain angular gyrus | 10 | 1.7 | 6.1 | 4.00×10−6 | 3.24×10−4 |
| Adipose nuclei | 8 | 1.3 | 6.2 | 3.80×10−5 | 3.08×10−3 |
| Induced pluripotent stem DF 19.11 cell line | 7 | 1 | 7.2 | 4.80×10−5 | 3.89×10−3 |
| Spleen | 8 | 1.4 | 5.7 | 6.80×10−5 | 5.51×10−3 |
| Brain anterior caudate | 8 | 1.5 | 5.3 | 1.08×10−4 | 8.75×10−3 |
| Brain cingulate gyrus | 8 | 1.5 | 5.3 | 1.14×10−4 | 9.23×10−3 |
| Embryonic stem cell line | 4 | 0.3 | 11.9 | 3.67×10−4 | 2.97×10−2 |
| Mobilized CD34 primary cells | 5 | 0.6 | 8.1 | 3.78×10−4 | 3.06×10−2 |
| Brain mid frontal lobe | 7 | 1.5 | 4.8 | 5.64×10−4 | 4.57×10−2 |
| Gastric | 6 | 1 | 5.8 | 5.71×10−4 | 4.63×10−2 |
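The Fold and Corrected P columns can be checked directly from the table. In every row the corrected P value is about 81 times the raw P value, which is consistent with a Bonferroni correction over 81 tests; that reading is our inference from the numbers, not something the legend states. The sketch below recomputes fold enrichment as observed/expected and the implied correction factor. Recomputed folds drift from the Fold column for rows with small expected counts (for example 4/0.3 ≈ 13.3 versus 11.9) because the expected counts are printed rounded to one decimal.

```python
def bonferroni(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


# (cell type, observed SNPs, expected SNPs, P, corrected P), copied from the table.
ROWS = [
    ("Brain substantia nigra", 10, 1.3, 1.00e-6, 8.10e-5),
    ("Brain angular gyrus", 10, 1.7, 4.00e-6, 3.24e-4),
    ("Adipose nuclei", 8, 1.3, 3.80e-5, 3.08e-3),
    ("Induced pluripotent stem DF 19.11 cell line", 7, 1.0, 4.80e-5, 3.89e-3),
    ("Spleen", 8, 1.4, 6.80e-5, 5.51e-3),
    ("Brain anterior caudate", 8, 1.5, 1.08e-4, 8.75e-3),
    ("Brain cingulate gyrus", 8, 1.5, 1.14e-4, 9.23e-3),
    ("Embryonic stem cell line", 4, 0.3, 3.67e-4, 2.97e-2),
    ("Mobilized CD34 primary cells", 5, 0.6, 3.78e-4, 3.06e-2),
    ("Brain mid frontal lobe", 7, 1.5, 5.64e-4, 4.57e-2),
    ("Gastric", 6, 1.0, 5.71e-4, 4.63e-2),
]

N_TESTS = 81  # assumption: inferred from corrected/raw ratios, not stated in the source

for cell_type, observed, expected, p, p_corrected in ROWS:
    fold = observed / expected  # uses the rounded expected counts
    implied = p_corrected / p   # implied number of tests
    print(f"{cell_type:45s} fold={fold:5.1f} implied_tests={implied:5.1f} "
          f"bonferroni={bonferroni(p, N_TESTS):.2e}")
```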