Ruth McPherson
From the Division of Cardiology, Atherogenomics Laboratory, Ruddy Canadian Cardiovascular Genetics Centre, University of Ottawa Heart Institute, ON, Canada.
Abstract
Recent studies have led to a broader understanding of the genetic architecture of coronary artery disease and demonstrate that it largely derives from the cumulative effect of multiple common risk alleles individually of small effect size rather than rare variants with large effects on coronary artery disease risk. The tools applied include genome-wide association studies encompassing over 200 000 individuals complemented by bioinformatic approaches including imputation from whole-genome data sets, expression quantitative trait loci analyses, and interrogation of ENCODE (Encyclopedia of DNA Elements), Roadmap Epigenetic Project, and other data sets. Over 160 genome-wide significant loci associated with coronary artery disease risk have been identified using the genome-wide association studies approach, 90% of which are situated in intergenic regions. Here, I will describe, in part, our research over the last decade performed in collaboration with a series of bright trainees and an extensive number of groups and individuals around the world as it applies to our understanding of the genetic basis of this complex disease. These studies include computational approaches to better understand missing heritability and identify causal pathways, experimental approaches, and progress in understanding at the molecular level the function of the multiple risk loci identified and potential applications of these genomic data in clinical medicine and drug discovery.
Genetic risk for coronary artery disease is largely due to multiple common variants of small effect size, interacting in pathways and networks, many related to immune function, cell proliferation, and interactions at the arterial wall.
The majority of the over 160 loci identified by genome-wide association studies are in noncoding regions of the genome, implying local or long-distance effects on the expression of genes via a number of mechanisms or on their proximal regulators including microRNAs or long noncoding RNAs. Functional characterization of each of these loci remains challenging.
The vast amount of available genome-wide association studies data can be used to create robust polygenic risk scores to identify otherwise healthy individuals at high future risk for coronary artery disease.
It is a great honor to be chosen as the first Canadian to present the George Lyman Duff Memorial Lecture, established in 1956 by the Society for the Study of Arteriosclerosis. Dr Duff was one of Canada’s most distinguished pathologists, whose studies of experimental atherosclerosis in the rabbit provided a foundation for years of work that followed. Here, I will describe, in part, our research over the last decade performed in collaboration with a series of bright trainees and an extensive number of groups and individuals around the world as it applies to our understanding of the genetic basis of this complex disease.
Genetics of Coronary Artery Disease
Rare coding variants, primarily in genes regulating LDL (low-density lipoprotein) cholesterol (LDL-C) metabolism, for example, LDLR, PCSK9, and APOB, contribute significantly to premature coronary artery disease (CAD) risk, and the importance of early diagnosis and treatment of familial hypercholesterolemia is well established. However, CAD is a complex phenotype, largely driven by the cumulative effect of multiple common variants affecting numerous biological processes, and the majority of individuals with premature CAD do not harbor a causative mutation in a single gene.
Recent progress in understanding the genetics of CAD and other complex diseases has been driven by technological advances including high-throughput DNA microarray technology using chips containing up to a million DNA markers consisting of single-nucleotide polymorphisms (SNPs) that tag common variation (allele frequency ≥5%) across the human genome. Of note, these are tag SNPs that point to a causative locus but are rarely themselves functional variants. This approach makes use of linkage disequilibrium (LD), that is, the nonrandom coinheritance of genetic variants across the human genome. Comparison of the allele frequency of each SNP in cases and controls provides an agnostic approach with discovery potential, unlike the candidate gene studies of the previous century.
Given that comparison of a million markers implies a threshold of P≤5×10⁻⁸ for genome-wide association studies (GWAS) significance, success has been contingent on the allele frequency and effect size of a given variant and requires large meta-analyses involving thousands of carefully phenotyped CAD cases and controls and collaboration among many groups across the world.
Using this approach, concurrently with 2 other groups, we identified the first robust association with CAD in a large LD block containing multiple highly correlated SNPs at 9p21.3.[1-3] The early discovery of this risk locus was facilitated by its large effect size (odds ratio >1.3) and high risk-allele frequency (≈0.48). Approximately 25% of Europeans carry 2 copies of the risk allele and have a 40% increased risk of CAD in general and a 2-fold risk of premature CAD. The estimated population attributable risk is 10% to 15% of CAD incidence in the United States, making it the largest known genomic contributor to direct and indirect healthcare costs.[4] For our discovery sample, we applied an extreme phenotype approach comparing young patients with multivessel CAD to elderly, physically active, asymptomatic subjects. We demonstrated that 9p21.3 is a particularly strong locus for early disease, as later confirmed by the CARDIoGRAM study (Coronary Artery Disease Genome-Wide Replication and Meta-Analysis), in which the allele-specific odds ratio for CAD in subjects with onset before 50 years of age was 1.41 versus 1.24 for later-onset CAD.[5] The 9p21 locus associates with the extent and severity of atherosclerosis.[6,7] It has also been consistently shown that the risk conferred by this locus is independent of known CAD risk factors and associates with other vascular phenotypes including carotid atherosclerosis,[8] stroke,[9] peripheral arterial disease,[10] and abdominal aortic and intracranial aneurysms,[11] highlighting possible effects on vascular wall integrity.
A recent study by the GENIUS-CHD (Genetics of Subsequent Coronary Heart Disease) consortium, albeit a survivors’ cohort, reveals that this locus is not associated with recurrent events in patients with established CAD. It does, however, confer an increased risk for revascularization,[12] consistent with a primary role in atherosclerosis.
As of 2019, over 160 GWAS-significant loci for CAD have been identified, with close to twice this number when variants with a false discovery rate <0.05 are included.[13-16] This success has been driven by collaboration among numerous groups across the world, important initiatives such as the UK Biobank, and the ability to use whole-genome data from the 1000 Genomes Project to impute lower frequency variants. The 1000 Genomes Project phase 1, v3 data set includes over 38 million variants, half of which have a minor allele frequency (MAF) <0.005, as well as insertions and deletions (indels). These data were leveraged in a meta-analysis led by groups at Oxford, the Broad, and the Ottawa Heart Institute that included over 185 000 CAD cases and controls from 48 studies, providing imputed data on 9.4 million SNPs including 2.7 million low-frequency SNPs (MAF, 0.005–0.05).[16]
Several of these loci contain multiple independent signals. With the exception of a few including 9p21.3 and LPA, the effect size (odds ratio) is generally ≤1.08. Given that the heritability of CAD is ≈50% and conventional risk factors are moderately predictive of atherosclerotic burden,[17] it is not surprising that the majority of novel loci do not relate to lipoprotein metabolism, hypertension, or glycemic traits. Indeed, multiple biological processes are highlighted (Figure 1).
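The genome-wide significance threshold and the case-control allele comparison described above can be made concrete with a small calculation. The following is a minimal sketch in Python, assuming a Bonferroni correction of a 0.05 family-wise error rate over about one million independent tests and a simple allelic chi-square test. The function names and the allele counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int) -> float:
    """Odds ratio for the risk allele from a 2x2 table of allele counts."""
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)


def allelic_chi_square(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for the same 2x2 table."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_1df_p_value(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical allele counts (2 alleles per person): 2000 cases, 2000 controls.
counts = dict(case_risk=2200, case_other=1800, ctrl_risk=1920, ctrl_other=2080)

threshold = bonferroni_threshold(0.05, 1_000_000)
odds_ratio = allelic_odds_ratio(**counts)
p_value = chi_square_1df_p_value(allelic_chi_square(**counts))

print(f"threshold={threshold:.1e}  OR={odds_ratio:.2f}  P={p_value:.1e}")
print("genome-wide significant:", p_value <= threshold)
```

With these made-up counts, a risk-allele frequency of 0.55 in cases versus 0.48 in controls clears the threshold. A variant with an odds ratio near 1.08 would need a far larger sample to do so, which is why the later discoveries depended on large meta-analyses.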
Figure 1.
Genome-wide association studies (GWAS)–identified coronary artery disease (CAD) loci indicate a role for multiple biological processes including lipid metabolism, inflammation and innate immunity, thrombosis, vascular remodeling and extracellular matrix integrity, and NO signaling. For many novel GWAS signals, the link between the associated gene(s) and processes underlying CAD risk remain unknown. ECM indicates extracellular matrix; LDL, low-density lipoprotein; LP(a), lipoprotein (a); and TG-rich LP, triglyceride-rich lipoproteins.
Individually, several GWAS findings provide new understanding of processes related to atherosclerosis or its metabolic precedents. The poster child for this approach was the identification of a 1p13 locus encompassing CELSR2/PSRC1/SORT1 for LDL-C and CAD that prompted a series of elegant functional studies leading to a previously unknown role for sortilin-1 in lipoprotein metabolism.[18]
The discovery of the 9p21.3 risk locus has been instrumental to our understanding of vascular smooth muscle cell (VSMC) behavior in atherosclerosis. The risk haplotype consists of ≈60 SNPs in perfect LD, some of which overlap the terminal exons of a long noncoding RNA, ANRIL/CDKN2B-AS1. Both long (19 exons) and short (13 exons) transcripts of ANRIL are identified by RNA sequencing. In an earlier study, we demonstrated increased expression of short variants of ANRIL and decreased expression of the long variant in carriers of the risk alleles. Relevant to atherosclerosis, genome-wide expression profiling demonstrated upregulation of gene sets modulating cellular proliferation in carriers of the risk allele.[19] In a recent elegant study, Lo Sardo et al[20] generated multiple hiPSC (human induced pluripotent stem cell) clones from homozygous carriers of the risk (RR) and nonrisk (NN) alleles and then used TALENs (transcription activator-like effector nucleases) to delete the 60-kb haplotype in each of the hiPSC clones.
Whole-transcriptome analysis during VSMC development showed that deletion of the risk haplotype in RR clones resulted in transcriptional profiles similar to those of the NN clones. Thus, the RR haplotype essentially confers a dedifferentiated state with more proliferative and less contractile VSMCs. The RR VSMCs exhibit globally altered transcriptional networks that encompass a number of GWAS-identified CAD genes converging on aberrant adhesion, contraction, and proliferation pathways. The risk haplotype is associated with preferential expression of the shorter ANRIL isoform that appears to mediate, in part, these phenotypic and functional changes.[20]
Identification of SNPs at 8q24 associated with plasma lipoproteins and hepatic fat downstream from the TRIB1 gene uncovered a previously unknown role for tribbles-1 in hepatic lipid metabolism, an effect that may be mediated by a long noncoding RNA that we have named TRIBAL.[21] Seven independent signals at the COL4A1/4A2 locus have highlighted the role of the extracellular matrix in atherosclerosis.[22]
Integrative Analyses to Extend GWAS Findings
In recent studies, we have mined additional data sets to extend GWAS findings. Complex phenotypes including atherosclerosis and its clinical sequelae derive from multiple nonlinear and interactive processes. Thus, genetic associations with disease can only be understood by the integration of multiple layers of information and the knowledge that many genetic effects are isolated in time and place, that is, susceptible to epigenetic effects that take place during a particular phase of development or cell cycle and are specific to a particular cell type or tissue.[23]
The simplest level of integration includes pairwise quantitative trait loci (QTL) analysis of mRNA (expression QTL [eQTL]), miRNA-eQTL (miRQTL), protein eQTL, etc. An eQTL effect, that is, an effect on mRNA expression of a nearby (cis-eQTL) or distant (trans-eQTL) gene, provides strong evidence for a functional effect of a given SNP or one in close LD. Whereas some eQTLs are shared among tissues, others are tissue specific, and interrogation of eQTL databases, for example, GTEx (Genotype-Tissue Expression), requires careful consideration of the cell type or tissue of interest.[24] Finally, as noted above, the direct contribution of any given locus may be temporally restricted; for instance, early transient activation followed by sustained epigenetic effects.
Gene-Environment Interaction
Identifying gene-environment interactions can further our understanding of the architecture and underlying biology of complex traits.[25] A difference in the effect size of a genetic variant in individuals differing in an environmental or clinical exposure suggests the presence of a gene-environment interaction, rather than simply additive effects of the 2 exposures.[26] Unfortunately, the ability to accurately measure environmental exposures is a limiting factor. However, there have been some successes. Saleheen et al[27] demonstrated that the CAD-protective effect of a GWAS-identified variant that lowers expression of ADAMTS7 is attenuated in smokers, an effect attributed to their higher vascular ADAMTS7 expression. More recently, Hindy et al[28] showed a highly significant interaction of smoking with a polygenic risk score (PRS) for CAD, with a marked exacerbation of risk in current and, to a lesser extent, past smokers. We utilized a weighted PRS constructed from loci previously reported by the Global Lipids Genetics Consortium and demonstrated, in a large population of lean and obese individuals, a significant interaction between obesity status and the effect of risk alleles on each of triglycerides and HDL (high-density lipoprotein) cholesterol but not LDL-C. For plasma triglycerides, the effect size (β) of the weighted triglyceride PRS in the obese population was nearly double that of the lean population (0.48 versus 0.26).[29]
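A weighted PRS and the comparison of its effect size across exposure strata can be sketched as follows. This is an illustration only, assuming a score defined as the sum of risk-allele dosages multiplied by per-allele weights and a per-stratum least-squares slope. The weights, genotypes, and trait values are toy numbers constructed so that the two slopes equal the β values quoted above. They are not data from the cited study, and a formal analysis would test a PRS-by-obesity product term in a single regression model with covariates.

```python
def weighted_prs(dosages, weights):
    """Sum of risk-allele dosages (0, 1, or 2) multiplied by per-allele weights."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical per-allele weights for three variants.
weights = [0.10, 0.05, 0.20]

# Hypothetical genotypes: one row per person, one dosage per variant.
genotypes = [[0, 0, 0], [1, 0, 0], [2, 1, 0], [1, 1, 1], [2, 2, 2]]
scores = [weighted_prs(g, weights) for g in genotypes]

# Toy trait values: the same scores act more strongly in the obese stratum.
lean_trait = [1.0 + 0.26 * s for s in scores]
obese_trait = [1.2 + 0.48 * s for s in scores]

slope_lean = ols_slope(scores, lean_trait)
slope_obese = ols_slope(scores, obese_trait)
print(f"beta (lean)={slope_lean:.2f}  beta (obese)={slope_obese:.2f}")
```

A slope that differs between strata, rather than a constant shift in the trait, is the signature of interaction as opposed to additive effects of the two exposures.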
Pathway Analysis
Clinically informative polymorphisms related to atherosclerosis occur in systems of closely interacting genes.[30] Thus, weakly associated variants that do not reach significance in GWAS may provide important information on the biological basis of disease when such variants cluster within a common functional module or pathway.[31] Beyond single SNP interrogation of GWAS analysis, novel insights into the genetic architecture of CAD can be obtained by interpreting genetic findings in the context of biological processes and functional interactions among genes. We performed a 2-stage pathway-based gene set enrichment analysis of 16 GWAS data sets for CAD, comprising over 25 000 CAD patients and >66 000 controls,[31] using the i-GSEA4GWAS tool[32] and the Reactome pathway database.[33] A total of 32 Reactome pathways demonstrated convincing association with CAD (Figure 2). These resided in core biological processes and included pathways relevant to innate immunity, extracellular matrix integrity, axon guidance, PDGF (platelet-derived growth factor) signaling, NOTCH, and the TGFβ (transforming growth factor-β)/SMAD receptor complex, confirming and adding to a substantial body of basic research.
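The idea that modest signals become informative when they cluster in a pathway can be illustrated with a simple over-representation test. The sketch below uses a one-sided hypergeometric test, which is a deliberately simpler stand-in and not the method implemented in i-GSEA4GWAS. The function name and the gene counts are hypothetical.

```python
from math import comb


def pathway_enrichment_p(n_universe: int, n_pathway: int,
                         n_hits: int, n_overlap: int) -> float:
    """One-sided hypergeometric P value for pathway over-representation.

    n_universe: genes tested in total
    n_pathway:  genes annotated to the pathway
    n_hits:     genes flagged as associated with the trait
    n_overlap:  flagged genes that belong to the pathway
    Returns P(overlap >= n_overlap) if the flagged genes were drawn at random.
    """
    total = comb(n_universe, n_hits)
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 000 genes tested, 150 in the pathway,
# 400 flagged as associated, 12 of them in the pathway (3 expected by chance).
p = pathway_enrichment_p(20_000, 150, 400, 12)
print(f"enrichment P = {p:.2e}")
```

In practice many pathways are tested at once, so the resulting P values need correction for multiple testing, for example by false discovery rate control.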
Figure 2.
Major biological processes highlighted by pathway analysis of genome-wide association studies including 29 391 coronary artery disease cases and 66 819 controls. Significant Reactome pathways are indicated in bold. c-Myc indicates c-master regulator of cell cycle entry and proliferative metabolism; HDL, high-density lipoprotein; Hes-1, hairy and enhancer of split-1; IFN-β, interferon-beta; TGFβ, transforming growth factor-β; and VSMC, vascular smooth muscle cell.
To extend this analysis, we then interrogated molecular networks based on interactions among diverse molecules including genes, proteins, and metabolites that underlie the architecture of complex disease.[34] As described,[31] network analysis of unique genes within the replicated pathways further confirmed known processes such as lipid metabolism but also revealed a number of interconnected functional and topologically interacting modules representing novel associations. These included the semaphorin-regulated axonal guidance pathway, a known modulator of cellular adhesion, migration, proliferation, differentiation, survival, and synaptic plasticity, processes involving highly conserved families of guidance molecules, including netrins, slits, semaphorins, and ephrins, and their cognate receptors.[35] Network centrality analysis further identified genes (eg, NCAM1, FYN, and FURIN) that may be central to the maintenance and functioning of several of the replicated pathways.[31]
Heritability of CAD
Decades of epidemiological research have identified multiple psychosocial and lifestyle variables associated with CAD risk. These risk factors are partially modifiable, account for approximately half of the burden of CAD risk in Western countries, and are used clinically in the application of imperfect risk scores to estimate individual risk. The other half of CAD susceptibility is heritable, that is, due to genetic and possibly epigenetic factors. This is termed the broad-sense heritability (H²) of CAD, and family-based and twin-pair studies estimate H² to be 40% to 60%. The analysis of the contribution of GWAS findings commonly focuses on the extent to which they explain narrow-sense heritability (h²), that is, the fraction of risk attributable to additive genetic effects.[36]
One of the challenges in understanding the genetics of CAD and other complex traits is the discrepancy between the fraction of CAD that is estimated to be heritable and that explained by GWAS-identified variants. The 58 GWAS-significant loci identified in our 2015 article explained only ≈11% of h² for CAD. Beyond considering only GWAS-significant loci (P≤5×10⁻⁸), in a study also leveraging UK Biobank data and using a definition of CAD that included angina, 304 independent variants associated at a 5% false discovery rate were identified that together explain 21.2% of estimated CAD heritability (h²).[14] The phenomenon of missing heritability is attributed to the notion that much of the heritability of a complex trait lies in low-frequency SNPs with extremely low effect size that are not detectable at current sample sizes. Beyond considering only GWAS or false discovery rate–significant associations, a mega-analysis comprising 1 745 180 genetic variants across the genome increased this number to 26.8%.[37]
Computational Approaches to the Understanding of Missing Heritability
To address this issue, we and others have used mixed linear models (MLMs). Unlike the univariate GWAS approach, MLMs quantify the overall contribution of SNPs to variation of a trait without testing the SNPs individually or setting a significance threshold level. MLMs can decompose the phenotypic variance into genetic and environmental components and estimate the heritability of a trait or the genetic correlations between 2 groups. Because these methodologies require knowledge of the genetic relatedness of individuals, individual genome-wide genotypes are required rather than the summary data provided by many GWAS centers. MLMs also allow investigation of heritability enrichment in a subset of SNPs defined according to functional annotation, providing the opportunity to leverage annotation information and genome-wide genotype data to gain novel insights into the biology of complex disease.
Exploiting the 1000 Genomes imputed data, we used the MLM approach implemented in GCTA (Genome-wide Complex Trait Analysis) to further investigate the genetic basis of CAD.[38] We included only genetically unrelated cases and controls, genotyped at our site, and calculated the variance explained by genome-wide autosomal SNPs. The variance estimate derives from the average genome-wide similarity between all pairs of individuals determined using all SNPs. Heritability is estimated when case-case pairs and control-control pairs are on average more similar across the genome than case-control pairs. We then partitioned this heritability by sex, gene modules, and SNP annotation. Using over 3 million autosomal genome-wide SNPs (MAF≥0.01) and MLMs in a sample of 4535 genetically unrelated cases and 2977 controls, we reported that genome-wide SNPs explain 22% of liability to CAD (55% of narrow-sense heritability). Consistent with our earlier pathway analysis,[31] we identified a number of CAD-associated modules related to immune responses, cell adhesion, and proliferation.
Of note, genes involved in inflammation accounted for one-fifth of SNP-based heritability. As expected, given that 90% of identified CAD loci are in noncoding regions of the genome, heritability-enrichment analysis highlighted epigenetic sites associated with transcriptional activity.
Taken together, these recent studies interrogating pathways and networks derived from a large body of GWAS data[31,38,39] confirm an important role for immune-mediated processes in CAD. At a single-gene level, there are robust data for a direct role of the interleukin (IL)-6 signaling pathway[40,41] in atherosclerosis, consistent with clinical trial evidence for a protective effect of inhibiting this pathway upstream with canakinumab, a monoclonal antibody against IL-1β.[42]
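The pairwise genome-wide similarity on which the mixed-model estimate rests can be sketched in a few lines. The example below computes a genetic relationship matrix from allele dosages using the commonly used standardized form, in which each SNP is centered on twice its sample allele frequency and scaled by its expected variance, and then compares mean relatedness in phenotype-concordant and phenotype-discordant pairs. It is a toy illustration of the principle described above, not the variance-component estimation itself. The function names and the four-person genotype matrix are hypothetical.

```python
from itertools import combinations


def genetic_relationship_matrix(genotypes):
    """Relationship matrix from dosages; genotypes[j][i] is person j at SNP i."""
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    # Sample frequency of the counted allele at each SNP.
    freqs = [sum(row[i] for row in genotypes) / (2.0 * n_ind) for i in range(n_snp)]
    # Monomorphic SNPs carry no information and would cause division by zero.
    used = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]
    if not used:
        raise ValueError("no polymorphic SNPs available")
    grm = [[0.0] * n_ind for _ in range(n_ind)]
    for j in range(n_ind):
        for k in range(j, n_ind):
            total = 0.0
            for i in used:
                p = freqs[i]
                total += ((genotypes[j][i] - 2 * p) * (genotypes[k][i] - 2 * p)
                          / (2 * p * (1 - p)))
            grm[j][k] = grm[k][j] = total / len(used)
    return grm


def mean_relatedness_by_pair_type(grm, is_case):
    """Mean off-diagonal relatedness for concordant and discordant pairs."""
    concordant, discordant = [], []
    for j, k in combinations(range(len(grm)), 2):
        bucket = concordant if is_case[j] == is_case[k] else discordant
        bucket.append(grm[j][k])
    return sum(concordant) / len(concordant), sum(discordant) / len(discordant)


# Hypothetical dosages for 4 people at 2 SNPs; the first two people are cases.
toy_genotypes = [[2, 0], [2, 0], [0, 2], [0, 2]]
toy_grm = genetic_relationship_matrix(toy_genotypes)
same, different = mean_relatedness_by_pair_type(toy_grm, [True, True, False, False])
print(f"concordant pairs: {same:+.2f}  discordant pairs: {different:+.2f}")
```

In real data the individuals are unrelated, the off-diagonal values are close to zero, and the signal is a very small average excess of similarity among concordant pairs spread across millions of SNPs.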
Identification of GWAS Signal for microRNAs Regulating Cardiometabolic Phenotypes
GWAS signals for a complex trait commonly associate with expression of a protein-coding gene (eQTLs). We queried whether pleiotropic effects associated with several GWAS loci may be due to effects on expression of microRNAs. Leveraging miRNA sequencing data and the 1000 Genomes imputed genotypes, we explored this question in a sample of 710 unrelated subjects of European ancestry, using data from the Framingham and the Geuvadis studies to replicate our findings.[43] At least 1 GWAS-significant miRQTL was identified for 143 circulating miRNAs. Although the majority accounted for a small portion (<1%) of the variation in plasma miRNA levels, a few explained 4% to 20%. Identified cis-miRQTLs associated with their counterpart mature miRNAs (P<0.0001), suggesting that they mainly regulate the expression of primary miRNAs, unlike the trans-miRQTLs, which appear to exert their effect through processes that affect the stability of mature miRNAs. We then used the identified miRQTLs to investigate the links between circulating miRNAs and plasma biomarkers through Mendelian randomization (MR) analysis. We found that miR-1908-5p plays an important role in regulating LDL-C levels and glycemic traits, whereas miR-10b-5p mediates the trans-regulatory effect of the ABO locus on several plasma proteins, LDL-C, and on the risk for CAD. We also demonstrated a causal relationship between plasma levels of miR-199a and LDL-C (Figure 3).[43] This analysis provides new insights relevant to lipid metabolism and is a focus of ongoing functional studies in our laboratory.
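The logic of using a miRQTL as an instrument can be illustrated with the simplest single-variant MR estimator, the Wald ratio. The sketch below divides the SNP effect on the outcome (for example, LDL-C) by the SNP effect on the exposure (the miRNA level) and approximates the standard error with a first-order delta method, assuming the two estimates are uncorrelated. It is a generic illustration and not necessarily the procedure used in the cited analysis. The function name and all input values are hypothetical.

```python
import math


def wald_ratio(beta_exposure: float, se_exposure: float,
               beta_outcome: float, se_outcome: float):
    """Single-instrument MR estimate of the exposure effect on the outcome.

    beta_exposure: SNP effect on the exposure (e.g., plasma miRNA level)
    beta_outcome:  SNP effect on the outcome (e.g., LDL-C)
    Returns (estimate, standard_error); the standard error uses a first-order
    delta-method approximation that assumes independent input estimates.
    """
    if beta_exposure == 0:
        raise ValueError("the instrument must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    variance = (se_outcome ** 2 / beta_exposure ** 2
                + beta_outcome ** 2 * se_exposure ** 2 / beta_exposure ** 4)
    return estimate, math.sqrt(variance)


# Hypothetical summary statistics for one SNP, in standard deviation units.
estimate, se = wald_ratio(beta_exposure=0.5, se_exposure=0.05,
                          beta_outcome=0.1, se_outcome=0.02)
print(f"causal estimate = {estimate:.2f} (SE {se:.3f}, z = {estimate / se:.1f})")
```

A ratio of this kind cannot by itself separate a shared causal variant from two distinct variants in LD, which is why additional tests are applied to distinguish pleiotropy from linkage.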
Figure 3.
miRNA expression quantitative trait loci (eQTL) data were integrated with summary statistics from previous genome-wide association studies to identify single-nucleotide polymorphisms (SNPs) for which the effect on blood phenotypes is mediated by miRNAs. An SNP name indicates its rsID-effect allele. At the FADS locus, miR-1908—an eQTL for rs174548—plays an important role in regulating LDL (low-density lipoprotein) cholesterol (LDL-C), fasting blood glucose (FBG), and A1c. At the TMED locus on chromosome 11, miR-199-3p/5p—an eQTL for rs3760781—regulates LDL-C levels. Positive associations are indicated by solid red lines, negative associations by dotted green lines. Summary-based Mendelian randomization differentiates between a pleiotropic effect and a linkage effect.[43]
Interrogation of Rare Coding Variants Contributing to CAD
The large body of GWAS data strongly supports the conclusion that genetic susceptibility to CAD is largely derived from the effect of multiple common SNPs individually of small effect size.[23] GWAS is limited by the inability to reliably interrogate rare variants (MAF<0.005), that is, alleles found on <1 in 200 chromosomes (carried by fewer than about 1 in 100 individuals), including nonsynonymous mutations in genes of possible relevance to CAD. This approach can, however, be successful in genetic isolates. In a GWAS that included only 809 Amish participants, Shuldiner et al identified an 11q23 SNP, 823 kb away from the APOA1/C3/A4/A5 gene cluster, that associated with markedly reduced plasma triglycerides and apparent cardioprotection. Resequencing of the region identified the causal variant, a null mutation R19X in APOC3, extremely rare in the general population, but with an MAF of 0.05 in the Amish.[44]
Whole-exome sequencing is a comprehensive approach to identify and classify coding variants linked to human disease and is of particular importance in Mendelian disease where the effect size is high and affected and unaffected family members are available for study.[45] Musunuru et al[46] performed whole-exome sequencing in a large kindred previously reported to have low levels of both triglycerides and LDL-C without evidence of atherosclerosis to uncover the protective effects of ANGPTL3 deficiency, now a novel and important drug target.[47,48]
There has been a rapid increase in the number and scale of large human sequencing projects. These include DiscovEHR,[49] UK Biobank,[50] ExAC/gnomAD,[51] and others. With reference to CAD per se, the MIGEN (Myocardial Infarction Genetics Consortium) Exome Sequencing Project reported on whole-exome sequencing efforts in ≈11 000 individuals. Of note, the identified coding variants were limited to genes relevant to lipoprotein metabolism, including LDLR,[52] APOA5,[52] APOC3,[53] and NPC1L1,[54] for which there was a large body of preexisting information. In a more recent MIGEN study that included validation in the Geisinger Health System DiscovEHR cohort, rare damaging mutations in LPL were associated with CAD. Overall, these findings were somewhat disappointing but did provide further credence to the relevance of triglyceride metabolism to atherosclerosis[55,56] and the validity of NPC1L1 as a drug target.[57,58] Given the perceived need for large sample sizes and subsequent replication, the sample included CAD cases with limited phenotypes, such as acute myocardial infarction without angiographic documentation of disease burden, and controls that may have harbored a burden of nonobstructive atherosclerosis. The challenges inherent in the demonstration of cosegregation in family members, often with a multitude of other CAD risk factors, also limit the ability to confirm novel mutations linked to atherosclerosis.
Somatic Mutations
In a novel extension to our understanding of the molecular pathogenesis of CAD, Jaiswal et al[59] have recently described an age-related disorder characterized by the acquisition of somatic mutations in hematopoietic cells. This phenomenon, termed clonal hematopoiesis of indeterminate potential, allows selective expansion of mutated clones in stem cells, lymphocytes, monocytes, and granulocytes over time. Clonal hematopoiesis of indeterminate potential is detectable in ≈10% of individuals over the age of 70 years and confers an increased risk not only for hematologic malignancies but also for CAD. Increased CAD risk varies from 2-fold for individuals with a mean age of 60 years to 4-fold for those under the age of 50 years. Four of the most commonly mutated genes, DNMT3A, TET2, ASXL1, and JAK2, showed individually significant associations with CAD. This study, which included 8255 participants from 4 case-control cohorts, adds credence to the role of inflammation in atherosclerosis. DNMT3A, TET2, and JAK2 have reported anti-inflammatory functions. Mice transplanted with Tet2 heterozygous or homozygous deficient bone marrow exhibited increased expression of several CXC chemokines including Cxcl1, Cxcl2, Cxcl3, Pf4, and Ppbp and proinflammatory cytokine genes including Il6 and Il1b and accelerated atherosclerosis. The JAK2 V617F mutation was associated with a 12.1-fold CAD risk.[59] Relevant to this finding, the JAK inhibitor tofacitinib is approved for rheumatoid arthritis and is currently under investigation for other immune-mediated diseases. The overall contribution of clonal hematopoiesis of indeterminate potential to age-related atherosclerosis remains to be fully understood but is likely to be significant.
From Locus to Function
Although GWASs have identified >150 loci associated with CAD, understanding their function has been a much more tedious process because the majority are in noncoding regions of the genome. Over 90% of GWAS-identified loci for CAD are situated in intergenic regions. Few are nonsynonymous coding variants with predicted effects on protein structure or function, implying, as supported by our computational studies, local or long-distance effects on the expression of genes via a number of mechanisms or on their proximal regulators including microRNAs or long noncoding RNAs.
Our program has included studies on the TRIB1 locus for plasma triglycerides and CAD[21,60-62] and has a particular focus on the functional interrogation of novel GWAS loci for CAD that converge on VSMC and macrophage biology, including an lncRNA, CDKN2BAS, at the 9p21.3 locus,[19] SMAD3,[63] TGFB1, MFGE8,[64] and a locus encompassing the COL4A1/4A2/CARS2[22] genes that harbors 7 independent signals for CAD.[16]
Interrogation of Noncoding Polymorphisms
The GWAS-identified SNPs are rarely causal but rather are in LD with a neighboring or even distal causal polymorphism. The standard approach is to first visualize the various LD blocks in the closest gene using Haploview software and LD data from the 1000 Genomes Browser. We then seek possible causative SNP(s) by interrogating large-scale consortium data (ENCODE [Encyclopedia of DNA Elements], Roadmap Epigenomics) for chromatin marks identified by chromatin immunoprecipitation sequencing to denote histone modifications associated with promoters, primed and active enhancer regions, transcribed genes, and transcription factor binding. This is supplemented with information in other publicly available databases (RegulomeDB). Evidence for a cis or trans effect on gene expression (GTEx [Genotype-Tissue Expression], SCAN [SNP and CNV Annotation Database]) is particularly informative where eQTL data are available on relevant tissues including liver and artery (STARNET [Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task]). Drs U. Hedin and L. Folkersen (BiKE [Biobank of Karolinska Endarterectomies] and ASAP [Advanced Study of Aortic Pathology]) have provided us with eQTL data in normal and diseased arterial wall.[10] Finally, we use extensive laboratory analyses (for example, enhancer and promoter assays, allele-specific chromatin immunoprecipitation, allele-specific expression analyses, and CRISPR-Cas9 gene editing[65]) in iPS cells and cell types relevant to atherosclerosis to identify and molecularly characterize the functional variant underlying the GWAS signal (Figure 4).
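The LD relationship between a tag SNP and a candidate causal SNP is commonly summarized as r². The following is a minimal sketch, assuming phased haplotypes for two biallelic SNPs are available as 2-character strings; the function name `ld_r2` and the example haplotypes are hypothetical and are not taken from Haploview or the 1000 Genomes data.

```python
def ld_r2(haplotypes):
    """Return (D, r2) for two biallelic SNPs from phased haplotypes.

    haplotypes: list of 2-character strings, one per chromosome, where
    character 0 is the allele at SNP 1 and character 1 the allele at SNP 2.
    The alleles on the first haplotype are used as the reference alleles.
    """
    haps = list(haplotypes)
    n = len(haps)
    a_ref, b_ref = haps[0][0], haps[0][1]

    # Allele and haplotype frequencies
    p_a = sum(h[0] == a_ref for h in haps) / n
    p_b = sum(h[1] == b_ref for h in haps) / n
    p_ab = sum(h == a_ref + b_ref for h in haps) / n

    # D is the deviation of the observed haplotype frequency from independence
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d, d * d / denom


# Hypothetical sample of 10 phased chromosomes (illustrative only)
example = ["CC"] * 6 + ["TT"] * 3 + ["CT"] * 1
d, r2 = ld_r2(example)
print(f"D = {d:.3f}, r2 = {r2:.3f}")
```

An r² close to 1 means the two SNPs are nearly interchangeable in an association test, which is why a GWAS signal alone cannot distinguish the tag SNP from the causal variant.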
Figure 4.
From locus to function.
A, Typical Manhattan plot of a significant locus for coronary artery disease identified in a meta-analysis of genome-wide association studies (GWAS). B, Visualization of the various linkage disequilibrium (LD) blocks in the region using Haploview and 1000 Genomes LD data. C, UCSC genome browser annotation and ENCODE (Encyclopedia of DNA Elements) data for the region bearing the lead single-nucleotide polymorphisms (SNPs), including chromatin regulatory features, histone modifications, and predicted transcription factor binding from chromatin immunoprecipitation (ChIP) sequencing data. D, Interrogation of publicly available databases can provide further information relevant to function. E, Finally, extensive laboratory analyses are necessary to identify and molecularly characterize the functional variant underlying the GWAS signal. 3-D indicates 3-dimensional; CRISPR-CAS9, clustered regularly interspaced short palindromic repeats/clustered regularly interspaced short palindromic repeat–associated 9; EMSA, electro-mobility shift assay; eQTL, expression quantitative trait loci; and UCSC, University of California, Santa Cruz.
As an example of this approach, our GWAS meta-analysis identified an intronic SNP in SMAD3, rs56062135C>T, with the minor allele (T) associating with protection from CAD.[16] This locus is of interest because SMAD3 is a key contributor to TGFβ pathway signaling. We then sought to identify causal CAD-associated SNPs at the SMAD3 locus and unravel mechanisms underlying the association with CAD. Although there are other weaker CAD signals at the SMAD3 locus, in our conditional and joint association analysis, only rs56062135 remained significant. This SNP lies within a haplotype block containing 5 other intronic SNPs. Genetic and epigenetic fine mapping highlighted an SNP, rs17293632C>T, that was in tight LD with the GWAS tag SNP in intron 1 of SMAD3. For rs17293632, RegulomeDB and ENCODE data indicated functional effects including enhancer histone marks, DNase hypersensitivity sites, alteration of a transcription factor binding site, and chromatin immunoprecipitation sequencing data demonstrating binding of multiple transcription factors. In contrast, the GWAS tag SNP had no data to suggest functional effects. In a series of experiments, we showed that the sequence encompassing rs17293632 acts as a strong enhancer in human arterial smooth muscle cells. The common allele (C) preserves an AP-1 (activator protein-1) site and enhancer function, whereas the protective (T) allele disrupts the AP-1 site and significantly reduces enhancer activity (P<0.001). Pharmacological inhibition of AP-1 activity upstream demonstrated that this allele-specific enhancer effect is AP-1 dependent (P<0.001). Chromatin immunoprecipitation experiments revealed binding of several AP-1 component proteins with preferential binding to the C allele. We then showed that rs17293632 is an eQTL for SMAD3 in blood and atherosclerotic plaque with reduced expression of SMAD3 in carriers of the protective allele. Finally, siRNA knockdown of SMAD3 in human arterial smooth muscle cells increased cell viability.[63] Consistently, Iyer et al[66] recently demonstrated that SMAD3 inhibits proliferation and facilitates differentiation of HCASMCs (human coronary artery smooth muscle cells) to promote atherosclerosis.
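An eQTL effect of this kind is typically expressed as the change in expression per copy of the allele. The sketch below is only an illustration under simplifying assumptions: it fits an ordinary least-squares line of expression on allele dosage (0, 1, or 2 copies) with no covariates, and the expression values are synthetic numbers invented for the example, not data from the study described above. The function name `eqtl_slope` is hypothetical.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage.

    dosages: number of copies of the allele of interest per sample (0, 1, 2).
    expression: normalized expression value of the gene per sample.
    A negative slope means lower expression with each additional copy.
    """
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have equal length")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("all samples share the same genotype")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic example: expression falls with each copy of the T allele
t_allele_copies = [0, 0, 1, 1, 2, 2]
expr = [10.0, 10.4, 9.1, 9.3, 8.0, 8.4]
slope, intercept = eqtl_slope(t_allele_copies, expr)
print(f"change in expression per T allele: {slope:.2f}")
```

Published eQTL analyses generally adjust for covariates such as ancestry, batch, and hidden expression factors, so this sketch conveys only the direction-of-effect logic.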
Missense Variants
A small number of GWAS-identified CAD risk SNPs are coding variants, including rs11556924 in ZC3HC1, resulting in an Arg363His substitution in NIPA (nuclear interaction partner of ALK). In functional studies, we demonstrated that the ensuing cellular phenotype of the protective variant is one of increased NIPA expression and nuclear mobility and lower rates of cell growth.[67]
Applications of Genomic Data in Clinical Medicine and Drug Development
There are 2 potential applications of GWAS findings in clinical cardiovascular medicine. These are MR analyses to identify causal biomarkers and the application of a PRS to assess future CAD risk relevant to the earlier application of preventive therapies in otherwise healthy individuals.
Mendelian Randomization
Clinical and epidemiological studies have identified a large number of biomarkers that are robustly associated with CAD. However, many of these are not causal risk factors because of confounding and reverse causation. MR is based on a simple tenet. If a biomarker has a causal association with a disease, the genetic determinants of the biomarker will also associate with disease risk.[68] An important requirement of MR analysis is lack of pleiotropy, meaning that the genetic variant(s) in question influences only the biomarker of interest. This is not always the case for plasma lipid traits, where pleiotropic effects are evident for several genes including CETP, LPL, and APOA5.
Using MR, there have been major advances in the demonstration of a causal association between plasma levels of lipoprotein (a), LDL-C, and plasma triglycerides (as a marker of remnant cholesterol) and risk of CAD. Elevated levels of LDL-C are causally associated with CAD risk, as evident in individuals with monogenic familial hypercholesterolemia, most often due to a mutation in the LDLR gene. MR studies demonstrate that genetic variants at the PCSK9, NPC1L1, and HMGCR loci uniquely associate with plasma concentrations of LDL-C and are predictive of CAD risk. Thus, these data provide robust genetic evidence for the value of specific pharmaceutical treatments, not only statins but also ezetimibe and PCSK9 inhibitors.
In contrast, although CRP (C-reactive protein) and Lp-PLA2 (lipoprotein-associated phospholipase A2) are robust CAD risk markers,[69] they have not been shown to be causal.[70,71] MR analyses would have predicted the negative outcome of large randomized controlled trials targeting Lp-PLA2.[72] A closer inspection of MR data before drug design and initiation of clinical trials could mitigate the high costs of drug development.
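The MR tenet described in this section can be illustrated numerically. The sketch below shows two widely used estimators, the single-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate across several variants; it is not necessarily the method used in the studies cited here. It assumes per-variant summary statistics (effect on the biomarker, effect on CAD as a log odds ratio, and the standard error of the latter) and assumes no pleiotropy. The function names and all numbers are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one variant: outcome effect per unit of exposure."""
    return beta_outcome / beta_exposure


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted MR estimate.

    variants: list of (beta_exposure, beta_outcome, se_outcome) tuples.
    Returns (estimate, standard_error). Assumes the variants are independent
    and affect the outcome only through the exposure (no pleiotropy).
    """
    num = sum(bx * by / se ** 2 for bx, by, se in variants)
    den = sum(bx ** 2 / se ** 2 for bx, by, se in variants)
    return num / den, math.sqrt(1 / den)


# Hypothetical summary statistics for 3 variants (illustrative values only):
# (effect on biomarker, log odds ratio for CAD, SE of the log odds ratio)
variants = [(0.10, 0.05, 0.01), (0.20, 0.10, 0.02), (0.05, 0.025, 0.01)]

for bx, by, _ in variants:
    print("Wald ratio:", wald_ratio(bx, by))

estimate, se = ivw_estimate(variants)
print("IVW estimate (log odds per unit biomarker):", estimate, "SE:", se)
```

If the biomarker is not causal, variants that shift the biomarker should show no consistent effect on CAD, and the pooled estimate should be close to zero.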
Utility of a Polygenic Genetic Risk Score for Risk Assessment and Treatment Decisions
Although the effect of the identified susceptibility variants for CAD is individually small, their effects are independent and additive. These can be incorporated into a PRS consisting of the number of risk alleles adjusted for their individual effect size. In 2010, we demonstrated that 12 robustly associated CAD loci improved CAD risk prediction beyond 9p21.3 alone.[73] In 2011, as part of the CARDIoGRAM study,[5] we explored a weighted score based on the 23 validated CAD risk variants and observed a >3-fold difference in CAD risk between the top and bottom deciles of the risk scores. The availability of larger GWAS data sets, improved computational methods, and large-scale biobank cohorts for testing and replication has now made it possible to create and validate PRS for CAD using up to 6.6 million SNPs across the genome[74] (Figure 5).
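A weighted score of the kind described above, the number of risk alleles adjusted for their individual effect size, can be written as the sum over variants of effect size times risk-allele count. The following is a minimal sketch under that additive assumption; the variant identifiers, weights, and genotypes are hypothetical placeholders, and genome-wide scores such as the one in reference 74 additionally require LD-aware weight estimation that is not shown here.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts.

    dosages: dict mapping variant id -> risk-allele count (0, 1, or 2).
    weights: dict mapping variant id -> effect size (log odds ratio per allele).
    Every variant in `weights` must be genotyped in `dosages`.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical variants and per-allele log odds ratios (illustrative only)
weights = {"variant_a": 0.25, "variant_b": 0.5, "variant_c": 0.125}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

weighted = polygenic_risk_score(person, weights)
unweighted = sum(person[v] for v in weights)  # simple risk-allele count
print("weighted score:", weighted, "risk-allele count:", unweighted)
```

In practice, individuals are ranked by this score within a reference population, and risk is compared between quantiles, for example the top and bottom deciles.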
Figure 5.
Predictive value of a polygenic risk score for coronary artery disease (CAD).
A, In an angiographic study, we demonstrated that the number of copies of the 9p21 risk allele strongly predicted atherosclerosis burden.[6]
B, In an early analysis from CARDIoGRAM (Coronary Artery Disease Genome-Wide Replication and Meta-Analysis), using a weighted score based on the 23 validated CAD risk variants, we demonstrated a >3-fold difference in CAD risk between the top and bottom deciles of the risk scores.[5]
C, Using up to 6.6 million single-nucleotide polymorphisms across the genome, Khera et al[74] demonstrated that a polygenic risk score for CAD confers a progressive increase in CAD risk, up to 10-fold for individuals in the top 1 percentile. OR indicates odds ratio; and MAF, minor allele frequency.
Using this approach, Khera et al demonstrated that among patients with an MI before the age of 55 years, 17% have a high PRS for CAD as compared with 2% harboring a mutation in the LDLR, APOB, or PCSK9 genes. Furthermore, they showed that a high PRS for CAD confers a relative risk of 4- to 5-fold, similar to that of familial hypercholesterolemia.[75] In a separate analysis, the UK Biobank CardioMetabolic Consortium CHD Working Group created a meta-GRS (genetic risk score) consisting of 1.7 million genetic variants. Their study demonstrated that those in the top 20% of the meta-GRS distribution had a hazard ratio for CAD of 4.2 compared with those in the bottom 20%. Importantly, the corresponding hazard ratio was 2.8 among individuals on lipid-lowering or antihypertensive medications, reinforcing the mitigating effects of preventive therapies.[37] Thus, a PRS can provide important information on future CAD risk beyond traditional risk factors and in the future may have both clinical and economic value in identifying those persons who will benefit most from the early application of various disease-deferring therapies, including statin treatment.
Future Perspectives
The last decade of GWAS has provided a broader understanding of genetic risk for CAD, complementing the earlier candidate gene approach. Although traditional risk factors remain important, application of these data using a systems genetics approach has pointed to substantial roles for genes and pathways relevant to vessel wall biology and immune function. However, the published GWAS data provide insight rather than mechanism. Moving forward, a major priority is to develop and apply high-throughput methodology to understand at the molecular and cellular levels the function of each of the novel loci, the majority of which are in noncoding regions of the genome. This is a tedious process as outlined above. Bioinformatic analysis of available epigenomic, eQTL, mQTL (metabolite QTL), and miQTL data and traditional molecular approaches at the cellular level are a first step. CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats/clustered regularly interspaced short palindromic repeat–associated 9) and other techniques allow creation of genetically modified human iPSCs that can then be differentiated into a relevant cell type for further study. This is a promising approach, albeit with many steps and acknowledged pitfalls.[76] Finally, animal models have been the backbone of atherosclerosis research, but studies in mice are limited by the small effect size of GWAS-identified genetic variants crossed into an Ldlr−/− or Apoe−/− background. Furthermore, deletion of an entire CAD risk locus, as shown for 9p21,[77] may not result in an expected phenotype.
Beyond function, these data have been of clinical utility for interrogation of biomarker causality and drug discovery, through MR, albeit with a number of caveats. Polygenic risk scores for a host of human diseases can be created at birth, at a genotyping/computational cost of <$100. However, questions remain as to how these data could be best communicated to the patient to encourage lifestyle changes, early screening, or medication adherence without causing undue psychological distress. Another important limitation is that the published PRSs apply to individuals of European ancestry. Equitable application of these clinical tools will require expanded GWAS in all other ethnic groups.
Sources of Funding
This study was funded by a Canadian Institutes of Health Research Foundation grant.