Literature DB >> 33878186

Cis-epistasis at the LPA locus and risk of cardiovascular diseases.

Lingyao Zeng¹, Sylvain Moser^2,3, Nazanin Mirza-Schreiber^2,4, Claudia Lamina⁵, Stefan Coassin⁵, Christopher P Nelson^6,7, Tarmo Annilo⁸, Oscar Franzén^9,10, Marcus E Kleber¹¹, Salome Mack⁵, Till F M Andlauer^2,12, Beibei Jiang², Barbara Stiller¹, Ling Li¹, Christina Willenborg¹³, Matthias Munz^13,14,15, Thorsten Kessler^1,16, Adnan Kastrati^1,16, Karl-Ludwig Laugwitz¹⁷, Jeanette Erdmann^13,14, Susanne Moebus^18,19, Markus M Nöthen²⁰, Annette Peters^21,22, Konstantin Strauch^21,22,23, Martina Müller-Nurasyid^21,22,23,24, Christian Gieger^21,25, Thomas Meitinger²⁶, Elisabeth Steinhagen-Thiessen²⁷, Winfried März^11,28, Andres Metspalu^8,29, Johan L M Björkegren^9,10, Nilesh J Samani^6,7, Florian Kronenberg⁵, Bertram Müller-Myhsok^2,30,31, Heribert Schunkert^1,14.

Abstract

AIMS: Coronary artery disease (CAD) has a strong genetic predisposition. However, despite substantial discoveries made by genome-wide association studies (GWAS), a large proportion of heritability awaits identification. Non-additive genetic effects might be responsible for part of the unaccounted genetic variance. Here, we attempted a proof-of-concept study to identify non-additive genetic effects, namely epistatic interactions, associated with CAD. METHODS AND
RESULTS: We tested for epistatic interactions in 10 CAD case-control studies and UK Biobank with focus on 8068 SNPs at 56 loci with known associations with CAD risk. We identified a SNP pair located in cis at the LPA locus, rs1800769 and rs9458001, to be jointly associated with risk for CAD [odds ratio (OR) = 1.37, P = 1.07 × 10-11], peripheral arterial disease (OR = 1.22, P = 2.32 × 10-4), aortic stenosis (OR = 1.47, P = 6.95 × 10-7), hepatic lipoprotein(a) (Lp(a)) transcript levels (beta = 0.39, P = 1.41 × 10-8), and Lp(a) serum levels (beta = 0.58, P = 8.7 × 10-32), while individual SNPs displayed no association. Further exploration of the LPA locus revealed a strong dependency of these associations on a rare variant, rs140570886, that was previously associated with Lp(a) levels. We confirmed increased CAD risk for heterozygous (relative OR = 1.46, P = 9.97 × 10-32) and individuals homozygous for the minor allele (relative OR = 1.77, P = 0.09) of rs140570886. Using forward model selection, we also show that epistatic interactions between rs140570886, rs9458001, and rs1800769 modulate the effects of the rs140570886 risk allele.
CONCLUSIONS: These results demonstrate the feasibility of a large-scale knowledge-based epistasis scan and provide rare evidence of an epistatic interaction in a complex human disease. We were directed to a variant (rs140570886) influencing risk through additive genetic as well as epistatic effects. In summary, this study provides deeper insights into the genetic architecture of a locus important for cardiovascular diseases.

Entities: Chemical

Keywords: Coronary artery diseases; Epistasis; LPA; Statistical genetics

Mesh：

Substances：
Lipoprotein(a)

Year: 2022 PMID： 33878186 PMCID： PMC8930071 DOI： 10.1093/cvr/cvab136

Source DB: PubMed Journal: Cardiovasc Res ISSN： 0008-6363 Impact factor: 10.787

Introduction

Coronary artery disease (CAD) is one of the largest contributors to morbidity and mortality worldwide. A fundamental aspect of CAD is its complex and multi-factorial aetiology, which includes numerous environmental risk factors, such as obesity and smoking, as well as a strong genetic predisposition. Overall, the genetic variance is estimated to explain 40–50% of the variability in disease manifestation. A decade of genome-wide association studies (GWAS) shed light on the genetic architecture of the disease, discovering 163 genetic loci associated with CAD risk., About a quarter of CAD heritability can be explained by additive effects of these and other common genetic variants., More complex models involving gene regulatory networks may help to better explain the heritability of the disease. In addition, at some of these loci, multiple independent signals were described, showing intra-locus allelic heterogeneity. Until now, non-additive genetic effects, such as epistatic interactions, are largely neglected for explaining the heritability of CAD. However, epistasis has been postulated by some to account for part of this ‘missing heritability’ and has also been found to act alongside additive effects to influence complex phenotypes., Epistatic interactions have profound effects in bacteria as well as in other higher model organisms and have been shown to regulate some quantitative traits in humans. However, evidence of epistasis in human genetics remains very scarce, because individual-level data with large sample sizes are required for epistasis studies. Moreover, the combinatorial nature of epistasis makes hypothesis-free genome-wide interaction analyses (GWIAs) computationally demanding and plagued with a high multiple testing burden. Finally, associations based on interactions appear to suffer from a low replication rate, and genetic interactions are sometimes difficult to disentangle from the tagging of haplotypes. Indeed, a non-causal combination of alleles at multiple SNPs co-inherited with a rare causal variant could act as a tag for this variant. To face the computational complexities in search for interacting loci affecting risk for CAD, we conducted a two-stage statistical scanning procedure for epistasis using a GPU-accelerated software on individual-level data from several GWAS on CAD. The scan was based on susceptibility regions defined around the top 56 known CAD loci, thereby limiting the multiple testing correction burden while maximizing the likelihood to discover biologically relevant interactions.

Methods

Cohorts

CAD case–control studies

Individual-level genotypes were obtained from 10 CAD case–control studies. From Germany: the German Myocardial Infarction Family Studies (GerMIFS) I, II, III (KORA), IV, V, VI; the LUdwigshafen RIsk and Cardiovascular Health Study (LURIC); from Germany, England, and France: Cardiogenics; from England: Wellcome Trust Case Control Consortium (WTCCC),; from France, Italy, Germany, and the USA: Myocardial Infarction Genetics Consortium (MIGen)., Data from the WTCCC were obtained via the Leducq network ‘CADgenomics’ (https://www.fondationleducq.org/network/understanding-coronary-artery-disease-genes/). MIGen data were obtained via the database of Genotypes And Phenotypes (dbGaP; project ID #49717-3). The genotype processing procedures including QC and imputation are provided in Supplementary material online, Methods. The final sample sizes for each study after QC are listed in Supplementary material online, . All participants were of European origin and gave prior written informed consent, which specifically addressed that the materials will be used for genetic studies. All studies obtained institutional review board approval from their local Ethical Committees and were performed in accordance with the 1964 Helsinki Declaration and its later amendments. Ascertainment and assessment methods for CAD of each study are provided in the corresponding publications.

UK Biobank

The UK Biobank (UKBB) project (http://www.ukbiobank.ac.uk) is a large prospective cohort study of ∼500 000 individuals from across the UK, aged 40–69 years at recruitment. In the present study, CAD cases were defined using the ‘SOFT’ and ‘HARD’ criteria, i.e., as individuals with fatal or nonfatal myocardial infarction (MI), percutaneous transluminal coronary angioplasty (PTCA), coronary artery bypass grafting (CABG), chronic ischaemic heart disease (IHD), and angina. Peripheral arterial disease (PAD) cases were defined as self-reported history of PAD, leg claudication/intermittent claudication or either hospitalization or death due to ICD9-443.9, ICD9-444, ICD10-I73.9, or ICD10-I74. Aortic valve stenosis cases were defined as a self-reported history of aortic stenosis or either hospitalization or death due to ICD9-424.1 or ICD10-I35.0. The post-imputation sample quality control (QC) performed in the UKBB dataset is detailed in the Supplementary material online, Methods. UKBB data were accessed under the approval of UKBB within project 9922. The study was conducted following the principles of the declaration of Helsinki and all participants gave prior written informed consent.

KORA F3/F4 studies and STARNET-Study

Individual-level genotypes were obtained from population studies from Augsburg, Germany: KORA F3 and KORA F4, along with lipid measurements including total lipoprotein(a) (Lp(a)) levels and the number of Kringle repeats of the Lp(a) protein. RNAseq data were generated from liver tissue of 522 CABG CAD patients from the Stockholm-Tartu Reverse Network Engineering Task (STARNET) study. These studies obtained institutional review board approval from their local Ethical Committees and were performed in accordance with the 1964 Helsinki Declaration and its later amendments. All participants gave prior written informed consent. Further information about these studies is provided in the Supplementary material online, Methods.

Epistasis scan

Broad sense CAD susceptibility region

We focused our analysis on 56 loci with previous evidence from GWAS on CAD, (Supplementary material online, ) in order to restrict the number of variants for testing of statistical epistasis. Our aim was to enhance computation time and the likelihood of true positive findings by easing the multiple testing correction burden. CAD susceptibility regions were defined as ±500 kb around each of the 56 lead SNPs., This window size was chosen to capture the loci as completely as possible while minimizing the computational burden: the variance explained by the lead SNPs accounted for only 46% of the variance explained when including their flanking ±500 kb regions (Supplementary material online, and Methods). We then pruned the variants in each region to 8,068 SNPs with pairwise r2 < 0.5 located in the broad CAD susceptibility regions.

Statistical interaction analysis

We used the general framework for detecting statistical epistasis in quantitative genetics as proposed by Hansen and Wagner on the pairwise epistasis between two loci (SNPs) and implemented a two-stage statistical scanning procedure (Figure ). The first step of the testing procedure consisted in a loose but fast statistical filtering using the GLIDE GPU computation tool. For each possible pair of SNPs, we fitted a linear model with the CAD phenotype as the dependent variable and the marginal effect of the two SNPs and their interaction term as predictors [Eq. (1)]. Each SNP’s genotype was encoded in four different models, dosage, dominant, recessive, and heterozygous with respect to the minor allele and all combinations of 24 models were tested for each pair of SNPs. A relatively loose and arbitrary significance level (P < 1 × 10−8) was applied for primary filtering, with the assumption that if true epistasis existed between two SNPs, signals of moderate strength should be detectable between the SNPs within the corresponding linkage disequilibrium (LD) block. This threshold was defined with the aim to detect such pair in LD with the true epistasis signal and to forward a manageable number of pairs to the second step. Scheme of the two-stage statistical interaction scanning procedure. Step 1 aimed at the fast identification of potential significant interaction terms using the GLIDE GPU computation tool. For each pair of LD-independent SNPs in the susceptibility regions (N = 8068 SNPs), we fitted a linear model with the additive and interaction effect of the two SNPs in each of the 10 CAD studies separately. The 10 P-values were then meta-analysed. A loose and arbitrary defined significance level (P < 1e–8) was applied with the assumption that if there exists true epistasis between two lead SNPs, loose signals should be detectable between the SNPs within the corresponding LD block. Step 2 aimed at validating the results of the first step using logistic regression model including the first 10 multi-dimensional scaling (MDS) components of the genetic relationship matrix to correct for population structure. Step 2 also allowed the fine-mapping of candidate SNP pairs by screening for the strongest signal among all the SNPs within the LD blocks forwarded from Step 1. In this second step, we applied a stringent significance threshold of 4.6 × 10−9, calculated as a Bonferroni correction (0.05/(nSNP_indep × (nSNP_indep − 1)/2) = 4.6178e−9) on the number of LD-independent SNPs resulting from Step 1 (nSNP_indep = 4654). The second step included the fine-mapping of candidate SNP pairs to screen for the strongest signal among the SNPs in the same LD block. For this purpose, we used R to fit a logistic regression model, slower than the linear model used in step 1, but suited better for the binary CAD phenotype, and extended Eq. (1) to correct for population structure by adding the first 10 multi-dimensional scaling (MDS) components of the genetic relationship matrix [designated as MDS1..10 in Eq. (2) and following equations]. In this second step, we applied a stringent significance threshold of 4.6 × 10−9, calculated as a Bonferroni correction (0.05/(nSNP_indep × (nSNP_indep−1)/2) = 4.6178e−9) on the number of LD-independent SNPs resulting from Step 1 (nSNP_indep = 4654). Each SNP pair was encoded in the genetic model displaying the highest significance in Step 1. In the discovery phase, the same epistasis testing procedure was performed in each of the 10 CAD case–control studies separately. The models used genotype data imputed to the 1000 Genomes Phase 3 (1000GP3) reference panel. This regression analysis was followed by fixed-effects meta-analysis to estimate the overall effect size and standard error. The final epistasis pair of interest was then re-analysed in the same studies imputed using the Haplotype Reference Consortium (HRC) reference panel, to enable a more complete coverage of the region of interest in all 10 cohorts. Thereafter, this imputation based on the larger HRC reference was used for the remainder of the manuscript.

Prioritizing candidate SNP pairs of epistasis of CAD

After the detection of SNP pairs showing statistically significant epistatic effects on the risk for CAD, we prioritized candidate pairs based on the following four criteria: We retained only SNP pairs with a high replication potential [i.e. displaying statistical epistasis both significantly (P < 4.6 × 10−9) and consistently (effect sizes pointing in the same direction) in at least eight of the 10 studies in the discovery data, based on both imputations]. LD between two target SNPs located on the same chromosome r2 < 0.2. Weak interaction signals detectable between SNPs that show an LD r2 > 0.5 with any of the two interacting SNPs. The effect of the interaction term is independent (i.e. P-value in conditional models <7.8 × 10−6) of any available third variant in conditional analyses.

Conditional analysis

The aim of the conditional analysis was to test whether the statistical epistasis effects were independent from a third SNP. To this end, we tested for the independence of the interaction term against the SNPs located within a ± 200kb window around the epistatic loci and any known CAD GWAS SNPs that survived the original QC procedure. This window size was chosen to capture all SNPs in significant LD with the pair of interest. Indeed, it has been shown that LD decay with physical distance and is close to 0 at 200 kb., For each of these SNPs, we used R to compute a likelihood ratio test (LRT) between a model including the additive effect of the two target SNPs and the additive effect of the conditioning SNP (all coded as minor allele dosages) [Eq. (3)] and a model including the interaction term in addition [Eq. (4)]. The interaction term was considered dependent on the conditioning SNP if the LRT did not reach a Bonferroni-corrected significance threshold defined on the total number of conditioning SNPs. This analysis was performed on a merged dataset of the 10 CAD studies. Here, the MDS components of the genetic relationship matrix used as covariates were re-calculated on the merged dataset.

2.4 Relative effect sizes and analyses of intermediate traits

Genotypic effect sizes for the different rs140570886 genotypes were computed by regression analysis in R using the dosage genetic model. Association analysis for the continuous intermediate traits Lp(a) protein levels, LPA mRNA levels, and KIV repeats were also performed using linear regression. Lp(a) proteins levels were highly skewed and Inverse Normal Transformation was applied prior association. The relative effect size for the three-SNP haplotypes was computed via haplotype estimation followed by fitting a generalized linear model with the R package happassoc. More detailed descriptions of these statistical procedures are provided in the Supplementary material online, Methods.

Results

Discovery of SNP pairs associated with CAD risk

We identified 56 previously known CAD risk loci from two previous GWAS, (Supplementary material online, ). For our study, we extracted 8068 LD-independent candidate variants within 500 kb of the respective lead SNPs. We observed that these extended regions explained more phenotypic variance than the respective lead SNPs alone (Supplementary material online, and Methods). Testing for statistical interactions was carried out on all pairwise SNPs along with a two-step scheme (described in Figure ) on imputed genotypes from 29 755 participants of 10 European CAD case–controls studies, (Figure ). Four SNP pairs displayed consistent (i.e. in at least 8 of 10 studies) and significant (i.e. P ≤ 4.618 × 10−9) effects and thus met our criteria as candidates for epistasis (Supplementary material online, ). Among these four pairs, two (rs1800769 × rs9458001 and rs116632378× rs3823438) did replicate in the UKBB. The top SNP pair (rs1800769 × rs9458001) showed the strongest effect in a dosage–dosage model and was prioritized for further investigation (Supplementary material online, ). Analysis workflow and datasets. Schematic of the analysis workflow. (A) The epistasis scan was performed as a meta-analysis in the 10 CAD individual GWAS. Association of SNP1 (rs1800769) and SNP2 (rs9458001) interaction with CAD was replicated in the UK Biobank (UKBB). KORA F3/F4, LURIC, and STARNET were used for the association analysis of the interacting pair with proximal phenotypes. (B) The association of the additive effect of rs140570886 with CAD was assessed in a meta-analysis of the 10 CAD studies and replicated in the UKBB. KORA F3/F4 and GTEx were used for the association analysis of with rs140570886 proximal phenotypes. (C) The forward model selection, the dissection of the best model, and the comparison with the haplotypes effect were conducted on a merged dataset of the 10 CAD studies and the UKBB in order to achieve higher power. The 3-SNP haplotypes analysis on the other hand was carried out on the meta-analysis of the 10 CAD studies, because the algorithm used for fitting the Generalized Linear Model could not converge on the merged dataset which was too big. Both rs1800769 and rs9458001 map to chromosome 6, close to the LPA locus (Figure ), and are not in LD with each other (r2 = 0.014, D = 0.535, Table ). None of the SNPs were associated with CAD risk by itself in an additive model [P = 0.59, odds ratio (OR) = 0.99 for rs1800769[T]; P = 0.08, OR = 1.04 for rs9458001[A], Supplementary material online, ]. However, the interaction term displayed a strong association (ORint = 1.42, P = 1.75 × 10−13 for the rs1800769[T] × rs9458001[A] interaction term). In this case, as both SNP were encoded in the additive genetic model, the OR can be interpreted as the increase in likeliness to suffer from CAD associated with an increase of one unit in the product between the number of minor alleles at each of the interacting SNPs. The results were reproduced in the same dataset imputed with the HRC reference panel using rs1652507 (LD with rs1800769, r2 = 0.965, D’ = 0.991, ) as a proxy for rs1800769 (OR = 0.98, P = 0.38 for rs1652507[C]; OR = 1.03, P = 0.1 for rs9458001[A], and ORint = 1.36, P = 1.07 × 10−11 for the rs1652507[C] × rs9458001[A] interaction term, Supplementary material online, ). Thus, the results were qualitatively independent of the imputation panel. This newer and denser imputation with this proxy variant was used for the remainder of the manuscript.

Table 1

Linkage disequilibrium and minor allele frequency

	rs3798220	rs140570886	rs1652507	rs1800769	rs9458001
rs3798220	0.01932745	0.703204	0.05355715	NA	0.0273448
rs140570886	0.703204	0.01551685	0.0759549	NA	0.0393634
rs1652507	0.05355715	0.0759549	0.1591815	0.965	0.0177401
rs1800769	NA	NA	0.965	NA	0.014
rs9458001	0.0273448	0.0393634	0.0177401	0.014	0.2250345

This table shows the pairwise r2 measure of Linkage Disequilibrium (LD) between the reported SNPs and their respective minor allele frequency (MAF) in bold on the diagonal. Values in Italic were computed in the European sub-samples of the 1000 Genomes Project using the LDmatrix tool (https://ldlink.nci.nih.gov/) as rs1800769 was absent from the HRC imputation panel. Other values were computed in each of the 10 CAD studies separately and averaged.

The common variant rs140570886, located in the LPA locus, increases CAD risk in a meta-analysis of 10 CAD studies and replicates in the UK Biobank. (A) Forest plot displaying the log odds ratio (OR) across 10 studies for rs140570886 as well as the fixed meta-analysis (N = 29 755) summary effect (shown as diamond) and the log OR in the replication dataset. The effect in the UK BioBank (UKBB, N = 312 312) is of the same sign and significant, therefore fulfilling the criteria for replication. (B) Manhattan plot showing the regional signal at the LPA locus taken from a recent genome-wide association study and indicating the variants in LD with rs140570886 (red), rs1800769 (blue), rs9458001 (green), and rs3798220 (purple). Linkage disequilibrium and minor allele frequency This table shows the pairwise r2 measure of Linkage Disequilibrium (LD) between the reported SNPs and their respective minor allele frequency (MAF) in bold on the diagonal. Values in Italic were computed in the European sub-samples of the 1000 Genomes Project using the LDmatrix tool (https://ldlink.nci.nih.gov/) as rs1800769 was absent from the HRC imputation panel. Other values were computed in each of the 10 CAD studies separately and averaged. ANOVA table reporting likelihood ratio test results for nested model in the model selection procedure The table displays the result of a series of successive likelihood ratio test between a nested model of increasing complexity performed on the merged dataset including the 10 CAD studies and the UK Biobank dataset. The first and second columns report the Residual Deviance and degrees of freedom of from each row’s model. The ‘Df’ and ‘Deviance’ columns respectively report the difference in degrees of freedom and deviance between each row’s model and the model from the previous row. The ‘P-value’ column reports the P-value of the likelihood ratio test between each row’s model and the previous one. The ‘P-value LRT model 2’ column reports the P-value of the likelihood ratio test between each model and the model containing only the additive effect of rs140570886. The * operator denotes factor crossing: a*b is interpreted as a + b+a × b and a*b*c as a + b+c+ a × b + a × c + b × c. The 10 multi-dimensional scaling components of the genetic variance and the study were included as covariates in every model. Tables showing the results of the same analysis with 3, 5, or 7 MDS components are provided in the Supplementary material online, . Model selection using AIC and Likelihood ratio test confirms epistatic interactions at the LPA locus This table displays the Akaike Information Criterion (AIC) and results of likelihood ratio test for nested models of increasing complexity performed on the merged dataset including the 10 CAD studies and the UK Biobank dataset. The ‘Comparison M_SNPs’ and ‘Comparison M_interact’ columns respectively report the P-values of the likelihood ratio tests with the M_SNPs and M_interact models as null model. The 10 multi-dimensional scaling components of the genetic variance and were included as covariates in every model. NA, non-applicable.

Replication and association with further traits

The UKBB dataset (controls/cases n = 285 520/26 792), used as an external replication sample (Figure ), showed a consistent interaction effect of this SNP pair for CAD (ORint = 1.15, P = 5.67 × 10−10 for the rs1652507 [C]×rs9458001[A] interaction term with the SNPs encoded in the dosage model, Supplementary material online, ). Moreover, we found interaction effects in the same direction and with a comparable magnitude on peripheral vascular disease (controls/cases n = 475 059/4460, ORint = 1.22, P = 2.32 × 10−4) and aortic valve stenosis (controls/cases n = 477 496/2,023, ORint = 1.47, P = 6.95 × 10−7) (Supplementary material online, ), conditions known to be affected by Lp(a) plasma levels., Next, we analysed the influence of the interaction term rs1800769×rs9458001 on circulating Lp(a) levels in a German population-based study (KORA F3/F4,n = 5953) (Figure ). In addition to the association of each SNP separately, we identified a strong interaction effect of both SNPs on inverse-rank normal-transformed (INT) Lp(a) levels (beta = 0.58, P = 8.7 × 10−32, with the SNP encoded in the dosage model, Supplementary material online, ). In the LURIC study, we replicated the significant statistical interaction for INT Lp(a) levels (beta = 0.56, P = 6.93 × 10−16) and found no other circulating factor displaying such effects (data not shown). Finally, we extended our investigation to LPA mRNA expression in liver tissue (Section 2, STARNET study, n = 522) (Figure ), where LPA is transcribed into Apo(a) and further assembled with an LDL-like particle into Lp(a). A significant interaction between the two SNPs was found (P = 1.4 × 10-8) and the effects on LPA mRNA expression correlated with the circulating Lp(a) levels measured in KORA F3/F4 for various genotype subgroups (Supplementary material online, ), suggesting that differential gene expression activity underlies a large component of statistical interaction related to the two SNPs.

Rs140570886-related effects at the LPA locus

An inherent challenge in testing for epistasis of nearby SNPs, even if they are in very low LD, is to discriminate interacting SNPs from SNPs representing a specific haplotype. In order to explore the latter possibility, we assessed the interaction effect after conditioning for any known susceptibility SNPs for CAD (n = 158, Supplementary material online, ) or any available SNP in the flanking ±200 kb region. The LPA region conditional analysis (see Section 2) did not yield any significant results (Supplementary material online, ). However, studying GWAS lead SNPs (Supplementary material online, ) uncovered that rs3798220 reduced the significance of the rs1652507×rs9458001 interaction term (increase from P = 1.07 × 10−11 to P = 2.08 × 10−5, likelihood ratio test). In follow-up analyses, we also analysed the influence of the rare variant rs140570886 at the LPA locus, previously shown to be univariately associated with Lp(a) levels. This variant was not included in our primary analysis because its minor allele frequency was lower than our QC threshold (see Section 2), but was pointed out to us as requiring special attention. We therefore specifically investigated rs140570886 in the conditional analysis and observed a drastic decrease in the statistical support for the rs1652507×rs9458001 interaction term (from P = 8.95 × 10−14 to P = 0.022, likelihood ratio test). In order to test if these two SNPs (LD between rs140570886 and rs3798220: r2 = 0.808, D’ = 0.899) represented independent signals, we performed model selection using the likelihood ratio test. Adding rs3798220 to a model already containing rs140570886 did not improve the fit significantly (P = 0.49, likelihood ratio test). We therefore conclude that rs3798220 is not independent of rs140570886 and did not assess this SNP in further analyses. We next investigated the additive effect of rs140570886 on CAD risk and found a significant association (OR = 1.98, P = 1.14 × 10−21, Figure , Supplementary material online, ). We replicated this association in the UKBB dataset (OR = 1.46, P = 2.77 × 10−32) (Figure ). Furthermore, In the UKBB, we found an association in the same direction and comparable magnitude for peripheral arterial disease (controls/cases n = 315 072/29 877, OR = 1.43, P = 7.83 × 10−6) and aortic valve stenosis (controls/cases n = 315 072/29 877, OR = 1.71, P = 1.25 × 10−7) (Supplementary material online, ), both of which are manifestations of atherosclerosis in coronary arteries for which Lp(a) plasma levels affect risk., To assess the contribution of rs140570886 genotypes to disease risk beyond the additive model, we next computed genotypic ORs for heterozygous [T/C] and minor allele homozygous genotypes [C/C] compared to the major allele homozygous reference genotype [T/T]. The genotypic model has the advantage that it does not make any assumption on the underlying genetic model. In the meta-analysis of the 10 CAD studies, we observed an OR of 1.88 (P = 2.32 × 10−18) for the T/C heterozygous genotype (Figure Supplementary material online, ). A reliable effect estimate could not be calculated for the minor allele homozygous genotype C/C, due to its low frequency. The result for the T/C genotype was replicated in UKBB (OR = 1.46, P = 9.97 × 10−32) where we observed a trend for a higher relative OR for CC-homozygous subjects, although this was non-significant, likely due to its low frequency (OR = 1.77, P = 0.09; Figure ). This increase of the genotypic OR with the number of minor alleles suggests that the additive genetic model is indeed likely correct for rs140570886. Coherent with this, the saturated genotypic model does not provide a better fit than the additive model (P = 0.12, likelihood ratio test). We also observed a strong association of rs140570886 with Lp(a) levels (beta = 1.54, P = 9.52 × 10−82). As was the case for CAD risk, analyses of genotypic models indicated a linear increase with the minor allele count and thus supported an additive model (Figure ). Genotype specific effect of rs140570886 on CAD risk and intermediate factors. (A) Genotypic log odds ratios (OR) (with reference to the genotype [T/T]) for the genotype subgroup [T/C] on CAD risk in the meta-analysis of ten CAD studies (N = 29 755). The OR for the minor allele homozygous genotype (C/C) is not displayed because of its low sample size and high standard error. Error bars represents the standard error of the log OR. (B) Genotypic OR (with reference to the genotype [T/T]) for the genotype subgroup [T/C] and [C/C] on CAD risk in the UK Biobank dataset (N = 312 312). Error bars represents the standard error of the log Odds Ratio. C. Relative effect size (with reference to the genotype [T/T]) for the genotype subgroup T/C on intermediate factors, namely inverse normal transformed Lp(a) levels (blue), KIV size of the dominantly expressed apo(a) isoform (purple), and the inverse normal transformed Lp(a) levels independent of the KIV (orange) in the KORA F3/F4 studies (N = 5953). Relative effect size for the minor allele homozygous genotype (C/C) is not displayed, because not represented in the KORA studies. Error bars represent standard error of the effect sizes. Circulating Lp(a) levels are modulated by at least two independent mechanisms. First, they are inversely correlated with the number of Kringle IV type 2 repeats (KIV-2 CNV),, with fewer KIV-2 CNV repeats associated with more Lp(a) release from liver cells. They account for about 18% of the variability in Lp(a) levels in Western Europeans. However, individuals with the same number of KIV-2 CNV repeats may still differ up to 200-fold with respect to their Lp(a) levels,, suggesting transcriptional mechanisms. In the KORA cohorts, we observed an association of rs140570886 with the KIV-2 CNV, with heterozygous rs140570886 carriers having fewer KIV-2 CNV repeats (beta = −5.74, P = 3.55 × 10−26) (Supplementary material online, and Methods). However, rs140570886 was in minimal LD with the reported 61 KIV-2 CNV-representing variants and the three independent modifier variants that influence the relationship between KIV-2 CNV and Lp(a) cholesterol (data not shown). More importantly, the effect of rs140570886 on Lp(a) levels remained highly significant after adjustment for the KIV-2 CNV (beta = 1.11, P = 1.94 × 10−57) (Figure , Supplementary material online, ). This strongly suggests that the effect of rs140570886 on Lp(a) levels is independent of the KIV-2 CNV and might therefore be modulated by transcriptional regulation. In accordance with this hypothesis, we found rs140570886 to be part of a significant expression quantitative trait locus (eQTL) with LPA mRNA expression levels in liver tissue, where LPA is transcribed to Apo(a) and further assembled into Lp(a) (GTEx V8, normalized effect size = 0.98, P = 1.2 × 10−7).

Interaction between rs140570886 and the rs1652507-rs9458001 pair

Although it appeared that part of the rs1652507×rs9458001 interaction was due to tagging of an rs140570886-related effect, we wondered if epistasis could still be present. To investigate this possibility, we applied a likelihood ratio tests-based forward model selection procedure starting with only rs140570886 going up to a model including all main effects and interactions between the four SNPs, rs1652507, rs9458001, rs140570886, and rs3798220 (Supplementary material online, Methods). To increase statistical power, we here analysed the CAD studies and UKBB jointly. We observed a significant increase in model fit when rs1652507 and rs9458001 were added as predictors to the model already containing rs140570886 (. The addition of the rs1652507×rs9458001 interaction term to this second model did not improve the fit further, coherent with the observed drop in the significance when conditioning the model containing the interaction term on rs140570886. However, the model fit increased significantly and reached its best level when all two-way and three-way interactions were added to the model. The direct comparison of this two-and-three-way interaction model to the rs140570886-only model yielded a P-value of the same magnitude as the original P-value threshold used for the epistasis screening (P = 6.46 × 10−9). Using a type-III Sum of Squares ANOVA to dissect this final model, provided further insights into the importance of the different coefficients (Supplementary material online, ): we observed in the two-and-three-way interactions model that the additive effect of rs140570886 became non-significant while the additive effect of rs1652507 reached significance. Moreover, albeit the originally discovered rs1652507 × rs9458001 interaction term became non-significant, we observed nominally significant interactions of both rs1652507 and rs9458001 with rs140570886. These results suggest that the additive effect of rs140570886 on CAD risk might actually be caused by more complicated patterns of cis-epistatic interactions.

Table 2

ANOVA table reporting likelihood ratio test results for nested model in the model selection procedure

Model	Residuals. Df	Residuals deviance	Df	Deviance	P-value	P-value LRT model 2
(1) CAD ∼ covariates	342 046	222 021	NA	NA	NA	NA
(2) CAD ∼ rs140570886 + covariates	342 045	221 804	1	217.04	4 × 10⁻⁴⁹	NA
(3) CAD ∼ rs140570886 + rs9458005 + rs1652507 + covariates	342 043	221 770	2	33.57	5.1 × 10⁻⁰⁸	5.1 × 10⁻⁰⁸
(4) CAD ∼ rs140570886 + rs9458005 * rs1652507 + covariates	342 042	221 768	1	2.32	0.13	7.9 × 10⁻⁰⁸
(5) CAD ∼ rs140570886 * rs9458005 * rs1652507 + covariates	342 039	221 755	3	13.42	0.004	6.5 × 10⁻⁰⁹
(6) CAD ∼ rs140570886 * rs9458005 * rs1652507 +rs3798220+ covariates	342 038	221 753	1	1.93	0.16	8.2 × 10⁻⁰⁹
(7) CAD ∼ rs140570886 * rs9458005 * rs1652507 * rs3798220+ covariates	342 031	221 746	7	6.22	0.51	8.2 × 10⁻⁰⁹

The table displays the result of a series of successive likelihood ratio test between a nested model of increasing complexity performed on the merged dataset including the 10 CAD studies and the UK Biobank dataset. The first and second columns report the Residual Deviance and degrees of freedom of from each row’s model. The ‘Df’ and ‘Deviance’ columns respectively report the difference in degrees of freedom and deviance between each row’s model and the model from the previous row. The ‘P-value’ column reports the P-value of the likelihood ratio test between each row’s model and the previous one. The ‘P-value LRT model 2’ column reports the P-value of the likelihood ratio test between each model and the model containing only the additive effect of rs140570886. The * operator denotes factor crossing: a*b is interpreted as a + b+a × b and a*b*c as a + b+c+ a × b + a × c + b × c. The 10 multi-dimensional scaling components of the genetic variance and the study were included as covariates in every model. Tables showing the results of the same analysis with 3, 5, or 7 MDS components are provided in the Supplementary material online, .

To better understand the genetics underlying this statistical model, we computed the relative OR for each of the eight possible haplotypes. It appeared that all haplotypes including the major T allele for rs140570886 showed similar ORs. Interestingly, we observed that the effect size varied profoundly across haplotypes containing the rs140570886 minor allele C, depending on the rs1652507 genotype (red vs. blue on Figure , Supplementary material online, ). Moreover, for the haplotype rs140570886[C]—rs1652507[T], we observed that the ORs were much lower for the [A] as compared to the [G] allele at rs9458001, although the standard errors were large due to the low frequencies of rarer haplotypes (Figure , Supplementary material online, ). These two observations reflect the marginally significant interaction coefficients between rs140570886 and rs1652507 and between rs140570886 and rs9458001 in the two- and three-way interaction model (Supplementary material online, ). Relative effect of the rs140570886-rs1652507-rs9458001 3-SNP haplotypes on CAD risk. Relative odds ratio (OR; with reference to the most frequent TTG haplotypes) for the eight possible 3-SNP haplotypes on CAD risk. The red and blue colours represent the base at the rs1652507 SNP, the square and triangle shapes represent the base at the rs9458001 SNP and the position on the X-axis represent the base at the rs140570886 SNP. Together they indicate one of the eight possible 3-SNP haplotypes. The putative haplotypes were computed using the happasoc R package on a merged dataset of the 10 CAD studies (N = 29 755). Error bars represent the standard error of the log OR. Finally, in a further attempt to distinguish epistatic interactions involving these three SNPs from a haplotype effect, we compared different models containing SNPs encoded in the additive model and either haplotypes, interactions, or both, using the Akaike Information Criterion (AIC) (). The model including SNPs and their interactions but no haplotypes showed the best AIC. This result firstly confirms that interactions between the three SNPs improve model fit compared to an additive only model. Secondly, it suggests the presence of a real epistatic interaction between the three SNPs rather than an exclusive haplotype effect. The likelihood ratio test applied to the nested model confirmed this interpretation. Indeed, the most complex model, including SNPs, haplotypes, and interactions, did not provide a better fit than the one without the haplotypes (P = 0.26, likelihood ratio test), whereas adding interactions to the SNPs model improved the fit significantly (P = 0.0034, likelihood ratio test). Although we cannot exclude the involvement of other rare or non-typed variants and lack the statistical power necessary for these two interactions to reach the significance threshold pre-defined for the scan, these results demonstrate a complex genetic architecture involving non-additive and likely epistatic effects in the LPA region, underlying the regulation of Lp(a) expression and CAD risk.

Table 3

Model selection using AIC and Likelihood ratio test confirms epistatic interactions at the LPA locus

Model	Model name	AIC	Comparison M_SNPs	Comparison M_interact
No genetics	M_null	222 063.0	NA	NA
Haplotypes	M_haplo	221 813.6	NA	NA
SNPs	M_SNPs	221 818.3	NA	NA
SNPs + interactions	M_interact	221 810.6	0.0034	NA
Haplotypes + SNPs + interactions	M_full	221 814.4	0.0268	0.262

This table displays the Akaike Information Criterion (AIC) and results of likelihood ratio test for nested models of increasing complexity performed on the merged dataset including the 10 CAD studies and the UK Biobank dataset. The ‘Comparison M_SNPs’ and ‘Comparison M_interact’ columns respectively report the P-values of the likelihood ratio tests with the M_SNPs and M_interact models as null model. The 10 multi-dimensional scaling components of the genetic variance and were included as covariates in every model.

NA, non-applicable.

Discussion

We report a two-stage testing procedure for epistatic interactions affecting CAD risk. Our analysis identified two SNPs at the LPA locus that individually had no effect but jointly displayed a strong statistical association with expression of LPA mRNA in liver, Lp(a) levels in serum, and with risk for CAD, peripheral arterial disease, and aortic stenosis. Further exploration of the locus revealed that parts of these associations were explained by tagging of a low-frequency variant (rs140570886), which, in parallel with our study, was found to be associated with Lp(a) levels. In addition, we detected a complex pattern of interactions between this variant and two other SNPs in the LPA region. Together, these findings firstly provide evidence of epistatic interaction in a complex human disease and provide deeper insights into the genetic architecture of an important locus for cardiovascular risk. At the same time, these data highlight the challenges in confirming epistatic interactions affecting disease risk in humans. We focused our search for pairwise epistatic interactions on 8068 SNPs at 56 regions that had been found to be associated at genome-wide significance with CAD., Indeed, the selected window of LD-pruned SNPs contained two-fold more information on CAD heritability than the respective lead SNPs. Nevertheless, we found only four potentially interacting SNP pairs, which highlights the challenge to identify true epistasis modulating a human trait. The top-ranking interacting pair was located in cis at the LPA locus. Conditional analyses, aiming to determine the independence of the epistatic signal between rs1652507 and rs9458001 from other neighbouring SNPs, revealed a strong dependence on rs140570886. Thus, the seemingly strongest epistatic SNP pair tagged a rare genotype with profound effects on the phenotype. Further investigation of this variant showed its strong association with CAD and Lp(a) protein levels. rs140570886 has been previously associated with cardiovascular disease (CVD) using a new integrative framework named FIODOR. Our result, thus, using a well-established logistic regression model, confirmed the association of rs140570886 with diseases of the cardiovascular system. We, moreover, replicated the association of rs140570886 with Lp(a) levels reported by Mack et al. and identified data collections that indicate an association of rs140570886 with CAD. In addition, we report rs140570886 effects on Lp(a) levels to be independent of the KIV CNV repeats and to be a strong eQTL for LPA gene expression. Taken together, these findings support the hypothesis that rs140570886 mediates CAD risk through the Lp(a) levels via transcriptional regulation. An important methodological point highlighted by this study is the importance of the conditional follow-up analyses in the investigation of epistatic interactions. Indeed, an inherent challenge in testing for epistasis of nearby SNPs, even if they are in very low LD, is to discriminate truly interacting SNPs from SNPs tagging a specific haplotype. Resolving the dependence structure at the epistatic locus, by conditioning the interaction effect on the neighbouring SNPs, allowed us to simultaneously identify a tagged rarer variant and to fine-map the epistatic interaction at the LPA locus. The combinatorial nature of interactions has been a major hold-up in epistasis testing because it leads to an enormous search space and a high multiple testing correction burden., Methods to reduce this space can be divided into two categories: data-driven and knowledge-driven methods. We applied a data-driven approach in the present study, focusing on previously associated loci, for two reasons. First, variants already shown to be linked to the disease are likely to be functionally important. Second, if epistatic effects were detected among such variants, these effects would be more likely to affect the condition. Since regulatory variants might be located in the flanking region of the prioritized loci, we extended the search space to these regions. The discovery of four pairs of interacting SNPs using this filtering approach demonstrates its advantage over a hypothesis-free approach, in which these pairs would not have reached statistical significance due to having to correct for more tests. The findings relative to the genetic architecture of the LPA locus reported in this study carry a special clinical relevance for CAD risk detection and treatment. Indeed, Lp(a) concentrations have been shown to be usable for CAD risk prediction. For example, the Copenhagen City Heart Study showed for individuals above the 95th percentiles of the Lp(a) concentration to have a 2.5-fold higher CAD risk compared to individuals in the lowest quartile. Although Lp(a) concentration measurement and isoform determination are sufficient assays to estimate CAD risk encoded at the LPA locus, polygenic risk scores might play an additional role in the assessment of CAD risk in the future. Indeed, with the rapid drop of genotyping cost, individual genotype data are becoming a basic component of biobanks and clinical settings. With this perspective, a better understanding of the genetic architecture of the LPA locus and the incorporation of non-additive genetic effects, such as those reported in this study, might enhance the predictive power of polygenic risk scores and help the development of individually tailored disease prevention, which, in the future, may involve a pharmacological Lp(a) reduction. While a single epistatic interaction as reported in this manuscript is very unlikely to improve risk prediction on its own compared to polygenic risk score based on millions of SNPs, numerous interactions—if identified—might do so. Indeed, several observations argue in this direction. First, simulations and analyses by others indicate that epistasis cannot be ruled out as an important factor. Particularly, results from the UK Biobank are compatible with an upper bound of epistasis explaining slightly more than half as much as additive variance, and a point estimate of epistasis explaining a quarter of the amount of variance explained by additively acting loci. A further extension of epistasis scans, to testing combinations of variants from disease susceptibility regions against the whole genome, or even to genome-wide scans with different, a-priori-defined functional, information-based filters, might discover new epistatic interactions, thereby, improving both our understanding of disease aetiology and possibly prediction models.

Supplementary material

Supplementary material is available at Cardiovascular Research online. Click here for additional data file.

53 in total

1. Modeling genetic architecture: a multilinear theory of gene interaction.

Authors: T F Hansen; G P Wagner
Journal: Theor Popul Biol Date: 2001-02 Impact factor: 1.570

Review 2. KORA--a research platform for population based health research.

Authors: R Holle; M Happich; H Löwel; H E Wichmann
Journal: Gesundheitswesen Date: 2005-08

3. Genome-wide association analysis of high-density lipoprotein cholesterol in the population-based KORA study sheds new light on intergenic regions.

Authors: Iris M Heid; Eva Boes; Martina Müller; Barbara Kollerits; Claudia Lamina; Stefan Coassin; Christian Gieger; Angela Döring; Norman Klopp; Ruth Frikke-Schmidt; Anne Tybjaerg-Hansen; Anita Brandstätter; Andreas Luchner; Thomas Meitinger; H-Erich Wichmann; Florian Kronenberg
Journal: Circ Cardiovasc Genet Date: 2008-10

4. Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals.

Authors: Valentin Hivert; Julia Sidorenko; Florian Rohart; Michael E Goddard; Jian Yang; Naomi R Wray; Loic Yengo; Peter M Visscher
Journal: Am J Hum Genet Date: 2021-05-06 Impact factor: 11.025

Review 5. Structure, function, and genetics of lipoprotein (a).

Authors: Konrad Schmidt; Asma Noureen; Florian Kronenberg; Gerd Utermann
Journal: J Lipid Res Date: 2016-04-13 Impact factor: 5.922

6. Parental history is an independent risk factor for coronary artery disease: the Framingham Study.

Authors: R H Myers; D K Kiely; L A Cupples; W B Kannel
Journal: Am Heart J Date: 1990-10 Impact factor: 4.749

7. Are Interactions between cis-Regulatory Variants Evidence for Biological Epistasis or Statistical Artifacts?

Authors: Alexandra E Fish; John A Capra; William S Bush
Journal: Am J Hum Genet Date: 2016-09-15 Impact factor: 11.025

Review 8. Epistasis and quantitative traits: using model organisms to study gene-gene interactions.

Authors: Trudy F C Mackay
Journal: Nat Rev Genet Date: 2013-12-03 Impact factor: 53.242

9. Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study.

Authors: Salim Yusuf; Steven Hawken; Stephanie Ounpuu; Tony Dans; Alvaro Avezum; Fernando Lanas; Matthew McQueen; Andrzej Budaj; Prem Pais; John Varigos; Liu Lisheng
Journal: Lancet Date: 2004 Sep 11-17 Impact factor: 79.321

10. Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease.

Authors: Pim van der Harst; Niek Verweij
Journal: Circ Res Date: 2017-12-06 Impact factor: 17.367

3 in total

Review 1. Lipoprotein(a) beyond the kringle IV repeat polymorphism: The complexity of genetic variation in the LPA gene.

Authors: Stefan Coassin; Florian Kronenberg
Journal: Atherosclerosis Date: 2022-05 Impact factor: 6.847

2. The effect of LPA Thr3888Pro on lipoprotein(a) and coronary artery disease is modified by the LPA KIV-2 variant 4925G>A.

Authors: Rebecca Grüneis; Claudia Lamina; Silvia Di Maio; Sebastian Schönherr; Peter Zoescher; Lukas Forer; Gertraud Streiter; Annette Peters; Christian Gieger; Anna Köttgen; Florian Kronenberg; Stefan Coassin
Journal: Atherosclerosis Date: 2022-04-26 Impact factor: 6.847

Review 3. Risk Prediction of Cardiovascular Events by Exploration of Molecular Data with Explainable Artificial Intelligence.

Authors: Annie M Westerlund; Johann S Hawe; Matthias Heinig; Heribert Schunkert
Journal: Int J Mol Sci Date: 2021-09-24 Impact factor: 5.923

3 in total