Solomon M Adams, Habiba Feroze, Tara Nguyen, Seenae Eum, Cyrille Cornelio, Arthur F Harralson.
Abstract
Predicting risk for major adverse cardiovascular events (MACE) is an evidence-based practice that incorporates lifestyle, history, and other risk factors. Statins reduce risk for MACE by decreasing lipids, but it is difficult to stratify risk following initiation of a statin. Genetic risk determinants for on-statin MACE have low effect sizes and are impossible to generalize. Our objective was to determine high-level epistatic risk factors for on-statin MACE with GWAS-scale data. Controlled-access data for 5890 subjects taking a statin, collected from Vanderbilt University Medical Center's BioVU, were obtained from dbGaP: 1718 cases who suffered MACE and 4172 controls. We used Random Forest Iterative Feature Reduction and Selection (RF-IFRS) to select highly informative genetic and environmental features from this GWAS-scale dataset. Variant pairs were distilled into overlapping networks and assembled into individual decision trees to provide an interpretable set of variants and associated risk. Pathway analysis showed that variants in genes related to vasculogenesis (FDR = 0.024), angiogenesis (FDR = 0.019), and carotid artery disease (FDR = 0.034) were related to risk for on-statin MACE. We identified six gene-variant networks that predicted odds of on-statin MACE. The most elevated risk was found in a small subset of patients carrying variants in COL4A2, TMEM178B, SZT2, and TBXAS1 (OR = 4.53, p < 0.001). RF-IFRS is a viable method for interpreting complex "black-box" findings from machine learning. In this study, it identified epistatic networks that could be applied to risk estimation for on-statin MACE. Further study will seek to replicate these findings in other populations.
Keywords: cardiovascular disease; epistasis; pharmacogenomics; random forest; statin
Year: 2020 PMID: 33171725 PMCID: PMC7712544 DOI: 10.3390/jpm10040212
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Population demographics.
| Variable | Control | Case | p-value |
|---|---|---|---|
| Female (%) | 38.0% | 31.2% | <0.001 |
| White (%) | 99.3% | 99.4% | 0.507 |
| BMI (Mean ± SD) | 29.03 ± 7.37 | 28.57 ± 7.035 | 0.253 |
| Age First MACE (Median ± IQR) | N/A | 65 ± 16 | 1 |
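
The sex difference between groups can be checked approximately from the reported figures. The sketch below assumes a Pearson chi-square test without continuity correction (the record does not state which test was used) and back-calculates counts from the reported percentages and the group sizes given in the abstract, so the counts are approximations. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Assumption: counts are rounded back from the reported percentages.
n_control, n_case = 4172, 1718
female_control = round(0.380 * n_control)
female_case = round(0.312 * n_case)

chi2, p = chi_square_2x2(
    female_control, n_control - female_control,
    female_case, n_case - female_case,
)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

Under these assumptions the p-value falls below 0.001, in line with the table.
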
Figure 1. Manhattan plots for statistical GWAS analysis with PLINK (top) vs. the initial RF model with ranger (bottom). Red dots correspond to variants that were selected with r2VIM, and show that a purely statistical approach fails to identify variants that are likely relevant to the outcome due to interactions.
Figure 2. Paired selection frequency based on the combined independent variant probabilities (X axis) vs. the actual frequency of variants being selected together in a decision tree. Variants that are selected together at a lower-than-expected frequency are expected to be correlated with respect to the outcome, suggesting that they are in linkage disequilibrium (blue). Variants selected together more often than expected (red) are predicted to exhibit epistasis with respect to the phenotype.
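
The comparison described in this caption can be sketched as follows. This is a minimal illustration, assuming the "combined independent variant probabilities" are the product of each variant's per-tree selection frequency; the function name, arguments, and counts are hypothetical and are not taken from the study. It reports only the direction of the deviation, not its statistical significance.

```python
def coselection(n_trees, n_with_a, n_with_b, n_with_both):
    """Compare observed co-selection of two variants with the independence expectation."""
    p_a = n_with_a / n_trees
    p_b = n_with_b / n_trees
    expected = p_a * p_b            # assumption: independent selection
    observed = n_with_both / n_trees
    if observed > expected:
        label = "more often than expected (candidate epistasis)"
    elif observed < expected:
        label = "less often than expected (candidate correlation / LD)"
    else:
        label = "as expected"
    return expected, observed, label


# Hypothetical counts for a forest of 1000 trees.
print(coselection(1000, 200, 100, 40))
print(coselection(1000, 200, 100, 5))
```
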
Significant Ensemble and Regression Variant Pairs.
| Variant 1 | Variant 2 | | |
|---|---|---|---|
| sex | CDCA7 (3’ 242.48 kb) rs6731912 | 0.001 | 0.029 |
| sex | NAALADL2 (3’ 441.92 kb) rs1471695 | <0.001 | 0.082 |
| sex | HAND2-AS1 (3’ 157.42 kb) rs9312547 | <0.001 | 0.007 |
| sex | NNMT (5’ 4.01 kb) rs2244175 | 0.021 | 0.016 |
| sex | ANKFN1 (5’ 115.11 kb) rs8082489 | <0.001 | 0.007 |
| SZT2 rs2842180 | COL4A2 rs9515203 | <0.001 | 0.004 |
| VAV3-AS1 rs3747945 | NPAS3 rs8008403 | 0.001 | 0.011 |
| KCNT2 (3’ 1239.56 kb) rs6693848 | PECAM1 rs2812 | <0.001 | 0.004 |
| KCNT2 (3’ 1239.56 kb) rs6693848 | PECAM1 rs9303470 | <0.001 | 0.004 |
| KCNT2 (3’ 1239.56 kb) rs6693848 | PECAM1 (5’ 1.22 kb) rs6504218 | 0.032 | 0.004 |
| ALCAM (5’ 150.37 kb) rs9818420 | STMND1 rs927629 | <0.001 | 0.001 |
| NAALADL2 (3’ 441.92 kb) rs1471695 | RFX7 (5’ 10.73 kb) rs2713935 | <0.001 | 0.005 |
| PDGFC rs1425486 | FTMT (5’ 478.9 kb) rs246210 | <0.001 | 0.001 |
| FTMT (5’ 478.9 kb) rs246210 | DAB2IP rs7025486 | <0.001 | 0.001 |
| ZFP2 rs953741 | CDKN2B-AS1 rs1333042 | 0.011 | 0.004 |
| STMND1 rs927629 | SMOC2 rs13205533 | <0.001 | 0.004 |
| SMOC2 rs13205533 | PECAM1 rs2812 | 0.043 | 0.016 |
| TBXAS1 rs6464448 | COL4A2 rs9515203 | 0.014 | 0.009 |
| TMEM178B rs7790976 | COL4A2 rs9515203 | 0.043 | 0.004 |
| CDKN2B-AS1 rs2383207 | SERPINA13 rs17826595 | 0.001 | 0.016 |
| SFMBT2 rs10453997 | CWF19L2 (3’ 106.96 kb) rs4754193 | <0.001 | 0.001 |
| CWF19L2 (3’ 106.96 kb) rs4754193 | NNMT (5’ 4.01 kb) rs2244175 | 0.006 | 0.011 |
| GATM (3’ 12.69 kb) rs2461700 | ZNF404 rs1978723 | <0.001 | 0.005 |
Figure 3. Decision trees incorporating overlapping epistasis variant pairs show six unique networks of genes and variants. Odds ratios in terminal nodes represent a subject's odds of on-statin MACE when carrying the collection of alleles shown in the network, relative to subjects who do not carry those variants. This offers an interpretation of epistasis findings that may be easier to incorporate into clinical practice, though validation and replication in independent populations will be necessary to drive clinical translation.
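
For readers unfamiliar with the terminal-node statistic, the sketch below shows how an odds ratio and an approximate 95% Wald confidence interval are computed from a 2x2 table of carriers vs. non-carriers. The counts are hypothetical and are not data from the study; whether the authors used a Wald interval is not stated here, so treat that choice as an assumption.

```python
import math


def odds_ratio(case_carrier, control_carrier, case_noncarrier, control_noncarrier):
    """Odds ratio with an approximate 95% Wald confidence interval."""
    or_ = (case_carrier * control_noncarrier) / (control_carrier * case_noncarrier)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(
        1 / case_carrier + 1 / control_carrier
        + 1 / case_noncarrier + 1 / control_noncarrier
    )
    lower = math.exp(math.log(or_) - 1.96 * se)
    upper = math.exp(math.log(or_) + 1.96 * se)
    return or_, lower, upper


# Hypothetical counts: 20 carrier cases, 10 carrier controls,
# 100 non-carrier cases, 200 non-carrier controls.
or_, lower, upper = odds_ratio(20, 10, 100, 200)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```
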
Gene network associated disease processes
| Diseases or Functions | Genes | FDR |
|---|---|---|
| Angiogenesis | ALCAM CDKN2B COL4A2 DAB2IP PDGFC PECAM1 SMOC2 VAV3 | 0.0188 |
| Carotid artery disease | NNMT VAV3 | 0.034 |
| Development of vasculature | ALCAM CDKN2B COL4A2 DAB2IP NPAS3 PDGFC PECAM1 SMOC2 VAV3 | 0.0188 |
| Endothelial cell development | COL4A2 PDGFC PECAM1 SMOC2 | 0.0291 |
| Formation of blood vessel | CDKN2B COL4A2 PECAM1 | 0.0242 |
| Formation of endothelial tube | COL4A2 PECAM1 | 0.0291 |
| Function of endothelial tissue | PECAM1 VAV3 | 0.0188 |
| Migration of endothelial cells | ALCAM COL4A2 PECAM1 SMOC2 VAV3 | 0.0188 |
| Quantity of endothelial cells | ALCAM PDGFC | 0.023 |
| Vasculogenesis | ALCAM CDKN2B COL4A2 PDGFC PECAM1 SMOC2 | 0.0242 |
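
The FDR column reports multiplicity-adjusted significance. The record does not say which adjustment the pathway tool applied; as an illustration only, the sketch below implements the widely used Benjamini-Hochberg procedure on hypothetical raw p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, not taken from the study.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```
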
Inclusion and Exclusion Criteria.
| MACE on statin, defined as either AMI or revascularization on statin |
|---|
| |
| - At least two ICD9 codes for AMI or other acute and subacute forms of ischemic heart disease within a five-day window |
| - Confirmed lab within the same time window |
| - Statin prescribed in medical records at least 180 days prior to the AMI event |
| |
| - At least one revascularization CPT code |
| - Statin prescribed in medical records at least 180 days prior to the revascularization event |
| |
| - No diagnosis code for AMI, other acute and subacute forms of ischemic heart disease, or historical AMI assigned previously |
| - No revascularization CPT codes assigned previously |
| - No MACE (Major Adverse Cardiovascular Events) found in previous problem list by NLP |
| |
| - Statin prescribed |
| - No diagnosis code for AMI, other acute and subacute forms of ischemic heart disease, or historical AMI assigned previously |
| - No revascularization CPT codes assigned previously |
| - No MACE found in previous problem list by NLP |
| - Controls matched to cases by age, gender, statin type (e.g., simvastatin), and statin exposure |
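
The AMI-on-statin rule in the table above can be expressed as a small date check. This is a sketch under stated assumptions: the event date is taken as the first of two diagnosis codes no more than five days apart, the confirming lab must fall within five days of that first code, and statin exposure is measured from a single start date. The function and argument names are hypothetical and do not reflect the BioVU phenotyping code.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=5)
MIN_STATIN_EXPOSURE = timedelta(days=180)


def first_qualifying_ami(code_dates, lab_dates, statin_start):
    """Return the date of the first AMI event meeting all criteria, or None."""
    codes = sorted(code_dates)
    for first, second in zip(codes, codes[1:]):
        if second - first > WINDOW:
            continue  # fewer than two codes inside this window
        window_end = first + WINDOW
        has_lab = any(first <= lab <= window_end for lab in lab_dates)
        on_statin = first - statin_start >= MIN_STATIN_EXPOSURE
        if has_lab and on_statin:
            return first
    return None


# Hypothetical patient record.
event = first_qualifying_ami(
    code_dates=[date(2010, 6, 1), date(2010, 6, 3)],
    lab_dates=[date(2010, 6, 2)],
    statin_start=date(2009, 1, 1),
)
print(event)
```
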