Cihan Oguz, Shurjo K Sen, Adam R Davis, Yi-Ping Fu, Christopher J O'Donnell, Gary H Gibbons.
Abstract
BACKGROUND: One goal of personalized medicine is leveraging the emerging tools of data science to guide medical decision-making. Achieving this using disparate data sources is most daunting for polygenic traits. To this end, we employed random forests (RFs) and neural networks (NNs) for predictive modeling of coronary artery calcium (CAC), which is an intermediate endo-phenotype of coronary artery disease (CAD).
Keywords: Case-control study; Coronary artery calcium; Coronary heart disease; Genotype data; Neural networks; Random forest; Systems biology
Year: 2017 PMID: 29073909 PMCID: PMC5659034 DOI: 10.1186/s12918-017-0474-5
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Overall strategy of the analysis
Fig. 2 Schematic of the modeling approach
Predictive importance values of clinical variables in ClinSeq® and FHS cohorts. Only the instances with positive predictive importance are reported
| Clinical variable | Predictive importance |
|---|---|
| Total cholesterol | 8.60 (FHS) |
| Systolic blood pressure | 6.24 (FHS), 12.94 (ClinSeq®) |
| Diastolic blood pressure | 2.88 (FHS) |
| Fibrinogen | 1.81 (FHS), 3.50 (ClinSeq®) |
| Fasting blood glucose | 0.024 (FHS) |
| HDL cholesterol | 18.39 (ClinSeq®) |
Fig. 3 Predictive performance plotted against the number of predictors in ClinSeq® and FHS cohorts. Model inputs are derived only from clinical variables
Predictive performances of RF models (quantified by the mean ± standard deviation values of AUC) trained and tested with different predictor sets in the ClinSeq® and FHS cohort data
| Predictors | Optimal # markers | Optimal AUC | p-value |
|---|---|---|---|
| CLIN | 3 (ClinSeq®), 3 (FHS) | 0.69 ± 0.02 (ClinSeq®), 0.61 ± 0.02 (FHS) | 0.015 (ClinSeq®), 0.080 (FHS) |
| SNP Set-2 | 21 (ClinSeq®), 21 (FHS) | 0.99 ± 0.01 (ClinSeq®), 0.85 ± 0.02 (FHS) | <0.001 (ClinSeq®), <0.001 (FHS) |
| CLIN+SNP Set-2 | 21 (ClinSeq®), 18 (FHS) | 0.99 ± 0.01 (ClinSeq®), 0.83 ± 0.01 (FHS) | <0.001 (ClinSeq®), <0.001 (FHS) |
“CLIN” corresponds to the nine clinical variables listed in Additional file 1: Table S1 (all variables except age and gender)
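The AUC summaries in this table take the form mean ± standard deviation over repeated model evaluations. A minimal sketch of that kind of summary is given here, assuming AUC is computed as the fraction of (case, control) pairs that the model ranks correctly, with ties counted as half. The function name `auc` and the three small runs are hypothetical and are not data from either cohort.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a case, 0 for a control.
    scores: model outputs, higher meaning "more likely a case".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical (labels, scores) pairs standing in for repeated test runs.
runs = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.2]),
    ([1, 0, 1, 0], [0.6, 0.6, 0.9, 0.1]),
]

aucs = [auc(y, s) for y, s in runs]
print(f"AUC = {mean(aucs):.2f} ± {stdev(aucs):.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, which is the scale on which the values in the table should be read.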
Predictive importance values of the set of SNPs that generate optimal predictive performance in both cohorts. Nearest genes are listed for intergenic SNPs (marked with asterisk)
| SNP | Locus | Predictive importance (ClinSeq®) | Predictive importance (FHS) | Percent difference |
|---|---|---|---|---|
| rs13159307 | | 28.83 | 21.64 | 24.94 |
| rs8107904 | | 36.95 | 21.83 | 40.92 |
| rs571797 | | 17.68 | 6.86 | 61.20 |
| rs2390285 | | 22.86 | 17.27 | 24.45 |
| rs342393 | | 18.04 | 15.34 | 14.97 |
| rs13429160 | | 35.68 | 16.89 | 52.66 |
| rs11674863 | | 26.18 | 15.74 | 39.88 |
| rs514237 | | 19.09 | 24.81 | 23.06 |
| rs6860493 | | 20.72 | 26.39 | 21.49 |
| rs10054519 | | 21.17 | 25.25 | 16.16 |
| rs12521249 | | 21.17 | 25.44 | 16.78 |
| rs10065689 | | 20.45 | 25.55 | 19.96 |
| rs2241097 | | 34.02 | 24.11 | 29.13 |
| rs10059993 | | 20.82 | 24.77 | 15.95 |
| rs12645809 | | 22.1 | 25.33 | 12.75 |
| rs480220 | | 19.76 | 24.01 | 17.70 |
| rs1366410 | | 21.15 | 23.77 | 11.02 |
| rs11767632 | | 32.09 | 20.94 | 34.75 |
| rs7713479 | | 21.11 | 37.48 | 43.68 |
| rs243172 | | 34.90 | 46.17 | 24.41 |
| rs243170 | | 35.91 | 51.20 | 29.86 |
The normalized difference of the predictive importance values of each SNP between the two cohorts (the absolute difference divided by the higher of the two predictive importance values) has a median value of 24% (interquartile range: 17%-36%). In terms of predictive-importance-based ranking, five of the top 11 SNP predictors (with 65% of the cumulative predictive importance) are common, whereas nine of the top 14 SNP predictors (with 76% of the cumulative predictive importance) overlap between the two cohorts
*Intergenic SNPs for which the nearest genes are reported
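The percent difference column can be recomputed directly from the two predictive importance columns. The sketch below applies the definition given in the table note (difference divided by the higher of the two values) to the tabulated pairs and takes the median. The names `IMPORTANCE` and `normalized_difference` are illustrative choices, not identifiers from the study.

```python
from statistics import median

# (ClinSeq, FHS) predictive importance pairs copied from the table.
IMPORTANCE = {
    "rs13159307": (28.83, 21.64),
    "rs8107904": (36.95, 21.83),
    "rs571797": (17.68, 6.86),
    "rs2390285": (22.86, 17.27),
    "rs342393": (18.04, 15.34),
    "rs13429160": (35.68, 16.89),
    "rs11674863": (26.18, 15.74),
    "rs514237": (19.09, 24.81),
    "rs6860493": (20.72, 26.39),
    "rs10054519": (21.17, 25.25),
    "rs12521249": (21.17, 25.44),
    "rs10065689": (20.45, 25.55),
    "rs2241097": (34.02, 24.11),
    "rs10059993": (20.82, 24.77),
    "rs12645809": (22.1, 25.33),
    "rs480220": (19.76, 24.01),
    "rs1366410": (21.15, 23.77),
    "rs11767632": (32.09, 20.94),
    "rs7713479": (21.11, 37.48),
    "rs243172": (34.90, 46.17),
    "rs243170": (35.91, 51.20),
}


def normalized_difference(a, b):
    """Absolute difference divided by the larger of the two values."""
    return abs(a - b) / max(a, b)


# Percent difference per SNP, matching the last column of the table.
percent_diff = {
    snp: 100 * normalized_difference(clinseq, fhs)
    for snp, (clinseq, fhs) in IMPORTANCE.items()
}

median_percent = median(percent_diff.values())
print(f"median normalized difference: {median_percent:.0f}%")
```

Run on these 21 pairs, the median rounds to the reported 24%. The quartiles are left out of the sketch because their exact values depend on the interpolation convention used, which the note does not specify.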
Fig. 4 Properties of 36 optimal NN models trained with data from the discovery cohort and tested with data from the replication cohort. Median AUC value for each network topology (ranging between 0.8021 and 0.8515) and the corresponding p-values. Third quartile of the AUC values among different network topologies ranged between 0.8503 and 0.9074
Fig. 5 Network of genes derived from GeneMANIA (based on 244 studies in humans) using the most predictive set of SNPs in this study. The connections in pink are derived from gene coexpression data, whereas the connections in green are derived from genetic interaction data from the literature. The inner circle is composed of genes on which the subset of SNPs in SNP Set-2 leading to optimal performance in both cohorts are present, whereas the genes forming the outer circle are additional genes identified by GeneMANIA. The thicknesses of the links (or edges) between the genes are proportional to the interaction strengths, whereas the node size for each gene is proportional to the rank of the gene based on its importance (or gene score) within the network. All interactions within this network are listed in Additional file 1: Table S8
Enriched diseases and biological functions (in the network of genes derived from GeneMANIA) with p-values ranging between 1.0E-4 and 1.0E-2 as identified by IPA based on Fisher’s exact test
| Category | Disease or function | Genes | p-value |
|---|---|---|---|
| Connective tissue development and function | Quantity of adipose tissue | | 3.58E-4 |
| Connective tissue development and function | Differentiation of adipocytes | | 8.82E-4 |
| Cardiovascular disease | Angiectasis of blood vessel | | 9.87E-4 |
| Cardiovascular system development and function | Area of capillary vessel | | 9.87E-4 |
| Hematological system development and function | Cell division of peripheral blood lymphocytes | | 9.87E-4 |
| Cardiovascular disease, endocrine system disorders, metabolic disease | Susceptibility to insulin resistance-related hypertension | | 9.87E-4 |
| Cardiac necrosis, cell death and survival | Cell death of heart tissue | | 1.97E-3 |
| Cellular movement | Migration of connective tissue cells | | 2.14E-3 |
| Carbohydrate metabolism, cellular function and maintenance | Homeostasis of D-glucose | | 2.46E-3 |
| Nucleic acid metabolism | Conversion of NAD+ | | 2.96E-3 |
| Cardiovascular system development and function | Tethering of endothelial cell lines | | 2.96E-3 |
| Cellular compromise, inflammatory response | Degranulation of beta islet cells | | 3.94E-3 |
| Cardiovascular system development and function | Density of blood vessel tissue | | 3.94E-3 |
| Endocrine system disorders, hematological disease, metabolic disease | Onset of hyperglycemia | | 3.94E-3 |
| Carbohydrate metabolism | Tolerance of D-glucose | | 4.93E-3 |
| Cardiovascular system development and function | Angiogenesis of heart | | 5.91E-3 |
| Cardiovascular system development and function | Density of blood vessel | | 5.96E-3 |
| Immune cell trafficking, inflammatory response, hematological system development and function | Adhesion of neutrophils | | 7.52E-3 |
| Endocrine system development and function, hepatic system development and function | Insulin sensitivity of liver | | 7.87E-3 |
| Nucleic acid metabolism | Metabolism of NADPH | | 7.87E-3 |
| Connective tissue development and function | Quantity of visceral fat | | 8.85E-3 |
| Carbohydrate metabolism | Binding of chondroitin sulfate | | 9.83E-3 |
51 additional enriched diseases and biological functions (statistically less significant) with p-values ranging between 1.0E-2 and 5.0E-2 are listed in Additional file 1: Table S7
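The enrichment p-values in this table are attributed to Fisher's exact test as run by IPA. As a rough illustration of what such a test computes, the sketch below evaluates a one-sided (over-representation) p-value from the hypergeometric distribution. The function name, the one-sided choice, and every count in the example are assumptions made for illustration. None of them is taken from the study or from IPA.

```python
from math import comb


def fisher_enrichment_p(overlap, sample_size, annotated, background):
    """One-sided Fisher's exact p-value for over-representation.

    Probability of seeing at least `overlap` annotated genes when
    `sample_size` genes are drawn without replacement from a background
    of `background` genes, `annotated` of which carry the annotation.
    """
    upper = min(sample_size, annotated)
    total = comb(background, sample_size)
    # Sum the hypergeometric probabilities of all outcomes at least as
    # extreme as the observed overlap.
    tail = sum(
        comb(annotated, i) * comb(background - annotated, sample_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: a 40-gene network, 3 of whose genes fall in a
# 100-gene annotation category, against a 20000-gene background.
p_example = fisher_enrichment_p(
    overlap=3, sample_size=40, annotated=100, background=20000
)
print(f"enrichment p-value: {p_example:.2e}")
```

A smaller p-value indicates that the overlap between the gene network and the annotation category is less likely to have arisen by chance under random sampling from the background.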