Annie M Westerlund, Johann S Hawe, Matthias Heinig, Heribert Schunkert.
Abstract
Cardiovascular diseases (CVD) annually take almost 18 million lives worldwide. Most lethal events occur months or years after the initial presentation. Indeed, many patients experience repeated complications or require multiple interventions (recurrent events). Apart from affecting the individual, this leads to high medical costs for society. Personalized treatment strategies aiming at prediction and prevention of recurrent events rely on early diagnosis and precise prognosis. Complementing the traditional environmental and clinical risk factors, multi-omics data provide a holistic view of the patient and disease progression, enabling studies to probe novel angles in risk stratification. Specifically, predictive molecular markers allow insights into regulatory networks, pathways, and mechanisms underlying disease. Moreover, artificial intelligence (AI) represents a powerful, yet adaptive, framework able to recognize complex patterns in large-scale clinical and molecular data with the potential to improve risk prediction. Here, we review the most recent advances in risk prediction of recurrent cardiovascular events, and discuss the value of molecular data and biomarkers for understanding patient risk in a systems biology context. Finally, we introduce explainable AI which may improve clinical decision systems by making predictions transparent to the medical practitioner.
Keywords: AI; biomarkers; cardiovascular disease; coronary artery disease; explainable artificial intelligence; genomics; machine learning; molecular networks; multi-omics; proteomics
Year: 2021 PMID: 34638627 PMCID: PMC8508897 DOI: 10.3390/ijms221910291
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Table of clinical datasets used to train traditional risk scores. LDL—Low-density lipoprotein; HDL—High-density lipoprotein; BMI—Body mass index; LV—Left ventricle; CKD—Chronic kidney disease.
| Dataset | Cohort Size | Attributes | Follow Up | Ref. |
|---|---|---|---|---|
| American Framingham heart study | ∼15,000 | Age, sex, diabetes, LDL and HDL cholesterol, smoking, systolic blood pressure | 12 years | [ |
| SCORE pooled dataset (12 European cohorts) | ∼21,000 | Sex, smoking, total cholesterol, tot. chol./HDL ratio, systolic blood pressure | Average 13 years | [ |
| ACC/AHA pooled cohort | ∼25,000 | FRS (Framingham risk score) attributes, tot. chol., treated/untreated systolic blood pressure | ≥12 years | [ |
| QRESEARCH (no diabetes or CVD at baseline) | ∼10 million | FRS attributes, social deprivation, family history, BMI, LV function, antihypertensive agent treatment | 17 years (study length) | [ |
| Suita dataset | ∼5600 | FRS attributes, CKD | 11.8 years | [ |
Table of traditional risk scores based on clinical datasets. MI—myocardial infarction; TIA—transient ischaemic attack; CAD—coronary artery disease; CVD—cardiovascular disease; PH—proportional hazards.
| Risk Score | Dataset | Clinical Question | Method | Performance | Ref. |
|---|---|---|---|---|---|
| Framingham risk score | American Framingham heart study | 10-year risk; CAD, CVD events | Cox PH | 0.733–0.841/0.769–0.847 (C-statistic, events average, male/female) | [ |
| SCORE | SCORE pooled dataset | 10-year risk; CAD/CVD mortality | Weibull | 0.71–0.84 (AUC-ROC, EU countries) | [ |
| ACC/AHA pooled cohort equations | ACC/AHA pooled cohort | 10-year risk; atherosclerotic CVD events | Cox PH | 0.713–0.818 (C-statistic, sex/race average) | [ |
| QRISK | QRESEARCH | 10-year risk; CVD (MI, CAD, stroke, TIA) | Cox PH | 0.7674/0.7879 (AUC-ROC, male/female) | [ |
| Suita | Suita dataset | 10-year risk; CAD | Cox PH | 0.835 (C-statistic) | [ |
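Most of the scores above are Cox proportional hazards (PH) models, which convert a patient's risk factors into an absolute 10-year risk through a baseline survival probability. The following is a minimal sketch of that calculation only. The coefficients, cohort means, and baseline survival are invented for illustration and are not the published values of the Framingham, ACC/AHA, QRISK, or Suita scores.

```python
import math

# Hypothetical log-hazard-ratio coefficients; NOT taken from any published score.
COEFFS = {"age": 0.05, "systolic_bp": 0.02, "smoker": 0.60}
# Hypothetical cohort means used to centre the linear predictor.
MEANS = {"age": 55.0, "systolic_bp": 130.0, "smoker": 0.3}
# Hypothetical baseline 10-year survival for a patient at the cohort means.
BASELINE_SURVIVAL_10Y = 0.90


def linear_predictor(patient):
    """Weighted sum of risk factors, centred on the cohort means."""
    return sum(COEFFS[k] * (patient[k] - MEANS[k]) for k in COEFFS)


def ten_year_risk(patient):
    """Cox PH absolute risk: 1 - S0(t) ** exp(linear predictor)."""
    return 1.0 - BASELINE_SURVIVAL_10Y ** math.exp(linear_predictor(patient))


average = dict(MEANS)
high_risk = {"age": 65.0, "systolic_bp": 160.0, "smoker": 1}
print(round(ten_year_risk(average), 3))    # a patient at the means gets 1 - S0 = 0.1
print(round(ten_year_risk(high_risk), 3))  # higher risk factors give a higher risk
```

A patient whose attributes equal the cohort means has a linear predictor of zero, so the predicted risk reduces to one minus the baseline survival.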
Table of clinical, genetic, and imaging datasets used to predict risk for recurrent events. CCTA—coronary computed tomography angiography; CVD—cardiovascular disease; CAD—coronary artery disease; CeVD—cerebrovascular disease; PAD—peripheral artery disease; ACS—acute coronary syndrome.
| Dataset | Cohort Size | Type of Data | Baseline | Follow Up | Ref. |
|---|---|---|---|---|---|
| ( registry | ∼102,000 (30 countries) | Clinical risk factors | ACS | 6 months | [ |
| ( registry | ∼68,000 (44 countries) | Clinical risk factors, demographics | CVD (CAD, CeVD, PAD) | 1–2 years | [ |
| ( | IV/V: ∼16,000 (27 countries) | Clinical risk factors, lifestyle | CAD | 0.5–3 years | [ |
| ( | ∼8600 | Clinical risk factors | ACS | 2.5 years (median) | [ |
| ( registry | ∼50,000 (6 countries) | CCTA images, clinical risk factors | Suspected CAD | 2.3 years (median) | [ |
| ( | ∼13,000 | Clinical risk factors, carotid ultrasound (567 patients with CCTA images) | CVD | 4.7 years (median) | [ |
| ( | ∼186,000 (57 studies, 18 countries) | Genotype | CAD | 1–15 years | [ |
Table of clinical risk scores for predicting recurrent events. CVD—cardiovascular disease; MI—Myocardial infarction; PH—proportional hazards; CI—confidence interval.
| Risk Score | Dataset | Clinical Question | Method | Performance | Ref. |
|---|---|---|---|---|---|
| ( | GRACE registry (43,810 patients, 14 countries) | 6-month risk; death, or death/MI | Cox PH | 0.7/0.82 (C-statistic, death/death-MI) | [ |
| ( | REACH registry (49,689 patients, 44 countries) | 20-month risk; CVD events, cardiovasc. death | Cox PH | 0.67 [0.66, 0.68]/0.75 [0.73, 0.77] (C-statistic, 95% CI, CVD/death) | [ |
| ( | EuroAspire (IV/V, 27 countries, 12,484 patients) | 2-year risk; CVD events or interventions | Weibull | 0.67 [0.64, 0.70] (C-statistic, 95% CI) | [ |
| ( | TIMI (8598 patients, 9 predictors) | 3-year risk; cardiovasc. death, MI, stroke | Cox PH | 0.67 [0.65, 0.69] (C-statistic, 95% CI) | [ |
| ( | CONFIRM registry (20,300 patients) | 2-year risk; death | Cox PH | 0.682 (C-statistic) | [ |
| ( | UCC-SMART (5788 patients, 14 predictors) | 10-year risk; CVD events | Cox PH | 0.68 [0.64, 0.71] (C-statistic, 95% CI) | [ |
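The C-statistics reported above can be computed from follow-up data even when some patients are censored, that is, when they leave the study before an event is observed. The following is a minimal sketch of Harrell's concordance index on toy data. The function name `concordance_index` and the example values are illustrative assumptions, and the cited studies may use different estimators or tie handling.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable patient pairs ranked correctly.

    times       -- follow-up time for each patient
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- model output; higher means the event is expected sooner
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: the second patient is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
print(concordance_index(times, events, [0.9, 0.4, 0.6, 0.2]))  # 1.0
print(concordance_index(times, events, [0.9, 0.4, 0.1, 0.2]))  # 0.75
```

The censored patient never starts a comparable pair but can still serve as the longer-surviving member of one, which is how censored follow-up contributes information.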
Description of performance metrics. Here, a “positive” label can, for example, correspond to “having CVD”, while a “negative” may correspond to “healthy”.
| Metric | Description | Math. Definition |
|---|---|---|
| True positive (TP) | A positive sample correctly predicted by the model. | |
| True negative (TN) | A negative sample correctly predicted by the model. | |
| False positive (FP) | A sample wrongly classified as positive by the model. | |
| False negative (FN) | A sample wrongly classified as negative by the model. | |
| Precision | Fraction of true positives among the predicted positives. | $\frac{TP}{TP + FP}$ |
| Recall (Sensitivity) | Fraction of positives that are correctly predicted. | $\frac{TP}{TP + FN}$ |
| Specificity | Fraction of negatives that are correctly predicted. | $\frac{TN}{TN + FP}$ |
| Accuracy | Fraction of correctly predicted positives and negatives. | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| ROC curve (Receiver Operating Characteristic) | A curve indicating performance of a classifier. The curve relates the recall to the false positive rate (1 − specificity). | |
| AUC-ROC (Area Under the Curve - ROC) | Quantitative performance measure based on ROC curve. Ranges from 0 to 1, where 1 corresponds to perfect, and 0.5 to random, classification. | |
| C-statistic | Equivalent to AUC-ROC. Can be used for censored data (missing patient outcomes). […] information exists. | |
| PR curve (Precision Recall) | Similar to ROC curve. Shows precision and recall. | |
| AUC-PR (AUC of the Precision-Recall curve) | Quantitative performance measure based on PR curve. Alternative to AUC-ROC. | |
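The metrics in the table above can be computed directly from labels and model outputs. The following is a minimal sketch on toy data, assuming binary labels (1 for positive, 0 for negative) and a decision threshold of 0.5. The function names are illustrative, and AUC-ROC is computed through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Assumes both classes occur among the labels and the predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def auc_roc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(classification_metrics(y_true, y_pred))  # every metric is 0.75 on this toy data
print(auc_roc(y_true, scores))                 # 0.875
```

Precision, recall, specificity, and accuracy depend on the chosen threshold, whereas AUC-ROC summarizes the ranking over all thresholds.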
Figure 1. Overview of how molecular data can be used for understanding and predicting the risk of recurrent cardiovascular events. Genome-wide association studies (GWAS) can be used to identify CVD risk loci. Weights obtained from GWAS can be used to calculate a polygenic risk score. Moreover, the GWAS loci can be combined with multi-omics data and prior knowledge to construct regulatory networks. From these networks, it is possible to extract physiological pathways and network modules, as well as associate the level of activity of distinct network regions with high or low risk. The network information and polygenic risk score can be integrated together to improve risk prediction of recurrent cardiovascular events.
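The polygenic risk score mentioned in the caption is commonly computed as a weighted sum of risk-allele counts, with GWAS effect sizes as weights. The following is a minimal sketch under that assumption. The SNP identifiers and weights are placeholders invented for illustration, not loci or effect sizes from any study.

```python
# Hypothetical GWAS effect sizes (log odds ratio per risk allele).
# The SNP identifiers are placeholders, not real loci.
GWAS_WEIGHTS = {"snp_a": 0.12, "snp_b": 0.05, "snp_c": 0.30}


def polygenic_risk_score(genotype, weights=GWAS_WEIGHTS):
    """Weighted sum of risk-allele counts (0, 1 or 2 per locus).

    Loci missing from the genotype are treated as carrying zero risk
    alleles, which is a simplification made for this sketch.
    """
    return sum(w * genotype.get(snp, 0) for snp, w in weights.items())


patient = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(round(polygenic_risk_score(patient), 2))  # 0.29
```

In practice the raw score is usually standardized against a reference population before it is combined with clinical risk factors.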
Figure 2. (A) Illustration of a typical AI workflow. Each patient is first described by the same set of numerical and/or categorical attributes (features), such as risk factors or gene expression levels. The data (patients) are then divided into training, validation and test sets. AI models with different values of non-trainable parameters (hyperparameters) are trained on the training set. The model performance is evaluated on the validation set according to some metric. A final model with the hyperparameters yielding the best validation-set performance is then evaluated on the independent test set. (B) Illustration of a perceptron neural network with a sigmoid activation function. (C) Illustration of a multi-layer neural network with one hidden layer. The arrows indicate direction of feed-forward and back-propagation passes.
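The workflow in panel (A) and the perceptron in panel (B) can be sketched together. The example below is a sketch under stated assumptions: the patients are synthetic, the learning rate stands in for the hyperparameter, accuracy stands in for the evaluation metric, and the 60/20/20 split is arbitrary. None of these choices come from the reviewed studies.

```python
import math
import random


def sigmoid(z):
    """Numerically stable sigmoid activation."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    """Perceptron: weighted sum of the features passed through a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data, learning_rate, epochs=200):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict(weights, bias, x) - y  # gradient w.r.t. the weighted sum
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def accuracy(model, data):
    weights, bias = model
    hits = sum((predict(weights, bias, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


# Synthetic patients: two standardized features; the label depends on their sum plus noise.
random.seed(0)
patients = []
for _ in range(300):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    y = 1 if x[0] + x[1] + random.gauss(0, 0.5) > 0 else 0
    patients.append((x, y))

# Divide the data into training, validation and test sets (60/20/20).
train_set, val_set, test_set = patients[:180], patients[180:240], patients[240:]

# Train one model per hyperparameter value and keep the best on the validation set.
candidates = [0.001, 0.01, 0.1]
models = {lr: train(train_set, lr) for lr in candidates}
best_lr = max(candidates, key=lambda lr: accuracy(models[lr], val_set))

# Only the selected model is evaluated on the independent test set.
test_accuracy = accuracy(models[best_lr], test_set)
print(best_lr, round(test_accuracy, 2))
```

Keeping the test set out of both training and hyperparameter selection is what makes the final number an estimate of performance on unseen patients.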
Performance of AI models for predicting primary or recurrent events compared to traditional survival models. CVD—cardiovascular disease; CAD—coronary artery disease; ACS—acute coronary syndrome; PH—proportional hazards; CI—confidence interval; NPHS—National Pulmonary Hypertension Service; EHR—Electronic health records; LR—Logistic regression; RF—Random forest; MLP—multi-layer perceptron; NB—Naive Bayes.
| Risk Score/Method | Dataset | Clinical Question | Prediction Performance | Comparison (Cox PH) | Ref. |
|---|---|---|---|---|---|
| ( prognosis framework | UK Biobank (clinical data, 423,604 patients) | 5-year risk; fatal or non-fatal CVD event | 0.774 [0.768, 0.780] | 0.758 [0.753, 0.763]; 0.734 [0.729, 0.739] | [ |
| ( | NHANES (clinical, lab, demographic data, 37,079 patients) | Predict presence of CAD | | - | [ |
| ( | EHR (clinical data, socioeconomics, 262,923/131,721 patients) | 5-year risk; MI, stroke, or fatal CAD | [0.825, 0.846]/[0.760, 0.790]; [0.765, 0.802]/[0.812, 0.839]; [0.766, 0.803]/[0.816, 0.843]; [0.760, 0.793]/[0.820, 0.842]; [0.729, 0.770]/[0.795, 0.820] | -/0.775 [-, -]/[0.755, 0.794] | [ |
| ( (AdaBoost) | BleeMACS, RENAMI (clinical data, 19,826 patients) | 1-year risk; […] (ACS at baseline) | | - | [ |
| ( (logit-boost) | CONFIRM registry (10,030 patients) | 5-year risk; death (suspected CAD at baseline) | 0.79 [0.77, 0.81] | 0.61 [0.59, 0.64] | [ |
| ( (survival autoencoder) | NPHS (MR imaging, clinical data, 302 patients) | Survival-times […] | 0.75 [0.70, 0.79] | 0.64 [0.57, 0.70] | [ |
| ( | GerMIFS I-V, LURIC (∼2.8M SNPs, 15,709 patients) | Genetic risk; CAD | | - | [ |
Figure 3. A typical clinical decision system with explainable AI. An AI model is first trained on a cohort containing for example clinical, imaging or multi-omics data. The trained model is then used to predict the risk for a patient to develop the disease or specific symptoms. Finally, the explainable AI provides information about the decision patterns, which helps the medical practitioner to assess faithfulness of the prediction and formulate a treatment strategy.
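One simple way to expose decision patterns is to report how much each feature contributed to an individual prediction. The sketch below does this for a logistic model, where each feature adds its weight times its deviation from a reference value to the log-odds. The model, weights, and reference patient are invented for illustration. More complex models need dedicated attribution methods, and this is not the explanation technique of any specific system discussed in the review.

```python
import math

# Hypothetical logistic risk model; weights and reference values are invented.
WEIGHTS = {"age": 0.04, "ldl_cholesterol": 0.01, "smoker": 0.70}
REFERENCE = {"age": 60.0, "ldl_cholesterol": 120.0, "smoker": 0.0}
INTERCEPT = -2.0  # log-odds of an event for the reference patient


def explain(patient):
    """Return the predicted risk and each feature's additive log-odds contribution."""
    contributions = {
        name: WEIGHTS[name] * (patient[name] - REFERENCE[name]) for name in WEIGHTS
    }
    log_odds = INTERCEPT + sum(contributions.values())
    risk = 1.0 / (1.0 + math.exp(-log_odds))
    return risk, contributions


patient = {"age": 70.0, "ldl_cholesterol": 150.0, "smoker": 1.0}
risk, contributions = explain(patient)

# List the features from most to least influential for this patient.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f}")
print(f"predicted risk: {risk:.2f}")
```

A practitioner reading such output can check whether the features driving the prediction are clinically plausible for the patient at hand.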