Seema Singh Saharan, Pankaj Nagar, Kate Townsend Creasy, Eveline O Stock, James Feng, Mary J Malloy, John P Kane.
Abstract
BACKGROUND: According to the 2017 WHO fact sheet, Coronary Artery Disease (CAD) is the leading cause of death worldwide and accounts for 31% of all fatalities. The 17.6 million deaths caused by CAD in 2016 underscore the urgent need for earlier, pre-emptive diagnosis. Emerging Machine Learning (ML) techniques can be leveraged for early detection of CAD, which is a crucial factor in saving lives. Standard techniques such as angiography provide reliable evidence but are invasive, typically expensive, and risky. In contrast, an ML-generated diagnosis is non-invasive, fast, accurate, and affordable, so ML algorithms can be used as a supplement or precursor to the conventional methods. This research demonstrates the implementation and comparative analysis of the K Nearest Neighbor (k-NN) and Random Forest ML algorithms to achieve a targeted "At Risk" CAD classification using an emerging set of 35 cytokine biomarkers, which are strongly indicative predictor variables and potential targets for therapy. To improve generalizability, mechanisms such as data balancing and repeated k-fold cross validation for hyperparameter tuning were integrated into the models. To determine how well the models separate "At Risk" CAD from Control, the Area under the Receiver Operating Characteristic curve (AUROC) metric is used, which captures the tradeoff between the false positive and true positive rates.
Keywords: AUROC; CAD; Classification; Confidence interval; Distance metrics; ML; Plasma cytokines; ROSE; Random Forest; k-NN; k-fold cross validation
Year: 2021 PMID: 33858484 PMCID: PMC8050889 DOI: 10.1186/s13040-021-00260-z
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
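The AUROC metric named in the abstract can be illustrated with a minimal sketch. It sweeps a decision threshold over classifier scores, records the (false positive rate, true positive rate) pairs, and integrates them with the trapezoid rule. The label encoding (1 for "At Risk" CAD, 0 for Control), the function names, and the toy scores are assumptions made for illustration; they are not taken from the study.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels: 1 for "At Risk" CAD, 0 for Control (assumed encoding).
    scores: higher values mean the model considers the sample more likely "At Risk".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Visit samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Samples with identical scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auroc(labels, scores):
    """Area under the ROC curve via the trapezoid rule."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores (not study data): one Control sample outranks one CAD sample.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

In this toy case three of the four (CAD, Control) pairs are ranked correctly, so the area is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.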
Clinical Demographic Profile. A collection of plasma samples from patients with diagnosed coronary artery disease (CAD) and healthy controls
| Gender | CAD | Control | Total |
|---|---|---|---|
| | 19 | 26 | 45 (43.27%) |
| | 20 | 39 | 59 (56.73%) |
| Total | 39 (37.5%) | 65 (62.5%) | 104 |
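The class counts in this table (39 CAD versus 65 Control) show why the abstract mentions data balancing. The keywords list ROSE, which generates synthetic samples; the sketch below swaps in plain random oversampling with replacement, a simpler technique, purely to illustrate the idea. The function name and the placeholder feature vectors are hypothetical.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest.

    This is plain random oversampling, used here as a simple stand-in;
    it is not the ROSE procedure named in the keywords.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing number of rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Class counts from the demographic table: 39 CAD, 65 Control.
labels = ["CAD"] * 39 + ["Control"] * 65
rows = [[float(i)] for i in range(len(labels))]  # placeholder features, not real data
balanced_rows, balanced_labels = oversample_minority(rows, labels)
```

Balancing of this kind is usually applied to the training portion only, so that duplicated samples do not leak into the data used for evaluation.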
Fig. 1 The classifier framework implemented to identify “At Risk” Coronary Artery Disease (CAD) classification
Classifier 1 Experiment Results for the k-NN algorithm with 35 cytokines and k = 9
| Algorithm | Classification Criterion | Predictor Feature Space | AUROC with 95% Confidence Interval | Prediction Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| k-NN | Distance Measure: Euclidean with k = 9 | 35 Cytokines | 0.832 | 0.992 | 0.658 |
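As a rough illustration of the classification criterion in this table, the following sketch implements k-NN with Euclidean distance and a majority vote over k = 9 neighbors, plus the sensitivity and specificity calculations. The two-dimensional toy points stand in for the 35 cytokine measurements and are invented for the example; the function names are likewise hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_rows, train_labels, query, k=9):
    """Label a query by majority vote among its k nearest training rows."""
    if k > len(train_rows):
        raise ValueError("k cannot exceed the number of training rows")
    nearest = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def sensitivity_specificity(true_labels, predicted, positive="CAD"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(true_labels, predicted):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in true_labels")
    return tp / (tp + fn), tn / (tn + fp)


# Toy 2-D data (not study data): two well-separated clusters.
control = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.1]]
cad = [[5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.2, 5.1], [4.8, 4.8], [5.3, 5.0]]
train_rows = control + cad
train_labels = ["Control"] * len(control) + ["CAD"] * len(cad)

prediction = knn_predict(train_rows, train_labels, [4.9, 5.0], k=9)
```

An odd k with two classes avoids tied votes. Because Euclidean distance is scale-sensitive, features measured on very different scales are typically standardized before k-NN is applied.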
Classifier 2 Experiment Results for the Random Forest with 35 cytokines
| Algorithm | Classification Criterion | Predictor Feature Space | AUROC with 95% Confidence Interval | Prediction Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Random Forest | Decision Trees | 35 Cytokines | 0.96 | 0.954 | 0.967 |
Fig. 2 Classifier 1 Experiment: Optimal number of neighbors for k-NN (Euclidean Distance) with a total set of 35 cytokines
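Selecting the number of neighbors relies on the repeated k-fold cross validation mentioned in the abstract. The sketch below only generates the train/test index splits. The fold count (5) and repeat count (3) are assumptions, since this record does not state the values used; the sample size of 104 comes from the demographic table.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross validation.

    Each repeat reshuffles the sample order, splits it into n_folds folds,
    and holds out each fold once. n_folds and n_repeats are assumed values.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal shuffled indices into folds whose sizes differ by at most one.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for held_out in range(n_folds):
            test_idx = folds[held_out]
            train_idx = [
                idx
                for f, fold in enumerate(folds)
                if f != held_out
                for idx in fold
            ]
            yield train_idx, test_idx


# 104 samples, as in the demographic table.
splits = list(repeated_kfold_indices(104, n_folds=5, n_repeats=3))
```

For each candidate number of neighbors, a model would be fitted on every `train_idx` and scored on the matching `test_idx`, and the candidate with the best average score would be kept.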
Fig. 3 Classifier 1 Experiment: AUROC curve with 95% CI for k-NN (Euclidean Distance) with a total set of 35 cytokines
Fig. 4 Classifier 5 Experiment: AUROC curve for Random Forest with the total set of 35 cytokines
Fig. 5 Classifier Comparison: AUROC curve, Sensitivity, Specificity for Random Forest and k-NN with the total set of 35 cytokines