Sherif Sakr¹, Radwa Elshawi², Amjad M Ahmed¹, Waqas T Qureshi³, Clinton A Brawner⁴, Steven J Keteyian¹, Michael J Blaha⁵, Mouaz H Al-Mallah⁶,⁷.
Abstract
BACKGROUND: Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that assign the data to predetermined categories. The aim of this study is to evaluate and compare how machine learning techniques can be applied to medical records of cardiorespiratory fitness, and how the techniques differ in their ability to predict medical outcomes (e.g. mortality).
Keywords: All-cause mortality; FIT (Henry Ford ExercIse Testing) project; Machine learning
Year: 2017 PMID: 29258510 PMCID: PMC5735871 DOI: 10.1186/s12911-017-0566-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Baseline Characteristics for Included Study Cohort
| Characteristic | Data |
|---|---|
| Age (years)ᵃ | 54 ± 13 |
| Maleᵇ | 18,703 (55) |
| Raceᵇ | |
| White | 23,801 (70) |
| Black | 9768 (29) |
| Others | 643 (1) |
| Body Mass Index (kg/m²)ᵃ | 29.3 ± 5.8 |
| Reason for Testᵇ | |
| Chest Pain | 17,547 (51) |
| Shortness of Breath | 3307 (10) |
| Pre-Operation | 781 (2) |
| Rule out Ischemia | 3884 (11) |
| Stress Variablesᵃ | |
| Peak METS | 9.2 ± 3.1 |
| Resting Systolic Blood Pressure (mmHg) | 132 ± 19 |
| Resting Diastolic Blood Pressure (mmHg) | 82 ± 11 |
| Resting Heart rate (bpm) | 74 ± 13 |
| Peak Systolic Blood Pressure (mmHg) | 183 ± 27 |
| Peak Diastolic Blood Pressure (mmHg) | 86 ± 14 |
| Peak Heart Rate (bpm) | 151 ± 21 |
| Chronotropic incompetenceᵇ | 6957 (23.3) |
| Past Medical Historyᵇ | |
| Diabetes | 5907 (17) |
| Hypertension | 20,534 (60) |
| Smoking | 15,249 (43) |
| Family History of CAD | 18,299 (51) |
| Medications Usedᵇ | |
| Diuretic Use | 5743 (16) |
| Hypertensive medications | 14,905 (42) |
| Diabetes medications | 2432 (7) |
| Statin | 4524 (13.2) |
| Aspirin | 5752 (16.8) |
| Beta Blockers | 5434 (15.9) |
| Calcium Channel Blockers | 4638 (13.5) |
mmHg: millimeters of mercury; bpm: beats per minute; CAD: coronary artery disease.
Data are presented as ᵃ mean ± standard deviation or ᵇ frequency (percentage).
Fig. 1 The ranking of the variables based on the outcome of the Feature Selection Process
Comparison of the performance of the Decision Tree (DT) classifier with sampling, with the confidence parameter (Conf) set to 0.1, 0.25, 0.5, 0.75 and 1
| | Conf = 0.1 | Conf = 0.25 | Conf = 0.5 | Conf = 0.75 | Conf = 1 |
|---|---|---|---|---|---|
| Sensitivity | 50.52% | 55.71% | 59.33% | 59.95% | 59.12% |
| Specificity | 94.05% | 64.97% | 95.56% | 96.05% | 95.74% |
| Precision | 55.69% | 61.87% | 67.08% | 70.91% | 68.52% |
| F-score | 52.98% | 58.63% | 62.97% | 64.97% | 63.48% |
| RMSE | 0.31 | 0.29 | 0.28 | 0.27 | 0.28 |
| AUC | 0.83 | 0.84 | 0.87 | 0.88 | 0.87 |
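The sensitivity, specificity, precision and F-score rows in these tables can be read through the usual confusion-matrix definitions. The following is a minimal sketch, assuming the F-score is the standard harmonic mean of precision and sensitivity; the function names and the example counts are illustrative and do not come from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of actual positives that were detected
    specificity = tn / (tn + fp)   # share of actual negatives that were detected
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    return sensitivity, specificity, precision


def f_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (assumed F1 definition)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, only to show the call.
print(confusion_metrics(tp=8, fp=2, tn=18, fn=2))

# Precision and sensitivity from the Conf = 0.1 column of the table above.
print(round(100 * f_score(0.5569, 0.5052), 2))  # 52.98, matching the reported F-score
```

Under this assumption the reported F-score for Conf = 0.1 is reproduced from its precision and sensitivity, which suggests the tables use this definition.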
Comparison of the performance of the Decision Tree (DT) classifier without sampling, with the confidence parameter (Conf) set to 0.1, 0.25, 0.5, 0.75 and 1
| | Conf = 0.1 | Conf = 0.25 | Conf = 0.5 | Conf = 0.75 | Conf = 1 |
|---|---|---|---|---|---|
| Sensitivity | 61.52% | 54.43% | 43.48% | 36.11% | 36.11% |
| Specificity | 90.09% | 90.51% | 90.91% | 90.95% | 90.95% |
| Precision | 18.21% | 22.80% | 28.16% | 30.17% | 30.17% |
| F-score | 28.10% | 32.14% | 34.18% | 32.87% | 32.87% |
| RMSE | 0.3 | 0.3 | 0.33 | 0.35 | 0.35 |
| AUC | 0.72 | 0.73 | 0.69 | 0.65 | 0.65 |
Comparison of the performance of the Support Vector Machine (SVM) classifier with sampling, using polynomial, normalized polynomial and Puk kernels with the complexity parameter (C) set to 0.1, 10 and 30
| | Polynomial, C = 0.1 | Polynomial, C = 10 | Polynomial, C = 30 | Normalized Polynomial, C = 0.1 | Normalized Polynomial, C = 10 | Normalized Polynomial, C = 30 | Puk, C = 0.1 | Puk, C = 10 | Puk, C = 30 |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 36.18% | 36.18% | 36.18% | 100% | 95.10% | 65.10% | 47.38% | 81.94% | 80.26% |
| Specificity | 94.37% | 94.37% | 94.37% | 88.31% | 88.79% | 88.85% | 88.58% | 94.13% | 95.19% |
| Precision | 61.46% | 61.41% | 61.41% | 0.02% | 33.67% | 5.62% | 6.33% | 53.64% | 62.63% |
| F-score | 45.55% | 45.53% | 45.53% | 0.05% | 49.73% | 10.35% | 11.17% | 64.84% | 70.36% |
| RMSE | 0.41 | 0.42 | 0.42 | 0.34 | 0.34 | 0.34 | 0.35 | 0.26 | 0.25 |
| AUC | 0.74 | 0.74 | 0.74 | 0.5 | 0.52 | 0.53 | 0.53 | 0.76 | 0.8 |
Comparison of the performance of the Support Vector Machine (SVM) classifier without sampling, using polynomial, normalized polynomial and Puk kernels with the complexity parameter (C) set to 0.1, 10 and 30
| | Polynomial, C = 0.1 | Polynomial, C = 10 | Polynomial, C = 30 | Normalized Polynomial, C = 0.1 | Normalized Polynomial, C = 10 | Normalized Polynomial, C = 30 | Puk, C = 0.1 | Puk, C = 10 | Puk, C = 30 |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 0% | 0% | 0% | 0% | 0% | 56.59% | 0% | 37.90% | 39.78% |
| Specificity | 88.30% | 88.30% | 88.30% | 88.30% | 88.30% | 88.83% | 88.30% | 90.22% | 87.65% |
| Precision | 0% | 0% | 0% | 0% | 0.00% | 5.55% | 0% | 22.11% | 18.03% |
| F-score | 0% | 0% | 0% | 0% | 0.00% | 10.11% | 0% | 27.92% | 24.81% |
| RMSE | 0.34 | 0.34 | 0.34 | 0.34 | 0.34 | 0.34 | 0.34 | 0.37 | 0.47 |
| AUC | 0.50 | 0.50 | 0.50 | 0.5 | 0.5 | 0.52 | 0.5 | 0.58 | 0.59 |
Comparison of the performance of the Artificial Neural Network (ANN) classifier trained by gradient descent backpropagation with sampling, using hidden units H ∈ {1, 2, 4, 8, 32} and momentum M ∈ {0, 0.2, 0.5, 0.9}
| | H = 1, M = 0 | H = 1, M = 0.2 | H = 1, M = 0.5 | H = 1, M = 0.9 | H = 2, M = 0 | H = 2, M = 0.2 | H = 2, M = 0.5 | H = 2, M = 0.9 | H = 4, M = 0 | H = 4, M = 0.2 | H = 4, M = 0.5 | H = 4, M = 0.9 | H = 8, M = 0 | H = 8, M = 0.2 | H = 8, M = 0.5 | H = 8, M = 0.9 | H = 32, M = 0 | H = 32, M = 0.2 | H = 32, M = 0.5 | H = 32, M = 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | – | 54.27% | – | 45.83% | 57.63% | 56.81% | 56.15% | 50.86% | 55.79% | 55.61% | 55.89% | 47.38% | 54.87% | 51.87% | 50.94% | 51.02% | 43.82% | 45.96% | 44.17% | 58.70% |
| Specificity | 88.01% | 88.35% | 88.01% | 89.10% | 90.51% | 90.61% | 90.78% | 90.30% | 89.82% | 90.13% | 90.43% | 89.67% | 90.31% | 90.29% | 90.39% | 89.77% | 90.43% | 90.60% | 90.19% | 88.09% |
| Precision | 0 | 3.60% | 0 | 11.90% | 24.93% | 26.00% | 27.70% | 23.60% | 18.47% | 21.47% | 24.37% | 17.77% | 23.30% | 23.57% | 24.50% | 18.33% | 25.87% | 27.13% | 23.37% | 0.90% |
| F-score | 0 | 6.75% | 0 | 18.89% | 34.81% | 35.67% | 37.10% | 32.24% | 27.75% | 30.98% | 33.94% | 25.84% | 32.71% | 32.41% | 33.09% | 26.97% | 32.53% | 34.12% | 30.56% | 1.77% |
| RMSE | 0.30 | 0.30 | 0.30 | 0.30 | 0.29 | 0.29 | 0.29 | 0.30 | 0.29 | 0.29 | 0.29 | 0.30 | 0.30 | 0.30 | 0.30 | 0.32 | 0.32 | 0.32 | 0.32 | 0.35 |
| AUC | 0.77 | 0.76 | 0.74 | 0.72 | 0.8 | 0.79 | 0.77 | 0.72 | 0.8 | 0.81 | 0.82 | 0.78 | 0.81 | 0.81 | 0.81 | 0.68 | 0.77 | 0.77 | 0.78 | 0.52 |
Comparison of the performance of the Artificial Neural Network (ANN) classifier trained by gradient descent backpropagation without sampling, using hidden units H ∈ {1, 2, 4, 8, 32} and momentum M ∈ {0, 0.2, 0.5, 0.9}
| | H = 1, M = 0 | H = 1, M = 0.2 | H = 1, M = 0.5 | H = 1, M = 0.9 | H = 2, M = 0 | H = 2, M = 0.2 | H = 2, M = 0.5 | H = 2, M = 0.9 | H = 4, M = 0 | H = 4, M = 0.2 | H = 4, M = 0.5 | H = 4, M = 0.9 | H = 8, M = 0 | H = 8, M = 0.2 | H = 8, M = 0.5 | H = 8, M = 0.9 | H = 32, M = 0 | H = 32, M = 0.2 | H = 32, M = 0.5 | H = 32, M = 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | – | – | 42.30% | – | 52.65% | 52.72% | 52.40% | 61.99% | 49.67% | 52.10% | 50.99% | 47.50% | 51.16% | 49.19% | 51.93% | 51.89% | 42.69% | 40.05% | 42.31% | 66.67% |
| Specificity | 88.30% | 88.30% | 90.62% | 88.30% | 91.37% | 91.32% | 91.42% | 89.29% | 90.90% | 89.96% | 90.59% | 91.64% | 90.89% | 90.79% | 90.56% | 89.37% | 90.83% | 90.98% | 91.07% | 88.38% |
| Precision | 0 | 0 | 25.43% | 0 | 31.39% | 30.86% | 31.84% | 10.14% | 27.18% | 17.51% | 24% | 34.57% | 26.94% | 26.12% | 23.54% | 11.51% | 27.46% | 29.55% | 29.93% | 0.77% |
| F-score | 0 | 0 | 31.76% | 0 | 39.33% | 38.93% | 39.61% | 17.43% | 35.13% | 26.21% | 32.63% | 40.02% | 35.29% | 34.13% | 32.40% | 18.84% | 33.43% | 34.00% | 35.06% | 1.51% |
| RMSE | 0.29 | 0.30 | 0.30 | 0.30 | 0.29 | 0.29 | 0.29 | 0.30 | 0.29 | 0.29 | 0.30 | 0.30 | 0.30 | 0.30 | 0.30 | 0.31 | 0.33 | 0.33 | 0.33 | 0.34 |
| AUC | 0.79 | 0.77 | 0.76 | 0.73 | 0.78 | 0.78 | 0.79 | 0.79 | 0.78 | 0.78 | 0.79 | 0.79 | 0.80 | 0.80 | 0.80 | 0.79 | 0.77 | 0.76 | 0.76 | 0.50 |
Comparison of the performance of the Naïve Bayesian classifier (BC) with sampling, using three Weka options for handling continuous attributes: single normal, kernel estimation and supervised discretization
| | Single Normal | Kernel Estimation | Supervised Discretization |
|---|---|---|---|
| Sensitivity | 35.32% | 40.90% | 37.41% |
| Specificity | 93.26% | 92.37% | 93.32% |
| Precision | 52.34% | 42.70% | 52.20% |
| F-score | 42.18% | 41.78% | 43.59% |
| RMSE | 0.35 | 0.32 | 0.34 |
| AUC | 0.81 | 0.81 | 0.82 |
Comparison of the performance of the Naïve Bayesian classifier (BC) without sampling, using three Weka options for handling continuous attributes: single normal, kernel estimation and supervised discretization
| | Single Normal | Kernel Estimation | Supervised Discretization |
|---|---|---|---|
| Sensitivity | 35.73% | 41.25% | 37.71% |
| Specificity | 93.22% | 92.17% | 93.23% |
| Precision | 51.89% | 40.79% | 51.32% |
| F-score | 42.32% | 41.02% | 43.47% |
| RMSE | 0.35 | 0.32 | 0.34 |
| AUC | 0.81 | 0.81 | 0.82 |
Comparison of the performance of the Bayesian Network classifier (BN) with sampling, using different search algorithms: K2, Hill Climbing, Repeated Hill Climber, LAGD Hill Climbing, TAN, Tabu and Simulated Annealing
| | K2 | Hill Climbing | Repeated Hill Climber | LAGD Hill Climbing | TAN | Tabu | Simulated Annealing |
|---|---|---|---|---|---|---|---|
| Sensitivity | 37.44% | 37.44% | 37.44% | 47.65% | 60.07% | 37.59% | 55.20% |
| Specificity | 93.31% | 93.31% | 93.31% | 91.55% | 91.02% | 93.20% | 91.23% |
| Precision | 52.11% | 52.11% | 52.11% | 33.76% | 27.32% | 51.10% | 29.71% |
| F-score | 43.57% | 43.57% | 43.57% | 39.52% | 37.56% | 43.31% | 38.63% |
| RMSE | 0.34 | 0.34 | 0.34 | 0.34 | 0.28 | 0.34 | 0.29 |
| AUC | 0.82 | 0.82 | 0.82 | 0.81 | 0.84 | 0.81 | 0.84 |
Comparison of the performance of the Bayesian Network classifier (BN) without sampling, using different search algorithms: K2, Hill Climbing, Repeated Hill Climber, LAGD Hill Climbing, TAN, Tabu and Simulated Annealing
| | K2 | Hill Climbing | Repeated Hill Climber | LAGD Hill Climbing | TAN | Tabu | Simulated Annealing |
|---|---|---|---|---|---|---|---|
| Sensitivity | 37.70% | 37.70% | 37.70% | 48.11% | 57.09% | 37.94% | 53.65% |
| Specificity | 93.21% | 93.21% | 93.21% | 91.44% | 90.71% | 93.19% | 90.97% |
| Precision | 51.20% | 51.20% | 51.20% | 32.63% | 24.57% | 50.89% | 27.44% |
| F-score | 43.42% | 43.42% | 43.42% | 38.89% | 34.35% | 43.47% | 36.31% |
| RMSE | 0.34 | 0.34 | 0.34 | 0.34 | 0.34 | 0.3 | 0.3 |
| AUC | 0.82 | 0.82 | 0.82 | 0.81 | 0.83 | 0.81 | 0.82 |
Comparison of the performance of the K-Nearest Neighbor (KNN) classifier with sampling, using K ∈ {1, 3, 5, 10} neighbors and three distance functions: Euclidean, Manhattan and Minkowski
| | Euclidean, K = 1 | Euclidean, K = 3 | Euclidean, K = 5 | Euclidean, K = 10 | Manhattan, K = 1 | Manhattan, K = 3 | Manhattan, K = 5 | Manhattan, K = 10 | Minkowski, K = 1 | Minkowski, K = 3 | Minkowski, K = 5 | Minkowski, K = 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 78.43% | 65.61% | 64.17% | 50.00% | 78.29% | 65.66% | 65.68% | 61.23% | 78.43% | 65.61% | 64.17% | 59.23% |
| Specificity | 96.98% | 91.74% | 90.53% | 89.84% | 97.05% | 91.80% | 90.60% | 89.91% | 96.98% | 91.74% | 90.53% | 89.84% |
| Precision | 77.18% | 33.64% | 22.32% | 11.50% | 77.73% | 34.16% | 22.94% | 16.44% | 77.18% | 33.64% | 22.32% | 15.89% |
| F-score | 77.80% | 44.47% | 33.12% | 18.70% | 78.01% | 44.94% | 34.01% | 25.91% | 77.80% | 44.47% | 33.12% | 25.05% |
| RMSE | 0.23 | 0.27 | 0.28 | 0.29 | 0.23 | 0.27 | 0.28 | 0.29 | 0.23 | 0.27 | 0.28 | 0.29 |
| AUC | 0.88 | 0.86 | 0.85 | 0.84 | 0.87 | 0.86 | 0.85 | 0.84 | 0.87 | 0.86 | 0.85 | 0.84 |
The results show that K = 1 achieves the highest AUC (0.88), using Euclidean distance.
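The three distance functions compared here are closely related: Manhattan and Euclidean distance are the order-1 and order-2 cases of the Minkowski distance. The sketch below illustrates that relationship together with a plain majority-vote KNN. It is an assumption, not something the source states, that the Minkowski runs used order 2, although that would be consistent with the Minkowski columns matching the Euclidean ones for K = 1, 3 and 5. All names and the toy data are hypothetical.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train, query, k, p=2):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: minkowski(item[0], query, p))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): (features, label) pairs.
train = [((0, 0), 0), ((0, 1), 0), ((5, 5), 1), ((6, 5), 1), ((5, 6), 1)]

print(minkowski((0, 0), (3, 4), p=1))  # 7.0 (Manhattan)
print(minkowski((0, 0), (3, 4), p=2))  # 5.0 (Euclidean)
print(knn_predict(train, (4, 4), k=3))  # 1
```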
Comparison of the performance of the K-Nearest Neighbor (KNN) classifier without sampling, using K ∈ {1, 3, 5, 10} neighbors and three distance functions: Euclidean, Manhattan and Minkowski
| | Euclidean, K = 1 | Euclidean, K = 3 | Euclidean, K = 5 | Euclidean, K = 10 | Manhattan, K = 1 | Manhattan, K = 3 | Manhattan, K = 5 | Manhattan, K = 10 | Minkowski, K = 1 | Minkowski, K = 3 | Minkowski, K = 5 | Minkowski, K = 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 28.06% | 38.19% | 42.44% | 46.78% | 28.54% | 38.21% | 42.96% | 47.59% | 28.06% | 38.19% | 42.44% | 28.06% |
| Specificity | 90.24% | 89.88% | 89.50% | 89.31% | 90.28% | 89.87% | 89.49% | 89.31% | 90.24% | 89.88% | 89.50% | 90.24% |
| Precision | 25.36% | 18.37% | 13.64% | 11.12% | 25.62% | 18.18% | 13.42% | 11.12% | 25.36% | 18.37% | 13.64% | 25.36% |
| F-score | 26.64% | 24.81% | 20.64% | 17.97% | 27.00% | 24.64% | 20.45% | 18.03% | 26.64% | 24.81% | 20.64% | 26.64% |
| RMSE | 0.4 | 0.33 | 0.32 | 0.3 | 0.4 | 0.33 | 0.32 | 0.31 | 0.4 | 0.33 | 0.32 | 0.4 |
| AUC | 0.58 | 0.66 | 0.7 | 0.74 | 0.59 | 0.67 | 0.7 | 0.74 | 0.58 | 0.66 | 0.7 | 0.58 |
The results show that K = 10 achieves the highest AUC (0.74), using Euclidean distance.
Comparison of the performance of the Random Forest (RF) classifier with sampling, using 10, 50 and 100 trees and different numbers of features (F) considered at each split (1, 2, 4, 8 and 12)
| | 10 trees, F = 1 | 10 trees, F = 2 | 10 trees, F = 4 | 10 trees, F = 8 | 10 trees, F = 12 | 50 trees, F = 1 | 50 trees, F = 2 | 50 trees, F = 4 | 50 trees, F = 8 | 50 trees, F = 12 | 100 trees, F = 1 | 100 trees, F = 2 | 100 trees, F = 4 | 100 trees, F = 8 | 100 trees, F = 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 90.62% | 91.01% | 89.46% | 87.40% | 86.90% | 96.07% | 95.47% | 94.67% | 93.63% | 93.14% | 96.73% | 95.97% | 94.85% | 93.95% | 93.59% |
| Specificity | 96.49% | 96.56% | 96.67% | 96.79% | 96.83% | 96.84% | 96.85% | 97.06% | 97.11% | 97.15% | 96.88% | 96.88% | 97.04% | 97.19% | 97.18% |
| Precision | 72.78% | 73.40% | 74.28% | 75.31% | 75.67% | 75.50% | 75.57% | 77.27% | 77.73% | 77.99% | 75.74% | 75.81% | 77.08% | 78.35% | 78.28% |
| F-score | 80.72% | 81.26% | 81.17% | 80.90% | 80.90% | 84.55% | 84.36% | 85.09% | 84.94% | 84.90% | 84.96% | 84.71% | 85.05% | 85.44% | 85.25% |
| AUC | 80.72 | 81.26 | 81.17 | 80.90 | 80.90 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
| RMSE | 0.2 | 0.19 | 0.2 | 0.2 | 0.2 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 | 0.18 |
Comparison of the performance of the Random Forest (RF) classifier without sampling, using 10, 50 and 100 trees and different numbers of features (F) considered at each split (1, 2, 4, 8 and 12)
| | 10 trees, F = 1 | 10 trees, F = 2 | 10 trees, F = 4 | 10 trees, F = 8 | 10 trees, F = 12 | 50 trees, F = 1 | 50 trees, F = 2 | 50 trees, F = 4 | 50 trees, F = 8 | 50 trees, F = 12 | 100 trees, F = 1 | 100 trees, F = 2 | 100 trees, F = 4 | 100 trees, F = 8 | 100 trees, F = 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 45.82% | 47.20% | 48.35% | 46.64% | 45.44% | 56.62% | 56.34% | 57.33% | 55.84% | 54.51% | 58.39% | 59.87% | 59.09% | 56.56% | 54.41% |
| Specificity | 90.03% | 90.23% | 90.58% | 90.83% | 90.85% | 89.60% | 89.81% | 90.29% | 90.48% | 90.57% | 89.50% | 89.81% | 90.21% | 90.45% | 90.48% |
| Precision | 18.90% | 20.77% | 24.19% | 26.91% | 27.30% | 13.61% | 15.74% | 20.41% | 22.42% | 23.42% | 12.49% | 15.53% | 19.52% | 22.08% | 22.56% |
| F-score | 26.76% | 28.84% | 32.24% | 34.13% | 34.11% | 21.95% | 24.61% | 30.10% | 31.99% | 32.76% | 20.58% | 24.66% | 29.35% | 31.76% | 31.90% |
| RMSE | 0.3 | 0.3 | 0.3 | 0.3 | 0.31 | 0.29 | 0.29 | 0.29 | 0.29 | 0.30 | 0.29 | 0.29 | 0.29 | 0.29 | 0.29 |
| AUC | 0.76 | 0.77 | 0.77 | 0.77 | 0.76 | 0.81 | 0.81 | 0.81 | 0.81 | 0.80 | 0.81 | 0.81 | 0.82 | 0.81 | 0.81 |
Fig. 2 AUC of different models with different percentages of synthetic examples created using SMOTE
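SMOTE creates synthetic minority-class examples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified pure-Python sketch of that core idea, not the implementation used in the study. The function name, the default `k`, the seed and the toy points are all assumptions made for illustration.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: interpolate between minority points and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority points by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical minority-class points; 200% oversampling means 2 synthetic per original.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.5, 2.5)]
percentage = 200
new_points = smote(minority, n_synthetic=len(minority) * percentage // 100, k=2)
print(len(new_points))  # 8
```

Each synthetic point lies on a line segment between two existing minority points, so the percentage in the figure controls how many such points are added, not where they can fall.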
Comparison of the performance of the different classification models without using the SMOTE sampling method
| | DT | SVM | ANN | BC | BN | KNN | RF |
|---|---|---|---|---|---|---|---|
| Sensitivity | 54.43% | 39.78% | 52.65% | | 57.09% | 46.78% | 59.09% |
| Specificity | 90.51% | | 91.37% | | 90.71% | 89.31% | 90.21% |
| Precision | 22.80% | 18.03% | 31.39% | | 24.57% | | 19.52% |
| F-score | 32.14% | 24.81% | 39.33% | | 34.35% | | 29.35% |
| RMSE | 0.3 | | | 0.34 | 0.34 | 0.3 | |
| AUC | 0.73 | | 0.80 | 0.82 | | 0.74 | 0.82 |
The models are: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). The results of this experiment show that BN achieves the highest AUC (0.83). The BC model achieves the highest precision (51.32%) and the highest specificity (93.32%)
Comparison of the performance of the different classification models using the SMOTE sampling methods. The models are: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF)
| | DT | SVM | ANN | BC | BN | KNN | RF |
|---|---|---|---|---|---|---|---|
| Sensitivity | 59.95% | 80.26% | 55.89% | | 60.07% | 78.43% | |
| Specificity | 96.05% | 95.19% | 90.43% | 93.32% | | | 96.84% |
| Precision | 70.91% | 62.63% | | 52.20% | 27.32% | | 75.50% |
| F-score | 64.97% | 70.36% | | 43.59% | 37.56% | 77.80% | |
| RMSE | 0.27 | 0.25 | 0.29 | | 0.28 | | 0.18 |
| AUC | 0.88 | | 0.82 | 0.82 | 0.84 | 0.88 | |
The results of this experiment show that the RF model achieves the highest AUC (0.97), the lowest RMSE (0.18) and the highest sensitivity (94.65%)
Fig. 3 The ROC curves of the different machine learning classification models. The models are: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN) and K-Nearest Neighbor (KNN). The results show that without the SMOTE sampling method (a), BC and BN achieve the highest AUC (0.81), while with the SMOTE sampling method (b), the KNN model achieves the highest AUC (0.94).
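The AUC summarizes a ROC curve as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition. It is an illustration only, with hypothetical scores and labels, and is not the evaluation code used in the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores and true labels (1 = event, 0 = no event).
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75

# A model that gives every case the same score cannot rank cases at all.
print(auc([0.5, 0.5, 0.5, 0.5], [0, 0, 1, 1]))   # 0.5
```

The second call shows why an AUC of 0.5 corresponds to no discriminative ability, which is the value reported for several configurations in the tables.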