Ibrahim M El-Hasnony, Omar M Elzeki, Ali Alshehri, Hanaa Salem.
Abstract
The rapid growth and adaptation of medical information to identify significant health trends and help with timely preventive care have been recent hallmarks of the modern healthcare data system. Heart disease is the deadliest condition in the developed world. Cardiovascular disease and its complications, including dementia, can be averted with early detection. Further research in this area is needed to prevent strokes and heart attacks. Given the wealth of healthcare data on heart disease, an optimal machine learning model can help achieve this goal. Heart disease can be predicted and diagnosed using machine-learning-based systems. Active learning (AL) methods improve classification quality by incorporating user-expert feedback with sparsely labelled data. In this paper, five selection strategies for multi-label active learning (MMC, Random, Adaptive, QUIRE, and AUDI) were applied to reduce labelling costs by iteratively selecting the most relevant data to query for labels. For each scenario on the heart disease dataset, the selection methods were paired with a label ranking classifier whose hyperparameters were optimized by grid search. Experimental evaluation covers accuracy and F-score with and without hyperparameter optimization. Results show that the generalization of the learning model beyond the existing data for the optimized label ranking model uses the selection method versus others due to accuracy. However, the selection method was highlighted with regard to the F-score using optimized settings.
Keywords: active learning; chronic diseases; data mining; heart disease; machine learning; multi-label classification
Year: 2022 PMID: 35161928 PMCID: PMC8839067 DOI: 10.3390/s22031184
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. A general framework for implementing an active learning approach.
Figure 2. Implemented active learning cycle.
Multi-label query strategies.
| Strategy | Description |
|---|---|
| MMC | Selects an instance by loss reduction and confidence maximization, then queries all of its labels. |
| Random | Randomly selects instances or instance–label pairs. |
| Adaptive | Uses maximum margin uncertainty and label cardinality inconsistency to select an instance, then queries all of its labels. |
| QUIRE | Chooses an instance–label pair by considering both how informative and how representative the pair is. |
| AUDI | Chooses an instance–label pair based on its degree of uncertainty and diversity. |
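To make the contrast between a random baseline and an informativeness-driven query concrete, here is a minimal sketch. It is not an implementation of MMC, Adaptive, QUIRE, or AUDI; it uses a deliberately simplified uncertainty score (distance of each label probability from 0.5), and the function names and the `label_probs` input are hypothetical.

```python
import random
from typing import Sequence


def select_random(pool_size: int, rng: random.Random) -> int:
    """Random baseline: pick any unlabelled instance index."""
    return rng.randrange(pool_size)


def select_most_uncertain(label_probs: Sequence[Sequence[float]]) -> int:
    """Pick the instance whose label probabilities lie closest to 0.5 on average.

    label_probs[i][j] is a hypothetical model's predicted probability
    that unlabelled instance i carries label j.
    """
    def uncertainty(probs: Sequence[float]) -> float:
        # 1.0 when every probability is 0.5, 0.0 when all are exactly 0 or 1.
        return sum(1.0 - 2.0 * abs(p - 0.5) for p in probs) / len(probs)

    scores = [uncertainty(p) for p in label_probs]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up predictions for three unlabelled instances and two labels.
pool_probs = [
    [0.95, 0.05],  # confident prediction
    [0.55, 0.48],  # close to the decision boundary
    [0.10, 0.90],  # confident prediction
]
chosen = select_most_uncertain(pool_probs)
baseline = select_random(len(pool_probs), random.Random(0))
```

With these made-up probabilities the uncertainty-based rule picks the second instance (index 1), whereas the random baseline may return any index in the pool.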
Confusion matrix.
| Actual Output | Predicted Positive | Predicted Negative |
|---|---|---|
| Positive | True Positives (TP) | False Negatives (FN) |
| Negative | False Positives (FP) | True Negatives (TN) |
Performance evaluation metrics.
| Metric | Description |
|---|---|
| Accuracy | The ratio of correct outcomes to all predicted values. Accuracy is the degree to which measurements fall within a specific range, whereas precision is the degree to which measurements are within … |
| F1-score | A combined average (the harmonic mean) of precision and recall (sensitivity). F1 is a good option when precision and recall need to be balanced. |
| Recall | The ratio of instances correctly identified as heart disease out of all heart disease instances. |
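The metrics described above follow directly from the four confusion-matrix cells. The sketch below uses the standard textbook definitions; the class name and the example counts are made up for illustration, and the code assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives

    def accuracy(self) -> float:
        # Correct predictions over all predictions.
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        # Share of predicted positives that are actually positive.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of actual positives that were found (sensitivity).
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Made-up counts, not taken from the paper's experiments.
cm = ConfusionMatrix(tp=40, fn=10, fp=5, tn=45)
print(f"accuracy={cm.accuracy():.3f} recall={cm.recall():.3f} f1={cm.f1():.3f}")
```

Because F1 combines precision and recall, it can differ noticeably from accuracy when the two error types (FP and FN) are unbalanced.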
Description of the dataset features.
| Feature | Type | Description |
|---|---|---|
| Age | numeric | Years of age |
| Sex | categorical | 1: male, 0: female |
| cp | numeric | Type of chest pain |
| trestbps | numeric | Resting blood pressure of the patient (in mm Hg) |
| chol | numeric | Serum cholesterol (in mg/dL) |
| fbs | categorical | Whether fasting blood sugar > 120 mg/dL (1 = true; 0 = false) |
| restecg | numeric | 0 means “normal” |
| thalach | numeric | Maximum heart rate attained |
| Exang | categorical | Angina induced by exercise (1 = yes; 0 = no) |
| oldpeak | numeric | Exercise-induced ST depression relative to rest |
| slope | numeric | Slope of the peak exercise ST segment |
| ca | numeric | Number of significant vessels (0–3) colored by fluoroscopy |
| thal | numeric | 3 = normal; 6 = fixed defect; 7 = reversible defect |
| Target | categorical | Heart disease diagnosis (angiographic disease status) |
The feature distribution of the Cleveland Heart Disease dataset.
| Statistic | Age | Sex | cp | trestbps | chol | fbs | restecg | thalach | Exang | oldpeak | slope | ca | thal | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 54.3667 | 0.6832 | 0.9670 | 131.6238 | 246.2640 | 0.1485 | 0.5281 | 149.647 | 0.3267 | 1.0396 | 1.3993 | 0.7294 | 2.3135 | 0.5446 |
| Std | 9.08 | 0.47 | 1.03 | 17.54 | 51.83 | 0.36 | 0.53 | 22.901 | 0.47 | 1.16 | 0.62 | 1.02 | 0.61 | 0.50 |
| Min | 29 | 0 | 0 | 94 | 126 | 0 | 0 | 71 | 0 | 0 | 0 | 0 | 0 | 0 |
| Max | 77 | 1 | 3 | 200 | 564 | 1 | 2 | 202 | 1 | 6.2 | 2 | 4 | 3 | 1 |
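A per-feature summary of this kind (mean, standard deviation, minimum, maximum) can be computed as in the sketch below. The `summarize` helper and the sample values are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which variant was reported.

```python
import statistics
from typing import Dict, List


def summarize(column: List[float]) -> Dict[str, float]:
    """Return the four summary statistics for one numeric feature."""
    return {
        "mean": statistics.mean(column),
        "std": statistics.stdev(column),  # sample standard deviation (assumed)
        "min": min(column),
        "max": max(column),
    }


# Made-up ages for illustration; not rows from the Cleveland dataset.
ages = [29, 45, 54, 61, 77]
summary = summarize(ages)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Applying the same helper to every column of the dataset would yield one table row per statistic.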
Figure 3. Statistical description of heart disease dataset.
Figure 4. Dataset features heatmap.
Comparative evaluation of active learning selection methods in terms of accuracy (the best values are bold).
| Method | #Queries (before optimization) | Cost (before optimization) | Performance (before optimization) | #Queries (after optimization) | Cost (after optimization) | Performance (after optimization) |
|---|---|---|---|---|---|---|
|  | 121 | 121 | 0.431 ± 0.076 | 121 | 121 | 0.526 ± 0.048 |
|  | 121 | 121 | 0.508 ± 0.032 | 121 | 121 | 0.454 ± 0.068 |
|  | 73 | 122 | 0.476 ± 0.050 | 73 | 122 | 0.512 ± 0.050 |
|  |  |  |  |  |  |  |
|  | 121 | 121 | 0.355 ± 0.114 | 109 | 121 | 0.499 ± 0.068 |
Figure 5. Classification accuracy for the five selection AL strategies without grid search optimizer.
Figure 6. Classification accuracy for the five selection AL strategies with grid search optimizer.
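For readers unfamiliar with the grid search optimizer mentioned in these comparisons, the following is a minimal sketch of an exhaustive search over a parameter grid. It is a generic illustration, not the authors' setup: the hyperparameter names (`C`, `penalty`) and the `toy_score` function, which stands in for a cross-validated accuracy, are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(
    grid: Dict[str, List[Any]],
    score: Callable[[Dict[str, Any]], float],
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Evaluate every combination in the grid and keep the best-scoring one."""
    names = list(grid)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params: Dict[str, Any]) -> float:
    # Stand-in for a validation metric; peaks at C=1.0 with penalty="l2".
    penalty_cost = 0.05 if params["penalty"] == "l1" else 0.0
    return 1.0 - 0.1 * abs(params["C"] - 1.0) - penalty_cost


best, best_val = grid_search(
    {"C": [0.1, 1.0, 10.0], "penalty": ["l1", "l2"]},
    toy_score,
)
print(best, best_val)
```

The cost of this approach grows with the product of the grid sizes, which is why grids are usually kept small.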
Comparative evaluation of active learning selection methods in terms of F-score (the best values are bold).
| Method | #Queries (before optimization) | Cost (before optimization) | Performance (before optimization) | #Queries (after optimization) | Cost (after optimization) | Performance (after optimization) |
|---|---|---|---|---|---|---|
|  | 121 | 121 | 0.6014 ± 0.044 |  |  |  |
|  | 121 | 121 | 0.6104 ± 0.032 | 121 | 121 | 0.59158 ± 0.03 |
|  | 61 | 122 | 0.5734 ± 0.048 | 73 | 122 | 0.6070 ± 0.042 |
|  |  |  |  | 61 | 122 | 0.6062 ± 0.036 |
|  | 121 | 121 | 0.6030 ± 0.028 | 109 | 121.2 | 0.6076 ± 0.036 |
Figure 7. F1-measure for the five selection AL strategies without grid search optimizer.
Figure 8. F1-measure for the five selection AL strategies with grid search optimizer.