Sushruta Mishra, Hiren Kumar Thakkar, Priyanka Singh, Gajendra Sharma.
Abstract
Advanced predictive analytics coupled with an effective attribute selection method plays a pivotal role in the precise assessment of chronic disorder risks in patients. Traditional attribute selection approaches suffer from premature convergence, high complexity, and high computational cost. In contrast, applying heuristic-based optimization to supervised methods reduces the computational cost by eliminating outlier attributes. In this study, a novel buffer-enabled heuristic, the memory-based metaheuristic attribute selection (MMAS) model, is proposed; it performs a local neighborhood search to optimize chronic disorder data. The selected data are further filtered with unsupervised K-means clustering to remove outliers, and the resulting data are input to a Naive Bayes classifier to determine the presence of chronic disease risk. Heart disease, breast cancer, diabetes, and hepatitis datasets are used in the research. Upon implementation of the model, a mean accuracy of 94.5% was recorded using MMAS, which dropped to 93.5% when clustering was not used. The average precision, recall, and F-score were 96.05%, 94.07%, and 95.06%, respectively. The model also recorded the lowest latency, 0.8 sec. Thus, it is demonstrated that chronic disease diagnosis can be significantly improved by heuristic-based attribute selection coupled with clustering and followed by classification. The model can be used to develop a decision support system that assists medical experts in analyzing chronic diseases effectively and at low cost.
Year: 2022 PMID: 35720925 PMCID: PMC9200507 DOI: 10.1155/2022/8749353
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1: Attribute selection process.
Existing work on attribute selection for chronic disease datasets.
| Existing works | Attribute selector used | Chronic disease dataset |
|---|---|---|
| El Akadi et al. [ | Genetic algorithm | Dengue datasets |
| Mokeddem et al. [ | Genetic algorithm | Coronary artery disease |
| Kora and Kalva [ | Bat algorithm | ECG signal data |
| Keerthi Priya et al. [ | Whale optimization algorithm | Breast cancer and hepatitis |
| Uzer et al. [ | Artificial bee colony algorithm | Liver, diabetes, and hepatitis |
| Dogantekin et al. [ | Linear discriminant analysis | Hepatitis datasets |
| Kohavi and John [ | Sequential forward selection | Thyroid dataset |
| Gandhi and Prajapati [ | Correlation feature selection | PIMA Indian diabetes |
| Kavitha and Kannan [ | Principal component analysis | Heart disease dataset |
| Yildirim [ | Consistency-based subset evaluation | Hepatitis dataset |
| Ding and Fu [ | Information gain | Breast cancer and diabetes dataset |
| Kohli and Arora [ | Adaptive boosting | Heart disease, breast cancer, and diabetes |
| Mishra et al. [ | Genetic algorithm | Diabetes |
| Sahoo et al. [ | DTNB algorithm | Heart disorders |
Diabetes dataset details [1].
| Name of attribute | Description | Domain range |
|---|---|---|
| Preg | Pregnancy count | 0–15 |
| Plas | Plasma glucose concentration | 0–199 |
| Pres | Diastolic blood pressure | 0–122 (mm Hg) |
| Skin | Triceps skinfold thickness | 0–99 (mm) |
| Insu | 2-hour serum insulin | 0–846 (μU/ml) |
| Mass | Body mass index | 0–67.1 (kg/m²) |
| Pedi | Diabetes pedigree function | 0.08–2.42 |
| Age | Person's age | 21–81 years |
| Class | Label of person | 0 = absence; 1 = presence |
Breast cancer dataset details [1].
| Name of attribute | Description | Domain range |
|---|---|---|
| Class | Class label | Nonrecurrence and recurrence |
| Age | Age in years | 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89, and 90–99 |
| Menopause | Whether the patient is pre- or postmenopausal during treatment | ge40, lt40, or premeno |
| Tumor-size | Tumor size (mm) | 0–4, 5–9, 10–14, 15–19, 20–24, 25–29, 30–34, 35–39, 40–44, 45–49, 50–54, and 55–59 |
| Inv-nodes | Number of axillary lymph nodes that contain metastatic breast cancer | 0–2, 3–5, 6–8, 9–11, 12–14, 15–17, 18–20, 21–23, 24–26, 27–29, 30–32, 33–35, and 36–39 |
| Node-caps | Whether the tumor has penetrated the lymph node capsule | Yes or no |
| Deg-malig | Histological grade (degree of malignancy) of the tumor | 1, 2, or 3 |
| Breast | Which breast is affected | Right or left |
| Breast-quad | Affected breast quadrant, with the nipple as the center | Right-up, left-up, right-low, left-low, and central |
| Irradiat | Whether the patient has a history of radiation (X-ray) therapy | Yes or no |
Heart disease dataset details [1].
| Name of attribute | Description | Domain range |
|---|---|---|
| Age | Age | 1–100 years old |
| Sex | Person's gender | 1 = male; 0 = female |
| Cp | Chest pain type | General angina/nonanginal pain/asymptomatic/atypical angina |
| Trestbps | Blood pressure at rest | Measured in mm Hg on admission to the medical centre |
| Chol | Serum cholesterol level | Measured in mg/dl |
| Restecg | Electrocardiogram outcome at rest time | Values of 0, 1, or 2 |
| Oldpeak | ST depression induced by exercise relative to rest | 3.05–3.81 |
| Exang | Exercise-induced angina | 1 = yes; 0 = no |
| Smoke | Smoker or not | 1 = yes; 0 = no |
| Slope | ST segment peak exercise slope | 1: upsloping; 2: flat; 3: downsloping |
| Ca | Major vessel count | 0–3 |
| Thal | Thallium stress test result | 3 = normal; 6 = fixed defect; 7 = reversible defect |
Hepatitis disease dataset details.
| Attribute | Values |
|---|---|
| Class | Die, Live |
| Age | 10, 20, 30, 40, 50, 60, 70, 80 |
| Sex | Male, female |
| Steroid | No, yes |
| Antivirals | No, yes |
| Fatigue | No, yes |
| Malaise | No, yes |
| Anorexia | No, yes |
| Liver big | No, yes |
| Liver firm | No, yes |
| Spleen palpable | No, yes |
| Spiders | No, yes |
| Ascites | No, yes |
| Varices | No, yes |
Figure 2: The proposed metaheuristic attribute selector-based classification model for chronic disorder detection.
Algorithm 1: K-means clustering.
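The listing for Algorithm 1 is not reproduced in this record, so the following is only a minimal sketch of how K-means can be used to filter outliers, as the abstract describes. It assumes plain Lloyd iterations with caller-supplied starting centroids and a hypothetical rule that drops any record farther from its centroid than `factor` times the cluster's median distance. The function names, the `factor` parameter, and the rule itself are illustrative assumptions, not the authors' procedure.

```python
import math
import statistics


def kmeans(points, init_centroids, iters=50):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
    return centroids, labels


def drop_outliers(points, centroids, labels, factor=2.0):
    """Return indices of points kept under the assumed median-distance rule."""
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    keep = []
    for c in range(len(centroids)):
        idx = [i for i, l in enumerate(labels) if l == c]
        if not idx:
            continue
        limit = factor * statistics.median(dists[i] for i in idx)
        keep += [i for i in idx if dists[i] <= limit]
    return sorted(keep)
```

A median-based cutoff is used here because a mean-and-standard-deviation cutoff is itself pulled toward the outliers in small clusters; any other distance rule could be substituted.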
Algorithm 2: MMAS procedure.
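The body of Algorithm 2 is likewise not available in this record. As a rough illustration of what a memory-based local neighborhood search over attribute subsets can look like, the sketch below uses a generic tabu-style search with a short buffer of recently visited subsets. It is not the authors' MMAS procedure: the function name, the one-flip neighborhood, the `memory_size` buffer, and the caller-supplied `score` function are all assumptions made for illustration.

```python
from collections import deque


def memory_local_search(n_attrs, score, iters=100, memory_size=5):
    """Tabu-style search over attribute subsets encoded as tuples of 0/1 flags."""
    current = tuple([1] * n_attrs)  # start from the full attribute set
    best, best_score = current, score(current)
    # Short-term memory (buffer) of recently visited subsets.
    memory = deque([current], maxlen=memory_size)
    for _ in range(iters):
        # Local neighborhood: every subset that differs by one flipped attribute.
        neighbors = [
            current[:i] + (1 - current[i],) + current[i + 1:]
            for i in range(n_attrs)
        ]
        # Skip remembered subsets and the empty subset.
        candidates = [s for s in neighbors if s not in memory and any(s)]
        if not candidates:
            break
        # Move to the best admissible neighbor, even if it is worse than current.
        current = max(candidates, key=score)
        memory.append(current)
        current_score = score(current)
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score
```

In practice `score` would be something like the cross-validated accuracy of a classifier trained on the selected attributes; here it is left to the caller. The memory buffer lets the search leave a local optimum without immediately stepping back into it.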
Figure 3: Demonstration of cross-validation method.
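The figure itself is not reproduced here. As a reminder of how k-fold cross-validation partitions a dataset, the sketch below generates train and test index lists for each fold. The number of folds is not stated in this record, so `k` is left as a parameter; the function name is hypothetical, and no shuffling is done (real use would normally shuffle the records first).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        # The current fold is held out for testing; the rest is used for training.
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

For example, `list(k_fold_indices(270, 10))` splits the 270 heart disease samples into ten folds, each record appearing in exactly one test fold.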
Reduced dataset details after applying MMAS.
| Chronic disease dataset | Samples | Initial attributes | Reduced attributes (MMAS) |
|---|---|---|---|
| Heart disease dataset | 270 | 13 | 10 |
| Diabetes dataset | 768 | 8 | 6 |
| Breast cancer dataset | 286 | 9 | 7 |
| Hepatitis dataset | 155 | 20 | 15 |
Figure 4: Classification accuracy analysis using the MMAS method on chronic disease data.
Figure 5: Comparison of the MMAS method with other heuristic methods for diabetes data.
Figure 6: Comparison of the MMAS method with other heuristic methods for breast cancer data.
Figure 7: Comparison of the MMAS method with other heuristic methods for heart disease data.
Figure 8: Comparison of the MMAS method with other heuristic methods for hepatitis data.
Figure 9: Impact of clustering on accuracy performance of the model.
Impact of the MMAS method on different performance metrics.
| Metric (%) | Diabetes | Breast cancer | Heart disease | Hepatitis |
|---|---|---|---|---|
| Without MMAS method | | | | |
| Precision | 90.8 | 89.9 | 89.9 | 90.4 |
| Recall | 90.2 | 85.6 | 87.6 | 88.1 |
| F-score | 90.5 | 87.7 | 88.7 | 89.2 |
| With MMAS method | | | | |
| Precision | 95.5 | 96.3 | 95.8 | 96.6 |
| Recall | 94.4 | 94.1 | 92.7 | 95.1 |
| F-score | 94.9 | 95.2 | 94.2 | 95.8 |
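The table values can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual definition of the F-score as the harmonic mean of precision and recall and applies it to the "With MMAS method" rows; the variable and function names are illustrative. Under that assumption each reported F-score agrees with the computed one to within rounding, and the column means of precision and recall agree with the 96.05% and 94.07% quoted in the abstract.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


def mean(values):
    return sum(values) / len(values)


# "With MMAS method" rows: diabetes, breast cancer, heart disease, hepatitis.
precision = [95.5, 96.3, 95.8, 96.6]
recall = [94.4, 94.1, 92.7, 95.1]
reported_f = [94.9, 95.2, 94.2, 95.8]

for p, r, f in zip(precision, recall, reported_f):
    # Compare the harmonic mean with the F-score reported in the table.
    print(f"computed F-score {f_score(p, r):.2f} vs reported {f}")

print(f"mean precision {mean(precision):.3f}, mean recall {mean(recall):.3f}")
```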
Figure 10: Latency delay comparative analysis using the MMAS method on chronic disease data.
Figure 11: Feature relevance graph for heart disease dataset.
Impact of heuristics on heart disease dataset using feature relevance score.
| Parameters | BFS | GS | PSO | GSS | MMAS |
|---|---|---|---|---|---|
| Number of instances | 270 | 270 | 270 | 270 | 270 |
| Initial attribute set | 13 | 13 | 13 | 13 | 13 |
| Reduced attribute set | 11 | 11 | 12 | 11 | 10 |
Figure 12: Feature relevance graph for breast cancer dataset.
Impact of heuristics on breast cancer dataset using feature relevance score.
| Parameters | BFS | GS | PSO | GSS | MMAS |
|---|---|---|---|---|---|
| Number of instances | 286 | 286 | 286 | 286 | 286 |
| Initial attribute set | 9 | 9 | 9 | 9 | 9 |
| Reduced attribute set | 7 | 8 | 8 | 8 | 7 |
Figure 13: Feature relevance graph for diabetes dataset.
Impact of heuristics on diabetes dataset using feature relevance score.
| Parameters | BFS | GS | PSO | GSS | MMAS |
|---|---|---|---|---|---|
| Number of instances | 768 | 768 | 768 | 768 | 768 |
| Initial attribute set | 8 | 8 | 8 | 8 | 8 |
| Reduced attribute set | 7 | 7 | 7 | 8 | 6 |
Figure 14: Feature relevance graph for hepatitis dataset.
Impact of heuristics on hepatitis dataset using feature relevance score.
| Parameters | BFS | GS | PSO | GSS | MMAS |
|---|---|---|---|---|---|
| Number of instances | 155 | 155 | 155 | 155 | 155 |
| Initial attribute set | 20 | 20 | 20 | 20 | 20 |
| Reduced attribute set | 16 | 18 | 18 | 18 | 15 |
Figure 15: Matthews correlation coefficient (MCC) analysis.
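For reference, the Matthews correlation coefficient is computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the function name and the convention of returning 0 when a marginal total is zero are assumptions, and no confusion-matrix counts from the study are available in this record.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct), and unlike accuracy it remains informative when the classes are imbalanced.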
Figure 16: Accuracy analysis of the MMAS method on different disease datasets.