Junaid Rashid, Saba Batool, Jungeun Kim, Muhammad Wasif Nisar, Amir Hussain, Sapna Juneja, Riti Kushwaha.
Abstract
Chronic diseases are increasing in prevalence and mortality worldwide. Early diagnosis has therefore become an important research area for improving patient survival rates. Several studies have reported classification approaches for predicting specific diseases. In this paper, we propose a novel augmented artificial intelligence approach that combines an artificial neural network (ANN) with particle swarm optimization (PSO) to predict five prevalent chronic diseases: breast cancer, diabetes, heart attack, hepatitis, and kidney disease. Seven classification algorithms are compared to evaluate the proposed model's prediction performance. The ANN prediction model constructed with a PSO-based feature extraction approach outperforms other state-of-the-art classification approaches in terms of accuracy. With PSO, the proposed approach achieved a highest accuracy of 99.67%. However, the classification model's performance is found to depend on the attributes of the data used for classification. Our results are compared on various chronic disease datasets and shown to outperform other benchmark approaches. In addition, our optimized ANN is shown to require less processing time than random forest (RF), deep learning, and support vector machine (SVM) based methods. Our study could support early diagnosis of chronic diseases in hospitals, including through the development of online diagnosis systems.
Keywords: artificial neural network (ANN); chronic diseases; feature selection; medical diagnosis; prediction
Year: 2022 PMID: 35433587 PMCID: PMC9008324 DOI: 10.3389/fpubh.2022.860396
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
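The abstract describes PSO-driven feature selection feeding a classifier. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a binary PSO variant (sigmoid transfer function), uses synthetic data, and substitutes a nearest-centroid classifier for the ANN so that the example needs only the Python standard library. All names (`make_toy_data`, `fitness`, `binary_pso`) and all parameter values are hypothetical.

```python
import math
import random

def make_toy_data(n=200, n_informative=3, n_noise=7, seed=0):
    """Synthetic binary data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def fitness(mask, X, y):
    """Hold-out accuracy of a nearest-centroid classifier (stand-in for the ANN)
    on the selected features, minus a small penalty per selected feature."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    half = len(X) // 2  # first half trains the centroids, second half tests
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for i in range(half, len(X)):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(idx, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == y[i]
    return correct / (len(X) - half) - 0.01 * len(idx)

def binary_pso(X, y, n_particles=12, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    dim = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, X, y) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, vel[i][j]))
                # Sigmoid of the velocity gives the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], X, y)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

X, y = make_toy_data()
best_mask, best_fit = binary_pso(X, y)
print("selected feature indices:", [j for j, keep in enumerate(best_mask) if keep])
print("penalized fitness:", round(best_fit, 4))
```

In a setting closer to the one the abstract describes, `fitness` would presumably return the validation accuracy of an ANN trained on the selected attributes; the swarm logic would stay the same.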
Summary of the literature review.
| Year | Objective | Disease(s) | Method | Reference |
|---|---|---|---|---|
| 2021 | Disease prediction with the features using machine learning | Diabetes and breast cancer | SVM | ( |
| 2020 | Different feature selection approaches are compared to evaluate their recall, precision, and F1 measure performance. | Diabetes, Kidney and Heart Attack | Adaptive probabilistic divergence is used to select most useful features. | ( |
| 2020 | To predict the presence of three chronic diseases. | Diabetes, Heart attack, and cancer | Incremental Feature Selection Approach with Convolutional Neural Network (CNN) | ( |
| 2019 | To explore the most important features for different chronic diseases. | Heart, Hepatitis, Diabetes, Cancer | Information gain, gain ratio, and correlation-based approaches. | ( |
| 2020 | Enhance the accuracy of a prediction model | Chronic Diseases | Stacked Ensemble approach | ( |
| 2020 | To enhance the classification results using the clustering method | Diabetes, Cancer & Kidney diseases | Rough K-means clustering | ( |
| 2019 | Focused on feature filtering techniques to predict cancer in an early stage. | Cancer | Decision Tree, Naive Bayes, k-Nearest Neighbors, and Support Vector Machine | ( |
| 2020 | Extract the most influencing features | Cancer | Pearson correlation with ANN | ( |
| 2019 | Enhance classification performance using significant attributes | Diabetes | A hybrid of AdaBoost, Bagging, and K-NN | ( |
| 2019 | To predict diabetes using demographics and hypertension data | Diabetes | Convolutional neural network (CNN) | ( |
| 2020 | To highlight features most common in heart and diabetes patients. | Diabetes and Heart Disease | Supervised learning classification and regression | ( |
| 2019 | Heart disease prediction based on patients' data. | Heart Attack | Random Forest | ( |
| 2019 | Feature reduction in heart patients' data. | Heart Attack | Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) | ( |
| 2020 | To explore the most significant features | Hepatitis | Principal Component Analysis (PCA) | ( |
| 2019 | Highlight hepatitis C risk predicting factors | Hepatitis | Random Forest | ( |
| 2020 | Feature transformation is used to increase the accuracy of the hepatitis prediction model | Hepatitis | Classification | ( |
| 2019 | Reduction of irrelevant features for a kidney disease prediction model | Kidney Failure | Ant Colony Optimization | ( |
| 2019 | To predict chronic kidney disease based on demographic data. | Kidney Diseases | Random Forest | ( |
| 2020 | Ensemble techniques for kidney disease prediction | Kidney Diseases | Decision Tree, k-NN and Naive Bayes | ( |
| 2019 | A prediction model for patients' survival after kidney transplant | Kidney Diseases | Information Gain and Naive Bayes | ( |
Figure 1. Proposed approach.
Datasets used for classification.
| Disease | Dataset | Attributes | Instances | Class 1 instances | Class 2 instances | Reference | |
|---|---|---|---|---|---|---|---|
| Cancer | Breast Cancer Wisconsin (Diagnostic) Data Set (Dataset 1) | 31 | 569 | 212 | 357 | ( | |
| Cancer | Breast Cancer Wisconsin (Dataset 2) | 10 | 699 | 240 | 459 | ( | |
| Diabetes | Pima Indians Diabetes Database (Dataset 1) | 9 | 768 | 268 | 500 | ( | |
| Diabetes | Diabetes Classification (Dataset 2) | 15 | 390 | 60 | 330 | ( | |
| Heart Attack | Heart Disease UCI (Dataset 1) | 14 | 303 | 138 | 165 | ( | |
| Heart Attack | Heart Disease Prediction (Dataset 2) | 13 | 270 | 150 | 120 | ( | |
| Hepatitis | Hepatitis (Dataset 1) | 19 | 142 | 80 | 62 | ( | |
| Hepatitis | Indian Liver Patient Records (Dataset 2) | 11 | 584 | 168 | 416 | ( | |
| Kidney | Kidney Disease Dataset (Dataset 1) | 26 | 189 | 74 | 115 | ( |
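Because several of these datasets are imbalanced, it helps to know the accuracy that a trivial majority-class predictor would reach before reading the reported accuracies. The sketch below computes that baseline from the instance counts in the table above, taking the last two numeric columns as the per-class counts (they sum to the instance total in every row). The baseline is an added illustration, not a figure reported in the source, and the function name `majority_baseline` is hypothetical.

```python
# (dataset, instances, first class count, second class count), copied from the table.
datasets = [
    ("Breast Cancer Wisconsin (Diagnostic)", 569, 212, 357),
    ("Breast Cancer Wisconsin", 699, 240, 459),
    ("Pima Indians Diabetes Database", 768, 268, 500),
    ("Diabetes Classification", 390, 60, 330),
    ("Heart Disease UCI", 303, 138, 165),
    ("Heart Disease Prediction", 270, 150, 120),
    ("Hepatitis", 142, 80, 62),
    ("Indian Liver Patient Records", 584, 168, 416),
    ("Kidney Disease Dataset", 189, 74, 115),
]

def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)

for name, total, first, second in datasets:
    # Sanity check: the two class counts should add up to the instance total.
    assert first + second == total, name
    print(f"{name}: majority-class baseline = {majority_baseline((first, second)):.2%}")
```

A classifier's accuracy is most informative when read against this baseline, for example on the Diabetes Classification dataset, where one class accounts for 330 of 390 instances.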
Figure 2. Proposed model architecture.
Figure 3. Cancer diagnosis accuracy (dataset 1).
Figure 4. Cancer diagnosis accuracy (dataset 2).
Figure 5. Diabetes diagnosis accuracy (dataset 1).
Figure 6. Diabetes diagnosis accuracy (dataset 2).
Figure 7. Heart attack diagnosis accuracy (dataset 1).
Figure 8. Heart attack diagnosis accuracy (dataset 2).
Figure 9. Hepatitis diagnosis accuracy (dataset 1).
Figure 10. Hepatitis diagnosis accuracy (dataset 2).
Figure 11. Kidney disease diagnosis accuracy.
Highest accuracy achieved by proposed model.
| Disease | Highest accuracy |
|---|---|
| Cancer | 98.23% |
| Diabetes | 93.59% |
| Heart | 93.44% |
| Hepatitis | 98.46% |
| Kidney | 98.90% |
Overall accuracy achieved for all diseases prediction.
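| Algorithm | Cancer (dataset 1) | Cancer (dataset 2) | Diabetes (dataset 1) | Diabetes (dataset 2) | Heart attack (dataset 1) | Heart attack (dataset 2) | Hepatitis (dataset 1) | Hepatitis (dataset 2) | Kidney | Average |
|---|---|---|---|---|---|---|---|---|---|---|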
| ANN | 98.23% | 97.43% | 86.67% | 93.08% | 93.44% | 85.56% | 98.46% | 72.73% | 98.90% | 91.61% |
| RF | 96.46% | 97.14% | 85.71% | 93.59% | 91.80% | 85.93% | 98.46% | 72.74% | 98.90% | 91.19% |
| DL | 97.35% | 97.28% | 85.71% | 93.33% | 93.44% | 85.56% | 98.00% | 68.97% | 98.38% | 90.89% |
| LR | 97.35% | 96.86% | 85.71% | 91.28% | 90.16% | 86.67% | 93.46% | 73.44% | 98.38% | 90.37% |
| SVM | 98.23% | 97.28% | 83.81% | 92.56% | 91.80% | 84.81% | 98.00% | 71.70% | 98.38% | 90.73% |
| KNN | 94.69% | 97.13% | 83.81% | 93.08% | 90.16% | 86.67% | 96.00% | 71.19% | 95.40% | 89.79% |
| NB | 97.35% | 96.86% | 84.76% | 93.33% | 91.80% | 86.30% | 88.00% | 71.20% | 95.40% | 89.44% |
| DT | 95.58% | 96.00% | 81.90% | 92.82% | 88.52% | 80.00% | 95.30% | 73.42% | 95.40% | 88.77% |
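The final value in each row appears to be the unweighted mean of the nine accuracies that precede it. The short check below recomputes those means from the figures in the table; it is a verification sketch, and the variable names are illustrative only.

```python
# Per-algorithm accuracies (%) and the reported final-column value, copied from the table.
reported = {
    "ANN": ([98.23, 97.43, 86.67, 93.08, 93.44, 85.56, 98.46, 72.73, 98.90], 91.61),
    "RF":  ([96.46, 97.14, 85.71, 93.59, 91.80, 85.93, 98.46, 72.74, 98.90], 91.19),
    "DL":  ([97.35, 97.28, 85.71, 93.33, 93.44, 85.56, 98.00, 68.97, 98.38], 90.89),
    "LR":  ([97.35, 96.86, 85.71, 91.28, 90.16, 86.67, 93.46, 73.44, 98.38], 90.37),
    "SVM": ([98.23, 97.28, 83.81, 92.56, 91.80, 84.81, 98.00, 71.70, 98.38], 90.73),
    "KNN": ([94.69, 97.13, 83.81, 93.08, 90.16, 86.67, 96.00, 71.19, 95.40], 89.79),
    "NB":  ([97.35, 96.86, 84.76, 93.33, 91.80, 86.30, 88.00, 71.20, 95.40], 89.44),
    "DT":  ([95.58, 96.00, 81.90, 92.82, 88.52, 80.00, 95.30, 73.42, 95.40], 88.77),
}

def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

for name, (accuracies, reported_avg) in reported.items():
    recomputed = mean(accuracies)
    # Allow for rounding to two decimals in the published table.
    status = "ok" if abs(recomputed - reported_avg) < 0.01 else "MISMATCH"
    print(f"{name}: recomputed {recomputed:.2f}%, reported {reported_avg:.2f}% ({status})")
```

Every row agrees with the reported value to two decimal places, and ANN has the highest mean of the eight algorithms.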
Figure 12. Average accuracy achieved by all classification algorithms.
Figure 13. Data processing time for cancer, diabetes, heart, hepatitis and kidney disease.
Comparative analysis.
| Study | Year | Diseases | Classifier | Feature selection | Accuracy |
|---|---|---|---|---|---|
| Alam et al. ( | 2019 | Cancer, diabetes, BP, hepatitis, heart, Parkinson's & carcinoma | Random Forest | Information Gain, Gain Ratio & Relief | 99% |
| Atallah et al. ( | 2019 | Kidney transplantation | K-NN | Information Gain & Naive Bayes | 81% |
| Hegde et al. ( | 2020 | Kidney, diabetes & hepatitis | Logistic Regression | Probabilistic Feature Selection | 91.6% |
| Sandhiya et al. ( | 2020 | Breast cancer, diabetes & heart | CNN | Incremental Feature Selection | 93% |
| Arumugam et al. ( | 2021 | Diabetes and breast cancer | Decision tree | Probabilistic Feature Selection | 90% |
| Proposed | 2022 | Diabetes, breast cancer, hepatitis, kidney & heart diseases | ANN | PSO | 99.67% |