| Literature DB >> 32698547 |
Sushruta Mishra, Hrudaya Kumar Tripathy, Pradeep Kumar Mallick, Akash Kumar Bhoi, Paolo Barsocchi.
Abstract
Disease diagnosis is a critical task that must be performed with extreme precision. Medical data mining is increasingly applied to complex, disease-based healthcare datasets. Unstructured healthcare data contains irrelevant information that can degrade the predictive ability of classifiers, so an effective attribute optimization technique is needed to eliminate less relevant data and optimize the dataset for higher accuracy. Type 2 diabetes affects millions of people around the world; this study uses the widely studied Pima Indian Diabetes dataset. Optimization techniques can be applied to generate a reliable dataset of symptoms useful for more accurate diagnosis of diabetes. This study presents a new hybrid attribute optimization algorithm, the Enhanced and Adaptive Genetic Algorithm (EAGA), which produces an optimized symptom dataset. Based on the symptom readings in the optimized dataset, a possible occurrence of diabetes is forecast. The EAGA model is then combined with a Multilayer Perceptron (MLP) to determine the presence or absence of type 2 diabetes in patients from the detected symptoms; the resulting classification approach is named Enhanced and Adaptive Genetic Algorithm-Multilayer Perceptron (EAGA-MLP). It was also applied to seven other disease datasets to assess its impact and effectiveness, and its performance was validated against several vital metrics. The results show a maximum accuracy rate of 97.76% and an execution time of 1.12 s; the proposed model attains an F-score of 86.8% and a precision of 80.2%. Compared with many existing studies, the classification accuracy of the proposed EAGA-MLP model clearly outperformed all previous classification models.
Across the seven other disease datasets, the mean accuracy, precision, recall and F-score obtained were 94.7%, 91%, 89.8% and 90.4%, respectively. Thus, the proposed model can assist medical experts in accurately determining the risk factors of type 2 diabetes and in classifying its presence in patients, and can consequently support healthcare experts in the diagnosis of patients affected by diabetes.
Keywords: F-Score; attribute optimization; classification; classification accuracy; diabetes; fitness function; genetic algorithm; mutation
Year: 2020 PMID: 32698547 PMCID: PMC7411768 DOI: 10.3390/s20144036
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1 Overview of the Genetic Algorithm.
Comparative analysis of the accuracy rate of similar existing work.
| Authors | Algorithm and Method Used | Accuracy Rate | Year |
|---|---|---|---|
| Goncalves et al. | Hierarchical Neuro-fuzzy BSP system | 78.26% | 2006 |
| Polat, K., Gunes, S. & Arslan, A. | Generalized Discriminant Analysis (GDA) and Least Square Support Vector Machine (LS-SVM) | 82.05% | 2008 |
| Kahramanli, H. & Allahverdi, N. | Fuzzy neural network (FNN) | 84.24% | 2008 |
| Hasan Temurtas et al. | LM algorithm and a probabilistic neural network | 82.37% | 2009 |
| B. M. Patil | Hybrid prediction model | 92.38% | 2010 |
| Jarullah, Al. A. | Decision tree algorithm | 78% | 2011 |
| E. P. Ephzibah | Fuzzy and genetic algorithms | 87% | 2011 |
| A. V. Senthil Kumar, M. Kalpana | Intensified Fuzzy Verdict Mechanism | 88.35% | 2011 |
| Aliza Ahmad | Pruned decision tree | 89.3% | 2011 |
| Alexis Marcano-Cedeno | Artificial Metaplasticity-based MLP | 89.93% | 2011 |
| Chang-Shing Lee and Mei-Hui Wang | Fuzzy Expert System | 93.8% | 2011 |
| Giveki, Davar, et al. | Feature-Weighted Support Vector Machines (FW-SVMs) | 93.58% | 2012 |
| Sapna, S. | Fuzzy and GA | 88% | 2012 |
| Fayssal Beloufa and Chikh | Modified Artificial Bee Colony | 91.9% | 2013 |
| Aishwarya, S. & Anto, S. | Gaussian radial basis function | 89.54% | 2014 |
| Wenxin Zhu and Ping Zhong | SVM+ | 87.6% | 2014 |
| Srideivanai Nagarajan et al. | Random tree | 93% | 2014 |
| Rahman Ali | Random committee | 81% | 2014 |
| Vaishali Jain, Supriya Raheja | Fuzzy Logic-based Diabetes Diagnosis System (FLDDS) | 87.2% | 2015 |
| Vijayan, V. Veena and C. Anjali | Adaboost | 84.09% | 2015 |
| Harleen Kaur | SVM-linear model | 89% | 2018 |
| Piyush Samant | Iridology technique | 89.63% | 2018 |
| Koteswara Chari et al. | Random forest algorithm | 92.2% | 2019 |
Attribute Details of Pima Indian Diabetes [46].
| Attribute Name | Labelled Value |
|---|---|
| Frequency of pregnancy | Preg |
| Plasma glucose concentration | Plas |
| Diastolic blood pressure (mm Hg) | Pres |
| Triceps skin fold thickness (mm) | Skin |
| 2-h serum insulin | Insu |
| Body mass index (kg/m²) | Mass |
| Diabetes pedigree function | Pedi |
| Age (years) | Age |
| Class label (0 or 1) | Class |
Acronyms discussed in the proposed technique [47].
| Name of Metric | Definition |
|---|---|
| Initial Solution Set_Generate (ISS_Gen) | Pseudo-code to generate the initial set of attributes for the first round |
|  | Pseudo-code for computation of the fitness unit fn(x) |
|  | Pseudo-code for the Restrict Mutate unit |
|  | Attributes of the diabetes dataset at the initial stage |
|  | Attribute set after application of the Optimized Genetic Search method |
|  | Threshold value of every attribute to identify whether diabetes is present |
|  | Merit of an individual variable |
|  | Average worth of every attribute |
|  | Optimum occurrence rate |
|  | Minimum occurrence rate |
|  | Number of 1's in the attribute column |
|  | Lower-indexed 1's-count attribute column |
|  | Higher-indexed 1's-count attribute column |
|  | Total number of 0's in a specific solution |
|  | Prediction accuracy |
|  | Fitness (evaluation) function |
| MPR | Misprediction rate |
| ff | Fitness factor (0.5) |
|  | Number of 0's in a specific set of solutions |
|  | High-order bit |
|  | Low-order bit |
|  | Metric representing the frequency of crossover |
|  | Crossover rate-mutation rate |
|  | Data structure used to store pre-crossover fitness unit values |
|  | Data structure used to store pre-crossover fitness unit values |
|  | Ranking order of every solution set based on its fitness function value |
|  | Metric indicating the frequency of mutation of a chromosome |
|  | Number of rounds for which the algorithm is executed |
|  | Crossover mean and mutation mean |
Pseudo-code 1 for ISS_Gen (FSinitial, FSfinal). [47].
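The body of Pseudo-code 1 did not survive extraction. As a stand-in, here is a minimal Python sketch of what an initial-solution-set generator of this kind typically does, assuming each chromosome is a random bit-string over the dataset's attributes (1 = keep the attribute, 0 = drop it); the all-zero guard and parameter names are our additions, not the authors' code:

```python
import random

def iss_gen(n_attributes, population_size, seed=None):
    """Generate the initial solution set: each chromosome is a random
    bit-string in which bit i decides whether attribute i is kept."""
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        chrom = [rng.randint(0, 1) for _ in range(n_attributes)]
        # Avoid the all-zero chromosome, which would select no attributes.
        if not any(chrom):
            chrom[rng.randrange(n_attributes)] = 1
        population.append(chrom)
    return population
```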
Pseudo-code 2 for Comp_fn(x). [47].
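The fitness pseudo-code is likewise lost. Based on the acronym table above (misprediction rate MPR, fitness factor ff = 0.5, and the number of 0's in a solution), one plausible reading is a weighted combination in which lower values are better, matching the ascending f(n) rankings shown later. The exact combination below is our assumption, not the published formula:

```python
def comp_fn(mpr, zeros, n_attributes, ff=0.5):
    """Fitness of a chromosome (lower is better): combine the classifier's
    misprediction rate (MPR) with the fraction of dropped attributes,
    weighted by the fitness factor ff = 0.5 (hypothetical combination)."""
    return ff * mpr + (1 - ff) * (zeros / n_attributes)
```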
Pseudo-code 3 for Adaptive_CRR-MRR. [47].
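Pseudo-code 3's body is also unrecoverable. Adaptive GAs typically adjust the crossover rate (CRR) and mutation rate (MRR) between generations from the population's fitness statistics; the following sketch uses a common convergence heuristic. The thresholds, step size and bounds are illustrative assumptions, not the authors' values:

```python
import statistics

def adaptive_crr_mrr(fitness_values, crr, mrr, step=0.05,
                     crr_bounds=(0.5, 0.95), mrr_bounds=(0.01, 0.3),
                     diversity_threshold=0.02):
    """Adapt CRR and MRR between generations: when fitness values have
    converged (low spread), raise mutation and lower crossover to restore
    diversity; otherwise do the opposite."""
    spread = statistics.pstdev(fitness_values)
    if spread < diversity_threshold:       # population converging
        mrr = min(mrr + step, mrr_bounds[1])
        crr = max(crr - step, crr_bounds[0])
    else:                                  # population still diverse
        mrr = max(mrr - step, mrr_bounds[0])
        crr = min(crr + step, crr_bounds[1])
    return crr, mrr
```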
Pseudo-code 4 for RS_Mutate [47].
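The Restrict Mutate body is lost as well. Its role in the paper suggests shielding the fittest chromosomes from mutation while mutating the rest; the sketch below assumes elitism with a `keep_top` cutoff and the 1-bit flip used elsewhere in the paper's examples (both parameter choices are ours):

```python
import random

def rs_mutate(ranked_population, keep_top=2, seed=None):
    """Restrict Mutate: protect the top-ranked chromosomes (elitism) and
    flip a single random bit in each of the remaining ones."""
    rng = random.Random(seed)
    mutated = [list(c) for c in ranked_population]
    for chrom in mutated[keep_top:]:
        i = rng.randrange(len(chrom))
        chrom[i] ^= 1          # flip one bit
    return mutated
```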
Pseudo-code 5 for EAGA [47].
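The EAGA main loop did not survive either. Combining the steps that the paper does describe (generate an initial solution set, rank by fitness with lower f(n) ranked first, 2-point crossover, 1-bit mutation, iterate for a fixed number of rounds), a self-contained skeleton can be sketched as follows; the concrete operators and parameters are illustrative stand-ins for Pseudo-codes 1-4, not the authors' code:

```python
import random

def eaga(fitness_of, n_attributes, pop_size=10, generations=20, seed=0):
    """Skeleton of the EAGA loop: per generation, rank by fitness
    (lower is better), breed children from the top half via 2-point
    crossover, occasionally apply 1-bit mutation, and keep the two
    best chromosomes unchanged (elitism)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attributes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness_of)                 # rank: best first
        nxt = pop[:2]                            # elitism: keep top 2
        while len(nxt) < pop_size:
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)
            a, b = sorted(rng.sample(range(1, n_attributes), 2))
            child = p1[:a] + p2[a:b] + p1[b:]    # 2-point crossover
            if rng.random() < 0.1:               # 1-bit mutation
                i = rng.randrange(n_attributes)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness_of)
```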
Pseudo-code 6 for Enhanced Genetic Algorithm (E-GA) [47].
Pseudo-code 7 for Adaptive Genetic Algorithm (A-GA).
Figure 2 Proposed Optimized Attribute Selection Method.
Figure 3 Classification Model based on our Proposed Attribute Selection method (Enhanced and Adaptive Genetic Algorithm-Multilayer Perceptron, EAGA-MLP).
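In the pipeline of Figure 3, the attribute subset chosen by EAGA is applied to the dataset before the rows reach the MLP classifier. A minimal sketch of that masking step (the function name is ours):

```python
def select_attributes(rows, chromosome):
    """Apply an EAGA chromosome to the dataset: keep only the columns
    whose bit is 1 before handing the rows to the MLP classifier."""
    keep = [i for i, bit in enumerate(chromosome) if bit == 1]
    return [[row[i] for i in keep] for row in rows]
```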
Calculation of Mean value for each column (attribute).
| Preg | Plas | Pres | Skin | Insu | Mass | Pedi | Age | |
|---|---|---|---|---|---|---|---|---|
| 5 | 166 | 72 | 19 | 175 | 25.8 | 0.587 | 51 | |
| 5 | 97 | 60 | 23 | 0 | 28.2 | 0.423 | 22 | |
| 7 | 114 | 66 | 0 | 0 | 32.8 | 0.258 | 42 | |
| 1 | 89 | 76 | 34 | 37 | 32.2 | 0.192 | 23 | |
| 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | |
| 7 | 160 | 54 | 32 | 175 | 30.5 | 0.588 | 39 | |
| 4 | 146 | 85 | 27 | 100 | 28.9 | 0.189 | 27 | |
| 13 | 126 | 90 | 0 | 0 | 43.4 | 0.583 | 42 | |
| 2 | 197 | 70 | 45 | 543 | 30.5 | 0.158 | 53 | |
| 3 | 83 | 58 | 31 | 18 | 34.3 | 0.336 | 25 | |
| 2 | 141 | 58 | 34 | 128 | 25.4 | 0.699 | 24 | |
| 15 | 136 | 70 | 32 | 110 | 37.1 | 0.153 | 43 | |
| 2 | 110 | 74 | 29 | 125 | 32.4 | 0.698 | 27 | |
| 3 | 120 | 70 | 30 | 135 | 42.9 | 0.452 | 30 | |
| 4 | 173 | 70 | 14 | 168 | 29.7 | 0.361 | 35 | |
| Mean | 5 | 136 | 69 | 23 | 114 | 32.1 | 0.425 | 34.4 |
Calculation of 1's count for each column (attribute).
| Preg | Plas | Pres | Skin | Insu | Mass | Pedi | Age | |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | |
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | |
| 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | |
| 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | |
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | |
| 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | |
| 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | |
| 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | |
| 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | |
| 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | |
| 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | |
| Mean | 5 | 136 | 69 | 23 | 114 | 32.1 | 0.425 | 34.4 |
| 1's count | 7 | 8 | 9 | 10 | 6 | 7 | 5 | 7 |
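The two tables above first compute a per-column mean and then tally a 1's count. Assuming the binarization rule is "cell value strictly above its column mean" (which matches most entries shown), the computation can be sketched as:

```python
def binarize_by_mean(rows):
    """Compute each column's mean, map a cell to 1 when its value exceeds
    the column mean (0 otherwise), and count the 1's per column."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    bits = [[1 if r[j] > means[j] else 0 for j in range(len(r))]
            for r in rows]
    ones = [sum(b[j] for b in bits) for j in range(len(bits[0]))]
    return means, bits, ones
```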
Sample Chromosomes are taken at random.
| Preg | Plas | Pres | Skin | Insu | Mass | Age |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 1 | 1 | 0 | 0 |
Priority-based chromosome ranking based on Fitness Function.
| Preg | Plas | Pres | Skin | Insu | Mass | Age | f (n) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 18% |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 21% |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 29% |
| 0 | 0 | 0 | 1 | 1 | 0 | 1 | 33% |
| 1 | 1 | 1 | 1 | 0 | 1 | 0 | 34% |
| 1 | 0 | 0 | 1 | 0 | 0 | 1 | 37% |
| 1 | 1 | 1 | 1 | 0 | 0 | 1 | 38% |
| 0 | 0 | 1 | 0 | 1 | 1 | 1 | 41% |
| 0 | 1 | 0 | 1 | 1 | 0 | 0 | 42% |
| 0 | 1 | 1 | 0 | 1 | 0 | 0 | 48% |
Figure 4 Applying 2-point crossover on sample chromosomes.
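The 2-point crossover of Figure 4 swaps the segment between two cut points of a parent pair. A direct sketch (cut points a and b are chosen per pair):

```python
def two_point_crossover(p1, p2, a, b):
    """2-point crossover: exchange the segment [a, b) between the
    two parent chromosomes, producing two children."""
    c1 = p1[:a] + p2[a:b] + p1[b:]
    c2 = p2[:a] + p1[a:b] + p2[b:]
    return c1, c2
```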
Recalculation of Fitness function after Crossover on the chromosome set.
| Preg | Plas | Pres | Skin | Insu | Mass | Age | f (n) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 0 | 1 | 25% |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 19% |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 17% |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 44% |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 35% |
| 1 | 0 | 1 | 1 | 0 | 0 | 1 | 20% |
| 1 | 1 | 1 | 0 | 1 | 0 | 1 | 21% |
| 0 | 0 | 1 | 1 | 0 | 1 | 1 | 26% |
| 0 | 1 | 0 | 1 | 1 | 0 | 0 | 42% |
| 0 | 1 | 1 | 0 | 1 | 0 | 0 | 48% |
Swapping and Ranking of chromosome set based on Recalculated Fitness function after Crossover.
| Preg | Plas | Pres | Skin | Insu | Mass | Age | f (n) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 17% |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 18% |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 | 19% |
| 1 | 0 | 1 | 1 | 0 | 0 | 1 | 20% |
| 1 | 1 | 1 | 0 | 1 | 0 | 1 | 21% |
| 1 | 1 | 0 | 1 | 1 | 0 | 1 | 25% |
| 0 | 0 | 1 | 1 | 0 | 1 | 1 | 26% |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 29% |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 | 35% |
| 0 | 1 | 0 | 1 | 1 | 0 | 0 | 42% |
1-bit Mutation of the chromosome set.
| Preg | Plas | Pres | Skin | Insu | Mass | Age |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 | 1 | 0 | 0 |
Applying Restrict Mutate on the Chromosome set in the last generation.
| Preg | Plas | Pres | Skin | Insu | Mass | Age |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 | 1 | 0 | 0 |
Final Optimal and Enhanced Attribute set after k generations.
| Preg | Plas | Pres | Skin | Insu | Mass | Age | |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | |
| 0 | 1 | 1 | 1 | 1 | 0 | 1 | |
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | |
| 0 | 0 | 1 | 0 | 1 | 0 | 1 | |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 | |
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | |
| 1 | 0 | 1 | 0 | 1 | 0 | 0 | |
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | |
| 1 | 1 | 1 | 0 | 1 | 1 | 0 | |
| 0 | 1 | 0 | 1 | 1 | 0 | 0 | |
| 1's count | 7 | 7 | 6 | 6 | 8 | 4 | 6 |
Final Ranking of attributes after the termination of EAGA.
| Preg | Plas | Pres | Skin | Insu | Age | |
|---|---|---|---|---|---|---|
| 7 | 7 | 6 | 6 | 8 | 6 | 1's count |
| 3 | 2 | 4 | 5 | 1 | 6 | Ranking of Attributes |
Figure 5 Comparative analysis of the accuracy rate of the presented EAGA algorithm and its subcomponents.
Figure 6 Comparative analysis of the latency of the presented EAGA algorithm and its subcomponents.
Figure 7 Comparative analysis of the precision of the presented EAGA algorithm and its subcomponents.
Figure 8 Comparative analysis of the recall of the presented EAGA algorithm and its subcomponents.
Figure 9 Comparative analysis of the F-Score of the presented EAGA algorithm and its subcomponents.
Figure 10 Comparative analysis of classification accuracy w.r.t. the number of generations.
Figure 11 Prediction accuracy rate of the EAGA method under the cross-validation technique.
Comparative analysis of classification accuracy of EAGA with GA, E-GA and A-GA.
| Data Sample Size | Performance | GA-MLP | E-GA-MLP | A-GA-MLP | EAGA-MLP |
|---|---|---|---|---|---|
| 100 | Accuracy (%) | 91.46 | 92.76 | 93.02 | 94.02 |
|  | Latency (s) | 0.05 | 0.03 | 1.06 | 0.06 |
| 200 | Accuracy (%) | 94.47 | 94.43 | 94.87 | 95.17 |
|  | Latency (s) | 0.88 | 0.78 | 0.82 | 0.8 |
| 300 | Accuracy (%) | 90.98 | 91.98 | 92.32 | 94.32 |
|  | Latency (s) | 0.99 | 0.93 | 0.95 | 0.75 |
| 400 | Accuracy (%) | 87.26 | 89.56 | 90.31 | 94.51 |
|  | Latency (s) | 1.53 | 1.03 | 0.97 | 0.9 |
| 500 | Accuracy (%) | 89.78 | 91.78 | 89.67 | 91.67 |
|  | Latency (s) | 1.73 | 1.23 | 1.76 | 1.04 |
| 600 | Accuracy (%) | 86.33 | 88.65 | 89.22 | 95.22 |
|  | Latency (s) | 2.07 | 1.77 | 1.89 | 1.89 |
| 700 | Accuracy (%) | 92.24 | 94.14 | 93.29 | 97.96 |
|  | Latency (s) | 1.86 | 1.56 | 1.92 | 1.12 |
Parameters for Statistical hypothesis analysis.
| Variable | Description |
|---|---|
| m | Number of data samples in diabetes dataset |
| y1 | Number of correctly classified samples using fuzzy model |
| y2 | Number of correctly classified samples using EAGA-MLP model |
|  | Accuracy obtained using the fuzzy model |
|  | Accuracy obtained using the EAGA-MLP model |
| S | Test statistic measure |
Figure 12 Accuracy comparison of the EAGA-MLP model with other works from the literature survey.
Performance analysis of EAGA-MLP model with different chronic disease datasets.
| Disease Dataset | Attributes Types | Instances | Attributes | Accuracy (%) |
|---|---|---|---|---|
| Pima Indians Diabetes | Integer, Real | 768 | 8 | 97.76 |
| Kidney Disease | Real | 400 | 25 | 94.24 |
| Statlog (Heart) | Categorical, Real | 270 | 13 | 95.12 |
| Breast Cancer | Real | 569 | 32 | 94.56 |
| Arrhythmia | Categorical, Integer | 452 | 279 | 93.76 |
| Hepatitis | Categorical, Integer | 155 | 19 | 94.42 |
| Lung Cancer | Integer | 32 | 56 | 95.36 |
| Parkinson’s | Real | 197 | 23 | 92.68 |
Figure 13 Precision analysis of the EAGA-MLP model over different chronic disease datasets.
Figure 14 Recall analysis of the EAGA-MLP model over different chronic disease datasets.
Figure 15 F-Score analysis of the EAGA-MLP model over different chronic disease datasets.