V R Elgin Christo, H Khanna Nehemiah, B Minu, A Kannan.
Abstract
A framework for clinical diagnosis has been designed and implemented that uses bioinspired algorithms for feature selection and a gradient descent backpropagation neural network for classification. The clinical data are subjected to data preprocessing, feature selection, and classification. Hot deck imputation is used to handle missing values, and min-max normalization is used for data transformation. Feature selection follows a wrapper approach that employs three bioinspired algorithms, namely Differential Evolution, Lion Optimization, and Glowworm Swarm Optimization, with the accuracy of an AdaBoostSVM classifier as the fitness function. Each bioinspired algorithm selects a subset of features, yielding three feature subsets. Correlation-based ensemble feature selection is then performed to select the optimal features from the three subsets, and these features are used to train a gradient descent backpropagation neural network. Ten-fold cross-validation is used to train and test the classifier. The Hepatitis dataset and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the University of California Irvine (UCI) Machine Learning Repository are used to evaluate classification accuracy. An accuracy of 98.47% is obtained for the WDBC dataset and 95.51% for the Hepatitis dataset. The proposed framework can be tailored to develop clinical decision-making systems for any health disorder to assist physicians in clinical diagnosis.
Year: 2019 PMID: 31662787 PMCID: PMC6778924 DOI: 10.1155/2019/7398307
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1: System framework.
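The preprocessing stage named in the abstract (hot deck imputation followed by min-max normalization) can be illustrated with a minimal NumPy sketch. The donor rule used here (nearest complete row by Euclidean distance over the observed features) is an assumption, since the text does not say how donors are chosen; the function names and the toy records are hypothetical.

```python
import numpy as np

def hot_deck_impute(X):
    """Fill each missing value (NaN) with the value from a complete 'donor' row.

    ASSUMPTION: the donor is the complete row closest in Euclidean distance
    over the features the recipient row actually has.
    """
    X = np.asarray(X, dtype=float).copy()
    complete = X[~np.isnan(X).any(axis=1)]      # donor pool: rows without gaps
    if complete.size == 0:
        raise ValueError("hot deck imputation needs at least one complete row")
    for row in X:                               # each row is a view, edited in place
        missing = np.isnan(row)
        if missing.any():
            observed = ~missing
            dists = np.linalg.norm(complete[:, observed] - row[observed], axis=1)
            donor = complete[np.argmin(dists)]
            row[missing] = donor[missing]
    return X

def min_max_normalize(X):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # avoid division by zero
    return (X - lo) / span

# Toy records with hypothetical values: [age, bilirubin, albumin]
records = np.array([
    [30.0, 1.0, 4.0],
    [50.0, 2.0, 3.0],
    [31.0, np.nan, 4.1],   # missing bilirubin
    [70.0, 4.0, 2.0],
])
filled = hot_deck_impute(records)
scaled = min_max_normalize(filled)
```

In this sketch the donor search runs on unscaled features, so large-valued columns such as age dominate the distance; a real pipeline might match donors on categorical attributes instead.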
Outline of the Hepatitis dataset.
| S. no. | Feature | Description | Datatype |
|---|---|---|---|
| 1 | Age | Age of the patient | Numerical |
| 2 | Sex | Gender of the patient | Categorical |
| 3 | Steroid | Whether the patient has taken anabolic steroids or not | Boolean |
| 4 | Antivirals | Whether the patient has taken antivirals or not | Boolean |
| 5 | Fatigue | Whether the patient has experienced extreme tiredness or not | Boolean |
| 6 | Malaise | Whether the patient is having a vague feeling of body discomfort | Boolean |
| 7 | Anorexia | Whether the patient has lack or loss of appetite for food | Boolean |
| 8 | Liver big | Whether the patient's liver is enlarged or not | Boolean |
| 9 | Liver firm | Whether the patient's liver is firm or not | Boolean |
| 10 | Spleen palpable | Whether the patient's spleen is enlarged or not | Boolean |
| 11 | Spiders | Whether the blood vessels are near the skin surface due to the increased estrogen level. | Boolean |
| 12 | Ascites | Whether the fluid is accumulated in the peritoneal cavity or not | Boolean |
| 13 | Varices | Whether the patient is having bleeding from varices or not | Boolean |
| 14 | Bilirubin | The amount of bilirubin in the blood sample | Numerical |
| 15 | Alk phosphate | Level of alkaline phosphatase in the blood sample | Numerical |
| 16 | Sgot | The amount of serum glutamic oxaloacetic transaminase in the blood | Numerical |
| 17 | Albumin | The amount of serum albumin protein in the clear liquid portion of the blood sample | Numerical |
| 18 | Protime | Time taken for blood plasma to clot | Numerical |
| 19 | Histology | Class attribute indicates whether the patient survives or not | Boolean |
Outline of WDBC dataset.
| S. no | Feature | Description | Data type |
|---|---|---|---|
| 1 | ID | Patient identification number | Numerical |
| 2 | Diagnosis | Malignant or benign | Character |
| 3 | Radius (mean) | Mean of distances from centre to points on the perimeter | Real |
| 4 | Radius (error) | Mean of distances from centre to points on the perimeter | Real |
| 5 | Radius (worst) | Mean of distances from centre to points on the perimeter | Real |
| 6 | Texture (mean) | Standard deviation of grey scale values | Real |
| 7 | Texture (error) | Standard deviation of grey scale values | Real |
| 8 | Texture (worst) | Standard deviation of grey scale values | Real |
| 9 | Perimeter (mean) | Perimeter of cell nucleus | Real |
| 10 | Perimeter (error) | Perimeter of cell nucleus | Real |
| 11 | Perimeter (worst) | Perimeter of cell nucleus | Real |
| 12 | Area (mean) | Area of cell | Real |
| 13 | Area (error) | Area of cell | Real |
| 14 | Area (worst) | Area of cell | Real |
| 15 | Smoothness (mean) | Local variation in radius lengths | Real |
| 16 | Smoothness (error) | Local variation in radius lengths | Real |
| 17 | Smoothness (worst) | Local variation in radius lengths | Real |
| 18 | Compactness (mean) | perimeter^2 / area − 1.0 | Real |
| 19 | Compactness (error) | perimeter^2 / area − 1.0 | Real |
| 20 | Compactness (worst) | perimeter^2 / area − 1.0 | Real |
| 21 | Concavity (mean) | Severity of concave portions of the contour | Real |
| 22 | Concavity (error) | Severity of concave portions of the contour | Real |
| 23 | Concavity (worst) | Severity of concave portions of the contour | Real |
| 24 | Concave (mean) | Number of concave portions of the contour | Real |
| 25 | Concave (error) | Number of concave portions of the contour | Real |
| 26 | Concave (worst) | Number of concave portions of the contour | Real |
| 27 | Symmetry (mean) | Measure of cell symmetry | Real |
| 28 | Symmetry (error) | Measure of cell symmetry | Real |
| 29 | Symmetry (worst) | Measure of cell symmetry | Real |
| 30 | Fractal dimension (mean) | "Coastline approximation" − 1 | Real |
| 31 | Fractal dimension (error) | "Coastline approximation" − 1 | Real |
| 32 | Fractal dimension (worst) | "Coastline approximation" − 1 | Real |
Parameter setting for Glowworm Swarm Optimization.
| Parameter | Value |
|---|---|
| | 0.4 |
| | 0.6 |
| | 0.08 |
| | 5 |
| | 0.03 |
| | 5 |
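The parameter column of this table did not survive extraction, so the meaning of each value is not stated here. As a hedged illustration only, the sketch below assumes the six values follow the usual ordering of the standard Glowworm Swarm Optimization parameters (luciferin decay, luciferin enhancement, range update rate, desired number of neighbours, step size, initial luciferin). That mapping is an assumption, as are the function names; the update rules are the textbook GSO rules, not necessarily the authors' implementation.

```python
import numpy as np

# Values from the table. The symbol attached to each one is an ASSUMPTION
# (standard GSO naming), because the parameter column is blank in the source.
RHO = 0.4     # luciferin decay constant (assumed)
GAMMA = 0.6   # luciferin enhancement constant (assumed)
BETA = 0.08   # neighbourhood-range update rate (assumed)
N_T = 5       # desired number of neighbours (assumed)
STEP = 0.03   # movement step size (assumed)
L0 = 5.0      # initial luciferin level (assumed)

def update_luciferin(luciferin, fitness):
    """l(t+1) = (1 - rho) * l(t) + gamma * J(x(t+1))."""
    return (1.0 - RHO) * luciferin + GAMMA * fitness

def update_range(radius, n_neighbours, sensor_range):
    """r(t+1) = min(r_s, max(0, r(t) + beta * (n_t - |N(t)|)))."""
    return min(sensor_range, max(0.0, radius + BETA * (N_T - n_neighbours)))

def move_towards(x_i, x_j):
    """Move glowworm i one fixed step in the direction of brighter neighbour j."""
    direction = np.asarray(x_j, dtype=float) - np.asarray(x_i, dtype=float)
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        return np.asarray(x_i, dtype=float)
    return np.asarray(x_i, dtype=float) + STEP * direction / norm
```

In a wrapper setting such as the one described in the abstract, the fitness passed to `update_luciferin` would be the classifier accuracy obtained with the feature subset that a glowworm's position encodes.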
Parameter setting for BPNN.
| Parameter | Value | Meaning |
|---|---|---|
| | Features selected by correlation-based ensemble feature selector | Number of input nodes |
| | (2 | Number of hidden nodes |
| | 1 | Hidden layer |
| | Linear | Output |
Initial weights and biases are assigned small random values in the range −0.5 to 0.5, and the learning rate is set to 0.5.
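The settings above (one hidden layer, linear output, weights and biases drawn from [−0.5, 0.5], learning rate 0.5) can be sketched as a small NumPy network trained by batch gradient descent. The sigmoid hidden activation, the half mean-squared-error loss, the choice of three hidden nodes, and the toy data are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
LEARNING_RATE = 0.5                      # from the text

def init_layer(n_in, n_out):
    """Weights and biases drawn uniformly from [-0.5, 0.5], as the text states."""
    return rng.uniform(-0.5, 0.5, (n_in, n_out)), rng.uniform(-0.5, 0.5, n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    W1, b1, W2, b2 = params
    hidden = sigmoid(X @ W1 + b1)        # ASSUMED sigmoid hidden activation
    return hidden, hidden @ W2 + b2      # linear output, as in the table

def loss(X, y, params):
    """Half mean-squared error (ASSUMED loss)."""
    return 0.5 * np.mean((forward(X, params)[1] - y) ** 2)

def train_step(X, y, params):
    """One batch gradient-descent update via backpropagation."""
    W1, b1, W2, b2 = params
    hidden, out = forward(X, params)
    d_out = (out - y) / len(X)                         # dLoss/d(output)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)  # error propagated to hidden layer
    return (W1 - LEARNING_RATE * X.T @ d_hidden,
            b1 - LEARNING_RATE * d_hidden.sum(axis=0),
            W2 - LEARNING_RATE * hidden.T @ d_out,
            b2 - LEARNING_RATE * d_out.sum(axis=0))

# Toy data (hypothetical): two normalized features, target is their mean.
X = rng.uniform(0.0, 1.0, (40, 2))
y = X.mean(axis=1, keepdims=True)
params = (*init_layer(2, 3), *init_layer(3, 1))        # 3 hidden nodes, chosen arbitrarily
initial_loss = loss(X, y, params)
for _ in range(200):
    params = train_step(X, y, params)
final_loss = loss(X, y, params)
```

In the framework itself the number of input nodes would equal the number of features kept by the correlation-based ensemble selector, and the inputs would already be min-max normalized.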
Feature importance of Hepatitis dataset.
| S. no. | Feature | Feature importance | Rank |
|---|---|---|---|
| 1 | Age | 0.335503 | 3 |
| 2 | Sex | 0.014356 | 15 |
| 3 | Steroid | 0.011443 | 16 |
| 4 | Antivirals | 0.033793 | 11 |
| 5 | Fatigue | 0.022269 | 13 |
| 6 | Malaise | 0.019788 | 14 |
| 7 | Anorexia | 0.010061 | 17 |
| 8 | Liver big | 0.007907 | 18 |
| 9 | Liver firm | 0.032248 | 12 |
| 10 | Spleen palpable | 0.036895 | 10 |
| 11 | Spiders | 0.095853 | 9 |
| 12 | Ascites | 0.099008 | 8 |
| 13 | Varices | 0.110238 | 7 |
| 14 | Bilirubin | 0.202373 | 6 |
| 15 | Alk phosphate | 0.532782 | 1 |
| 16 | SGOT | 0.511008 | 2 |
| 17 | Albumin | 0.21938 | 5 |
| 18 | Protime | 0.294931 | 4 |
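The Rank column is consistent with sorting the importance scores in descending order. A short sketch that reproduces the ranking from the scores in the table above (the variable names are illustrative):

```python
# Importance scores copied from the Hepatitis feature-importance table.
importance = {
    "Age": 0.335503, "Sex": 0.014356, "Steroid": 0.011443,
    "Antivirals": 0.033793, "Fatigue": 0.022269, "Malaise": 0.019788,
    "Anorexia": 0.010061, "Liver big": 0.007907, "Liver firm": 0.032248,
    "Spleen palpable": 0.036895, "Spiders": 0.095853, "Ascites": 0.099008,
    "Varices": 0.110238, "Bilirubin": 0.202373, "Alk phosphate": 0.532782,
    "SGOT": 0.511008, "Albumin": 0.21938, "Protime": 0.294931,
}

# Rank 1 is the feature with the highest importance score.
ranked = sorted(importance, key=importance.get, reverse=True)
rank = {name: position + 1 for position, name in enumerate(ranked)}
```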
Feature importance of WDBC dataset.
| S. no. | Feature | Feature importance | Rank |
|---|---|---|---|
| 1 | ID | 0.852635 | 24 |
| 2 | Radius (mean) | 0.860782 | 22 |
| 3 | Radius (error) | 0.835712 | 27 |
| 4 | Radius (worst) | 0.926704 | 10 |
| 5 | Texture (mean) | 0.928031 | 8 |
| 6 | Texture (error) | 0.776179 | 29 |
| 7 | Texture (worst) | 0.909129 | 16 |
| 8 | Perimeter (mean) | 0.93506 | 2 |
| 9 | Perimeter (error) | 0.94209 | 1 |
| 10 | Perimeter (worst) | 0.735037 | 30 |
| 11 | Area (mean) | 0.836177 | 26 |
| 12 | Area (error) | 0.933734 | 5 |
| 13 | Area (worst) | 0.864297 | 20 |
| 14 | Smoothness (mean) | 0.931545 | 6 |
| 15 | Smoothness (error) | 0.925377 | 11 |
| 16 | Smoothness (worst) | 0.93505 | 3 |
| 17 | Compactness (mean) | 0.923189 | 12 |
| 18 | Compactness (error) | 0.928030 | 9 |
| 19 | Compactness (worst) | 0.858593 | 23 |
| 20 | Concavity (mean) | 0.818137 | 28 |
| 21 | Concavity (error) | 0.917486 | 14 |
| 22 | Concavity (worst) | 0.900307 | 17 |
| 23 | Concave (mean) | 0.863435 | 21 |
| 24 | Concave (error) | 0.898584 | 18 |
| 25 | Concave (worst) | 0.935045 | 4 |
| 26 | Symmetry (mean) | 0.719719 | 31 |
| 27 | Symmetry (error) | 0.918347 | 13 |
| 28 | Symmetry (worst) | 0.930219 | 7 |
| 29 | Fractal dimension (mean) | 0.914832 | 15 |
| 30 | Fractal dimension (error) | 0.845395 | 25 |
| 31 | Fractal dimension (worst) | 0.891554 | 19 |
Features selected for hepatitis dataset.
| | Age | Sex | Steroid | Antivirals | Fatigue | Malaise | Anorexia | Liver_big | Liver_firm | Spleen_palpable | Spiders | Ascites | Varices | Bilirubin | Alk_phosphate | Sgot | Albumin | Histology | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DE | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| GSO | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
| LION | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Correlation-based feature selector | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Features selected for WDBC dataset.
| Correlation-based feature selector | LION | GSO | DE | Feature |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | P_id |
| 1 | 1 | 1 | 0 | Mean_radius |
| 1 | 1 | 1 | 1 | Mean_texture |
| 0 | 1 | 1 | 0 | Mean_perimeter |
| 1 | 1 | 1 | 1 | Mean_area |
| 1 | 1 | 1 | 0 | Mean_smoothness |
| 1 | 1 | 1 | 0 | Mean_compactness |
| 1 | 1 | 1 | 0 | Mean_concavity |
| 1 | 1 | 1 | 1 | Concavepoints_mean |
| 1 | 1 | 1 | 1 | Mean_symmetry |
| 1 | 1 | 1 | 0 | Mean_fractaldimension |
| 1 | 1 | 1 | 1 | Standard_error_radius |
| 1 | 1 | 0 | 1 | Standard_error_texture |
| 0 | 1 | 1 | 0 | Standard_error_perimeter |
| 0 | 0 | 0 | 0 | Standard_error_area |
| 0 | 0 | 0 | 1 | Standard_error_smoothness |
| 1 | 1 | 1 | 0 | Standard_error_compactness |
| 0 | 0 | 0 | 0 | Standard_error_concavity |
| 0 | 0 | 0 | 1 | Concavepoints_standard_error |
| 0 | 1 | 0 | 0 | Standard_error_symmetry |
| 0 | 0 | 0 | 1 | Standard_error_fractaldimension |
| 0 | 0 | 0 | 1 | Worst_radius |
| 1 | 1 | 1 | 0 | Worst_texture |
| 0 | 0 | 0 | 1 | Worst_perimeter |
| 0 | 1 | 1 | 1 | Worst_area |
| 1 | 1 | 1 | 1 | Worst_smoothness |
| 1 | 0 | 1 | 1 | Worst_compactness |
| 1 | 1 | 0 | 1 | Worst_concavity |
| 1 | 1 | 0 | 1 | Concavepoints_worst |
| 1 | 1 | 1 | 0 | Worst_symmetry |
| 1 | 0 | 1 | 1 | Worst_fractaldimension |
| 1 | 1 | 1 | 1 | Diagnosis |
Figure 2: Comparison of classifier accuracy achieved by changing the number of hidden nodes for Hepatitis dataset.
Figure 3: Comparison of classifier accuracy achieved by changing the number of hidden nodes for WDBC dataset.
Confusion matrix for proposed framework used to train and test the Hepatitis dataset.
| Expected (rows) / Predicted (columns) | Nonfatal | Fatal |
|---|---|---|
| Nonfatal | 38 (TN) | 2 (FP) |
| Fatal | 3 (FN) | 39 (TP) |
Confusion matrix for proposed framework used to train and test the WDBC dataset.
| Expected (rows) / Predicted (columns) | Benign | Malignant |
|---|---|---|
| Benign | 118 (TN) | 2 (FP) |
| Malignant | 1 (FN) | 116 (TP) |
Performance evaluation of the proposed framework.
| Measure | WDBC (%) | Hepatitis (%) |
|---|---|---|
| Accuracy | 98.734 | 93.902 |
| Precision | 98.305 | 95.121 |
| Sensitivity | 99.145 | 92.857 |
| Specificity | 98.333 | 95 |
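The four measures in this table can be recomputed from the two confusion matrices given earlier. The sketch below uses the standard definitions (accuracy = (TP + TN) / total, precision = TP / (TP + FP), sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the function name is illustrative.

```python
def metrics(tn, fp, fn, tp):
    """Return accuracy, precision, sensitivity and specificity as percentages."""
    total = tn + fp + fn + tp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "precision": 100.0 * tp / (tp + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }

# Counts taken from the confusion matrices for the two datasets.
hepatitis = metrics(tn=38, fp=2, fn=3, tp=39)
wdbc = metrics(tn=118, fp=2, fn=1, tp=116)

for name, result in (("Hepatitis", hepatitis), ("WDBC", wdbc)):
    print(name, {measure: round(value, 3) for measure, value in result.items()})
```

The recomputed values agree with the table to within about 0.001 percentage points; the small differences come from rounding versus truncation of the last digit.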
Performance of correlation-based ensemble feature selector and individual feature selector for Hepatitis dataset.
| Measure | Proposed work (%) | DE (%) | GSO (%) | Lion (%) |
|---|---|---|---|---|
| Accuracy | 93.902 | 91.46 | 92.6 | 92.68 |
| Precision | 95.121 | 92.68 | 95 | 95.12 |
| Sensitivity | 92.857 | 90.69 | 90.47 | 90.69 |
| Specificity | 95 | 92.5 | 95 | 94.87 |
Performance of correlation-based ensemble feature selector and individual feature selector for WDBC dataset.
| Measure | Proposed work (%) | DE (%) | GSO (%) | Lion (%) |
|---|---|---|---|---|
| Accuracy | 98.734 | 97.03 | 97.45 | 97.45 |
| Precision | 98.305 | 95.86 | 96.66 | 95.90 |
| Sensitivity | 99.145 | 98.30 | 98.30 | 99.15 |
| Specificity | 98.333 | 95.76 | 96.61 | 95.76 |
Performance comparison of proposed work with other classifiers for Hepatitis dataset.
| Measure | Naive Bayes | J48 | Decision table | AdaBoostM1 | Multilayer perceptron | Random forest | Proposed work |
|---|---|---|---|---|---|---|---|
| Accuracy | 0.8387 | 0.8064 | 0.7806 | 0.8064 | 0.8452 | 0.8323 | 0.93902 |
| Precision | 0.8450 | 0.7980 | 0.7810 | 0.7980 | 0.8390 | 0.8250 | 0.95121 |
| Sensitivity | 0.8390 | 0.8060 | 0.7810 | 0.9417 | 0.8450 | 0.8819 | 0.92857 |
| Specificity | 0.9083 | 0.8661 | 0.8618 | 0.8661 | 0.8898 | 0.8320 | 0.95 |
Performance comparison of proposed work with other classifiers for WDBC dataset.
| Measure | Naive Bayes | J48 | Decision table | AdaBoostM1 | Multilayer perceptron | Random forest | Proposed work |
|---|---|---|---|---|---|---|---|
| Accuracy | 0.9297 | 0.9312 | 0.9350 | 0.9472 | 0.9630 | 0.9666 | 0.98734 |
| Precision | 0.9300 | 0.9320 | 0.9350 | 0.9470 | 0.9630 | 0.9670 | 0.98305 |
| Sensitivity | 0.9300 | 0.9310 | 0.9350 | 0.9417 | 0.9630 | 0.9670 | 0.99145 |
| Specificity | 0.8716 | 0.8950 | 0.9268 | 0.9470 | 0.9569 | 0.9707 | 0.98333 |