| Literature DB >> 35684870 |
Amith Khandakar1,2, Muhammad E H Chowdhury1, Mamun Bin Ibne Reaz2, Sawal Hamid Md Ali2, Serkan Kiranyaz1, Tawsifur Rahman1, Moajjem Hossain Chowdhury2, Mohamed Arselene Ayari3,4, Rashad Alfkey5, Ahmad Ashrif A Bakar2, Rayaz A Malik6, Anwarul Hasan7.
Abstract
Diabetes mellitus (DM) is one of the most prevalent diseases in the world and is associated with high mortality. One of its major complications is the diabetic foot, which can lead to plantar ulcers, amputation, and death. Several studies report that thermograms help to detect changes in plantar foot temperature, which may indicate a higher risk of ulceration. However, in diabetic patients, the distribution of plantar temperature does not follow a standard pattern, making the changes difficult to quantify. The abnormal temperature distribution in infrared (IR) foot thermogram images can be used for the early detection of diabetic foot before ulceration, helping to avoid complications. No machine learning-based technique has been reported in the literature to classify these thermograms by the severity of diabetic foot complications. This paper takes an available labeled diabetic thermogram dataset and applies k-means clustering to stratify the severity risk of diabetic foot ulceration in an unsupervised manner. Using the plantar foot temperature, the newly clustered dataset was verified by expert medical doctors in terms of the risk of developing foot ulcers. The newly labeled dataset was then investigated for how robustly it can be classified by machine learning networks. Classical machine learning algorithms with feature engineering and convolutional neural networks (CNNs) with image-enhancement techniques were investigated to identify the best-performing network for classifying thermograms by severity. The popular VGG 19 CNN model achieved an accuracy, precision, sensitivity, F1-score, and specificity of 95.08%, 95.08%, 95.09%, 95.08%, and 97.2%, respectively, in stratifying severity. A stacking classifier is also proposed using extracted thermogram features, built from trained gradient boost, XGBoost, and random forest classifiers; it provides a comparable performance of 94.47%, 94.45%, 94.47%, 94.43%, and 93.25% for accuracy, precision, sensitivity, F1-score, and specificity, respectively.
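The unsupervised severity labeling described above can be sketched with scikit-learn's k-means. This is a minimal illustration, not the paper's pipeline: the feature matrix below is synthetic stand-in data (three well-separated temperature patterns), and k = 3 is chosen to mirror the mild/moderate/severe grouping.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per thermogram, columns are
# extracted temperature features (e.g., per-angiosome means).
# Values are synthetic stand-ins for illustration only.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(27.0, 0.5, (40, 4)),   # cooler plantar pattern
    rng.normal(30.0, 0.5, (40, 4)),   # intermediate pattern
    rng.normal(33.0, 0.5, (40, 4)),   # hotter pattern
])

# k = 3 mirrors the mild/moderate/severe grouping adopted in the paper.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
labels = km.labels_
print(sorted(np.bincount(labels).tolist()))  # -> [40, 40, 40]
```

With well-separated groups like these, each cluster recovers exactly one of the three synthetic patterns; on real thermogram features the cluster-to-severity mapping still requires expert verification, as the paper notes.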
Keywords: classical machine learning; deep learning; diabetic foot; diabetic foot severity classification; k-mean clustering; machine learning; non-invasive diagnosis technique; thermal change index; thermogram
Year: 2022 PMID: 35684870 PMCID: PMC9185274 DOI: 10.3390/s22114249
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Illustration of the study methodology.
Figure 2. Sample of MPA, LPA, MCA, and LCA angiosomes of the foot for control and diabetic groups [37].
Figure 3. Detailed framework of thermogram image clustering.
Figure 4. Cumulative variance vs. the number of PCA components.
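A cumulative-variance curve like the one in Figure 4 is typically used to pick how many principal components to retain. The sketch below assumes a synthetic feature matrix and a 95% variance threshold, both illustrative choices rather than the paper's exact settings.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the thermogram feature matrix
# (100 samples x 20 correlated features).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20)) @ rng.normal(size=(20, 20))

# Fit PCA on all components and accumulate the explained variance.
pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 95%.
n_components = int(np.searchsorted(cumvar, 0.95) + 1)
```

Plotting `cumvar` against the component index reproduces the shape of Figure 4; the elbow/threshold crossing gives the retained dimensionality.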
Details of the dataset used for training (with and without augmentation), validation, and testing.
| Classifier | Dataset | Cluster | Training | Augmented Training | Validation | Test |
|---|---|---|---|---|---|---|
| Severity | Contreras et al. [ | Mild | 43 | 2040 | 3 | 11 |
| | | Moderate | 48 | 2244 | 4 | 11 |
| | | Severe | 93 | 1806 | 7 | 24 |
Figure 5. Original thermogram versus enhanced thermogram using AHE and gamma correction for DM and CG [37].
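Of the two enhancement techniques compared in Figure 5, gamma correction is simple enough to sketch without dependencies. The gamma value below is an illustrative choice, not the paper's tuned parameter, and AHE is only noted in a comment.

```python
import numpy as np

def gamma_correct(img, gamma=0.8):
    """Apply power-law (gamma) correction to an image scaled to [0, 1].

    gamma < 1 brightens mid-range intensities, which can make subtle
    plantar temperature gradients easier to see; 0.8 is illustrative.
    """
    img = np.clip(img.astype(np.float64), 0.0, 1.0)
    return img ** gamma

# AHE (adaptive histogram equalization) is commonly available as
# skimage.exposure.equalize_adapthist; it is omitted here to keep
# the sketch dependency-free.
thermo = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # toy "thermogram"
enhanced = gamma_correct(thermo, gamma=0.8)
```

Because x**0.8 >= x on [0, 1], every mid-range pixel is brightened while the extremes 0 and 1 are left fixed.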
Figure 6. Schematic representation of the stacking model architecture.
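The stacking architecture of Figure 6 can be sketched with scikit-learn's `StackingClassifier`. This is a minimal stand-in, not the paper's trained model: the data is synthetic, the meta-learner is an assumed logistic regression, and the paper's XGBoost base learner is replaced by a second scikit-learn ensemble to keep the sketch dependency-free.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the shortlisted thermogram features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base learners feed their cross-validated predictions to a meta-learner.
stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

If `xgboost` is installed, an `xgboost.XGBClassifier` can be added to the `estimators` list to match the paper's three-learner ensemble.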
Figure 7. t-SNE plots with (a) the TCI-based classes (class 1–class 5) and the output from k-means clustering with (b) k = 2, (c) k = 3, (d) k = 4, and (e) k = 5.
Figure 8. Diabetic thermograms classified into three severities: mild, moderate, and severe.
Figure 9. Distribution of classes 1–5 across the k-means clustering categories: mild, moderate, and severe.
Performance metrics for the best-performing combinations using Approach 1.
| Classifier | Feature Selection | # of Features | Class | Accuracy | Precision | Sensitivity | F1-Score | Specificity | Inference Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost | Random Forest | 25 | Mild | 92.59 ± 6.80 | 91.53 ± 7.23 | 94.74 ± 5.80 | 93.1 ± 6.58 | 91.94 ± 7.07 | 0.397 |
| Moderate | 92.59 ± 6.47 | 86.15 ± 8.53 | 88.89 ± 7.76 | 87.5 ± 8.17 | 93.89 ± 5.91 | ||||
| Severe | 92.59 ± 4.61 | 96.64 ± 3.17 | 93.50 ± 4.34 | 95.04 ± 3.82 | 91.67 ± 4.86 | ||||
| Overall | 92.59 ± 3.29 | 92.72 ± 3.26 | 92.59 ± 3.29 | 92.63 ± 3.28 | 92.31 ± 3.34 |
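The per-class metrics reported in these tables can all be derived from a confusion matrix; specificity in particular is not part of scikit-learn's standard classification report, so a small helper is useful. The matrix values below are invented for illustration and do not correspond to the paper's results.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, sensitivity, specificity, and F1-score from a
    confusion matrix (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as class, but wrong
    fn = cm.sum(axis=1) - tp          # missed members of the class
    tn = cm.sum() - tp - fp - fn      # everything else
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)      # a.k.a. recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, specificity, f1

# Toy 3-class (mild/moderate/severe) confusion matrix, values invented.
cm = [[18, 1, 0],
      [2, 16, 2],
      [0, 1, 23]]
prec, sens, spec, f1 = per_class_metrics(cm)
```

For the first class, for example, precision is 18 / (18 + 2) = 0.90 and specificity is 42 / (42 + 2), since 42 of the 44 non-mild samples are correctly kept out of the mild prediction.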
Shortlisted features based on the six feature ranking techniques.
[Table body lost in extraction: the feature names and per-technique selections (Pearson, Chi-Square, RFE, Logistics, Random Forest, LightGBM) are unrecoverable. The surviving row totals indicate 19 shortlisted features: 8 selected by all six ranking techniques, 3 by five techniques, and 8 by four techniques.]
Class-wise performance metrics for the top 3 machine learning classifiers and the stacked classifier.
| Classifier | Class | Accuracy | Precision | Sensitivity | F1-Score | Specificity | Inference Time (ms) |
|---|---|---|---|---|---|---|---|
| Gradient Boost | Mild | 92.01 ± 4.98 | 91.38 ± 5.15 | 92.98 ± 4.69 | 92.17 ± 4.93 | 91.71 ± 5.06 | 0.379 |
| Moderate | 92.01 ± 4.73 | 84.50 ± 6.32 | 86.51 ± 5.97 | 85.49 ± 6.15 | 93.92 ± 4.17 | ||
| Severe | 92.01 ± 3.37 | 96.30 ± 2.35 | 94.35 ± 2.87 | 95.32 ± 2.63 | 89.58 ± 3.80 | ||
| Overall | 92.01 ± 3.40 | 92.10 ± 3.38 | 92.01 ± 3.40 | 92.04 ± 3.40 | 91.20 ± 3.55 | ||
| XGBoost | Mild | 93.24 ± 4.61 | 90.08 ± 5.49 | 95.61 ± 3.76 | 92.77 ± 4.76 | 92.51 ± 4.83 | 0.336 |
| Moderate | 93.24 ± 4.38 | 89.26 ± 5.41 | 85.71 ± 6.11 | 87.45 ± 5.78 | 95.86 ± 3.48 | ||
| Severe | 93.24 ± 3.13 | 96.75 ± 2.21 | 95.97 ± 2.45 | 96.36 ± 2.33 | 90.42 ± 3.66 | ||
| Overall | 93.24 ± 3.15 | 93.26 ± 3.15 | 93.24 ± 3.15 | 93.22 ± 3.15 | 92.31 ± 3.34 | ||
| Random Forest | Mild | 91.80 ± 5.04 | 89.19 ± 5.7 | 86.84 ± 6.21 | 88.00 ± 5.97 | 93.32 ± 4.58 | 0.327 |
| Moderate | 91.80 ± 4.79 | 90.43 ± 5.14 | 82.54 ± 6.63 | 86.31 ± 6.00 | 95.03 ± 3.80 | ||
| Severe | 91.80 ± 3.41 | 93.51 ± 3.07 | 98.79 ± 1.36 | 96.08 ± 2.42 | 84.58 ± 4.49 | ||
| Overall | 91.80 ± 3.44 | 91.71 ± 3.46 | 91.80 ± 3.44 | 91.67 ± 3.47 | 89.32 ± 3.88 | ||
| Stacking (Gradient Boost + XGBoost + Random Forest) | Mild | 94.47 ± 4.20 | 91.53 ± 5.11 | 94.74 ± 4.10 | 93.10 ± 4.65 | 94.39 ± 4.23 | 0.379 + 0.336 + 0.327 = 1.042 |
| Moderate | 94.47 ± 3.99 | 92.44 ± 4.62 | 87.30 ± 5.81 | 89.80 ± 5.29 | 96.96 ± 3.00 | ||
| Severe | 94.47 ± 2.85 | 96.81 ± 2.19 | 97.98 ± 1.75 | 97.39 ± 1.98 | 90.83 ± 3.59 | ||
| Overall | 94.47 ± 2.87 | 94.45 ± 2.87 | 94.47 ± 2.87 | 94.43 ± 2.88 | 93.25 ± 3.15 |
Five-fold test performance of the 2D CNN severity classifier.
| Enhancement | Network | Class | Accuracy | Precision | Sensitivity | F1-Score | Specificity | Inference Time (ms) |
|---|---|---|---|---|---|---|---|---|
| Original | VGG 19 | Mild | 98.77 ± 2.86 | 98.21 ± 3.44 | 96.49 ± 4.78 | 97.34 ± 4.18 | 99.47 ± 1.88 | 7.271 |
| Moderate | 94.67 ± 5.55 | 86.76 ± 8.37 | 93.65 ± 6.02 | 90.07 ± 7.39 | 95.03 ± 5.37 | |||
| Severe | 95.90 ± 3.49 | 97.50 ± 2.75 | 94.35 ± 4.06 | 95.90 ± 3.49 | 97.5 ± 2.75 | |||
| Overall | 94.76 ± 2.82 | 94.89 ± 2.76 | 94.67 ± 2.82 | 94.73 ± 2.80 | 97.32 ± 2.03 | |||
| AHE | VGG 19 | Mild | 98.77 ± 2.86 | 96.55 ± 4.74 | 98.25 ± 3.4 | 97.39 ± 4.14 | 98.93 ± 2.67 | 8.161 |
| Moderate | 95.08 ± 5.34 | 90.48 ± 7.25 | 90.48 ± 7.25 | 90.48 ± 7.25 | 96.69 ± 4.42 | |||
| Severe | 96.31 ± 3.32 | 96.75 ± 3.12 | 95.97 ± 3.46 | 96.36 ± 3.3 | 96.67 ± 3.16 | |||
| Overall | 95.08 ± 2.71 | 95.08 ± 2.71 | 95.09 ± 2.71 | 95.08 ± 2.71 | 97.2 ± 2.07 | |||
| Gamma Correction | VGG 19 | Mild | 88.11 ± 8.40 | 90.91 ± 7.46 | 87.72 ± 8.52 | 89.29 ± 8.03 | 88.24 ± 8.36 | 9.651 |
| Moderate | 88.11 ± 7.99 | 75.71 ± 10.59 | 84.13 ± 9.02 | 79.70 ± 9.93 | 89.50 ± 7.57 | |||
| Severe | 88.11 ± 5.70 | 94.12 ± 4.14 | 90.32 ± 5.20 | 92.18 ± 4.73 | 85.83 ± 6.14 | |||
| Overall | 88.11 ± 4.06 | 88.62 ± 3.99 | 88.11 ± 4.06 | 88.28 ± 4.04 | 87.34 ± 4.17 |
Figure 10. AUC for the original and best-performing AHE thermograms in severity classification.
Figure 11. Comparison of F1-scores between original and AHE-enhanced thermogram images using the 2D severity classifier.