Iftiaz A Alfi, Md Mahfuzur Rahman, Mohammad Shorfuzzaman, Amril Nazir.
Abstract
A skin lesion is an area of skin that shows abnormal growth compared with the surrounding skin. The ISIC 2018 lesion dataset has seven classes; a smaller version with only two classes, malignant and benign, is also available. Malignant tumors are cancerous and benign tumors are non-cancerous. Malignant tumors can multiply and spread throughout the body at a much faster rate, so early detection of a cancerous skin lesion is crucial for the patient's survival. Deep learning and machine learning models play an essential role in detecting skin lesions, but image occlusions and imbalanced datasets have so far limited their accuracy. In this paper, we introduce an interpretable method for the non-invasive diagnosis of melanoma skin cancer using deep learning and ensemble stacking of machine learning models. The dataset used to train the classifiers contains a balanced set of images of benign and malignant skin moles. Hand-crafted features are used to train the machine learning base models (logistic regression, SVM, random forest, KNN, and gradient boosting machine). The predictions of these base models, obtained by cross-validation on the training set, are used to train a level-one stacking model. Deep learning models (MobileNet, Xception, ResNet50, ResNet50V2, and DenseNet121) pre-trained on ImageNet are used for transfer learning, and each is evaluated as a classifier. The deep learning models are then ensembled in different combinations and assessed. Furthermore, Shapley additive explanations (SHAP) are used to build an interpretability approach that generates heatmaps identifying the parts of an image that are most suggestive of the illness. This allows dermatologists to understand the results of our model in a way that makes sense to them. For evaluation, we calculated the accuracy, F1-score, Cohen's kappa, confusion matrix, and ROC curves and identified the best model for classifying skin lesions.
Keywords: deep learning; diagnosis; interpretability; machine learning; melanoma; skin cancer; stacking model
Year: 2022 PMID: 35328279 PMCID: PMC8947367 DOI: 10.3390/diagnostics12030726
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Example of malignant and benign skin lesions.
Figure 2. The overall design of the proposed system for melanoma skin cancer detection.
Approaches for image augmentation in training dataset.
| Method | Amount |
|---|---|
| rotation_range | 90 |
| shear_range | 0.1 |
| zoom_range | 0.1 |
| horizontal_flip | True |
| vertical_flip | True |
| shuffle | True |
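The settings in this table can be collected into a plain configuration. The sketch below assumes the names refer to the keyword arguments of Keras' `ImageDataGenerator` (which accepts all five transform names) and that `shuffle` belongs to the batch iterator; the source does not name the library, and the constant names `AUGMENTATION_KWARGS` and `ITERATOR_KWARGS` are hypothetical.

```python
# Values copied from the augmentation table.
# Assumption: these are Keras ImageDataGenerator keyword arguments;
# the source does not state which library was used.
AUGMENTATION_KWARGS = {
    "rotation_range": 90,     # random rotation range, in degrees
    "shear_range": 0.1,       # shear intensity
    "zoom_range": 0.1,        # random zoom in [0.9, 1.1]
    "horizontal_flip": True,  # random left-right flips
    "vertical_flip": True,    # random up-down flips
}

# Assumption: `shuffle` is an option of the batch iterator
# (for example flow_from_directory), not of the generator itself.
ITERATOR_KWARGS = {"shuffle": True}

# Possible usage if TensorFlow/Keras is installed (path is hypothetical):
# from tensorflow.keras.preprocessing.image import ImageDataGenerator
# datagen = ImageDataGenerator(**AUGMENTATION_KWARGS)
# train_iter = datagen.flow_from_directory(
#     "path/to/train", class_mode="binary", **ITERATOR_KWARGS)

for name, value in AUGMENTATION_KWARGS.items():
    print(f"{name} = {value}")
```

Augmentation of this kind would normally be applied to the training split only, as the table caption indicates.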
Figure 3. Stacking ensemble of machine learning models with “hand-crafted” image features.
Figure 4. Deep learning models for melanoma skin cancer detection: (a) schematic diagram of modified pre-trained CNN models, (b) ensemble models to produce an optimal predictive model.
Summary of model parameters and configurations.
| Item Description | Value |
|---|---|
| Epoch Count | 100 |
| Batch Size | 16 |
| Type of Optimizer | SGD with a Nesterov momentum of 0.9 |
| Learning Rate | Initial rate of 0.001 with a decay of 1 × |
| Loss Function | Binary Cross Entropy |
| Image Size | Both 299 × 299 and 224 × 224 |
| Pooling Technique | Global Average Pooling (GAP) |
| Activation Function | Softmax (Classification Head) |
| Pre-trained Weight | ImageNet |
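To make the optimizer row concrete, here is a minimal sketch of one common formulation of SGD with Nesterov momentum (the update rule documented for Keras' SGD optimizer), using the learning rate of 0.001 and momentum of 0.9 from the table. Whether the authors used this exact formulation is an assumption. The function name `nesterov_sgd_step` and the toy objective are hypothetical, and no learning-rate decay is applied because the decay value is truncated in the table.

```python
def nesterov_sgd_step(w, velocity, grad, lr=0.001, momentum=0.9):
    """One SGD update with Nesterov momentum for a scalar weight.

    Assumed formulation (Keras-style):
        velocity <- momentum * velocity - lr * grad
        w        <- w + momentum * velocity - lr * grad
    """
    velocity = momentum * velocity - lr * grad
    w = w + momentum * velocity - lr * grad
    return w, velocity


# Toy illustration only: minimise f(w) = w**2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for _ in range(100):
    grad = 2.0 * w
    w, v = nesterov_sgd_step(w, v, grad)

print(f"weight after 100 steps: {w:.4f}")
```

On this toy problem the weight shrinks steadily toward the minimum at zero; in the paper's setting the same rule would be applied to every network parameter with gradients of the binary cross-entropy loss.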
Performance report of method 1.
| Techniques | Accuracy | Precision | Recall | F1-Score | Kappa |
|---|---|---|---|---|---|
| Logistic Regression | 0.84 | 0.83 | 0.87 | 0.84 | 0.69 |
| Random Forest | 0.84 | 0.75 | 0.93 | 0.84 | 0.68 |
| SVM | 0.85 | 0.81 | 0.89 | 0.85 | 0.69 |
| GBM | 0.87 | 0.83 | 0.91 | 0.87 | 0.74 |
| KNN | 0.82 | 0.88 | 0.80 | 0.83 | 0.64 |
| Stacking | 0.88 | 0.84 | 0.92 | 0.88 | 0.76 |
Confusion matrix of method 1.
| Techniques | TP | FP | FN | TN |
|---|---|---|---|---|
| Logistic Regression | 299 | 61 | 42 | 258 |
| Random Forest | 270 | 90 | 18 | 282 |
| SVM | 292 | 68 | 33 | 267 |
| GBM | 302 | 58 | 28 | 272 |
| KNN | 319 | 41 | 76 | 224 |
| Stacking | 305 | 55 | 26 | 274 |
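The summary scores in the performance report of method 1 can be approximately re-derived from these counts. The sketch below takes the TP/FP/FN/TN column labels at face value and uses the textbook definitions of each metric; the function name `binary_metrics` is hypothetical and not from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1 and Cohen's kappa
    from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the row and column totals.
    p_observed = accuracy
    p_expected = (
        (tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)
    ) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Counts from the "Stacking" row of the confusion matrix of method 1.
stacking = binary_metrics(tp=305, fp=55, fn=26, tn=274)
for name, value in stacking.items():
    print(f"{name}: {value:.3f}")
```

For the Stacking row this gives roughly 0.877 accuracy, 0.847 precision, 0.921 recall, 0.883 F1, and 0.754 kappa, each within about 0.01 of the values in the performance report.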
Performance report of method 2.
| Techniques | Accuracy | Precision | Recall | F1-Score | Kappa | AUC |
|---|---|---|---|---|---|---|
| MobileNet | 0.90 | 0.89 | 0.92 | 0.90 | 0.80 | 0.96 |
| Xception | 0.88 | 0.91 | 0.87 | 0.88 | 0.77 | 0.95 |
| ResNet50 | 0.91 | 0.91 | 0.91 | 0.91 | 0.82 | 0.96 |
| ResNet50V2 | 0.90 | 0.88 | 0.93 | 0.90 | 0.80 | 0.95 |
| DenseNet121 | 0.91 | 0.90 | 0.92 | 0.91 | 0.82 | 0.97 |
| Ensembling (5 models) | 0.91 | 0.91 | 0.92 | 0.91 | 0.82 | 0.97 |
| Ensembling (4 best models) | 0.91 | 0.90 | 0.92 | 0.91 | 0.82 | 0.97 |
| Ensembling (3 best models) | 0.92 | 0.91 | 0.92 | 0.92 | 0.83 | 0.97 |
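The source does not say how the CNN outputs are combined in the ensembling rows. One common choice is soft voting, where the predicted malignant probabilities of the member models are averaged and the mean is thresholded. The sketch below illustrates that assumption only; the function name `soft_vote`, the 0.5 threshold, and all probabilities are hypothetical.

```python
from statistics import mean


def soft_vote(prob_per_model, threshold=0.5):
    """Average per-model positive-class probabilities and threshold the mean.

    prob_per_model: list of lists, one inner list of probabilities per model,
    all scored on the same samples in the same order.
    """
    n_samples = len(prob_per_model[0])
    if any(len(probs) != n_samples for probs in prob_per_model):
        raise ValueError("every model must score the same samples")

    averaged = [
        mean(probs[i] for probs in prob_per_model) for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Hypothetical probabilities from three models on four images.
model_a = [0.9, 0.4, 0.2, 0.6]
model_b = [0.8, 0.6, 0.1, 0.4]
model_c = [0.7, 0.7, 0.3, 0.3]

averaged, labels = soft_vote([model_a, model_b, model_c])
print(labels)
```

Averaging lets a confident model outweigh an uncertain one, which is one reason a small ensemble of the strongest models can match or slightly exceed its best member.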
Confusion matrix of method 2.
| Techniques | TP | FP | FN | TN |
|---|---|---|---|---|
| MobileNet | 321 | 39 | 27 | 273 |
| Xception | 329 | 31 | 45 | 255 |
| ResNet50 | 331 | 29 | 30 | 270 |
| ResNet50V2 | 317 | 43 | 23 | 277 |
| DenseNet121 | 327 | 33 | 27 | 273 |
| Ensembling (5 models) | 329 | 31 | 27 | 273 |
| Ensembling (4 best models) | 327 | 33 | 26 | 274 |
| Ensembling (3 best models) | 331 | 29 | 26 | 274 |
Figure 5. Deep learning models: accuracy, loss, and AUC-ROC curve.
Figure 6. AUC-ROC curve of ensembled models.
Paired t-test results (t-test stat, p-value) from the performance comparison of the best ensembling model and other base CNN models.
| Techniques | Accuracy | Precision | Recall | AUC | Kappa |
|---|---|---|---|---|---|
| MobileNet | ( | ( | (−1.507, 0.182) | ( | ( |
| Xception | ( | (−1.280, 0.229) | ( | ( | ( |
| ResNet50 | ( | (−1.280, 0.229) | ( | ( | ( |
| DenseNet121 | ( | ( | (−1.441, 0.199) | ( | ( |
A comparative summary of the existing techniques for skin lesion classification.
| Research Team | Technique | Dataset | Results |
|---|---|---|---|
| Devansh et al. [ | De-coupled DCGANs | ISIC-2017 dataset | ROC-AUC 0.880, Accuracy 81.6% |
| Vijayalakshmi [ | Convolutional neural network (CNN) and support vector machine (SVM) | ISIC 2018 dataset | Accuracy 85.0% |
| Daghrir et al. [ | CNN and two machine learning classifiers (KNN and SVM) | ISIC 2018 dataset | Accuracy 88.4% |
| Nasr et al. [ | Custom CNN model | MED-NODE dataset consisting of 170 images (70 melanoma and 100 nevi cases) | Accuracy 81.0%, Precision 75.0%, Sensitivity 81.0% |
| Warsi et al. [ | Segmentation and classification approach based on D-optimality orthogonal matching pursuit (DOOMP), fixed wavelet grid network (FWGN) | PH2 dataset | Accuracy 91.8%, Specificity 92.5% |
| Abbes et al. [ | Fuzzy c-means (FCM), deep neural network | A public dataset consisting of 206 lesion images | Accuracy 87.5%, Sensitivity 90.1%, Specificity 84.4% |
| Bi et al. [ | Multi-scale lesion-biased representation (MLR) and joint reverse classification (JRC) | PH2 dataset | Accuracy 92.0%, Sensitivity 87.5%, Specificity 93.1% |
| Yuan and Lo [ | Deep fully convolutional deconvolutional neural networks (CDNNs) to build binary masks for skin lesion segmentation | ISBI 2017 skin lesion dataset | Accuracy 93.4%, Jaccard Index (JA) of 0.765, Sensitivity 82.5% |
| Abuzaghleh et al. [ | Skin lesion segmentation and analysis based on color and shape geometry, SVM classifier | PH2 dataset | One-level accuracy 91.0%, Two-level accuracy 93.2% |
| DeVries and Ramachandram [ | Inception-V3 model | ISIC 2017 dataset | Accuracy 90.3%, AUC 0.943 |
| Our approach | Interpretable deep learning and ensemble stacking of ML models | A small dataset of ISIC 2018 | Accuracy 92.0%, Prec. 91.0%, Recall 92.0%, Kappa 0.83, AUC 0.97 |
Figure 7. Interpretation of prediction results for both positive and negative melanoma predictions using the best-performing single model, DenseNet121.