Mohammad Fraiwan, Esraa Faouri.
Abstract
Skin cancer (melanoma and non-melanoma) is one of the most common cancer types and leads to hundreds of thousands of deaths worldwide every year. It manifests itself through the abnormal growth of skin cells. Early diagnosis drastically increases the chances of recovery. Moreover, it may render surgical, radiographic, or chemical therapies unnecessary or lessen their overall usage, thereby reducing healthcare costs. The process of diagnosing skin cancer starts with dermoscopy, which inspects the general shape, size, and color characteristics of skin lesions; suspected lesions then undergo further sampling and lab tests for confirmation. Image-based diagnosis has seen great advances recently owing to the rise of deep learning. The work in this paper examines the applicability of raw deep transfer learning in classifying images of skin lesions into seven possible categories. Using the HAM10000 dataset of dermoscopy images, a system that accepts these images as input without explicit feature extraction or preprocessing was developed with 13 deep transfer learning models. Extensive evaluation revealed the advantages and shortcomings of such a method. Although some cancer types were correctly classified with high accuracy, the imbalance of the dataset, the small number of images in some categories, and the large number of classes reduced the best overall accuracy to 82.9%.
Keywords: deep learning; image classification; melanoma; skin cancer; skin lesions
Year: 2022 PMID: 35808463 PMCID: PMC9269808 DOI: 10.3390/s22134963
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. A graphical abstract of the general steps used in this paper.
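The abstract describes feeding raw dermoscopy images to pretrained CNNs, with only the classification head adapted to the seven lesion categories. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch with a ResNet-18 backbone; the framework, backbone, folder layout, image size, and hyperparameters are assumptions for illustration, not the authors' reported configuration.

```python
# Minimal transfer-learning sketch: swap the final layer of a pretrained CNN
# and fine-tune it on seven skin-lesion classes (illustrative only).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 7  # HAM10000 has seven lesion categories

# Standard ImageNet preprocessing; no lesion-specific feature extraction.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder layout: one subfolder per class under data/train.
train_set = datasets.ImageFolder("data/train", transform=preprocess)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Load an ImageNet-pretrained backbone and replace the classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(20):                  # epoch count is an assumption
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```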
The mean overall accuracy, F-score, precision, recall, and specificity for each deep learning model and a 70/30 data split.
| Model | F1 Score | Precision | Recall | Specificity | Accuracy |
|---|---|---|---|---|---|
| SqueezeNet | 51.2% | 63.1% | 49.6% | 93.8% | 71.7% |
| GoogLeNet | 55.4% | 63.2% | 53.4% | 94.3% | 74.0% |
| Inceptionv3 | 61.5% | 65.5% | 60.7% | 94.5% | 74.2% |
| DenseNet201 | 64.8% | 70.9% | 62.7% | 94.7% | 75.8% |
| MobileNetv2 | 61.0% | 67.2% | 58.4% | 94.1% | 75.6% |
| Resnet101 | 64.3% | 67.6% | 63.8% | 95.0% | 76.7% |
| Resnet50 | 63.4% | 68.5% | 62.4% | 94.7% | 74.4% |
| Resnet18 | 59.3% | 64.7% | 57.8% | 94.6% | 75.3% |
| Xception | 60.9% | 66.5% | 59.2% | 94.7% | 75.4% |
| Inception-ResNet-v2 | 61.4% | 65.3% | 60.8% | 94.4% | 75.5% |
| ShuffleNet | 60.6% | 64.9% | 58.7% | 93.5% | 74.6% |
| DarkNet-53 | 61.9% | 66.8% | 61.9% | 94.5% | 71.6% |
| EfficientNetb0 | 57.6% | 70.3% | 53.7% | 94.1% | 73.8% |
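The F1 score, precision, recall, and specificity in these tables are class-averaged values for a seven-class problem; each can be derived from a confusion matrix (such as the one in Figure 2) by treating every class as a one-vs-rest problem and averaging over the classes. A minimal sketch of that computation, assuming macro averaging (the paper's exact averaging convention is not shown in this record):

```python
import numpy as np

def macro_metrics(cm: np.ndarray) -> dict:
    """Macro-averaged precision, recall, specificity, and F1 from a
    K x K confusion matrix (rows = true class, columns = predicted class)."""
    K = cm.shape[0]
    total = cm.sum()
    prec, rec, spec, f1 = [], [], [], []
    for k in range(K):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp        # class-k samples predicted as something else
        fp = cm[:, k].sum() - tp        # other samples predicted as class k
        tn = total - tp - fn - fp
        p = tp / (tp + fp) if (tp + fp) else 0.0
        r = tp / (tp + fn) if (tp + fn) else 0.0
        s = tn / (tn + fp) if (tn + fp) else 0.0
        prec.append(p); rec.append(r); spec.append(s)
        f1.append(2 * p * r / (p + r) if (p + r) else 0.0)
    return dict(precision=np.mean(prec), recall=np.mean(rec),
                specificity=np.mean(spec), f1=np.mean(f1),
                accuracy=np.trace(cm) / total)
```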
Figure 2. Sample confusion matrix for the Resnet101 model and 70/30 data split.
Figure 3. Sample training/validation progress curve for Resnet101 and 70/30 data split.
The mean overall accuracy, F-score, precision, recall, and specificity for each deep learning model and an 80/20 data split.
| Model | F1 Score | Precision | Recall | Specificity | Accuracy |
|---|---|---|---|---|---|
| SqueezeNet | 52.6% | 64.0% | 50.8% | 93.4% | 68.0% |
| GoogLeNet | 56.2% | 70.0% | 53.8% | 93.4% | 68.5% |
| Inceptionv3 | 61.1% | 64.2% | 62.6% | 94.0% | 68.8% |
| DenseNet201 | 66.1% | 74.7% | 63.3% | 94.3% | 73.5% |
| MobileNetv2 | 61.5% | 65.9% | 60.2% | 93.9% | 73.0% |
| Resnet101 | 62.3% | 69.0% | 62.2% | 94.2% | 70.2% |
| Resnet50 | 63.2% | 71.7% | 61.8% | 93.9% | 67.7% |
| Resnet18 | 62.2% | 64.7% | 63.2% | 93.8% | 69.6% |
| Xception | 56.1% | 61.3% | 55.9% | 94.0% | 70.2% |
| Inception-ResNet-v2 | 58.5% | 63.9% | 59.7% | 93.8% | 67.4% |
| ShuffleNet | 61.2% | 70.2% | 57.8% | 93.3% | 70.0% |
| DarkNet-53 | 61.4% | 70.7% | 58.5% | 93.6% | 70.2% |
| EfficientNetb0 | 56.0% | 69.8% | 52.6% | 93.6% | 72.2% |
Figure 4. Sample confusion matrix for the DenseNet201 model and 80/20 data split.
Figure 5. Sample training/validation progress curve for DenseNet201 and 80/20 data split.
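Because HAM10000 is heavily imbalanced, how the 70/30, 80/20, and 90/10 partitions are drawn affects the reported metrics. The sketch below shows one plausible way to produce such splits: a stratified split with scikit-learn that preserves per-class proportions. Whether the authors stratified their splits, and the class codes used, are assumptions here, not details taken from this record.

```python
from sklearn.model_selection import train_test_split

# labels: one of the seven HAM10000 categories per image, e.g.
# 'akiec', 'bcc', 'bkl', 'df', 'mel', 'nv', 'vasc' (class codes assumed).
def stratified_split(image_paths, labels, test_fraction):
    """Split file paths into train/test sets while preserving class proportions."""
    return train_test_split(
        image_paths, labels,
        test_size=test_fraction,   # 0.30, 0.20, or 0.10 for the three splits
        stratify=labels,           # keep rare classes represented in both sets
        random_state=0,
    )

# Example usage for the 90/10 split:
# train_x, test_x, train_y, test_y = stratified_split(paths, labels, 0.10)
```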
The mean overall accuracy, F-score, precision, recall, and specificity for each deep learning model and a 90/10 data split.
| Model | F1 Score | Precision | Recall | Specificity | Accuracy |
|---|---|---|---|---|---|
| SqueezeNet | 52.7% | 67.1% | 48.0% | 92.7% | 75.0% |
| GoogLeNet | 54.5% | 64.2% | 53.1% | 94.5% | 73.4% |
| Inceptionv3 | 67.9% | 69.9% | 70.1% | 95.3% | 79.3% |
| DenseNet201 | 74.4% | 78.5% | 73.6% | 96.0% | 82.9% |
| MobileNetv2 | 63.5% | 68.8% | 63.4% | 94.8% | 74.9% |
| Resnet101 | 71.7% | 71.1% | 74.5% | 96.3% | 81.2% |
| Resnet50 | 67.8% | 72.6% | 68.3% | 95.5% | 77.8% |
| Resnet18 | 67.9% | 72.3% | 68.3% | 95.1% | 79.0% |
| Xception | 59.5% | 65.0% | 58.5% | 94.4% | 72.1% |
| Inception-ResNet-v2 | 64.4% | 66.6% | 66.8% | 94.8% | 73.9% |
| ShuffleNet | 65.8% | 74.0% | 61.8% | 94.3% | 79.0% |
| DarkNet-53 | 66.3% | 70.0% | 66.1% | 95.1% | 80.8% |
| EfficientNetb0 | 61.3% | 73.4% | 57.0% | 94.7% | 76.7% |
Figure 6. Sample confusion matrix for the DarkNet-53 model and 90/10 data split.
Figure 7. Sample training/validation progress curve for DarkNet-53 and 90/10 data split.
The mean training and validation times for all algorithms and data split strategies. All times are in seconds.
| Model | 70/30 split | 80/20 split | 90/10 split |
|---|---|---|---|
| SqueezeNet | 377.0 | 400.4 | 422.6 |
| GoogLeNet | 726.8 | 795.0 | 855.0 |
| Inceptionv3 | 2182.9 | 2419.9 | 2655.2 |
| DenseNet201 | 7190.8 | 7884.7 | 8686.6 |
| MobileNetv2 | 3266.3 | 3678.5 | 4028.5 |
| Resnet101 | 2196.5 | 2449.5 | 2682.7 |
| Resnet50 | 992.2 | 1100.0 | 1192.9 |
| Resnet18 | 413.6 | 439.9 | 470.0 |
| Xception | 9076.2 | 10111.1 | 11094.8 |
| Inception-ResNet-v2 | 6698.0 | 7495.4 | 8254.3 |
| ShuffleNet | 2386.9 | 2641.0 | 2916.0 |
| DarkNet-53 | 1761.0 | 1974.6 | 2126.3 |
| EfficientNetb0 | 5432.4 | 6028.4 | 6737.5 |
A summary of the latest literature in automatic skin lesion classification.
| Study | Objective | Dataset | Approach | Performance |
|---|---|---|---|---|
| Li et al. (2020) | Two-class classification: melanoma and seborrheic keratosis | 600 images | Mid-level features and segmentation according to ROI | Area under the receiver-operating characteristic curve: ResNet (89.00%), DenseNet (88.85%), fusion (90.67%) |
| Pezhman Pour and Seker | Lesion segmentation | 3879 images | Dermoscopic feature segmentation using CNN | 2% and 7% improvement in Jaccard index and sensitivity, respectively |
| Al-masni et al. | Three-class classification: melanoma, benign, and seborrheic keratosis | 2950 images | Segmentation using FrCN | Segmentation accuracy of 95.62% (clinical benign cases), 90.78% (melanoma), and 91.29% (seborrheic keratosis) |
| Dash et al. | Three-class classification: moderate, severe, and very severe | 6267 images | Segmentation using a modified U-Net architecture | 93.03% Dice coefficient, 94.8% accuracy, 89.6% sensitivity, and 97.60% specificity |
| Xie et al. | Segmentation into two semantic classes: lesion and background | 1479 images | Segmentation of dermoscopy images preserving edge details | Jaccard indices of 0.783, 0.858, and 0.857 |
| Serte et al. | Two-class classification: melanoma and seborrheic keratosis | 2000 images | Gabor wavelet-based deep learning model for melanoma and seborrheic keratosis | Average area under the receiver-operating characteristic curve: 91% |
| Li et al. | Optimal hair removal (reduce over/under removal) | 1751 dermoscopic images with hair occlusion | Digital hair removal from skin lesion images using a CNN | Accuracy (99.08%), specificity (99.85%), F1 score (94.43%), precision (99.09%), sensitivity (95.74%) |
| This work | Seven-class classification | 10015 dermoscopic images | Deep transfer learning of a CNN | Accuracy (82.9%) |
Figure 8. Sample confusion matrix for the Resnet101 model, 70/30 data split, and 50 epochs of training.
Figure 9. Sample confusion matrix for the DenseNet201 model, 80/20 data split, and 50 epochs of training.
Figure 10. Sample confusion matrix for the DarkNet-53 model, 90/10 data split, and 50 epochs of training.