| Literature DB >> 35912265 |
Yinhao Wu1, Bin Chen2, An Zeng3, Dan Pan4, Ruixuan Wang5, Shen Zhao1.
Abstract
Skin cancer is one of the most dangerous diseases in the world. Correctly classifying skin lesions at an early stage could aid clinical decision-making by providing an accurate disease diagnosis, potentially increasing the chances of cure before the cancer spreads. However, achieving automatic skin cancer classification is difficult because the majority of skin disease images used for training are imbalanced and in short supply; meanwhile, the model's cross-domain adaptability and robustness are also critical challenges. Recently, many deep learning-based methods have been widely used in skin cancer classification to solve these issues and have achieved satisfactory results. Nonetheless, reviews that cover these frontier problems in skin cancer classification are still scarce. Therefore, in this article, we provide a comprehensive overview of the latest deep learning-based algorithms for skin cancer classification. We begin with an overview of three types of dermatological images, followed by a list of publicly available datasets relating to skin cancers. After that, we review the successful applications of typical convolutional neural networks for skin cancer classification. As a highlight of this paper, we then summarize several frontier problems, including data imbalance, data limitation, domain adaptation, model robustness, and model efficiency, followed by corresponding solutions in the skin cancer classification task. Finally, by summarizing the different deep learning-based methods that address these frontier challenges, we conclude that the general development direction of these approaches is structured, lightweight, and multimodal. In addition, for readers' convenience, we have summarized our findings in figures and tables. Considering the growing popularity of deep learning, there are still many issues to overcome as well as opportunities to pursue in the future.
Keywords: convolutional neural network; deep learning; generative adversarial networks; image classification; skin cancer
Year: 2022 PMID: 35912265 PMCID: PMC9327733 DOI: 10.3389/fonc.2022.893972
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
A summary of current reviews related to skin cancer classification.
| Ref. | Title | Venue | Remarks |
|---|---|---|---|
| ( | Skin Cancer Classification Using Convolutional Neural Networks: Systematic Review | Journal of Medical Internet Research | This study presents a detailed overview of studies on using CNNs to classify skin lesions. |
| ( | Techniques and algorithms for computer aided diagnosis of pigmented skin lesions—A review | Biomedical Signal Processing and Control | This paper gives a review of the recent developments in skin lesion classification using dermoscopic images. |
| ( | Classification of Skin cancer using deep learning, Convolutional Neural Networks -Opportunities and vulnerabilities-A systematic Review | International Journal for Modern Trends in Science and Technology | This article reviews the development of deep learning for skin cancer classification tasks. |
| ( | Machine Learning in Dermatology: Current Applications, Opportunities, and Limitations | Dermatology and Therapy | This paper reviews the fundamentals of machine learning and its wide range of applications in dermatology. |
| ( | Artificial intelligence-based image classification methods for diagnosis of skin cancer: Challenges and opportunities | Computers in Biology and Medicine | This review discusses the developments in AI-based methods for skin cancer diagnosis, as well as challenges and future directions to enhance them. |
| ( | Skin cancer classification | European Journal of Cancer | This paper analyses studies comparing AI-based skin cancer classifiers with dermatologists. |
| ( | Skin Cancer Detection: A Review Using Deep Learning Techniques | International Journal of Environmental Research and Public Health | This paper provides a review of deep learning-based methods for early diagnosis of skin cancer. |
| ( | Integrating Patient Data Into Skin Cancer Classification Using Convolutional Neural Networks: Systematic Review | Journal of Medical Internet Research | This review summarizes the latest CNN-based methods in skin lesion classification by utilizing image data and patient data. |
| ( | Skin disease diagnosis with deep learning: A review | Neurocomputing | This paper analyses several deep learning algorithms for diagnosing skin diseases from a variety of perspectives based on the challenges at hand. |
Figure 1. Examples of three types of dermatological images of BCC to show their differences and relationships: (A) Clinical image. (B) Dermoscopy image. (C) Histopathological image.
Characteristics of different skin-disease datasets.
| Dataset | No. of images | Modality of images | No. of lesion types | Image format | Published year | Goal of publication |
|---|---|---|---|---|---|---|
| PH2 | 200 | Dermoscopic | 3 | .bmp | 2013 | To facilitate the development of computer-aided diagnosis systems in the segmentation and classification of melanoma. |
| MED-NODE | 170 | Macroscopic | 2 | .jpg | 2015 | To build and evaluate the MED-NODE system for detecting skin cancer from macroscopic (clinical) images. |
| HAM10000 | 10,015 | Dermoscopic | 8 | .jpg | 2018 | To address the small size and insufficient diversity of images in existing skin-disease datasets. |
| Derm7pt | 2,000 | Dermoscopic | 15 | .jpg | 2018 | As a database for the analysis of the seven-point malignancy checklist for skin lesions. |
| BCN20000 | 19,424 | Dermoscopic | 9 | .jpg | 2019 | Used to analyze skin cancer lesions in hard-to-diagnose locations such as nails and mucous membranes. |
| ISIC Archive | >13,000 | Dermoscopic | 9 | .jpg, DICOM | 2016–2020 | To reduce skin cancer mortality while promoting the development and use of digital skin imaging. |
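Several of these datasets are heavily skewed across lesion classes (HAM10000, for instance, is dominated by melanocytic nevi), which motivates the imbalance-handling methods reviewed later. As a minimal illustrative sketch (not part of the original review), the imbalance ratio of a label list, such as the diagnosis column of a dataset's metadata file, can be computed as:

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Toy label list mimicking a skewed dermatology dataset
# (class names follow HAM10000's abbreviations; counts are made up).
labels = ["nv"] * 800 + ["mel"] * 120 + ["df"] * 10
print(imbalance_ratio(labels))  # 80.0
```

A ratio this large explains why plain cross-entropy training tends to ignore the rare classes.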
References of skin cancer classification with typical CNN frameworks.
| Ref. | Dataset | CNN Architecture | Highlights | Limitations | Performance |
|---|---|---|---|---|---|
| ( | Self-collected dataset | Deep Belief Network, SVM | Combined deep belief networks and SVM classifiers to handle skin cancer diagnosis tasks with limited datasets, as well as outliers and erroneous data. | The generalization ability of the model is limited. | Accuracy: 0.89 |
| ( | Self-collected dataset | ResNet-34, ResNet-50 | Investigated how to improve deep learning-based dermoscopy classification and dataset creation. | Data from more modalities, such as the patient’s medical history and information on other symptoms, are not considered. | Accuracy: 0.85 |
| ( | Online repositories and the Stanford University Medical Center | Inception-v3 | Used a CNN framework trained on a large-scale skin disease dataset to achieve results on par with dermatologists. The method was also developed for mobile devices. | More research is required to assess its performance in clinical practice. At the same time, this method is limited to some extent by the amount of data. | Accuracy: 0.6375 (avg.) |
| ( | MED-NODE | Deep CNN | Compared with previous methods, it directly used a CNN to automatically extract features from skin disease images and achieved higher classification accuracy. | Due to the large noise interference of clinical images, there are still some misclassifications. | Accuracy: 0.81 |
| ( | ISIC-2016 | VGG-16 | Reduced the training time of the model by using a transfer learning strategy while obtaining higher sensitivity and precision. | It is prone to overfitting due to the limited number of training images. | Accuracy: 0.813 |
| ( | ISIC-2017, IAD | Inception-v2 | Introduced sonification into the diagnosis of skin cancer lesions to improve the sensitivity of the model. | Differences among pathologists’ diagnoses can affect the prediction results of the model. | AUC: 0.976 |
| ( | ISIC-2017 | DenseNet, Dual Path Nets, Inception-v4, Inception-ResNet-v2, MobileNetV2, PNASNet, ResNet | By analyzing 13 factors from 9 different models, they systematically evaluated the factors influencing the choice of CNN structure. | The dataset used in this article is too limited, and it only focuses on the melanoma classification task. | Top accuracy: 0.827 |
| ( | IAD | VGG-19 | Adopted the VGG-19 network to evaluate the thickness of melanoma for the first time. | No additional pre-training methods were utilized for comparison, and precisely predicting melanoma thickness would be more clinically significant. | Accuracy: 0.872 |
| ( | Derm7pt | Inception-v3 | A multi-task network was designed for seven-point checklist classification and skin disease diagnosis. Different loss functions were also designed to handle different input modalities, such as clinical and dermoscopic images and patient diagnostic results. | Some criteria of the seven-point checklist cannot be distinguished. | Accuracy: 0.737 |
| ( | HAM10000 | Deep CNN models | Proposed a method combining CNN with one-versus-all (OVA) for skin disease classification. | The model has not been tested on datasets from various domains and may have a large variance. | Accuracy: 0.929 |
| ( | HAM10000 | ResNeXt, SeResNeXt, DenseNet | Adopted a grid search strategy to find the best ensemble learning methods for skin cancer classification. | The amount of training data is still insufficient, and most of the models employed in ensemble learning share the same network architecture. | Accuracy: 0.88 |
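Several entries in this table rely on transfer learning or reformulate multi-class diagnosis as one-versus-all (OVA) binary problems. The OVA idea can be sketched in pure NumPy, with a plain logistic-regression head standing in for the CNN feature extractor used by the cited works (an illustrative toy on synthetic clusters, not any paper's implementation):

```python
import numpy as np

def train_binary(X, y, lr=0.1, epochs=300):
    """Plain logistic regression via gradient descent (stand-in for a CNN head)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y                       # dL/dz for binary cross-entropy
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def ova_fit(X, y, n_classes):
    """One binary classifier per class: class k vs. the rest."""
    return [train_binary(X, (y == k).astype(float)) for k in range(n_classes)]

def ova_predict(models, X):
    """Pick the class whose binary scorer is most confident."""
    scores = np.stack([X @ w + b for w, b in models], axis=1)
    return scores.argmax(axis=1)

# Three well-separated 2-D clusters standing in for lesion-feature vectors.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)
models = ova_fit(X, y, 3)
acc = (ova_predict(models, X) == y).mean()
```

The same decomposition applies unchanged when the 2-D toy features are replaced by deep features from a pretrained backbone.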
Different methods for improving model efficiency.
| Ref. | Dataset | Highlights | Limitations | Performance |
|---|---|---|---|---|
| ( | Self-collected | Proposed a knowledge distillation method to transfer knowledge between various models simultaneously. | The proposed method sacrifices local accuracy for higher global accuracy, with some additional classification errors on local objects. | Accuracy: 0.75 |
| ( | Public repositories | Proposed a MobileNet-based classification method and successfully deployed it on an Android application. | To improve the model’s classification accuracy, more sophisticated sampling strategies and data preprocessing can be adopted. | Accuracy: 0.944 |
| ( | HAM10000 | Presented an assessment of the effectiveness of the attention module and the self-attention module in skin cancer classification based on the ResNet architecture. | Only a limited number of attention mechanisms are used for comparison. | Accuracy: 0.622 (attention)<br>Accuracy: 0.737 (self-attention) |
| ( | HAM10000, SH-11 | Proposed a weight pruning strategy for lightweight neural networks to make up for the accuracy loss and improve model performance and reliability in skin cancer classification. | The proposed pruning method is only validated on skin disease datasets, and more kinds of medical images are needed to validate its effectiveness. | Accuracy: 0.975<br>AUC: 0.931 |
| ( | HAM10000, PH2, Dermofit, Derm7pt, MSK, UDA | Designed a new pruning method, “MergePrune”, to reduce the computational cost of retraining the network by combining pruning and training into a single stage. | To assess this strategy, more domain data is needed, such as clinical images and patient metadata. | Accuracy: 0.776 (avg.) |
| ( | ISIC-2017 | Proposed a classification method that incorporated the attention residual learning (ARL) mechanism into EfficientNet for skin cancer diagnosis. | The interpretability of the model needs to be further strengthened. | Accuracy: 0.873<br>AUC: 0.867 |
| ( | ISIC-2017 | Three lightweight networks (MobileNet, MobileNetV2, and NASNetMobile) were evaluated for skin cancer classification. | The number of lightweight networks and hyperparameters used for testing is relatively restricted. | Accuracy: 0.82<br>Precision: 0.812 |
| ( | ISIC-2017, PH2 | Proposed an MT-TransUNet network to segment and classify skin lesions simultaneously. | The model struggles with low-contrast skin disease images, and its segmentation performance is vulnerable to occlusions in the skin image. | Accuracy: 0.912 |
| ( | PH2, DermQuest | Built a pruning framework to simplify complicated architectures by choosing the most informative color channels in skin lesion detection. It also carried out a hardware-level analysis of the complexity of different skin cancer classification networks. | The proposed method works well for simple networks, but it may not perform as well for more complicated networks. | Accuracy: 0.9811 (PH2)<br>Accuracy: 0.9892 (DermQuest) |
| ( | SD-198, SD-260 | Proposed a knowledge distillation method based on curriculum training to distinguish herpes zoster from other skin diseases. | It requires manual tuning of hyperparameters according to different models and datasets. | Accuracy: 0.935 |
| ( | DermIS, DermQuest | Proposed an expert system, “i-Rash”, based on SqueezeNet to classify four skin diseases. | More clinical data and skin-disease images are needed to further improve the generalization of the model. | Accuracy: 0.972<br>Sensitivity: 0.944<br>Specificity: 0.981 |
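Magnitude-based weight pruning underlies several of the efficiency methods above. A generic sketch of the basic idea of zeroing out the smallest weights (this is an illustration, not the cited "MergePrune" or color-channel-selection algorithms):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.
    Returns the pruned weights and the boolean mask of surviving entries."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64))          # stand-in for one CNN layer's weights
pruned, mask = magnitude_prune(w, 0.9)     # keep only the largest ~10% of weights
```

In practice the surviving weights are then fine-tuned (or, in single-stage schemes like MergePrune, pruning and training are interleaved) to recover the accuracy lost to sparsification.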
Different methods for solving data imbalance and data limitation.
| Ref. | Dataset | Highlights | Limitations | Performance |
|---|---|---|---|---|
| ( | ISIC-2017 | Coupled seven GANs to generate images of seven skin-disease classes, and improved the efficiency of the model by making the initial layers of the GANs share the same parameters. | The model was unable to distinguish the lesion area well when it blended with the skin surface, and artifacts such as human hair can also affect the generation of new images. | Accuracy: 0.816 |
| ( | ISIC-2018 | Proposed a GAN architecture customized to the style of skin lesions. By adjusting the progressive growth structure of the generator and discriminator, it can generate higher-resolution and more diverse skin disease images. | Compared with the original dataset, the GAN-generated synthetic dataset was not complicated or diverse enough. | Accuracy: 0.952<br>Sensitivity: 0.832<br>Specificity: 0.743 |
| ( | ISIC-2018 | Utilized conditional generative adversarial networks (CGAN) to extract key information from all layers to generate skin lesion images with different textures and shapes while ensuring the stability of training. | The amount of data used for training was relatively limited. | Accuracy: 0.941<br>Precision: 0.915<br>Recall: 0.799 |
| ( | ISIC Archive | Explored four types of data augmentation methods and a multiple-layers augmentation method in melanoma classification. | The data augmentation methods evaluated in this paper were limited and not validated on a large number of datasets. | Accuracy: 0.829 |
| ( | HAM10000 | Adopted a variational autoencoder network to obtain domain-dependent noise vectors. A Student-like distribution was employed to increase image diversity, and an auxiliary classifier was used to create images of specific classes. | Due to the specificity of medical images, different image generation models may generate skin disease images that do not belong to the same class. | Accuracy: 0.925 |
| ( | HAM10000 | Combined the attention mechanism with PGGAN to obtain global features of skin lesion images, and introduced the Two-Timescale Update Rule to generate fine-grained features while increasing the stability of the GAN. | Due to hardware limitations, this data augmentation method was only evaluated at a resolution of 256 × 256, rather than the original resolution of 600 × 450 in the HAM10000 dataset. | AUC: 0.793 |
| ( | HAM10000 | Proposed a class-weighted loss function and a focal loss to overcome the problem of data imbalance. | There is no artifact removal for the images in the training dataset, which leads the model to be biased. Also, it has a relatively high computational complexity. | Accuracy: 0.93<br>Recall: 0.86 |
| ( | HAM10000, ISIC-2019 | A novel loss function was combined with balanced mini-batch logic at the data level to alleviate the imbalance problem of the dermatology dataset. | The classification accuracy for rare skin diseases with limited data needs to be improved further. | Accuracy: 0.8997 |
| ( | HAM10000 | Proposed a two-stage technique for determining the appropriate augmentation procedure for mobile devices. | Given the particularity of lightweight CNN, more data augmentation methods and data need to be considered to alleviate the problem of overfitting. | Accuracy: 0.853 |
| ( | PAD-UFES | Designed two algorithms based on evolutionary algorithms, and applied a weighted loss function and oversampling to alleviate the problem of data imbalance. | A larger dataset is necessary to improve the performance further. | Accuracy: 0.92<br>Recall: 0.94 |
| ( | PH2 | Proposed a novel data augmentation method based on an oversampling technique (SMOTE). | The proposed data augmentation method was not validated on deep learning architectures, and experiments on larger datasets are also required. | Accuracy: 0.922<br>Sensitivity: 0.808<br>Specificity: 0.951 |
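The class-weighted and focal losses used by several entries above can be written out directly. A minimal NumPy sketch of the binary case, where `alpha` plays the class-weighting role (an illustration of the standard formulation, not any cited paper's exact loss):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=None):
    """Binary focal loss: the (1 - p_t)^gamma factor down-weights
    well-classified examples; `alpha` optionally re-weights the
    positive class (the class-weighted variant)."""
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    w = np.ones_like(p_t) if alpha is None else np.where(y == 1, alpha, 1 - alpha)
    return -(w * (1 - p_t) ** gamma * np.log(p_t)).mean()

p = np.array([0.95, 0.60])   # predicted probability of the positive class
y = np.array([1, 1])         # both examples are truly positive
ce = -np.log(p).mean()       # plain cross-entropy for comparison
fl = focal_loss(p, y)        # focal loss shrinks the easy example's term
```

With `gamma = 0` and no `alpha`, the expression reduces to plain cross-entropy; raising `gamma` shifts the gradient budget toward hard, typically rare-class, examples.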
Different methods for improving model generalization ability and robustness.
| Ref. | Dataset | Highlights | Limitations | Performance |
|---|---|---|---|---|
| ( | DermIS, DermQuest | Investigated the advantages of large-scale supervised pre-training for medical imaging applications. | In addition to the analysis of the weights and features of the model, a comprehensive analysis of other aspects, such as network structure, is necessary to explore the importance of pre-training. | Accuracy: 0.871 (DermIS)<br>Accuracy: 0.974 (DermQuest) |
| ( | HAM10000 | Proposed transfer learning and adversarial learning in skin disease classification to improve the generalization ability of models to new samples and reduce cross-domain shift. | When the data domain and target domain are significantly different, the method’s overall accuracy suffers. | Accuracy: 0.909<br>AUC: 0.967 |
| ( | HAM10000 | Performed adversarial training on MobileNet and VGG-16 using the attacking models FGSM and PGD for skin cancer classification. | The number of datasets tested in this experiment is very limited, and the training may converge to local optima. | Accuracy: 0.7614 |
| ( | ISIC-2016 | Proposed a comprehensive deep learning framework combining adversarial training and transfer learning for melanoma classification. Focal loss was introduced to iteratively optimize the network to better learn hard samples. | This method does not consider more types of skin diseases, and it has a high computational cost. | Accuracy: 0.812<br>Sensitivity: 0.918 |
| ( | ISIC-2017 | Presented a Multi-view Filtered Transfer Learning approach to extract useful information from the original samples for domain adaptation, thereby improving representation ability for skin disease images. | The effectiveness of this domain adaptation method should be validated on more dermatology datasets. | Accuracy: 0.918<br>AUC: 0.879 |
| ( | ISBI-2017, PH2 | Proposed an adversarial training method combined with an attention module to enhance the robustness of the model in skin-disease classification and segmentation. | Due to the limited amount of training data and the unclear boundaries of skin disease images, the model still suffers from under-segmentation and over-segmentation. | Accuracy: 0.968<br>Sensitivity: 0.962<br>Specificity: 0.941 |
| ( | ISIC-2018 | Used seven universal adversarial perturbations to investigate the vulnerability of the classification model. | This method does not perform adversarial training on more skin disease datasets, so the robustness of the model needs to be further improved. | Accuracy: 0.873 |
| ( | ISIC-2019 | Proposed Monte Carlo dropout, Ensemble MC dropout, and Deep Ensemble for uncertainty quantification. | Further optimization of the robustness of the model is required, and the model should also be tested for noise detection to provide a confidence score. | Accuracy: 0.90<br>AUC: 0.945 |
| ( | ISIC Archive | Proposed a transfer learning method to address the shortage of data in skin lesion images. Also, they utilized a hybrid deep CNN model to accurately extract features and ensure training stability while avoiding overfitting. | The model requires a considerable amount of computational resources while also lacking domain diversity. | Accuracy: 0.853<br>F1 score: 0.891 |
| ( | HAM10000, Dermofit, | Proposed to improve the generalization performance of the model by combining data augmentation and domain alignment. | Due to the privacy of medical images, this trained model may underperform on ethnic groups with a small proportion of the population. | Accuracy: 0.670 |
| ( | Skin7, Skin40 | | To increase the method’s overall performance, better pre-training of the extractor can be investigated. | Mean class recall: 0.65 |
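FGSM, the attack behind several of the adversarial-training entries above, can be illustrated on a logistic-regression toy model, where the input gradient has a closed form (a sketch of the attack idea only, not any cited paper's setup; the weights and input below are made-up numbers):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    """Binary cross-entropy for a single prediction."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(x, y, w, b, eps):
    """One FGSM step for logistic regression: perturb x along the sign of
    the loss gradient w.r.t. the input; dL/dx = (sigmoid(w.x + b) - y) * w."""
    grad_x = (sigmoid(x @ w + b) - y) * w
    return x + eps * np.sign(grad_x)

# Toy model and a correctly classified positive input.
w, b = np.array([2.0, -1.0]), 0.5
x, y = np.array([1.0, 0.3]), 1.0
loss_clean = bce(sigmoid(x @ w + b), y)
x_adv = fgsm(x, y, w, b, eps=0.5)
loss_adv = bce(sigmoid(x_adv @ w + b), y)
# loss_adv > loss_clean: the perturbation pushes the model toward misclassification.
```

Adversarial training, as used in the works above, simply mixes such perturbed inputs (with the true labels) back into the training batches; for deep networks the input gradient is obtained by backpropagation rather than in closed form.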