Chih-Chi Chang1, Yu-Zhen Li1, Hui-Ching Wu2, Ming-Hseng Tseng1,3.
Abstract
Melanoma is a severe form of skin cancer that spreads quickly and has a high mortality rate if not treated early. Recently, machine learning, deep learning, and related technologies have been successfully applied to computer-aided diagnosis of skin lesions. However, issues in image feature extraction and imbalanced data still need to be addressed. Starting from a method in which dermatologists manually annotate image features, we developed a melanoma detection model with four improvement strategies: applying transfer learning to extract image features automatically, adding gender and age metadata, using an oversampling technique for imbalanced data, and comparing machine learning algorithms. According to the experimental results, the proposed strategies yield statistically significant performance improvements. In particular, our proposed ensemble model can outperform previous related models.
Keywords: feature extraction; imbalanced data; machine learning; melanoma; oversampling techniques; transfer learning
Year: 2022 PMID: 35885650 PMCID: PMC9320570 DOI: 10.3390/diagnostics12071747
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Performance evaluation test results on the models’ melanoma binary classification.
| Authors | Dataset | AUC | ACC | SEN | SPE | PRE | F1 |
|---|---|---|---|---|---|---|---|
| [ | PH2 | NA | 0.861~0.975 | 0.790~0.981 | 0.925~0.938 | NA | NA |
| [ | Subset of PH2 | NA | 0.950 | 0.925 | 0.966 | NA | NA |
| [ | ISIC 2016 | 0.766 | 0.818 | 0.818 | 0.714 | NA | 0.826 |
| [ | ISIC 2017 | 0.870~0.964 | 0.857~0.933 | 0.490~0.933 | 0.872~0.961 | 0.940 | 0.813~0.935 |
| [ | ISIC 2018 | 0.847~0.989 | 0.803~0.931 | 0.484~0.888 | 0.957~0.978 | 0.860~0.905 | 0.491~0.891 |
| [ | Subset of ISIC 2018 | 0.970 | 0.880~0.910 | 0.920~0.960 | NA | 0.840~0.910 | 0.880~0.910 |
| [ | ISIC 2019 | 0.919~0.991 | 0.896~0.924 | 0.483~0.896 | 0.976~0.977 | 0.907 | 0.488~0.898 |
| [ | Subset of ISIC 2019 | NA | 0.930 | 0.925 | 0.933 | NA | NA |
| [ | Combined | 0.880~0.960 | 0.803~0.950 | 0.851~0.930 | 0.844~0.950 | NA | NA |
| [ | MED-NODE | 0.810 | NA | 0.810 | 0.800 | NA | NA |
| [ | Subset of ISBI 2017 | 0.891 | 0.866 | 0.556 | 0.785 | NA | NA |

NA = metric not reported in the paper.
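
The metric columns in the table above, apart from AUC, can be derived from confusion-matrix counts. The following is a minimal sketch, assuming SEN denotes sensitivity (recall), SPE specificity, and PRE precision. The function name and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute ACC, SEN, SPE, PRE and F1 from confusion-matrix counts.

    Melanoma is treated as the positive class. Ratios with a zero
    denominator are reported as 0.0 instead of raising an error.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    acc = ratio(tp + tn, tp + fp + tn + fn)
    sen = ratio(tp, tp + fn)   # sensitivity, also called recall
    spe = ratio(tn, tn + fp)   # specificity
    pre = ratio(tp, tp + fp)   # precision
    f1 = ratio(2 * pre * sen, pre + sen)
    return {"ACC": acc, "SEN": sen, "SPE": spe, "PRE": pre, "F1": f1}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=180, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data, accuracy can stay high while sensitivity collapses, which is why the later tables report recall and F1 alongside ACC.
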
Figure 1. MELA-CNN network architecture.
Figure 2. Research architecture.
Figure 3. Proposed model architecture.
Performance evaluation of five feature extraction techniques.
| Feature Extract | Features | ACC | PRE | REC | AUC | F1 |
|---|---|---|---|---|---|---|
| Handcrafted | 5 | 0.800 | 0.401 | 0.036 | 0.613 | 0.064 |
| MELA-CNN | 256 | 0.913 | 0.837 | 0.693 | 0.830 | 0.756 |
| VGG16 | 512 | 0.814 | 0.569 | 0.189 | 0.738 | 0.282 |
| InceptionResnet V2 | 1536 | 0.822 | 0.655 | 0.204 | 0.752 | 0.309 |
| Inception V3 | 2048 | 0.819 | 0.641 | 0.198 | 0.746 | 0.295 |
Figure 4. Comparison of F1-score of five feature extraction techniques.
Performance evaluation of adding metadata.
| Features | ACC | PRE | REC | AUC | F1 |
|---|---|---|---|---|---|
| 5 | 0.800 | 0.401 | 0.036 | 0.613 | 0.064 |
| 7 | 0.821 | 0.582 | 0.327 | 0.789 | 0.415 |
| 256 | 0.913 | 0.837 | 0.693 | 0.830 | 0.756 |
| 258 | 0.926 | 0.844 | 0.764 | 0.865 | 0.800 |
Figure 5. F1-score comparison of adding metadata.
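
The feature counts in the metadata table rise by two in each case (5 to 7, 256 to 258), which matches the two metadata fields named in the abstract, gender and age. A minimal sketch of such a concatenation follows. The encoding (a 0/1 gender code and age divided by an assumed maximum of 100) and the function name are assumptions for illustration; the source does not state how the metadata were encoded.

```python
def add_metadata(image_features, gender, age, max_age=100.0):
    """Append gender and age to an image feature vector.

    Assumed encoding (not stated in the source):
      gender -> 1.0 for "male", 0.0 otherwise
      age    -> clipped to [0, max_age] and scaled to [0, 1]
    """
    gender_code = 1.0 if gender.strip().lower() == "male" else 0.0
    age_scaled = min(max(float(age), 0.0), max_age) / max_age
    return list(image_features) + [gender_code, age_scaled]


# Placeholder for a 256-dimensional vector from the feature extractor.
cnn_features = [0.0] * 256
combined = add_metadata(cnn_features, gender="female", age=54)
print(len(combined))  # 256 image features + 2 metadata features
```
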
Performance evaluation of 10 oversampling techniques.
| Oversampling Technique | ACC | PRE | REC | AUC | F1 |
|---|---|---|---|---|---|
| Original | 0.926 | 0.844 | 0.764 | 0.864 | 0.800 |
| K-Means SMOTE | 0.946 | 0.873 | 0.853 | 0.970 | 0.861 |
| RandomOverSampler | 0.939 | 0.862 | 0.822 | 0.964 | 0.840 |
| SMOTE | 0.937 | 0.833 | 0.849 | 0.966 | 0.839 |
| SVMSMOTE | 0.934 | 0.825 | 0.851 | 0.967 | 0.835 |
| SMOTETomek | 0.934 | 0.829 | 0.844 | 0.967 | 0.835 |
| BorderlineSMOTE | 0.933 | 0.811 | 0.862 | 0.967 | 0.834 |
| SMOTE-RandomUnderSampler | 0.933 | 0.821 | 0.844 | 0.966 | 0.831 |
| SMOTENC | 0.932 | 0.820 | 0.849 | 0.968 | 0.830 |
| SMOTEENN | 0.924 | 0.770 | 0.889 | 0.967 | 0.822 |
| ADASYN | 0.924 | 0.788 | 0.847 | 0.966 | 0.814 |
Figure 6. Comparison of F1-scores using 10 oversampling techniques.
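
To illustrate what oversampling does, here is a minimal pure-Python sketch of random oversampling, the simplest technique in the table: minority samples are duplicated at random until the classes are balanced. This is not the K-Means SMOTE variant that ranked first, and it is not the authors' code; the function name and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until all classes match
    the size of the largest class. Returns new lists; inputs are untouched."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(indices)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Toy data: 8 non-melanoma (0) and 2 melanoma (1) samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Oversampling is normally applied to the training portion only, so that duplicated or synthetic samples do not leak into the data used for evaluation.
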
Performance evaluation of 13 classifiers with K-means SMOTE.
| Classifiers | ACC | PRE | REC | AUC | F1 |
|---|---|---|---|---|---|
| XGB Classifier | 0.946 | 0.873 | 0.853 | 0.970 | 0.861 |
| Logistic Regression | 0.941 | 0.841 | 0.864 | 0.969 | 0.852 |
| Gradient Boosting | 0.940 | 0.851 | 0.842 | 0.965 | 0.845 |
| Bagging Classifier | 0.939 | 0.837 | 0.851 | 0.965 | 0.845 |
| SVM | 0.939 | 0.859 | 0.833 | 0.968 | 0.844 |
| HistGB Classifier | 0.939 | 0.861 | 0.822 | 0.968 | 0.839 |
| Random Forest | 0.936 | 0.837 | 0.842 | 0.964 | 0.838 |
| MLP | 0.937 | 0.862 | 0.811 | 0.963 | 0.834 |
| AdaBoost | 0.929 | 0.806 | 0.844 | 0.961 | 0.823 |
| K-Neighbors Classifier | 0.925 | 0.808 | 0.816 | 0.922 | 0.809 |
| SGD-LR | 0.922 | 0.783 | 0.836 | 0.956 | 0.806 |
| Decision Tree | 0.911 | 0.759 | 0.804 | 0.871 | 0.780 |
| Gaussian NB | 0.766 | 0.452 | 0.867 | 0.846 | 0.593 |
Figure 7. Comparison of F1-score of 13 classifiers with K-means SMOTE.
Figure 8. Comparison of ROC curves with different feature extractors.
Figure 9. Comparison of PR curves with different feature extractors.
Figure 10. ROC curves comparison with different oversampling techniques.
Figure 11. PR curves comparison with different oversampling techniques.
Figure 12. Comparison of ROC curves with different classifiers.
Figure 13. PR curves comparison with different classifiers.
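
The AUC values reported in the tables summarize ROC curves such as those in the figures above. One way to read AUC is as the probability that a randomly chosen melanoma receives a higher score than a randomly chosen non-melanoma, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (melanoma), 0 for the negative class.
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The pairwise loop is quadratic in the number of samples, so it is suited to explanation rather than to large datasets.
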
Paired t-test of recall for 5 Features vs. 256 Features.
| Fold | 5 Features REC | 256 Features REC | Difference between REC | Paired t-test |
|---|---|---|---|---|
| 1 | 0.022 | 0.578 | 0.556 | p = 1.81 × 10⁻⁹ |
| 2 | 0.111 | 0.622 | 0.511 | |
| 3 | 0.044 | 0.756 | 0.712 | |
| 4 | 0.044 | 0.644 | 0.600 | |
| 5 | 0.044 | 0.800 | 0.756 | |
| 6 | 0.044 | 0.600 | 0.556 | |
| 7 | 0.000 | 0.733 | 0.733 | |
| 8 | 0.022 | 0.689 | 0.667 | |
| 9 | 0.000 | 0.756 | 0.756 | |
| 10 | 0.022 | 0.756 | 0.734 |
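
The per-fold values in these tables come from 10 folds, and the summary table lists stratified 10-fold cross-validation on 1849 non-melanoma and 450 melanoma samples. A minimal sketch of a stratified fold assignment under those class counts follows. The function name and the round-robin scheme are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, keeping class proportions
    roughly equal by dealing each class out round-robin after shuffling."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Class counts taken from the summary table: 1849 non-melanoma, 450 melanoma.
labels = [0] * 1849 + [1] * 450
folds = stratified_folds(labels, k=10)
melanoma_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(melanoma_per_fold)
```

With 450 melanoma samples and 10 folds, each test fold would hold 45 melanomas. Under that assumption, a fold recall of 0.022 corresponds to about 1 detected melanoma out of 45.
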
Paired t-test of recall for 256 features vs. 258 features.
| Fold | 256 Features REC | 258 Features REC | Difference between REC | Paired t-test |
|---|---|---|---|---|
| 1 | 0.578 | 0.844 | 0.267 | p = 2.03 × 10⁻² |
| 2 | 0.622 | 0.756 | 0.133 | |
| 3 | 0.756 | 0.778 | 0.022 | |
| 4 | 0.644 | 0.644 | 0.000 | |
| 5 | 0.800 | 0.778 | −0.022 | |
| 6 | 0.600 | 0.756 | 0.156 | |
| 7 | 0.733 | 0.733 | 0.000 | |
| 8 | 0.689 | 0.778 | 0.089 | |
| 9 | 0.756 | 0.733 | −0.022 | |
| 10 | 0.756 | 0.844 | 0.089 |
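
The paired t-test works on the per-fold differences: the statistic is the mean difference divided by its standard error, with n − 1 degrees of freedom. A minimal sketch follows, using the recall columns of the table above. The function name is hypothetical, and because the inputs are the rounded table values, the result may differ slightly from one computed on unrounded data.

```python
import math
import statistics

# Per-fold recall copied from the table above.
rec_256 = [0.578, 0.622, 0.756, 0.644, 0.800, 0.600, 0.733, 0.689, 0.756, 0.756]
rec_258 = [0.844, 0.756, 0.778, 0.644, 0.778, 0.756, 0.733, 0.778, 0.733, 0.844]


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test.

    t = mean(d) / (stdev(d) / sqrt(n)), where d = after - before and
    stdev is the sample standard deviation (n - 1 in the denominator).
    """
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


t_stat, dof = paired_t_statistic(rec_256, rec_258)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

Turning the statistic into a p-value requires the Student t distribution, which the Python standard library does not provide; a routine such as `scipy.stats.ttest_rel` can be used for that step. The source does not state whether the reported p-values are one-sided or two-sided.
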
Paired t-test of recall for 258 features w/wo K-Means SMOTE.
| Fold | 258 Features REC | 258 Features with K-Means SMOTE REC | Difference between REC | Paired t-test |
|---|---|---|---|---|
| 1 | 0.844 | 0.933 | 0.089 | p = 7.07 × 10⁻⁴ |
| 2 | 0.756 | 0.867 | 0.111 | |
| 3 | 0.778 | 0.778 | 0.000 | |
| 4 | 0.644 | 0.844 | 0.200 | |
| 5 | 0.778 | 0.911 | 0.133 | |
| 6 | 0.756 | 0.889 | 0.133 | |
| 7 | 0.733 | 0.844 | 0.111 | |
| 8 | 0.778 | 0.800 | 0.022 | |
| 9 | 0.733 | 0.800 | 0.067 | |
| 10 | 0.844 | 0.867 | 0.022 |
Paired t-test of F1-score for 5 Features vs. 256 Features.
| Fold | 5 Features F1 | 256 Features F1 | Difference between F1 | Paired t-test |
|---|---|---|---|---|
| 1 | 0.042 | 0.658 | 0.616 | p = 4.56 × 10⁻¹⁰ |
| 2 | 0.185 | 0.718 | 0.533 | |
| 3 | 0.083 | 0.810 | 0.727 | |
| 4 | 0.083 | 0.773 | 0.690 | |
| 5 | 0.083 | 0.818 | 0.735 | |
| 6 | 0.077 | 0.692 | 0.615 | |
| 7 | 0.000 | 0.767 | 0.767 | |
| 8 | 0.042 | 0.713 | 0.671 | |
| 9 | 0.000 | 0.810 | 0.810 | |
| 10 | 0.043 | 0.800 | 0.757 |
Paired t-test of F1-score for 256 features vs. 258 features.
| Fold | 256 Features F1 | 258 Features F1 | Difference between F1 | Paired t-test |
|---|---|---|---|---|
| 1 | 0.658 | 0.826 | 0.168 | p = 3.40 × 10⁻² |
| 2 | 0.718 | 0.791 | 0.073 | |
| 3 | 0.810 | 0.833 | 0.024 | |
| 4 | 0.773 | 0.734 | −0.039 | |
| 5 | 0.818 | 0.795 | −0.023 | |
| 6 | 0.692 | 0.810 | 0.117 | |
| 7 | 0.767 | 0.759 | −0.009 | |
| 8 | 0.713 | 0.814 | 0.101 | |
| 9 | 0.810 | 0.815 | 0.005 | |
| 10 | 0.800 | 0.826 | 0.026 |
Paired t-test of F1-score for 258 features w/wo K-Means SMOTE.
| Fold | 258 Features F1 | 258 Features with K-Means SMOTE F1 | Difference between F1 | Paired t-test |
|---|---|---|---|---|
| 1 | 0.826 | 0.913 | 0.087 | p = 3.35 × 10⁻⁴ |
| 2 | 0.791 | 0.813 | 0.022 | |
| 3 | 0.833 | 0.843 | 0.010 | |
| 4 | 0.734 | 0.874 | 0.139 | |
| 5 | 0.795 | 0.891 | 0.096 | |
| 6 | 0.810 | 0.870 | 0.060 | |
| 7 | 0.759 | 0.817 | 0.059 | |
| 8 | 0.814 | 0.857 | 0.043 | |
| 9 | 0.815 | 0.857 | 0.042 | |
| 10 | 0.826 | 0.876 | 0.050 |
Performance comparison with Kalwa et al. [28].
| | Original | SMOTE | Handcrafted | DL-TL | DL-FE | DL-FE+ | K-Means SMOTE |
|---|---|---|---|---|---|---|---|
| Study | Kalwa et al. (2019) [28] | Kalwa et al. (2019) [28] | Proposed Model | Proposed Model | Proposed Model | Proposed Model | Proposed Model |
| Classifier | SVM | SVM | XGB Classifier | XGB Classifier | XGB Classifier | XGB Classifier | XGB Classifier |
| Validation | Holdout (7:3) | Holdout (7:3) | Holdout (7:3) | Holdout (7:3) | Holdout (7:3) | Holdout (7:3) | Holdout (7:3) |
| Number of samples | 200 | 200 | 2299 | 2299 | 2299 | 2299 | 2299 |
| Number of features | 4 | 4 | 5 | 1536 | 256 | 258 | 258 |
| ACC | 0.860 | 0.880 | 0.804 | 0.836 | 0.914 | 0.923 | 0.958 |
| AUC | 0.720 | 0.850 | 0.585 | 0.780 | 0.936 | 0.948 | 0.971 |
| PRE | 0.125 | 0.667 | 0.500 | 0.720 | 0.806 | 0.820 | 0.914 |
| REC | 0.500 | 0.800 | 0.030 | 0.267 | 0.741 | 0.778 | 0.867 |
| F1 | 0.200 | 0.727 | 0.056 | 0.389 | 0.772 | 0.798 | 0.890 |
Performance comparison with Magalhaes et al. [29].
| | Original | SMOTE | Handcrafted | DL-TL | DL-FE | DL-FE+ | K-Means SMOTE |
|---|---|---|---|---|---|---|---|
| Study | Magalhaes et al. (2021) [29] | Magalhaes et al. (2021) [29] | Proposed Model | Proposed Model | Proposed Model | Proposed Model | Proposed Model |
| Classifier | SVM + | SVM + | XGB Classifier | XGB Classifier | XGB Classifier | XGB Classifier | XGB Classifier |
| Validation | Holdout (8:2) | Holdout (8:2) | Holdout (8:2) | Holdout (8:2) | Holdout (8:2) | Holdout (8:2) | Holdout (8:2) |
| Number of samples | 287 | 287 | 2299 | 2299 | 2299 | 2299 | 2299 |
| Number of features | 40 | 40 | 5 | 1536 | 256 | 258 | 258 |
| ACC | 0.426 | 0.585 | 0.807 | 0.839 | 0.904 | 0.930 | 0.965 |
| AUC | 0.558 | 0.542 | 0.621 | 0.774 | 0.937 | 0.953 | 0.981 |
| PRE | 0.565 | 0.672 | 0.600 | 0.767 | 0.774 | 0.837 | 0.974 |
| REC | 0.473 | 0.696 | 0.033 | 0.256 | 0.722 | 0.800 | 0.878 |
| F1 | 0.515 | 0.684 | 0.063 | 0.383 | 0.747 | 0.818 | 0.905 |
A comparative summary of the existing techniques for melanoma binary classification.
| Year | Author | Dataset | Non-Melanoma:Melanoma (Imbalance Ratio) | Method | Validation | Test Result |
|---|---|---|---|---|---|---|
| 2016 | Nasr | MED-NODE | 100:70 (1.429) | DL | Holdout (8:2) | ACC: 0.810 |
| 2018 | Adjed et al. [ | PH2 | 160:40 | Multiresolution technique + ML | Repeat 1000 times | ACC: 0.861 |
| 2018 | Li et al. [ | ISIC 2018 | 8902:1113 (7.998) | DL + ML | Holdout (7:1:2) | ACC: 0.853 |
| 2019 | Devansh | Combine of | 3063:919 (3.333) | DL | Holdout (85:15) | AUC: 0.880 |
| 2019 | Warsi et al. [ | PH2 | 160:40 | 3D color-texture feature (CTF) + DL | Holdout (70:15:15) | ACC: 0.970 |
| 2019 | Abbes et al. [ | Combine of DermQuest and DermIS | 87:119 (0.731) | FCM + DL | Holdout (NA) | ACC: 0.875 |
| 2019 | Abbas et al. [ | Subset of combining Skin-EDRA, ISIC 2018, DermNet, PH2 | 1420:1380 (1.029) | DL + ML | Holdout (1:1) | ACC: 0.950 |
| 2020 | Almaraz-Damian et al. [ | ISIC 2018 | 8902:1113 (7.998) | DL + ML | Holdout (75:25) | ACC: 0.897 |
| 2020 | Daghrir | Subset of ISIC archive | NA | DL + ML | Holdout (8:2) | ACC: 0.884 |
| 2022 | Iftiaz A. Alf et al. [ | Subset of ISIC 2018 | 1800:1497 (1.202) | DL and ML | Holdout (8:2) | |
| 2022 | Our approach | Subset of combining | 1849:450 (4.109) | DL + ML | Holdout (8:2) | ACC: 0.965 |
| 2022 | Our approach | Subset of combining | 1849:450 (4.109) | DL + ML | Stratified 10-fold Cross-Validation | ACC: 0.941 |
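
The figure in parentheses after each class count in the table above is the number of non-melanoma samples divided by the number of melanoma samples. A minimal sketch that reproduces three of the listed ratios from their counts follows; the function name is hypothetical.

```python
def imbalance_ratio(non_melanoma, melanoma):
    """Ratio of non-melanoma to melanoma samples (1.0 means balanced)."""
    return non_melanoma / melanoma


# Class counts copied from the table above.
datasets = {
    "MED-NODE": (100, 70),
    "ISIC 2018": (8902, 1113),
    "Our approach": (1849, 450),
}
ratios = {name: imbalance_ratio(neg, pos) for name, (neg, pos) in datasets.items()}
for name, ratio in ratios.items():
    print(f"{name}: IR = {ratio:.3f}")
```
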