Rafaela Carvalho, Ana C Morgado, Catarina Andrade, Tudor Nedelcu, André Carreiro, Maria João M Vasconcelos.
Abstract
Teledermatology has developed rapidly in recent years and is now an essential tool for early diagnosis. In this work, we aim to improve existing teledermatology processes for skin lesion diagnosis by developing a deep learning approach for risk prioritization, using a dataset of retrospective data from referral requests of the Portuguese National Health System. Given the high complexity of this task, we propose a new prioritization pipeline guided and inspired by domain knowledge. We explored automatic lesion segmentation and tested different learning schemes, namely hierarchical classification and curriculum learning approaches, optionally including additional patient metadata. The final priority level prediction can then be obtained by combining the predicted diagnosis with a baseline priority level that accounts for explicit expert knowledge. In both the differential diagnosis and prioritization branches, lesion segmentation with 30% tolerance for contextual information improved classification compared with a flat baseline model trained on the original images; furthermore, adding patient information was not beneficial for most experiments. Curriculum learning delivered better results than a flat or hierarchical approach. Combining the diagnosis information and a knowledge map, created in collaboration with dermatologists, with the baseline priority achieved interesting results (best macro F1 of 43.93% on a validated test set), paving the way for new data-centric and knowledge-driven approaches.
Keywords: curriculum learning; domain knowledge; hierarchical learning; risk prioritization; skin lesion classification; teledermatology
Year: 2021 PMID: 35054203 PMCID: PMC8775114 DOI: 10.3390/diagnostics12010036
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Overview of the methodology followed in this work towards the prediction of dermatological case risk prioritization.
DermAI train and test dataset distribution considering the differential diagnosis and priority level (N—Normal; P—Priority; HP—High Priority; Tot—Total).
| Class | Train N | Train P | Train HP | Train Tot | Test1 N | Test1 P | Test1 HP | Test1 Tot | Test2 N | Test2 P | Test2 HP | Test2 Tot |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 SebKer | 893 | 54 | 2 | 949 | 226 | 8 | 3 | 237 | 12 | 8 | 3 | 23 |
| 2 ActKer | 350 | 56 | 10 | 416 | 89 | 14 | 0 | 103 | 10 | 14 | 0 | 24 |
| 3 Nev | 444 | 46 | 5 | 495 | 114 | 8 | 0 | 122 | 11 | 8 | 0 | 19 |
| 4 MolCont | 54 | 2 | 0 | 56 | 7 | 8 | 0 | 15 | 2 | 8 | 0 | 10 |
| 5 Haem | 41 | 13 | 2 | 56 | 7 | 6 | 1 | 14 | 6 | 6 | 1 | 13 |
| 6 UncNeop | 162 | 32 | 2 | 196 | 37 | 4 | 9 | 50 | 5 | 4 | 9 | 18 |
| 7 Drmfib | 103 | 9 | 0 | 112 | 22 | 5 | 1 | 28 | 5 | 5 | 1 | 11 |
| 8 SLent | 38 | 1 | 0 | 39 | 7 | 2 | 0 | 9 | 7 | 2 | 0 | 9 |
| 9 PenFib | 77 | 14 | 0 | 91 | 21 | 1 | 1 | 23 | 11 | 1 | 1 | 13 |
| 10 VWart | 137 | 17 | 0 | 154 | 36 | 1 | 1 | 38 | 11 | 1 | 1 | 13 |
| 11 OtMalNeop | 47 | 21 | 25 | 93 | 3 | 6 | 14 | 23 | 0 | 6 | 14 | 20 |
| 12 BCC | 8 | 37 | 0 | 45 | 1 | 7 | 3 | 11 | 1 | 7 | 3 | 11 |
| 13 MM | 13 | 9 | 22 | 44 | 0 | 0 | 8 | 8 | 0 | 0 | 8 | 8 |
| Total | 2367 | 311 | 68 | 2746 | 570 | 70 | 41 | 681 | 81 | 70 | 41 | 192 |
Figure 2. Illustrative examples of lesions from the DermAI dataset with different priority levels.
Knowledge map distribution per differential diagnosis.
| Class | Differential Diagnosis | Mean N (%) | Mean P (%) | Mean HP (%) | Std N (%) | Std P (%) | Std HP (%) |
|---|---|---|---|---|---|---|---|
| 1 SebKer | Seborrheic Keratosis | 80 | 17 | 3 | 0 | 7.7 | 7.7 |
| 2 ActKer | Actinic Keratosis | 57 | 30 | 13 | 19 | 12 | 21 |
| 3 Nev | Nevus, Non-neoplastic | 67 | 30 | 3 | 19 | 23 | 8 |
| 4 MolCont | Molluscum Contagiosum | 60 | 40 | 0 | 57 | 45 | 12 |
| 5 Haem | Haemangioma | 67 | 23 | 10 | 31 | 21 | 12 |
| 6 UncNeop | Neoplasm Unc. Behavior | 40 | 50 | 10 | 39 | 28 | 12 |
| 7 Drmfib | Dermatofibroma | 84 | 13 | 3 | 17 | 12 | 8 |
| 8 SLent | Solar Lentigo | 84 | 13 | 3 | 17 | 12 | 8 |
| 9 PenFib | Pendulum Fibroma | 93 | 7 | 0 | 15 | 15 | 0 |
| 10 VWart | Viral Warts | 87 | 10 | 3 | 12 | 12 | 8 |
| 11 OtMalNeop | Other Malignant Neoplasm | 9 | 42 | 49 | 17 | 23 | 21 |
| 12 BCC | Basal Cell Carcinoma | 26 | 57 | 17 | 24 | 22 | 8 |
| 13 MM | Malignant Melanoma | 0 | 10 | 90 | 0 | 12 | 12 |
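
The knowledge map above can be read as a lookup from diagnosis to an expected priority distribution. The following is a minimal sketch of two ways such a map might be combined with a classifier's diagnosis probabilities. The two rules (take the row of the most probable diagnosis, or take a probability-weighted sum of rows) are assumptions chosen for illustration and are not confirmed to match the "Naive (Max)" and "Naive (Sum)" experiments reported later; the probability values and function names are hypothetical. Only the three map rows are copied from the table.

```python
# Mean knowledge-map percentages (N, P, HP) copied from the table above
# for three of the thirteen diagnoses.
KNOWLEDGE_MAP = {
    "SebKer": (80, 17, 3),
    "BCC": (26, 57, 17),
    "MM": (0, 10, 90),
}
LEVELS = ("N", "P", "HP")


def priority_from_top_diagnosis(diag_probs):
    """Assumed 'max' rule: use the map row of the most probable diagnosis."""
    top = max(diag_probs, key=diag_probs.get)
    row = KNOWLEDGE_MAP[top]
    return LEVELS[row.index(max(row))]


def priority_from_weighted_sum(diag_probs):
    """Assumed 'sum' rule: weight every map row by its diagnosis probability."""
    scores = [0.0, 0.0, 0.0]
    for name, prob in diag_probs.items():
        for i, pct in enumerate(KNOWLEDGE_MAP[name]):
            scores[i] += prob * pct
    return LEVELS[scores.index(max(scores))]


# Hypothetical classifier output for one image (not taken from the paper).
probs = {"SebKer": 0.38, "MM": 0.37, "BCC": 0.25}
print(priority_from_top_diagnosis(probs))
print(priority_from_weighted_sum(probs))
```

With these hypothetical probabilities the two rules disagree: the top diagnosis alone points to a normal priority, while the weighted sum lets the sizeable melanoma probability push the case to high priority.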
Figure 3. Examples of original and cropped images with different tolerances. (a) Original image. (b) Cropped with no tolerance. (c) Cropped with 10% tolerance. (d) Cropped with 30% tolerance. (e) Cropped with 50% tolerance.
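
As a rough illustration of cropping with tolerance, the sketch below expands a lesion bounding box by a fraction of its own width and height on every side and clamps the result to the image. How the tolerance is actually defined in the pipeline is not stated here, so this per-side definition, the function name, and the example coordinates are all assumptions.

```python
def crop_box_with_tolerance(box, image_size, tolerance):
    """Expand (x0, y0, x1, y1) by `tolerance` of its width/height per side.

    Assumption: tolerance is a fraction of the box size (0.3 for 30%).
    The expanded box is clamped so it never leaves the image.
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    pad_x = round((x1 - x0) * tolerance)
    pad_y = round((y1 - y0) * tolerance)
    return (
        max(0, x0 - pad_x),
        max(0, y0 - pad_y),
        min(width, x1 + pad_x),
        min(height, y1 + pad_y),
    )


# Hypothetical lesion box inside a 640x480 image.
lesion_box = (100, 50, 300, 250)
for tol in (0.0, 0.1, 0.3, 0.5):
    print(tol, crop_box_with_tolerance(lesion_box, (640, 480), tol))
```

The returned tuple uses the left, upper, right, lower order, so it could be passed to an image library's crop routine; a zero tolerance leaves the segmentation box unchanged.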
Figure 4. Hierarchies considered for skin lesion diagnosis. (a) Two-level hierarchy. (b) Three-level hierarchy.
Metrics score of test set 1 for segmentation effect in lesion classification (in %).
| Experiment | Accuracy | Weighted F1 | Macro F1 |
|---|---|---|---|
| Original images | 37.59 | 40.49 | 26.11 |
| Cropped with no tolerance | 34.80 | 37.63 | 26.66 |
| Cropped with 10% tolerance | 33.77 | 36.63 | 25.02 |
| Cropped with 30% tolerance | 38.77 | 41.40 | 27.50 |
| Cropped with 50% tolerance | 37.59 | 41.36 | |
Resulting metrics and confusion matrix of test set 1 for a differential diagnosis model trained on cropped images with 30% tolerance (in %).
| Classes | Sensitivity | Precision | F1-Score |
|---|---|---|---|
| 1 SebKer | 43.46 | 68.67 | 53.23 |
| 2 ActKer | 44.66 | 63.89 | 52.57 |
| 3 Nev | 36.07 | 44.44 | 39.82 |
| 4 MolCont | 33.33 | 17.24 | 22.73 |
| 5 Haem | 14.29 | 14.29 | 14.29 |
| 6 UncNeop | 16.00 | 17.78 | 16.84 |
| 7 Drmfib | 64.29 | 25.00 | 36.00 |
| 8 SLent | 11.11 | 7.14 | 8.70 |
| 9 PenFib | 26.09 | 30.00 | 27.91 |
| 10 VWart | 57.89 | 36.67 | 44.90 |
| 11 OtMalNeop | 13.03 | 8.57 | 10.34 |
| 12 BCC | 9.09 | 2.44 | 3.85 |
| 13 MM | 62.50 | 16.67 | 26.32 |
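
The global scores can be cross-checked against this per-class table. The sketch below recomputes macro F1 (the unweighted mean of class F1 scores), weighted F1 (the mean weighted by class support), and accuracy (the support-weighted mean sensitivity), using the Test1 totals from the dataset distribution table as class supports. The variable names are illustrative; all numbers are copied from the tables.

```python
# (class, Test1 support, sensitivity %, F1 %) copied from the tables.
rows = [
    ("SebKer", 237, 43.46, 53.23),
    ("ActKer", 103, 44.66, 52.57),
    ("Nev", 122, 36.07, 39.82),
    ("MolCont", 15, 33.33, 22.73),
    ("Haem", 14, 14.29, 14.29),
    ("UncNeop", 50, 16.00, 16.84),
    ("Drmfib", 28, 64.29, 36.00),
    ("SLent", 9, 11.11, 8.70),
    ("PenFib", 23, 26.09, 27.91),
    ("VWart", 38, 57.89, 44.90),
    ("OtMalNeop", 23, 13.03, 10.34),
    ("BCC", 11, 9.09, 3.85),
    ("MM", 8, 62.50, 26.32),
]

total = sum(n for _, n, _, _ in rows)  # 681 test images

# Macro F1: every class counts equally, regardless of its size.
macro_f1 = sum(f1 for _, _, _, f1 in rows) / len(rows)

# Weighted F1: each class contributes in proportion to its support.
weighted_f1 = sum(n * f1 for _, n, _, f1 in rows) / total

# Accuracy equals the support-weighted mean of per-class sensitivity.
accuracy = sum(n * sens for _, n, sens, _ in rows) / total

print(f"accuracy={accuracy:.2f} weighted_f1={weighted_f1:.2f} macro_f1={macro_f1:.2f}")
```

After rounding, these values agree with the flat model trained on 30%-tolerance crops (accuracy 38.77, weighted F1 41.40, macro F1 27.50). The gap between weighted and macro F1 reflects how strongly the small classes, such as BCC and SLent, pull the macro average down.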
Metrics score of test set 1 for differential diagnosis approaches using cropped images with 30% tolerance (in %).
| Experiment | Accuracy | Weighted F1 | Macro F1 |
|---|---|---|---|
| Nedelcu et al. | 42.71 | 44.04 | 28.65 |
| Flat | 38.77 | 41.40 | 27.50 |
| B-CNN (Two-level) | 40.23 | 42.53 | 30.52 |
| B-CNN (Three-level) | 41.70 | 43.45 | 29.95 |
| Curriculum Learning | 48.75 | 49.64 | 33.55 |
Resulting metrics of test set 1 for the differential diagnosis model trained with curriculum learning (in %), considering images (img), or images and metadata (img + meta) as the input.
| Classes | Sensitivity (Img) | Sensitivity (Img + Meta) | Precision (Img) | Precision (Img + Meta) | F1-Score (Img) | F1-Score (Img + Meta) |
|---|---|---|---|---|---|---|
| 1 SebKer | | 54.01 | 68.12 | | | 60.95 |
| 2 ActKer | 62.14 | | | 53.62 | | 61.41 |
| 3 Nev | | 43.44 | | 47.75 | | 45.49 |
| 4 MolCont | | 26.67 | 31.58 | | | 29.63 |
| 5 Haem | 7.14 | | 25.00 | | 11.11 | |
| 6 UncNeop | 14.00 | | 16.67 | | 15.22 | |
| 7 Drmfib | | 50.00 | | 31.11 | | 38.36 |
| 8 SLent | 11.11 | 11.11 | 4.00 | | 5.88 | |
| 9 PenFib | 17.39 | 17.39 | | 28.57 | | 21.62 |
| 10 VWart | | 55.26 | | 38.89 | | 45.65 |
| 11 OtMalNeop | | 13.04 | | 11.11 | | 12.00 |
| 12 BCC | 18.18 | | 13.33 | | 15.38 | |
| 13 MM | | 25.00 | 33.33 | | | 30.77 |
Figure 5. Confusion matrices of test set 1 for the differential diagnosis model trained with curriculum learning. (a) Input: image. (b) Input: image and metadata.
Metrics score of test set 1 for the differential diagnosis learning schemes with metadata (in %).
| Experiment | Accuracy | Weighted F1 | Macro F1 |
|---|---|---|---|
| Flat + Metadata | 36.56 | 39.66 | 26.07 |
| B-CNN (Two-level) + Metadata | 39.65 | 41.36 | 27.91 |
| B-CNN (Three-level) + Metadata | 42.00 | 44.22 | 30.90 |
| Curriculum Learning + Metadata | | | |
Results concerning some related skin lesion diagnosis studies.
| Study | Dataset | Modalities | Classes | Acc. | Sens. | Spec. | Weight-F1 | Macro-F1 | AUC |
|---|---|---|---|---|---|---|---|---|---|
| Esteva et al. | Several | CI, DI | 9 | 55.40 | - | - | - | - | - |
| Menegola et al. | EDRA | DI | 3 | - | - | - | - | - | 84.50 |
| Yap et al. | Private | MI, DI, Md | 5 | 72.00 | - | - | - | - | - |
| Kawahara et al. | EDRA | CI, DI, Md | 5 | - | 60.40 | 91.00 | - | - | 89.60 |
| Fisher et al. | Dermofit | CI | 10 | 87.10 | - | - | - | - | - |
| Barata et al. | ISIC 2017 | DI | 3 | - | - | - | - | - | 87.40 |
| Nedelcu et al. | EDRA | MI, DI, Md | 5 | 79.20 | 63.80 | 92.60 | - | - | - |
| Nedelcu et al. | DermAI | MI, AI | 13 | 42.71 | - | - | 44.04 | 28.65 | - |
| Curriculum Learning (proposed) | DermAI | MI, AI | 13 | 48.75 | 48.75 | 91.10 | 49.64 | 33.55 | - |
1 EDRA 7-Point Criteria Evaluation, 2 ISBI Challenge/ISIC Skin Lesion Analysis towards Melanoma Detection, 3 Edinburgh Dermofit.
Modalities: MI = macroscopic images, CI = clinical images, DI = dermoscopic images, AI = anatomical images, Md = metadata.
Class sets used in the listed studies:
Cutaneous lymphoma; benign dermal tumors, cysts, and sinuses; malignant dermal tumor; benign epidermal tumors, hamartomas, milia, and growth; malignant and premalignant epidermal tumors; genodermatoses and supernumerary growths; inflammatory conditions; benign melanocytic lesion; malignant melanoma.
Benign, basal cell carcinoma, and melanoma.
Nevus, melanoma, basal cell carcinoma, squamous cell carcinoma, pigmented benign keratoses.
Basal cell carcinoma, nevus, melanoma, miscellaneous, seborrheic keratoses.
Actinic keratosis, basal cell carcinoma, squamous cell carcinoma, intraepithelial carcinoma, melanoma, melanocytic nevus/mole, seborrheic keratosis, pyogenic granuloma, hemangioma, and dermatofibroma.
Seborrheic keratosis, nevi, and melanoma.
Metrics score of test set 1 for prioritization approaches (in %).
| Experiment | Accuracy | Weighted F1 | Macro F1 |
|---|---|---|---|
| Original images | 43.17 | 51.41 | 29.70 |
| Cropped with 30% tolerance | | | |
| Original images + Metadata | 20.55 | 26.25 | 17.76 |
| Cropped with 30% tolerance + Metadata | 27.02 | 35.58 | 20.45 |
Resulting metrics and confusion matrix of test set 1 for baseline priority model (in %).
| Classes | Sensitivity | Precision | F1-Score |
|---|---|---|---|
| HP | 51.22 | 10.94 | 18.03 |
| P | 20.00 | 9.46 | 12.84 |
| N | 51.23 | 85.63 | 64.11 |
Global metrics score of test set 1 for final priority level prediction (in %).
| Experiment | Accuracy | Weighted F1 | Macro F1 |
|---|---|---|---|
| Baseline Priority | 48.02 | 56.06 | 31.66 |
| Naive GT | 82.53 | 81.81 | 56.40 |
| Naive (Max) | 78.41 | 77.83 | 47.07 |
| Naive (Sum) | 80.18 | 78.74 | 47.86 |
| Simple Approach | 77.24 | 76.39 | 43.46 |
| Combined Approach | 48.31 | 56.77 | 35.60 |
Resulting metrics (%) of test set 1 of each final priority class for the different risk prioritization approaches.
| Models | Sensitivity HP | Sensitivity P | Sensitivity N | Precision HP | Precision P | Precision N | F1-Score HP | F1-Score P | F1-Score N |
|---|---|---|---|---|---|---|---|---|---|
| 1 Baseline Priority | 51.22 | 20.00 | 51.23 | 10.94 | 9.46 | 85.63 | 18.03 | 12.84 | 64.11 |
| 2 Naive GT | 53.66 | 15.71 | 92.81 | 70.97 | 18.03 | 89.81 | 61.11 | 16.79 | 91.29 |
| 3 Naive (Max) | 39.02 | 15.71 | 88.95 | 31.37 | 21.57 | 87.56 | 34.78 | 18.18 | 88.25 |
| 4 Naive (Sum) | 34.15 | 14.29 | 91.58 | 41.18 | 20.83 | 87.15 | 37.33 | 16.95 | 89.31 |
| 5 Simple Approach | 34.15 | 8.57 | 88.77 | 31.82 | 11.32 | 86.64 | 32.94 | 9.76 | 87.69 |
| 6 Combined Approach | 34.15 | 42.86 | 50.00 | 21.54 | 10.20 | 88.51 | 26.42 | 16.48 | 63.90 |
Figure 6. Confusion matrices of test set 1 for the different risk prioritization approaches tested. (a) Naive GT. (b) Naive (Max). (c) Naive (Sum). (d) Simple Approach. (e) Combined Approach.
Global metrics score of test set 2 for final priority level prediction (in %).
| Experiment | Accuracy | Weighted F1 | Macro F1 |
|---|---|---|---|
| Baseline Priority | 42.71 | 40.96 | 40.51 |
| Naive GT | 56.25 | 51.02 | 51.81 |
| Naive (Sum) | 50.52 | 44.39 | 43.61 |
| Simple Approach | 48.96 | 41.17 | 40.35 |
| Combined Approach | 44.79 | 44.93 | 43.93 |
Resulting metrics (%) of test set 2 of each final priority class for the different approaches.
| Models | Sensitivity HP | Sensitivity P | Sensitivity N | Precision HP | Precision P | Precision N | F1-Score HP | F1-Score P | F1-Score N |
|---|---|---|---|---|---|---|---|---|---|
| 1 Baseline Priority | 51.22 | 20.00 | 58.02 | 36.21 | 35.90 | 49.47 | 42.42 | 25.69 | 53.41 |
| 2 Naive GT | 53.66 | 15.71 | 92.59 | 78.57 | 37.93 | 55.56 | 63.77 | 22.22 | 69.44 |
| 3 Naive (Sum) | 34.15 | 14.29 | 90.12 | 70.00 | 45.45 | 48.67 | 45.90 | 21.74 | 63.20 |
| 4 Simple Approach | 34.15 | 8.57 | 91.36 | 60.87 | 35.29 | 48.68 | 43.75 | 13.79 | 63.52 |
| 5 Combined Approach | 34.15 | 42.86 | 51.85 | 50.00 | 35.29 | 53.16 | 40.58 | 38.71 | 52.50 |
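
As a worked check, the headline macro F1 of 43.93% for the Combined Approach on test set 2 follows directly from the row above. The per-class F1 is the harmonic mean of precision and sensitivity (shown here for the P class), and the macro F1 is the plain average of the three class F1 scores:

```latex
\[
F_1^{\mathrm{P}} = \frac{2 \cdot 35.29 \cdot 42.86}{35.29 + 42.86} \approx 38.71,
\qquad
\text{Macro } F_1 = \frac{40.58 + 38.71 + 52.50}{3} = \frac{131.79}{3} = 43.93
\]
```

Because each priority level counts equally in this average, the Combined Approach scores well on macro F1 despite its lower accuracy: its P-class F1 is the highest among the approaches in the table.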
Figure 7. Confusion matrices of test set 2 for the different risk prioritization approaches tested. (a) Baseline Priority. (b) Naive GT. (c) Naive (Sum). (d) Simple Approach. (e) Combined Approach.