Alfonso Medela¹, Taig Mac Carthy², S Andy Aguilar Robles¹, Carlos M Chiesa-Estomba³,⁴, Ramon Grimalt⁵.
Abstract
Atopic dermatitis (AD) is a chronic, itchy skin condition that affects 15-20% of children but may occur at any age. An estimated 16.5 million US adults (7.3%) have AD that began after the age of 2 years, and nearly 40% of them have moderate or severe disease. A quantitative measurement that tracks the evolution of AD severity could therefore be extremely useful for assessing patient progress and therapeutic efficacy. Currently, SCOring Atopic Dermatitis (SCORAD) is the most frequently used measurement tool in clinical practice. However, SCORAD has two disadvantages: (i) it is time-consuming: calculating SCORAD usually takes about 7-10 minutes per patient, which places a heavy burden on dermatologists; and (ii) it is inconsistent: owing to the complexity of the SCORAD calculation, even well-trained dermatologists may give different scores for the same case. In this study, we introduce the Automatic SCORAD, an automatic version of SCORAD that uses state-of-the-art convolutional neural networks to measure AD severity by analyzing skin lesion images. Overall, we have shown that Automatic SCORAD may prove to be a rapid and objective alternative method for the automatic assessment of AD, achieving results comparable with those of human expert assessment while reducing interobserver variability.
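For reference, the standard clinical SCORAD combines three components: the extent A (percentage of body surface affected, 0 to 100), the summed intensity B of six visual signs each graded 0 to 3 (0 to 18), and the subjective symptoms C (pruritus and sleep loss, each 0 to 10). The sketch below encodes the published formula SCORAD = A/5 + 7B/2 + C (range 0 to 103). The function and argument names are illustrative, and how Automatic SCORAD derives each component from images is not specified in this excerpt.

```python
from typing import Mapping

# The six SCORAD intensity items, each graded 0 (none) to 3 (severe).
SCORAD_SIGNS = ("erythema", "edema", "oozing", "excoriations", "lichenification", "dryness")


def scorad(extent_pct: float, intensities: Mapping[str, int],
           pruritus: float, sleep_loss: float) -> float:
    """Classic SCORAD = A/5 + 7B/2 + C, ranging from 0 to 103."""
    if not 0 <= extent_pct <= 100:
        raise ValueError("extent A must be a body-surface percentage in [0, 100]")
    missing = set(SCORAD_SIGNS) - set(intensities)
    if missing:
        raise ValueError(f"missing intensity items: {sorted(missing)}")
    if any(not 0 <= intensities[s] <= 3 for s in SCORAD_SIGNS):
        raise ValueError("each intensity item must be graded 0-3")
    if not (0 <= pruritus <= 10 and 0 <= sleep_loss <= 10):
        raise ValueError("subjective symptoms are scored 0-10 each")

    a = extent_pct                                  # A: extent, 0-100
    b = sum(intensities[s] for s in SCORAD_SIGNS)   # B: intensity sum, 0-18
    c = pruritus + sleep_loss                       # C: subjective symptoms, 0-20
    return a / 5 + 7 * b / 2 + c


if __name__ == "__main__":
    example = {sign: 1 for sign in SCORAD_SIGNS}
    print(scorad(20, example, pruritus=5, sleep_loss=3))  # 4 + 21 + 8 = 33.0
```

The 7/2 weight on B means the six intensity grades dominate the score, which is why automating them is central to any automatic SCORAD.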
Keywords: AD, atopic dermatitis; AI, artificial intelligence; ASCORAD, Automatic SCOring Atopic Dermatitis; AUC, area under the curve; CADx, Computer-Aided Diagnosis; FAR, full agreement rate; IoU, intersection over union; PAR, partial agreement rate; RMAE, relative mean absolute error; RSD, relative SD; SCORAD, SCOring Atopic Dermatitis
Year: 2022 PMID: 35990535 PMCID: PMC9382656 DOI: 10.1016/j.xjidi.2022.100107
Source DB: PubMed Journal: JID Innov ISSN: 2667-0267
Annotator’s Performance in Lesion Surface Segmentation
| Datasets | ACC | AUC | IoU | F1 | RSD | Cohen’s Kappa |
|---|---|---|---|---|---|---|
| Legit.Health-AD | 86.9 | 0.91 | 0.91 | 0.88 | 8.6 | 0.78 |
| Legit.Health-AD-Test | 81.0 | 0.91 | 0.86 | 0.91 | 9.1 | 0.79 |
| Legit.Health-AD-FPK-IVI | 91.3 | 0.91 | 0.80 | 0.86 | 9.0 | 0.80 |
Abbreviations: ACC, accuracy; AUC, area under the curve; IoU, intersection over union; RSD, relative SD.
These results provide the baseline against which the results of Legit.Health-SCORADNet are compared.
F1 denotes F1 score.
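The segmentation metrics above can be computed from a predicted and a reference binary mask. A minimal sketch, assuming masks are flattened to lists of 0/1 pixels and that AUC is computed from per-pixel lesion probabilities with the Mann-Whitney pairwise formulation; whether the study averages these per image or pools pixels is not stated here, and the 0.5 threshold and example values are hypothetical.

```python
from typing import Sequence


def mask_metrics(pred: Sequence[int], truth: Sequence[int]) -> dict:
    """Pixel-wise accuracy, IoU and F1 (Dice) for two flattened binary masks."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("masks must be non-empty and the same size")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    tn = len(pred) - tp - fp - fn
    union = tp + fp + fn
    # Convention assumed here: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return {"acc": (tp + tn) / len(pred), "iou": iou, "f1": f1}


def roc_auc(scores: Sequence[float], truth: Sequence[int]) -> float:
    """AUC as the probability that a random lesion pixel outscores a random
    background pixel (Mann-Whitney formulation, ties count as 0.5)."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("AUC needs both lesion and background pixels")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
    pred = [int(p >= 0.5) for p in probs]  # hypothetical 0.5 threshold
    print(mask_metrics(pred, truth))
    print(roc_auc(probs, truth))
```

For a single mask, F1 = 2·IoU/(1 + IoU), so a per-image F1 is never lower than the corresponding IoU.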
Annotator’s Performance in Legit.Health-AD Visual Sign Severity Assessment
| Visual Signs | RSD | RMAE (Mean) | RMAE (Median) | FAR | PAR1 | PAR2 | Cohen’s Kappa |
|---|---|---|---|---|---|---|---|
| Erythema | 11.5 | 10.7 | 8.3 | 33.1 | 92.0 | 94.5 | 0.34 |
| Edema | 16.2 | 14.7 | 11.9 | 21.3 | 74.1 | 84.9 | 0.15 |
| Oozing | 20.0 | 18.2 | 14.8 | 18.0 | 59.6 | 79.3 | 0.19 |
| Excoriations | 17.4 | 15.9 | 12.9 | 22.6 | 66.5 | 81.2 | 0.17 |
| Lichenification | 20.3 | 18.3 | 15.1 | 10.7 | 59.1 | 74.6 | 0.06 |
| Dryness | 18.7 | 16.9 | 12.8 | 20.0 | 69.3 | 82.3 | 0.14 |
| Average | 17.4 | 15.8 | 13.8 | 14.4 | 64.7 | 79.3 | 0.17 |
Abbreviations: FAR, full agreement rate; PAR, partial agreement rate; RMAE, relative mean absolute error; RSD, relative SD.
These results provide the baseline to appraise the results of Legit.Health-SCORADNet.
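The RSD and RMAE columns summarise how far individual annotators deviate from a consensus grade. The exact formulas are not given in this excerpt. The sketch below assumes that each image is graded by several annotators, that "RMAE (Mean)" and "RMAE (Median)" are the mean absolute deviation of each annotator from the mean or median of all grades for that image, and that RMAE and RSD are both expressed as a percentage of the scale width. The function name and example grades are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, Sequence


def annotator_agreement(scores_per_image: Sequence[Sequence[float]],
                        scale_max: float) -> Dict[str, float]:
    """Inter-annotator RMAE (vs. mean and vs. median consensus) and RSD, in %.

    Assumption: every image is scored by >= 2 annotators on [0, scale_max], and
    both metrics are normalised by the width of that scale.
    """
    dev_mean, dev_median, sds = [], [], []
    for scores in scores_per_image:
        if len(scores) < 2:
            raise ValueError("each image needs at least two annotators")
        mu, med = mean(scores), median(scores)
        dev_mean.append(mean(abs(s - mu) for s in scores))     # deviation from mean consensus
        dev_median.append(mean(abs(s - med) for s in scores))  # deviation from median consensus
        sds.append(pstdev(scores))                             # spread of the grades

    def pct(values):
        # Average over images, then express as a percentage of the scale width.
        return 100 * mean(values) / scale_max

    return {"rmae_mean": pct(dev_mean), "rmae_median": pct(dev_median), "rsd": pct(sds)}


if __name__ == "__main__":
    # Hypothetical erythema grades (0-3) from three annotators on two images.
    print(annotator_agreement([[1, 1, 2], [0, 3, 3]], scale_max=3))
```

Because the median minimises the sum of absolute deviations, the median-based RMAE can never exceed the mean-based one, which matches the ordering of the two RMAE columns in these tables.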
Annotator’s Performance in Legit.Health-AD-Test Visual Sign Severity Assessment
| Visual Signs | RSD | RMAE (Mean) | RMAE (Median) | FAR | PAR1 | PAR2 | Cohen’s Kappa |
|---|---|---|---|---|---|---|---|
| Erythema | 12.1 | 11.2 | 8.8 | 34.0 | 88.0 | 91.5 | 0.35 |
| Edema | 7.9 | 7.3 | 5.6 | 55.8 | 93.1 | 96.7 | 0.22 |
| Oozing | 10.3 | 9.5 | 7.5 | 44.4 | 89.9 | 93.1 | 0.39 |
| Excoriations | 12.7 | 11.6 | 9.4 | 39.7 | 79.0 | 87.1 | 0.20 |
| Lichenification | 10.1 | 9.3 | 7.4 | 46.8 | 88.0 | 92.9 | 0.21 |
| Dryness | 16.5 | 14.9 | 12.2 | 20.4 | 72.4 | 80.3 | 0.19 |
| Average | 11.6 | 10.6 | 8.5 | 40.2 | 85.0 | 90.3 | 0.26 |
Abbreviations: FAR, full agreement rate; PAR, partial agreement rate; RMAE, relative mean absolute error; RSD, relative SD.
These results provide the baseline to appraise the results of Legit.Health-SCORADNet.
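The Cohen's kappa column corrects raw agreement for the agreement expected by chance, which is why it can be low even when the agreement rates are moderate. Cohen's kappa is defined for two raters, and how the study aggregates it across annotators is not described here. A minimal sketch, assuming unweighted kappa averaged over every annotator pair; for ordinal grades, scikit-learn's `sklearn.metrics.cohen_kappa_score` also offers `weights="linear"` or `weights="quadratic"` as an alternative.

```python
from collections import Counter
from itertools import combinations
from statistics import mean
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa between two raters: (p_o - p_e) / (1 - p_e)."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n                        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / n ** 2  # chance agreement
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Average kappa over every pair of annotators (assumed aggregation)."""
    return mean(cohen_kappa(r1, r2) for r1, r2 in combinations(ratings, 2))


if __name__ == "__main__":
    # Hypothetical 0-3 grades from three annotators on six images.
    grades = [[0, 1, 2, 3, 1, 0], [0, 1, 2, 2, 1, 1], [0, 1, 2, 3, 1, 0]]
    print(mean_pairwise_kappa(grades))
```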
Annotator’s Performance in Legit.Health-AD-FPK-IVI Visual Sign Severity Assessment
| Visual Signs | RSD | RMAE (Mean) | RMAE (Median) | FAR | PAR1 | PAR2 | Cohen’s Kappa |
|---|---|---|---|---|---|---|---|
| Erythema | 11.9 | 10.8 | 8.8 | 42.3 | 80.1 | 88.2 | 0.23 |
| Edema | 8.6 | 8.0 | 6.3 | 54.0 | 90.9 | 94.5 | 0.13 |
| Oozing | 12.7 | 11.6 | 9.4 | 35.1 | 81.9 | 87.3 | 0.27 |
| Excoriations | 9.7 | 9.0 | 7.0 | 45.0 | 92.7 | 95.5 | 0.08 |
| Lichenification | 13.3 | 12.2 | 9.7 | 27.9 | 85.5 | 90.9 | 0.27 |
| Dryness | 18.2 | 16.4 | 13.4 | 10.8 | 70.2 | 81.0 | 0.09 |
| Average | 12.4 | 11.3 | 9.1 | 35.9 | 86.6 | 89.6 | 0.18 |
Abbreviations: FAR, full agreement rate; PAR, partial agreement rate; RMAE, relative mean absolute error; RSD, relative SD.
These results provide the baseline to appraise the results of Legit.Health-SCORADNet.
Legit.Health-SCORADNet’s Results in Light Skin Lesion Surface Segmentation
| Clinical Sign | ACC, % (95% CI) | AUC (95% CI) | IoU (95% CI) | F1 (95% CI) |
|---|---|---|---|---|
| Lesion surface | 84.6 (80.9‒88.3) | 0.93 (0.90‒0.96) | 0.64 (0.59‒0.69) | 0.75 (0.71‒0.79) |
Abbreviations: ACC, accuracy; AUC, area under the curve; CI, confidence interval; IoU, intersection over union.
F1 denotes F1 score.
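The tables report 95% CIs without stating here how they were obtained. One common option for per-image metrics such as IoU is a percentile bootstrap over images. The sketch below shows that generic technique, not necessarily the authors' procedure; the function name and the per-image IoU values are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def bootstrap_ci(values: Sequence[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Return (point estimate, lower, upper) for the mean of per-image scores
    using a percentile bootstrap."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(values)
    # Resample images with replacement and record the mean of each resample.
    boot: List[float] = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo = boot[round(alpha / 2 * n_resamples)]
    hi = boot[round((1 - alpha / 2) * n_resamples) - 1]
    return mean(values), lo, hi


if __name__ == "__main__":
    iou_per_image = [0.6, 0.7, 0.65, 0.8, 0.5, 0.72]  # hypothetical values
    print(bootstrap_ci(iou_per_image))
```

Resampling whole images rather than pixels keeps the interval tied to the number of test images, which is the unit the CI is meant to describe.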
Legit.Health-SCORADNet’s Results in Dark Skin Lesion Surface Segmentation
| Clinical Sign | Experiment | ACC, % (95% CI) | AUC (95% CI) | IoU (95% CI) | F1 (95% CI) |
|---|---|---|---|---|---|
| Lesion surface | Experiment 1 | 74.0 (65.9‒82.1) | 0.83 (0.76‒0.90) | 0.32 (0.23‒0.41) | 0.42 (0.33‒0.51) |
| Lesion surface | Experiment 2 | 79.2 (66.3‒92.1) | 0.87 (0.76‒0.98) | 0.45 (0.29‒0.61) | 0.55 (0.39‒0.71) |
Abbreviations: ACC, accuracy; AUC, area under the curve; CI, confidence interval; IoU, intersection over union.
Results are divided by experiment. The algorithm in experiment 1 was trained solely on light-skinned patient images, and the algorithm in experiment 2 was trained on mixed data in which 8% of the images were of dark-skinned patients.
F1 denotes F1 score.
Legit.Health-SCORADNet’s Results in Visual Sign Severity Assessment
| Training Label Range | Training Ground Truth | Legit.Health-AD-Test RMAE 1 (95% CI) | Legit.Health-AD-Test RMAE 2 (95% CI) | Legit.Health-AD-FPK-IVI RMAE 1 (95% CI) | Legit.Health-AD-FPK-IVI RMAE 2 (95% CI) |
|---|---|---|---|---|---|
| 0‒3 | Median | 13.6 (9.7‒17.5) | 14.3 (10.4‒18.2) | 21.2 (17.3‒25.0) | 20.8 (16.9‒24.7) |
| 0‒10 | Median | 14.3 (10.4‒18.2) | 13.2 (9.3‒17.0) | 22.8 (18.9‒26.7) | 20.0 (16.0‒23.9) |
| 0‒100 | Median | 14.4 (10.5‒18.3) | 13.0 (9.1‒16.9) | 22.6 (18.7‒26.5) | 19.8 (15.9‒23.7) |
| 0‒100 | Mean | 13.5 (9.6‒17.4) | 13.4 (9.5‒17.3) | 21.1 (17.2‒25.0) | 19.9 (16.0‒23.8) |
Abbreviations: CI, confidence interval; DEX, Deep EXpectation; RMAE, relative mean absolute error.
The models were trained on Legit.Health-AD with different label ranges and ground truth methods (median or mean) and tested on Legit.Health-AD-Test and Legit.Health-AD-FPK-IVI.
RMAE 1 is obtained by applying the argmax function to the prediction.
RMAE 2 is obtained by applying the DEX method to the prediction.
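RMAE 1 and RMAE 2 differ only in how a head's softmax output is turned into a severity value. A minimal sketch, assuming each head outputs probabilities over discrete levels: argmax takes the most probable level, whereas DEX (Deep EXpectation) takes the probability-weighted mean of the level values. The normalisation by the scale width in the hypothetical `rmae` helper is an assumption that would make the 0‒3, 0‒10, and 0‒100 training ranges comparable; all names and numbers are illustrative.

```python
from typing import Sequence


def argmax_value(probs: Sequence[float], values: Sequence[float]) -> float:
    """'RMAE 1'-style decoding: the value of the most probable level."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return values[best]


def dex_value(probs: Sequence[float], values: Sequence[float]) -> float:
    """DEX-style decoding: the expected value sum_i p_i * v_i."""
    return sum(p * v for p, v in zip(probs, values))


def rmae(preds: Sequence[float], truths: Sequence[float], scale_max: float) -> float:
    """Mean absolute error as a percentage of the scale width (assumed normalisation)."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need paired, non-empty predictions and labels")
    return 100 * sum(abs(p - t) for p, t in zip(preds, truths)) / (len(preds) * scale_max)


if __name__ == "__main__":
    levels = [0, 1, 2, 3]  # none, mild, moderate, severe
    softmax_outputs = [[0.1, 0.5, 0.3, 0.1], [0.05, 0.9, 0.05, 0.0]]
    labels = [2, 1]
    arg = [argmax_value(p, levels) for p in softmax_outputs]  # [1, 1]
    dex = [dex_value(p, levels) for p in softmax_outputs]     # approximately [1.4, 1.0]
    print(rmae(arg, labels, 3), rmae(dex, labels, 3))
```

Because the expectation can fall between levels, a near-miss between adjacent levels is penalised less under DEX than under argmax.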
Legit.Health-SCORADNet’s Results in Light-Skin Visual Sign Severity Assessment
| Visual Signs | RMAE 1 (95% CI) | RMAE 2 (95% CI) |
|---|---|---|
| Erythema | 14.1 (10.2‒18.0) | 13.3 (9.4‒17.2) |
| Edema | 16.1 (12.2‒20.0) | 16.0 (12.1‒19.9) |
| Oozing | 22.3 (18.4‒26.2) | 19.4 (15.5‒23.3) |
| Excoriations | 11.5 (7.6‒15.4) | 9.6 (5.7‒13.5) |
| Lichenification | 10.3 (6.4‒14.2) | 8.7 (4.8‒12.6) |
| Dryness | 12.4 (8.5‒16.3) | 11.3 (7.4‒15.2) |
| Average | 14.4 (10.5‒18.3) | 13.0 (9.1‒16.9) |
Abbreviations: CI, confidence interval; DEX, Deep EXpectation; RMAE, relative mean absolute error.
RMAE 1 is obtained by applying the argmax function to the prediction.
RMAE 2 is obtained by applying the DEX method to the prediction.
Legit.Health-SCORADNet’s Results in Dark Skin Visual Sign Severity Assessment
| Visual Signs | Experiment 1 RMAE 1 (95% CI) | Experiment 1 RMAE 2 (95% CI) | Experiment 2 RMAE 1 (95% CI) | Experiment 2 RMAE 2 (95% CI) |
|---|---|---|---|---|
| Erythema | 17.8 (13.9‒21.7) | 15.7 (11.8‒19.6) | 16.2 (12.2‒20.2) | 14.3 (10.3‒18.3) |
| Edema | 16.8 (12.9‒20.7) | 18.6 (14.7‒22.5) | 18.1 (14.1‒22.0) | 15.4 (11.4‒19.4) |
| Oozing | 24.9 (21.0‒28.8) | 22.7 (18.8‒26.6) | 9.3 (5.3‒13.3) | 9.0 (5.0‒13.0) |
| Excoriations | 10.1 (6.2‒14.0) | 9.6 (5.7‒13.5) | 10.2 (6.2‒14.2) | 8.0 (4.0‒12.0) |
| Lichenification | 25.9 (22.0‒29.8) | 20.6 (16.7‒24.5) | 24.0 (20.0‒28.0) | 19.8 (15.8‒23.8) |
| Dryness | 39.9 (36.0‒43.8) | 31.7 (27.8‒35.6) | 26.0 (22.0‒30.0) | 19.3 (15.3‒23.3) |
| Average | 22.6 (18.7‒26.5) | 19.8 (15.9‒23.7) | 17.3 (13.3‒21.3) | 14.3 (10.3‒18.3) |
Abbreviations: CI, confidence interval; DEX, Deep EXpectation; RMAE, relative mean absolute error.
Results are divided by experiment. The algorithm in experiment 1 was trained solely on light-skinned patient images, and the algorithm in experiment 2 was trained on mixed data in which 8% of the images were of dark-skinned patients.
RMAE 1 is obtained by applying the argmax function to the prediction.
RMAE 2 is obtained by applying the DEX method to the prediction.
Figure 1. Lesion surface segmentation masks. (a) Original image. (b) Legit.Health-SCORADNet’s prediction. (c) Ground truth. (d) Mask drawn by the first specialist. (e) Mask drawn by the second specialist. (f) Mask drawn by the third specialist. Legit.Health-AD-Test sample image gathered from Danderm Dermatology Atlas with the owner's permission.
Figure 2. Results of the experiment 1 and experiment 2 models on a dark-skin image. (a) Surface mask predicted by the model trained on light skin only. (b) Surface mask predicted by the model trained on both light and dark skin. (c) Ground truth mask. Legit.Health-AD-FPK-IVI sample image gathered from Danderm Dermatology Atlas with the owner's permission.
Figure 3. Distribution of ground truth labels and predictions for visual sign intensity on Legit.Health-AD-Test. The horizontal axis spans 0‒100 because the results are from the best-performing model, which was trained with ground truth labels in that range.
Demographic Characteristics
| Datasets | Age <18 (%) | Age 18‒29 (%) | Age 30‒39 (%) | Age 40‒49 (%) | Age 50‒64 (%) | Age >65 (%) | Male (%) | Female (%) | Light Skin (%) | Dark Skin (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Legit.Health-AD | 31 | 23 | 26 | 14 | 4 | 2 | 39 | 61 | 100 | 0 |
| Legit.Health-AD-Test | — | — | — | — | — | — | — | — | 100 | 0 |
| Legit.Health-AD-FPK-IVI | — | — | — | — | — | — | — | — | 0 | 100 |
Figure 4. Comparison of the intensity-level distribution of each visual sign across the datasets used in the study.
Figure 5. The visual signs that compose SCORAD. Each visual sign can be classified into four intensity levels: none (0), mild (1), moderate (2), and severe (3). The multi-output EfficientNet-B0 network trained for visual sign intensity estimation has one head for each visual sign. Lichenf., lichenification; SCORAD, SCOring Atopic Dermatitis.
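Figure 5 describes a multi-output network with one head per visual sign, each predicting one of four intensity levels. The toy sketch below shows only that head structure: a hypothetical shared feature vector stands in for the EfficientNet-B0 embedding, each sign gets its own linear layer and softmax, and joint training is assumed to sum the per-head cross-entropies. It is not the authors' implementation, and all names are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence

SIGNS = ("erythema", "edema", "oozing", "excoriations", "lichenification", "dryness")
LEVELS = 4  # none (0), mild (1), moderate (2), severe (3)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiHeadSeverity:
    """Toy multi-output classifier: one shared feature vector, one linear head per sign.

    The feature vector stands in for a CNN backbone embedding; the backbone
    itself is not modelled here.
    """

    def __init__(self, feature_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each head: a LEVELS x feature_dim weight matrix plus LEVELS biases.
        self.heads = {
            sign: ([[rng.gauss(0.0, 0.1) for _ in range(feature_dim)] for _ in range(LEVELS)],
                   [0.0] * LEVELS)
            for sign in SIGNS
        }

    def forward(self, features: Sequence[float]) -> Dict[str, List[float]]:
        out = {}
        for sign, (weights, biases) in self.heads.items():
            logits = [sum(w * f for w, f in zip(row, features)) + b
                      for row, b in zip(weights, biases)]
            out[sign] = softmax(logits)  # per-sign probabilities over the 4 levels
        return out


def multi_head_loss(probs: Dict[str, List[float]], labels: Dict[str, int]) -> float:
    """Sum of per-head cross-entropies (one common way to train such heads jointly)."""
    return sum(-math.log(probs[sign][labels[sign]]) for sign in SIGNS)


if __name__ == "__main__":
    model = MultiHeadSeverity(feature_dim=8)
    probs = model.forward([0.5] * 8)
    print({sign: [round(p, 3) for p in ps] for sign, ps in probs.items()})
    print(multi_head_loss(probs, {sign: 1 for sign in SIGNS}))
```

Sharing one backbone across the six heads means a single forward pass yields all six intensity estimates needed for the B component of SCORAD.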
Figure 6. CADx system. (a) Illustration of the questionnaire. (b) Illustration of the report generated by the CADx system. The report shows the evolution of the ASCORAD over time, the most recent ASCORAD item by item, an image of the lesion surface predicted by the algorithm, the final ASCORAD score with its corresponding severity category, and additional information such as image quality. The example record shown is fictional. ASCORAD, Automatic SCOring Atopic Dermatitis; CADx, computer-aided diagnosis; CET, Central European Time; DIQA, Dermatology Image Quality Assessment; DLQI, Dermatology Life Quality Index; Jul, July.
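The report in Figure 6 translates the final ASCORAD score into a severity category, but the cut-offs used by the CADx system are not given here. A minimal sketch, assuming the widely used SCORAD bands (mild below 25, moderate 25 to 50, severe above 50) also apply to ASCORAD; the function name is hypothetical.

```python
def scorad_category(score: float) -> str:
    """Map a SCORAD value (0-103) to the commonly used severity bands."""
    if not 0 <= score <= 103:
        raise ValueError("SCORAD ranges from 0 to 103")
    if score < 25:
        return "mild"
    if score <= 50:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    for s in (12.0, 25.0, 50.0, 67.5):
        print(s, scorad_category(s))
```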