M J Schaap¹, N J Cardozo¹,², A Patel², E M G J de Jong¹, B van Ginneken², M M B Seyger¹.
Abstract
BACKGROUND: The Psoriasis Area and Severity Index (PASI) score is commonly used in clinical practice and research to monitor disease severity and determine treatment efficacy. Automating the PASI score with deep learning algorithms, such as convolutional neural networks (CNNs), could enable objective and efficient PASI scoring.
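As background, the conventional PASI divides the body into four regions (head, upper limbs, trunk, lower limbs), weighted 0.1, 0.2, 0.3 and 0.4. Each region is graded for erythema, induration and desquamation (0–4 each) and for affected area (0–6), and PASI = Σ_r w_r · (E_r + I_r + D_r) · A_r, which gives the 0–72 range. The Python sketch below assumes exactly these conventional weights and scales; `RegionScore`, `REGION_WEIGHTS` and `pasi` are hypothetical names, not part of the study. Note that the study's images cover only the trunk, arms and legs.

```python
from dataclasses import dataclass

# Conventional PASI region weights (assumed standard values, not from this study).
REGION_WEIGHTS = {"head": 0.1, "arms": 0.2, "trunk": 0.3, "legs": 0.4}


@dataclass
class RegionScore:
    erythema: int      # 0-4
    induration: int    # 0-4
    desquamation: int  # 0-4
    area: int          # 0-6 (ordinal class of affected area, not a percentage)

    def validate(self) -> None:
        for name in ("erythema", "induration", "desquamation"):
            value = getattr(self, name)
            if not 0 <= value <= 4:
                raise ValueError(f"{name} must be in 0..4, got {value}")
        if not 0 <= self.area <= 6:
            raise ValueError(f"area must be in 0..6, got {self.area}")


def pasi(scores: dict[str, RegionScore]) -> float:
    """Weighted sum over regions of (E + I + D) * area score, rounded to 0.1."""
    total = 0.0
    for region, weight in REGION_WEIGHTS.items():
        s = scores[region]
        s.validate()
        total += weight * (s.erythema + s.induration + s.desquamation) * s.area
    return round(total, 1)


if __name__ == "__main__":
    clear = {r: RegionScore(0, 0, 0, 0) for r in REGION_WEIGHTS}
    # Trunk only: erythema 2, induration 2, desquamation 1, area class 2.
    print(pasi({**clear, "trunk": RegionScore(2, 2, 1, 2)}))  # 3.0
```

Rounding to one decimal mirrors how PASI values are usually reported (for example, 6.7 in the table of characteristics) and hides floating-point noise from the 0.1-step weights.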
Year: 2021 PMID: 34653265 PMCID: PMC9298301 DOI: 10.1111/jdv.17711
Source DB: PubMed Journal: J Eur Acad Dermatol Venereol ISSN: 0926-9959 Impact factor: 9.228
Figure 1. Concatenated images used as input for the convolutional neural networks. The dorsal and ventral views of the trunk were concatenated side by side to represent the trunk region (a). The dorsal and ventral views of the right and left arms were concatenated in a 2 × 2 grid to represent the arms region (b). The dorsal, ventral and side views of the legs were concatenated in a 2 × 2 grid to represent the legs region (c). The images shown in this figure were pre‐processed by extracting the skin and cropping the images of the arms. The patient gave written informed consent for publication of the images.
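The concatenation step in Figure 1 can be sketched with NumPy. This assumes every view has already been pre‐processed and resized to the same height, width and channel count, and that the legs region consists of four views; the placement of each view in the grid and the function names are illustrative assumptions, since the caption does not specify them.

```python
import numpy as np


def _check_same_shape(*views: np.ndarray) -> None:
    # All views must share H x W x C before they can be tiled.
    if len({v.shape for v in views}) != 1:
        raise ValueError("all views must be resized to the same shape first")


def concat_side_by_side(dorsal: np.ndarray, ventral: np.ndarray) -> np.ndarray:
    """Trunk region (a): two H x W x C views -> one H x 2W x C image."""
    _check_same_shape(dorsal, ventral)
    return np.concatenate([dorsal, ventral], axis=1)  # join along width


def concat_grid_2x2(top_left, top_right, bottom_left, bottom_right) -> np.ndarray:
    """Arms (b) or legs (c): four H x W x C views -> one 2H x 2W x C image."""
    _check_same_shape(top_left, top_right, bottom_left, bottom_right)
    top = np.concatenate([top_left, top_right], axis=1)         # along width
    bottom = np.concatenate([bottom_left, bottom_right], axis=1)
    return np.concatenate([top, bottom], axis=0)                # along height


if __name__ == "__main__":
    views = [np.full((256, 128, 3), i, dtype=np.uint8) for i in range(4)]
    print(concat_side_by_side(views[0], views[1]).shape)  # (256, 256, 3)
    print(concat_grid_2x2(*views).shape)                  # (512, 256, 3)
```

Explicit `axis` arguments are used instead of `np.block`, which for nested lists joins along the last two axes; on H × W × C images that would be width and channels rather than height and width.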
Characteristics of included imaging series (N = 655)
| Characteristic | Imaging series (N = 655) |
|---|---|
| Age, years, mean (±SD) | 13.0 (±4.5) |
| Gender, … n (%) | 333 (50.8%) |
| Psoriasis severity, median (range) | |
| PASI | 6.7 (0.0–42.4) |
| BSA | 8.0 (0.0–76.0) |
| PGA | 3.0 (0.0–5.0) |
| Fitzpatrick skin type, n (%) | |
| I | 65 (9.9%) |
| II | 330 (50.4%) |
| III | 158 (24.1%) |
| IV | 86 (13.1%) |
| V | 5 (0.8%) |
| VI | 11 (1.7%) |
The 655 imaging series were obtained from 326 individual patients, some of whom contributed multiple imaging series over the years. Characteristics are presented as recorded at the time of each imaging series.
SD, standard deviation.
Psoriasis Area and Severity Index (PASI; range 0–72).
Affected Body Surface Area (BSA; range 0–100%).
Physician Global Assessment (PGA; range 0–5).
Performance of image‐based PASI scoring by the CNN and physicians
| Region and subscore | CNN: accuracy | CNN: MAE | CNN: ICC (95% CI) | Physicians: mean accuracy | Physicians: mean ICC | Physicians: inter‐rater ICC (95% CI) |
|---|---|---|---|---|---|---|
| Trunk | | | | | | |
| Erythema | 0.660 | 0.367 | 0.616 (0.485–0.721) | 0.436 | 0.558 | 0.793 (0.739–0.842) |
| Desquamation | 0.633 | 0.376 | 0.580 (0.441–0.692) | 0.533 | 0.589 | 0.753 (0.679–0.815) |
| Induration | 0.743 | 0.266 | 0.580 (0.442–0.692) | 0.612 | 0.573 | 0.769 (0.708–0.824) |
| Area | 0.734 | 0.275 | 0.793 (0.712–0.854) | 0.634 | 0.694 | 0.706 (0.601–0.788) |
| Arms | | | | | | |
| Erythema | 0.603 | 0.408 | 0.614 (0.486–0.717) | | | |
| Desquamation | 0.612 | 0.393 | 0.568 (0.431–0.680) | | | |
| Induration | 0.681 | 0.318 | 0.655 (0.537–0.747) | | | |
| Area | 0.707 | 0.293 | 0.799 (0.722–0.856) | | | |
| Legs | | | | | | |
| Erythema | 0.667 | 0.352 | 0.711 (0.599–0.795) | | | |
| Desquamation | 0.618 | 0.401 | 0.590 (0.447–0.703) | | | |
| Induration | 0.676 | 0.323 | 0.618 (0.482–0.725) | | | |
| Area | 0.794 | 0.205 | 0.832 (0.761–0.883) | | | |
Performance of the CORAL convolutional neural network (CNNCORAL architecture) and PASI‐trained physicians (N = 5) on image‐based PASI scoring. The performance of the CNNCORAL is expressed as the accuracy, the mean absolute error (MAE) and the intraclass correlation coefficient (ICC) with respect to real‐life scores. The CNNCORAL was trained separately for each task (erythema, desquamation, induration and area scoring) in each anatomical region, so performance is reported on the test set images for each subscore in each anatomical region. The test sets consisted of concatenated imaging series of 109 trunk, 116 arm and 102 leg regions. Image‐based scoring by the PASI‐trained physicians was assessed only on the trunk region test set images. Agreement between image‐based scoring by the physicians and real‐life scoring by the treating physician was calculated as the mean accuracy and mean ICC. Inter‐rater agreement of the five physicians on image‐based scoring is expressed as the intraclass correlation coefficient (ICC) with its 95% confidence interval (CI).
The accuracy (0–1) is the proportion of severity subscores assigned by the CNN that exactly match the ground truth (real‐life PASI scoring). A score of 1 means that all scores were correctly classified.
The mean absolute error reflects the absolute difference between the severity subscores assigned by the CNN and the ground truth averaged over the entire set. A score of 0 indicates perfect agreement.
The ICC measures the inter‐rater agreement between the severity subscores assigned by the CNN or PASI‐trained physicians and the real‐life scores. A score of 0 indicates no agreement and a score of 1 indicates full agreement.
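The three metrics defined above can be computed as in the minimal NumPy sketch below. The source does not state which ICC form was used, so the sketch assumes the two‐way random‐effects, absolute‐agreement, single‐rater form ICC(2,1) in the Shrout and Fleiss convention; values from another ICC form may differ from those in the table. Function names are illustrative.

```python
import numpy as np


def accuracy(pred, truth) -> float:
    """Proportion of subscores that exactly match the ground truth (0-1)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    return float(np.mean(pred == truth))


def mean_absolute_error(pred, truth) -> float:
    """Mean |prediction - ground truth| over the set; 0 means perfect agreement."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(pred - truth)))


def icc_2_1(ratings) -> float:
    """Assumed ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings has shape (n_subjects, k_raters), e.g. columns = CNN and real-life scores.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per imaging series (subject)
    col_means = x.mean(axis=0)  # per rater
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return float((ms_rows - ms_err) / denom)


if __name__ == "__main__":
    real_life = [0, 1, 2, 2, 3]
    cnn = [0, 1, 2, 3, 3]
    print(accuracy(cnn, real_life))             # 0.8
    print(mean_absolute_error(cnn, real_life))  # 0.2
    print(round(icc_2_1(np.column_stack([cnn, real_life])), 3))
```

Because ICC(2,1) measures absolute agreement, a rater who is systematically one class too high is penalised even when the ranking of patients is identical, which matches the caption's framing of agreement with real‐life scores.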
Figure 2. Confusion matrices of CNNCORAL for (a) erythema, (b) desquamation, (c) induration and (d) area scoring of the trunk region. For each subscore, the confusion matrix shows the frequency and percentage of correct classifications and misclassifications for each class in the trunk region test set. The counts are low because the test set is a 20% subset of the total database and included 109 concatenated imaging series of the trunk region. Most misclassifications fall in a class adjacent to the ground truth (real‐life score).
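Figure 2 tallies the CNNCORAL's ordinal predictions against real‐life scores. In the general CORAL (COnsistent RAnk Logits) approach to ordinal regression, a subscore with K classes (K = 5 for the 0–4 erythema, desquamation and induration grades; K = 7 for the 0–6 area grade) is recast as K − 1 binary "is the score greater than k?" tasks that share one weight vector and differ only in their bias terms; the predicted class is the number of tasks whose probability exceeds 0.5. The NumPy sketch below shows that output layer on precomputed features, then builds a confusion matrix and the share of errors that land in an adjacent class. The backbone network, all function names, the toy numbers and the orientation (rows = ground truth) are assumptions for illustration, not the study's implementation.

```python
import numpy as np


# --- CORAL-style ordinal output layer (assumed, illustrative) ---
def levels_from_label(y: int, num_classes: int) -> np.ndarray:
    """Encode ordinal label y (0..K-1) as K-1 binary targets [y > k]."""
    return (np.arange(num_classes - 1) < y).astype(np.float64)


def coral_logits(features: np.ndarray, weight: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """One shared weight vector, one bias per threshold: z_k = f . w + b_k."""
    return (features @ weight)[:, None] + biases[None, :]


def coral_loss(logits: np.ndarray, levels: np.ndarray) -> float:
    """Binary cross-entropy summed over the K-1 tasks, averaged over samples."""
    log_p = -np.logaddexp(0.0, -logits)     # log sigmoid(z), numerically stable
    log_not_p = -np.logaddexp(0.0, logits)  # log(1 - sigmoid(z))
    per_sample = -(levels * log_p + (1.0 - levels) * log_not_p).sum(axis=1)
    return float(per_sample.mean())


def coral_predict(logits: np.ndarray) -> np.ndarray:
    """Class = number of thresholds with sigmoid(z_k) > 0.5, i.e. z_k > 0."""
    return (logits > 0).sum(axis=1)


# --- Confusion matrix and adjacent-class errors ---
def confusion_matrix(truth, pred, num_classes: int) -> np.ndarray:
    """Rows: ground truth (real-life score); columns: predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(truth), np.asarray(pred)), 1)
    return cm


def adjacent_error_share(cm: np.ndarray) -> float:
    """Fraction of misclassifications that land exactly one class from the truth."""
    i, j = np.indices(cm.shape)
    errors = cm[i != j].sum()
    adjacent = cm[np.abs(i - j) == 1].sum()
    return float(adjacent / errors) if errors else 0.0


if __name__ == "__main__":
    # Toy example: one feature, K = 5 classes (e.g. an erythema grade 0-4).
    feats = np.array([[1.0], [0.0], [-1.0]])
    w, b = np.array([3.0]), np.array([2.0, 0.5, -0.5, -2.0])
    pred = coral_predict(coral_logits(feats, w, b))
    print(pred.tolist())  # [4, 2, 0]
    cm = confusion_matrix([3, 2, 0], pred, num_classes=5)
    print(adjacent_error_share(cm))  # 1.0
```

Because every threshold shares the same weight vector, predictions are rank‐consistent when the learned biases are non‐increasing, which is one reason an ordinal model tends to make near misses rather than distant errors.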
Figure 3. Comparison of agreement with real‐life scores between image‐based scoring of the trunk region by the CNNCORAL and by physicians. The performance of the CNNCORAL and the physicians on image‐based scoring is shown for each subscore (erythema, desquamation, induration, area) of the trunk region. The trunk region test set included 109 concatenated imaging series for each subscore. Performance was evaluated using the intraclass correlation coefficient (ICC) with 95% confidence intervals (CI), reflecting agreement with real‐life scores. Agreement was classified as poor (< 0.50), moderate (0.50–0.75), good (0.75–0.90) or excellent (> 0.90). *Overall agreement between image‐based scoring by physicians and real‐life scores, shown as the mean ICC of the five physicians. **ICC (95% CI) between the CNNCORAL and real‐life scores.
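The agreement categories in the Figure 3 caption can be applied mechanically. This sketch assumes half‐open bins with cut‐offs at 0.50, 0.75 and 0.90, counting exactly 0.75 and 0.90 as good; the function name is hypothetical, and the example values are the trunk CNNCORAL ICCs from the performance table.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC to an agreement category (assumed bin edges, see lead-in).

    Bins: below 0.50 poor; [0.50, 0.75) moderate; [0.75, 0.90] good; above 0.90 excellent.
    """
    if icc > 1.0:
        raise ValueError(f"ICC cannot exceed 1, got {icc}")
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # CNNCORAL ICCs for the trunk region, taken from the performance table.
    trunk_cnn = {"erythema": 0.616, "desquamation": 0.580, "induration": 0.580, "area": 0.793}
    for subscore, icc in trunk_cnn.items():
        print(subscore, interpret_icc(icc))
```

Under these assumed bins the trunk CNNCORAL reaches moderate agreement for erythema, desquamation and induration and good agreement for area.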