Shunichi Jinnai, Naoya Yamazaki, Yuichiro Hirano, Yohei Sugawara, Yuichiro Ohe, Ryuji Hamamoto.
Abstract
Recent studies have demonstrated the usefulness of convolutional neural networks (CNNs) for classifying images of melanoma, with accuracies comparable to those achieved by dermatologists. However, the performance of a CNN trained only on clinical images of pigmented skin lesions, measured against dermatologists in a clinical image classification task, has not been reported to date. In this study, we extracted 5846 clinical images of pigmented skin lesions from 3551 patients. The lesions included malignant tumors (malignant melanoma and basal cell carcinoma) and benign tumors (nevus, seborrheic keratosis, senile lentigo, and hematoma/hemangioma). We created the test dataset by randomly selecting 666 of these patients and picking one image per patient, and created the training dataset by giving bounding-box annotations to the rest of the images (4732 images, 2885 patients). Subsequently, we trained a faster, region-based CNN (FRCNN) with the training dataset and checked the performance of the model on the test dataset. In addition, ten board-certified dermatologists (BCDs) and ten dermatologic trainees (TRNs) took the same tests, and we compared their diagnostic accuracy with that of the FRCNN. For six-class classification, the accuracy of the FRCNN was 86.2%, and that of the BCDs and TRNs was 79.5% (p = 0.0081) and 75.1% (p < 0.00001), respectively. For two-class classification (benign or malignant), the accuracy, sensitivity, and specificity were 91.5%, 83.3%, and 94.5% for the FRCNN; 86.6%, 86.3%, and 86.6% for the BCDs; and 85.3%, 83.5%, and 85.9% for the TRNs, respectively. False positive rates and positive predictive values were 5.5% and 84.7% for the FRCNN, 13.4% and 70.5% for the BCDs, and 14.1% and 68.5% for the TRNs, respectively. Compared with the 20 dermatologists, the FRCNN achieved higher classification accuracy.
In the future, we plan to deploy this system for use by the general public in order to improve the prognosis of skin cancer.
Keywords: artificial intelligence (AI); deep learning; melanoma; neural network; skin cancer
Year: 2020 PMID: 32751349 PMCID: PMC7465007 DOI: 10.3390/biom10081123
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. Flow diagram of this study: extracting the pictures of pigmented lesions, annotation of lesions in images, deep learning with a convolutional neural network (CNN), and evaluation by the test dataset.
The results of six-class classification of the faster, region-based CNN (FRCNN); board-certified dermatologists (BCDs); and trainees (TRNs). Rows give the true diagnosis and columns give the prediction; correct answers lie on the diagonal (gray cells in the original).

FRCNN

| True diagnosis | MM | BCC | Nevus | SK | H/H | SL | Total |
|---|---|---|---|---|---|---|---|
| MM | 327 | 9 | 48 | 21 | 0 | 3 | 408 |
| BCC | 6 | 108 | 12 | 6 | 0 | 0 | 132 |
| Nevus | 42 | 6 | 967 | 30 | 3 | 0 | 1048 |
| SK | 21 | 9 | 36 | 223 | 0 | 0 | 289 |
| H/H | 3 | 0 | 18 | 0 | 57 | 0 | 78 |
| SL | 0 | 0 | 0 | 3 | 0 | 42 | 45 |
| Total | 399 | 132 | 1081 | 283 | 60 | 45 | 2000 |

BCDs

| True diagnosis | MM | BCC | Nevus | SK | H/H | SL | Total |
|---|---|---|---|---|---|---|---|
| MM | 340 | 12 | 22 | 26 | 3 | 5 | 408 |
| BCC | 10 | 104 | 3 | 14 | 1 | 0 | 132 |
| Nevus | 131 | 11 | 823 | 68 | 11 | 4 | 1048 |
| SK | 18 | 24 | 17 | 225 | 0 | 5 | 289 |
| H/H | 9 | 1 | 6 | 1 | 61 | 0 | 78 |
| SL | 0 | 1 | 0 | 7 | 0 | 37 | 45 |
| Total | 508 | 153 | 871 | 341 | 76 | 51 | 2000 |

TRNs

| True diagnosis | MM | BCC | Nevus | SK | H/H | SL | Total |
|---|---|---|---|---|---|---|---|
| MM | 327 | 15 | 42 | 12 | 8 | 4 | 408 |
| BCC | 22 | 87 | 6 | 12 | 5 | 0 | 132 |
| Nevus | 136 | 17 | 812 | 57 | 20 | 6 | 1048 |
| SK | 26 | 17 | 37 | 191 | 1 | 17 | 289 |
| H/H | 8 | 1 | 16 | 2 | 51 | 0 | 78 |
| SL | 1 | 0 | 3 | 7 | 0 | 34 | 45 |
| Total | 520 | 137 | 916 | 281 | 85 | 61 | 2000 |
MM: malignant melanoma; BCC: basal cell carcinoma; SK: seborrheic keratosis; H/H: hematoma/hemangioma; SL: senile lentigo.
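The six-class accuracies quoted in the abstract can be recovered from these confusion matrices as the share of counts on the diagonal. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the names `CLASSES`, `CONFUSION`, `accuracy`, and `per_class_recall` are illustrative, and the counts are copied from the three tables.

```python
# Rows are the true diagnosis, columns the predicted diagnosis.
# Class order for both rows and columns: MM, BCC, Nevus, SK, H/H, SL.
CLASSES = ["MM", "BCC", "Nevus", "SK", "H/H", "SL"]

# Counts copied from the six-class tables (totals row and column omitted).
CONFUSION = {
    "FRCNN": [
        [327, 9, 48, 21, 0, 3],
        [6, 108, 12, 6, 0, 0],
        [42, 6, 967, 30, 3, 0],
        [21, 9, 36, 223, 0, 0],
        [3, 0, 18, 0, 57, 0],
        [0, 0, 0, 3, 0, 42],
    ],
    "BCD": [
        [340, 12, 22, 26, 3, 5],
        [10, 104, 3, 14, 1, 0],
        [131, 11, 823, 68, 11, 4],
        [18, 24, 17, 225, 0, 5],
        [9, 1, 6, 1, 61, 0],
        [0, 1, 0, 7, 0, 37],
    ],
    "TRN": [
        [327, 15, 42, 12, 8, 4],
        [22, 87, 6, 12, 5, 0],
        [136, 17, 812, 57, 20, 6],
        [26, 17, 37, 191, 1, 17],
        [8, 1, 16, 2, 51, 0],
        [1, 0, 3, 7, 0, 34],
    ],
}


def accuracy(matrix):
    """Fraction of images whose prediction equals the true diagnosis."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Share of each true class that was labelled correctly."""
    return {
        name: row[i] / sum(row)
        for i, (name, row) in enumerate(zip(CLASSES, matrix))
    }


if __name__ == "__main__":
    for group, matrix in CONFUSION.items():
        print(f"{group}: {100 * accuracy(matrix):.1f}% six-class accuracy")
```

The diagonal sums are 1724, 1590, and 1502 out of 2000, which correspond to the reported 86.2%, 79.5%, and 75.1%.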
The accuracy of six-class classification for each examinee. The best accuracy for each test (test #1–10) is shown in gray.
| TEST # | FRCNN | BCD | TRN |
|---|---|---|---|
| 1 | 90.00% | 84.00% | 76.50% |
| 2 | 82.50% | 86.00% | 72.00% |
| 3 | 84.50% | 83.50% | 74.50% |
| 4 | 90.00% | 79.00% | 74.50% |
| 5 | 83.00% | 78.00% | 73.00% |
| 6 | 86.50% | 85.50% | 75.00% |
| 7 | 88.00% | 70.50% | 79.00% |
| 8 | 86.50% | 79.50% | 75.00% |
| 9 | 82.50% | 73.50% | 78.00% |
| 10 | 88.50% | 75.50% | 73.50% |
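The per-test figures can be aggregated in a few lines. The sketch below averages the ten six-class accuracies for each group and counts how often each group had the best score; it assumes every test contains the same number of images, so that the unweighted mean equals the pooled accuracy. The names `SIX_CLASS_ACCURACY`, `mean_accuracy`, and `best_per_test` are illustrative.

```python
from statistics import mean

# Six-class accuracy (%) for tests #1-10, copied from the table above.
SIX_CLASS_ACCURACY = {
    "FRCNN": [90.0, 82.5, 84.5, 90.0, 83.0, 86.5, 88.0, 86.5, 82.5, 88.5],
    "BCD": [84.0, 86.0, 83.5, 79.0, 78.0, 85.5, 70.5, 79.5, 73.5, 75.5],
    "TRN": [76.5, 72.0, 74.5, 74.5, 73.0, 75.0, 79.0, 75.0, 78.0, 73.5],
}


def mean_accuracy(scores):
    """Unweighted mean over tests (assumes equally sized tests)."""
    return mean(scores)


def best_per_test(results):
    """Count the tests in which each group had the top accuracy.

    Ties credit every group that shares the top score.
    """
    groups = list(results)
    n_tests = len(results[groups[0]])
    wins = {group: 0 for group in groups}
    for t in range(n_tests):
        top = max(results[group][t] for group in groups)
        for group in groups:
            if results[group][t] == top:
                wins[group] += 1
    return wins


if __name__ == "__main__":
    for group, scores in SIX_CLASS_ACCURACY.items():
        print(f"{group}: mean {mean_accuracy(scores):.1f}%")
    print(best_per_test(SIX_CLASS_ACCURACY))
```

Under that assumption the means reproduce the overall six-class accuracies of 86.2%, 79.5%, and 75.1%, and the FRCNN has the best score in nine of the ten tests, with the BCDs ahead in test #2.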
Figure 2. The accuracy of six-class classification by FRCNN, BCDs, and TRNs. In six-class classification, the accuracy of the FRCNN surpassed that of BCDs and TRNs.
The results of two-class classification (benign or malignant) of the FRCNN, BCDs, and TRNs. Rows give the true diagnosis and columns give the prediction; correct answers lie on the diagonal (gray cells in the original).

FRCNN

| True diagnosis | Predicted malignant | Predicted benign | Total |
|---|---|---|---|
| Malignant | 450 | 90 | 540 |
| Benign | 81 | 1379 | 1460 |
| Total | 531 | 1469 | 2000 |

BCDs

| True diagnosis | Predicted malignant | Predicted benign | Total |
|---|---|---|---|
| Malignant | 466 | 74 | 540 |
| Benign | 195 | 1265 | 1460 |
| Total | 661 | 1339 | 2000 |

TRNs

| True diagnosis | Predicted malignant | Predicted benign | Total |
|---|---|---|---|
| Malignant | 451 | 89 | 540 |
| Benign | 206 | 1254 | 1460 |
| Total | 657 | 1343 | 2000 |
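The two-class metrics follow from these 2x2 tables using the standard definitions, with malignant treated as the positive class. The sketch below is illustrative rather than the authors' code; `BinaryCounts`, `COUNTS`, and `summarize` are hypothetical names, and the counts are copied from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    tp: int  # malignant lesions predicted malignant
    fn: int  # malignant lesions predicted benign
    fp: int  # benign lesions predicted malignant
    tn: int  # benign lesions predicted benign


# Counts copied from the two-class tables.
COUNTS = {
    "FRCNN": BinaryCounts(tp=450, fn=90, fp=81, tn=1379),
    "BCD": BinaryCounts(tp=466, fn=74, fp=195, tn=1265),
    "TRN": BinaryCounts(tp=451, fn=89, fp=206, tn=1254),
}


def summarize(c):
    """Return the two-class metrics as percentages."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": 100 * (c.tp + c.tn) / total,
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.tn + c.fp),
        "false_negative_rate": 100 * c.fn / (c.tp + c.fn),
        "false_positive_rate": 100 * c.fp / (c.fp + c.tn),
        "positive_predictive_value": 100 * c.tp / (c.tp + c.fp),
    }


if __name__ == "__main__":
    for group, counts in COUNTS.items():
        line = ", ".join(f"{k}={v:.1f}" for k, v in summarize(counts).items())
        print(f"{group}: {line}")
```

For the FRCNN this gives a sensitivity of 450/540 (about 83.3%), a specificity of 1379/1460 (about 94.5%), and a positive predictive value of 450/531 (about 84.7%). The computed values agree with the reported percentages to within roughly 0.2 percentage points; the largest gap is the TRN positive predictive value, 451/657 (about 68.6%) against the reported 68.5%.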
Figure 3. The accuracy of two-class classification (benign or malignant) by FRCNN, BCDs, and TRNs. The accuracy of the FRCNN surpassed that of the BCDs and TRNs.
The accuracy of two-class classification for each examinee. The best accuracy for each test (test #1–10) is shown in gray. The accuracy of the BCDs was the best in test #2. In test #6, the BCDs and FRCNN achieved the same accuracy.
| TEST # | FRCNN | BCD | TRN |
|---|---|---|---|
| 1 | 93.50% | 89.50% | 85.00% |
| 2 | 88.50% | 92.00% | 86.00% |
| 3 | 91.00% | 89.00% | 85.00% |
| 4 | 93.50% | 87.00% | 80.50% |
| 5 | 89.50% | 84.50% | 85.50% |
| 6 | 91.50% | 91.50% | 85.50% |
| 7 | 92.50% | 83.50% | 89.00% |
| 8 | 92.00% | 86.50% | 86.50% |
| 9 | 89.50% | 81.50% | 86.00% |
| 10 | 93.00% | 80.50% | 83.50% |
Summary of classification accuracy, sensitivity, specificity, false negative rates, false positive rates, and positive predictive values by the FRCNN, BCDs, and TRNs.
| Metric (%) | FRCNN | BCDs | TRNs |
|---|---|---|---|
| Accuracy (six classes) | 86.2 | 79.5 | 75.1 |
| Accuracy (two classes) | 91.5 | 86.6 | 85.3 |
| Sensitivity | 83.3 | 86.3 | 83.5 |
| Specificity | 94.5 | 86.6 | 85.9 |
| False negative rate | 16.7 | 13.7 | 16.5 |
| False positive rate | 5.5 | 13.4 | 14.1 |
| Positive predictive value | 84.7 | 70.5 | 68.5 |
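The two-class rows of this summary can also be derived from the six-class results by grouping MM and BCC as malignant and the other four classes as benign, which is the grouping given in the abstract. The sketch below collapses the FRCNN six-class confusion matrix in this way; the function name `collapse_to_binary` and the variable names are illustrative, and whether the authors obtained their two-class results by this route is an assumption.

```python
CLASSES = ["MM", "BCC", "Nevus", "SK", "H/H", "SL"]
MALIGNANT = {"MM", "BCC"}  # grouping stated in the abstract

# FRCNN six-class confusion matrix (rows: true diagnosis, columns: prediction).
FRCNN_SIX_CLASS = [
    [327, 9, 48, 21, 0, 3],
    [6, 108, 12, 6, 0, 0],
    [42, 6, 967, 30, 3, 0],
    [21, 9, 36, 223, 0, 0],
    [3, 0, 18, 0, 57, 0],
    [0, 0, 0, 3, 0, 42],
]


def collapse_to_binary(matrix, classes, positive):
    """Sum six-class counts into (tp, fn, fp, tn) with `positive` as malignant."""
    tp = fn = fp = tn = 0
    for i, true_name in enumerate(classes):
        for j, pred_name in enumerate(classes):
            count = matrix[i][j]
            truly_positive = true_name in positive
            predicted_positive = pred_name in positive
            if truly_positive and predicted_positive:
                tp += count
            elif truly_positive:
                fn += count
            elif predicted_positive:
                fp += count
            else:
                tn += count
    return tp, fn, fp, tn


if __name__ == "__main__":
    tp, fn, fp, tn = collapse_to_binary(FRCNN_SIX_CLASS, CLASSES, MALIGNANT)
    print(f"tp={tp} fn={fn} fp={fp} tn={tn}")
    print(f"sensitivity={100 * tp / (tp + fn):.1f}%")
    print(f"specificity={100 * tn / (tn + fp):.1f}%")
```

For the FRCNN the collapsed counts are 450, 90, 81, and 1379, the same as in the two-class table. Note that a malignant lesion assigned to the wrong malignant class (for example, MM predicted as BCC) counts as an error in the six-class task but as a correct call in the two-class task, which is one reason the two-class accuracy is higher.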