Murtaza Ashraf1, Willmer Rafell Quiñones Robles1, Mujin Kim1, Young Sin Ko2, Mun Yong Yi3.
Abstract
This paper proposes a deep learning-based patch label denoising method (LossDiff) for improving the classification of whole-slide images of cancer using a convolutional neural network (CNN). Automated whole-slide image classification is often challenging because it requires a large amount of labeled data. Pathologists annotate regions of interest by marking malignant areas, which poses a high risk of introducing patch-level label noise: small benign regions are often enclosed within the malignant annotations, resulting in low classification accuracy with many Type-II errors. To overcome this critical problem, this paper presents a simple yet effective method for noisy patch classification. The proposed method, validated using stomach cancer images, provides a significant improvement over existing methods in patch-based cancer classification, with accuracies of 98.81%, 97.30%, and 89.47% for binary, ternary, and quaternary classes, respectively. Moreover, we conduct several experiments at different noise levels using a publicly available dataset to further demonstrate the robustness of the proposed method. Given the high cost of producing explicit annotations for whole-slide images and the unavoidably error-prone nature of human annotation of medical images, the proposed method has practical implications for whole-slide image annotation and automated cancer diagnosis.
Year: 2022 PMID: 35082315 PMCID: PMC8791954 DOI: 10.1038/s41598-022-05001-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The portion of the tissue circled in blue is a dysplasia annotation by a professional pathologist. The red zoomed-in regions are abnormal (true positive) regions within the annotation and the green zoomed-in regions are normal, benign (false positive) regions within the annotation.
Figure 2. Example of hematoxylin and eosin-stained raw (left) and annotated (right) whole-slide images.
Data acquisition details.
| Parameter | Details |
|---|---|
| Thickness of section | 3–4 µm |
| Staining method | Hematoxylin and eosin |
| WSI scanner model | Panoramic Flash 250 III |
| Sensor resolution | 200× |
| Number of pathologists for annotation | 2 |
Figure 3. Four types of pathologic classes in whole-slide images of the stomach: red (1st row), navy blue (2nd row), yellow (3rd row), and green (4th row) annotated patches represent malignant, dysplasia, uncategorized, and benign classes, respectively.
Information about the number of stomach whole-slide images (WSIs) for each data split.
| Data | Split | Malignant | Dysplasia | Uncategorized | Benign |
|---|---|---|---|---|---|
| Pilot WSIs | | 24 | 30 | 10 | 35 |
| Baseline WSIs | Training | 174 | 220 | 75 | 254 |
| | Validation | 22 | 27 | 10 | 32 |
| | Testing | 22 | 27 | 10 | 32 |
Information about the number of patches for each data split based on stomach whole-slide images.
| Data | Split | Malignant | Dysplasia | Uncategorized | Benign |
|---|---|---|---|---|---|
| Pilot patches | | 2172 | 2435 | 423 | 4890 |
| Baseline patches | Training | 26,855 | 21,881 | 8376 | 49,564 |
| | Validation | 2563 | 2324 | 1006 | 6476 |
| | Testing | 3078 | 2772 | 247 | 4588 |
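The class imbalance in the baseline training patches can be read off the table with a little arithmetic. The sketch below only recomputes shares from the counts listed above; the dictionary and the helper name `class_shares` are illustrative and do not come from the paper.

```python
# Patch counts copied from the baseline training row of the table above.
train_patches = {
    "malignant": 26_855,
    "dysplasia": 21_881,
    "uncategorized": 8_376,
    "benign": 49_564,
}


def class_shares(counts):
    """Return each class's share of the total, as a percentage."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = class_shares(train_patches)
for name, pct in shares.items():
    print(f"{name:>13}: {pct:5.2f}%")
```

Benign patches make up a little under half of the training set and uncategorized patches well under a tenth, which is worth keeping in mind when reading the quaternary results.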
Figure 4. Two types of pathological findings for lymph node sections: red (1st row) and green (2nd row) annotated patches denote malignant and benign classes, respectively.
Information about the number of patches in each data split for PatchCamelyon.
| Data | Split | Malignant | Benign |
|---|---|---|---|
| Patches | Training | 131,072 | 131,072 |
| | Validation | 16,369 | 16,399 |
| | Testing | 16,377 | 16,391 |
Figure 5. Training and validation loss of models with (a) noisy data and (b) cleaned data. Train, training; Val, validation.
Figure 6. The second and third phases of the proposed LossDiff method.
Preliminary study for selecting the final architecture (the accuracy of each architecture is reported as a percentage).
| Architecture | Fine-tuning pretrained model | Training from scratch |
|---|---|---|
| AlexNet | 69.21 | 66.07 |
| Inception | 72.45 | 71.86 |
| VGG | 72.13 | 71.32 |
| ResNet | 73.09 | 70.09 |
| DenseNet | 73.38 | 70.78 |
DenseNet-201 architecture details for the experiments.
| Layers | Output Size | DenseNet-201 |
|---|---|---|
| Convolution | 112 | 7 |
| Pooling | 56 | 3 |
| Dense block (1) | 56 | |
| Transition layer (1) | 56 | 1 |
| | 28 | 2 |
| Dense block (2) | 28 | |
| Transition layer (2) | 28 | 1 |
| | 14 | 2 |
| Dense block (3) | 14 | |
| Transition layer (3) | 14 | 1 |
| | 7 | 2 |
| Dense block (4) | 7 | |
| Classification layer | 1 | 7 |
| Final layer | | Softmax (2 / 3 / 4) |
In the final layer, 2 refers to the malignant and benign classes; 3 refers to the malignant, dysplasia, and benign classes; and 4 refers to the malignant, dysplasia, uncategorized, and benign classes.
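The output sizes in the table follow from repeated stride-2 downsampling. The sketch below traces the feature-map side length through a standard DenseNet-style layout. It assumes a 224 × 224 input, which is the usual ImageNet crop size and is not stated in the table, and the function name `densenet_spatial_sizes` is hypothetical.

```python
def densenet_spatial_sizes(input_size=224):
    """Trace the feature-map side length through a DenseNet-style stack.

    input_size=224 is an assumption (the common ImageNet crop size);
    the source table does not state the input resolution.
    """
    sizes = {}
    size = input_size // 2            # stem convolution, stride 2
    sizes["convolution"] = size
    size //= 2                        # stem pooling, stride 2
    sizes["pooling"] = size
    for block in range(1, 5):
        # Dense blocks concatenate channels but keep the spatial size.
        sizes[f"dense block ({block})"] = size
        if block < 4:
            # Transition layer: size recorded after its stride-2 pooling.
            size //= 2
            sizes[f"transition layer ({block})"] = size
    sizes["classification layer"] = 1  # global pooling collapses to 1 x 1
    return sizes


for layer, side in densenet_spatial_sizes().items():
    print(f"{layer:<22} {side} x {side}")
```

Under that assumption the trace reproduces the sizes listed in the table (112, 56, 28, 14, 7, and finally 1); only the width of the final softmax layer changes between the 2-, 3-, and 4-class settings.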
Accuracy comparison between the baseline and LossDiff results for ternary and quaternary classes.
| Measure | Method | Malignant and benign (binary) | Malignant, dysplasia, and benign (ternary) | Malignant, dysplasia, uncategorized, and benign (quaternary) |
|---|---|---|---|---|
| Accuracy | Baseline | 94.73% | 91.63% | 73.38% |
| | LossDiff | 98.81% | 97.30% | 89.47% |
| Samples discarded | | 6837 | 9387 | 10,100 |
Significant values are in bold.
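To make the size of the improvement explicit, the gain can be computed in percentage points. This sketch uses the baseline accuracies from the table and the LossDiff accuracies quoted in the abstract; the variable names are illustrative.

```python
# Baseline accuracies (%) from the table; LossDiff accuracies (%) as quoted
# in the abstract for binary, ternary, and quaternary classification.
baseline = {"binary": 94.73, "ternary": 91.63, "quaternary": 73.38}
lossdiff = {"binary": 98.81, "ternary": 97.30, "quaternary": 89.47}

# Absolute gain in percentage points, rounded to two decimals.
gains = {task: round(lossdiff[task] - baseline[task], 2) for task in baseline}

for task, gain in gains.items():
    print(f"{task:>10}: +{gain:.2f} percentage points")
```

The gain grows with the number of classes: roughly 4 points for binary, 6 for ternary, and 16 for quaternary classification.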
Figure 7. Confusion matrix for ternary classes in the first row (a,b) and quaternary classes in the second row (c,d).
Figure 8. ROC analysis. The first row (a,b) shows model performance for ternary classes, and the second row (c,d) shows model performance for quaternary classes.
Figure 9. Difference in the area under the ROC curve for the baseline and cleaned data.
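For readers unfamiliar with the metric in Figures 8 and 9, the area under the ROC curve for one class can be computed as the fraction of (positive, negative) pairs that the classifier ranks correctly. The sketch below is a generic illustration with hypothetical toy scores, not the authors' evaluation code; for the ternary and quaternary settings it assumes a one-vs-rest treatment, applying the function once per class.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = class of interest, 0 = any other class.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; a rank-based implementation is preferable at the scale of the patch counts reported here.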
Figure 10. Feature space visualization for DenseNet-201 features using t-SNE dimensionality reduction based on baseline and cleaned data. The red, blue, yellow, and green colors denote the malignant, dysplastic, uncategorized, and benign classes, respectively.
Accuracy comparisons for different noise levels between the baseline method (with label noise) and LossDiff (without label noise) for sample discarding and label flipping approaches.
| Measure | Configuration | 10% noise | 20% noise | 30% noise | 40% noise |
|---|---|---|---|---|---|
| Accuracy | Baseline | 84.23 | 83.31 | 78.45 | 69.33 |
| 85.59 | |||||
| 84.09 | 81.31 | 77.13 | |||
| Number of samples affected | Samples discarded by | 26,178 | |||
| Samples flipped by | 53,779 | 85,253 | 113,565 | ||
Significant values are in bold.
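A controlled noise level such as the 10% to 40% settings above can be simulated by flipping a fixed fraction of binary labels. The record does not describe how the noise was injected, so the sketch below is an assumption: it uses uniform random flipping, and the function name `flip_binary_labels`, the seed, and the toy labels are all hypothetical.

```python
import random


def flip_binary_labels(labels, noise_rate, seed=0):
    """Return a noisy copy of `labels` with a fixed fraction flipped (0 <-> 1).

    Uniform random flipping is an assumption; the source does not state
    how label noise was injected.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)                      # seeded for reproducibility
    n_flip = round(noise_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    noisy = [1 - y if i in flip_idx else y for i, y in enumerate(labels)]
    return noisy, flip_idx


# Hypothetical balanced toy labels: 0 = benign, 1 = malignant.
clean = [0, 1] * 500
for rate in (0.10, 0.20, 0.30, 0.40):
    noisy, flipped = flip_binary_labels(clean, rate)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"{rate:.0%} noise -> {changed} labels flipped")
```

Returning the flipped indices alongside the noisy labels makes it possible to check afterwards which corrupted samples a denoising method discarded or relabeled.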
Figure 11. Accuracy comparison between baseline and LossDiff at different levels of noise.
Accuracy comparisons between existing label denoising methods and LossDiff using the same set of stomach image data.
| Method | Malignant and benign (binary) | Malignant, dysplasia, and benign (multiple ternary) | Malignant, dysplasia, uncategorized, and benign (multiple quaternary) |
|---|---|---|---|
| Mixup | 98.61 | 91.23 | 76.16 |
| Co-teaching | 93.72 | 88.30 | 71.25 |
| Deep abstaining classifier | 98.59 | 95.14 | 77.18 |
| Symmetric cross-entropy loss | 95.74 | 91.90 | 72.57 |
| Confidence learning | 93.51 | 89.87 | 70.70 |
| LossDiff | 98.81 | 97.30 | 89.47 |
Significant values are in bold.
Figure 12. Training time comparison for different noise reduction methods.
Figure 13. The final output of the CNN trained on the cleaned malignant, dysplasia, uncategorized, and benign data and the corresponding heatmaps of abnormal regions.
Methods and results for computer-aided analyses of whole-slide stomach images.
| Study | Objective | Feature selection | Technique |
|---|---|---|---|
| Sharma et al. | Leukocytes, epithelial nuclei, fibrocytes/border cells, other nuclei classification | Handcrafted | AdaBoost classification |
| Sharma et al. | Feature extraction and nontumor, Her2/neu+ tumor, Her2/neu− tumor classification | Handcrafted | Relational graphs |
| Sharma et al. | HER2+ tumor, HER2− tumor, and nontumor classification | Automated | CNN |
| Qu et al. | Epithelium, stroma, and tissue background classification | Automated | CNN fine-tuning |
| Li et al. | Malignant and benign classification | Automated | CNN |
| Kim et al. | Malignant region, tubular adenoma (TA), and benign classification | Automated | CNN and random forest classifier |
| Wang et al. | Malignant, dysplasia, and benign classification | Automated | Multi-instance learning using a CNN |
| Song et al. | Malignant and benign classification | Automated | DeepLab v3 segmentation for slide-level classification |