Sebastian Otálora, Niccolò Marini, Henning Müller, Manfredo Atzori.
Abstract
BACKGROUND: One challenge in training deep convolutional neural network (CNN) models with whole slide images (WSIs) is providing the required large number of costly, manually annotated image regions. Strategies to alleviate the scarcity of annotated data include transfer learning, data augmentation, and training the models with less expensive image-level annotations (weakly supervised learning). However, it is not clear how to combine transfer learning in a CNN model when different data sources are available for training, or how to leverage the combination of large amounts of weakly annotated images with a set of local region annotations. This paper aims to evaluate CNN training strategies based on transfer learning that leverage the combination of weak and strong annotations in heterogeneous data sources. The trade-off between classification performance and annotation effort is explored by evaluating a CNN that learns from strong labels (region annotations) and is later fine-tuned on a dataset with less expensive weak (image-level) labels.
Keywords: Computational pathology; Deep learning; Prostate cancer; Transfer learning; Weak supervision
Year: 2021 PMID: 33964886 PMCID: PMC8105943 DOI: 10.1186/s12880-021-00609-0
Source DB: PubMed Journal: BMC Med Imaging ISSN: 1471-2342 Impact factor: 1.930
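The training strategy described in the abstract (learn from strongly annotated regions first, then fine-tune on weakly labeled data) can be sketched with a toy NumPy stand-in for the CNN classification head. This is only an illustration of the two-stage idea; the feature dimensions, learning rates, and random data below are assumptions, not the paper's actual architecture or hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(W, X, y, lr=0.1):
    """One gradient step of multinomial logistic regression
    (a toy stand-in for the CNN classification head)."""
    p = softmax(X @ W)
    onehot = np.eye(W.shape[1])[y]
    grad = X.T @ (p - onehot) / len(y)
    return W - lr * grad

n_classes, n_feat = 4, 8
W = rng.normal(scale=0.01, size=(n_feat, n_classes))

# Stage 1: strong supervision -- patches with region-level (patch) labels.
X_strong = rng.normal(size=(64, n_feat))
y_strong = rng.integers(0, n_classes, size=64)
for _ in range(50):
    W = train_step(W, X_strong, y_strong)

# Stage 2: fine-tune on weakly labeled patches -- every patch of a WSI
# inherits the slide-level label; a lower learning rate preserves the
# features learned in stage 1.
X_weak = rng.normal(size=(128, n_feat))
y_weak = rng.integers(0, n_classes, size=128)
for _ in range(20):
    W = train_step(W, X_weak, y_weak, lr=0.01)

probs = softmax(X_weak @ W)
print(probs.shape)  # (128, 4)
```

In practice the same two-stage pattern applies to a full CNN: the patch classifier trained on TMA region annotations provides the starting weights that are then fine-tuned on WSI patches carrying only slide-level labels.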
Number of TMA cores in the TMAZ dataset
| Class/set | Train | Val | Test |
|---|---|---|---|
| Benign | 61 | 42 | 12 |
| GS6 | 165 | 35 | 88 |
| GS7 | 58 | 25 | 38 |
| GS8 | 120 | 15 | 91 |
| GS9 | 26 | 2 | 3 |
| GS10 | 78 | 14 | 13 |
| Total | 508 | 133 | 245 |
Reported performance for prostate cancer grading and scoring using deep learning models
| Reference | Classes | Results | #Patients | Annotations | Multicenter |
|---|---|---|---|---|---|
| Arvaniti [ ] | GS6, GS7, GS8, GS9, GS10 | | 641 | Strong | No |
| Nagpal [ ] | GS6, GS7, GS8, GS9, GS10 | ACC | 342 | Strong | Yes |
| Burlutskiy [ ] | With/without basal cells | | 229 | Strong | No |
| Ström [ ] | ISUP: 1,2,3,4,5 | | 976 | Strong | Yes |
| Otálora [ ] | GS6, GS7, GS8, GS9, GS10 | | 341 | Weak | Yes |
| This work | ISUP: 1,2,3,4,5 | | 341 WSI + 641 TMA | Weak and strong | Yes |
| Arvaniti [ ] | ISUP: 1,2,3,4,5 | | 447 WSI + 641 TMA | Weak and strong | Yes |
| Bulten [ ] | ISUP: 1,2,3,4,5 | | 1243 | Weak and strong | Yes |
| Campanella [ ] | Benign versus cancer | AUCs of 0.98 | 7159 | Weak | Yes |
The first four rows correspond to strongly supervised methods using pixel-level annotations; the last four rows are weakly supervised methods that use global labels. Multicenter studies involve training with images from multiple institutions, which increases complexity and requires good generalization performance
Fig. 1 The main components of our approach: first, datasets for PCa grading with strong and weak labels ("Datasets" section); second, CNN model training with three strategies; and third, the tests performed in two scenarios of PCa grading, tissue microarrays of prostate tissue and prostatectomy WSIs ("Results" section). The patches from the strongly labeled TMAs are used to train CNN models with an increasing number of annotations, evaluating performance as a function of the number of strong labels used for training. The ImageNet pre-trained models are either trained using only the weak WSI-level label or fine-tuned with WSI patches and weak labels, combining different sources of supervision. The models are tested in the two scenarios of PCa grading: tissue microarrays and prostatectomies. Arrows of the same color indicate the data or model input from the previous step
Number of patches for each Gleason pattern in the TMAZ dataset
| Class/Set | Train | Val | Test |
|---|---|---|---|
| Benign | 1831 | 1260 | 127 |
| GP3 | 5992 | 1352 | 1602 |
| GP4 | 4472 | 831 | 2121 |
| GP5 | 2766 | 457 | 387 |
| Total | 15,061 | 3901 | 4237 |
Due to the high class imbalance, particularly for the high-grade pattern GP5 and the benign tissue, class-wise data augmentation was applied
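Class-wise augmentation of the kind mentioned above can be sketched as oversampling each minority class with label-preserving transformations until class counts are balanced. This is a minimal NumPy sketch under assumed patch sizes and class counts, not the paper's actual augmentation pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(patch, rng):
    """Random horizontal flip and 90-degree rotation -- typical
    label-preserving augmentations for histopathology patches."""
    if rng.random() < 0.5:
        patch = np.fliplr(patch)
    return np.rot90(patch, k=int(rng.integers(0, 4)))

def balance_by_augmentation(patches_by_class, rng):
    """Augment each minority class until every class matches the
    largest class count (class-wise augmentation)."""
    target = max(len(p) for p in patches_by_class.values())
    out = {}
    for cls, patches in patches_by_class.items():
        patches = list(patches)
        while len(patches) < target:
            src = patches[int(rng.integers(0, len(patches)))]
            patches.append(augment(src, rng))
        out[cls] = patches
    return out

# Toy example mirroring the imbalance between GP3 and GP5 patches
data = {
    "GP3": [rng.random((224, 224, 3)) for _ in range(6)],
    "GP5": [rng.random((224, 224, 3)) for _ in range(2)],
}
balanced = balance_by_augmentation(data, rng)
print({k: len(v) for k, v in balanced.items()})  # {'GP3': 6, 'GP5': 6}
```

Flips and right-angle rotations are safe defaults here because Gleason patterns have no canonical orientation, so the transformed patch keeps its label.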
Number of WSIs from the TCGA-PRAD dataset that were used
| Class/set | Training | Validation | Test |
|---|---|---|---|
| GS6 | 13 | 20 | 5 |
| GS7 (3+4) | 42 | 10 | 6 |
| GS7 (4+3) | 30 | 14 | 11 |
| GS8 | 37 | 12 | 13 |
| GS9-10 | 49 | 28 | 11 |
| Total | 171 | 84 | 46 |
Fig. 2 Results for the average performance of the trained models, as measured by the score, as a function of the percentage of strong annotations used for training
Fig. 3 Results for the average performance of the trained models, as measured by the score, as a function of the percentage of strong annotations used for training
Fig. 4 Results for the average performance of the trained models on the TCGA-PRAD test dataset. The performance is measured by the score as a function of the strong annotation percentage
Performance for the evaluated methods on the test data of the TCGA-PRAD dataset
| Model | Avg. acc. | Error rate | Micro-precision | |||
|---|---|---|---|---|---|---|
| Supervised (100%) | 0.30 ± 0.13 | 0.23 ± 0.16 | 0.10 ± 0.10 | 0.51 ± 0.06 | 2.45 ± 0.33 | 0.27 ± 0.05 |
| Weakly supervised | 0.49 ± 0.08 | 0.36 ± 0.11 | 0.30 ± 0.09 | 0.67 ± 0.03 | 1.65 ± 0.18 | 0.43 ± 0.04 |
| Fine-tuning (100%) | 0.52 ± 0.05 | 0.34 ± 0.10 | 0.40 ± 0.10 | 0.69 ± 0.02 | 1.51 ± 0.14 | 0.46 ± 0.03 |
Fig. 5 Confusion matrices displaying the correctly classified cases for each of the five ISUP grades (matrix diagonals) in the TCGA-PRAD dataset. The top row shows the normalized versions of the bottom-row matrices, which display the total number of cases
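The quantities behind Fig. 5 and the metrics table (confusion matrix, its row-normalized version, and micro-precision) can be computed with a few lines of NumPy. The labels below are a made-up toy example, not the paper's predictions:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true ISUP grade, columns = predicted grade."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def row_normalize(cm):
    """Normalized matrices (top row of Fig. 5): each row sums to 1,
    showing per-class recall along the diagonal."""
    return cm / cm.sum(axis=1, keepdims=True)

def micro_precision(cm):
    """With exactly one prediction per case, micro-precision reduces to
    overall accuracy: correct cases (diagonal) over all cases."""
    return np.trace(cm) / cm.sum()

# Toy predictions over the five ISUP grades (0..4)
y_true = [0, 0, 1, 2, 3, 4, 4, 1]
y_pred = [0, 1, 1, 2, 3, 4, 3, 1]
cm = confusion_matrix(y_true, y_pred, 5)
print(micro_precision(cm))  # 0.75
```

Row normalization is what makes the two rows of Fig. 5 comparable across classes of very different sizes: the raw matrices show case counts, while the normalized ones show the fraction of each true grade that the model assigns to each predicted grade.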