Eduardo Conde-Sousa, João Vale, Ming Feng, Kele Xu, Yin Wang, Vincenzo Della Mea, David La Barbera, Ehsan Montahaei, Mahdieh Baghshah, Andreas Turzynski, Jacob Gildenblat, Eldad Klaiman, Yiyu Hong, Guilherme Aresta, Teresa Araújo, Paulo Aguiar, Catarina Eloy, Antonio Polónia.
Abstract
Breast cancer is the most common malignancy in women worldwide and is responsible for more than half a million deaths each year. The appropriate therapy depends on the evaluation of the expression of various biomarkers, such as the human epidermal growth factor receptor 2 (HER2) transmembrane protein, through specialized techniques, such as immunohistochemistry or in situ hybridization. In this work, we present the HER2 on hematoxylin and eosin (HEROHE) challenge, a parallel event of the 16th European Congress on Digital Pathology, which aimed to predict the HER2 status in breast cancer based only on hematoxylin-eosin-stained tissue samples, thus avoiding specialized techniques. The challenge provided a large annotated dataset of 509 whole-slide images, collected specifically for this purpose. Models for predicting HER2 status were presented by 21 teams worldwide. The best-performing models are described in detail, including their network architectures and key parameters. Methods are compared, and their approaches, core methodologies, and software choices are contrasted. Different evaluation metrics are discussed, as well as the performance of the presented models for each of these metrics. Potential differences in ranking that would result from different choices of evaluation metrics highlight the need for careful consideration at the time of their selection, as the results show that some metrics may misrepresent the true potential of a model to solve the problem for which it was developed. The HEROHE dataset remains publicly available to promote advances in the field of computational pathology.
Keywords: HER2; breast cancer; computational pathology; deep learning
Year: 2022 PMID: 36005456 PMCID: PMC9410129 DOI: 10.3390/jimaging8080213
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. (A) HER2-negative BC (HE); (B) HER2-positive BC (HE); (C–F) HER2 IHC (score of 0, 1+, 2+, and 3+, respectively); (G,H) bright-field ISH assay (HER2-negative and HER2-positive, respectively).
Distribution of IHC scores and HER2 status in the training dataset.
| IHC Score | HER2-Negative | HER2-Positive | Total |
|---|---|---|---|
| 0 | 43 (12%) | 0 (0%) | 43 (12%) |
| 1+ | 46 (13%) | 1 (0%) | 47 (13%) |
| 2+ | 126 (35%) | 104 (29%) | 230 (64%) |
| 3+ | 0 (0%) | 39 (11%) | 39 (11%) |
| Total | 215 (60%) | 144 (40%) | 359 (100%) |
Distribution of IHC scores and HER2 status in the test dataset.
| IHC Score | HER2-Negative | HER2-Positive | Total |
|---|---|---|---|
| 0 | 19 (13%) | 0 (0%) | 19 (13%) |
| 1+ | 18 (12%) | 0 (0%) | 18 (12%) |
| 2+ | 53 (35%) | 32 (21%) | 85 (57%) |
| 3+ | 0 (0%) | 27 (18%) | 27 (18%) |
| Not Tested | 0 (0%) | 1 (1%) | 1 (1%) |
| Total | 90 (60%) | 60 (40%) | 150 (100%) |
Figure 2. Overall architecture of the model developed by team Macaroon.
Figure 3. Overall architecture of the model developed by team MITEL.
Figure 4. Overall architecture of the model developed by team Piaz.
Figure 5. Overall architecture of the model developed by team Dratur.
Figure 6. Overall architecture of the model developed by team IRISAI.
Figure 7. Overall architecture of the model developed by team Arontier_HYY.
Final classification of the HEROHE Challenge according to score.
| Rank | Team | AUC | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| 1 | Macaroon | 0.71 | 0.57 | 0.83 | 0.68 |
| 2 | MITEL | 0.74 | 0.58 | 0.78 | 0.67 |
| 3 | Piaz | 0.84 | 0.77 | 0.55 | 0.64 |
| 4 | Dratur | 0.75 | 0.57 | 0.70 | 0.63 |
| 5 | IRISAI | 0.67 | 0.58 | 0.67 | 0.62 |
| 6 | Arontier_HYY | 0.72 | 0.52 | 0.73 | 0.61 |
| 7 | KDE | 0.62 | 0.51 | 0.75 | 0.61 |
| 8 | joangibert14 | 0.66 | 0.48 | 0.78 | 0.60 |
| 9 | VISILAB | 0.63 | 0.51 | 0.73 | 0.60 |
| 10 | MIRL | 0.50 | 0.40 | 1.00 | 0.57 |
| 11 | aetherAI | 0.66 | 0.49 | 0.67 | 0.57 |
| 12 | NCIC | 0.63 | 0.52 | 0.62 | 0.56 |
| 13 | biocenas | 0.57 | 0.46 | 0.53 | 0.50 |
| 14 | HEROH | 0.59 | 0.46 | 0.53 | 0.49 |
| 15 | Reza Mohebbian | 0.61 | 0.51 | 0.43 | 0.47 |
| 16 | mindmork | 0.63 | 0.53 | 0.38 | 0.45 |
| 17 | Institute of Pathology Graz | 0.63 | 0.50 | 0.38 | 0.43 |
| 18 | katherandco | 0.44 | 0.44 | 0.40 | 0.42 |
| 19 | QUILL | 0.63 | 0.50 | 0.33 | 0.40 |
| 20 | HEROHE_Challenge | 0.48 | 0.37 | 0.27 | 0.31 |
| 21 | UC-CSSE | 0.47 | 0.31 | 0.27 | 0.29 |
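The final ranking is by F1 score, the harmonic mean of precision and recall. A minimal sketch of the metric, spot-checked against the winning row of the table above (Macaroon: precision 0.57, recall 0.83):

```python
# F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R).
def f1_score(precision: float, recall: float) -> float:
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Spot-check against the winning entry (Macaroon: P = 0.57, R = 0.83).
print(round(f1_score(0.57, 0.83), 2))  # -> 0.68
```

Note that the harmonic mean penalizes imbalance: MIRL's perfect recall (1.00) cannot compensate for its low precision (0.40), yielding only 0.57.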
Classification of the HEROHE Challenge when the best possible threshold for the test dataset is used.
| Rank | Team | Threshold | F1 Score |
|---|---|---|---|
| 1 | Piaz | 0.39 | 0.73 |
| 2 | MITEL | 0.37 | 0.70 |
| 3 | Dratur | 0.34 | 0.69 |
| 4 | IRISAI | 0.39 | 0.68 |
| 5 | Macaroon | 0.01 | 0.68 |
| 6 | Arontier_HYY | 0.17 | 0.66 |
| 7 | VISILAB | 0.10 | 0.65 |
| 8 | KDE | 0.26 | 0.63 |
| 9 | katherandco | 0.83 | 0.62 |
| 10 | QUILL | 0.23 | 0.62 |
| 11 | aetherAI | 0.17 | 0.60 |
| 12 | HEROH | 0.12 | 0.60 |
| 13 | joangibert14 | 0.50 | 0.60 |
| 14 | biocenas | 0.23 | 0.59 |
| 15 | Institute of Pathology Graz | 0.42 | 0.59 |
| 16 | mindmork | 0.07 | 0.59 |
| 17 | NCIC | 0.49 | 0.59 |
| 18 | Reza Mohebbian | 0.01 | 0.58 |
| 19 | UC-CSSE | 0.02 | 0.58 |
| 20 | HEROHE_Challenge | 0.00 | 0.57 |
| 21 | MIRL | 0.00 | 0.57 |
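The table above re-ranks teams after choosing, for each model, the decision threshold that maximizes F1 on the test set. A generic sketch of such a post hoc threshold sweep on hypothetical model scores (the organizers' exact procedure may differ):

```python
import numpy as np

def best_f1_threshold(y_true, scores):
    """Sweep every observed score as a candidate threshold and
    return the (threshold, F1) pair that maximizes F1."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    best_t, best_f1 = 0.0, -1.0
    for t in np.unique(s):
        pred = s >= t
        tp = np.sum(pred & (y == 1))
        fp = np.sum(pred & (y == 0))
        fn = np.sum(~pred & (y == 1))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy example with hypothetical slide-level scores.
y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.6, 0.55])
t, f1 = best_f1_threshold(y, s)  # t = 0.6, F1 ≈ 0.857
```

Because this threshold is tuned on the test labels themselves, the resulting F1 scores are optimistic; the analysis is meant to show how sensitive the ranking is to threshold choice, not to report deployable performance.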
Figure 8. ROC curves for the methods proposed by the six best teams in the test dataset.
Figure 9. Precision–recall curves for the methods proposed by the six best teams in the test dataset.
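The AUC values reported throughout are areas under ROC curves like those in Figure 8. A threshold-free way to compute ROC AUC is the rank-sum (Mann–Whitney U) identity, sketched below; this is equivalent to integrating the ROC curve:

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC via the Mann–Whitney identity:
    AUC = P(score of a random positive > score of a random negative),
    counting ties as 0.5."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical scores: imperfect but better-than-chance separation.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # -> 0.75
```

This identity explains why AUC is insensitive to the decision threshold, and hence why AUC-based and F1-based rankings of the teams can disagree.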
Classification of the HEROHE Challenge in the subset of equivocal cases by IHC (score of 2+).
| Team | AUC | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Macaroon | 0.84 | 0.75 | 0.84 | 0.79 |
| Arontier_HYY | 0.88 | 0.67 | 0.81 | 0.73 |
| MITEL | 0.85 | 0.74 | 0.72 | 0.73 |
| Dratur | 0.85 | 0.71 | 0.75 | 0.73 |
| IRISAI | 0.85 | 0.72 | 0.72 | 0.72 |
| KDE | 0.77 | 0.67 | 0.75 | 0.71 |
| Piaz | 0.84 | 0.79 | 0.59 | 0.68 |
| VISILAB | 0.77 | 0.64 | 0.66 | 0.65 |
| NCIC | 0.70 | 0.58 | 0.69 | 0.63 |
| biocenas | 0.71 | 0.61 | 0.63 | 0.62 |
| aetherAI | 0.77 | 0.53 | 0.72 | 0.61 |
| QUILL | 0.78 | 0.79 | 0.47 | 0.59 |
| joangibert14 | 0.70 | 0.46 | 0.72 | 0.56 |
| MIRL | 0.50 | 0.38 | 1.00 | 0.55 |
| Reza Mohebbian | 0.64 | 0.52 | 0.47 | 0.49 |
| Institute of Pathology Graz | 0.70 | 0.50 | 0.47 | 0.48 |
| HEROH | 0.63 | 0.46 | 0.50 | 0.48 |
| mindmork | 0.61 | 0.43 | 0.31 | 0.36 |
| katherandco | 0.32 | 0.67 | 0.25 | 0.36 |
| UC-CSSE | 0.61 | 0.42 | 0.31 | 0.36 |
| HEROHE_Challenge | 0.50 | 0.37 | 0.22 | 0.27 |
Figure 10. ROC curves for the methods proposed by the six best teams in the subset of equivocal cases by IHC (score of 2+).
Figure 11. Precision–recall curves for the methods proposed by the six best teams in the subset of equivocal cases by IHC (score of 2+).
Main characteristics of the submitted methods. “Approach” lists the main methods used to classify the WSI; “pre-trained” indicates whether transfer learning was used; “ensemble” indicates whether the method uses one or multiple models and, if multiple, their number; “external sets” lists the external datasets used for pre-training; “input size” gives the size, in pixels, of the images or tiles required by the model (WSI signifies that the entire WSI was input into the model at once).
| Rank | Team | Approach | Pre-Trained | Ensemble | External Sets | Input Size |
|---|---|---|---|---|---|---|
| 1 | Macaroon | ResNet34 | yes | 2 | CAMELYON16 | 256 × 256 |
| 2 | MITEL | DenseNet201 + ResNet152 | yes | 2 | ImageNet + BACH | 512 × 512 |
| 3 | Piaz | EfficientNetB0 | yes | x | BACH | 222 × 222 |
| 4 | Dratur | EfficientNetB2 + EfficientNetB4 + | yes | 5 | ImageNet | 256 × 256 |
| 5 | IRISAI | U-Net + ResNet50 | no + yes | 2 | ImageNet | 256 × 256 |
| 6 | Arontier_HYY | EfficientNetB1 + EfficientNetB3 + EfficientNetB5 + LSTM | no | 4 | x | 1024 × 1024 + 480 × 840 + 912 × 912 |
| 7 | KDE | Custom + InceptionV3 | no | 3 | x | 128 × 128 |
| 8 | joangibert14 | ResNet101 | yes | x | — | 224 × 224 |
| 9 | VISILAB | SE-ResNet50 | no | x | x | 299 × 299 |
| 10 | MIRL | DenseNet201 | yes | x | ImageNet | 9192 × 9192 |
| 11 | aetherAI | Custom, based on ResNet50 v2 | no | x | x | WSI re-scaled to 10,000 × 10,000 |
| 12 | NCIC | ResNet101 + ResNet50 | yes | 2 | ImageNet | 1024 × 1024 |
| 13 | biocenas | Custom CNN model | no | 3 | x | 32 × 32 |
| 14 | HEROH | ResNet18 + ResNet50 | yes | 2 | ImageNet | 128 × 128 |
| 15 | Reza Mohebbian | Custom (non-Deep Learning) | no | x | x | WSI |
| 16 | mindmork | K-means + U-Net + Xception | no | 3 | x | 256 × 256 |
| 17 | Institute of Pathology Graz | QuPath for color deconvolution and feature extractor + Custom CNN | no | 2 | x | WSI |
| 18 | katherandco | QuPath for tumor segmentation + ResNet50 | no | x | ImageNet | 512 × 512 |
| 19 | QUILL | SuperPixel patch splitting + DenseNet + Mean Shift Clustering | no | 2 | x | WSI |
| 20 | HEROHE_Challenge | Custom CNN + K-means + XGBoost | yes | 3 | CIFAR-10 | 200 × 200 |
| 21 | UC-CSSE | Xception + DenseNet169 + ResNet34 + ResNet101 + random forest + extra trees + gradient boosting | yes | 7 | CAMELYON16 + Data Science Bowl 2018 | 299 × 299 |
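Despite the variety of backbones in the table, most submissions follow the same pattern: tile the WSI into fixed-size patches, score each tissue tile with a CNN, and aggregate the tile scores into one slide-level prediction. A minimal NumPy sketch of the aggregation step only (the tile scores here are hypothetical; the actual teams produced them with trained CNNs such as ResNet34 on 256 × 256 tiles):

```python
import numpy as np

def aggregate_tile_scores(tile_probs, top_k=None):
    """Aggregate per-tile HER2-positive probabilities into one slide score.

    tile_probs: 1-D sequence of CNN outputs, one per tissue tile.
    top_k: if given, average only the k highest-scoring tiles (a common
           choice when only part of the slide is tumor); otherwise mean-pool
           over all tiles.
    """
    probs = np.asarray(tile_probs, dtype=float)
    if top_k is not None:
        probs = np.sort(probs)[-top_k:]
    return float(probs.mean())

# Hypothetical tile scores for one slide: mostly benign-looking tiles
# plus a few strongly positive ones.
tiles = [0.1, 0.05, 0.2, 0.9, 0.85, 0.15]
slide_score_mean = aggregate_tile_scores(tiles)           # diluted by benign tiles
slide_score_top2 = aggregate_tile_scores(tiles, top_k=2)  # focuses on suspect tiles
```

The gap between the two aggregations illustrates why several teams used attention or top-k pooling rather than plain averaging: when the positive signal is confined to a small tumor region, mean pooling over the whole slide washes it out.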