| Literature DB >> 35551217 |
Wenqi Lu1, Islam M Miligy2,3, Fayyaz Minhas1, Young Saeng Park4, David R J Snead5, Emad A Rakha2, Clare Verrill6,7,8, Nasir Rajpoot9,10.
Abstract
Due to COVID-19 outbreaks, most school pupils have had to be home-schooled for long periods of time. Two editions of a web-based competition, "Beat the Pathologists", were run for school-age participants in the UK to occupy pupils' spare time during home-schooling and to evaluate their ability to contribute to AI annotation. The two editions asked the participants to annotate different types of cells on Ki67-stained breast cancer images. The Main competition comprised four levels of increasing complexity. We obtained annotations of four kinds of cells from school pupils, with ground truth provided by expert pathologists. In this paper, we analyse the school pupils' performance on differentiating the different kinds of cells and compare it with two neural networks (AlexNet and VGG16). We observed that children tend to perform very well on tumour cell annotation, with a best F1 score of 0.81 (a metric that takes both false positives and false negatives into account). Lower accuracy was achieved on positive non-tumour cells (F1 score 0.75) and negative non-tumour cells (F1 score 0.59). The neural networks achieved superior performance on non-tumour cell detection: VGG16 trained from scratch achieved an F1 score above 0.70 in all cell categories and 0.92 in tumour cell detection. We conclude that non-experts such as school pupils have the potential to contribute to large-scale labelling for AI algorithm development if sufficient training activities are organised. We hope that competitions like this can promote public interest in pathology and encourage more non-experts to participate in annotation.
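The abstract scores annotators with the F1 measure. As a minimal sketch of how it combines false positives and false negatives (the detection counts below are hypothetical, not taken from the paper):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall.

    Precision penalises false positives, recall penalises false
    negatives, so F1 accounts for both error types at once.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotator counts on one cell category:
# 85 cells correctly detected, 20 spurious, 20 missed.
print(round(f1_score(tp=85, fp=20, fn=20), 2))  # → 0.81
```
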
Year: 2022 PMID: 35551217 PMCID: PMC9098471 DOI: 10.1038/s41598-022-11782-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1 (a) One Ki67-stained WSI where one tumour region is zoomed in. (b) Example images of each level in the Pilot competition. (c) Example images of each level in the Main competition (red: Positive Tumour or PT; green: Negative Tumour or NT; yellow: Positive Non-Tumour or PNT; blue: Negative Non-Tumour or NNT).
Setup of the pilot and main competition.
| | Launch time | No. of levels | No. of images | No. of cell categories |
|---|---|---|---|---|
| Pilot practice | 3rd August 2020–6th September 2020 | 1 | 50 | 4 in all images |
| Pilot competition | 3rd August 2020–6th September 2020 | 3 (Mild, Hot, Spicy) | 20, 30, 50 respectively | 4 in all images |
| Main practice | 17th October 2020–16th November 2020 | 2 | 10 | 1 in 5 images and 2 in another 5 images |
| Main competition | 17th October 2020–16th November 2020 | 4 (Mild, Hot, Spicy, Supercharger) | 20, 40, 60, 80 respectively | 1, 2, 3, 4 respectively |
Figure 2 Illustrations of (a) transfer learning: a neural network is pretrained on ImageNet and subsequently trained on Ki67 cell images to perform 4-class classification; (b) training from scratch: the complete network is trained from the first layer using the Ki67 cell images (PT: Positive Tumour; NT: Negative Tumour; PNT: Positive Non-Tumour; NNT: Negative Non-Tumour).
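Figure 2's two training strategies can be contrasted in a framework-free toy sketch. The layer names and the freezing scheme below are illustrative assumptions, not the paper's setup; one common transfer-learning variant freezes the early pretrained layers and fine-tunes only the later ones, whereas training from scratch updates every layer:

```python
# Toy model: an ordered list of layer names.
LAYERS = ["conv1", "conv2", "conv3", "fc", "classifier"]

def transfer_learning(layers, n_frozen):
    """Pretrained weights assumed loaded (e.g. from ImageNet);
    freeze the first n_frozen layers and fine-tune the rest on
    the target Ki67 cell images. Returns name -> trainable flag."""
    return {name: i >= n_frozen for i, name in enumerate(layers)}

def from_scratch(layers):
    """Randomly initialised weights; every layer is trained on
    the Ki67 cell images from the first layer onward."""
    return {name: True for name in layers}

tl = transfer_learning(LAYERS, n_frozen=3)
fs = from_scratch(LAYERS)
print([n for n, trainable in tl.items() if trainable])  # → ['fc', 'classifier']
print(all(fs.values()))                                 # → True
```
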
Figure 3 (a) Percentage of different kinds of cells in the four competition levels: Mild, Hot, Spicy and Supercharger. (b) Percentage of participants who passed the current level and joined the next level. (c) Accuracy of all participants across the competition levels.
Figure 4 Examples of cell annotation results by the pathologist and the three participants who achieved the top three accuracies at the Supercharger level (Positive Tumour: red; Negative Tumour: green; Positive Non-Tumour: yellow; Negative Non-Tumour: blue). The annotations by the pathologist were used as ground truth.
F1 score on cell detection by participants who ranked top three in each level.
| Level | Cell | F1 score (1st) | F1 score (2nd) | F1 score (3rd) |
|---|---|---|---|---|
| Mild | PT | 0.90 | 0.89 | 0.89 |
| Hot | PT | 0.88 | 0.88 | 0.87 |
| Hot | NT | 0.84 | 0.84 | 0.82 |
| Spicy | PT | 0.84 | 0.84 | 0.81 |
| Spicy | NT | 0.82 | 0.80 | 0.80 |
| Spicy | PNT | 0.78 | 0.72 | 0.74 |
| Supercharger | PT | 0.82 | 0.78 | 0.74 |
| Supercharger | NT | 0.80 | 0.80 | 0.81 |
| Supercharger | PNT | 0.75 | 0.68 | 0.61 |
| Supercharger | NNT | 0.59 | 0.60 | 0.61 |
Figure 5 Distribution of F1 scores among the different cell categories and competition levels. (a) Mild level; (b) Hot level; (c) Spicy level; (d) Supercharger level. (PT: Positive Tumour; NT: Negative Tumour; PNT: Positive Non-Tumour; NNT: Negative Non-Tumour).
F1 score on cell detection using different neural networks and strategies.
| Network | Pre-trained | F1 (PT) | F1 (NT) | F1 (PNT) | F1 (NNT) |
|---|---|---|---|---|---|
| AlexNet | Yes (ImageNet) | 0.78 | 0.82 | 0.34 | 0.44 |
| AlexNet | No | 0.88 | 0.93 | 0.69 | 0.71 |
| VGG16 | Yes (ImageNet) | 0.72 | 0.82 | 0.34 | 0.48 |
| VGG16 | No | | | | |
Significant values are in bold.
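A quick, hedged way to compare the two training strategies is to average each network's per-category F1 scores. The values below are copied from the table; the scratch-trained VGG16 row is omitted because its scores are missing from this extract:

```python
# Per-category F1 scores (PT, NT, PNT, NNT) copied from the table above.
results = {
    ("AlexNet", "pretrained"):   [0.78, 0.82, 0.34, 0.44],
    ("AlexNet", "from scratch"): [0.88, 0.93, 0.69, 0.71],
    ("VGG16",   "pretrained"):   [0.72, 0.82, 0.34, 0.48],
}

# Mean F1 across the four cell categories for each network/strategy.
mean_f1 = {key: sum(scores) / len(scores) for key, scores in results.items()}

# Of the rows available here, training from scratch gives the best average.
print(max(mean_f1, key=mean_f1.get))  # → ('AlexNet', 'from scratch')
```

The gap is driven mostly by the non-tumour categories (PNT, NNT), where the ImageNet-pretrained models score far lower.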
Figure 6 Comparison between the F1 scores achieved by participants and those achieved by neural networks trained from scratch at the Supercharger level. The purple triangle represents the results using VGG16, while the purple square represents the results using AlexNet.