
Accuracy and efficiency of an artificial intelligence tool when counting breast mitoses.

Liron Pantanowitz¹,², Douglas Hartman³, Yan Qi⁴, Eun Yoon Cho⁵, Beomseok Suh⁶, Kyunghyun Paeng⁶, Rajiv Dhir³, Pamela Michelow⁷, Scott Hazelhurst⁸, Sang Yong Song⁵, Soo Youn Cho⁵.

Abstract

BACKGROUND: The mitotic count in breast carcinoma is an important prognostic marker. Unfortunately, substantial inter- and intra-laboratory variation exists when pathologists manually count mitotic figures. Artificial intelligence (AI) coupled with whole slide imaging offers a potential solution to this problem. The aim of this study was accordingly to critique an AI tool developed to quantify mitotic figures in whole slide images of invasive breast ductal carcinoma.
METHODS: A representative H&E slide from 320 breast invasive ductal carcinoma cases was scanned at 40x magnification. Ten expert pathologists from two academic medical centers labeled mitotic figures in whole slide images to train and validate an AI algorithm to detect and count mitoses. Thereafter, 24 readers of varying expertise were asked to count mitotic figures with and without AI support in 140 high-power fields derived from a separate dataset. Their accuracy and efficiency of performing these tasks were calculated and statistical comparisons performed.
RESULTS: For each experience level the accuracy, precision and sensitivity of counting mitoses by users improved with AI support. Twenty-one readers (87.5%) identified more mitoses using AI support and 13 readers (54.2%) decreased the quantity of falsely flagged mitoses with AI. Most participants spent more time on this task when not provided with AI support. AI assistance resulted in an overall time savings of 27.8%.
CONCLUSIONS: This study demonstrates that pathology end-users were more accurate and efficient at quantifying mitotic figures in digital images of invasive breast carcinoma with the aid of AI. Higher inter-pathologist agreement with AI assistance suggests that such algorithms can also help standardize practice. Not surprisingly, there is much enthusiasm in pathology regarding the prospect of using AI in routine practice to perform mundane tasks such as counting mitoses.


Keywords: Artificial intelligence; Breast; Carcinoma; Counting; Digital pathology; Informatics; Mitosis; Tumor grade; Whole slide imaging


Year: 2020 PMID: 32622359 PMCID: PMC7335442 DOI: 10.1186/s13000-020-00995-z

Source DB: PubMed Journal: Diagn Pathol ISSN: 1746-1596 Impact factor: 2.644

Background

Handling breast cancer specimens is common in pathology practice. Rendering a pathology report after processing these specimens not only requires an accurate diagnosis, but in the case of invasive carcinoma also requires pathologists to assign the correct histologic tumor grade. A key component of the Nottingham (or modified Scarff-Bloom-Richardson) grading system for invasive breast carcinoma is the mitotic count [1]. A mitotic count per 10 high-power fields (HPFs) of 0–7 is scored 1, 8–15 is scored 2, and greater than or equal to 16 is given a score of 3. This proliferation activity in breast carcinoma is an important prognostic marker [2]. Some studies have shown that the mitotic count is an even better marker than Ki67 (proliferation index) at selecting patients for certain therapy such as tamoxifen [3]. Counting mitotic figures in hematoxylin and eosin (H&E) stained histology sections is a task typically performed by pathologists while they visually examine a glass slide using a conventional light microscope. Unfortunately, there is substantial inter- and intra-laboratory variation with manual grading of breast cancer in routine pathology practice [4]. This is not surprising, as manually counting mitotic figures by pathologists is subjective and suffers from low reproducibility. Manually counting mitoses can take a pathologist around 5–10 min to perform [5]. Sometimes it may be difficult to discern a mitotic figure from a cell undergoing degeneration, apoptosis or necrosis. There are also differences of opinion on how best to count mitotic figures [6, 7]. The reason for this controversy is that the mitotic activity index depends on the number of mitoses counted in a predefined area (usually in mm²) or within a certain number of HPFs that may vary depending on a microscope’s lenses and widefield microscopy view. Artificial intelligence (AI) coupled with whole slide imaging offers a potential solution to the aforementioned problem.
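The scoring thresholds quoted above can be written as a small lookup. The following is a minimal sketch that assumes only the cut-offs stated in this section (0–7, 8–15, 16 or more per 10 HPFs); the function name is hypothetical, and in practice the cut-offs depend on the field area of the microscope used.

```python
def mitotic_score(mitoses_per_10_hpf: int) -> int:
    """Map a mitotic count per 10 HPFs to a score of 1, 2 or 3.

    Cut-offs are the ones quoted in the text; they are not adjusted
    for field diameter, which a real grading workflow would require.
    """
    if mitoses_per_10_hpf < 0:
        raise ValueError("mitotic count cannot be negative")
    if mitoses_per_10_hpf <= 7:
        return 1
    if mitoses_per_10_hpf <= 15:
        return 2
    return 3


# Example: counts on either side of each boundary.
for count in (7, 8, 15, 16):
    print(count, mitotic_score(count))
```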
If developed and deployed successfully, an AI-based tool could potentially automate the task of counting mitotic figures in breast carcinoma with better accuracy and efficiency. To date, investigators have validated that making a histopathologic diagnosis in breast specimens can be reliably performed on a whole slide image (WSI) [8]. Moreover, using WSIs to manually count mitoses in breast cancer is reported to be reliable and reproducible [9, 10]. Hanna et al. showed that counting mitotic figures in WSIs outperformed counts using glass slides, although readers took longer using WSIs [11]. Several studies have been published showing that digital image analysis can successfully automate the quantification of mitoses [12-18]. Clearly, there is great potential for leveraging digital pathology and AI [19]. AI can benefit pathologists practicing in high, middle and low income countries, especially with the rise in cancer and shortage of anatomical pathologists [20]. However, AI applications in healthcare have not been rigorously validated for reproducibility and generalizability, or in the clinical setting [21]. Moreover, hardly any pathology laboratories are currently using AI tools on a routine basis. To the best of our knowledge, there have been no studies addressing whether an AI-based algorithm actually improves pathologist accuracy and efficiency when scoring mitotic figures. The aim of this study was accordingly to critique an AI tool developed to detect and quantify mitotic figures in breast carcinoma.

Methods

Figure 1 depicts a flow chart of the methodology and datasets employed in developing and validating the AI-based tool utilized in this study to quantify mitotic figures in digital images of invasive breast carcinoma.

Fig. 1

Flow chart of the methodology and datasets employed in developing and validating an AI-based tool to quantify mitoses in breast carcinoma

Datasets

A total of 320 invasive breast ductal carcinoma cases with an equal distribution of grades were selected. Half of these cases were from the archives of the University of Pittsburgh Medical Center (UPMC) in the USA and the rest obtained from Samsung Medical Center (SMC) in Seoul, South Korea. Nearly all of the cases were from females (1 case was from a male with breast cancer). The average patient age was 54.7 years. All cases included were mastectomies with the following range of tumor stages: stage IA (23.6%), IB (7.1%), IIA (31.4%), IIB (23.6%), IIC (0.7%), IIIA (6.4%), IV (0.7%), and data unavailable in 9 cases (6.4%). Table 1 provides a summary of the cancer grade, hormone receptor and HER2 status for enrolled cases (with available data). The average Ki-67 index was 38.3% (Mdn = 34.5%, range 3.0–99.0%). This result was only available in 80 cases, and this subset of cases had higher mitosis scores (n = 23 score 2, n = 48 score 3) and Nottingham grades (n = 34 grade 2, n = 41 grade 3). The average proliferation index was accordingly skewed in this subset and higher than would be expected for a typical mixed breast cancer population [22].

Table 1

Profile of invasive ductal carcinoma cases enrolled in the study

Reported breast carcinoma parameters		%
Mitosis Score	1	21.4%
	2	31.4%
	3	47.1%
Nottingham Grade	1	7.9%
	2	46.4%
	3	45.7%
ER	Not available	5.0%
	Negative	25.7%
	Positive	69.3%
PR	Not available	5.0%
	Negative	32.1%
	Positive	62.9%
HER2/neu (IHC status)	Not available	5.0%
	Negative	59.3%
	Equivocal	9.3%
	Weakly positive	1.4%
	Positive	25.0%
HER2/neu (FISH status)	Not available	89.3%
	Negative	10.0%
	Positive	0.7%

ER estrogen receptor, FISH fluorescence in situ hybridization, HER2 human epidermal growth factor receptor 2, IHC immunohistochemistry, PR progesterone receptor

A representative H&E glass slide from each case was scanned. At UPMC slides were scanned at 40x magnification (0.25 μm/pixel resolution) using an Aperio AT2 scanner (Leica Biosystems Inc., Buffalo Grove, IL, USA). At SMC slides were digitized at 40x magnification (0.2 μm/pixel resolution) using a 3D Histech P250 instrument (3DHISTECH, Budapest, Hungary). All acquired whole slide image (WSI) files were de-identified. The AI training dataset comprised 60 WSIs from UPMC and 60 WSIs from SMC, which provided 16,800 grids (1 grid = ¼ high-power field [HPF]). One HPF is equivalent to 0.19 mm². The AI validation dataset, comprising another 30 WSIs from UPMC and 30 WSIs from SMC, was used to generate 120 HPFs for annotation. A separate dataset (70 WSIs from UPMC and 70 WSIs from SMC) was subsequently used for a reader study where each WSI file was randomly broken up into 140 representative digital patches (HPFs). Users interacted with individual patches on a computer monitor. The dataset used for analytical validation of the algorithm was different from the dataset selected for the clinical validation study.
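The dataset sizes above can be related to tissue area with some simple arithmetic. This is a rough sketch that uses only the figures given in this section (0.19 mm² per HPF, four grids per HPF, and the two scanner resolutions); it assumes square pixels, and the function names are illustrative rather than part of the study's tooling.

```python
HPF_AREA_MM2 = 0.19  # stated in the text: one HPF is 0.19 mm²
GRIDS_PER_HPF = 4    # stated in the text: 1 grid = 1/4 HPF


def hpf_equivalents(n_grids: int) -> float:
    """Number of HPF-sized areas covered by n_grids grids."""
    return n_grids / GRIDS_PER_HPF


def tissue_area_mm2(n_grids: int) -> float:
    """Total annotated area in mm² for n_grids grids."""
    return hpf_equivalents(n_grids) * HPF_AREA_MM2


def pixels_per_hpf(um_per_pixel: float) -> float:
    """Approximate pixel count of one HPF at a given scan resolution.

    Assumes square pixels (an assumption, not stated in the text).
    """
    area_um2 = HPF_AREA_MM2 * 1_000_000  # 1 mm² = 1,000,000 μm²
    return area_um2 / (um_per_pixel ** 2)


print(hpf_equivalents(16_800), tissue_area_mm2(16_800))
print(pixels_per_hpf(0.25), pixels_per_hpf(0.2))
```

Under these assumptions the 16,800 training grids correspond to 4,200 HPF equivalents, or roughly 798 mm² of tissue, and one HPF spans about 3.04 million pixels at 0.25 μm/pixel versus about 4.75 million at 0.2 μm/pixel.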

Training (deep learning algorithm)

A deep learning algorithm (Lunit Inc., Seoul, South Korea) was employed for the automated detection of mitoses in digital images [23]. The AI algorithm was trained on an independent dataset that consisted of 16,800 digital image patches from 120 WSIs (half from UPMC and half from SMC). Three expert pathologists annotated mitoses to construct the ground truth for training. Mitotic figures on which at least two of these pathologists agreed were used to train the AI algorithm. The algorithm was based on Faster R-CNN [24] with a pre-trained ResNet-101 [25] backbone network. The downsampling ratio was 8, and feature maps from the first stage were cropped and resized to 14 × 14 and then max pooled to 7 × 7 for the second stage classifier. Anchor size was 128 × 128 with a single fixed ratio. The number of proposals at the first stage was 2000 to enable a very dense sampling of proposal boxes. Then, non-maximum suppression (NMS) based on box intersection over union (IoU) was performed for post-processing. Various input data augmentation methods such as contrast, brightness, jittering, flip and rotation were applied to build a robust AI algorithm. To select the final model for our reader study, the deep learning algorithm was validated on a separate dataset. On the validation dataset we achieved a mean average precision (mAP) of 0.803, which demonstrates good performance. The mAP represents the area under the precision recall curve. A precision recall curve was used to calculate the mAP instead of the area under the receiver operating characteristic curve (AUC), because of the large class imbalance (i.e., many non-mitotic cells).
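The post-processing step mentioned above, suppression of overlapping boxes based on their intersection over union, can be illustrated with the standard greedy procedure. This is a generic sketch, not the vendor's implementation; the 0.5 overlap threshold and the function names are assumptions, since the text does not report the value used.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    Visits boxes from highest to lowest score and keeps a box only if
    its IoU with every already-kept box is at or below the threshold.
    Returns the indices of the kept boxes. The threshold of 0.5 is an
    illustrative assumption.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping candidates and one separate candidate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

In the example the second box overlaps the first with an IoU of about 0.68, so only the first and third boxes survive.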

Ground truth

Seven expert pathologists (4 from UPMC and 3 from SMC) annotated (labeled) mitotic figures in 140 digital image patches using a web-based annotation tool. The tool displayed image patches of breast carcinoma at high magnification, in which clicking on cells automatically generated a square box that annotated the specified cell (i.e. with the mitotic figure present). It required around 10 s to annotate mitotic figures per patch. Pathologist consensus was used to establish ground truth, where agreement of at least 4/7 pathologists was required for each image. Whilst there are no published data available to support the exact number of pathologists required to be in agreement to reach consensus, a consensus of 4 out of 7 was chosen for this study in order to utilize the highest number of cases (n = 93, 66.4%) while maintaining consensus among the majority of ground truth makers (57.1%). Table S1 shows the number of cases for each consensus level. Further, for 100% agreement the mitotic figures would likely be very obvious and thus too easy to detect, which would not be suitable to measure performance. Since prior studies have shown that WSI can be used for mitotic cell detection and offers similar reproducibility to the microscope [10, 26], we opted to use WSI and not glass slides for establishing the ground truth in this study. Pathologists who annotated slides for ground truth generation did not participate in the subsequent reader study.
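The majority rule described above can be sketched as a simple vote count. This illustration assumes that annotations from different pathologists have already been matched to shared cell identifiers, which the text does not describe (real click coordinates would first need spatial matching); the function name and identifiers are hypothetical.

```python
from collections import Counter


def consensus_cells(annotations, min_votes=4):
    """Return the cells marked by at least min_votes annotators.

    annotations: one set of cell identifiers per pathologist.
    min_votes=4 mirrors the 4-of-7 rule quoted in the text.
    """
    votes = Counter(cell for marked in annotations for cell in marked)
    return {cell for cell, n in votes.items() if n >= min_votes}


# Seven annotators; cells "a" and "b" get 4 votes each, "c" gets 3.
annotations = [
    {"a", "b"}, {"a", "b"}, {"a"}, {"a", "c"},
    {"b"}, {"b", "c"}, {"c"},
]
print(sorted(consensus_cells(annotations)))
```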

Observer performance test (OPT)

For the OPT (reader study), the accuracy and efficiency of mitotic cell detection was compared based on mitotic figure scores provided by humans and the AI algorithm. There were 12 readers at each institution (total of 24 reviewers) that varied in expertise/years of experience (n = 6 2nd-4th year pathology residents/registrars, n = 3 fellows/post-residency trainees, and n = 3 board-certified pathologists). Table S2 summarizes the experience level of all participants involved in the study. Digital slides were presented to test takers in the form of 140 HPFs. Each HPF was equivalent to four digital image patches. There were two reader groups. In group 1 (no AI), readers were first shown HPFs and asked to manually select mitotic figures without AI support. In group 2 (with AI), readers were first shown HPFs where mitotic figures were pre-marked by the AI tool (Figure 2) and asked to accept/reject the algorithm’s selection. Each group repeated this task, but now with/without AI employing a cross-over design to minimize sequential confounding bias. A washout period of 4 weeks was used to control for recall bias between re-reviews of each image. A web-based tool recorded user clicks on images and their time (in seconds) to perform this task. The OPT was replicated at UPMC and SMC institutions. All readers were trained prior to the start of the study, anonymized, and provided informed consent to participate. The readers were not formally asked to provide feedback about their user experience.

Fig. 2

Web-based tool showing a HPF of breast carcinoma. a Screenshot of the web-based tool used for the observer performance test without AI. The small green dots indicate mitotic figures marked by the reader. b Screenshot of the web-based tool used for the observer performance test with AI. The green boxes indicate mitotic figures detected by AI

Statistical Analysis

Accuracy of mitotic cell detection was calculated by comparing cells identified by reviewers to cells identified by the ground truth (i.e. consensus of at least 4 of the 7 ground truth makers). Accuracy was compared for reviews with and without AI support for each reviewer. The hypothesis being tested was that reviewer accuracy improves with AI support. To test this hypothesis a Pearson chi-square analysis was performed. For the OPT part of this study, true positive (TP), false positive (FP) and false negative (FN) counts were calculated with and without AI support. Precision for pathologists was calculated as TP / (TP + FP). Sensitivity was calculated as TP / (TP + FN). As true negatives (TN) represented not only cells, but also all of the white space where no cells were present in an image, TN greatly outnumber the combination of TP + FP + FN and therefore f-scores were calculated (f-score = 2 * (sensitivity * precision) / (sensitivity + precision)). F-scores closer to 1 indicate better detection and precision, with 1 representing perfect performance. Since TN were not calculated, specificity could not be calculated. Efficiency was calculated as seconds spent reviewing each case. The normality of the distribution of the time variable was examined using the Shapiro–Wilk normality test. As the data were not normally distributed, non-parametric statistical tests were used. The Wilcoxon signed-rank test was used to compare time spent on the task of counting mitoses with and without AI support. We assumed that image reviews lasting longer than 10 min were outliers (e.g. indicative of an interruption) and these were thus excluded. Out of the 6720 values in the dataset, 73 (1.1%) were accordingly excluded from analysis. Statistical comparisons were performed for time spent per case with and without AI support for each individual, for each user’s experience level, and overall. Statistical significance was assumed at p < .05. Analysis was performed using IBM SPSS Statistics 22 and Microsoft Excel 365.
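The three detection metrics defined above can be expressed directly from TP, FP and FN counts. The sketch below is illustrative only: the function names are hypothetical, and the example inputs are the overall counts reported in Table 3 of this article.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall) = TP / (TP + FN)."""
    return tp / (tp + fn)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    s = sensitivity(tp, fn)
    return 2 * (s * p) / (s + p)


# Overall counts from Table 3: (TP, FP, FN).
no_ai = (5596, 3587, 3572)
with_ai = (6666, 2899, 2502)

print(round(f_score(*no_ai), 2), round(f_score(*with_ai), 2))
```

With these inputs the f-scores round to 0.61 without AI support and 0.71 with AI support, in line with the values reported in the Results.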

Results

Accuracy and precision findings

A precision recall (PR) curve shows the algorithm’s performance (Figure 3). This PR curve shows the relationship between positive predictive value and sensitivity for every possible cut-off. Akin to the area under a ROC curve (i.e. AUC), the area under the PR curve is large indicating the high recall and precision value of the algorithm at specific cut-offs. Figure 4 shows the accuracy and precision of mitotic cell detection with and without the use of AI support. For each experience level the accuracy and precision were higher with AI support. Table 2 with Chi-square results confirmed that accurate mitotic cell detection was significantly higher with the use of AI support for each experience level. Table S3 shows the individual reviewer accuracy results. Of note, all but one reviewer had higher accuracy with the support of AI. Of the 23 reviewers with improved accuracy, 20 (87%) had a statistically significant increase. Table 3 demonstrates TP, FP and FN values for readers (Table S4 shows individual reviewer results). There were 21 out of the 24 readers (87.5%) that identified more mitoses using AI support. Further, 13 reviewers (54.2%) decreased the quantity of falsely flagged mitoses (FP) using AI support, and 21 (87.5%) decreased the quantity of mitoses that were missed (FN) using AI support. There were six reviewers that falsely detected 100 or more additional mitoses (FP) when screening cases without AI support. Table 3 shows that the number of FPs detected with the use of AI support (2899) is lower than without the use of AI support (3587).
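The area under a PR curve can be estimated from a ranked list of detections. The following sketch uses the non-interpolated average precision, which is one common definition; the text does not state which variant was used, so this choice and the function name are assumptions.

```python
def average_precision(scores, is_true_positive, n_ground_truth):
    """Non-interpolated average precision for ranked detections.

    scores: confidence of each detection.
    is_true_positive: whether each detection matches a ground truth cell.
    n_ground_truth: total number of ground truth mitoses, including
        any that were never detected.
    """
    ranked = sorted(zip(scores, is_true_positive),
                    key=lambda pair: pair[0], reverse=True)
    tp = 0
    precision_sum = 0.0
    for rank, (_, hit) in enumerate(ranked, start=1):
        if hit:
            tp += 1
            precision_sum += tp / rank  # precision at this recall step
    return precision_sum / n_ground_truth if n_ground_truth else 0.0


# Four detections, two correct, two ground truth mitoses in total.
print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], 2))
```

Sweeping the confidence cut-off from high to low traces the curve; each correct detection adds a recall step weighted by the precision at that point.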

Fig. 3

Algorithm performance for mitotic figure detection in the analytical validation dataset

Fig. 4

Accuracy and precision with and without AI support per user experience level

Table 2

Accuracy by experience level

User Experience Level	No AI Support	With AI Support	Improved Accuracy with AI support?	χ² (degrees of freedom)	p-value
PGY-2 (n = 4)	36.8%	51.6%	Yes	89.30 (1)	<.001
PGY-3 (n = 4)	47.5%	58.4%	Yes	53.12 (1)	<.001
PGY-4 (n = 4)	38.6%	52.9%	Yes	87.13 (1)	<.001
Fellow (n = 6)	50.1%	57.1%	Yes	29.82 (1)	<.001
Faculty (n = 6)	43.1%	55.2%	Yes	89.84 (1)	<.001
Overall	43.9%	55.2%	Yes	320.61 (1)	<.001

PGY postgraduate year

Table 3

True positive (TP), false positive (FP), and false negative (FN) values for mitotic cell detection

User Experience Level	TP (no AI support)	FP (no AI support)	FN (no AI support)	TP (with AI support)	FP (with AI support)	FN (with AI support)
PGY-2 (n = 4)	749	509	779	1003	414	525
PGY-3 (n = 4)	1135	861	393	1208	539	320
PGY-4 (n = 4)	793	525	735	1149	642	379
Fellow (n = 6)	1524	751	768	1659	611	633
Faculty (n = 6)	1395	941	897	1647	693	645
Overall	5596	3587	3572	6666	2899	2502

PGY postgraduate year
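The accuracy comparison can be approximated from the counts in Table 3. The sketch below makes two assumptions that are not stated in the text: that accuracy is TP / (TP + FP + FN), and that the Pearson chi-square test was run on a 2 × 2 table of correct (TP) versus incorrect (FP + FN) detections with and without AI support, without continuity correction. The function names are hypothetical.

```python
def accuracy(tp: int, fp: int, fn: int) -> float:
    """Assumed definition: TP / (TP + FP + FN), since TN were not counted."""
    return tp / (tp + fp + fn)


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]].

    No continuity correction is applied (an assumption).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Overall counts from Table 3: (TP, FP, FN).
no_ai = (5596, 3587, 3572)
with_ai = (6666, 2899, 2502)

print(round(accuracy(*no_ai), 3), round(accuracy(*with_ai), 3))
print(pearson_chi_square_2x2(no_ai[0], no_ai[1] + no_ai[2],
                             with_ai[0], with_ai[1] + with_ai[2]))
```

Under these assumptions the overall counts give accuracies of about 43.9% and 55.2% and a chi-square statistic of roughly 320.6 with 1 degree of freedom, which appears consistent with the overall row of Table 2.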

Sensitivity for mitotic cell detection increased with the use of AI support for each experience level (Table S5). Sensitivity for mitotic cell detection per individual reviewer was higher for all but 3 reviewers. Precision for mitotic cell detection also increased with the use of AI support for each experience level (Table S6). Sixteen of the 24 reviewers (66.7%) had increased precision with AI support. The f-score (Table S7) for mitotic cell detection without the use of AI support was 0.61, and with the use of AI support was 0.71. The higher f-score with the use of AI suggests that AI support improves overall precision and TP detection of mitotic cells. Cases with AI support also had higher f-scores for each experience level, with 23 of the 24 reviewers (95.8%) demonstrating a higher f-score with AI support. The datasets utilized included only the overall grade (i.e. sum of percent tubules, nuclear pleomorphism and mitoses/10 HPF) for all breast cancers and no details of the exact mitotic figures (i.e. score 1, 2 or 3) for each case. Therefore, we were unable to investigate whether any change in the number of mitoses scored in this study may have altered the grade.

Efficiency findings

A Wilcoxon signed-rank test indicated that more time was spent on detecting mitotic cells without the use of AI support (median = 36.00 s) than with AI support (median = 26.00 s), Z = −14.759, p < .001, r = .25. Overall, this represents a time savings of 27.8%. Irrespective of whether readers started counting mitoses with or without AI support, nearly all of them read faster with AI assistance, but this was not statistically different. Figure 5 shows the median time spent detecting mitoses with and without AI support by reader experience level. Regardless of experience level, most participants spent less time detecting mitotic cells with the use of AI support. Fellows had the largest decline, with a median of 44 s spent without the aid of AI compared to 16 s with AI support. The only experience level that had a longer median time spent with AI support was postgraduate year (PGY)-4 users. Table 4 summarizes the median time spent and statistical results per user’s experience level with and without AI support (Table S8 shows individual reviewer results).
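The reported time savings follows from the two overall medians. This is a small worked calculation, assuming the savings figure was derived as the relative drop in median seconds; the function names are illustrative.

```python
def percent_time_saved(median_without_s: float, median_with_s: float) -> float:
    """Relative reduction in median time, as a percentage."""
    return 100 * (median_without_s - median_with_s) / median_without_s


def minutes_saved_per_hour(median_without_s: float, median_with_s: float) -> float:
    """Minutes saved for every hour of unassisted counting."""
    return 60 * percent_time_saved(median_without_s, median_with_s) / 100


# Overall medians reported above: 36 s without AI, 26 s with AI.
print(round(percent_time_saved(36.0, 26.0), 1))
print(round(minutes_saved_per_hour(36.0, 26.0), 1))
```

A negative result from `percent_time_saved` would indicate slower reads with AI support, as for the PGY-4 medians (22.00 s without and 29.50 s with AI).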

Fig. 5

Median number of seconds spent with and without AI support per user experience level

Table 4

Median time to count mitoses by study participant experience level

User Experience Level	Median # of seconds (no AI support)	Median # of seconds (with AI support)	AI or no AI faster?	Z	p-value	r
PGY-2 (n = 4)	38.00	26.00	AI	−8.799	<.001	.37
PGY-3 (n = 4)	39.00	30.00	AI	−3.290	.001	.14
PGY-4 (n = 4)	22.00	29.50	No AI	−3.058	.002	.13
Fellow (n = 6)	44.00	16.00	AI	−16.730	<.001	.58
Faculty (n = 6)	33.00	30.00	AI	−2.584	.010	.09
Overall	36.00	26.00	AI	−14.759	<.001	.25

r effect size, PGY postgraduate year


Conclusions

There are formidable challenges with successfully translating AI in healthcare [10, 19, 26]. Some of these challenges include technical difficulties, complex implementations, data ownership issues, lack of reimbursement, delayed regulatory approval, ethical concerns, and overcoming human trepidation regarding AI (e.g. mistrust related to the ‘black box’ phenomenon of AI). Bairnov et al. showed that an AI-based decision support tool in Radiology had significant differences with accuracy and inter-operator variability depending on how AI was deployed (i.e. sequential or independent workflow) [21]. To the best of our knowledge, no studies have been published examining the interaction of pathology end users with AI to determine the pros and cons of using AI to assist with counting mitoses. Such studies would provide much needed translational evidence that could help develop recommendations and guidelines for the safe and effective use of AI in routine diagnostic Anatomical Pathology workflow. This cross validation study demonstrates that pathology end-users were more accurate and efficient at quantifying mitotic figures in digital images of invasive breast carcinoma with the aid of an AI tool that detects mitoses. These data show that the accuracy, sensitivity, precision, and f-scores all increased for each participant experience level with the use of AI support. Readers in both groups had higher inter-pathologist agreement with AI assistance, suggesting that AI can help standardize practice and perhaps result in more reproducible diagnoses. Very few participants unexpectedly had a lower accuracy performance with AI support. The results of this study showed that only 54.2% of reviewers decreased the quantity of falsely flagged mitoses using AI support. 
The reason why false positives were not reduced across all readers with AI support could be that they missed annotated mitotic figures because they were not clearly visible in the user interface or that some readers may not have believed the AI results. A detailed analysis of the sessions from these individuals showed that for some cases they spent an unusually long time counting mitoses (e.g. 451 s in one case with AI support, but only 15 s on the same case without AI support). This likely points to distraction more than AI causing an actual delay and it is uncertain if these outliers skewed the data. With regard to improved efficiency, the use of AI resulted in a 27.8% decrease in time for mitotic cell detection. In other words, for every 1 h spent searching for cells with mitotic figures without AI support, roughly 16.7 min could be saved using AI support. Nearly every subgroup of participants had faster reading speeds with the use of AI (PGY-4 was the exception). Overall, 66.7% of pathologists read faster with AI (statistically significantly faster for 33.3%). For pathology trainees, use of AI support resulted in faster reads for 83.3% of residents/registrars (statistically significantly faster for 25.0%) and 83.3% of fellows (all 83.3% statistically significantly faster). Methods to automatically detect mitoses in breast cancer images were introduced in the literature several decades ago [27]. Despite limited access to large digital datasets and prior to the availability of today’s computer processing power, many early image analysis projects demonstrated the feasibility of using computers to assist in counting mitoses [28, 29]. Although some of these first generation algorithms provided promising results, they were not yet suitable for clinical practice. 
Since then, with the advent of newer technologies including WSI, deep learning methods, graphics processing units and cloud computing we have witnessed a new generation of AI-based algorithms that are able to automate mitosis detection with impressive performance [16, 30–36]. Several international challenges using public datasets catalyzed the development of these sophisticated AI tools [37, 38], including algorithms to predict breast tumor proliferation [39]. The Lunit algorithm utilized in this study to automate mitosis counting in breast carcinoma WSIs integrates three modules: (i) image processing to handle digital slides (e.g. tissue region and patch extraction, region of interest detection, stain normalization), (ii) deep learning mitosis detection network (based on Residual Network or ResNet architecture), and (iii) a proliferation score prediction module [23]. For the Tumor Proliferation Assessment Challenge in 2016 (TUPAC16; http://tupac.tue-image.nl/), Lunit won all tasks including the prediction of mitosis grading. For this specific task their method achieved a Cohen’s kappa score of κ = 0.567, 95% CI [0.464, 0.671] between the predicted scores and the ground truth [17]. In general, mitotic figures are detectable in H&E stained tissue sections due to their hyperchromatic appearance and characteristic shapes. However, it is plausible that mitoses may be missed by humans and/or even AI algorithms due to tissue or imaging artifacts. To address this, using a biomarker such as Phosphorylated Histone H3 (PHH3) may have helped objectively confirm mitotic figures [40, 41]. Even though overall accuracy for readers in the OPT study was determined to be 55.2%, with AI support this was still more sensitive than counting mitotic figures manually. 
Further, contrary to classifying mitoses into scores 1, 2, and 3 for actual diagnostic purposes, this study was aimed at finding individual mitotic cells in a simulated format, a setting expected to yield relatively lower performance, which could have caused missed or incorrect mitotic figure detection. Davidson et al. have shown that while pathologists’ reproducibility is similar for Nottingham grade using glass slides or WSI, there is still slightly lower intraobserver agreement because grading breast cancer using digital WSI is more challenging [42]. Another limitation of our study was not standardizing the monitors used for annotation and the reader study. However, Norgan et al. showed that manual mitotic figure enumeration by pathologists was not affected by medical-grade versus commercial off-the-shelf displays [43]. In this study we did not equate a glass slide HPF with a digital HPF. Indeed, currently the HPF is typically used in manual microscopy with glass slides when quantifying mitoses (e.g. breast mitoses are evaluated using 10 HPFs at 400x magnification) [44]. However, this HPF at 400x on a glass slide is unlikely to be equivalent to a digital HPF at “40x view” in a WSI [45]. As verified by this study, expected benefits of adopting AI in pathology practice include automation, elimination of tedious tasks, improved accuracy, and efficiency. Not surprisingly, there is much enthusiasm in pathology regarding the prospect of using AI in routine practice. Interestingly, some of the trainees involved in this study expressed their gratitude for being invited to participate because of the opportunity to experience working with AI first hand. Of course, there is still much to be learned before successfully embedding AI into routine workflows. If AI is indeed more accurate than humans at counting mitoses we will need to determine how this impacts patient outcomes and whether man-made scoring systems may need to be revised.

Additional file 1: Table S1. Number of cases based on consensus among ground truth makers. Table S2. Experience level of participants involved in the OPT component of the study. Table S3. Individual accuracy reviewer results for the OPT. Table S4. Individual reviewer TP, FP, FN mitotic cell detection results for the OPT. Table S5. Sensitivity results by experience level and individual reviewer for the OPT. Table S6. Precision results by experience level and individual reviewer for the OPT. Table S7. F-scores by experience level and individual reviewer for the OPT. Table S8. Individual reviewer results for time spent during the OPT.

39 in total

1. Towards semantic-driven high-content image analysis: an operational instantiation for mitosis detection in digital histopathology.

Authors: D Racoceanu; F Capron
Journal: Comput Med Imaging Graph Date: 2014-10-02 Impact factor: 4.790

2. Are we counting mitoses correctly?

Authors: Nuri Yigit; Armagan Gunal; Zafer Kucukodaci; Yildirim Karslioglu; Onder Onguru; Ayhan Ozcan
Journal: Ann Diagn Pathol Date: 2013-06-24 Impact factor: 2.090

3. Maximized Inter-Class Weighted Mean for Fast and Accurate Mitosis Cells Detection in Breast Cancer Histopathology Images.

Authors: Ramin Nateghi; Habibollah Danyali; Mohammad Sadegh Helfroush
Journal: J Med Syst Date: 2017-08-14 Impact factor: 4.460

4. Mitosis detection in breast cancer histology images with deep neural networks.

Authors: Dan C Cireşan; Alessandro Giusti; Luca M Gambardella; Jürgen Schmidhuber
Journal: Med Image Comput Comput Assist Interv Date: 2013

5. Method for counting mitoses by image processing in Feulgen stained breast cancer sections.

Authors: T K ten Kate; J A Beliën; A W Smeulders; J P Baak
Journal: Cytometry Date: 1993

6. Image processing for mitoses in sections of breast cancer: a feasibility study.

Authors: E J Kaman; A W Smeulders; P W Verbeek; I T Young; J P Baak
Journal: Cytometry Date: 1984-05

7. Assessment of algorithms for mitosis detection in breast cancer histopathology images.

Authors: Mitko Veta; Paul J van Diest; Stefan M Willems; Haibo Wang; Anant Madabhushi; Angel Cruz-Roa; Fabio Gonzalez; Anders B L Larsen; Jacob S Vestergaard; Anders B Dahl; Dan C Cireşan; Jürgen Schmidhuber; Alessandro Giusti; Luca M Gambardella; F Boray Tek; Thomas Walter; Ching-Wei Wang; Satoshi Kondo; Bogdan J Matuszewski; Frederic Precioso; Violet Snell; Josef Kittler; Teofilo E de Campos; Adnan M Khan; Nasir M Rajpoot; Evdokia Arkoumani; Miangela M Lacle; Max A Viergever; Josien P W Pluim
Journal: Med Image Anal Date: 2014-11-29 Impact factor: 8.545

8. Performance of 4 Immunohistochemical Phosphohistone H3 Antibodies for Marking Mitotic Figures in Breast Cancer.

Authors: Cornelia M Focke; Kai Finsterbusch; Thomas Decker; Paul J van Diest
Journal: Appl Immunohistochem Mol Morphol Date: 2018-01

9. Agreement in Histological Assessment of Mitotic Activity Between Microscopy and Digital Whole Slide Images Informs Conversion for Clinical Diagnosis.

Authors: Bih-Rong Wei; Charles H Halsey; Shelley B Hoover; Munish Puri; Howard H Yang; Brandon D Gallas; Maxwell P Lee; Weijie Chen; Amy C Durham; Jennifer E Dwyer; Melissa D Sánchez; Ryan P Traslavina; Chad Frank; Charles Bradley; Lawrence D McGill; D Glen Esplin; Paula A Schaffer; Sarah D Cramer; L Tiffany Lyle; Jessica Beck; Elizabeth Buza; Qi Gong; Stephen M Hewitt; R Mark Simpson
Journal: Acad Pathol Date: 2019-07-11

10. Mitosis Counting in Breast Cancer: Object-Level Interobserver Agreement and Comparison to an Automatic Method.

Authors: Mitko Veta; Paul J van Diest; Mehdi Jiwa; Shaimaa Al-Janabi; Josien P W Pluim
Journal: PLoS One Date: 2016-08-16 Impact factor: 3.240
