Babak Ehteshami Bejnordi1, Mitko Veta2, Paul Johannes van Diest3, Bram van Ginneken1, Nico Karssemeijer1, Geert Litjens4, Jeroen A W M van der Laak4, Meyke Hermsen4, Quirine F Manson3, Maschenka Balkenhol4, Oscar Geessink4,5, Nikolaos Stathonikos3, Marcory C R F van Dijk6, Peter Bult4, Francisco Beca7, Andrew H Beck7,8, Dayong Wang7,8, Aditya Khosla8,9, Rishab Gargeya10, Humayun Irshad7, Aoxiao Zhong11, Qi Dou11,12, Quanzheng Li11, Hao Chen12, Huang-Jing Lin12, Pheng-Ann Heng12, Christian Haß13, Elia Bruni13, Quincy Wong14, Ugur Halici15,16, Mustafa Ümit Öner15, Rengul Cetin-Atalay17, Matt Berseth18, Vitali Khvatkov19, Alexei Vylegzhanin19, Oren Kraus20, Muhammad Shaban21, Nasir Rajpoot21,22, Ruqayya Awan23, Korsuk Sirinukunwattana21, Talha Qaiser21, Yee-Wah Tsang22, David Tellez4, Jonas Annuscheit24, Peter Hufnagl24, Mira Valkonen25, Kimmo Kartasalo24,26, Leena Latonen27, Pekka Ruusuvuori24,28, Kaisa Liimatainen24, Shadi Albarqouni29, Bharti Mungal29, Ami George29, Stefanie Demirci29, Nassir Navab29, Seiryo Watanabe30, Shigeto Seno30, Yoichi Takenaka30, Hideo Matsuda30, Hady Ahmady Phoulady31, Vassili Kovalev32, Alexander Kalinovsky32, Vitali Liauchuk32, Gloria Bueno33, M Milagro Fernandez-Carrobles33, Ismael Serrano33, Oscar Deniz33, Daniel Racoceanu34,35, Rui Venâncio36.
1. Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, the Netherlands.
2. Medical Image Analysis Group, Eindhoven University of Technology, Eindhoven, the Netherlands.
3. Department of Pathology, University Medical Center Utrecht, Utrecht, the Netherlands.
4. Department of Pathology, Radboud University Medical Center, Nijmegen, the Netherlands.
5. Laboratorium Pathologie Oost Nederland, Hengelo, the Netherlands.
6. Rijnstate Hospital, Arnhem, the Netherlands.
7. BeckLab, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts.
8. PathAI, Cambridge, Massachusetts.
9. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts.
10. Harker School, San Jose, California.
11. Center for Clinical Data Science, Gordon Center for Medical Imaging, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts.
12. Chinese University of Hong Kong, Hong Kong, China.
13. ExB Research and Development GmbH, Munich, Germany.
14. Munich Business School, Munich, Germany.
15. Department of Electrical and Electronics Engineering, Middle East Technical University, Ankara, Turkey.
16. Neuroscience and Neurotechnology, Graduate School of Natural and Applied Sciences, Middle East Technical University, Ankara, Turkey.
17. Cancer System Biology Laboratory, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey.
18. NLP LOGIX, Jacksonville, Florida.
19. Smart Imaging Technologies, Houston, Texas.
20. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada.
21. Tissue Image Analytics Lab, Department of Computer Science, University of Warwick, Coventry, United Kingdom.
22. Department of Pathology, University Hospitals Coventry and Warwickshire National Health Service Foundation Trust, Coventry, United Kingdom.
23. Department of Computer Science and Engineering, Qatar University, Doha, Qatar.
24. Hochschule für Technik und Wirtschaft, Berlin, Germany.
25. BioMediTech Institute and Faculty of Medicine and Life Sciences, Tampere University of Technology, Tampere, Finland.
26. BioMediTech Institute and Faculty of Biomedical Science and Engineering, Tampere University of Technology, Tampere, Finland.
27. Prostate Cancer Research Center, Faculty of Medicine and Life Sciences and BioMediTech, University of Tampere, Tampere, Finland.
28. Faculty of Computing and Electrical Engineering, Tampere University of Technology, Pori, Finland.
29. Technical University of Munich, Munich, Germany.
30. Department of Bioinformatic Engineering, Osaka University.
31. University of South Florida, Tampa, Florida.
32. Biomedical Image Analysis Department, United Institute of Informatics Problems, Belarus National Academy of Sciences, Minsk, Belarus.
33. Visilab, University of Castilla-La Mancha, Ciudad Real, Spain.
34. INSERM, Laboratoire d'Imagerie Biomédicale, Sorbonne Universités, Pierre and Marie Curie University, Paris, France.
35. Pontifical Catholic University of Peru, San Miguel, Lima, Peru.
36. Sorbonne University, Pierre and Marie Curie University, Paris, France.
Abstract
Importance: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency.
Objective: To assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and to compare it with pathologists' diagnoses in a diagnostic setting.
Design, Setting, and Participants: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands, with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining, was provided to challenge participants to build algorithms. Algorithm performance was evaluated on an independent test set of 129 whole-slide images (49 with and 80 without metastases). The corresponding glass slides of the same test set were also evaluated by a panel of 11 pathologists from the Netherlands working with time constraint (WTC), who assessed the likelihood of nodal metastases for each slide in a flexible 2-hour session simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC).
Exposures: Deep learning algorithms submitted as part of a challenge competition, or pathologist interpretation.
Main Outcomes and Measures: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image, assessed using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence for each slide as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor.
Results: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC comparable with that of the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC).
Conclusions and Relevance: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with that of an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.
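The slide-level comparison rests on the AUC, and for the pathologists the curve is built from the 5-point confidence ratings. The sketch below shows one common way to compute an empirical AUC from such ordinal ratings: the Mann-Whitney statistic, i.e. the probability that a randomly chosen slide with metastases scores higher than a randomly chosen slide without, with ties counted as one half. The 1-to-5 numeric encoding of the scale, the names `CONFIDENCE_SCALE` and `empirical_auc`, and the toy ratings are assumptions for illustration, not study data; the study's own AUC estimator and confidence-interval method are not described here.

```python
from itertools import product

# Hypothetical numeric encoding of the 5-point confidence scale (assumption).
CONFIDENCE_SCALE = {
    "definitely normal": 1,
    "probably normal": 2,
    "equivocal": 3,
    "probably tumor": 4,
    "definitely tumor": 5,
}


def empirical_auc(scores, labels):
    """Empirical ROC AUC via the Mann-Whitney statistic.

    scores: one score per slide (higher = more likely metastatic).
    labels: 1 for slides with metastases, 0 for slides without.
    A tie between a positive and a negative slide counts as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up ratings (not study data).
ratings = ["definitely tumor", "probably tumor", "equivocal",
           "probably normal", "definitely normal", "equivocal"]
truth = [1, 1, 1, 0, 0, 0]
scores = [CONFIDENCE_SCALE[r] for r in ratings]
print(round(empirical_auc(scores, truth), 4))  # -> 0.9444
```

Because it only compares scores, the same function accepts an algorithm's continuous per-slide probabilities; with coarse 5-level ratings, ties are common, which is why the half-credit rule matters.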
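The lesion-level result is an operating point on a free-response ROC (FROC) curve: the fraction of metastatic lesions detected at a given mean number of false positives per normal slide. The sketch below is a minimal illustration under stated assumptions, not the challenge's evaluation code: each detection is assumed to have already been matched to the lesion it hits (or to none), only detections on normal slides are counted as false positives (following the abstract's "per normal whole-slide image" wording), and the `Detection` class, both function names, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one algorithm detection."""
    slide_id: str
    probability: float
    lesion_id: Optional[str]  # lesion this detection hits; None if none (matching assumed done)


def operating_point(detections: Sequence[Detection], lesion_ids: Set[str],
                    normal_slides: Set[str], threshold: float) -> Tuple[float, float]:
    """(true-positive fraction, mean false positives per normal slide) at a threshold."""
    kept = [d for d in detections if d.probability >= threshold]
    # A lesion counts as detected if at least one kept detection hits it.
    detected = {d.lesion_id for d in kept if d.lesion_id in lesion_ids}
    # Assumption: false positives are counted on normal slides only.
    false_pos = sum(1 for d in kept if d.slide_id in normal_slides)
    return len(detected) / len(lesion_ids), false_pos / len(normal_slides)


def tpf_at_fp_rate(detections: Sequence[Detection], lesion_ids: Set[str],
                   normal_slides: Set[str], max_fp_per_slide: float) -> float:
    """Highest TPF over thresholds whose mean FPs per normal slide <= the limit."""
    best = 0.0
    for t in sorted({d.probability for d in detections}):
        tpf, fp_rate = operating_point(detections, lesion_ids, normal_slides, t)
        if fp_rate <= max_fp_per_slide:
            best = max(best, tpf)
    return best


# Toy data (made up): 4 lesions on tumor slides t1/t2, 2 normal slides n1/n2.
lesions = {"L1", "L2", "L3", "L4"}
normal = {"n1", "n2"}
dets = [
    Detection("t1", 0.95, "L1"),
    Detection("t1", 0.90, "L2"),
    Detection("n1", 0.80, None),
    Detection("t2", 0.70, "L3"),
    Detection("n2", 0.40, None),
    Detection("t2", 0.30, "L4"),
]
print(tpf_at_fp_rate(dets, lesions, normal, 0.5))  # -> 0.75
```

Raising the threshold can only remove detections, so both the true-positive fraction and the false-positive rate fall together; scanning every distinct probability as a threshold traces the whole FROC curve, and the function reads off the best sensitivity within a false-positive budget.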