Mattie Salim1,2, Erik Wåhlin3, Karin Dembrower4,5, Edward Azavedo1,6, Theodoros Foukakis1,2, Yue Liu7, Kevin Smith8, Martin Eklund9, Fredrik Strand1,10. 1. Department of Oncology-Pathology, Karolinska Institute, Stockholm, Sweden. 2. Department of Radiology, Karolinska University Hospital, Stockholm, Sweden. 3. Department of Medical Radiation Physics and Nuclear Medicine, Karolinska University Hospital, Stockholm, Sweden. 4. Department of Physiology and Pharmacology, Karolinska Institute, Stockholm, Sweden. 5. Department of Radiology, Capio Sankt Görans Hospital, Stockholm, Sweden. 6. Department of Molecular Medicine and Surgery, Karolinska Institute, Stockholm, Sweden. 7. Division of Computational Science and Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Solna, Sweden. 8. KTH Royal Institute of Technology, Science for Life Laboratory, Solna, Sweden. 9. Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden. 10. Breast Radiology, Karolinska University Hospital, Stockholm, Sweden.
Abstract
Importance: A computer algorithm that performs at or above the level of radiologists in mammography screening assessment could improve the effectiveness of breast cancer screening. Objective: To perform an external evaluation of 3 commercially available artificial intelligence (AI) computer-aided detection algorithms as independent mammography readers and to assess the screening performance when combined with radiologists. Design, Setting, and Participants: This retrospective case-control study was based on a double-reader population-based mammography screening cohort of women screened at an academic hospital in Stockholm, Sweden, from 2008 to 2015. The study included 8805 women aged 40 to 74 years who underwent mammography screening and who did not have implants or prior breast cancer. The study sample included 739 women who were diagnosed as having breast cancer (positive) and a random sample of 8066 healthy controls (negative for breast cancer). Main Outcomes and Measures: Positive follow-up findings were determined by pathology-verified diagnosis at screening or within 12 months thereafter. Negative follow-up findings were determined by a 2-year cancer-free follow-up. Three AI computer-aided detection algorithms (AI-1, AI-2, and AI-3), sourced from different vendors, yielded a continuous score for the suspicion of cancer in each mammography examination. For a decision of normal or abnormal, the cut point was defined by the mean specificity of the first-reader radiologists (96.6%). Results: The median age of study participants was 60 years (interquartile range, 50-66 years) for the 739 women who received a diagnosis of breast cancer and 54 years (interquartile range, 47-63 years) for the 8066 healthy controls. The cases positive for cancer comprised 618 (84%) that were screen detected and 121 (16%) that were clinically detected within 12 months of the screening examination.
The area under the receiver operating characteristic curve for cancer detection was 0.956 (95% CI, 0.948-0.965) for AI-1, 0.922 (95% CI, 0.910-0.934) for AI-2, and 0.920 (95% CI, 0.909-0.931) for AI-3. At the specificity of the radiologists, the sensitivities were 81.9% for AI-1, 67.0% for AI-2, 67.4% for AI-3, 77.4% for the first-reader radiologists, and 80.1% for the second-reader radiologists. Combining AI-1 with the first-reader radiologists achieved 88.6% sensitivity at 93.0% specificity (abnormal defined by either of the 2 making an abnormal assessment). No other examined combination of AI algorithms and radiologists surpassed this sensitivity level. Conclusions and Relevance: To our knowledge, this study is the first independent evaluation of several AI computer-aided detection algorithms for screening mammography. The results of this study indicated that a commercially available AI computer-aided detection algorithm can assess screening mammograms with sufficient diagnostic performance to be further evaluated as an independent reader in prospective clinical trials. Combining the first readers with the best algorithm identified more cases positive for cancer than combining the first readers with second readers.
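The evaluation design described above can be illustrated with a minimal sketch: a continuous AI suspicion score is binarized at the cut point whose specificity on healthy controls matches the radiologists' mean specificity (96.6%), and a combined reading is flagged abnormal if either the AI or the radiologist flags it (logical OR). The scores, labels, and simulated radiologist below are entirely hypothetical and are not the study data; function names are illustrative, not part of any vendor API.

```python
import numpy as np

def threshold_at_specificity(scores, labels, target_spec=0.966):
    """Return the score cut point whose specificity on healthy cases
    (label 0) is at least target_spec, mirroring the paper's choice of
    operating point (mean first-reader specificity, 96.6%)."""
    neg = np.sort(scores[labels == 0])
    # Leave at most a (1 - target_spec) fraction of negatives above the cut.
    k = int(np.ceil(target_spec * len(neg)))
    return neg[min(k, len(neg) - 1)]

def sens_spec(pred, labels):
    """Sensitivity on cancers (label 1), specificity on healthy (label 0)."""
    sens = pred[labels == 1].mean()
    spec = 1.0 - pred[labels == 0].mean()
    return sens, spec

# Hypothetical data: 100 cancers, 1000 healthy controls.
rng = np.random.default_rng(0)
labels = np.concatenate([np.ones(100, dtype=int), np.zeros(1000, dtype=int)])
scores = np.concatenate([rng.normal(2.0, 1.0, 100), rng.normal(0.0, 1.0, 1000)])

cut = threshold_at_specificity(scores, labels)
ai_abnormal = (scores > cut).astype(int)

# Simulated radiologist decisions (noisy view of the same signal).
radiologist = (rng.normal(scores, 1.0) > cut).astype(int)

# Combined reading: abnormal if EITHER reader calls it abnormal.
combined = ai_abnormal | radiologist
```

By construction, the OR combination can only raise sensitivity and lower specificity relative to the AI alone, which is the trade-off reported for AI-1 plus first readers (sensitivity rising from 81.9% to 88.6% while specificity falls from 96.6% to 93.0%).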