BACKGROUND: Artificial intelligence (AI) systems performing at radiologist-like levels in the evaluation of digital mammography (DM) would improve breast cancer screening accuracy and efficiency. We aimed to compare the stand-alone performance of an AI system to that of radiologists in detecting breast cancer in DM. METHODS: Nine multi-reader, multi-case study datasets previously used for different research purposes in seven countries were collected. Each dataset consisted of DM exams acquired with systems from four different vendors, multiple radiologists' assessments per exam, and ground truth verified by histopathological analysis or follow-up, yielding a total of 2652 exams (653 malignant) and interpretations by 101 radiologists (28 296 independent interpretations). An AI system analyzed these exams, yielding a level of suspicion that cancer was present on a scale from 1 to 10. The detection performance of the radiologists and the AI system was compared using a noninferiority null hypothesis at a margin of 0.05. RESULTS: The performance of the AI system was statistically noninferior to that of the average of the 101 radiologists. The AI system had an area under the ROC curve (AUC) of 0.840 (95% confidence interval [CI] = 0.820 to 0.860), and the average AUC of the radiologists was 0.814 (95% CI = 0.787 to 0.841) (difference 95% CI = -0.003 to 0.055). The AI system had an AUC higher than that of 61.4% of the radiologists. CONCLUSIONS: The evaluated AI system achieved a cancer detection accuracy comparable to that of an average breast radiologist in this retrospective setting. Although promising, the performance and impact of such a system in a screening setting need further investigation.
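The noninferiority conclusion can be checked against the figures reported in the abstract. The sketch below is illustrative only: it applies the usual decision rule (the lower bound of the confidence interval for the AUC difference must lie above the negative margin) to the published numbers. The function and variable names are hypothetical, and the abstract does not describe the statistical procedure the authors used to obtain the interval, so this is not a reproduction of their analysis.

```python
def is_noninferior(diff_ci_lower: float, margin: float) -> bool:
    """Return True when the lower CI bound of (AI - radiologists) exceeds -margin.

    This is the generic noninferiority decision rule, assumed here for
    illustration; it is not taken from the study's methods.
    """
    return diff_ci_lower > -margin


# Values reported in the abstract.
ai_auc = 0.840
radiologist_auc = 0.814
diff_ci = (-0.003, 0.055)  # 95% CI of the AUC difference (AI - radiologists)
margin = 0.05              # noninferiority margin

# Point estimate of the difference, rounded to the precision of the abstract.
auc_difference = round(ai_auc - radiologist_auc, 3)

# Noninferior: the whole interval lies above -margin.
noninferior = is_noninferior(diff_ci[0], margin)

# Superiority would additionally require the interval to exclude zero.
superior = diff_ci[0] > 0

print(f"AUC difference: {auc_difference}")
print(f"Noninferior at margin {margin}: {noninferior}")
print(f"Superior (CI excludes 0): {superior}")
```

Under this rule the lower bound of -0.003 lies above -0.05, which is consistent with the reported noninferiority, while the interval still includes zero, so the figures do not by themselves support a claim of superiority.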