Anna W Anderson (1), M Luke Marinovich (2), Nehmat Houssami (3), Kathryn P Lowry (1), Joann G Elmore (4), Diana S M Buist (5), Solveig Hofvind (6), Christoph I Lee (7)

1. Department of Radiology, University of Washington School of Medicine, Seattle, Washington.
2. Curtin School of Population Health, Curtin University, Bentley, Australia.
3. The Daffodil Centre, University of Sydney, a joint venture with Cancer Council NSW, Sydney, Australia; NBCF Chair in Breast Cancer Prevention, University of Sydney, Sydney, Australia; Coeditor, The Breast.
4. David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California; Director, UCLA's National Clinician Scholars Program, University of California, Los Angeles, Los Angeles, California; Editor-in-Chief of Adult Primary Care, UpToDate.
5. Kaiser Permanente Washington Health Research Institute, Seattle, Washington; Director of Research and Strategic Partnerships, Kaiser Permanente Washington Health Research Institute, Seattle, Washington.
6. Section Head of Breast Cancer Screening, Cancer Registry of Norway, Oslo, Norway.
7. Department of Radiology, University of Washington School of Medicine, Seattle, Washington; Director, Northwest Screening and Cancer Outcomes Research Enterprise, University of Washington, Seattle, Washington; Deputy Editor, JACR. Electronic address: stophlee@uw.edu.
Abstract
PURPOSE: The aim of this study was to describe the current state of the science regarding independent external validation of artificial intelligence (AI) technologies for screening mammography.

METHODS: A systematic review was performed across five databases (Embase, PubMed, IEEE Xplore, Engineering Village, and arXiv) through December 10, 2020. Studies that used screening examinations from real-world settings to externally validate AI algorithms for mammographic cancer detection were included. The main outcome was diagnostic accuracy, defined by the area under the receiver operating characteristic curve (AUC). Performance was also compared between radiologists alone and either stand-alone AI or combined radiologist and AI interpretation. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.

RESULTS: After data extraction, 13 studies met the inclusion criteria (148,361 total patients). Most studies (77% [n = 10]) evaluated commercially available AI algorithms. Studies were retrospective reader studies (46% [n = 6]), retrospective simulation studies (38% [n = 5]), or both (15% [n = 2]). Of the 5 studies comparing stand-alone AI with radiologists, 3 (60%) demonstrated improved accuracy with AI (AUC improvement range, 0.02-0.13). All 5 studies comparing combined radiologist and AI interpretation with radiologists alone demonstrated improved accuracy with AI (AUC improvement range, 0.028-0.115). Most studies had risk of bias or applicability concerns in the patient selection (69% [n = 9]) and reference standard (69% [n = 9]) domains. Only 2 studies obtained ground-truth cancer outcomes through regional cancer registry linkage.

CONCLUSIONS: To date, external validation efforts for AI technologies for screening mammography suggest small potential improvements in diagnostic accuracy, but all have been retrospective in nature and most suffer from risks of bias and applicability concerns.
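To make the abstract's primary outcome concrete, the sketch below shows how AUCs for radiologist, stand-alone AI, and combined interpretation might be computed and compared on an external validation set, with a paired bootstrap confidence interval for an AUC difference. All data here are synthetic, and the score-averaging rule used to model "combined" interpretation is an assumption for illustration only, not a method taken from the reviewed studies.

```python
# Minimal illustrative sketch (synthetic data, not from any reviewed study).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(seed=0)
n = 1000

# Hypothetical ground truth: 1 = cancer confirmed at follow-up, 0 = no cancer.
y_true = rng.integers(0, 2, size=n)

# Hypothetical continuous suspicion scores for each examination.
ai_scores = np.where(y_true == 1,
                     rng.normal(0.65, 0.20, n),
                     rng.normal(0.35, 0.20, n))
rad_scores = np.where(y_true == 1,
                      rng.normal(0.60, 0.25, n),
                      rng.normal(0.40, 0.25, n))

# One simple (assumed) model of combined radiologist + AI interpretation:
# average the two scores.
combined_scores = (ai_scores + rad_scores) / 2

print(f"Radiologist AUC:    {roc_auc_score(y_true, rad_scores):.3f}")
print(f"Stand-alone AI AUC: {roc_auc_score(y_true, ai_scores):.3f}")
print(f"Combined AUC:       {roc_auc_score(y_true, combined_scores):.3f}")

# Paired bootstrap CI for the combined-vs-radiologist AUC difference.
diffs = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    if y_true[idx].min() == y_true[idx].max():
        continue  # skip resamples that lack one of the two classes
    diffs.append(roc_auc_score(y_true[idx], combined_scores[idx])
                 - roc_auc_score(y_true[idx], rad_scores[idx]))
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap CI for AUC difference: ({lo:+.3f}, {hi:+.3f})")
```

Real external validations would instead link examinations to registry-confirmed cancer outcomes and apply each study's own decision or fusion rules; the point here is only the shape of the AUC comparison.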
References

McInnes MDF, Moher D, Thombs BD, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388-396.

Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529-536.

Conant EF, Toledano AY, Periaswamy S, et al. Improving accuracy and efficiency with concurrent use of artificial intelligence for digital breast tomosynthesis. Radiol Artif Intell. 2019;1(4):e180096.

Pacilè S, Lopez J, Chone P, Bertinotti T, Grouin JM, Fillard P. Improving breast cancer detection accuracy of mammography with the concurrent use of an artificial intelligence tool. Radiol Artif Intell. 2020;2(6):e190208.

Watanabe AT, Lim V, Vu HX, et al. Improved cancer detection using artificial intelligence: a retrospective evaluation of missed cancers on mammography. J Digit Imaging. 2019;32(4):625-637.

Schaffter T, Buist DSM, Lee CI, et al. Evaluation of combined artificial intelligence and radiologist assessment to interpret screening mammograms. JAMA Netw Open. 2020;3(3):e200265.

Freeman K, Geppert J, Stinton C, et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ. 2021;374:n1872.