Shadi Ebrahimian1, Mannudeep K Kalra1, Sheela Agarwal2, Bernardo C Bizzo3, Mona Elkholy4, Christoph Wald5, Bibb Allen6, Keith J Dreyer7. 1. Department of Radiology, Massachusetts General Hospital and Harvard Medical School, 25 New Chardon Street, Boston, MA 02114. 2. Lenox Hill Radiology, New York, New York; ACR Data Science Institute, Reston, Virginia. 3. Department of Radiology, Massachusetts General Hospital and Harvard Medical School, 25 New Chardon Street, Boston, MA 02114; MGH & BWH Center for Clinical Data Science, Boston, Massachusetts. 4. ACR Data Science Institute, Reston, Virginia. 5. Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA and Tufts University Medical School, Boston, Massachusetts. 6. Department of Radiology, Grandview Medical Center, Birmingham, Alabama. 7. Department of Radiology, Massachusetts General Hospital and Harvard Medical School, 25 New Chardon Street, Boston, MA 02114; MGH & BWH Center for Clinical Data Science, Boston, Massachusetts. Electronic address: kdreyer@partners.org.
Abstract
RATIONALE AND OBJECTIVES: To assess key trends, strengths, and gaps in validation studies of Food and Drug Administration (FDA)-regulated imaging-based artificial intelligence/machine learning (AI/ML) algorithms. MATERIALS AND METHODS: We audited publicly available details of regulated AI/ML algorithms in imaging from 2008 until April 2021. We reviewed 127 regulated software products (118 AI/ML) to classify information related to their parent company, subspecialty, body area and specific anatomy type, imaging modality, date of FDA clearance, indications for use, target pathology (such as trauma) and findings (such as fracture), technique (CAD triage, CAD detection and/or characterization, CAD acquisition or improvement, and image processing/quantification), product performance, and the presence, type, strength, and availability of clinical validation data. Pertaining to validation data, where available, we recorded the number of patients or studies included, sensitivity, specificity, accuracy, and/or receiver operating characteristic area under the curve, along with information on ground-truthing of use-cases. Data were analyzed with pivot tables and charts for descriptive statistics and trends. RESULTS: We noted an increasing number of FDA-regulated AI/ML algorithms from 2008 to 2021. Seventeen (17/118) regulated AI/ML algorithms posted no validation claims or data. Only 9/118 reviewed AI/ML algorithms had validation datasets of over 1000 patients. The most common types of AI/ML were image processing/quantification (IPQ; n = 59/118) and triage (CADt; n = 27/118). Brain, breast, and lungs dominated the targeted body regions of interest. CONCLUSION: Insufficient public information on validation datasets for several FDA-regulated AI/ML algorithms makes it difficult to justify clinical applications, since their generalizability and potential biases cannot be inferred.