| Literature DB >> 34980622 |
M Luke Marinovich, Elizabeth Wylie, William Lotter, Alison Pearce, Stacy M Carter, Helen Lund, Andrew Waddell, Jiye G Kim, Gavin F Pereira, Christoph I Lee, Sophia Zackrisson, Meagan Brennan, Nehmat Houssami.
Abstract
INTRODUCTION: Artificial intelligence (AI) algorithms for interpreting mammograms have the potential to improve the effectiveness of population breast cancer screening programmes if they can detect cancers, including interval cancers, without contributing substantially to overdiagnosis. Studies suggesting that AI has comparable or greater accuracy than radiologists commonly employ 'enriched' datasets in which cancer prevalence is higher than in population screening. Routine screening outcome metrics (cancer detection and recall rates) cannot be estimated from these datasets, and accuracy estimates may be subject to spectrum bias, which limits generalisability to real-world screening. We aim to address these limitations by comparing the accuracy of AI and radiologists in a consecutive cohort of women attending a real-world population breast cancer screening programme.

METHODS AND ANALYSIS: A retrospective, consecutive cohort of digital mammography screens from 109 000 distinct women was assembled from BreastScreen WA (BSWA), Western Australia's biennial population screening programme, from November 2016 to December 2017. The cohort includes 761 screen-detected and 235 interval cancers. Descriptive characteristics and results of radiologist double-reading will be extracted from the BSWA outcomes data collection. Mammograms will be reinterpreted by a commercial AI algorithm (DeepHealth). AI accuracy will be compared with that of radiologist single-reading based on the difference in the area under the receiver operating characteristic curve (AUC-ROC). Cancer detection and recall rates for combined AI-radiologist reading will be estimated by pairing the first radiologist read per screen with the AI algorithm, and compared with estimates for radiologist double-reading.

ETHICS AND DISSEMINATION: This study has ethical approval from the Women and Newborn Health Service Ethics Committee (EC00350) and the Curtin University Human Research Ethics Committee (HRE2020-0316). Findings will be published in peer-reviewed journals and presented at national and international conferences. Results will also be disseminated to stakeholders in Australian breast cancer screening programmes and policy makers in population screening.

© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
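The AUC comparison described in the analysis plan can be illustrated with a short sketch. This is a minimal, hypothetical example only: the simulated data, the mapping of binary reads to a two-point ROC, and the percentile bootstrap confidence interval are assumptions made for illustration, not the study's specified statistical procedure or the BSWA data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Hypothetical per-screen data: cancer status (screen-detected or interval
# cancer = 1), a continuous AI malignancy score, and the first radiologist's
# binary recall decision. Prevalence of ~0.9% mirrors the cohort (996/109 000);
# all other values are invented for illustration.
n = 10_000
truth = rng.binomial(1, 0.009, n)
ai_score = np.clip(rng.normal(0.2 + 0.5 * truth, 0.2), 0.0, 1.0)
reader1 = (rng.random(n) < np.where(truth == 1, 0.7, 0.04)).astype(int)

auc_ai = roc_auc_score(truth, ai_score)      # continuous AI score
auc_reader = roc_auc_score(truth, reader1)   # binary read -> two-point ROC
print(f"AUC (AI):       {auc_ai:.3f}")
print(f"AUC (reader 1): {auc_reader:.3f}")
print(f"Difference:     {auc_ai - auc_reader:+.3f}")

# Percentile bootstrap CI for the AUC difference (an assumed approach,
# not necessarily the procedure specified in the protocol).
diffs = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    if truth[idx].sum() == 0:                # resample must contain cancers
        continue
    diffs.append(roc_auc_score(truth[idx], ai_score[idx])
                 - roc_auc_score(truth[idx], reader1[idx]))
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap CI for difference: ({lo:+.3f}, {hi:+.3f})")
```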
Keywords: breast imaging; breast tumours; diagnostic radiology
Year: 2022 PMID: 34980622 PMCID: PMC8724814 DOI: 10.1136/bmjopen-2021-054005
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1. Flowchart of cohort inclusions and exclusions.
Figure 2. Digital mammogram, mediolateral oblique view, with a region of interest (denoted by a bounding box) identified by the AI algorithm as suspicious for malignancy. The cancer was confirmed as invasive ductal carcinoma. AI, artificial intelligence.
Table 1. Significant knowledge gaps that need to be addressed to develop prospective real-world screening trials or evaluations (adapted from Houssami et al13)
| Knowledge gap or limitations of published studies | Addressed by this study? | Description of how addressed in our study |
|---|---|---|
| Few studies use commercially available AI systems. | Partly | The AI algorithm used in this study (DeepHealth) is commercially available. |
| Studies have used relatively small datasets, often consisting of mammograms from several hundred women (rarely several thousand). Larger validation datasets are required. | Yes | A large validation dataset including 109 000 women will be used. |
| The same or selected subsets of the same datasets were used to train and validate models. Validation using independent, external datasets is required. | Yes | The study dataset is external to and independent from the datasets used to train the algorithm. |
| Datasets were commonly enriched with malignant lesions, with studies often selecting images containing suspicious abnormalities. Studies are required in unselected screening populations. | Yes | The study dataset is a consecutive, unselected population drawn from a real-world, biennial, population-based breast screening programme (BreastScreen WA). The dataset is not enriched with cancers. The prevalence and disease spectrum of screen-detected and interval cancers are representative of population breast screening. |
| There is a paucity of studies reporting conventional screening metrics (CDR and recall rate). | Yes | The inclusion of unique, consecutive screening episodes will allow estimation of CDR and recall rate (it is not possible to accurately derive these metrics from case-control, cancer-enriched datasets). |
| There are limited data on AI versus human interpretation. Future studies should compare AI with radiologists’ performance or report the incremental improvement of AI algorithms in combination with radiologists. | Yes | The comparative accuracy of AI and radiologists will be estimated in terms of AUC-ROC, sensitivity and specificity. Incremental rates of cancer detection and recall will be estimated for double-reading with and without AI (see the illustrative sketch following this table). |
| There are no studies on women’s or societal perspectives on the acceptability of AI. | No | This is beyond the scope of the present study. A parallel stream of social and ethical research by some of the study investigators will explore the acceptability of AI. |
| Future studies should include images from digital breast tomosynthesis, given the rapid adoption of this technology. | No | This is beyond the scope of the present study. Digital breast tomosynthesis is not currently used in Australian publicly funded population breast screening programmes. |
AI, artificial intelligence; AUC-ROC, area under the receiver operating characteristic curve; CDR, cancer detection rate; FDA, Food and Drug Administration.
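As a companion to the table's combined-reading row, the sketch below illustrates how CDR and recall rate might be tallied when the first radiologist read per screen is paired with the AI algorithm. It is a hypothetical illustration only: the OR-combination rule (recall if either the reader or AI flags the screen), the AI operating threshold of 0.5 and the simulated data are all assumptions, not the pairing rule specified in the protocol.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated screening episodes (invented values, as in the earlier sketch):
# cancer status, a continuous AI score, and the first reader's recall decision.
n = 10_000
truth = rng.binomial(1, 0.009, n).astype(bool)
ai_score = np.clip(rng.normal(0.2 + 0.5 * truth, 0.2), 0.0, 1.0)
reader1 = rng.random(n) < np.where(truth, 0.7, 0.04)

def screening_metrics(recalled, truth):
    """CDR per 1000 screens and recall rate (%) from boolean arrays."""
    cdr = 1000 * np.sum(recalled & truth) / len(truth)
    recall_rate = 100 * recalled.mean()
    return cdr, recall_rate

ai_recall = ai_score >= 0.5          # assumed operating threshold
combined = reader1 | ai_recall       # assumed rule: recall if either flags

for label, decisions in [("Reader 1 alone", reader1),
                         ("Reader 1 + AI (paired)", combined)]:
    cdr, rr = screening_metrics(decisions, truth)
    print(f"{label}: CDR = {cdr:.2f} per 1000 screens, "
          f"recall rate = {rr:.1f}%")
```

Under an OR rule of this kind, CDR can only rise relative to the single reader, but so can the recall rate; the study's comparison against radiologist double-reading addresses exactly this trade-off.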