Takeshi Yamaguchi (1), Kenichi Inoue (2), Hiroko Tsunoda (3), Takayoshi Uematsu (4), Norimitsu Shinohara (5), Hirofumi Mukai (6)
1. Division of Medical Oncology, Japanese Red Cross Musashino Hospital.
2. Breast Cancer Center, Shonan Memorial Hospital, Kanagawa.
3. Department of Radiology, St. Luke's International Hospital, Tokyo.
4. Division of Breast Imaging and Breast Interventional Radiology, Shizuoka Cancer Center Hospital, Shizuoka.
5. Department of Radiological Technology, Faculty of Health Sciences, Gifu University of Medical Science, Gifu.
6. Division of Breast and Medical Oncology, National Cancer Center Hospital East, Chiba, Japan.
Abstract
BACKGROUND: Screening mammography has led to reduced breast cancer-specific mortality and is recommended worldwide. However, the resultant doctors' workload of reading mammographic scans needs to be addressed. Although computer-aided detection (CAD) systems have been developed to support readers, the findings are conflicting regarding whether traditional CAD systems improve reading performance. Rapid progress in the artificial intelligence (AI) field has led to the advent of newer CAD systems using deep learning-based algorithms which have the potential to reach human performance levels. Those systems, however, have been developed using mammography images mainly from women in western countries. Because Asian women characteristically have higher-density breasts, it is uncertain whether those AI systems can apply to Japanese women. In this study, we will construct a deep learning-based CAD system trained using mammography images from a large number of Japanese women with high quality reading. METHODS: We will collect digital mammography images taken for screening or diagnostic purposes at multiple institutions in Japan. A total of 15,000 images, consisting of 5000 images with breast cancer and 10,000 images with benign lesions, will be collected. At least 1000 images of normal breasts will also be collected for use as reference data. With these data, we will construct a deep learning-based AI system to detect breast cancer on mammograms. The primary endpoint will be the sensitivity and specificity of the AI system with the test image set. DISCUSSION: When the ability of AI reading is shown to be on a par with that of human reading, images of normal breasts or benign lesions that do not have to be read by a human can be selected by AI beforehand. Our AI might work well in Asian women who have similar breast density, size, and shape to those of Japanese women. TRIAL REGISTRATION: UMIN, trial number UMIN000039009. Registered 26 December 2019, https://www.umin.ac.jp/ctr/.
Several randomized controlled trials have demonstrated reduced breast cancer-specific mortality due to mammography screening programs. The Japanese guidelines for breast cancer screening recommend mammography every 2 years for women aged 40 years and older. However, the interpretative performance of diagnostic mammography is influenced by readers’ experience and the working time they devote to breast imaging. To maintain the quality of screening, the Japan Central Organization on Quality Assurance of Breast Cancer Screening runs a program that evaluates mammography readers’ ability. The organization rates readers on a scale of A to D according to the sensitivity and specificity they achieve in mammography reading tests. Readers with rank A or B are considered to have above-average skills and are certified by the organization.
The breast cancer screening rate in Japan was 36.9% in 2016 (from the Comprehensive Survey of Living Conditions), lower than that in other developed countries. Various efforts are being made to increase the rate. Increased numbers of women screened and the use of double reading in mammography screening programs, however, create a high workload for readers and increase economic costs. Moreover, up to 25% of mammographically visible cancers are still not detected at screening. Thus, alternative strategies are needed to reduce readers’ burden and detect breast cancer efficiently.
Computer-aided detection (CAD) systems have been developed to help doctors read mammographic images. However, the clinical utility of traditional CAD systems has not yet been determined. While some studies have reported improved reader performance, others have shown increased false-positive results and unnecessary biopsies due to low specificity.
The field of artificial intelligence (AI) is rapidly evolving, and novel CAD systems using deep learning convolutional neural networks have been developed. Several deep learning-based CAD systems for the analysis of mammograms have been developed, some of which have already shown very promising results. AI systems with reading ability comparable to that of humans could help readers improve the cancer detection rate and reduce the reading workload by pre-selecting suspicious lesions in mammographic images. However, the image analysis algorithms have been created and studied mainly in western countries. Because Asian women characteristically have higher-density breasts than women from other ethnic groups, it is uncertain whether AI systems created based on data from western countries can be applied to Japanese women. Therefore, a deep learning-based automated diagnostic system for mammograms has to be developed with data from Japanese women.
In this study, we aim to construct a deep learning-based CAD system trained using a large number of mammograms from Japanese women. A large number of learning images with high-quality readings is crucial to create a sophisticated system. Thus, we will use mammographic images read by readers ranked grade A according to the Japan Central Organization on Quality Assurance of Breast Cancer Screening.
This trial was approved by the institutional review board of National Cancer Center Hospital East.
Methods
Objectives
The aim of this study is to construct a deep learning-based AI system to detect breast cancer on mammograms with high specificity, and to evaluate the performance of the AI system.
Study setting
This is a multicenter retrospective study. We will use digital mammography (DM) images taken for screening or diagnostic purposes in participating institutions.
Endpoints
The primary endpoint is the sensitivity and specificity of the AI system to detect breast cancer with the test image set. Sensitivity is calculated as the number of images in which the AI system correctly diagnoses cancer among all images with biopsy-proven cancer. Specificity is calculated as the number of images in which the AI system correctly diagnoses normal or benign lesions among all images without cancer.
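The two definitions above can be written out as a short calculation. The following is a minimal sketch, not the study's analysis code: the `ImageResult` record and the function name are hypothetical, and it assumes each test image carries a reference-standard label (cancer or not) and a binary call from the AI system.

```python
from dataclasses import dataclass


@dataclass
class ImageResult:
    """One test image (hypothetical record layout for illustration)."""
    has_cancer: bool      # reference standard, e.g. biopsy-proven cancer
    ai_says_cancer: bool  # binary call made by the AI system


def sensitivity_specificity(results):
    """Return (sensitivity, specificity) computed per image."""
    tp = sum(1 for r in results if r.has_cancer and r.ai_says_cancer)
    fn = sum(1 for r in results if r.has_cancer and not r.ai_says_cancer)
    tn = sum(1 for r in results if not r.has_cancer and not r.ai_says_cancer)
    fp = sum(1 for r in results if not r.has_cancer and r.ai_says_cancer)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer and one non-cancer image")
    # Sensitivity: correctly flagged cancers among all images with cancer.
    # Specificity: correctly cleared images among all images without cancer.
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 cancer images (3 detected), 5 non-cancer images (4 cleared).
toy_results = (
    [ImageResult(True, True)] * 3
    + [ImageResult(True, False)]
    + [ImageResult(False, False)] * 4
    + [ImageResult(False, True)]
)
sens, spec = sensitivity_specificity(toy_results)
```

In this toy set the sensitivity is 3/4 and the specificity is 4/5; in the study both values would be compared against the prespecified targets.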
Eligibility criteria
Inclusion criteria
DM images fulfilling all the following criteria will be collected.
1. Taken after 2010.
2. Images meeting either one of the following criteria:
   a. Visible breast cancer or benign lesions on images.
   b. Normal breast.
3. If cancer or benign lesions are visible on images, their outlines can be traced manually.
4. Images from patients aged 20 or older.
5. Available mediolateral oblique view with or without cranial-caudal view.
6. No visible axillary lymph node metastasis from breast cancer.
7. Images from patients with no previous history of chemotherapy, endocrine therapy, or radiotherapy.
8. Images from patients who have not received any previous surgical breast procedure, including partial resection, breast reconstruction, incisional biopsy, vacuum-assisted biopsy, and mammoplasty.
9. Read by readers ranked A according to the Japan Central Organization on Quality Assurance of Breast Cancer Screening.
10. Benign lesions, breast cancer, and normal breast on images are confirmed by the following criteria.
   a. Benign lesions: meeting one of the following criteria:
      - Confirmed by histopathology.
      - Without malignancy development over at least 2 years of follow-up.
      - Findings clearly indicating a simple cyst by mammography and other imaging modalities.
   b. Breast cancer: confirmed by histopathology.
   c. Normal breast: meeting either one of the following criteria:
      - In addition to the findings of mammography, ultrasonography and MRI do not detect any lesions.
      - Without malignancy development over at least 2 years of follow-up when no other imaging modalities except mammography are performed.
Exclusion criteria
DM images fulfilling any of the following criteria will not be collected.
1. Tomosynthesis and synthetic 2D mammographic images.
2. Spot compression views.
3. Poor image quality.
4. Inappropriate images as judged by the local investigators.
Construction of an AI algorithm
The machine learning community has been applying deep learning in various medical imaging fields, including mammography images. Based on this work, we have developed a convolutional neural network. The neural network includes pairs of the convolutional layer and the pooling layer with the fully convolutional layers. The classification output is calculated by the softmax function.
From mammography images, corresponding masking images of breast lesions will be created manually, indicating the area of the lesions. Mammography images will then be cropped with rectangular patches so that the lesions are included in these patches, and patches will be classified as benign or malignant. Patches in which neither benign nor malignant tumors are included will also be cropped from the mammography images.
These patches will be randomly divided into the training dataset, the validation dataset, and the test dataset. The neural network will be fine-tuned to classify whether these patches contain either a benign or a malignant tumor, or neither of them.
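The data preparation steps described above (mask to rectangular patch, random three-way split, softmax output) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the study's pipeline: images and masks are represented as plain nested lists, and the function names, the `margin` parameter, the 70/15/15 split, and the fixed seed are all hypothetical choices that the protocol does not specify.

```python
import math
import random


def lesion_bounding_box(mask):
    """Return (top, left, bottom, right) around non-zero mask pixels.

    bottom and right are exclusive. Returns None for an empty mask.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop_patch(image, box, margin=0):
    """Crop a rectangular patch containing the lesion, clipped to the image."""
    top, left, bottom, right = box
    top = max(0, top - margin)
    left = max(0, left - margin)
    bottom = min(len(image), bottom + margin)
    right = min(len(image[0]), right + margin)
    return [row[left:right] for row in image[top:bottom]]


def split_patches(patches, train_pct=70, val_pct=15, seed=0):
    """Randomly divide patches into training, validation, and test sets.

    The percentages are assumptions; the remainder goes to the test set.
    """
    shuffled = list(patches)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def softmax(logits):
    """Convert raw class scores (benign, malignant, neither) to probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 4x5 image with a lesion marked at rows 1-2, columns 2-3.
image = [[r * 10 + c for c in range(5)] for r in range(4)]
mask = [[1 if r in (1, 2) and c in (2, 3) else 0 for c in range(5)]
        for r in range(4)]
box = lesion_bounding_box(mask)
patch = crop_patch(image, box)
train, val, test = split_patches(range(20))
probs = softmax([2.0, 1.0, 0.1])
```

Splitting at the patch level mirrors the wording of the text. A real pipeline might instead split by patient so that patches from the same woman never appear in both training and test sets, but the protocol does not state which unit is used.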
Statistical analysis
The Breast Cancer Surveillance Consortium reported readers’ interpretive performance with sensitivity and specificity of around 85% and 90%, respectively. The sensitivity of mammography was 77% in the J-START trial conducted in Japan. Thus, we set the target sensitivity and specificity at 80% or more with the AI system. By referring to the design of previous studies, 15,000 DM images will be collected in this study. They will consist of 5000 images with breast cancer and 10,000 images with benign lesions. At least 1000 images of normal breasts will also be collected for use as reference data.
We will categorize images with breast cancer according to the main findings (mass, focal asymmetric density, calcification, architectural distortion). At least 750 images will be collected in each category.
The number of accumulated images will be capped for each type of benign lesion:
- Simple cyst: 1500 images
- Fibroadenoma or benign phyllodes tumor: 1500 images
- Intraductal (intracystic) papilloma: 1500 images
- Adenoma: 100 images
- Adenomyoepithelioma: 100 images
- Sclerosing adenosis: 1000 images
- Mastopathy: 1000 images
- Breast calcifications: 4000 images
- Other benign disease: 300 images
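The accrual rules above amount to a per-category upper limit for benign lesions and a per-finding minimum for cancers. A minimal bookkeeping sketch is given below; the dictionary, constant, and function names are hypothetical, and only the numbers come from the text. Note that the listed benign caps sum to 11,000, which exceeds the 10,000-image benign target, so they act as upper limits rather than exact quotas, and that the four cancer minimums account for 3000 of the 5000 cancer images.

```python
# Upper limits on accumulated images per benign lesion type (from the text).
BENIGN_CAPS = {
    "simple cyst": 1500,
    "fibroadenoma or benign phyllodes tumor": 1500,
    "intraductal (intracystic) papilloma": 1500,
    "adenoma": 100,
    "adenomyoepithelioma": 100,
    "sclerosing adenosis": 1000,
    "mastopathy": 1000,
    "breast calcifications": 4000,
    "other benign disease": 300,
}

# Main findings used to categorize cancer images, each with a minimum count.
CANCER_FINDINGS = (
    "mass",
    "focal asymmetric density",
    "calcification",
    "architectural distortion",
)
MIN_PER_CANCER_FINDING = 750


def can_accept_benign(counts, category):
    """True if another image of this benign category is still under its cap."""
    return counts.get(category, 0) < BENIGN_CAPS[category]


def cancer_shortfalls(counts):
    """Return how many images each cancer finding still needs to reach 750."""
    return {
        finding: MIN_PER_CANCER_FINDING - counts.get(finding, 0)
        for finding in CANCER_FINDINGS
        if counts.get(finding, 0) < MIN_PER_CANCER_FINDING
    }


total_benign_cap = sum(BENIGN_CAPS.values())
min_cancer_total = MIN_PER_CANCER_FINDING * len(CANCER_FINDINGS)
```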
Discussion
AI with deep learning needs good training data to work properly. Thus, we put a premium on the quality of training image data in the development of our AI. Experienced and qualified readers will interpret mammograms and define the outline of the malignant or benign lesions. To delineate the outline precisely, surgical pathology reports or the results of other imaging modalities will be taken into account whenever possible. The diagnosis will be confirmed by histopathology. Benign diseases without available pathological results will be clinically diagnosed after at least 2 years of follow-up. To avoid biased collection toward a particular lesion, a target number of images has been set according to each radiographic finding in breast cancer or in each benign disease.
Mammographic images will be obtained from multiple participating centers. The DM images will be acquired with devices from various vendors. The images used in this study will be taken from women aged 20 or older (not confined to the recommended screening age), and will originate not only from screening but also from clinical practice. Larger and more advanced breast cancers are expected from clinical practice, with different characteristics from screen-detected cancers. These various kinds of data will increase the generalizability of our AI algorithms. Alongside images, we will collect clinicopathological features of each case, including pathological diagnosis, tumor diameter, hormone receptor status, HER2 status, breast density, and vendors. This will enable us to perform detailed analysis of which patient population is more suitable for AI reading.
Our study has a limitation. The dataset will come only from Japanese women. Thus, our AI system might not be applicable to Caucasian patients. However, it will probably work well in Asian women who have similar breast density, size, and shape to Japanese women.
Japan has a double reading system in screening mammography. When the ability of AI reading is shown to be on a par with that of human reading, images of normal breasts or benign lesions that do not have to be read by a human can be selected by AI beforehand (the first reader role). This AI preselection might obviate the need for double reading. However, future studies will be required to clarify how much the workload of readers is reduced or which threshold of cancer probability for alerting humans is optimal.
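The trade-off behind the threshold question can be illustrated with a small calculation. The sketch below is purely hypothetical and is not part of the study design: it assumes the AI outputs a cancer probability per image, that images scoring below a chosen threshold are set aside without human reading, and that the true status of each image is known for evaluation. The function name and the toy values are invented for illustration.

```python
def triage_summary(scores, has_cancer, threshold):
    """Summarize AI preselection at one cancer-probability threshold.

    Images with a score below the threshold are set aside (not sent to a
    human reader). Returns the fraction of images set aside and the number
    of cancers that would be set aside with them.
    """
    if len(scores) != len(has_cancer):
        raise ValueError("scores and has_cancer must have the same length")
    if not scores:
        raise ValueError("at least one image is required")
    set_aside = [score < threshold for score in scores]
    cancers_set_aside = sum(
        1 for aside, cancer in zip(set_aside, has_cancer) if aside and cancer
    )
    return {
        "workload_reduction": sum(set_aside) / len(scores),
        "cancers_set_aside": cancers_set_aside,
    }


# Toy values only: five images, three of them cancers.
toy_scores = [0.05, 0.10, 0.40, 0.90, 0.02]
toy_truth = [False, False, True, True, True]
summary = triage_summary(toy_scores, toy_truth, threshold=0.2)
```

In this toy case a threshold of 0.2 removes three of five images from the reading list but also sets aside one cancer, which is the kind of trade-off that future studies would need to quantify across thresholds.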
Author contributions
Acquisition of data and data analysis and interpretation: Takeshi Yamaguchi, Kenichi Inoue, Hiroko Tsunoda, Takayoshi Uematsu, Norimitsu Shinohara, Hirofumi Mukai.
Critical revision of the manuscript: Kenichi Inoue, Hiroko Tsunoda, Takayoshi Uematsu, Norimitsu Shinohara, Hirofumi Mukai.
Study concept and design: Takeshi Yamaguchi, Kenichi Inoue, Hiroko Tsunoda, Takayoshi Uematsu, Norimitsu Shinohara, Hirofumi Mukai.
Writing the manuscript: Takeshi Yamaguchi.
All authors read and approved the final manuscript.