| Literature DB >> 30465142 |
Dmytro S Lituiev, Hari Trivedi, Maryam Panahiazar, Beau Norgeot, Youngho Seo, Benjamin Franc, Roy Harnish, Michael Kawczynski, Dexter Hadley.
Abstract
Applying state-of-the-art machine learning techniques to medical images requires thorough selection and normalization of input data. One such step in digital mammography screening for breast cancer is labeling and removing special diagnostic views, in which diagnostic tools or magnification are applied to assist in the assessment of suspicious initial findings. Because a common task in medical informatics is the prediction of disease and its stage, these special diagnostic views, which are enriched only in the cohort of diseased cases, will bias machine learning disease predictions. To automate this process, we develop a machine learning pipeline that uses both DICOM headers and images to predict such views, allowing for their removal and the generation of unbiased datasets. We achieve an AUC of 99.72% in predicting special mammogram views when combining both types of models. Finally, we apply these models to clean a dataset of about 772,000 images with an expected sensitivity of 99.0%. The pipeline presented in this paper can be applied to other datasets to obtain high-quality image sets suitable for training disease-detection algorithms.
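The header-based half of such a pipeline can be illustrated with a minimal rule-based pre-filter. This is only a sketch: the field names (`ViewModifierCodeMeaning`, `PaddleDescription`, `SeriesDescription`) and trigger strings are illustrative assumptions, not the paper's actual feature set, which used a learned model over DICOM headers.

```python
# Hypothetical rule-based pre-filter for special diagnostic mammogram views.
# Field names and marker strings below are illustrative assumptions, not the
# features used in the published model.
SPECIAL_MARKERS = {"SPOT", "MAG", "MAGNIFICATION", "SPOT COMPRESSION"}


def is_special_view(header: dict) -> bool:
    """Flag an image as a special diagnostic view from selected header fields."""
    text = " ".join(
        str(header.get(tag, "")).upper()
        for tag in ("ViewModifierCodeMeaning", "PaddleDescription", "SeriesDescription")
    )
    return any(marker in text for marker in SPECIAL_MARKERS)
```

In practice a learned classifier replaces the hand-written rules, but the input (a flat dictionary of header fields) and the output (a special-view flag) have the same shape.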
Keywords: Convolutional neural networks; DICOM; Machine learning; Mammography
Mesh:
Year: 2019 PMID: 30465142 PMCID: PMC6456464 DOI: 10.1007/s10278-018-0154-z
Source DB: PubMed Journal: J Digit Imaging ISSN: 0897-1889 Impact factor: 4.056
Fig. 1 Performance of models and model ensembles. A comparison of machine learning models and their ensembles (rows) is shown according to various metrics (columns). The “wire” row demonstrates the performance of the WL model in detecting all special views, while the “wire (vs other views)” row shows the model's performance in specifically detecting WL views. In the last two rows, the performance of the final ensemble model is shown on the validation set and on the holdout set, respectively.
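Combining a header-based model with an image-based model can be as simple as soft voting, i.e., averaging per-class probabilities. A minimal sketch (the paper does not specify that averaging is the exact combination rule used, so treat this as one plausible ensembling scheme):

```python
def ensemble_predict(probs_by_model):
    """Soft-voting ensemble sketch: average per-class probabilities
    across models. Each element of `probs_by_model` is one model's
    probability vector for a single image."""
    n = len(probs_by_model)
    return [sum(p) / n for p in zip(*probs_by_model)]
```

For example, averaging a header model's `[0.9, 0.1]` with an image model's `[0.7, 0.3]` yields `[0.8, 0.2]` for the (special, normal) classes.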
Fig. 2 Relative feature influence in the GBMT model trained on DICOM headers
Fig. 3 Saliency maps. Two correctly classified spot views and one correctly classified normal view are presented. The regions of highest contribution to the “special view” class are highlighted in green, and the areas of highest contribution to the “normal view” class are highlighted in red.
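The idea behind a saliency map is the gradient of a class score with respect to the input pixels. CNN pipelines compute this with automatic differentiation; the numpy sketch below approximates it by finite differences on a black-box scoring function, purely to illustrate the concept.

```python
import numpy as np


def saliency_map(score_fn, image, eps=1e-3):
    """Finite-difference saliency sketch: perturb each pixel and measure
    the change in the class score. Pixels with large (absolute) values
    contribute most to the class decision; autodiff replaces this loop
    in real CNN pipelines."""
    base = score_fn(image)
    sal = np.zeros_like(image, dtype=float)
    it = np.nditer(image, flags=["multi_index"])
    for _ in it:
        bumped = image.astype(float).copy()
        bumped[it.multi_index] += eps
        sal[it.multi_index] = (score_fn(bumped) - base) / eps
    return sal
```

For a linear scoring function the recovered saliency is exactly the weight map, which is a quick sanity check for the approximation.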