
Enhancement of breast CADx with unlabeled data.

Andrew R Jamieson, Maryellen L Giger, Karen Drukker, Lorenzo L Pesce.

Abstract

PURPOSE: Unlabeled medical image data are abundant, yet converting them into a labeled ("truth-known") database is expensive in time and resources and fraught with ethical and logistical issues. The authors propose a dual-stage CADx scheme that uses both labeled ("truth-known") and unlabeled ("truth-unknown") data. This study is an initial exploration of the potential for leveraging unlabeled data to enhance breast CADx.
METHODS: From a labeled ultrasound image database of 1126 lesions with an empirical cancer prevalence of 14%, 200 different randomly sampled subsets were selected, and the truth status of a variable number of cases was masked from the algorithm to mimic different types of labeled and unlabeled data sources. The prevalence was fixed at 50% cancerous for the labeled data and 5% cancerous for the unlabeled data. In the first stage of the dual-stage CADx scheme, which the authors term "transductive dimension reduction regularization" (TDR-R), both labeled and unlabeled images, characterized by extracted lesion features, were combined using dimension reduction (DR) techniques and mapped to a lower-dimensional representation. (Because the first stage ignored truth status, it was an unsupervised algorithm.) In the second stage, the labeled data from the reduced-dimension embedding were used to train a classifier to estimate the probability of malignancy. For the first CADx stage, the authors investigated three DR approaches: Laplacian eigenmaps, t-distributed stochastic neighbor embedding (t-SNE), and principal component analysis. For the TDR-R methods, the classifier in the second stage was a supervised (i.e., truth-utilizing) Bayesian neural net. The dual-stage CADx schemes were compared to a single-stage scheme based on manifold regularization (MR) in a semisupervised setting via the LapSVM algorithm. Performance of the CADx schemes, in terms of area under the ROC curve (AUC), was evaluated in leave-one-out and .632+ bootstrap analyses on a by-lesion basis. Additionally, the trained algorithms were applied to an independent test data set of 101 lesions with approximately 50% cancer prevalence. The difference in AUC (deltaAUC) between training with and without unlabeled data was computed.
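The dual-stage idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the features are synthetic (`make_classification`), scikit-learn's `SpectralEmbedding` stands in for the Laplacian eigenmaps stage, and a plain `LogisticRegression` is swapped in for the Bayesian neural net. The function name `tdr_r_auc`, the sample sizes, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import SpectralEmbedding
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler


def tdr_r_auc(X_lab, y_lab, X_unl, X_test, y_test, n_components=3, seed=0):
    """Two-stage sketch: unsupervised embedding, then a classifier on labeled rows."""
    # Stage 1 (transductive, unsupervised): labeled, unlabeled, and test
    # feature vectors are embedded together; no truth labels are used.
    X_all = StandardScaler().fit_transform(np.vstack([X_lab, X_unl, X_test]))
    emb = SpectralEmbedding(
        n_components=n_components, n_neighbors=15, random_state=seed
    ).fit_transform(X_all)
    emb = StandardScaler().fit_transform(emb)  # rescale embedding coordinates

    n_lab, n_unl = len(X_lab), len(X_unl)
    Z_lab, Z_test = emb[:n_lab], emb[n_lab + n_unl:]

    # Stage 2 (supervised): train only on the labeled rows of the embedding.
    # LogisticRegression is an assumption standing in for the Bayesian neural net.
    clf = LogisticRegression().fit(Z_lab, y_lab)
    p_test = clf.predict_proba(Z_test)[:, 1]  # estimated probability of "malignant"
    return roc_auc_score(y_test, p_test), p_test


# Synthetic stand-in for extracted lesion features (not real ultrasound data).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_lab, y_lab = X[:100], y[:100]        # truth known
X_unl = X[100:500]                     # truth masked from the algorithm
X_test, y_test = X[500:], y[500:]      # held-out cases for evaluation

auc_with, p_with = tdr_r_auc(X_lab, y_lab, X_unl, X_test, y_test)
# Same pipeline with an empty unlabeled set, i.e. labeled data only.
auc_without, p_without = tdr_r_auc(X_lab, y_lab, X_unl[:0], X_test, y_test)
delta_auc = auc_with - auc_without
print(f"AUC with unlabeled: {auc_with:.3f}, without: {auc_without:.3f}, "
      f"deltaAUC: {delta_auc:+.3f}")
```

On synthetic data the sign and size of `delta_auc` carry no clinical meaning; the sketch only shows where the unlabeled cases enter the pipeline (stage 1) and where the truth labels enter (stage 2).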
RESULTS: Statistically significant differences in the average AUC value (deltaAUC) were found in many instances between training with and without unlabeled data, based on the sample set distributions generated from this particular ultrasound data set, both during cross-validation and on the independent test set. For example, when using 100 labeled and 900 unlabeled cases and testing on the independent test set, the TDR-R methods produced an average deltaAUC = 0.0361 with 95% interval [0.0301, 0.0408] (p-value < 0.0001, adjusted for multiple comparisons, but considering the test set fixed) using t-SNE and an average deltaAUC = 0.026 [0.0227, 0.0298] (adjusted p-value < 0.0001) using Laplacian eigenmaps, while the MR-based LapSVM produced an average deltaAUC = 0.0381 [0.0351, 0.0405] (adjusted p-value < 0.0001). The authors also found that schemes that initially performed below average when using labeled data only showed the largest performance gains when unlabeled data were added in the first CADx stage, suggesting a regularization effect from the injection of unlabeled data.
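To make the "average deltaAUC with 95% interval" summary concrete, the following sketch averages deltaAUC values over repeated sample sets and attaches an interval. The abstract does not state how the intervals were computed; a bootstrap percentile interval for the mean is assumed here purely for illustration, and the input values are randomly generated placeholders, not results from the study. The helper name `mean_with_percentile_interval` is hypothetical.

```python
import numpy as np


def mean_with_percentile_interval(deltas, n_boot=2000, alpha=0.05, seed=0):
    """Mean of deltaAUC values plus a bootstrap percentile interval for that mean."""
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, dtype=float)
    # Resample the per-subset deltaAUC values with replacement, n_boot times.
    idx = rng.integers(0, len(deltas), size=(n_boot, len(deltas)))
    boot_means = deltas[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return deltas.mean(), lo, hi


# Placeholder values only: one made-up deltaAUC per sampled subset (200 subsets).
placeholder_deltas = np.random.default_rng(1).normal(loc=0.03, scale=0.02, size=200)
mean_delta, lo, hi = mean_with_percentile_interval(placeholder_deltas)
print(f"average deltaAUC = {mean_delta:.4f} [{lo:.4f}, {hi:.4f}]")
```

An interval that excludes zero would be consistent with a systematic gain from unlabeled data, although the multiple-comparison adjustment mentioned in the abstract is not reproduced here.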
CONCLUSION: The findings reveal evidence that incorporating unlabeled data information into the overall development of CADx methods may improve classifier performance by non-negligible amounts and warrants further investigation.

Year:  2010        PMID: 20879576      PMCID: PMC2921421          DOI: 10.1118/1.3455704

Source DB:  PubMed          Journal:  Med Phys        ISSN: 0094-2405            Impact factor:   4.071


  4 in total

1.  Classification of small lesions on dynamic breast MRI: Integrating dimension reduction and out-of-sample extension into CADx methodology.

Authors:  Mahesh B Nagarajan; Markus B Huber; Thomas Schlossbauer; Gerda Leinsinger; Andrzej Krol; Axel Wismüller
Journal:  Artif Intell Med       Date:  2013-11-23       Impact factor: 5.326

2.  Computerized detection of breast cancer on automated breast ultrasound imaging of women with dense breasts.

Authors:  Karen Drukker; Charlene A Sennett; Maryellen L Giger
Journal:  Med Phys       Date:  2014-01       Impact factor: 4.071

3.  Comparison of Breast Cancer Screening Results in Korean Middle-Aged Women: A Hospital-based Prospective Cohort Study.

Authors:  Taebum Lee
Journal:  Osong Public Health Res Perspect       Date:  2013-06-27

4.  Integrating dimension reduction and out-of-sample extension in automated classification of ex vivo human patellar cartilage on phase contrast X-ray computed tomography.

Authors:  Mahesh B Nagarajan; Paola Coan; Markus B Huber; Paul C Diemoz; Axel Wismüller
Journal:  PLoS One       Date:  2015-02-24       Impact factor: 3.240

