
Risks of feature leakage and sample size dependencies in deep feature extraction for breast mass classification.

Ravi K Samala, Heang-Ping Chan, Lubomir Hadjiiski, Mark A Helvie.

Abstract

PURPOSE: Transfer learning is commonly used in deep learning for medical imaging to alleviate the problem of limited available data. In this work, we studied the risk of feature leakage and its dependence on sample size when using a pretrained deep convolutional neural network (DCNN) as a feature extractor for the classification of breast masses in mammography.
METHODS: Feature leakage occurs when the training set is used for feature selection and classifier modeling while the cost function is guided by the validation performance or informed by the test performance. The high-dimensional feature space extracted from a pretrained DCNN suffers from the curse of dimensionality: feature subsets that yield excessively optimistic performance can be found for the validation set, or for the test set if the latter is allowed unlimited reuse during algorithm development. We designed a simulation study to examine feature leakage when using a DCNN as a feature extractor for mass classification in mammography. A total of 4577 unique mass lesions were partitioned by patient into three sets: 3222 for training, 508 for validation, and 847 for independent testing. Three pretrained DCNNs, AlexNet, GoogLeNet, and VGG16, were first compared using the training set in fourfold cross-validation, and one was selected as the feature extractor. To assess generalization error, the independent test set was sequestered as truly unseen cases. Training sets ranging in size from 10% to 75% of the available training set were simulated by random drawing, in addition to using 100% of the training set. Three commonly used feature classifiers, the linear discriminant, the support vector machine, and the random forest, were evaluated. A sequential feature selection method was used to find feature subsets that achieved high classification performance, in terms of the area under the receiver operating characteristic curve (AUC), on the validation set. The extent of feature leakage and the impact of training set size were analyzed by comparison with the performance on the unseen test set.
RESULTS: All three classifiers showed large generalization error between the validation set and the independent sequestered test set at all sample sizes. The generalization error decreased as the sample size increased. At 100% of the sample size, one classifier achieved an AUC as high as 0.91 on the validation set while the corresponding performance on the unseen test set only reached an AUC of 0.72.
CONCLUSIONS: Our results demonstrate that large generalization errors can occur in AI tools due to feature leakage. Without evaluation on unseen test cases, optimistically biased performance may be reported inadvertently, which can lead to unrealistic expectations and reduce confidence in clinical implementation.
© 2020 American Association of Physicists in Medicine.
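The leakage mechanism described in the abstract — sequential feature selection over a high-dimensional feature space, guided by validation AUC — can be illustrated with a small synthetic sketch. This is not the authors' pipeline: the "deep features" below are pure noise with labels drawn independently of them, the classifier is a minimal Fisher linear discriminant, and AUC is the Wilcoxon rank-sum estimate. Even so, forward selection driven by validation AUC finds a subset that appears discriminative on the validation set, while the sequestered test AUC stays near chance (0.5).

```python
import numpy as np

def auc(scores, labels):
    """Wilcoxon rank-sum (Mann-Whitney) estimate of the area under the ROC curve."""
    ranks = scores.argsort().argsort() + 1.0          # 1-based ranks of the scores
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def lda_scores(X_tr, y_tr, X_eval, cols):
    """Fit a Fisher linear discriminant on training columns `cols`, score X_eval."""
    A, B = X_tr[y_tr == 1][:, cols], X_tr[y_tr == 0][:, cols]
    Sw = np.atleast_2d(np.cov(A, rowvar=False) + np.cov(B, rowvar=False))
    w = np.linalg.pinv(Sw) @ (A.mean(0) - B.mean(0))  # pooled within-class scatter
    return X_eval[:, cols] @ w

rng = np.random.default_rng(0)
n_tr, n_val, n_te, d = 200, 100, 150, 300
X = rng.standard_normal((n_tr + n_val + n_te, d))     # pure-noise "deep features"
y = rng.integers(0, 2, n_tr + n_val + n_te)           # labels independent of X
X_tr, y_tr = X[:n_tr], y[:n_tr]
X_val, y_val = X[n_tr:n_tr + n_val], y[n_tr:n_tr + n_val]
X_te, y_te = X[n_tr + n_val:], y[n_tr + n_val:]

# Sequential forward selection guided by *validation* AUC — the leaky step.
selected = []
for _ in range(10):
    best_j, best_auc = None, -1.0
    for j in range(d):
        if j in selected:
            continue
        a = auc(lda_scores(X_tr, y_tr, X_val, selected + [j]), y_val)
        if a > best_auc:
            best_j, best_auc = j, a
    selected.append(best_j)

val_auc = auc(lda_scores(X_tr, y_tr, X_val, selected), y_val)
test_auc = auc(lda_scores(X_tr, y_tr, X_te, selected), y_te)
print(f"validation AUC = {val_auc:.2f}, unseen-test AUC = {test_auc:.2f}")
```

With 300 candidate features and only 100 validation cases, the selector can exploit chance correlations, so the validation AUC is optimistically biased while the sequestered test set exposes the true (chance-level) performance — the same qualitative gap the study reports on real mammography data.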

Keywords:  breast cancer classification; feature leakage; generalization error; pretrained DCNN; sample size

Year:  2021        PMID: 33368376      PMCID: PMC8601676          DOI: 10.1002/mp.14678

Source DB:  PubMed          Journal:  Med Phys        ISSN: 0094-2405            Impact factor:   4.506

