Effect of finite sample size on feature selection and classification: a simulation study.

Ted W Way, Berkman Sahiner, Lubomir M Hadjiiski, Heang-Ping Chan.

Abstract

PURPOSE: The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples.
METHODS: Three feature selection techniques, stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve, Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200.
RESULTS: The relative performance of the different combinations of classifier and feature selection method depended on the feature space distributions, the dimensionality, and the available training sample sizes. The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small sample sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available.
CONCLUSIONS: None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.
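
The resubstitution versus hold-out comparison described under METHODS can be illustrated with a minimal sketch. It is not the authors' code and makes several simplifying assumptions that are not in the abstract: two Gaussian classes with identity covariance, 10 features, no feature selection step, a hypothetical class separation `delta`, and a large independent test set standing in for the hold-out estimate. The function names (`fisher_lda_weights`, `auc`, `simulate`) are illustrative only.

```python
import numpy as np


def fisher_lda_weights(x0, x1):
    """Fisher's linear discriminant direction: pooled_cov^{-1} (m1 - m0)."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    # Pooled within-class covariance; requires at least 2 features
    # and more than 2 training samples in total.
    s0 = np.cov(x0, rowvar=False) * (len(x0) - 1)
    s1 = np.cov(x1, rowvar=False) * (len(x1) - 1)
    pooled = (s0 + s1) / (len(x0) + len(x1) - 2)
    # pinv tolerates a near-singular estimate when samples are scarce.
    return np.linalg.pinv(pooled) @ (m1 - m0)


def auc(scores0, scores1):
    """Area under the ROC curve (Mann-Whitney form, ties count as 1/2)."""
    diff = scores1[:, None] - scores0[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def simulate(n_train, dim=10, delta=1.5, n_test=1000, n_trials=50, seed=0):
    """Mean resubstitution and hold-out Az for n_train samples per class.

    Assumed setup (not from the paper): class 0 ~ N(0, I),
    class 1 ~ N(mu, I) with ||mu|| = delta.
    """
    rng = np.random.default_rng(seed)
    mean1 = np.full(dim, delta / np.sqrt(dim))
    resub, holdout = [], []
    for _ in range(n_trials):
        train0 = rng.standard_normal((n_train, dim))
        train1 = rng.standard_normal((n_train, dim)) + mean1
        test0 = rng.standard_normal((n_test, dim))
        test1 = rng.standard_normal((n_test, dim)) + mean1
        w = fisher_lda_weights(train0, train1)
        # Resubstitution: score the same samples used for training.
        resub.append(auc(train0 @ w, train1 @ w))
        # Hold-out: score independent samples from the same distributions.
        holdout.append(auc(test0 @ w, test1 @ w))
    return float(np.mean(resub)), float(np.mean(holdout))


if __name__ == "__main__":
    for n in (15, 30, 60, 100):
        az_resub, az_hold = simulate(n)
        print(f"n per class = {n:3d}  resubstitution Az = {az_resub:.3f}  "
              f"hold-out Az = {az_hold:.3f}")
```

Under these assumptions, the resubstitution estimate should be optimistically biased and the hold-out estimate pessimistically biased at small training sizes, with the two approaching each other as the number of training samples per class grows. Adding a feature selection step or swapping in an SVM would be needed to approximate the combinations the study actually compared.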

Year: 2010   PMID: 20229900   PMCID: PMC2826389   DOI: 10.1118/1.3284974

Source DB: PubMed   Journal: Med Phys   ISSN: 0094-2405   Impact factor: 4.071


