Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance.

Maciej A Mazurowski, Piotr A Habas, Jacek M Zurada, Joseph Y Lo, Jay A Baker, Georgia D Tourassi.

Abstract

This study investigates the effect of class imbalance in training data when developing neural network classifiers for computer-aided medical diagnosis. The investigation is performed in the presence of other characteristics that are typical of medical data, namely a small training sample size, a large number of features, and correlations between features. Two methods of neural network training are explored: classical backpropagation (BP) and particle swarm optimization (PSO) with clinically relevant training criteria. An experimental study is performed using simulated data, and the conclusions are further validated on real clinical data for breast cancer diagnosis. The results show that classifier performance deteriorates with even modest class imbalance in the training data. Further, it is shown that BP is generally preferable to PSO for imbalanced training data, especially with a small data sample and a large number of features. Finally, it is shown that there is no clear preference between oversampling and the no-compensation approach, and some guidance is provided on choosing between them.
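The abstract does not say how oversampling was carried out, so the following is only a minimal sketch of one common variant: random oversampling, in which minority-class cases are replicated with replacement until every class matches the size of the largest one. The function name `random_oversample`, the `seed` parameter, and the toy data are assumptions made for illustration and are not taken from the study.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Balance classes by replicating randomly chosen minority-class cases.

    Assumption: plain random replication with replacement; the study's
    own oversampling procedure is not described in the abstract.
    """
    rng = random.Random(seed)

    # Group the indices of the training cases by class label.
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    # Every class is grown to the size of the largest class.
    target = max(len(idx) for idx in by_class.values())

    # Keep all original cases, then append the sampled duplicates.
    order = list(range(len(labels)))
    for idx in by_class.values():
        order.extend(rng.choices(idx, k=target - len(idx)))

    return [samples[i] for i in order], [labels[i] for i in order]


# Toy imbalanced set: 4 positive cases and 16 negative cases.
samples = [[float(i), float(i % 3)] for i in range(20)]
labels = [1] * 4 + [0] * 16

X_bal, y_bal = random_oversample(samples, labels)
print(sorted(Counter(y_bal).items()))  # [(0, 16), (1, 16)]
```

The no-compensation alternative mentioned in the abstract corresponds to training directly on `samples` and `labels` without this step. In either case only the training set should be resampled, so that test performance is measured at the original class prevalence.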

Year:  2007        PMID: 18272329      PMCID: PMC2346433          DOI: 10.1016/j.neunet.2007.12.031

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


