Improving machine learning performance by removing redundant cases in medical data sets.

L Ohno-Machado, H S Fraser, A Ohrn.

Abstract

Neural network models and other machine learning methods have been applied successfully to several medical classification problems. These models can be periodically refined and retrained as new cases become available. Because training neural networks by backpropagation is time-consuming, it is desirable to keep only a minimum number of representative cases in the training set (i.e., to remove redundant cases). The removal of redundant cases should be carefully monitored so that classification performance is not significantly affected. We performed data removal experiments on a data set of 700 patients with suspected myocardial infarction and show that there is no statistically significant difference in classification performance (measured by differences in the area under the ROC curve on two previously unseen sets of 553 and 500 cases) when as many as 86% of the cases are randomly removed. The time required to train the neural network model is reduced proportionally.
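
The random-removal experiment described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' code or data: the cases are synthetic, a single logistic unit trained by stochastic gradient descent stands in for the backpropagation network, and the function names (`auc`, `make_cases`, `train_logistic`, `reduction_experiment`) are hypothetical. The sketch only compares the two areas under the ROC curve; it does not perform the statistical test of their difference that the abstract reports.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_cases(n, rng):
    """Synthetic stand-in for patient records (assumption, not real data)."""
    cases = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        x = [rng.gauss(1.0 * y, 1.0), rng.gauss(0.5 * y, 1.0)]
        cases.append((x, y))
    return cases


def train_logistic(cases, epochs=50, lr=0.1):
    """Single logistic unit trained by SGD; stands in for the neural network."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in cases:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            grad = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(model, cases):
    """Predicted probability of the positive class for each case."""
    w, b = model
    return [
        1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        for x, _ in cases
    ]


def reduction_experiment(removal_fraction, seed=0):
    """Train on all cases and on a random subset; return both test-set AUCs."""
    rng = random.Random(seed)
    train = make_cases(700, rng)   # same size as the training set in the abstract
    test = make_cases(500, rng)    # held-out cases never used for training
    n_keep = round(len(train) * (1 - removal_fraction))
    kept = rng.sample(train, n_keep)  # random removal of the remaining cases
    labels = [y for _, y in test]
    auc_full = auc(predict(train_logistic(train), test), labels)
    auc_reduced = auc(predict(train_logistic(kept), test), labels)
    return n_keep, auc_full, auc_reduced


if __name__ == "__main__":
    n_keep, auc_full, auc_reduced = reduction_experiment(0.86)
    print(f"kept {n_keep} cases: AUC full={auc_full:.3f}, reduced={auc_reduced:.3f}")
```

Because the cost of each training epoch in this sketch grows linearly with the number of cases, keeping 14% of the cases cuts the per-epoch work by roughly the same proportion, which mirrors the proportional time saving the abstract describes.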

Year:  1998        PMID: 9929274      PMCID: PMC2232167     

Source DB:  PubMed          Journal:  Proc AMIA Symp        ISSN: 1531-605X


  2 in total

1.  Early diagnosis of acute myocardial infarction using clinical and electrocardiographic data at presentation: derivation and evaluation of logistic regression models.

Authors:  R L Kennedy; A M Burton; H S Fraser; L N McStay; R F Harrison
Journal:  Eur Heart J       Date:  1996-08       Impact factor: 29.983

2.  The meaning and use of the area under a receiver operating characteristic (ROC) curve.

Authors:  J A Hanley; B J McNeil
Journal:  Radiology       Date:  1982-04       Impact factor: 11.105

  3 in total

1.  Classification algorithms applied to narrative reports.

Authors:  A Wilcox; G Hripcsak
Journal:  Proc AMIA Symp       Date:  1999

2.  Effects of data anonymization by cell suppression on descriptive statistics and predictive modeling performance.

Authors:  L Ohno-Machado; S A Vinterbo; S Dreiseitl
Journal:  Proc AMIA Symp       Date:  2001

3.  Mitigating Bias in Radiology Machine Learning: 1. Data Handling.

Authors:  Pouria Rouzrokh; Bardia Khosravi; Shahriar Faghani; Mana Moassefi; Diana V Vera Garcia; Yashbir Singh; Kuan Zhang; Gian Marco Conte; Bradley J Erickson
Journal:  Radiol Artif Intell       Date:  2022-08-24