The effect of sample size and disease prevalence on supervised machine learning of narrative data.

Lawrence K McKnight, Adam Wilcox, George Hripcsak.

Abstract

This paper examines the independent effects of outcome prevalence and training sample size on inductive learning performance. We trained 3 inductive learning algorithms (MC4, IB, and Naïve-Bayes) on 60 simulated datasets of parsed radiology text reports labeled with 6 disease states. Datasets were constructed to define positive outcome states at 5 prevalence rates (1, 5, 10, 25, and 50%) in training set sizes of 200 and 2,000 cases. We found that the effect of outcome prevalence is significant when the positive outcome class drops below 10% of cases. The effect appeared independent of sample size, induction algorithm used, or class label. Work is needed to identify methods of improving classifier performance when output classes are rare.
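The experimental design described in the abstract can be illustrated with a minimal sketch. It assumes the 60 datasets arise from crossing the 6 disease labels with the five listed prevalence rates and the two training set sizes (6 x 5 x 2 = 60); the abstract does not state this explicitly. The function names `design_grid` and `sample_with_prevalence`, the seeding, and the sampling-without-replacement strategy are hypothetical and are not taken from the paper.

```python
import random

# Values taken from the abstract; how they are combined is an assumption.
PREVALENCES = [0.01, 0.05, 0.10, 0.25, 0.50]
TRAIN_SIZES = [200, 2000]
N_LABELS = 6  # disease states


def design_grid():
    """Enumerate (label, prevalence, size) cells of the assumed full crossing."""
    return [
        (label, prevalence, size)
        for label in range(N_LABELS)
        for prevalence in PREVALENCES
        for size in TRAIN_SIZES
    ]


def sample_with_prevalence(positives, negatives, size, prevalence, seed=0):
    """Draw a training set of `size` cases with a fixed share of positives.

    Hypothetical helper: samples without replacement from pools of
    positive and negative cases, then shuffles the labeled result.
    """
    rng = random.Random(seed)
    n_pos = round(size * prevalence)
    n_neg = size - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough cases in a pool for the requested design")
    sample = [(case, 1) for case in rng.sample(positives, n_pos)]
    sample += [(case, 0) for case in rng.sample(negatives, n_neg)]
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    print(len(design_grid()), "design cells")
    # Integers stand in for parsed reports in this toy example.
    train = sample_with_prevalence(
        list(range(1000)), list(range(1000, 4000)), size=200, prevalence=0.01
    )
    print(sum(label for _, label in train), "positive cases out of", len(train))
```

Under these assumptions, the 1% prevalence condition at 200 cases leaves only 2 positive examples to learn from, which gives a sense of why performance may degrade when the positive class is rare.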

Year:  2002        PMID: 12463878      PMCID: PMC2244149     

Source DB:  PubMed          Journal:  Proc AMIA Symp        ISSN: 1531-605X

