The effect of sample size and disease prevalence on supervised machine learning of narrative data.

Lawrence K McKnight, Adam Wilcox, George Hripcsak.

Abstract

This paper examines the independent effects of outcome prevalence and training sample size on inductive learning performance. We trained 3 inductive learning algorithms (MC4, IB, and Naïve-Bayes) on 60 simulated datasets of parsed radiology text reports labeled with 6 disease states. Datasets were constructed to define positive outcome states at 5 prevalence rates (1, 5, 10, 25, and 50%) in training set sizes of 200 and 2,000 cases. We found that the effect of outcome prevalence is significant when outcome classes drop below 10% of cases. The effect appeared independent of sample size, induction algorithm used, or class label. Further work is needed to identify methods of improving classifier performance when output classes are rare.
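The dataset design described in the abstract can be sketched roughly as follows. The abstract does not say how cases were drawn, so this is only a minimal illustration: it assumes the 60 datasets come from crossing the 6 disease states with the five listed prevalence rates and the 2 training set sizes, and that each training set is sampled without replacement from pools of positive and negative reports. The names `DISEASE_STATES`, `design_grid`, and `sample_at_prevalence` are hypothetical and do not come from the paper.

```python
import itertools
import random

# Hypothetical labels: the abstract says 6 disease states but does not name them.
DISEASE_STATES = [f"disease_{i}" for i in range(1, 7)]
PREVALENCES = [0.01, 0.05, 0.10, 0.25, 0.50]  # rates listed in the abstract
TRAIN_SIZES = [200, 2000]                      # training set sizes in the abstract


def design_grid():
    """Enumerate every (disease state, prevalence, training size) combination."""
    return list(itertools.product(DISEASE_STATES, PREVALENCES, TRAIN_SIZES))


def sample_at_prevalence(positives, negatives, size, prevalence, seed=0):
    """Draw `size` cases without replacement with a fixed share of positives.

    Assumption: the number of positives is round(size * prevalence);
    the paper may have constructed its sets differently.
    """
    rng = random.Random(seed)
    n_pos = round(size * prevalence)
    n_neg = size - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("pool too small for requested size and prevalence")
    sample = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(sample)  # avoid positives clustering at the front
    return sample


if __name__ == "__main__":
    print(len(design_grid()), "dataset configurations")
    pos_pool = [("pos", i) for i in range(100)]
    neg_pool = [("neg", i) for i in range(500)]
    train = sample_at_prevalence(pos_pool, neg_pool, size=200, prevalence=0.01)
    print(sum(1 for label, _ in train if label == "pos"), "positive cases of", len(train))
```

At 1% prevalence a 200-case training set under this scheme holds only 2 positive cases, which gives a concrete sense of how little signal a learner sees when the outcome class is rare.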

Entities: Disease

Year: 2002 PMID: 12463878 PMCID: PMC2244149

Source DB: PubMed Journal: Proc AMIA Symp ISSN: 1531-605X

2 in total

1. Facilitating surveillance of pulmonary invasive mold diseases in patients with haematological malignancies by screening computed tomography reports using natural language processing.

Authors: Michelle R Ananda-Rajah; David Martinez; Monica A Slavin; Lawrence Cavedon; Michael Dooley; Allen Cheng; Karin A Thursky
Journal: PLoS One Date: 2014-09-24 Impact factor: 3.240

2. Structural neuroimaging as clinical predictor: A review of machine learning applications. (Review)

Authors: José María Mateos-Pérez; Mahsa Dadar; María Lacalle-Aurioles; Yasser Iturria-Medina; Yashar Zeighami; Alan C Evans
Journal: Neuroimage Clin Date: 2018-08-10 Impact factor: 4.881
