Dingcheng Li, Gyorgy Simon, Christopher G Chute, Jyotishman Pathak.
Abstract
The increasing adoption of electronic health records (EHRs) due to Meaningful Use is providing unprecedented opportunities to enable secondary use of EHR data. Significant emphasis is being given to the development of algorithms and methods for phenotype extraction from EHRs to facilitate population-based studies for clinical and translational research. While preliminary work has shown demonstrable progress, it is becoming increasingly clear that developing, implementing and testing phenotyping algorithms is a time- and resource-intensive process. To this end, in this manuscript we propose an efficient machine learning technique, distributional associational rule mining (ARM), for semi-automatic modeling of phenotyping algorithms. ARM provides a highly efficient and robust framework for discovering the most predictive set of phenotype definition criteria and rules from large datasets, and compared to other machine learning techniques, such as logistic regression and support vector machines, our preliminary results indicate not only significantly improved performance, but also generation of rule patterns that are amenable to human interpretation.
Year: 2013 PMID: 24303254 PMCID: PMC3845788
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Example of ARM
| Diabetes | HTN | OB |
|---|---|---|
| True | Yes | Yes |
| False | No | No |
| False | No | Yes |
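The toy table above can be used to illustrate how a rule is scored. The following is a minimal sketch, not the authors' implementation: it assumes HTN and OB are binary item flags, treats Diabetes as the class label, and uses a hypothetical helper `rule_stats` to count how many records an item set covers and how many of those carry the label.

```python
# Records copied from the "Example of ARM" table (Yes/True -> True, No/False -> False).
records = [
    {"Diabetes": True, "HTN": True, "OB": True},
    {"Diabetes": False, "HTN": False, "OB": False},
    {"Diabetes": False, "HTN": False, "OB": True},
]


def rule_stats(rows, antecedent, label="Diabetes"):
    """Score the rule `antecedent -> label` (hypothetical helper).

    Returns (support, support_d, confidence), where support is the number of
    rows containing every antecedent item, support_d is how many of those
    rows also have the label, and confidence is their ratio.
    """
    covered = [r for r in rows if all(r[item] for item in antecedent)]
    support = len(covered)
    support_d = sum(1 for r in covered if r[label])
    confidence = support_d / support if support else 0.0
    return support, support_d, confidence


stats_htn_ob = rule_stats(records, ["HTN", "OB"])  # rule {HTN, OB} -> Diabetes
stats_ob = rule_stats(records, ["OB"])             # rule {OB} -> Diabetes
print(stats_htn_ob)  # (1, 1, 1.0)
print(stats_ob)      # (2, 1, 0.5)
```

On these three records, the item set {HTN, OB} covers one patient, who has the label, while {OB} alone covers two patients, only one of whom has it.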
Demographic Characteristics-Peripheral Arterial Disease
| | | | P value |
|---|---|---|---|
| Age, years | 66±11 | 61±8 | <0.0001 |
| Men | 1073 (64%) | 1035 (60%) | 0.0303 |
| Race | | | NS |
| White | 1592 (92%) | 1566 (93%) | – |
| Black or African American | 5 (0.3%) | 11 (0.6%) | – |
| Native American Indian or Alaskan | 2 (0.1%) | 5 (0.2%) | – |
| Asian | 4 (0.2%) | 0 (0%) | – |
| Other | 7 (0.4%) | 10 (0.6%) | – |
| Unknown | 110 (6.4%) | 89 (5.3%) | – |
| Missing | 5 (0.3%) | 6 (0.4%) | – |
| Geographical distribution | | | <0.0001 |
| Minnesota | 918 (54%) | 1047 (61%) | – |
| Iowa | 204 (12%) | 96 (5%) | – |
| Wisconsin | 125 (7%) | 77 (4%) | – |
| Illinois | 109 (6%) | 120 (7%) | – |
| Michigan | 67 (4%) | 62 (4%) | – |
| Other | 264 (16%) | 323 (19%) | – |
Age is presented as mean±SD.
Categorical variables are presented as percentages (%).
PAD, peripheral arterial disease.
CM (Confusion Matrix) for V48 vs Label
[table contents missing in source]
Top Rules Ranking List
| NO. | Support | SupportD | Precision | Item set |
|---|---|---|---|---|
| 1 | 281 | 270 | 0.961 | V48 V86 V142 V245 V82080 |
| 2 | 280 | 269 | 0.96 | V48 V57 V74 V86 V245 V82080 |
| 3 | 274 | 263 | 0.95 | V48 V52 V57 V74 V244 V246 V82080 |
| 4 | 278 | 263 | 0.94 | V48 V52 V57 V87 V82080 |
| 5 | 278 | 263 | 0.94 | V48 V57 V86 V216 V221 V82080 |
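The Precision column in this table appears to be the ratio SupportD / Support. The source does not state this definition, so the sketch below treats it as an assumption and simply recomputes the ratio for each listed rule so it can be compared with the reported value. The variable names are illustrative only.

```python
# (rule number, Support, SupportD, reported Precision), copied from the table above.
rules = [
    (1, 281, 270, 0.961),
    (2, 280, 269, 0.96),
    (3, 274, 263, 0.95),
    (4, 278, 263, 0.94),
    (5, 278, 263, 0.94),
]

# Assumption: precision of a rule = SupportD / Support, i.e. the fraction of
# records matching the item set that also carry the phenotype label.
ratios = [support_d / support for _, support, support_d, _ in rules]

for (no, support, support_d, reported), ratio in zip(rules, ratios):
    print(f"rule {no}: {support_d}/{support} = {ratio:.4f} (reported {reported})")
```

Under this assumption the recomputed ratios agree with the reported values to within 0.01, which is consistent with the reported figures having been rounded or truncated to two or three decimals.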
Meaning and ranking of Items
| Items | Times | Diagnosis Meaning |
|---|---|---|
| V48 | 5 | |
| V82080 | 5 | |
| V57 | 4 | Deficiency and other anemia |
| V86 | 3 | Hypertension with complications and secondary hypertension |
| V74 | 2 | Retinal detachments; defects; vascular occlusion; and retinopathy |
| V52 | 2 | Gout and other crystal arthropathies |
| V245 | 2 | Residual codes; unclassified |
| V246 | 1 | Adjustment disorders |
| V221 | 1 | Open wounds of head; neck; and trunk |
| V216 | 1 | Other fractures |
| V244 | 1 | Other screening for suspected conditions (not mental disorders or infectious diseases) |
| V142 | 1 | Acute and unspecified renal failure |
Measure Metrics for All Models
| Model | ARM | ARM | ARM | DT | DT | DT | LR | LR | LR | SVM | SVM | SVM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cutoff | 0.95 | 0.93 | 0.92 | 0.88 | 0.75 | 0.70 | 0.95 | 0.7 | 0.6 | 0.7 | 0.6 | 0.55 |
| Precision | 0.9 | 0.887 | 0.868 | 0.903 | 0.888 | 0.881 | 0.904 | 0.889 | 0.883 | 0.904 | 0.893 | 0.881 |
| Recall | 0.112 | 0.894 | 0.966 | 0.812 | 0.889 | 0.925 | 0.693 | 0.796 | 0.819 | 0.784 | 0.858 | 0.878 |
| F-score | 0.199 | 0.895 | 0.914 | 0.855 | 0.8885 | 0.902 | 0.785 | 0.840 | 0.849 | 0.839 | 0.875 | 0.879 |
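The F-score row can be cross-checked against the Precision and Recall rows. The sketch below assumes the F-score is the balanced F1 measure, F = 2PR / (P + R), which the source does not spell out, and that each model contributes three consecutive cutoff columns. The helper `f1` is a hypothetical name introduced for illustration.

```python
# (model, cutoff, precision, recall, reported F-score), copied from the table above.
rows = [
    ("ARM", 0.95, 0.900, 0.112, 0.199),
    ("ARM", 0.93, 0.887, 0.894, 0.895),
    ("ARM", 0.92, 0.868, 0.966, 0.914),
    ("DT", 0.88, 0.903, 0.812, 0.855),
    ("DT", 0.75, 0.888, 0.889, 0.8885),
    ("DT", 0.70, 0.881, 0.925, 0.902),
    ("LR", 0.95, 0.904, 0.693, 0.785),
    ("LR", 0.70, 0.889, 0.796, 0.840),
    ("LR", 0.60, 0.883, 0.819, 0.849),
    ("SVM", 0.70, 0.904, 0.784, 0.839),
    ("SVM", 0.60, 0.893, 0.858, 0.875),
    ("SVM", 0.55, 0.881, 0.878, 0.879),
]


def f1(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


recomputed = [f1(p, r) for _, _, p, r, _ in rows]
max_gap = max(abs(f - rep) for f, (_, _, _, _, rep) in zip(recomputed, rows))

for (model, cutoff, p, r, rep), f in zip(rows, recomputed):
    print(f"{model} @ {cutoff}: F = {f:.4f} (reported {rep})")
print(f"largest gap to reported values: {max_gap:.4f}")
```

Small gaps are expected because the inputs are themselves rounded. The ARM row at cutoff 0.95 shows how a very low recall pulls the harmonic mean down even when precision is high.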
Figure 1. ROC curve for ARM and SVM classifiers