Elizabeth Ford, John A Carroll, Helen E Smith, Donia Scott, Jackie A Cassell.
Abstract
BACKGROUND: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality.
Keywords: case detection; data quality; electronic health records; review; text mining
Year: 2016 PMID: 26911811 PMCID: PMC4997034 DOI: 10.1093/jamia/ocv180
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1: Flow diagram of study selection.
Types of Conditions Studied
| Type of condition | No. of studies (%) | Conditions included (no. of studies, if more than one) |
|---|---|---|
| Chronic or noncommunicable conditions | 42 (59%) | Obesity (7), cancer (4), rheumatoid and psoriatic arthritis (5), diabetes (3), inflammatory bowel disease (incl. celiac) (3), asthma (3), COPD (2), pancreatic cyst (2), heart failure (2), hypertension, angina pectoris, atrial fibrillation, disorders of sex development, multiple sclerosis, hepatobiliary disease, cataract, priapism, facial pain, peripheral arterial disease, coronary artery disease |
| Infectious diseases | 18 (25%) | Acute respiratory infection (2), pneumonia (4), influenza or influenza-like illness (5), MRSA (2), gastrointestinal infection, genital chlamydia, chicken pox, fever, hospital-acquired urinary tract infection |
| Psychological disorders | 4 (6%) | Depression (2), binge eating disorder, bipolar disorder |
| Injuries and events | 7 (10%) | Venous thromboembolism (2), acute myocardial infarction, upper GI bleeding, ischemic stroke, acute renal failure, acute orbital fracture |
Types of Case-Detection Algorithms (CDAs)
| Type of case-detection algorithm | No. of studies (%) | Detail |
|---|---|---|
| No additional algorithm (manual review of information) | 3 (4) | |
| Single keyword or code sufficient to define case | 4 (6) | |
| Same NLP algorithm that extracted information also detected cases (text only) | 15 (23) | |
| New rule-based CDA (text only) | 11 (16) | |
| Logistic regression or machine learning CDA (text only) | 5 (4) | Logistic regression |
| New rule-based CDA (combining text with codes, labs, or medication) | 12 (18) | |
| Logistic regression CDA (combining text with codes, labs, or medication) | 8 (12) | |
| Machine learning algorithm (combining text with codes, labs, or medication) | 6 (9) | Ripper; support vector machines (SVM); decision tree vs SVM vs Ripper vs MetaCost; naïve Bayes vs SVM vs random forest vs logistic regression; Bayesian network model vs EM-MAP model; random forest |
| Comparison of rule-based CDA with machine learning and logistic regression CDAs (combining text with codes, labs, or medication) | 3 (4) | Rule-based vs SVM vs random forest vs Ripper vs logistic regression; rule-based vs logistic regression; rule-based vs decision tree |
Median accuracy (%) by algorithm type and condition
| Algorithm type or condition | No. of studies | Sensitivity (recall) | Specificity | PPV (precision) | Negative predictive value | F measure | AUROC |
|---|---|---|---|---|---|---|---|
| By algorithm type | | | | | | | |
| Single algorithm for NLP and case detection | 15 | 96.2 | 97.4 | 85.35 | 96.6 | 49 | – |
| Rule-based secondary case-detection algorithm | 20 | 91.2 | 95.45 | 77.5 | 98.95 | 97.57 | 94.4 |
| Probabilistic secondary case-detection algorithm (logistic regression; Bayesian; machine learning) | 21 | 80 | 95 | 86 | 95.4 | 77 | 94 |
| By condition | | | | | | | |
| Respiratory infections | 11 | 92.9 | 95.45 | 54 | 99.9 | – | 95.85 |
| Bowel disease | 4 | 79.45 | 94.45 | 57.5 | 100 | – | 87.5 |
| Inflammatory arthritis | 5 | 70 | 96 | 93.7 | – | – | 94.4 |
| Cancer | 3 | 93 | 92.9 | 95 | – | 93.5 | – |
| Diabetes | 2 | 96.2 | 98 | – | – | 98.65 | – |
| Obesity | 5 | 48.4 | – | 76.3 | – | 49 | – |
| Mental health | 3 | 73.1 | 90 | 87.85 | 96.6 | – | 80 |
| MRSA | 2 | 99.2 | 99.4 | 97.9 | – | 99 | – |
| Cardiovascular | 7 | 82 | 96 | 84.7 | 93 | 74.85 | 92.9 |
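All of the measures above come from cross-tabulating an algorithm's output against a reference standard such as manual chart review. The sketch below computes them from hypothetical counts; it assumes "F measure" means the balanced F1 score (the harmonic mean of PPV and sensitivity), which is the usual definition but is not stated in the table. The helper `ppv_at_prevalence` is added for illustration and applies Bayes' rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Algorithm output cross-tabulated against a reference standard."""
    tp: int  # algorithm: case,     reference: case
    fp: int  # algorithm: case,     reference: non-case
    fn: int  # algorithm: non-case, reference: case
    tn: int  # algorithm: non-case, reference: non-case


def accuracy_measures(c: ConfusionCounts) -> dict[str, float]:
    """Return the table's accuracy measures as percentages (F measure taken as F1)."""
    sensitivity = c.tp / (c.tp + c.fn)  # recall
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)  # precision
    npv = c.tn / (c.tn + c.fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * specificity,
        "ppv": 100 * ppv,
        "npv": 100 * npv,
        "f_measure": 100 * f1,
    }


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV (as a proportion) implied by Bayes' rule at a given case prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts from validating an algorithm against chart review.
measures = accuracy_measures(ConfusionCounts(tp=90, fp=10, fn=30, tn=870))
print({name: round(value, 1) for name, value in measures.items()})
```

With these counts, sensitivity is 75%, PPV is 90%, and the F measure is about 81.8%. PPV also depends on how common cases are: with sensitivity 0.9 and specificity 0.95, `ppv_at_prevalence` gives about 0.95 at 50% prevalence but only about 0.49 at 5% prevalence. A high specificity can therefore coexist with a modest PPV, and PPVs from study populations with different case mixes are not directly comparable.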
Accuracy of case-detection algorithms comparing codes and text
| Study | Condition | Codes only: sensitivity (recall) | Codes only: PPV (precision) | Codes only: AUROC | Text only: sensitivity (recall) | Text only: PPV (precision) | Text only: AUROC | Codes + text: sensitivity (recall) | Codes + text: PPV (precision) | Codes + text: AUROC |
|---|---|---|---|---|---|---|---|---|---|---|
| Gundlapalli (2008) | Inflammatory bowel disease | 27 | 50 | 64 | 86 | 43 | 90 | 100 | 40 | 99 |
| Graiser (2007) | Lymphoma | 42.9 | 90.0 | 81.2 | | | | | | |
| Valkhoff (2014) | Upper GI bleed | | | | | | | | | |
| | ICD-9 (ARS) | 72 | | | | | | | | |
| | ICD-9 (HSD) | 78 | | | | | | | | |
| | ICD-10 (Aarhus) | 77 | 47 | | | | | | | |
| | ICPC codes | 21 | 22 | | | | | | | |
| DeLisle (2013) | Pneumonia | 52 | 52.8 | 74.8 | 63.6 | | | | | |
| Li (2008) | Ischemic stroke | 90 | 56 | | | | | | | |
| Ludvigsson (2014) | Celiac | 53.8 | 78.1 | | | | | | | |
| Pakhomov (2007) | Angina | 88 | 88 | | | | | | | |
| Ananthakrishnan (2013) | Inflammatory bowel disease: Crohn’s | 89 | 95 | | | | | | | |
| | Ulcerative colitis | 86 | 94 | | | | | | | |
| Carroll (2012) | Rheumatoid arthritis | 49 | 80 | 88 | 71 | 86 | 97 | | | |
| Liao (2010) | Rheumatoid arthritis | 51 | 88 | 56 | 89 | 63 | 94 | | | |
| Xia (2014) | Multiple sclerosis | 76.4 | 91.6 | 93.7 | 75.8 | 91.4 | 94.1 | 82.7 | 92.1 | 95.8 |
| DeLisle (2010) | Acute respiratory infection | 79 | 31.5 | 88 | 88 | 18 | 94 | 73 | 52 | 86 |
| Zheng (2014) | Acute respiratory infection | 79 | 31 | 78 | 88 | 18 | 90 | 75 | 49 | 87 |
| Carroll (2011) | Rheumatoid arthritis | 78.1 | 93.2 | 95.5 | 68.8 | 91.8 | 89.5 | 85.8 | 93.7 | 96.6 |
| Karnik (2012) | Atrial fibrillation | 61.7 | 59.8 | 62.7 | 58 | 60 | 60 | | | |
| Castro (2015) | Bipolar disorder | 79 | 85 | | | | | | | |
| McPeek (2013) | Venous thromboembolism | 69 | 95 | 90 | | | | | | |
| Wu (2013) | Asthma | 30.8 | 57.1 | 84.6 | 88.0 | | | | | |
| Zeng (2006) | Asthma and COPD | 72.5 | 82.3 | 76.7 | 82.3 | 92.4 | 87.4 | | | |
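Several studies in the table report AUROC, which for a probabilistic CDA (for example, a logistic regression model) summarizes how well its scores rank true cases above non-cases across all possible thresholds. A minimal sketch, assuming per-patient scores and reference-standard labels are available, computes it with the Mann-Whitney formulation: the probability that a randomly chosen case outscores a randomly chosen non-case, with ties counted as one half. The scores below are hypothetical.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Estimates the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen non-case (label 0); ties count as 0.5.
    The pairwise loop is O(n_cases * n_noncases) and favours clarity over speed.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(noncases))


# Hypothetical predicted probabilities from a logistic regression CDA.
print(auroc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0]))  # 5 of 6 pairs ranked correctly
```

For a binary rule such as "patient has the code", every score is 0 or 1 and the same calculation reduces to (sensitivity + specificity) / 2. For example, `auroc([1, 0, 0, 0], [1, 1, 0, 0])` gives 0.75 for a rule with 50% sensitivity and 100% specificity. This is one reason a graded text or combined score can reach a higher AUROC than a single code.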