Yoni Halpern, Steven Horng, Youngduck Choi, David Sontag.
Abstract
BACKGROUND: Electronic medical records (EMRs) hold a tremendous amount of information about patients that is relevant to determining the optimal approach to patient care. As medicine becomes increasingly precise, a patient's electronic medical record phenotype will play an important role in triggering clinical decision support systems that can deliver personalized recommendations in real time. Learning with anchors presents a method of efficiently learning statistically driven phenotypes with minimal manual intervention.
Keywords: clinical decision support systems; electronic health records; knowledge representation; machine learning; natural language processing
Year: 2016 PMID: 27107443 PMCID: PMC4926745 DOI: 10.1093/jamia/ocw011
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Features used to build binary patient description vectors
| Feature | Representation | Dimension |
|---|---|---|
| Age | Binned by decade | 11 |
| Sex | M/F | 2 |
| Medication History | Indicators by medication generic sequence number | 1947 |
| Medication Dispensing Record | Indicators by medication generic sequence number | 279 |
| Triage Vitals | Binned by decision tree | 77 |
| Lab Results | Binned by decision tree | 2805 |
| Triage Assessment | Binary bag-of-words | 7073 |
| MD Comments | Binary bag-of-words | 8909 |
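To make the representations in this table concrete, the following is a minimal sketch of how a binary patient description vector might be assembled from age, sex, and free text. It is not the authors' pipeline: the function names, the whitespace tokenizer, the toy vocabulary, and the choice to collect ages of 100 and above in the eleventh age bin are all assumptions made for illustration.

```python
def age_bin_vector(age_years, n_bins=11):
    """One-hot vector for age binned by decade.

    Assumption: the last of the 11 bins collects every age of 100 or more.
    """
    index = min(int(age_years) // 10, n_bins - 1)
    return [1 if i == index else 0 for i in range(n_bins)]


def sex_vector(sex):
    """Two indicators, one per category (M, F)."""
    return [1 if sex == "M" else 0, 1 if sex == "F" else 0]


def bag_of_words_vector(text, vocabulary):
    """Binary bag-of-words: 1 if the vocabulary word occurs at all, else 0."""
    tokens = set(text.lower().split())  # assumed tokenizer: lowercase, split on whitespace
    return [1 if word in tokens else 0 for word in vocabulary]


def patient_vector(age_years, sex, triage_text, vocabulary):
    """Concatenate the per-feature binary blocks into one vector."""
    return (
        age_bin_vector(age_years)
        + sex_vector(sex)
        + bag_of_words_vector(triage_text, vocabulary)
    )


# Hypothetical four-word vocabulary; the table reports 7073 triage terms.
toy_vocabulary = ["cough", "fever", "chest", "pain"]
example = patient_vector(67, "F", "Fever and productive cough", toy_vocabulary)
```

With the toy vocabulary the vector has 11 + 2 + 4 = 17 entries. The same concatenation would extend to the medication, vitals, and lab blocks, each contributing the number of dimensions listed in the table.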
Phenotype variables used for evaluation
| Phenotype | Disposition Question | N | Pos |
|---|---|---|---|
| Cardiac – acute | In the workup of this patient, was a cardiac etiology suspected? | 17 258 | 0.068 |
| Infection – acute | Do you think this patient has an infection? (Suspected or proven viral, fungal, protozoal, or bacterial infection) | 62 589 | 0.213 |
| Pneumonia – acute | Do you think this patient has pneumonia? | 9934 | 0.073 |
| Septic shock – acute | Is the patient in septic shock? | 6867 | 0.020 |
| Nursing home – history | Is the patient from a nursing home or similar facility? (Interpret as if you would be giving broad-spectrum antibiotics) | 36 256 | 0.045 |
| Anticoagulated – history | Prior to this visit, was the patient on anticoagulation? (Excluding antiplatelet agents like aspirin or Plavix) | 1082 | 0.047 |
| Cancer – history | Does the patient have an active malignancy? (Malignancy not in remission, and recent enough to change clinical thinking) | 4091 | 0.042 |
| Immunosuppressed – history | Is the patient currently immunocompromised? | 12 857 | 0.040 |
A selection of the 42 phenotypes built as part of this ongoing project. Each phenotype is defined by its anchors, which can be specified as ICD9 codes, medications (history or dispensed), or free text. When a large number of anchors are specified, only a selection are shown. For display, medications are grouped by extended therapeutic class.
Anchor types: medication dispensing record, medication history, ICD9 codes, medical text.
Top 20 weighted terms in the classifiers for 3 of the learned phenotypes. These classifiers are learned using medical records as they appear at time of disposition from the emergency department.
Feature types: triage assessment, MD comments, medication history, medication dispensing record, triage vitals, lab results.
Figure 1. Comparison of performance of phenotypes learned with 200 000 unlabeled patients using the semi-supervised anchor-based method and phenotypes learned with supervised classification using 5000 gold-standard labels. Error bars indicate twice the standard error. For anticoagulated and cancer, there were too few gold-standard labels to learn with 5000 patients, so the fully supervised baseline is omitted.
Figure 2. Changes to patient records over time. The time of every change to the patient record is recorded (measured in minutes from arrival), and a non-parametric kernel density estimator is used to plot the distribution of times at which changes occur.
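As an illustration of the kind of estimate this caption describes, here is a minimal kernel density sketch over change times. The caption does not state the kernel or bandwidth, so the Gaussian kernel, the 15-minute bandwidth, and the sample times below are assumptions chosen only to make the example run.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples`.

    Each sample contributes one Gaussian bump of width `bandwidth`;
    the bumps are averaged so the estimate integrates to 1.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical change times, in minutes from arrival.
change_times = [2, 5, 7, 30, 35, 90, 95, 100, 240]
density = gaussian_kde(change_times, bandwidth=15.0)

# Evaluate on a one-minute grid wide enough to cover both tails.
grid = list(range(-100, 401))
curve = [density(t) for t in grid]
```

Plotting `curve` against `grid` gives a smooth distribution of when changes occur. A larger bandwidth smooths the curve further, while a smaller one follows the individual change times more closely.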
Figure 3. Influence and highly changing features for the pneumonia phenotype extractor as a function of time.
Figure 4. Additive change in AUC from baseline for phenotype extraction as a function of the features used. The baseline phenotype extraction uses only features from age, sex, and triage vitals, and its value is indicated for each phenotype on the y-axis label. In each plot, the bars on the left use structured data, while the center bars use free-text data. Hatched lines represent a combination of features. A star is placed below the single feature that has the highest performance. From left to right, the classifiers used: Med = medication history (prior to visit); Pyx = medication dispensing record (during visit); Lab = laboratory values; Strct = all structured data (Med + Pyx + Lab); Tri = triage nursing text; MD = physician comments; Txt = all text (Tri + MD); All = all features (structured + text).
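The quantity plotted in Figure 4 can be sketched as the AUC of a classifier minus the AUC of the baseline. The snippet below computes AUC with the pairwise rank definition and then takes the difference. The labels and scores are invented toy values, not results from the study, and the function name is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is scored
    above a randomly chosen negative case; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = phenotype present, 0 = absent (hypothetical values).
labels = [1, 0, 1, 0, 0, 1]
baseline_scores = [0.6, 0.5, 0.4, 0.3, 0.7, 0.8]     # e.g. age, sex, triage vitals only
all_feature_scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.8]  # e.g. structured + text features

baseline_auc = auc(labels, baseline_scores)
delta = auc(labels, all_feature_scores) - baseline_auc  # additive change from baseline
```

The pairwise loop is quadratic in the number of patients, which is fine for a toy example. On real cohorts a sort-based implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.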