Willie Boag, Dustin Doss, Tristan Naumann, Peter Szolovits.
Abstract
Electronic Health Records (EHRs) have seen a rapid increase in adoption during the last decade. The narrative prose contained in clinical notes is unstructured, and unlocking its full potential has proved challenging. Many studies incorporating clinical notes have applied simple information extraction models to build representations that enhance a downstream clinical prediction task, such as mortality or readmission. Improved predictive performance suggests a "good" representation. However, these extrinsic evaluations are blind to most of the insight contained in the notes. In order to better understand the power of expressive clinical prose, we investigate both intrinsic and extrinsic methods for understanding several common note representations. To ensure replicability and to support the clinical modeling community, we run all experiments on publicly available data and provide our code.
Year: 2018 PMID: 29888035 PMCID: PMC5961801
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. An example clinical note. The age, gender, and admitting diagnosis have been highlighted. Also note that descriptions such as “status worsening” suggest deterioration and possible in-hospital mortality.
Figure 2. A patient’s time in the ICU generates a sequence of timestamped notes. Each of the methods described transforms the sequence of notes into a fixed-length vector representing the ICU stay.
Figure 3. How the embedding for a single document is built by combining constituent word embeddings.
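
The caption for Figure 3 does not say which combination is used, so the following is only a minimal sketch that assumes mean pooling over in-vocabulary tokens. The function name `document_embedding`, the toy vocabulary, and the 2-dimensional vectors are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List


def document_embedding(
    tokens: List[str],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Combine word vectors into one fixed-length document vector.

    Assumption: the combination is a plain average (mean pooling).
    Tokens missing from the vocabulary are skipped.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # No known tokens: fall back to the zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy vectors, made up for illustration only.
toy_vectors = {
    "status": [1.0, 0.0],
    "worsening": [0.0, 1.0],
}
note_tokens = ["status", "worsening", "unknownword"]
doc_vec = document_embedding(note_tokens, toy_vectors, dim=2)
```

Whatever the length of the note, the result has `dim` entries, which is what lets a variable-length document feed a fixed-size classifier input.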
AUCs for the binary classification tasks.
| | in-hospital mortality | admission type | gender | ethnicity |
|---|---|---|---|---|
| BoW | | | | |
| Embeddings | 0.814 | 0.873 | 0.836 | 0.580 |
| LSTM | 0.777 | 0.870 | 0.837 | 0.533 |
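
For readers unfamiliar with the metric in the table above, AUC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The helper name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the reported results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration only.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is fine for a small illustration. A rank-based formulation gives the same value more efficiently on large datasets.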
Macro-average F1 scores for the multi-way classification tasks.
| | diagnosis | length of stay | age |
|---|---|---|---|
| BoW | 0.828 | 0.724 | |
| Embeddings | 0.828 | 0.730 | 0.544 |
| LSTM | 0.450 |
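
Macro-average F1, the metric in the table above, computes an F1 score per class and then takes the unweighted mean, so rare classes count as much as common ones. The following is a minimal sketch. The function name `macro_f1` and the toy labels are hypothetical and are not drawn from the experiments.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels, made up for illustration only.
toy_true = ["short", "short", "long", "long"]
toy_pred = ["short", "long", "long", "long"]
score = macro_f1(toy_true, toy_pred)
```

In this toy case the per-class F1 scores are 2/3 for "short" and 4/5 for "long", so the macro average is 11/15.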
Figure 5. PCA 2-D projection of the word embeddings. Vectors of the special age tokens are colored red. Note that these tokens cluster close together in the embedding.
Most predictive words for gender
| (a) Male | |
|---|---|
| man | 1.4012 |
| he | 1.0589 |
| wife | 0.9953 |
| male | 0.7956 |
| his | 0.6772 |
| prostate | 0.2435 |
| prop | 0.1965 |
| ofm | 0.1850 |
| hematuria | 0.1816 |
| esophagectomy | 0.1812 |
| distention | 0.1756 |
| trauma | 0.1748 |
Most predictive words for admission types
| (a) ‘Urgent’ admissions | |
|---|---|
| ew | 0.2639 |
| er | 0.2495 |
| fracture | 0.2258 |
| fx | 0.2248 |
| osh | 0.2235 |
| b | 0.2194 |
| disease | 0.2138 |
| vertebral | 0.2061 |
| cabg | 0.2029 |
| fractures | 0.1971 |
| fall | 0.1893 |
| arteriogram | 0.1877 |
Most predictive words for length-of-stay
| (a) Short stay (0 - 1.5 days) | |
|---|---|
| ml | 0.5295 |
| pt | 0.5086 |
| to | 0.3570 |
| b | 0.3403 |
| sensitivity | 0.2489 |
| meq | 0.2090 |
| atrial | 0.1934 |
| tamponade | 0.1784 |
| valuables | 0.1770 |
| vomited | 0.1738 |
| s | 0.1708 |
| weaning | 0.1676 |