Travis R Goodwin, Sanda M Harabagiu.
Abstract
OBJECTIVE: We explored how judgements provided by physicians can be used to learn relevance models that enhance the quality of patient cohorts retrieved from Electronic Health Record (EHR) collections.
Keywords: information storage and retrieval; machine learning; medical informatics; search engine
Year: 2018 PMID: 30474078 PMCID: PMC6241510 DOI: 10.1093/jamiaopen/ooy010
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Figure 1. Architecture of a typical patient cohort retrieval system evaluated in TRECMed.
Examples of cohort descriptions used to train and evaluate the learning cohort retrieval system
| EEG reports | TRECMed 2011 | TRECMed 2012 |
|---|---|---|
| Patients experiencing seizures and generalized shaking | Patients with complicated GERD who receive endoscopy | Adult patients with Alzheimer’s disease admitted from nursing homes with pressure ulcers |
| Multiple sclerosis and seizure | Women with osteopenia | Elderly patients with subdural hematoma |
| Patients under 18 years old with absence seizures | Female patient with breast cancer with mastectomies during admission | Patients admitted with Hepatitis C and IV drug use |
| Patients over age 18 with history of developmental delay and EEG with electrographic seizures | Adult patients who are admitted with asthma exacerbation | Patients treated for post-partum problems including depression, hypercoagulability, or cardiomyopathy |
| Patients evaluated for seizures vs stroke | Patients with CAD who presented to the Emergency Department with Acute Coronary Syndrome and were given Plavix | Patients with inflammatory disorders receiving TNF-inhibitor treatment |
| Brain tumor and sharp waves, spike/polyspike, and wave or spikes | Children admitted with cerebral palsy who received physical therapy | Adults under age 60 undergoing alcohol withdrawal |
| EEG showing triphasic waves | Patients co-infected with hepatitis C and HIV | Patients with AIDS who develop pancytopenia |
| Patients with anoxic brain injury and EEG reports denoting brain death | Adult patients who presented to the emergency room with anion gap acidosis secondary to insulin dependent diabetes | Patients with hypertension on anti-hypertensive medication |
| EEGs without sharp waves, spikes, or spike/polyspike and wave activity in patients diagnosed with epilepsy | Patients with dementia | Patients taking atypical antipsychotics without a diagnosis schizophrenia or bipolar depression |
| EEG showing generalized periodic epileptiform discharges | Cancer patients with liver metastasis treated in the hospital who underwent a procedure | Patients who develop thrombocytopenia in pregnancy |
Figure 2. Architecture of the learning patient cohort retrieval system.
Figure 3. Overview of the different approaches for (a) query construction and (b) query expansion used for feature extraction in the learning patient cohort retrieval system.
Figure 4. Indexed Streams from EEG reports (left) and hospital records (right). EEG: electroencephalography.
Features extracted for a cohort description and hospital visit
| Feature description | Domain of values | |
|---|---|---|
| Number of | ||
| Number of | ||
| Number of | ||
| Distribution of | ||
| Whether the | ||
| Whether the | ||
| Whether the | ||
| | ||
Additional details for each feature are provided in Supplementary Material Appendix E. Here, ℕ represents the natural numbers, ℝ represents the real numbers, and the exponent (if provided) indicates the dimensionality, that is, the number of values produced by that feature in the resultant feature vector.
Patient cohort retrieval performance on (a) EEG reports and (b) TRECMed
| Setting | MAP | NDCG | BPref | rPrec | P@10 |
|---|---|---|---|---|---|
| (a) Retrieval performance when retrieving patient cohorts from EEG reports | |||||
| BM25 baseline: 10-fold CV | 0.4996 | 0.6144 | 0.4064 | 0.5213 | 0.6 |
| L-PCR: 10-fold CV | 0.6634 | 0.7171 | 0.5900 | 0.6088 | 0.6 |
| MERCuRY (text-only): 10-fold CV | 0.5220 | 0.5441 | 0.4483 | 0.5081 | 0.5 |
| (b) Retrieval performance when retrieving the patient cohorts used in TRECMed from hospital records | |||||
| BM25 baseline: evaluated on 2011 | 0.4052 | 0.5202 | 0.5082 | 0.4112 | 0.600 |
| BM25 baseline: evaluated on 2012 | 0.2930 | 0.3462 | 0.3462 | 0.3135 | 0.464 |
| L-PCR: 10-fold CV on 2011 | 0.6316 | 0.8816 | 0.5788 | 0.5859 | 0.706 |
| L-PCR: 10-fold CV on 2012 | 0.5100 | 0.8194 | 0.4703 | 0.5028 | 0.589 |
| L-PCR: trained on 2012 and evaluated on 2011 | 0.6127 | 0.8675 | 0.5638 | 0.5763 | 0.674 |
| L-PCR: trained on 2011 and evaluated on 2012 | 0.5145 | 0.8167 | 0.4735 | 0.5072 | 0.596 |
| Best submitted to TRECMed 2011 | — | — | 0.5502 | 0.4400 | 0.656 |
| Best submitted to TRECMed 2012 | 0.2860 | 0.5780 | — | — | 0.592 |
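For readers unfamiliar with the column headers above, the following is a minimal sketch of how average precision (whose mean over all cohort descriptions gives MAP), P@10, R-precision (rPrec), and NDCG are commonly defined for a single ranked list. It is an illustration under assumptions, not the evaluation code used in the study: the function names and visit identifiers are hypothetical, NDCG uses a log2 discount with linear gains, and official tools may follow slightly different conventions. BPref is omitted for brevity.

```python
import math

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Average of the precision values at each rank holding a relevant item.
    Averaging this over all queries gives MAP."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return sum(1 for d in ranked[:r] if d in relevant) / r if r else 0.0

def ndcg(ranked, gains):
    """NDCG with a log2 discount; `gains` maps item id -> graded relevance."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranked, start=1))
    ideal = sorted(gains.values(), reverse=True)
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranked list of hospital visits for one cohort description.
ranked = ["v1", "v2", "v3", "v4"]
relevant = {"v1", "v3"}          # binary judgements
gains = {"v1": 1, "v3": 1}       # graded judgements (here all equal to 1)

ap = average_precision(ranked, relevant)
p_at_2 = precision_at_k(ranked, relevant, k=2)
rprec = r_precision(ranked, relevant)
score = ndcg(ranked, gains)
```

In this toy list the relevant visits sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2, while P@2 and R-precision both see one relevant visit in the top two.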
Figure 5. Average feature importance as measured using EEG reports and TRECMed hospital records. EEG: electroencephalography.