Xuesi Zhou, Haoqi Xiong, Sihan Zeng, Xiangling Fu, Ji Wu.
Abstract
BACKGROUND: Medical event detection in narrative clinical notes of electronic health records (EHRs) is the task of reading text and extracting information from it. Most previous work on medical event detection treats the task as extracting concepts at word granularity, which omits the overall structural information of the clinical notes. In this work, we treat each clinical note as a sequence of short sentences and propose an end-to-end deep neural network framework.
Keywords: Medical event detection; Neural network models; Sequence labelling
Year: 2019 PMID: 30961587 PMCID: PMC6454668 DOI: 10.1186/s12911-019-0756-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Medical event categories and examples
| Category | Instance |
|---|---|
| Description of symptoms | There was no obvious cause of cough in the child for 6 days. The first cough was not severe, the next day there was fever, the body temperature was more than 39, no rash, no chills. |
| History of treatment | Taking retinoic acid induced differentiation therapy from 2014.4.30, 2014.5.3 using daunorubicin 40 mg×3 days chemotherapy |
| History of diagnosis | The patient found hbv infection 4 years ago |
| Admission status | Now I am seeking to further diagnose and treat the “coronary heart disease” to receive our hospital. |
| General status | During the course of the disease, the patient’s sleep diet may be normal, and the weight will not change significantly in the near future. |
| Imaging examination | Outpatient examination x-ray shows: right knee osteoarthritis |
| Laboratory examination | Outpatient OGTT examination showed fasting blood glucose 13.69 mmol/L, 2 hours blood glucose 23.77 mmol/L |
| Electrocardiogram examination | ECG prompts st-t segment depression, t-wave anomaly |
| Endoscopy | Electronic laryngoscopy: right vocal polyp |
| Pathological examination | Lively cut a piece to send pathology to show chronic inflammation of the rectal mucosa with polypoid hyperplasia |
The distribution of 11 labels in the dataset
| Label | Train | Dev. | Test |
|---|---|---|---|
| Description of symptoms | 13633 | 2921 | 2881 |
| History of treatment | 1845 | 341 | 371 |
| History of diagnosis | 485 | 59 | 105 |
| Admission status | 3071 | 672 | 653 |
| General status | 7100 | 1475 | 1446 |
| Imaging examination | 1468 | 389 | 238 |
| Laboratory examination | 1114 | 260 | 211 |
| Electrocardiogram examination | 41 | 21 | 16 |
| Endoscopy | 19 | 6 | 7 |
| Pathological examination | 128 | 53 | 25 |
| Others | 1927 | 384 | 424 |
| Total | 30831 | 6581 | 6377 |
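The class imbalance in this table can be checked with a few lines of arithmetic. The following is a minimal sketch that copies the counts from the table above; the names `counts`, `split_totals` and `train_share` are illustrative and not from the paper.

```python
# Counts copied from the label-distribution table: (train, dev, test).
counts = {
    "Description of symptoms": (13633, 2921, 2881),
    "History of treatment": (1845, 341, 371),
    "History of diagnosis": (485, 59, 105),
    "Admission status": (3071, 672, 653),
    "General status": (7100, 1475, 1446),
    "Imaging examination": (1468, 389, 238),
    "Laboratory examination": (1114, 260, 211),
    "Electrocardiogram examination": (41, 21, 16),
    "Endoscopy": (19, 6, 7),
    "Pathological examination": (128, 53, 25),
    "Others": (1927, 384, 424),
}


def split_totals(table):
    """Sum the train, dev and test columns."""
    return tuple(sum(row[i] for row in table.values()) for i in range(3))


def train_share(table):
    """Fraction of training sentences carried by each label."""
    total = split_totals(table)[0]
    return {label: row[0] / total for label, row in table.items()}


totals = split_totals(counts)
shares = train_share(counts)

# Print labels from most to least frequent in the training split.
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:32s} {share:7.2%}")
```

The column sums reproduce the Total row, and the shares make the skew explicit: "Description of symptoms" accounts for roughly 44% of the training sentences, while "Endoscopy" accounts for well under 0.1%.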
Medical event detection accuracy of our classification models on the test set
| Model | Acc. |
|---|---|
| bi-LSTM(1 layer) | 82.4% |
| bi-LSTM(2 layers) | 84.2% |
| bi-LSTM(3 layers) | 83.1% |
| **text-CNN** | **85.8%** |
The bold font means the best result
Medical event detection accuracy of our sequence labelling models on the test set
| Model type | Model | Acc. |
|---|---|---|
| Text classification models | bi-LSTM(1 layer) | 82.4% |
| | bi-LSTM(2 layers) | 84.2% |
| | bi-LSTM(3 layers) | 83.1% |
| | text-CNN | 85.8% |
| Sequence labelling models | bi-LSTM(2 layers) feature extractor + bi-LSTM decoder | 87.3% |
| | bi-LSTM(2 layers) feature extractor + smoothed Viterbi decoder | 87.4% |
| | bi-LSTM(2 layers) feature extractor + CRF decoder | 92.1% |
| | CNNs feature extractor + bi-LSTM decoder | 89.9% |
| | CNNs feature extractor + smoothed Viterbi decoder | 90.7% |
| | **CNNs feature extractor + CRF decoder** | |
The bold font means the best result
Fig. 1 Effect of the smoothing factor c. The points denote the performance of the CNNs feature extractor + smoothed Viterbi decoder model on the test set. The optimal performance is obtained when c = 0.3
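How the factor c enters the decoder is not defined in this excerpt, so the following is only a sketch under an explicit assumption: the label-transition matrix is blended with the uniform distribution using weight c before ordinary Viterbi decoding over per-sentence class probabilities. The functions `smooth_transitions` and `viterbi`, and all numbers in the toy example, are hypothetical and may differ from the authors' method.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def smooth_transitions(trans, c):
    """ASSUMPTION: blend each row of the transition matrix with the uniform
    distribution, using weight c. The source does not give the exact form."""
    k = len(trans)
    return [[(1.0 - c) * p + c / k for p in row] for row in trans]


def viterbi(emissions, trans, prior):
    """Most likely label sequence for one note.

    emissions[t][j]: probability the classifier gives label j for sentence t
    trans[i][j]:     probability that label j follows label i
    prior[j]:        probability of label j for the first sentence
    """
    k = len(prior)
    score = [_log(prior[j]) + _log(emissions[0][j]) for j in range(k)]
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + _log(trans[i][j]))
            new_score.append(
                score[best_i] + _log(trans[best_i][j]) + _log(emissions[t][j])
            )
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(k), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example with two labels and a transition matrix that never switches.
hard = [[1.0, 0.0], [0.0, 1.0]]
prior = [0.5, 0.5]
emissions = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]

path_hard = viterbi(emissions, hard, prior)
path_smooth = viterbi(emissions, smooth_transitions(hard, 0.3), prior)
print(path_hard, path_smooth)
```

In this toy case the unsmoothed transitions force a single label for the whole note, whereas smoothing with c = 0.3 lets the decoder follow the classifier once the evidence for a label change is strong enough.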
Fig. 2 Heat-maps of confusion matrices. The three confusion matrices correspond to the text-CNN model, the CNNs feature extractor + smoothed Viterbi decoder model and the CNNs feature extractor + sequential CRF decoder model, respectively. Rows are reference labels and columns are predicted labels. The value in cell (i,j) denotes the percentage of short sentences with reference label i that were predicted as label j
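The row normalisation described in this caption, together with the accuracy reported in the tables, can be computed as in the sketch below. The function names and the two-label toy data are illustrative assumptions, not taken from the paper.

```python
def confusion_percentages(reference, predicted, labels):
    """Row-normalised confusion matrix: cell (i, j) is the percentage of
    items with reference label i that were predicted as label j."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for ref, pred in zip(reference, predicted):
        counts[index[ref]][index[pred]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A label with no reference items gets an all-zero row.
        matrix.append([100.0 * n / total if total else 0.0 for n in row])
    return matrix


def accuracy(reference, predicted):
    """Fraction of items whose predicted label equals the reference label."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical toy data: six short sentences, two labels.
labels = ["symptoms", "treatment"]
reference = ["symptoms"] * 4 + ["treatment"] * 2
predicted = ["symptoms", "symptoms", "symptoms", "treatment",
             "treatment", "symptoms"]

matrix = confusion_percentages(reference, predicted, labels)
acc = accuracy(reference, predicted)
print(matrix, acc)
```

Because each row is normalised separately, a rare label with few test sentences can show a large off-diagonal percentage even when it contributes little to overall accuracy.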