Jun Xu1, Zhiheng Li2, Qiang Wei1, Yonghui Wu3, Yang Xiang1, Hee-Jin Lee1, Yaoyun Zhang1, Stephen Wu1, Hua Xu4.
Abstract
BACKGROUND: To detect attributes of medical concepts in clinical text, a traditional method often consists of two steps: named entity recognition of attributes, followed by relation classification between medical concepts and attributes. Here we present a novel solution in which attribute detection for given concepts is converted into a sequence labeling problem, so that attribute entity recognition and relation classification are done simultaneously in one step.
Keywords: Clinical notes; Information extraction; Natural language processing
Year: 2019 PMID: 31801529 PMCID: PMC6894107 DOI: 10.1186/s12911-019-0937-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Medical concepts and their attributes
| Concept | Attributes | Examples | Comments |
|---|---|---|---|
| Disorder | Negation, Severity, Body location, etc. | Denied any [chest pain]Disorder. | The disorder ‘chest pain’ has an associated negation attribute ‘Denied’ and a body location attribute ‘chest’. |
| Medication | Dosage, Frequency, Mode, etc. | [insulin Lente]Medication 12 units subcu q p.m. | The dosage attribute is ‘12 units’, the mode attribute is ‘subcu’ and the frequency attribute is ‘q p.m.’. |
| Lab Test | Lab value | [blood pressure]LabTest 134/75 [URINE BLOOD]LabTest - NEG | The ‘blood pressure’ has a numerical value ‘134/75’ and the ‘URINE BLOOD’ has a textual value ‘NEG’. |
Concept and attribute types included in this study, and their distribution in the corpora
| Dataset | # Target Concepts | Attribute | # Attribute Mentions |
|---|---|---|---|
| ShARe-Disorder | 17,368 | NEG | 3599 |
| | | SUB | 191 |
| | | CON | 927 |
| | | SEV | 1286 |
| | | COU | 901 |
| | | UNC | 1348 |
| | | BDL | 8053 |
| i2b2-Medication | 8251 | DOS | 3673 |
| | | MOD | 2752 |
| | | FRE | 3014 |
| | | DUR | 259 |
| | | REA | 537 |
| i2b2-LabTest | 7937 | VAL | 6644 |
Fig. 1 An illustration of the concept-focused sequence (CFS) transformation, where each separate sequence encodes all attributes for each target concept (Disorder)
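The CFS transformation described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Span` and `Attribute` classes, the `cfs_transform` function, the BIO tag scheme, the binary target-concept flag and the example sentence are all assumptions introduced here. The sketch builds one labeled sequence per target concept, in which only the attributes linked to that concept receive tags.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    start: int  # inclusive token index
    end: int    # exclusive token index


@dataclass
class Attribute:
    span: Span
    type: str        # attribute type, e.g. "NEG", "BDL", "SEV"
    concept_id: int  # index of the target concept this attribute belongs to


def cfs_transform(
    tokens: List[str],
    concepts: List[Span],
    attributes: List[Attribute],
) -> List[Tuple[List[int], List[str]]]:
    """Return one (target_flags, bio_labels) pair per target concept.

    target_flags marks the tokens of the target concept (assumed input
    feature); bio_labels tags only the attributes linked to that concept.
    """
    sequences = []
    for cid, concept in enumerate(concepts):
        flags = [1 if concept.start <= i < concept.end else 0
                 for i in range(len(tokens))]
        labels = ["O"] * len(tokens)
        for attr in attributes:
            if attr.concept_id != cid:
                continue  # attributes of other concepts stay "O"
            for i in range(attr.span.start, attr.span.end):
                prefix = "B-" if i == attr.span.start else "I-"
                labels[i] = prefix + attr.type
        sequences.append((flags, labels))
    return sequences


# Hypothetical sentence with two disorders: "chest pain" and "headache".
tokens = ["Denied", "any", "chest", "pain", "but",
          "reports", "severe", "headache"]
concepts = [Span(2, 4), Span(7, 8)]
attributes = [
    Attribute(Span(0, 1), "NEG", 0),  # "Denied" negates "chest pain"
    Attribute(Span(2, 3), "BDL", 0),  # "chest" is its body location
    Attribute(Span(6, 7), "SEV", 1),  # "severe" modifies "headache"
]
sequences = cfs_transform(tokens, concepts, attributes)
for flags, labels in sequences:
    print(list(zip(tokens, flags, labels)))
```

Because each sequence is tied to a single target concept, a tagged attribute is linked to that concept by construction, which is how recognition and relation classification collapse into one step.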
The overall performance of different approaches on the ShARe-Disorder dataset in detecting 7 attributes of given disorders: negation (NEG), subject (SUB), conditional (CON), severity (SEV), course (COU), uncertainty (UNC), body location (BDL). Best results are shown in boldface
| Approach | Metric | NEG | SUB | CON | SEV | COU | UNC | BDL |
|---|---|---|---|---|---|---|---|---|
| Baseline (Bi-LSTM-CRF + SVM) | Acc. | 0.9323 | 0.9929 | 0.9669 | 0.9655 | 0.9576 | 0.9445 | 0.7524 |
| | P | 0.7931 | 0.7374 | 0.6990 | 0.6421 | 0.5068 | 0.4091 | 0.5887 |
| | R | 0.7768 | 0.6348 | 0.5987 | 0.7568 | 0.6437 | 0.4172 | 0.7516 |
| | F | 0.7849 | 0.6822 | 0.6449 | 0.6948 | 0.5671 | 0.4131 | 0.6602 |
| Baseline (Bi-LSTM-CRF + Bi-LSTM) | Acc. | 0.9146 | 0.9900 | 0.9632 | 0.9707 | 0.9597 | 0.9308 | 0.7859 |
| | P | 0.8387 | 0.8158 | 0.7872 | 0.7609 | 0.6340 | 0.4380 | 0.7218 |
| | R | 0.7277 | 0.5391 | 0.6054 | 0.8213 | 0.6322 | 0.3819 | 0.784 |
| | F | 0.7793 | 0.6492 | 0.6844 | 0.7900 | 0.6331 | 0.4080 | 0.7516 |
| Sequence Labeling | Acc. | | | | | | | |
| | P | 0.8142 | 0.8222 | 0.7583 | 0.7812 | 0.6150 | 0.4854 | 0.7887 |
| | R | 0.8310 | 0.6435 | 0.6682 | 0.8859 | 0.7529 | 0.4393 | 0.7991 |
| | F | | | | | | | |
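The P, R and F rows appear to be related by the balanced F-measure, the harmonic mean of precision and recall: $F = 2PR / (P + R)$. The source does not state this here, so it is an assumption, but it is consistent with the reported baseline values. The short sketch below (the `f_measure` helper and the `reported` dictionary are names introduced for illustration) checks the relation against three (P, R, F) triples copied from the Bi-LSTM-CRF + SVM baseline rows.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) taken from the Bi-LSTM-CRF + SVM baseline rows.
reported = {
    "NEG": (0.7931, 0.7768, 0.7849),
    "SUB": (0.7374, 0.6348, 0.6822),
    "CON": (0.6990, 0.5987, 0.6449),
}

for attr, (p, r, f) in reported.items():
    computed = f_measure(p, r)
    # P and R are themselves rounded to four decimals, so allow a
    # small tolerance rather than expecting exact equality.
    print(attr, round(computed, 4), f, abs(computed - f) < 1e-3)
```

Small differences in the last digit are expected because the published P and R values are already rounded.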
The overall performance of different approaches on the i2b2-Medication dataset in detecting 5 attributes of given medications: dosage (DOS), mode (MOD), frequency (FRE), duration (DUR), reason (REA). Best results are shown in boldface
| Approach | Metric | DOS | MOD | FRE | DUR | REA |
|---|---|---|---|---|---|---|
| Baseline (Bi-LSTM-CRF + SVM) | Acc. | 0.9201 | 0.9584 | 0.9353 | 0.9783 | 0.9473 |
| | P | 0.8794 | 0.9110 | 0.8762 | 0.5945 | 0.5373 |
| | R | 0.9292 | 0.9597 | 0.9390 | 0.6680 | 0.6704 |
| | F | 0.9036 | 0.9347 | 0.9065 | 0.6291 | |
| Baseline (Bi-LSTM-CRF + Bi-LSTM) | Acc. | 0.9250 | 0.9559 | 0.9302 | 0.9680 | 0.9269 |
| | P | 0.9305 | 0.9372 | 0.9198 | 0.6168 | 0.5984 |
| | R | 0.9434 | 0.9658 | 0.9399 | 0.6525 | 0.5717 |
| | F | 0.9369 | 0.9513 | | | 0.5848 |
| Sequence Labeling | Acc. | | | | | |
| | P | 0.9728 | 0.9773 | 0.9503 | 0.7785 | 0.7409 |
| | R | 0.9159 | 0.9528 | 0.9078 | 0.4479 | 0.4953 |
| | F | | | 0.9286 | 0.5686 | 0.5938 |
| Usyd [ | P | 0.9189 | 0.9073 | 0.9142 | 0.5604 | 0.6687 |
| | R | 0.8678 | 0.8915 | 0.8795 | 0.3709 | 0.3319 |
| | F | 0.8926 | 0.8994 | 0.8965 | 0.4464 | 0.4436 |
The overall performance of different approaches on the i2b2-LabTest dataset in detecting values (VAL) of given lab tests. Best results are shown in boldface
| Approach | Metric | VAL |
|---|---|---|
| Baseline (Bi-LSTM+SVM) | Acc. | 0.4415 |
| | P | 0.7160 |
| | R | 0.4193 |
| | F | 0.5289 |
| Baseline (Bi-LSTM+Bi-LSTM) | Acc. | 0.8993 |
| | P | 0.9248 |
| | R | 0.9288 |
| | F | 0.9268 |
| Sequence Labeling | Acc. | |
| | P | 0.9526 |
| | R | 0.9582 |
| | F | |
Examples of attribute detection errors
| Error Type | Frequency | Example | |||
|---|---|---|---|---|---|
| Matching partially | 26/130 | … were negative for [infection]disorder | NEG | negative | negative for |
| Relating with wrong target concept | 21/130 | … multiple small [collections of blood] in your head. | BDL | head | blood, head |
| Missing one of attribute cues | 5/130 | [Ultralente]medication 14 mg q.a.m., 4 mg | DOS | 14 mg; 4 mg | 14 mg |
| Annotation errors | 13/130 | Father died from [CHF]disorder at 54 | SUB | Father | |
| Others | 65/130 | [Mucomyst]medication precath with good effect | MOD | precath | |