Dmitriy Dligach, Steven Bethard, Lee Becker, Timothy Miller, Guergana K Savova.
Abstract
OBJECTIVE: To research computational methods for discovering body site and severity modifiers in clinical texts.
Keywords: biomedical informatics; information extraction; natural language processing; relation extraction
Year: 2013 PMID: 24091648 PMCID: PMC3994852 DOI: 10.1136/amiajnl-2013-001766
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Description and statistics of the SHARP and ShARe corpora
| Corpus | SHARP | ShARe |
|---|---|---|
| Type of notes | Radiology, pathology, oncology | ICU notes, discharge summaries |
| Tokens | 70 704 | 104 918 |
| Sentences | 4801 | 8058 |
| Entity mentions | 11 781 | 5541 |
| Entity mention pairs | 36 865 | 6441 |
| LocationOf relations | 5025 | 2190 |
| DegreeOf relations | 729 | 702 |
| LocationOf agreement | 0.74 | 0.80 |
| DegreeOf agreement | 0.87 | 0.66 |
ShARe, Shared Annotated Resource; SHARP, Strategic Health Advanced Research Project.
Figure 1. Some of the features used to predict the LocationOf relation in an example sentence.
Model performance on the SHARP and ShARe test sets
| Relation | Test corpus | Model | Precision | Recall | F1 |
|---|---|---|---|---|---|
| LocationOf | SHARP | Baseline 1 | 0.900 | 0.096 | 0.174 |
| LocationOf | SHARP | Baseline 2 | 0.910 | 0.198 | 0.325 |
| LocationOf | SHARP | Baseline 3 | 0.858 | 0.431 | 0.574 |
| LocationOf | SHARP | Baseline 4 | 0.551 | 0.522 | 0.536 |
| LocationOf | SHARP | Baseline 5 | 0.758 | 0.340 | 0.470 |
| LocationOf | SHARP | SVM trained on SHARP | 0.786 | 0.699 | 0.740 |
| LocationOf | SHARP | Composite (TK+features) | 0.828 | 0.661 | 0.735 |
| LocationOf | SHARP | Human agreement | – | – | 0.744 |
| LocationOf | ShARe | Baseline 1 | 1.000 | 0.356 | 0.525 |
| LocationOf | ShARe | Baseline 2 | 1.000 | 0.381 | 0.552 |
| LocationOf | ShARe | Baseline 3 | 0.971 | 0.777 | 0.863 |
| LocationOf | ShARe | Baseline 4 | 0.521 | 0.700 | 0.598 |
| LocationOf | ShARe | Baseline 5 | 0.941 | 0.556 | 0.699 |
| LocationOf | ShARe | SVM trained on ShARe | 0.953 | 0.867 | 0.908 |
| LocationOf | ShARe | SVM trained on SHARP | 0.916 | 0.883 | 0.899 |
| LocationOf | ShARe | Human agreement | – | – | 0.800 |
| DegreeOf | SHARP | Baseline 1 | 1.000 | 0.044 | 0.084 |
| DegreeOf | SHARP | Baseline 2 | 1.000 | 0.044 | 0.084 |
| DegreeOf | SHARP | Baseline 3 | 0.907 | 0.857 | 0.881 |
| DegreeOf | SHARP | Baseline 4 | 0.896 | 0.758 | 0.821 |
| DegreeOf | SHARP | Baseline 5 | 0.860 | 0.473 | 0.610 |
| DegreeOf | SHARP | SVM trained on SHARP | 0.869 | 0.945 | 0.905 |
| DegreeOf | SHARP | Composite (TK+features) | 0.840 | 0.923 | 0.880 |
| DegreeOf | SHARP | Human agreement | – | – | 0.871 |
| DegreeOf | ShARe | Baseline 1 | 0.944 | 0.121 | 0.214 |
| DegreeOf | ShARe | Baseline 2 | 0.947 | 0.128 | 0.225 |
| DegreeOf | ShARe | Baseline 3 | 0.977 | 0.887 | 0.929 |
| DegreeOf | ShARe | Baseline 4 | 0.929 | 0.745 | 0.827 |
| DegreeOf | ShARe | Baseline 5 | 0.404 | 0.979 | 0.571 |
| DegreeOf | ShARe | SVM trained on ShARe | 0.929 | 0.929 | 0.929 |
| DegreeOf | ShARe | SVM trained on SHARP | 0.926 | 0.887 | 0.906 |
| DegreeOf | ShARe | Human agreement | – | – | 0.664 |
ShARe, Shared Annotated Resource; SHARP, Strategic Health Advanced Research Project.
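The F1 column in the performance table is the harmonic mean of precision and recall. As a minimal sketch (the function name `f1_score` is illustrative and not taken from the paper), the reported values can be recomputed from a few LocationOf rows. Because the table lists precision and recall rounded to three decimals, a recomputed F1 may differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from LocationOf rows of the table
rows = {
    "SHARP / Baseline 1": (0.900, 0.096, 0.174),
    "SHARP / SVM trained on SHARP": (0.786, 0.699, 0.740),
    "ShARe / SVM trained on ShARe": (0.953, 0.867, 0.908),
}

for name, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {reported:.3f}")
```

The Baseline 1 row shows why F1 is reported alongside precision: a precision of 0.900 with a recall of 0.096 still yields an F1 below 0.2.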
Performance of models with various features removed on the SHARP development set
| Included features | LocationOf F1 | LocationOf ΔF1 | DegreeOf F1 | DegreeOf ΔF1 |
|---|---|---|---|---|
| All | 0.776 | – | 0.972 | – |
| No token features | 0.742 | −0.034 | 0.909 | −0.063 |
| No POS features | 0.768 | −0.008 | 0.963 | −0.009 |
| No chunking features | 0.766 | −0.010 | 0.972 | 0 |
| No named entity features | 0.712 | −0.064 | 0.904 | −0.068 |
| No dependency tree features | 0.757 | −0.019 | 0.944 | −0.028 |
| No dependency path features | 0.755 | −0.021 | 0.954 | −0.018 |
SHARP, Strategic Health Advanced Research Project.
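Each ΔF1 in the ablation table is the F1 of the reduced model minus the F1 of the model with all features. The sketch below recomputes those differences from the table's F1 values and ranks the feature groups by how much their removal hurts; the dictionary layout and the function names `delta_f1` and `rank_by_impact` are illustrative assumptions, not part of the original study.

```python
# F1 on the SHARP development set with all features (from the table)
full = {"LocationOf": 0.776, "DegreeOf": 0.972}

# F1 after removing one feature group at a time (from the table)
ablated = {
    "token": {"LocationOf": 0.742, "DegreeOf": 0.909},
    "POS": {"LocationOf": 0.768, "DegreeOf": 0.963},
    "chunking": {"LocationOf": 0.766, "DegreeOf": 0.972},
    "named entity": {"LocationOf": 0.712, "DegreeOf": 0.904},
    "dependency tree": {"LocationOf": 0.757, "DegreeOf": 0.944},
    "dependency path": {"LocationOf": 0.755, "DegreeOf": 0.954},
}


def delta_f1(relation: str) -> dict[str, float]:
    """Change in F1 relative to the full model, per removed feature group."""
    return {
        group: round(scores[relation] - full[relation], 3)
        for group, scores in ablated.items()
    }


def rank_by_impact(relation: str) -> list[str]:
    """Feature groups ordered from largest F1 drop to smallest."""
    deltas = delta_f1(relation)
    return sorted(deltas, key=deltas.get)


for relation in full:
    print(relation, rank_by_impact(relation))
```

On these numbers, removing named entity features causes the largest drop for both relations, followed by token features, while removing chunking features leaves DegreeOf unchanged.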