Zubair Afzal, Ewoud Pons, Ning Kang, Miriam C J M Sturkenboom, Martijn J Schuemie, Jan A Kors.
Abstract
BACKGROUND: To extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document-specific enhancements, such as negation rules for general practitioners' entries and a regular-expression-based temporality module.
Year: 2014 PMID: 25432799 PMCID: PMC4264258 DOI: 10.1186/s12859-014-0373-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
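
The trigger-based approach described in the abstract can be illustrated with a minimal sketch. It is not the ContextD implementation: the trigger and termination words below are illustrative Dutch examples chosen for this sketch, not entries from the ContextD trigger list, and the scope rule (a trigger negates every term that follows it until a termination word or the end of the sentence) is a simplifying assumption. Only single-word terms are handled.

```python
import re

# Illustrative entries only; NOT taken from the ContextD trigger list.
NEGATION_TRIGGERS = {"geen"}   # Dutch "no", placed before the condition
TERMINATION_TERMS = {"maar"}   # Dutch "but", assumed to end a trigger's scope


def negated_terms(sentence, medical_terms):
    """Return the single-word medical terms that fall inside a negation scope.

    Simplified scope rule (an assumption of this sketch): a trigger negates
    every term after it until a termination term or the end of the sentence.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    wanted = {term.lower() for term in medical_terms}
    negated = set()
    in_scope = False
    for token in tokens:
        if token in NEGATION_TRIGGERS:
            in_scope = True          # open a negation scope
        elif token in TERMINATION_TERMS:
            in_scope = False         # close the current scope
        elif in_scope and token in wanted:
            negated.add(token)
    return negated


# "No fever, but coughing": only "koorts" (fever) lies in the negation scope.
print(negated_terms("Geen koorts, maar wel hoesten.", ["koorts", "hoesten"]))
```

A rule set of this kind stands or falls with its trigger list, which is consistent with the error analysis tables further down, where missing triggers are the largest category of false negatives.
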
Statistics of the four document types in the EMC clinical corpus
| Document type | Number of documents | Number of terms | Words per document* |
|---|---|---|---|
| GP entries | 2000 | 3626 | 23 (14–38) |
| Specialist letters | 2000 | 2748 | 39 (16–113) |
| Radiology reports | 1500 | 3684 | 66 (46–94) |
| Discharge letters | 2000 | 2830 | 163 (95–201) |
*Median (interquartile range).
Inter-annotator agreement on contextual properties in the EMC clinical corpus
| Document type | Negation | Recent | Historical | Hypothetical | Experiencer |
|---|---|---|---|---|---|
| GP entries | 0.90 | 0.86 | 0.57 | 0.48 | 0.92 |
| Specialist letters | 0.90 | 0.93 | 0.62 | 0.46 | 0.98 |
| Radiology reports | 0.93 | 0.61 | 0.63 | 0.57 | 0.53 |
| Discharge letters | 0.94 | 0.95 | 0.56 | n/a | 0.98 |
Distribution of the contextual property values in different types of clinical documents
| Document type | Number of terms | Negated | Not negated | Recent | Historical | Hypothetical | Patient | Other |
|---|---|---|---|---|---|---|---|---|
| GP entries | 3626 | 12% | 88% | 97% | 2% | 1% | 98% | 2% |
| Specialist letters | 2748 | 15% | 85% | 90% | 8% | 2% | 99% | 1% |
| Radiology reports | 3684 | 16% | 84% | 96% | 3% | 1% | 99.9% | 0.1% |
| Discharge letters | 2830 | 13% | 87% | 94% | 6% | 0% | 98% | 2% |
Number of English and Dutch trigger terms for each contextual property
| Contextual property | English | Dutch |
|---|---|---|
| Negation | 160 | 395 |
| Temporality | 42 | 62 |
| Experiencer | 44 | 52 |
| Total | 246 | 509 |
Results on the evaluation set using only the translated terms from English to Dutch (baseline) and the final ContextD results with modifications (final)
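| Document type | Number of terms | Precision (baseline) | Precision (final) | Recall (baseline) | Recall (final) | F-score (baseline) | F-score (final) |
|---|---|---|---|---|---|---|---|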
| Negated | |||||||
| GP entries | 175 | 0.96 | 0.88 | 0.66 | 0.90 | 0.78 | 0.89 |
| Specialist letters | 177 | 0.93 | 0.84 | 0.63 | 0.90 | 0.75 | 0.87 |
| Radiology reports | 287 | 0.96 | 0.91 | 0.55 | 0.97 | 0.70 | 0.93 |
| Discharge letters | 180 | 0.98 | 0.92 | 0.67 | 0.93 | 0.79 | 0.92 |
| Recent | |||||||
| GP entries | 1365 | 0.97 | 0.98 | 0.98 | 0.94 | 0.98 | 0.96 |
| Specialist letters | 919 | 0.91 | 0.95 | 0.99 | 0.92 | 0.95 | 0.94 |
| Radiology reports | 1341 | 0.97 | 0.98 | 0.98 | 0.96 | 0.97 | 0.97 |
| Discharge letters | 1140 | 0.93 | 0.97 | 0.98 | 0.91 | 0.95 | 0.94 |
| Historical | |||||||
| GP entries | 28 | 0.15 | 0.17 | 0.17 | 0.54 | 0.16 | 0.26 |
| Specialist letters | 66 | 0.47 | 0.41 | 0.10 | 0.76 | 0.17 | 0.54 |
| Radiology reports | 52 | 0.30 | 0.37 | 0.30 | 0.67 | 0.30 | 0.48 |
| Discharge letters | 90 | 0.36 | 0.39 | 0.13 | 0.78 | 0.19 | 0.52 |
| Hypothetical | |||||||
| GP entries | 17 | 0 | 0 | 0 | 0 | 0 | 0 |
| Specialist letters | 29 | 0 | 0.67 | 0 | 0.07 | 0 | 0.13 |
| Radiology reports | 6 | 0 | 0.67 | 0 | 0.33 | 0 | 0.44 |
| Discharge letters | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Patient | |||||||
| GP entries | 1379 | 0.98 | 0.98 | 1.00 | 0.99 | 0.99 | 0.99 |
| Specialist letters | 999 | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 |
| Radiology reports | 1398 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Discharge letters | 1220 | 0.98 | 0.99 | 1.00 | 1.00 | 0.99 | 0.99 |
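
The three score pairs in each row are arithmetically linked. As a sketch, assuming the F-score is the usual harmonic mean of precision and recall (the page does not state the formula, but the reported values are consistent with it), the third pair can be recomputed from the first two. The function name `f_score` is introduced here for illustration.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Baseline figures for negated terms, read as (precision, recall):
print(round(f_score(0.96, 0.55), 2))  # radiology reports
print(round(f_score(0.96, 0.66), 2))  # GP entries
```

Because the table reports precision and recall rounded to two decimals, a recomputed F-score can differ from the reported one in the last digit.
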
Error analysis of false negatives in the evaluation set
| Error type | GP entries | Specialist letters | Radiology reports | Discharge letters | Total |
|---|---|---|---|---|---|
| Missing trigger | 15 | 7 | 7 | 11 | 40 |
| Complex trigger | 1 | 8 | 2 | 8 | 19 |
| Complex sentence | 1 | - | 15 | 1 | 17 |
| Trigger variation | - | 7 | - | 1 | 8 |
| Other | 9 | 3 | 1 | 4 | 16 |
| Total | 25 | 25 | 25 | 25 | 100 |
Error analysis of false positives in the evaluation set
| Error type | GP entries | Specialist letters | Radiology reports | Discharge letters | Total (%) |
|---|---|---|---|---|---|
| Trigger does not apply to condition | 9 | 7 | 8 | 8 | 32 (37) |
| Annotation error | 2 | 8 | 2 | - | 12 (14) |
| Ambiguous trigger | - | - | - | 12 | 12 (14) |
| Trigger problem | - | 10 | - | - | 10 (11) |
| Missing pseudo trigger | 8 | - | - | - | 8 (9) |
| Other | 6 | - | 3 | 4 | 13 (15) |
| Total | 25 | 25 | 13 | 24 | 87 (100) |
Comparison of the original ConText algorithm for English with the adapted ContextD algorithm for Dutch
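| Contextual property | Document type | ConText precision | ConText recall | ConText F-score | ContextD precision | ContextD recall | ContextD F-score |
|---|---|---|---|---|---|---|---|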
| Negation | Radiology reports | 1.00 | 0.86 | 0.93 | 0.91 | 0.97 | 0.93 |
| Negation | Discharge letters | 0.84 | 0.89 | 0.86 | 0.92 | 0.93 | 0.92 |
| Historical | Radiology reports | - | - | - | 0.37 | 0.67 | 0.48 |
| Historical | Discharge letters | 0.68 | 0.77 | 0.73 | 0.39 | 0.78 | 0.52 |
| Hypothetical | Radiology reports | - | - | - | 0.67 | 0.33 | 0.44 |
| Hypothetical | Discharge letters | 1.00 | 0.92 | 0.96 | - | - | - |
| Experiencer | Radiology reports | - | - | - | 1.00 | 1.00 | 1.00 |
| Experiencer | Discharge letters | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 |
The ConText results are taken from [13]. Only document types that appear in both studies are compared.