Berry de Bruijn, Colin Cherry, Svetlana Kiritchenko, Joel Martin, Xiaodan Zhu.
Abstract
OBJECTIVE: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigorous benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge.
Entities:
Mesh:
Year: 2011 PMID: 21565856 PMCID: PMC3168309 DOI: 10.1136/amiajnl-2011-000150
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Test set performance for the three tasks
| System | True positive | False negative | False positive | Recall | Precision | F-score |
| Task 1: Concepts Task | | | | | | |
| System 1.1 | 37 646 | 7363 | 5683 | 0.8364 | 0.8688 | 0.8523 |
| System 1.2 | 36 776 | 8233 | 6125 | 0.8170 | 0.8572 | 0.8366 |
| System 1.3 | 37 663 | 7346 | 5787 | 0.8367 | 0.8668 | 0.8515 |
| Task 2: Assertions Task | | | | | | |
| System 2.1 | 17 366 | 1184 | 1184 | 0.9362 | 0.9362 | 0.9362 |
| System 2.2 | 17 338 | 1212 | 1212 | 0.9347 | 0.9347 | 0.9347 |
| System 2.3 | 17 197 | 1353 | 1353 | 0.9271 | 0.9271 | 0.9271 |
| Task 3: Relations Task | | | | | | |
| System 3.1 | 6296 | 2809 | 1965 | 0.6902 | 0.7611 | 0.7239 |
| System 3.2 | 6269 | 2801 | 1896 | 0.6911 | 0.7677 | 0.7274 |
| System 3.3 | 6288 | 2782 | 1838 | 0.6932 | 0.7738 | 0.7313 |
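The recall, precision, and F-score columns follow directly from the raw true-positive, false-negative, and false-positive counts, using the standard balanced F1. A minimal sketch of that arithmetic (the helper name `prf` is ours, not from the paper), checked against the System 1.1 row:

```python
# Recall, precision, and balanced F-score from raw evaluation counts,
# matching the columns of the test-set performance table.
def prf(tp, fn, fp):
    recall = tp / (tp + fn)        # fraction of gold items recovered
    precision = tp / (tp + fp)     # fraction of predictions that are correct
    f = 2 * precision * recall / (precision + recall)  # harmonic mean
    return recall, precision, f

# System 1.1 (Concepts Task): TP=37 646, FN=7363, FP=5683
r, p, f = prf(37646, 7363, 5683)
print(round(r, 4), round(p, 4), round(f, 4))  # 0.8364 0.8688 0.8523
```

Computed this way, the F-scores agree with the table to four decimals (tiny last-digit differences, as in the System 1.2 row, arise when F is instead computed from the already-rounded recall and precision).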
Concepts Task feature contributions
| Feature set | Recall | Precision | F | ΔF |
| All feature sets included | 0.8364 | 0.8688 | 0.8523 | NA |
| w/o Begin/End, Outside Annotation, Clusters, Semi-Markov | 0.8094 | 0.8369 | 0.8229 | −0.0294 |
| w/o Begin/End | 0.8214 | 0.8571 | 0.8389 | −0.0134 |
| w/o Outside Annotation | 0.8338 | 0.8660 | 0.8496 | −0.0027 |
| w/o Clusters | 0.8348 | 0.8677 | 0.8509 | −0.0014 |
| w/o Semi-Markov | 0.8360 | 0.8684 | 0.8519 | −0.0004 |
System 2.1 and 2.3 (top and bottom row of each cell, respectively) prediction confusion matrix for the Assertions Task, as counts for predictions (rows) against truths (columns)
| Prediction \ Truth | Absent | Associated with someone else | Conditional | Hypothetical | Possible | Present |
| Absent (2.1) | 20 | 6 | 13 | 14 | 121 | |
| Absent (2.3) | 9 | 5 | 12 | 21 | 273 | |
| Associated with someone else (2.1) | 3 | 1 | 1 | | | |
| Associated with someone else (2.3) | 4 | 1 | 2 | | | |
| Conditional (2.1) | 0 | 0 | 1 | | | |
| Conditional (2.3) | 1 | 2 | 30 | | | |
| Hypothetical (2.1) | 4 | 10 | 48 | | | |
| Hypothetical (2.3) | 4 | 11 | 53 | | | |
| Possible (2.1) | 14 | 1 | 15 | 74 | | |
| Possible (2.3) | 20 | 0 | 12 | 159 | | |
| Present (2.1) | 218 | 20 | 138 | 71 | 391 | |
| Present (2.3) | 171 | 12 | 122 | 69 | 360 | |
Correct predictions (the diagonal cells) appear in bold in the original table.
Performance for feature accumulations in the Relations Task
| Feature set | Recall | Precision | F-score |
| (a) Baseline | 0.646 | 0.718 | 0.680 |
| (b) +order/type-sensitive | 0.672 | 0.731 | 0.700 |
| (c) +rich word features | 0.681 | 0.753 | 0.715 |
| (d) +domain knowledge | 0.694 | 0.750 | 0.721 |
| (e) +syntax | 0.694 | 0.763 | 0.727 |
| (f) +unannotated data | 0.693 | 0.773 | 0.731 |