Ning Kang, Bharat Singh, Zubair Afzal, Erik M van Mulligen, Jan A Kors.
Abstract
Keywords: Biomedical concept identification; Dictionary-based system; Natural language processing; Rule-based system; Text mining
Year: 2012; PMID: 23043124; PMCID: PMC3756254; DOI: 10.1136/amiajnl-2012-001173
Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497
Performance (in %) of MetaMap and Peregrine, with and without the NLP module, on the AZDC test set for exact and inexact matching of concept boundaries and identifiers
| System | Exact boundaries: Precision | Exact boundaries: Recall | Exact boundaries: F-score | Exact identifiers: Precision | Exact identifiers: Recall | Exact identifiers: F-score | Inexact boundaries: Precision | Inexact boundaries: Recall | Inexact boundaries: F-score | Inexact identifiers: Precision | Inexact identifiers: Recall | Inexact identifiers: F-score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MetaMap | 60.9 | 61.1 | 61.0 | 55.0 | 55.2 | 55.1 | 79.3 | 79.7 | 79.5 | 65.1 | 65.5 | 65.3 |
| MetaMap+NLP | 76.1 | 70.7 | 73.3 | 68.7 | 63.9 | 66.2 | 89.1 | 82.2 | 85.5 | 76.1 | 71.3 | 73.6 |
| Peregrine | 63.5 | 64.3 | 63.9 | 56.6 | 57.3 | 56.9 | 77.1 | 78.4 | 77.7 | 64.6 | 65.3 | 64.9 |
| Peregrine+NLP | 82.2 | 74.2 | 78.0 | 73.5 | 66.4 | 69.8 | 89.6 | 81.6 | 85.4 | 76.7 | 70.2 | 73.3 |
AZDC, Arizona Disease Corpus; NLP, natural language processing.
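The F-score columns are consistent with the balanced F-score, the harmonic mean of precision and recall. The record does not state the formula, so that reading is an assumption; the sketch below recomputes the exact boundary matching F-scores from the reported precision and recall. The helper name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the table reports the balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Exact boundary matching on the AZDC test set, copied from the table:
# system -> (precision, recall, reported F-score), all in percent.
rows = {
    "MetaMap": (60.9, 61.1, 61.0),
    "MetaMap+NLP": (76.1, 70.7, 73.3),
    "Peregrine": (63.5, 64.3, 63.9),
    "Peregrine+NLP": (82.2, 74.2, 78.0),
}

for system, (p, r, reported) in rows.items():
    print(f"{system}: computed {f_score(p, r):.1f}, reported {reported}")
```

Small differences in the last digit are possible elsewhere in the table, because the published precision and recall values are themselves rounded.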
Performance (in %) of MetaMap and Peregrine with incremental contributions of the NLP submodules on the AZDC test set for exact boundary matching
| NLP submodules | MetaMap: Precision | MetaMap: Recall | MetaMap: F-score | Peregrine: Precision | Peregrine: Recall | Peregrine: F-score |
|---|---|---|---|---|---|---|
| Baseline | 60.9 | 61.1 | 61.0 | 63.5 | 64.3 | 63.9 |
| +Coordination | 63.4 | 64.4 | 64.0 | 67.5 | 67.0 | 67.2 |
| +Abbreviation | 66.7 | 67.7 | 67.2 | 70.7 | 70.8 | 70.7 |
| +Term variation | 68.9 | 69.3 | 69.1 | 74.3 | 71.8 | 73.0 |
| +Boundary correction | 73.7 | 70.7 | 72.2 | 78.8 | 74.2 | 76.4 |
| +Filtering | 76.1 | 70.7 | 73.3 | 82.2 | 74.2 | 78.0 |
AZDC, Arizona Disease Corpus; NLP, natural language processing.
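Because the rows are cumulative, the contribution of each submodule is the difference between consecutive F-scores. The sketch below does that subtraction on the values in the table; the variable and function names are illustrative, and the differences apply only to this order of adding submodules.

```python
# F-scores (in %) for exact boundary matching, copied from the table,
# in the order in which the submodules were added.
steps = [
    "Baseline",
    "+Coordination",
    "+Abbreviation",
    "+Term variation",
    "+Boundary correction",
    "+Filtering",
]
f_scores = {
    "MetaMap": [61.0, 64.0, 67.2, 69.1, 72.2, 73.3],
    "Peregrine": [63.9, 67.2, 70.7, 73.0, 76.4, 78.0],
}


def incremental_gains(scores):
    """Difference between each F-score and the one before it, in percentage points."""
    return [round(after - before, 1) for before, after in zip(scores, scores[1:])]


gains = {system: incremental_gains(scores) for system, scores in f_scores.items()}

for system, deltas in gains.items():
    for step, delta in zip(steps[1:], deltas):
        print(f"{system} {step}: +{delta:.1f} points")
```

The differences are rounded to one decimal to match the precision of the published figures.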
Distribution across five error types of 100 randomly selected errors of each system on the AZDC test set
| System | Coordination | Abbreviation | Term variation | Boundary | Filtering |
|---|---|---|---|---|---|
| MetaMap+NLP | 12 | 13 | 28 | 24 | 23 |
| Peregrine+NLP | 11 | 14 | 28 | 26 | 21 |
AZDC, Arizona Disease Corpus; NLP, natural language processing.