Ning Kang, Bharat Singh, Chinh Bui, Zubair Afzal, Erik M van Mulligen, Jan A Kors.
Abstract
BACKGROUND: Many biomedical relation extraction systems are machine-learning based and have to be trained on large annotated corpora that are expensive and cumbersome to construct. We developed a knowledge-based relation extraction system that requires minimal training data, and applied it to the extraction of adverse drug events from biomedical text. The system consists of a concept recognition module that identifies drugs and adverse effects in sentences, and a knowledge-base module that establishes whether a relation exists between the recognized concepts. The knowledge base was filled with information from the Unified Medical Language System. The performance of the system was evaluated on the ADE corpus, consisting of 1644 abstracts with manually annotated adverse drug events. Fifty abstracts were used for training; the remaining abstracts were used for testing.
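The two-module design described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy term lists, the set of known pairs, and the function names `recognize_concepts` and `extract_relations` are hypothetical and do not come from the authors' system, which uses a full concept recognizer and a knowledge base filled from the Unified Medical Language System.

```python
import re
from itertools import product

# Hypothetical toy term lists standing in for the concept recognition module.
DRUGS = {"methotrexate", "lithium"}
ADVERSE_EFFECTS = {"pneumonitis", "tremor"}

# Hypothetical knowledge base: drug/effect pairs assumed to be related.
KNOWN_RELATIONS = {("methotrexate", "pneumonitis"), ("lithium", "tremor")}


def recognize_concepts(sentence):
    """Return the drugs and adverse effects mentioned in a sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return sorted(tokens & DRUGS), sorted(tokens & ADVERSE_EFFECTS)


def extract_relations(sentence):
    """Pair every recognized drug with every recognized effect and keep
    only the pairs that the knowledge base supports."""
    drugs, effects = recognize_concepts(sentence)
    return [(d, e) for d, e in product(drugs, effects) if (d, e) in KNOWN_RELATIONS]


sentence = "Methotrexate-induced pneumonitis was observed in a patient on lithium."
print(extract_relations(sentence))
```

In this sketch the sentence mentions two drugs and one effect, so two candidate pairs are generated, and only the pair present in the toy knowledge base is kept.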
Year: 2014 PMID: 24593054 PMCID: PMC3973995 DOI: 10.1186/1471-2105-15-64
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Number of abstracts, relations, and sentences in the ADE corpus
| | Training set | Test set | Total |
|---|---|---|---|
| Abstracts | 50 | 1594 | 1644 |
| Relations | 201 | 6620 | 6821 |
| Sentences with at least one relation | 130 | 4142 | 4272 |
| Sentences with no relation | 233 | 7327 | 7560 |
Performance (in %) of the baseline relation extraction system and the incremental contribution of different system modules, on the test set of the ADE corpus
| System | Precision | Recall | F-score |
|---|---|---|---|
| Baseline | 8.9 | 78.4 | 16.1 |
| + NLP module | 21.1 | 82.9 | 33.6 |
| + Knowledge base | 32.8 | 78.1 | 46.2 |
| + Relation-type filtering | 38.1 | 74.8 | 50.5 |
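The third numeric column in each row is consistent with the balanced F-score, the harmonic mean of the first two columns read as precision and recall. The short check below is a sketch under that assumption; the helper name `f_score` is illustrative. The recomputed values agree with the reported ones to within rounding of the published figures.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall (both in %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) copied from the rows above
rows = {
    "Baseline": (8.9, 78.4, 16.1),
    "+ NLP module": (21.1, 82.9, 33.6),
    "+ Knowledge base": (32.8, 78.1, 46.2),
    "+ Relation-type filtering": (38.1, 74.8, 50.5),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f_score(p, r):.1f}, reported {reported}")
```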
Performance (in %) of the relation extraction system on the test set of the ADE corpus for different distance thresholds in the knowledge base
| Distance threshold | Precision | Recall | F-score |
|---|---|---|---|
| 1 | 43.2 | 1.6 | 3.1 |
| 2 | 41.8 | 15.2 | 22.3 |
| 3 | 40.6 | 64.1 | 49.7 |
| 4 | 38.1 | 74.8 | 50.5 |
| 5 | 37.0 | 76.5 | 49.9 |
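One way to read a distance threshold is as a cap on the shortest path between a drug and an adverse effect in a graph of concepts. The sketch below assumes exactly that: the knowledge base is modelled as an undirected graph stored in adjacency lists, and distance is the number of edges on the shortest path. The graph contents and the function names `shortest_distance` and `related` are hypothetical; the record above does not specify how the authors compute distance.

```python
from collections import deque


def shortest_distance(graph, source, target, max_distance):
    """Breadth-first search. Returns the number of edges on the shortest
    path from source to target, or None if the target is not reachable
    within max_distance edges."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_distance:
            continue  # do not expand beyond the threshold
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def related(graph, drug, effect, threshold):
    """Accept a candidate pair only if it lies within the distance threshold."""
    return shortest_distance(graph, drug, effect, threshold) is not None


# Hypothetical toy graph: the drug reaches the effect via one intermediate concept.
TOY_GRAPH = {
    "drug_a": ["class_x"],
    "class_x": ["drug_a", "effect_y"],
    "effect_y": ["class_x"],
}

print(related(TOY_GRAPH, "drug_a", "effect_y", threshold=1))
print(related(TOY_GRAPH, "drug_a", "effect_y", threshold=2))
```

Under this reading, a larger threshold admits more candidate pairs, which matches the pattern in the rows above: the second numeric column rises from 1.6 to 76.5 while the first falls from 43.2 to 37.0 as the threshold grows from 1 to 5.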
Performance (in %) of the relation extraction system on the test set of the ADE corpus for different sizes of the training set
| Training set size (abstracts) | Precision | Recall | F-score |
|---|---|---|---|
| 50 | 38.1 | 74.8 | 50.5 |
| 100 | 39.8 | 75.2 | 52.1 |
| 200 | 41.1 | 75.7 | 53.3 |
| 400 | 42.1 | 76.3 | 54.3 |
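Performance (in %) of a machine-learning based (jSRE) relation extraction system [32] and the knowledge-based system on a subset of the ADE test corpus (see text)
| Training set size | jSRE precision | jSRE recall | jSRE F-score | Knowledge-based precision | Knowledge-based recall | Knowledge-based F-score |
|---|---|---|---|---|---|---|
| 0 | n/a | n/a | n/a | 88.5 | 88.6 | 88.5 |
| 10 | 58 | 6 | 55 | 89.1 | 88.2 | 88.6 |
| 50 | 79 | 87 | 82 | 91.8 | 86.1 | 88.8 |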
Error analysis of 100 randomly selected errors on the ADE test set
| Error type | Number of errors |
|---|---|
| False-positive relations | |
| Entities correctly identified, with incorrect relation in the knowledge base | 64 |
| Entities incorrectly identified, with a relation in the knowledge base | 15 |
| False-negative relations | |
| Entities correctly identified, but relation filtered out | 8 |
| Entities not identified, no relation established | 13 |