Halil Kilicoglu, Graciela Rosemblat, Marcelo Fiszman, Thomas C Rindflesch.
Abstract
BACKGROUND: Entity coreference is common in biomedical literature, and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level.
Keywords: Biomedical literature; Natural language processing; Semantic relation extraction; Sortal anaphora resolution
Year: 2016 PMID: 27080229 PMCID: PMC4832532 DOI: 10.1186/s12859-016-1009-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 A sample annotation. Anaphora annotation in brat interface (PMID 10225377)
Fig. 2 The sortal anaphora resolution pipeline. The high-level view of the sortal anaphora resolution pipeline and its incorporation into SemRep
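To make the idea of rule-based sortal anaphora resolution concrete, the following is a minimal toy sketch, not SemRep's implementation. It borrows three component names from the ablation table (taxonomy constraint, number constraint, set-membership processing) and interprets them under stated assumptions: the `TOY_ISA` taxonomy, the `Mention` class, and the `resolve` function are all hypothetical, and the real system's rules and knowledge sources are not described on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent). Illustrative only; the real
# system's knowledge source is not shown in this document.
TOY_ISA: Dict[str, str] = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "drug": "substance",
    "BRCA1": "gene",
}


@dataclass(frozen=True)
class Mention:
    text: str      # surface string
    concept: str   # normalized concept or headword lemma (assumed given)
    plural: bool   # grammatical number
    position: int  # token offset in the discourse


def is_a(concept: str, ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or reaches it by following TOY_ISA."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = TOY_ISA.get(current)
    return False


def resolve(anaphor: Mention, candidates: List[Mention]) -> List[Mention]:
    """Pick antecedent(s) for a sortal anaphor such as 'the drug' or 'these drugs'."""
    # Taxonomy constraint (assumed reading): the candidate must precede the
    # anaphor and be a kind of the anaphor's head concept.
    compatible = sorted(
        (c for c in candidates
         if c.position < anaphor.position and is_a(c.concept, anaphor.concept)),
        key=lambda c: c.position,
    )
    if anaphor.plural:
        # Number constraint: prefer the closest plural candidate.
        plural = [c for c in compatible if c.plural]
        if plural:
            return plural[-1:]
        # Set-membership (assumed reading): otherwise the singular
        # candidates jointly form the antecedent set.
        return compatible
    # Singular anaphor: closest singular candidate only.
    return [c for c in compatible if not c.plural][-1:]


if __name__ == "__main__":
    mentions = [
        Mention("aspirin", "aspirin", False, 0),
        Mention("ibuprofen", "ibuprofen", False, 2),
        Mention("BRCA1", "BRCA1", False, 5),
    ]
    these_drugs = Mention("these drugs", "drug", True, 10)
    print([m.text for m in resolve(these_drugs, mentions)])
```

In this sketch a plural anaphor with no plural candidate falls back to collecting every compatible singular mention, which is one simple way to model an anaphor that refers to several antecedents at once.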
Inter-annotator agreement computed using F1 score
| Batch | Anaphoric mentions (exact) | Anaphoric mentions (approximate) | Anaphora relations (exact) | Anaphora relations (approximate) |
|---|---|---|---|---|
| 1 | 0.43 | 0.46 | 0.10 | 0.28 |
| 2 | 0.74 | 0.74 | 0.34 | 0.43 |
| 3 | 0.90 | 0.91 | 0.81 | 0.88 |
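As an illustration of how agreement between two annotators can be scored with F1, here is a minimal sketch that treats one annotator's spans as the reference. The page does not define "exact" and "approximate" matching, so the definitions below are assumptions: exact means identical character offsets, approximate means any overlap. The function name `span_f1` and the sample spans are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_f1(ann_a: List[Span], ann_b: List[Span], approximate: bool = False) -> float:
    """F1 of annotator A's spans scored against annotator B's spans."""

    def match(x: Span, y: Span) -> bool:
        if approximate:
            # Assumed definition: the two spans overlap by at least one character.
            return x[0] < y[1] and y[0] < x[1]
        return x == y

    if not ann_a or not ann_b:
        return 0.0
    # Precision: share of A's spans that have a match in B.
    precision = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a)
    # Recall: share of B's spans that have a match in A.
    recall = sum(any(match(b, a) for a in ann_a) for b in ann_b) / len(ann_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    annotator_a = [(0, 5), (10, 18), (30, 34)]
    annotator_b = [(0, 5), (11, 18), (40, 44)]
    print(span_f1(annotator_a, annotator_b))
    print(span_f1(annotator_a, annotator_b, approximate=True))
```

Under these assumptions approximate agreement can never be lower than exact agreement, which is consistent with every row of the table.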
Annotation statistics
| | Tokens | Anaphoric mentions | Anaphora relations | |
|---|---|---|---|---|
| Training (149) | 42,822 (287.4) | 211 (1.42) | 379 (2.54) | 427 (2.87) |
| Test (171) | 50,458 (295.1) | 265 (1.55) | 564 (3.30) | 754 (4.41) |
| TOTAL (320) | 93,280 (291.5) | 476 (1.49) | 943 (2.95) | 1181 (3.69) |
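The parenthesized values in this table are not explained on the page. A quick check suggests they are per-abstract averages, with the number in the first column giving the abstract count. The sketch below tests that reading on the token column only; the dictionary and function names are hypothetical.

```python
# Abstract counts and token counts copied from the table above.
ABSTRACTS = {"Training": 149, "Test": 171, "TOTAL": 320}
TOKENS = {"Training": 42_822, "Test": 50_458, "TOTAL": 93_280}
# Parenthesized values reported next to the token counts.
REPORTED_AVG = {"Training": 287.4, "Test": 295.1, "TOTAL": 291.5}


def per_abstract(count: int, n_abstracts: int) -> float:
    """Average count per abstract, rounded to one decimal like the table."""
    return round(count / n_abstracts, 1)


if __name__ == "__main__":
    for split, n in ABSTRACTS.items():
        print(split, per_abstract(TOKENS[split], n), REPORTED_AVG[split])
```

The training and test rows also sum to the TOTAL row (149 + 171 = 320 abstracts, 42,822 + 50,458 = 93,280 tokens).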
Member counts in set-membership relations
| Member count | Set-membership relations | % |
|---|---|---|
| 2 | 166 | 60.4 |
| 3 | 66 | 24.0 |
| 4 | 29 | 10.5 |
| 5 | 6 | 2.1 |
| 6 | 3 | 1.1 |
| 7 | 3 | 1.1 |
| 8 | 1 | 0.4 |
| 9 | 1 | 0.4 |
| TOTAL | 275 | 100.0 |
Anaphora resolution evaluation
| System | Precision | Recall | F1 score |
|---|---|---|---|
| Baseline | 35.2 | 7.1 | 11.9 |
| Anaphora resolution algorithm | 64.6 | 55.2 | 59.6 |
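The F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded precision and recall values in the table. Because the inputs are already rounded, the recomputed values can differ from the reported ones in the last digit. The function name `f1_score` is a local helper, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
ROWS = {
    "Baseline": (35.2, 7.1, 11.9),
    "Anaphora resolution algorithm": (64.6, 55.2, 59.6),
}

if __name__ == "__main__":
    for system, (p, r, reported) in ROWS.items():
        print(f"{system}: recomputed {f1_score(p, r):.2f}, reported {reported}")
```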
Ablation study results
| Removed component | Precision | Recall | F1 score |
|---|---|---|---|
| Anaphoricity filter | 53.4 | 58.1 | 55.6 |
| Taxonomy constraint | 55.1 | 13.6 | 21.8 |
| Headword constraint | 65.5 | 52.1 | 58.0 |
| Shared Headword constraint | 64.7 | 54.5 | 59.1 |
| Number constraint | 57.7 | 50.8 | 54.0 |
| Set-membership processing | 46.6 | 21.6 | 29.5 |
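One way to read an ablation table is to rank components by how much F1 drops when each is removed. The sketch below does this against the full system's F1 of 59.6 from the anaphora resolution evaluation table. The variable and function names are hypothetical, and the ranking is a reading aid, not an analysis taken from the paper.

```python
from typing import List, Tuple

FULL_SYSTEM_F1 = 59.6  # from the "Anaphora resolution evaluation" table

# F1 after removing each component, copied from the ablation table.
ABLATION_F1 = {
    "Anaphoricity filter": 55.6,
    "Taxonomy constraint": 21.8,
    "Headword constraint": 58.0,
    "Shared Headword constraint": 59.1,
    "Number constraint": 54.0,
    "Set-membership processing": 29.5,
}


def rank_by_f1_drop() -> List[Tuple[str, float]]:
    """Components sorted by F1 loss when removed, largest loss first."""
    drops = [(name, round(FULL_SYSTEM_F1 - f1, 1)) for name, f1 in ABLATION_F1.items()]
    return sorted(drops, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, drop in rank_by_f1_drop():
        print(f"{name}: -{drop:.1f} F1")
```

By this measure the taxonomy constraint and set-membership processing account for the largest losses, mostly through recall (13.6 and 21.6 when removed, against 55.2 for the full system).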
Performance on intra- vs. inter-sentential anaphora
| Processing | Precision | Recall | F1 score |
|---|---|---|---|
| Intra-sentential | 55.7 | 43.3 | 48.8 |
| Inter-sentential | 64.5 | 52.2 | 57.7 |
Anaphora resolution evaluation on the BioNLP protein coreference dataset (development portion)
| System | Precision | Recall | F1 score |
|---|---|---|---|
| D’Souza and Ng [ | 58.3 | 6.9 | 12.4 |
| Our approach | 11.5 | 14.5 | 12.8 |
Effect of anaphora resolution on semantic interpretation
| Change | Count | % |
|---|---|---|
| Partially correct → True positive | 150 | 50 |
| False positive → False positive | 102 | 34 |
| Partially correct → False positive | 42 | 14 |
| True positive → False positive | 4 | 1.4 |
| Partially correct → Partially correct | 1 | 0.3 |
| True positive → True positive | 1 | 0.3 |
Overall SemRep precision with and without anaphora resolution
| System | Precision |
|---|---|
| | |
| Base SemRep | 58.0 |
| Enhanced with anaphora resolution | 59.3 |
| | |
| Base SemRep | 57.1 |
| Enhanced with anaphora resolution | 59.2 |
The rates of change for the top 10 predicate types in 1 million Medline abstracts
| Predicate type | Change (%) |
|---|---|
| PROCESS_OF | +0.09 |
| LOCATION_OF | +0.20 |
| TREATS | +0.17 |
| PART_OF | +0.19 |
| ISA | +0.12 |
| AFFECTS | +0.33 |
| USES | +0.15 |
| COEXISTS_WITH | +0.26 |
| INTERACTS_WITH | +0.44 |
| ASSOCIATED_WITH | +0.45 |
Fig. 3 Distribution of error types. Distribution of precision and recall errors