Halil Kilicoglu, Graciela Rosemblat, Thomas C Rindflesch.
Abstract
Biomedical knowledge claims are often expressed as hypotheses, speculations, or opinions, rather than explicit facts (propositions). Much biomedical text mining has focused on extracting propositions from biomedical literature. One such system is SemRep, which extracts propositional content in the form of subject-predicate-object triples called predications. In this study, we investigated the feasibility of assessing the factuality level of SemRep predications to provide more nuanced distinctions between predications for downstream applications. We annotated semantic predications extracted from 500 PubMed abstracts with seven factuality values (fact, probable, possible, doubtful, counterfact, uncommitted, and conditional). We extended a rule-based, compositional approach that uses lexical and syntactic information to predict factuality levels. We compared this approach to a supervised machine learning method that uses a rich feature set based on the annotated corpus. Our results indicate that the compositional approach is more effective than the machine learning method in recognizing the factuality values of predications. The annotated corpus as well as the source code and binaries for factuality assignment are publicly available. We will also incorporate the results of the better performing compositional approach into SemMedDB, a PubMed-scale repository of semantic predications extracted using SemRep.
Year: 2017 PMID: 28678823 PMCID: PMC5497973 DOI: 10.1371/journal.pone.0179926
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
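The seven factuality values named in the abstract, together with the subject-predicate-object shape of a predication, can be modeled compactly. The following Python sketch is illustrative only: `Factuality`, `Predication`, and `count_by_factuality` are hypothetical names and not part of SemRep or the released factuality code, and the factuality labels attached to the two example triples are placeholders, not annotated values.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Factuality(Enum):
    """The seven factuality values listed in the abstract."""
    FACT = "fact"
    PROBABLE = "probable"
    POSSIBLE = "possible"
    DOUBTFUL = "doubtful"
    COUNTERFACT = "counterfact"
    UNCOMMITTED = "uncommitted"
    CONDITIONAL = "conditional"


@dataclass(frozen=True)
class Predication:
    """A subject-predicate-object triple plus an assigned factuality value."""
    subject: str
    predicate: str
    obj: str
    factuality: Factuality


def count_by_factuality(predications):
    """Tally predications per factuality value, e.g. for a downstream filter."""
    return Counter(p.factuality for p in predications)


# The triples appear in this record; the factuality labels are PLACEHOLDERS
# chosen for illustration, not the values annotated in the corpus.
examples = [
    Predication("Nifedipine", "AUGMENTS", "Renal Blood Flow", Factuality.FACT),
    Predication("Losartan", "CAUSES", "Coughing", Factuality.POSSIBLE),
]
counts = count_by_factuality(examples)
print(counts)
```

A downstream application could use such a tally to keep only predications at the factuality levels it trusts.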
Examples of SemRep predications with their factuality values.
| Sentence | Predication | Factuality |
|---|---|---|
| | Nifedipine-AUGMENTS-Renal Blood Flow | |
| | Tamoxifen-PREDISPOSES-Endometrial Carcinoma | |
| | Estrogen Antagonists-TREATS-Malignant neoplasm of cancer | |
| | Leukotriene Antagonists-TREATS-Hay fever | |
| | Losartan-CAUSES-Coughing | |
| | Plasmapheresis-TREATS-Collagen Diseases | |
| | Cyclic AMP-STIMULATES-CD40 Ligand | |
Fig 1. The factuality scale with proposed factuality values.
Fig 2. Workflow diagram of the method for compositional factuality assessment of SemRep predications.
Composition example for a sentence in PMID 10652588.
| (1) | |
| (2) | |
| (3) | C0020740:Ibuprofen-TREATS-C0006142:Malignant neoplasm of breast |
| (4) | |
In row (2), UMLS Metathesaurus concepts corresponding to entities are represented as CUI: Preferred Name (Semantic Types) tuples.
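Predication strings such as the one in row (3) can be split back into their parts. The sketch below is a minimal Python example under stated assumptions: it assumes the layout `[CUI:]Name-PREDICATE-[CUI:]Name`, with a CUI written as `C` followed by seven digits and an upper-case predicate. The names `Argument`, `Triple`, and `parse_predication` are hypothetical and do not come from SemRep.

```python
import re
from typing import NamedTuple, Optional


class Argument(NamedTuple):
    cui: Optional[str]  # UMLS concept identifier, if present in the string
    name: str           # preferred name of the concept


class Triple(NamedTuple):
    subject: Argument
    predicate: str
    obj: Argument


# Assumed layout: [CUI:]Name-PREDICATE-[CUI:]Name, with an upper-case predicate.
_PATTERN = re.compile(r"^(?:(C\d{7}):)?(.+?)-([A-Z_]+)-(?:(C\d{7}):)?(.+)$")


def parse_predication(text: str) -> Triple:
    """Split a predication string into subject, predicate, and object."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable predication: {text!r}")
    s_cui, s_name, predicate, o_cui, o_name = match.groups()
    return Triple(Argument(s_cui, s_name), predicate, Argument(o_cui, o_name))


triple = parse_predication(
    "C0020740:Ibuprofen-TREATS-C0006142:Malignant neoplasm of breast"
)
print(triple.subject, triple.predicate, triple.obj)
```

The pattern takes the first hyphen-delimited upper-case token as the predicate, so a concept name that itself contains such a token would be split at the wrong place. Structured SemRep output avoids that ambiguity.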
Fig 3. Embedding categorization.
Fig 4. Syntactic dependency graph of the sentence "These results suggest that Ibuprofen may have potential in the chemoprevention and treatment of breast cancer." and its corresponding semantic dependency subgraph.
Dictionary entry for the modal auxiliary may.
| Lemma | may | |
|---|---|---|
| Sense.1 | Category | |
| | Prior scalar modality value | 0.5 |
| | Semantic dependency types | |
| Sense.2 | Category | |
| | Prior scalar modality value | 0.6 |
| | Semantic dependency types | |
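A dictionary entry of this shape pairs a lemma with one or more senses, each carrying a category, a prior scalar modality value, and a set of semantic dependency types. The Python sketch below shows one possible in-memory representation. The class and field names are hypothetical and do not reflect the released code; only the lemma and the two prior values (0.5 and 0.6) come from the entry for *may*, and the remaining fields are left unset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Sense:
    prior_modality_value: float
    category: Optional[str] = None          # value not available here
    dependency_types: Tuple[str, ...] = ()  # values not available here


@dataclass(frozen=True)
class DictionaryEntry:
    lemma: str
    senses: Tuple[Sense, ...]

    def priors(self) -> Tuple[float, ...]:
        """Prior scalar modality values of all senses, in sense order."""
        return tuple(s.prior_modality_value for s in self.senses)


# Entry for the modal auxiliary "may" with its two senses.
may_entry = DictionaryEntry(
    lemma="may",
    senses=(Sense(prior_modality_value=0.5), Sense(prior_modality_value=0.6)),
)
print(may_entry.lemma, may_entry.priors())
```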
Mapping scalar modality values to factuality levels.
| Condition | Factuality value |
|---|---|
| MV | |
| MV | |
| MV | |
| MV | |
| MV | |
| MV | |
| MV | |
| MV | |
| MV | |
| MV | |
| MV | |
| MV | |
Mapping Certainty Level and Polarity to factuality values.
| Certainty Level | Polarity | Factuality |
|---|---|---|
| L3 | Positive | |
| L2 | Positive | |
| L1 | Positive | |
| L1 OR L2 | Negative | |
| L3 | Negative | |
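A mapping from certainty level and polarity to a factuality value is naturally expressed as a lookup table. The Python sketch below rests on an explicit assumption: the five rows are paired, in order, with fact, probable, possible, doubtful, and counterfact, following the order in which the abstract lists these values. That pairing is an inference rather than something this record states, and the function name `to_factuality` is hypothetical.

```python
from typing import Dict, Tuple

# ASSUMPTION: the five (certainty, polarity) rows line up, in order, with
# fact, probable, possible, doubtful, counterfact. The pairing is inferred
# from the order of the values in the abstract, not taken from the table.
_MAPPING: Dict[Tuple[str, str], str] = {
    ("L3", "positive"): "fact",
    ("L2", "positive"): "probable",
    ("L1", "positive"): "possible",
    ("L1", "negative"): "doubtful",     # row "L1 OR L2 | Negative"
    ("L2", "negative"): "doubtful",     # row "L1 OR L2 | Negative"
    ("L3", "negative"): "counterfact",
}


def to_factuality(certainty: str, polarity: str) -> str:
    """Look up the (assumed) factuality value for a certainty/polarity pair."""
    key = (certainty.upper(), polarity.lower())
    try:
        return _MAPPING[key]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {certainty!r}, {polarity!r}"
        ) from None


print(to_factuality("L2", "Positive"))
print(to_factuality("L1", "Negative"))
```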
SemRep factuality corpus characteristics.
| | # Training (%) | # Testing (%) | # Total (%) |
|---|---|---|---|
| Abstracts | 300 | 200 | 500 |
| SemRep predications | 4,431 | 2,960 | 7,391 |
| True positive SemRep predications | 3,149 (71.1) | 2,179 (73.6) | 5,328 (72.1) |
| | 2,754 (87.5) | 1,958 (89.9) | 4,712 (88.4) |
| | 143 (4.5) | 67 (3.0) | 210 (4.0) |
| | 66 (2.1) | 61 (2.8) | 127 (2.4) |
| | 8 (0.3) | 6 (0.3) | 14 (0.3) |
| | 57 (1.8) | 35 (1.6) | 92 (1.7) |
| | 120 (3.8) | 52 (2.4) | 172 (3.2) |
| | 1 (0.0) | 0 (0.0) | 1 (0.0) |
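The percentages in the seven per-value rows are shares of the true positive predications in each split. As a quick consistency check, the Python sketch below recomputes the totals and shares from the training and testing counts of those rows, taken in table order. The helper name `shares` is illustrative.

```python
# Counts for the seven per-value rows, in table order.
training = [2754, 143, 66, 8, 57, 120, 1]
testing = [1958, 67, 61, 6, 35, 52, 0]


def shares(counts):
    """Percentage of each count relative to the column sum, to one decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


# The Total column is the row-wise sum of the two splits.
totals = [tr + te for tr, te in zip(training, testing)]

print("column sums:", sum(training), sum(testing), sum(totals))
print("training shares (%):", shares(training))
print("testing shares (%):", shares(testing))
print("total shares (%):", shares(totals))
```

The column sums can be compared against the "True positive SemRep predications" row, and the shares against the parenthesized percentages.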
Evaluation results on the test set.
| | Precision (%) | Recall (%) | F1 (%) | Accuracy (%) |
|---|---|---|---|---|
| | | | | 89.9 |
| | 89.9 | 100.0 | 94.7 | |
| | | | | 86.7 |
| | 95.6 | 91.2 | 93.4 | |
| | 29.6 | 79.1 | 43.1 | |
| | 37.6 | 67.2 | 48.2 | |
| | 0.0 | 0.0 | 0.0 | |
| | 34.8 | 22.9 | 27.6 | |
| | 0.0 | 0.0 | 0.0 | |
| | | | | |
| | 95.6 | 98.8 | 97.2 | |
| | 66.7 | 71.6 | 69.1 | |
| | 86.8 | 54.1 | 66.7 | |
| | 100.0 | 33.3 | 50.0 | |
| | 100.0 | 57.1 | 72.7 | |
| | 95.5 | 40.4 | 56.8 | |
| | | | | 89.8 |
| | 90.4 | 99.5 | 94.7 | |
| | 50.0 | 5.9 | 10.5 | |
| | 40.0 | 3.3 | 6.1 | |
| | 0.0 | 0.0 | 0.0 | |
| | 100.0 | 2.9 | 5.6 | |
| | 20.0 | 3.9 | 6.6 | |
| | | | | 92.9 |
| | 94.5 | 99.0 | 96.7 | |
| | 57.4 | 51.5 | 54.3 | |
| | 80.0 | 45.9 | 58.3 | |
| | 0.0 | 0.0 | 0.0 | |
| | 88.9 | 45.7 | 60.4 | |
| | 46.7 | 13.7 | 21.2 | |
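The F1 column is the harmonic mean of precision and recall. The Python sketch below assumes that standard definition and recomputes F1 for a few rows copied from the evaluation table; the helper name `f1_score` is illustrative. The reported figures were presumably computed from unrounded precision and recall, so a recomputed value can differ from the reported one by 0.1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0  # convention when a class is never predicted correctly
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) rows copied from the evaluation table.
rows = [
    (89.9, 100.0, 94.7),
    (95.6, 91.2, 93.4),
    (29.6, 79.1, 43.1),
    (37.6, 67.2, 48.2),
    (34.8, 22.9, 27.6),
    (0.0, 0.0, 0.0),
]

for precision, recall, reported in rows:
    recomputed = round(f1_score(precision, recall), 1)
    print(f"P={precision} R={recall} reported={reported} recomputed={recomputed}")
```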
Ablation study for the enhanced compositional approach.
| | Precision (%) | Recall (%) | F1 (%) | Accuracy (%) |
|---|---|---|---|---|
| | 95.6 | 98.8 | 97.2 | |
| | 66.7 | 71.6 | 69.1 | |
| | 86.8 | 54.1 | 66.7 | |
| | 100.0 | 33.3 | 50.0 | |
| | 100.0 | 57.1 | 72.7 | |
| | 95.5 | 40.4 | 56.8 | |
| | | | | 94.3 |
| | 95.6 | 98.6 | 97.1 | |
| | 66.2 | 70.2 | 68.1 | |
| | 86.8 | 54.1 | 66.7 | |
| | 66.7 | 33.3 | 44.4 | |
| | 90.9 | 57.1 | 70.2 | |
| | 87.5 | 40.4 | 55.3 | |
| | | | | 94.2 |
| | 95.4 | 98.8 | 97.1 | |
| | 66.7 | 71.6 | 69.1 | |
| | 80.5 | 54.1 | 64.7 | |
| | 100.0 | 33.3 | 50.0 | |
| | 100.0 | 40.0 | 57.2 | |
| | 95.5 | 40.4 | 56.8 | |
| | | | | 93.5 |
| | 95.9 | 97.7 | 96.8 | |
| | 52.8 | 71.6 | 60.8 | |
| | 72.9 | 57.4 | 64.2 | |
| | 100.0 | 33.3 | 50.0 | |
| | 87.5 | 60.0 | 71.2 | |
| | 95.3 | 38.5 | 54.8 | |
Distribution of error categories for the enhanced compositional method.
| Category | % |
|---|---|
| Factuality triggers | 29.9 |
| Mapping rules | 27.7 |
| Argument identification | 16.8 |
| Scalar modality value composition | 10.9 |
| Preprocessing | 5.8 |
| Graph transformation | 5.1 |
| Comparative structures | 2.2 |
| Syntactic parsing | 1.5 |