Matthew Shardlow, Riza Batista-Navarro, Paul Thompson, Raheel Nawaz, John McNaught, Sophia Ananiadou.
Abstract
BACKGROUND: Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions.
Keywords: Events; Hypothesis; Meta-knowledge; New knowledge; Text mining
Year: 2018 PMID: 29940927 PMCID: PMC6019216 DOI: 10.1186/s12911-018-0639-1
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 An example of two sentences, one containing events and the other containing one relation. The first sentence shows two events. The first event concerns the term ‘activation’, which is a type of positive regulation; the theme of this event is ‘NF-kappaB’, indicating that this protein is being activated. The second event is centered around ‘dependent’, also a type of positive regulation; its cause is ‘oxidative stress’ and its theme is the first event in the sentence. The relation between two entities is, in contrast to the events, much simpler: it indicates that NPTN is related to Schizophrenia in a relation that can be categorised as ‘Target-Disorder’
Fig. 2 The GENIA-MK annotation scheme. There are five Meta-Knowledge dimensions introduced by Thompson et al., as well as two further hyperdimensions
Examples of sentences containing research hypotheses and new knowledge
| ID | Example | Dimension |
|---|---|---|
| 1 | We | Research hypothesis |
| 2 | We tested the | Research hypothesis |
| 3 | These data | New knowledge |
| 4 | We | New knowledge |
| 5 | CTCF is a transcriptional repressor of the c-myc gene. | — |
Key words that help us to determine whether a sentence pertains to New Knowledge or Research Hypothesis are marked in bold. Some sentences may be neither Research Hypothesis nor New Knowledge, as shown in Sentence 5
Inter-annotator agreement across several rounds of corpus annotation as measured by Cohen’s Kappa
| | | Round 1 | Round 2 | Round 3 |
|---|---|---|---|---|
| Research Hypothesis | EU-ADR | 0.486 | 0.724 | 0.761 |
| | GENIA-MK | 0.593 | 0.859 | 0.855 |
| New Knowledge | EU-ADR | 0.627 | 0.825 | 0.842 |
| | GENIA-MK | 0.772 | 0.895 | 0.895 |
We show that agreement increased throughout the annotation process as we discussed difficult cases with the annotators, holding regular meetings with them to resolve any disagreements quickly
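The agreement figures in the table above are Cohen's kappa values. As a point of reference, kappa can be computed from two annotators' label sequences as below; the label lists here are illustrative, not drawn from the corpus:

```python
# Minimal sketch of Cohen's kappa for two annotators over the same items.
# Kappa corrects observed agreement for the agreement expected by chance.

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(a) == len(b)
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of items both annotators label identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's label marginals.
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

ann1 = [1, 1, 0, 0, 1, 0, 1, 0]
ann2 = [1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(ann1, ann2), 3))  # → 0.5
```

A kappa of 1 indicates perfect agreement; values around 0.8 and above, as reached in rounds 2 and 3 above, are conventionally read as strong agreement.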
Statistics comparing our versions of the GENIA-MK and EU-ADR corpora, both annotated with new knowledge and research hypothesis labels
| | GENIA-MK | EU-ADR |
|---|---|---|
| Base type for annotations | Events | Relations |
| Number of annotations | 6899 | 622 |
| Number of abstracts | 150 | 159 |
| Number of new knowledge annotations | 2356 (34.2%) | 406 (65.3%) |
| Number of research hypothesis annotations | 366 (5.31%) | 38 (6.11%) |
The GENIA-MK corpus is much more densely annotated than the EU-ADR corpus, with over ten times more annotated events in the former than annotated relations in the latter. Research Hypotheses are particularly sparse in both corpora, constituting just over 5% of all annotated relations and events in each case. There is a disparity in the proportion of New Knowledge between the two corpora, in part because the EU-ADR corpus appeared to favour the annotation of relationships denoting New Knowledge
Types of features used in training the Knowledge Type classification model
| Feature type | Features |
|---|---|
| Sentence | SE1: length in words; SE2: length in characters; SE3: mean number of characters per word; SE4: median number of characters per word; POS tag ratios (SE5: noun-to-verb, SE6: noun-to-adjective, SE7: noun-to-adverb, SE8: verb-to-adjective, SE9: verb-to-adverb; SE10: adjective-to-adverb) |
| Structural | ST1: whether any participant is an event; ST2: the sentence number containing this event; ST3: whether this event is a participant in another event; ST4: whether the event is a noun phrase; ST5: whether the event is an instance of “regulation”; ST6: total number of themes; ST7: total number of causes |
| Participant | PA1: POS tag of the first participant; PA2: POS tag of the first cause; PA3: whether any theme is an event; PA4: whether any cause is an event; PA5: POS tag of the word in a governing dependency over the theme; PA6: POS tag of the word in a governing dependency over the cause |
| Lexical | L1: distance between nearest clue and event trigger; L2: whether sentence contains at least one clue; L-N: which clues (in a precompiled list) are matched within the sentence; features of matched clue (L3: surface form, L4: POS tag, L5: position relative to trigger, L6: whether in auxiliary form); L7: whether trigger contains a clue; features of nearest clue (L8: tense, L9: aspect, L10: voice); L11-L15: whether clue usually occurs in the context of each Knowledge Type; L16: number of matched clues |
| Constituency | Relationships between clue and event trigger (C1: s-commands, C2: vp-commands, C3: np-commands); relationships between clue and any event participant (C4: s-commands, C5: vp-commands, C6: np-commands); C7: whether scope of any clue is within the same scope as the trigger |
| Dependency | Direct dependencies (D1: between clue and trigger, D2: between clue and any event participant); one-hop dependencies (D3: between clue and trigger, D4: between clue and any event participant); two-hop dependencies (D5: between clue and trigger, D6: between clue and any event participant) |
| Parse Tree | Distances: PT1: between theme and furthest leaf node; PT2: between cause and furthest leaf node; PT3: between theme and root node; PT4: between cause and root node |
A detailed explanation of each feature with examples is given in the Additional files
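To give a concrete sense of the simplest features in the table, the sentence-level features SE1–SE4 (length in words, length in characters, mean and median characters per word) can be computed as below. The POS-ratio features SE5–SE10 would additionally require a POS tagger and are omitted from this sketch; the feature names are kept only for illustration.

```python
# Illustrative extraction of the sentence features SE1-SE4 described above.
from statistics import mean, median

def sentence_features(sentence):
    words = sentence.split()
    lens = [len(w) for w in words]
    return {
        "SE1_len_words": len(words),               # SE1: length in words
        "SE2_len_chars": len(sentence),            # SE2: length in characters
        "SE3_mean_chars_per_word": mean(lens),     # SE3: mean chars per word
        "SE4_median_chars_per_word": median(lens), # SE4: median chars per word
    }

feats = sentence_features(
    "CTCF is a transcriptional repressor of the c-myc gene.")
```

These features capture surface-level cues (e.g. longer, noun-heavy sentences) that complement the clue- and syntax-based features.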
Effects of each feature subset on the final classification performance for Knowledge Type
| Feature Subset | Only This Feature | | | All Except This Feature | | |
|---|---|---|---|---|---|---|
| | P | R | F1 | P | R | F1 |
| Constituency | — | — | — | 0.815 | 0.727 | 0.763 |
| Dependency | — | — | — | 0.823 | 0.728 | 0.765 |
| Parse Tree | 0.428 | 0.281 | 0.340 | 0.823 | 0.730 | |
| Participant | 0.383 | 0.252 | 0.243 | | | |
| Sentence | 0.474 | 0.442 | 0.453 | 0.785 | 0.705 | 0.738 |
| Lexical | | 0.449 | 0.478 | 0.794 | 0.722 | 0.754 |
| Structural | 0.558 | | | 0.791 | 0.665 | 0.709 |
| All | 0.823 | 0.725 | 0.764 | 0.823 | 0.725 | 0.764 |
Results are only shown in cases where it was possible to produce a reliable model. The final row denotes the performance of the classifier when using all feature subsets
Values in bold represent the best performing feature subset for each column
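The ablation protocol behind this table can be sketched as follows. Here `evaluate` is a placeholder stand-in that merely counts subsets; in the study itself, each call would train and score the actual classifier on the features drawn from the given subsets.

```python
# Schematic of a feature-subset ablation: score a model trained on only
# each subset, and on all subsets except it.

FEATURE_SUBSETS = ["Sentence", "Structural", "Participant",
                   "Lexical", "Constituency", "Dependency", "Parse Tree"]

def evaluate(subsets):
    # Placeholder scoring function (returns the number of subsets used);
    # a real run would return (P, R, F1) from cross-validation.
    return len(subsets)

def ablation(subsets, evaluate):
    results = {}
    for s in subsets:
        only = evaluate([s])                          # "Only This Feature"
        rest = evaluate([t for t in subsets if t != s])  # "All Except This Feature"
        results[s] = (only, rest)
    return results

results = ablation(FEATURE_SUBSETS, evaluate)
```

Comparing the two conditions per subset shows both how informative a subset is on its own and how much unique signal it contributes beyond the others.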
A comparison of the Knowledge Type results produced by our classifier against the results of the most directly comparable work
| Knowledge Type | RF — our features | | | Miwa et al. 2012 | | | SVM — our features | | |
|---|---|---|---|---|---|---|---|---|---|
| P | R | F1 | P | R | F1 | P | R | F1 | |
| Observation | 0.781 | 0.853 | 0.815 | 0.721 | 0.723 | 0.722 | 0.658 | 0.744 | 0.698 |
| Fact | 0.847 | 0.648 | 0.734 | 0.637 | 0.680 | 0.658 | 0.506 | 0.310 | 0.384 |
| Other | 0.788 | 0.810 | 0.799 | 0.770 | 0.706 | 0.736 | 0.727 | 0.671 | 0.698 |
| Method | 0.832 | 0.535 | 0.651 | 0.534 | 0.543 | 0.538 | 0.641 | 0.455 | 0.532 |
| Investigation | 0.884 | 0.763 | 0.819 | 0.691 | 0.755 | 0.722 | 0.724 | 0.714 | 0.718 |
| Analysis | 0.852 | 0.826 | 0.838 | 0.704 | 0.784 | 0.742 | 0.718 | 0.793 | 0.754 |
To enable a more direct comparison, we have also provided our results when using an SVM (the classifier used by Miwa et al.) with our features
The top-10 most informative features for each Knowledge Type value
| # | Observation | Fact | Other | Method | Investigation | Analysis | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C7 | 0.313 | ST3 | 0.173 | ST3 | 0.487 | ST5 | 0.203 | L-47 | 0.308 | C5 | 0.364 |
| 2 | L5 | 0.263 | ST2 | 0.154 | ST1 | 0.330 | ST1 | 0.135 | L-46 | 0.292 | L11 | 0.343 |
| 3 | L11 | 0.255 | ST5 | 0.110 | ST5 | 0.216 | L-48 | 0.100 | ST4 | 0.227 | C4 | 0.301 |
| 4 | C2 | 0.252 | C7 | 0.097 | L11 | 0.131 | ST3 | 0.075 | L13 | 0.221 | ST3 | 0.283 |
| 5 | C5 | 0.218 | L2 | 0.088 | C7 | 0.127 | C7 | 0.063 | ST2 | 0.209 | C2 | 0.258 |
| 6 | L16 | 0.211 | L5 | 0.076 | L5 | 0.119 | L14 | 0.060 | ST3 | 0.202 | C7 | 0.257 |
| 7 | C1 | 0.207 | L11 | 0.068 | D1 | 0.108 | L9 | 0.056 | SE5 | 0.195 | D1 | 0.234 |
| 8 | L2 | 0.196 | SE10 | 0.064 | SE3 | 0.096 | C5 | 0.051 | D1 | 0.151 | ST1 | 0.227 |
| 9 | C2 | 0.178 | L-35 | 0.063 | L-28 | 0.090 | SE1 | 0.046 | D2 | 0.144 | C1 | 0.203 |
| 10 | L15 | 0.173 | C1 | 0.061 | L2 | 0.087 | C4 | 0.045 | L11 | 0.141 | L5 | 0.197 |
These were calculated using Pearson’s correlation between each class label and each feature. The feature labels are expanded in Table 4, above
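The ranking described above, i.e. correlating each feature with each class label, can be sketched in a few lines. The feature values and labels below are toy data for illustration only:

```python
# Minimal sketch of ranking features by the absolute Pearson correlation
# between each feature column and a binary class label.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# One 0/1 value per instance for the class label, one column per feature.
labels = [1, 1, 0, 0, 1, 0]
features = {
    "C7": [1, 1, 0, 0, 1, 1],
    "L5": [0, 1, 0, 1, 1, 0],
}
ranked = sorted(features,
                key=lambda f: abs(pearson(features[f], labels)),
                reverse=True)
```

Repeating this per class label yields a per-class ranking like the one in the table above.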
The 10 top ranked features, averaged across all classes for Knowledge Type
| # | Feature | Average Rank |
|---|---|---|
| 1 | C7 | 5.50 |
| 2 | L11 | 6.17 |
| 3 | ST3 | 8.33 |
| 4 | L5 | 9.17 |
| 5 | ST1 | 11.33 |
| 6 | C4 | 12.50 |
| 7 | D1 | 14.17 |
| 8 | ST5 | 14.67 |
| 9 | C5 | 15.33 |
| 10 | L-5 | 18.50 |
This shows which features are globally informative. The feature labels are expanded in Table 4, above
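The global ranking above is obtained by averaging each feature's per-class rank over all Knowledge Type classes, which can be sketched as follows (with toy rankings):

```python
# Average each feature's rank position (1 = best) over several per-class
# rankings, yielding a single global informativeness score.

def average_ranks(per_class_ranks):
    """per_class_ranks: list of ranked feature-name lists, best first."""
    totals = {}
    for ranking in per_class_ranks:
        for pos, feat in enumerate(ranking, start=1):
            totals.setdefault(feat, []).append(pos)
    return {f: sum(r) / len(r) for f, r in totals.items()}

rankings = [["C7", "L11", "ST3"], ["L11", "C7", "ST3"]]
avg = average_ranks(rankings)  # lower average rank = globally more informative
```

A feature that ranks highly for every class (such as C7 in the table) ends up with a low average rank.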
Results of 10-fold cross validation on both datasets for Research Hypothesis and New Knowledge
| Corpus | Classifier | Class | P | R | F1 |
|---|---|---|---|---|---|
| GENIA-MK | Majority Baseline | New Knowledge | 0.000 | 0.000 | 0.000 |
| | | Other knowledge | 0.659 | 1.000 | 0.794 |
| | | Average | 0.329 | 0.500 | 0.397 |
| | | Hypothetical | 0.000 | 0.000 | 0.000 |
| | | Non-Hypothetical | 0.947 | 1.000 | 0.973 |
| | | Average | 0.473 | 0.500 | 0.486 |
| | Rule-based Baseline | New Knowledge | 0.580 | 0.767 | 0.660 |
| | | Other knowledge | 0.855 | 0.712 | 0.777 |
| | | Average | 0.717 | 0.739 | 0.719 |
| | | Hypothetical | 0.054 | 0.077 | 0.063 |
| | | Non-Hypothetical | 0.947 | 0.924 | 0.936 |
| | | Average | 0.500 | 0.500 | 0.499 |
| | Random Forest | New Knowledge | 0.863 | 0.920 | 0.891 |
| | | Other knowledge | 0.823 | 0.719 | 0.767 |
| | | Average | 0.843 | 0.819 | 0.829 |
| | | Hypothetical | 0.928 | 0.762 | 0.836 |
| | | Non-Hypothetical | 0.987 | 0.997 | 0.992 |
| | | Average | 0.958 | 0.880 | 0.914 |
| EU-ADR | Majority Baseline | New Knowledge | 0.644 | 1.000 | 0.784 |
| | | Other knowledge | 0.000 | 0.000 | 0.000 |
| | | Average | 0.322 | 0.500 | 0.392 |
| | | Hypothetical | 0.000 | 0.000 | 0.000 |
| | | Non-Hypothetical | 0.939 | 1.000 | 0.968 |
| | | Average | 0.469 | 0.500 | 0.484 |
| | Random Forest | New Knowledge | 0.853 | 0.921 | 0.884 |
| | | Other knowledge | 0.831 | 0.692 | 0.748 |
| | | Average | 0.842 | 0.807 | 0.816 |
| | | Hypothetical | 1.000 | 0.533 | 0.668 |
| | | Non-Hypothetical | 0.970 | 1.000 | 0.985 |
| | | Average | 0.985 | 0.767 | 0.827 |
We report precision (P), recall (R) and F1-score. For each corpus and classifier, the first two sub-rows give the macro average of 10-fold cross validation on each class, and the third sub-row gives the average of the two classes above it. The majority-class baseline, included for comparison, was calculated by assigning every event to the majority class and then computing precision, recall and F1-score. The majority class is the negative class for both New Knowledge and Hypothesis in the GENIA-MK corpus; in the EU-ADR corpus, it is the positive class for New Knowledge and the negative class for Hypothesis. In addition, we include results for the rule-based baseline from Thompson et al. [28], as described previously
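The majority-class baseline figures above follow mechanically from the class distribution. A minimal sketch, with illustrative labels rather than corpus counts:

```python
# Majority-class baseline: predict the most frequent label for every
# instance, then compute per-class precision, recall and F1.
from collections import Counter

def prf(gold, pred, cls):
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = ["other"] * 7 + ["new_knowledge"] * 3
majority = Counter(gold).most_common(1)[0][0]
pred = [majority] * len(gold)
# The minority class scores 0/0/0; the majority class gets recall 1.0 and
# precision equal to its prevalence, as in the baseline rows above.
```

This is why the baseline averages hover near 0.4–0.5: one class contributes zeros across the board.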
| ID | Trigger | Theme | Cause | Clue |
|---|---|---|---|---|
| E1 | regulation | activation following viral binding | N/A | focused |
| E2 | effects | Cya, aspirin, and indomethacin | N/A | present study |
| E3 | result in decreased | recruitment of monocytes | Down-regulation of MCP-1 expression by aspirin | N/A |
| E4 | expression | megakaryotic genes | N/A | indicate |
| E5 | enhanced | expression of Spi-1 and… | E4 | N/A |