Paul Thompson, Raheel Nawaz, John McNaught, Sophia Ananiadou.
Abstract
BACKGROUND: Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These corpora provide an important resource for training domain-specific information extraction (IE) systems, which in turn facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information: for example, does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event.
Year: 2011 PMID: 21985429 PMCID: PMC3222636 DOI: 10.1186/1471-2105-12-393
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1: Meta-knowledge annotation scheme. The boxes with the grey background correspond to information that is common to most bio-event annotation schemes, i.e., the participants in the event, together with an indication of the class or type of the event. The boxes with the dark green backgrounds correspond to our proposed meta-knowledge annotation dimensions and their possible values, whilst the light green box shows the hyper-dimensions that can be derived by considering a combination of the annotated dimensions.
Inference table for New Knowledge hyper-dimension
| Source (Annotated) | KT (Annotated) | CL (Annotated) | New Knowledge (Inferred) |
|---|---|---|---|
| Other | X | X | No |
| X | X | L2 | No |
| X | X | L1 | No |
| Current | Observation | L3 | Yes |
| Current | Analysis | L3 | Yes |
| X | Fact | X | No |
| X | Method | X | No |
| X | Other | X | No |
| X | Investigation | X | No |
The symbol 'X' indicates a "don't care condition", meaning that this value does not have any impact on the result.
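The New Knowledge table can be read as a rule lookup in which 'X' matches any value. The Python sketch below transcribes the rows literally and checks that, over every combination of the category values reported in this section's distribution tables, the table reduces to a single condition: Source = Current, KT is Observation or Analysis, and CL = L3. The helper names (`matches`, `infer_new_knowledge`) are hypothetical and illustrative, not part of the authors' tooling.

```python
from itertools import product

# Rows transcribed from the New Knowledge inference table:
# (Source, KT, CL) -> New Knowledge, where "X" is a don't-care value.
NEW_KNOWLEDGE_RULES = [
    (("Other", "X", "X"), False),
    (("X", "X", "L2"), False),
    (("X", "X", "L1"), False),
    (("Current", "Observation", "L3"), True),
    (("Current", "Analysis", "L3"), True),
    (("X", "Fact", "X"), False),
    (("X", "Method", "X"), False),
    (("X", "Other", "X"), False),
    (("X", "Investigation", "X"), False),
]


def matches(pattern, values):
    """True if each pattern cell is the wildcard "X" or equals the value."""
    return all(p == "X" or p == v for p, v in zip(pattern, values))


def infer_new_knowledge(source, kt, cl):
    """Return the result of the first table row matching (Source, KT, CL)."""
    for pattern, result in NEW_KNOWLEDGE_RULES:
        if matches(pattern, (source, kt, cl)):
            return result
    raise ValueError(f"no rule covers {(source, kt, cl)}")


# Category values as listed in the distribution tables of this section.
SOURCES = ["Current", "Other"]
KTS = ["Observation", "Other", "Analysis", "Fact", "Investigation", "Method"]
CLS = ["L3", "L2", "L1"]

# Every combination is covered, and the table collapses to one conjunction.
for s, k, c in product(SOURCES, KTS, CLS):
    expected = s == "Current" and k in {"Observation", "Analysis"} and c == "L3"
    assert infer_new_knowledge(s, k, c) == expected
```

Rows that overlap (for example, Source = Other with CL = L2) all yield "No", so taking the first matching row is safe here.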
Inference table for Hypothesis hyper-dimension
| KT (Annotated) | CL (Annotated) | Hypothesis (Inferred) |
|---|---|---|
| Fact | X | No |
| Method | X | No |
| Other | X | No |
| Observation | X | No |
| Analysis | L3 | No |
| Analysis | L2 | Yes |
| Analysis | L1 | Yes |
| Investigation | X | Yes |
The symbol 'X' indicates a "don't care condition", meaning that this value does not have any impact on the result.
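The Hypothesis table can be encoded the same way. In this sketch (function and constant names are hypothetical), each (KT, CL) pair matches exactly one row, so no row ordering is needed; the closing loop checks the equivalent closed form: an event is a hypothesis if it is an Investigation, or an Analysis stated with less than full certainty (L1 or L2).

```python
# Rows transcribed from the Hypothesis inference table:
# (KT, CL) -> Hypothesis, where "X" is a don't-care value.
HYPOTHESIS_RULES = [
    (("Fact", "X"), False),
    (("Method", "X"), False),
    (("Other", "X"), False),
    (("Observation", "X"), False),
    (("Analysis", "L3"), False),
    (("Analysis", "L2"), True),
    (("Analysis", "L1"), True),
    (("Investigation", "X"), True),
]


def infer_hypothesis(kt, cl):
    """Return the Hypothesis value of the single row matching (KT, CL)."""
    hits = [result for (p_kt, p_cl), result in HYPOTHESIS_RULES
            if p_kt == kt and p_cl in ("X", cl)]
    if len(hits) != 1:
        raise ValueError(f"expected one matching row for {(kt, cl)}, got {len(hits)}")
    return hits[0]


# Closed form: investigations are always hypotheses, and analyses are
# hypotheses whenever the certainty level is below L3.
for kt in ["Observation", "Other", "Analysis", "Fact", "Investigation", "Method"]:
    for cl in ["L3", "L2", "L1"]:
        expected = kt == "Investigation" or (kt == "Analysis" and cl != "L3")
        assert infer_hypothesis(kt, cl) == expected
```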
Distribution of annotated categories for Knowledge Type (KT)
| Category | Freq | % of total events |
|---|---|---|
| Observation | 12821 | 34.7% |
| Other | 11537 | 31.3% |
| Analysis | 6578 | 17.8% |
| Fact | 2998 | 8.1% |
| Investigation | 1948 | 5.3% |
| Method | 976 | 2.6% |
Most common KT clue expressions
| Analysis | Freq | Investigation | Freq | Observation | Freq |
|---|---|---|---|---|---|
| suggest | 408 | examined | 207 | found | 361 |
| show | 353 | investigated | 205 | observed | 226 |
| demonstrate | 335 | analyzed | 119 | detected | 141 |
| demonstrated | 332 | studied | 94 | detectable | 48 |
| showed | 246 | to determine | 50 | seen | 32 |
| shown | 244 | tested | 39 | noted | 17 |
| may | 242 | measured | 25 | find | 11 |
| can | 232 | monitored | 25 | detect | 11 |
| associated | 215 | to investigate | 23 | findings | 11 |
| indicate | 211 | to examine | 21 | observations | 9 |
| revealed | 196 | to study | 21 | finding | 9 |
| suggesting | 140 | analysis | 20 | show | 6 |
| report | 114 | studies | 20 | report | 6 |
| identified | 112 | to identify | 16 | exhibit | 5 |
| thus | 108 | investigate | 15 | | |
Distribution of annotated categories for Certainty Level (CL)
| Category | Freq | % of total events |
|---|---|---|
| L3 (default) | 33876 | 91.9% |
| L2 | 2216 | 6.0% |
| L1 | 766 | 2.1% |
Most common CL clue expressions
| L2 | Freq | L1 | Freq |
|---|---|---|---|
| can | 407 | may | 516 |
| suggest | 285 | might | 75 |
| indicate | 150 | could | 55 |
| suggesting | 112 | possible | 32 |
| ability | 108 | potential | 23 |
| indicated | 99 | possibility | 10 |
| appears | 88 | possibly | 10 |
| able | 86 | potentially | 10 |
| indicating | 72 | perhaps | 5 |
| likely | 52 | propose | 4 |
Distribution of annotated categories for Polarity
| Polarity | Freq | % of total events |
|---|---|---|
| Positive (default) | 34595 | 93.9% |
| Negative | 2263 | 6.1% |
Distribution of negated events among KT categories
| KT Category | Negated events (% within category) |
|---|---|
| Observation | 1364 (10.6%) |
| Analysis | 577 (8.7%) |
| Fact | 105 (3.5%) |
| Other | 187 (1.6%) |
| Method | 10 (1.0%) |
| Investigation | 20 (1.0%) |
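The "% within category" column is relative to each KT category's own size, not to all annotated events. The short Python check below (variable and function names are illustrative) divides each negated count by the matching frequency from the KT distribution table. It also confirms that the KT frequencies sum to 36,858 events and that the negated counts sum to the 2,263 Negative events in the Polarity table. The computed shares agree with the printed percentages to within 0.1 percentage point.

```python
# Events per KT category, from the KT distribution table.
kt_totals = {"Observation": 12821, "Other": 11537, "Analysis": 6578,
             "Fact": 2998, "Investigation": 1948, "Method": 976}
# Negated events per KT category, from the negation-by-KT table.
negated = {"Observation": 1364, "Analysis": 577, "Fact": 105,
           "Other": 187, "Method": 10, "Investigation": 20}
# Percentages as printed in the table, for comparison.
published = {"Observation": 10.6, "Analysis": 8.7, "Fact": 3.5,
             "Other": 1.6, "Method": 1.0, "Investigation": 1.0}


def within_category_pct(subset, totals):
    """Percentage of each category's events that fall in the subset."""
    return {k: 100 * subset[k] / totals[k] for k in subset}


pct = within_category_pct(negated, kt_totals)
print("all events:", sum(kt_totals.values()))     # 36858
print("negated events:", sum(negated.values()))   # 2263, the Negative count for Polarity
for k, p in pct.items():
    print(f"{k:<14} {negated[k]:>5} / {kt_totals[k]:>5} = {p:5.2f}%  (table: {published[k]}%)")
```

The same computation applies to the Manner-by-KT table, with the explicit-Manner counts in place of `negated`.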
Most common clue expressions for Polarity = Negative
| Clue | Freq |
|---|---|
| not | 1141 |
| no | 199 |
| independent | 113 |
| without | 65 |
| failed | 47 |
| nor | 47 |
| absence | 42 |
| neither | 38 |
| unaffected | 28 |
| lack | 23 |
| un | 23 |
| unable | 19 |
| independently | 18 |
| resistant | 15 |
| fails | 13 |
Distribution of annotated categories for Manner
| Manner | Freq | % of total events |
|---|---|---|
| Neutral (default) | 35143 | 95.3% |
| High | 1392 | 3.8% |
| Low | 323 | 0.8% |
Distribution of events with explicitly annotated Manner among KT categories
| KT Category | Events with explicit Manner (% within category) |
|---|---|
| Observation | 1141 (8.9%) |
| Analysis | 276 (4.2%) |
| Fact | 120 (4.0%) |
| Other | 171 (1.5%) |
| Investigation | 5 (0.2%) |
| Method | 2 (0.2%) |
Most common Manner clue expressions
| High | Freq | Low | Freq |
|---|---|---|---|
| significantly | 140 | little | 22 |
| potent | 84 | low | 15 |
| markedly | 81 | little or no | 13 |
| rapidly | 73 | low levels | 11 |
| strongly | 72 | weak | 11 |
| rapid | 65 | limited | 10 |
| significant | 39 | low level | 9 |
| completely | 36 | weakly | 9 |
| strong | 30 | minimal | 8 |
| high | 28 | only a partial | 8 |
| high levels | 28 | no significant | 8 |
| overexpression | 26 | partially | 8 |
| highly | 23 | barely | 7 |
| marked | 23 | to a lesser extent | 6 |
| dramatically | 22 | not significant | 6 |
Distribution of annotated categories for Source
| Source | Freq | % of total events |
|---|---|---|
| Current (default) | 36313 | 98.5% |
| Other | 545 | 1.5% |
Most common clue expressions for Source = Other
| Clue | Freq |
|---|---|
| previously | 118 |
| has been | 89 |
| recently | 67 |
| have been | 39 |
| previous studies | 24 |
| recent studies | 17 |
| recent | 15 |
| previous | 14 |
| our previous studies | 10 |
| earlier | 6 |
Distribution of categories for the two hyper-dimensions
| Hyper-dimension | Category | Freq | % of total events |
|---|---|---|---|
| New Knowledge | Yes | 15985 | 43.4% |
| | No | 20873 | 56.6% |
| Hypothesis | Yes | 4924 | 13.4% |
| | No | 31934 | 86.6% |
Inter-annotator agreement rates
| Dimension | Kappa value |
|---|---|
| Polarity | 0.929 |
| Source | 0.878 |
| CL | 0.864 |
| Manner | 0.864 |
| KT | 0.843 |
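This excerpt does not state which kappa variant was used. Assuming Cohen's kappa for two annotators, the statistic corrects the observed agreement p_o for the agreement p_e expected by chance from each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The minimal Python sketch below illustrates the arithmetic; the function name and the four Polarity labels are hypothetical and do not come from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items:
    kappa = (p_o - p_e) / (1 - p_e)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical Polarity labels from two annotators on four events.
a = ["Positive", "Positive", "Negative", "Positive"]
b = ["Positive", "Negative", "Negative", "Positive"]
print(cohens_kappa(a, b))  # p_o = 0.75, p_e = 0.5, so kappa = 0.5
```

Because chance agreement is subtracted out, a dimension with a dominant default value (such as Polarity = Positive) needs very high raw agreement to reach the kappa values reported above.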