Isabel Segura-Bedmar, Mario Crespo, César de Pablo-Sánchez, Paloma Martínez.
Abstract
BACKGROUND: Drug-drug interactions are frequently reported in the growing body of biomedical literature. Information Extraction (IE) techniques have been devised as a useful instrument to manage this knowledge. Nevertheless, IE at the sentence level has a limited effect because texts frequently refer back to entities mentioned earlier in the discourse, a phenomenon known as 'anaphora'. DrugNerAR, a drug anaphora resolution system, is presented to address the problem of co-referring expressions in pharmacological literature. This development is part of a larger study on automatic drug-drug interaction extraction.
Year: 2010 PMID: 20406499 PMCID: PMC3288782 DOI: 10.1186/1471-2105-11-S2-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Architecture for drug-drug interaction extraction. This figure shows the pipeline architecture of our drug-drug interaction prototype. First, texts are processed by the MMTx program. This tool performs sentence splitting, tokenization, POS-tagging, chunking, and linking of phrases with UMLS concepts. Then, the drugs found in these documents are classified into drug families by a set of nomenclature rules (WHOINN affixes) recommended by the World Health Organization (WHO) International Nonproprietary Names (INNs) Program to identify and classify pharmaceutical substances. On this basis, anaphora resolution is carried out for both pronouns and nominal phrases that refer to drugs. Finally, the output of the previous modules is sent to the relation extraction module, which uses this information to detect drug interactions in biomedical documents.
Summary of the main approaches to biomedical anaphora resolution
| Authors | Approach | Corpus | Results |
|---|---|---|---|
| Castano et al. | Scoring method | 46 MedLine abstracts | F=0.74 |
| Lin et al. | Scoring method | 32 MedLine abstracts (MedStract) | F=0.92 pronominal, F=0.78 nominal |
| Kim et al. | Centering theory for pronominal anaphors and scoring method for nominal anaphors | 120 biological interactions | F=0.64 pronominal, F=0.59 nominal |
| Liang and Lin | Scoring method | MedStract + 100 MedLine abstracts | F=0.87 pronominal, F=0.80 nominal |
| Segura-Bedmar et al. | Scoring method and a set of semantic and morphological restrictions | 49 MedLine abstracts | F=0.85 pronominal, F=0.50 nominal |
| Nguyen and Kim | Maximum Entropy ranker model | Genia | Success rate: 79.55% |
Figure 2. Example of a sentence processed by MMTx and DrugNer and annotated with anaphoric expressions. This example contains a pronominal anaphoric expression (phrase s28.p378, 'it') whose antecedent is annotated by the attribute ID-ANTENCENT. In this case, the antecedent is the phrase s28.p371 ('of fluvoxamine').
Distribution of pronominal anaphors in the corpus
| Pronominal Anaphors | Num |
|---|---|
| Personal (it, they) | 23 |
| Reflexive (itself, themselves) | 1 |
| Relative (which, that) | 113 |
| Distributive (both, each, either, neither) | 8 |
| Demonstrative (these, this, those, that) | 12 |
| Indefinite (all, some, many, one) | 8 |
| Total Phrases: | 165 |
Distribution of nominal anaphors in the corpus.
| Nominal Anaphors | Num |
|---|---|
| Definite (the) | 37 |
| Possessive (its, theirs) | 52 |
| Distributive (both, each, either, neither) | 11 |
| Demonstrative (these, this, those, that) | 58 |
| Indefinite (other, another, all) | 8 |
| Total Phrases: | 166 |
Rules to recognize pleonastic-it expressions.
| Rules | Examples |
|---|---|
| IT [MODALVERB [NOT]?]? BE [NOT]? [ADJ|ADV|VP]* [THAT|WHETHER] | |
| IT [MODALVERB [NOT]?]? BE [NOT]? ADJ [FOR np] TO VP | If |
| IT [MODALVERB [NOT]]? [SEEM|APPEAR|MEAN|FOLLOW] [THAT] * | |
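The rules above can be read as patterns over a sequence of token symbols. The following is a minimal sketch under stated assumptions, not the system's actual implementation: the `(word, tag)` input format, the word lists, the `symbol` mapping, and the function names are all hypothetical. It also treats `[FOR np]` as optional and applies the same optional-negation prefix to all three rules.

```python
import re

# Hypothetical lexicons; the source does not list the actual word forms used.
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}
BE_FORMS = {"be", "is", "are", "was", "were", "been"}
COGNITIVE = {"seem", "seems", "seemed", "appear", "appears", "appeared",
             "mean", "means", "meant", "follow", "follows", "followed"}
KEYWORDS = {"it": "IT", "not": "NOT", "that": "THAT",
            "whether": "WHETHER", "for": "FOR", "to": "TO"}


def symbol(word, tag):
    """Map one token to the symbol vocabulary used by the rules."""
    w = word.lower()
    if w in KEYWORDS:
        return KEYWORDS[w]
    if w in MODALS:
        return "MODALVERB"
    if w in BE_FORMS:
        return "BE"
    if w in COGNITIVE:
        return "SEEM"
    # Fall back to the (assumed) coarse tag; anything else becomes X.
    return tag if tag in {"ADJ", "ADV", "VP", "NP"} else "X"


# Shared prefix: IT, optionally followed by a modal verb and a negation.
PREFIX = r"(?<!\S)IT (?:MODALVERB (?:NOT )?)?"
RULES = [
    re.compile(PREFIX + r"BE (?:NOT )?(?:(?:ADJ|ADV|VP) )*(?:THAT|WHETHER)(?!\S)"),
    re.compile(PREFIX + r"BE (?:NOT )?ADJ (?:FOR NP )?TO VP(?!\S)"),
    re.compile(PREFIX + r"SEEM(?: THAT)?(?!\S)"),
]


def is_pleonastic_it(tokens):
    """Return True if any rule matches the symbol sequence of the tokens."""
    symbols = " ".join(symbol(word, tag) for word, tag in tokens)
    return any(rule.search(symbols) for rule in RULES)


tokens = [("It", "PRON"), ("is", "VP"), ("not", "ADV"),
          ("known", "VP"), ("whether", "X")]
print(is_pleonastic_it(tokens))
```

Encoding tokens as a space-separated symbol string keeps each rule close to its tabular form; a pronoun flagged this way would be skipped rather than sent to antecedent search.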
Regular expressions to detect correlative expressions.
| Rule | Example |
|---|---|
| [BOTH|EITHER|NEITHER] [NP|PP|UNK] [AND|OR|NOR] [NP|PP|UNK] | These pharmacokinetic effects seen during diltiazem coadministration can result in increased clinical effects (e.g., prolonged sedation) of |
Lexical patterns to determine grammatical number.
| Number | Lexical pattern |
|---|---|
| Plural: | [A-Z]+(S|IES|OES|XES|SHES|CHES|SES|ZES) |
| Exception for singular: | [A-Z]+(U|S)S |
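The two lexical patterns can be applied directly as regular expressions. The sketch below assumes that the word is upper-cased before matching, that the singular exception takes precedence over the plural pattern, and that anything matching neither is singular; the function name `grammatical_number` is illustrative and does not come from the source.

```python
import re

# Patterns copied from the table; they expect an upper-cased word.
PLURAL = re.compile(r"[A-Z]+(S|IES|OES|XES|SHES|CHES|SES|ZES)")
SINGULAR_EXCEPTION = re.compile(r"[A-Z]+(U|S)S")


def grammatical_number(word: str) -> str:
    """Classify a head noun as 'plural' or 'singular' from its spelling only."""
    token = word.upper()
    # Assumption: words ending in -us or -ss (virus, class) are checked first.
    if SINGULAR_EXCEPTION.fullmatch(token):
        return "singular"
    if PLURAL.fullmatch(token):
        return "plural"
    # Assumption: default to singular when no pattern applies.
    return "singular"


for noun in ["drugs", "virus", "class", "inhibitor"]:
    print(noun, grammatical_number(noun))
```

Because the patterns only look at the ending, irregular plurals and hyphenated or digit-bearing terms are classified as singular in this sketch.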
Rules to detect coordinative structures.
| Rule | Example |
|---|---|
| ( [NP|PP|UNK],)* [NP|PP|UNK] [AND|OR|NOR] [NP|PP|UNK] | While all |
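One way to apply the coordination rule is to run it over the sequence of chunk labels rather than over raw text. This is a sketch under assumptions: the `(text, label)` chunk format, the label used for commas and conjunctions, and the function name `find_coordinations` are hypothetical, and the drug names in the usage example are illustrative only.

```python
import re

HEAD = r"(?:NP|PP|UNK)"
# ([NP|PP|UNK],)* [NP|PP|UNK] [AND|OR|NOR] [NP|PP|UNK] over space-separated labels.
COORDINATION = re.compile(
    rf"(?<!\S)(?:{HEAD} , )*{HEAD} (?:AND|OR|NOR) {HEAD}(?!\S)"
)


def find_coordinations(chunks):
    """Return, for each match, the texts of the coordinated chunks.

    `chunks` is a list of (text, label) pairs; labels are assumed to be
    NP, PP, UNK, ',', AND, OR, NOR, or any other space-free label.
    """
    symbols = " ".join(label for _, label in chunks)
    results = []
    for match in COORDINATION.finditer(symbols):
        # Each chunk contributes one label, so spaces count chunk positions.
        first = symbols[:match.start()].count(" ")
        length = match.group().count(" ") + 1
        span = chunks[first:first + length]
        results.append([text for text, label in span
                        if label in {"NP", "PP", "UNK"}])
    return results


chunks = [("While", "X"), ("ketoconazole", "NP"), (",", ","),
          ("itraconazole", "NP"), ("and", "AND"), ("erythromycin", "NP"),
          ("inhibit", "VP")]
print(find_coordinations(chunks))
```

Returning the member chunks as a list lets a later step treat the whole coordination as a single plural antecedent candidate.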
Global results for the baseline and the approach.
| Total | Precision (baseline) | Recall (baseline) | F-baseline | Precision (centering approach) | Recall (centering approach) | F-approach | Inc |
|---|---|---|---|---|---|---|---|
| 331 | 0.49 | 0.40 | 0.44 | 0.84 | 0.7 | 0.76 | 0.73 |
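Increment (Inc) is defined as the relative gain of the approach's F-measure over the baseline's:

$$\mathrm{Inc} = \frac{F_{\text{approach}} - F_{\text{baseline}}}{F_{\text{baseline}}}$$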
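The Inc values reported in the result tables are consistent with the relative gain in F-measure over the baseline. The sketch below reproduces the global row under that reading. It assumes F is the balanced harmonic mean of precision and recall, which agrees with the tabulated values; the function names are illustrative.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def increment(f_baseline: float, f_approach: float) -> float:
    """Relative gain of the approach over the baseline."""
    if f_baseline == 0:
        # The tables report an infinite increment when the baseline F is 0.
        return math.inf if f_approach > 0 else 0.0
    return (f_approach - f_baseline) / f_baseline


# Global results: baseline P=0.49, R=0.40; approach P=0.84, R=0.70.
print(round(f_measure(0.49, 0.40), 2))  # 0.44
print(round(f_measure(0.84, 0.70), 2))  # 0.76
print(round(increment(0.44, 0.76), 2))  # 0.73
```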
Results for pronominal anaphora resolution.
| Type | Total | Baseline P | Baseline R | Baseline F | Approach P | Approach R | Approach F | Inc |
|---|---|---|---|---|---|---|---|---|
| Personal | 23 | 0.26 | 0.26 | 0.26 | 0.91 | 1 | 0.95 | 2.65 |
| Reflexive | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Relative | 120 | 0.83 | 0.81 | 0.82 | 1 | 0.99 | 0.99 | 0.21 |
| Distributive | 8 | 0.33 | 0.12 | 0.18 | 0.85 | 0.87 | 0.86 | 3.78 |
| Demonstrative | 11 | 0 | 0 | 0 | 0.33 | 0.27 | 0.29 | ∞ |
| Indefinite | 8 | 0.25 | 0.12 | 0.16 | 0.57 | 0.62 | 0.59 | 2.69 |
| Global results | 164 | 0.67 | 0.65 | 0.66 | 0.92 | 0.904 | 0.91 | 0.38 |
Results for nominal anaphora resolution.
| Type | Total | Baseline P | Baseline R | Baseline F | Approach P | Approach R | Approach F | Inc |
|---|---|---|---|---|---|---|---|---|
| Definite | 37 | 0 | 0 | 0 | 0.54 | 0.59 | 0.56 | ∞ |
| Possessive | 52 | 0.53 | 0.42 | 0.47 | 0.76 | 1 | 0.86 | 0.83 |
| Distributive | 11 | 0.20 | 0.27 | 0.23 | 0.77 | 0.90 | 0.82 | 2.57 |
| Demonstrative | 58 | 0.03 | 0.01 | 0.02 | 0.81 | 0.48 | 0.60 | 29 |
| Indefinite | 8 | 0 | 0 | 0 | 0.40 | 0.37 | 0.38 | ∞ |
| Global results | 166 | 0.23 | 0.15 | 0.18 | 0.71 | 0.47 | 0.56 | 2.11 |