Sampo Pyysalo, Tapio Salakoski, Sophie Aubin, Adeline Nazarenko.
Abstract
BACKGROUND: We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches.
Year: 2006 PMID: 17134475 PMCID: PMC1764446 DOI: 10.1186/1471-2105-7-S3-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example Link Grammar Parser linkages.
Biomedical suffixes involved in the extension of the morpho-guessing rules
| Category | Examples | Category | Examples |
| noun | synthase, kinase | noun | actin, kanamycin |
| noun | chronicity, hypochromicity | noun | septation, reguion |
| noun | replicon, intron | noun | glycosylphosphatidylinositol |
| noun | isomaltotetraose, isomaltotriose | noun | cofactor, repressor/activator |
| noun | hydroxyethyl, hydroxymethyl | noun | 5-(hydroxymethyl)-2'-deoxyuridine |
| noun | iodide, oligodeoxynucleotide | noun | casei, lactococci, termini |
| adjective | glycolytic, ribonucleic, uronic | adjective | ribosomal, ribsosomal |
| adjective | nonpermissive, thermosensitive | adjective | intermolecular, intramolecular |
| adjective | inducible, metastable | adjective | exogenous, heterologous |
| latin adj. | influenzae, tarentolae | latin adj. | pentosaceus, luteus, carnosus |
| latin adj. | japonicum, tabacum, xylinum | latin adj. | brevis, israelensis |
| adj./adv. | 10-fold, 4.5-fold, five-fold | | |
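Suffix-based guessing of this kind can be sketched as a lookup that assigns a category to an unknown word from its ending. The sketch below is illustrative only: the suffix strings are guesses read off a few of the example words above (the suffixes themselves are not preserved in this record), and `SUFFIX_RULES` and `guess_category` are hypothetical names, not part of Link Grammar Parser.

```python
from typing import Optional

# Illustrative suffix -> category pairs. These suffixes are ASSUMPTIONS
# inferred from the example words in the table, not the rules from the paper.
SUFFIX_RULES = [
    ("ase", "noun"),         # e.g. synthase, kinase
    ("icity", "noun"),       # e.g. chronicity, hypochromicity
    ("ible", "adjective"),   # e.g. inducible
    ("ous", "adjective"),    # e.g. exogenous, heterologous
    ("-fold", "adj./adv."),  # e.g. 10-fold, five-fold
]


def guess_category(word: str) -> Optional[str]:
    """Return the category of the longest matching suffix, or None."""
    lowered = word.lower()
    # Try longer suffixes first so that a more specific ending wins.
    for suffix, category in sorted(SUFFIX_RULES, key=lambda rule: -len(rule[0])):
        if lowered.endswith(suffix):
            return category
    return None


print(guess_category("kinase"))        # noun
print(guess_category("heterologous"))  # adjective
```

Words that match no suffix return `None`, which a caller would presumably pass on to whatever default unknown-word handling is in place.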
POS tag mapping to LGP rules
| POS tag | Tag description | LGP rule | Rule description |
| NN | common noun, sing. | words.n.4 | singular nouns that can be mass or countable |
| NNS | common noun, pl. | words.n.2.s | plural nouns ending in |
| NNP | proper noun, sing. | CAPITALIZED-WORDS | generic category for words with a capitalized first character |
| NNPS | proper noun, pl. | PL-CAPITALIZED-WORDS | capitalized words ending in |
| JJ | adjective, base | UNKNOWN-WORD.a | MG rule for adjectives |
| JJR | adjective, comparative | words.adj.2 | comparative adjectives |
| JJS | adjective, superlative | words.adj.3 | superlative adjectives |
| VB | verb, base | words.v.6.1 | optionally transitive verbs (base form) |
| VBD | verb, past tense | words.v.6.3 | optionally transitive verbs ( |
| VBZ | verb, present 3rd pers. | words.v.2.2 | optionally transitive verbs ( |
| VBP | verb, present non-3rd | words.v.6.1 | optionally transitive verbs (base form) |
| VBG | verb, gerund | ING-WORDS | MG rule for words ending with |
| VBN | verb, past participle | ED-WORDS | MG rule for words ending with |
| CD | number | NUMBERS | MG rule for numbers |
| RB | adverb, base | words.adv.1 | ordinary manner adverbs |
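The mapping in the table above amounts to a lookup from a tagger's POS tag to an LGP dictionary rule. A minimal sketch follows, with the tag and rule names copied from the table; the dictionary name `POS_TO_LGP_RULE`, the function `lgp_rule_for`, and the choice to return `None` for unmapped tags are assumptions made for illustration, not a description of how the authors wired the tagger into the parser.

```python
from typing import Optional

# POS tag -> LGP rule, transcribed from the table above.
POS_TO_LGP_RULE = {
    "NN": "words.n.4",
    "NNS": "words.n.2.s",
    "NNP": "CAPITALIZED-WORDS",
    "NNPS": "PL-CAPITALIZED-WORDS",
    "JJ": "UNKNOWN-WORD.a",
    "JJR": "words.adj.2",
    "JJS": "words.adj.3",
    "VB": "words.v.6.1",
    "VBD": "words.v.6.3",
    "VBZ": "words.v.2.2",
    "VBP": "words.v.6.1",
    "VBG": "ING-WORDS",
    "VBN": "ED-WORDS",
    "CD": "NUMBERS",
    "RB": "words.adv.1",
}


def lgp_rule_for(pos_tag: str) -> Optional[str]:
    """Look up the LGP rule for a POS tag; None if the tag is not in the table."""
    return POS_TO_LGP_RULE.get(pos_tag)


print(lgp_rule_for("VBG"))  # ING-WORDS
print(lgp_rule_for("DT"))   # None
```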
Figure 2. Vocabulary handling in the interaction and transcript corpora.
Ambiguity for single extensions
| Metric | Orig | UMLS | xMG | Brill | GT |
| Time | 15.4s | 9.9s | 10.8s | 8.8s | 8.6s |
| Lkg. ratio | 1 | 0.67 | 0.68 | 0.70 | 0.66 |
Time is the average parsing time per sentence; linkage ratio is the average of the per-sentence ratios of linkage numbers.
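Because the linkage ratio is an average of per-sentence ratios, it generally differs from the ratio of total linkage counts. The sketch below illustrates the distinction. It assumes each ratio is the linkage count under a modification divided by the count under the original parser (consistent with the value 1 in the Orig column), and it skips sentences with no original linkages, which is a further assumption. The counts are hypothetical and `average_linkage_ratio` is an illustrative name.

```python
from statistics import mean


def average_linkage_ratio(orig_counts, modified_counts):
    """Mean over sentences of (modified linkage count / original linkage count)."""
    ratios = [
        modified / orig
        for orig, modified in zip(orig_counts, modified_counts)
        if orig > 0  # assumption: sentences without original linkages are skipped
    ]
    return mean(ratios)


# Hypothetical linkage counts for three sentences (not taken from the paper).
orig = [100, 40, 10]
modified = [50, 30, 10]

print(average_linkage_ratio(orig, modified))  # 0.75
print(sum(modified) / sum(orig))              # 0.6, the ratio of totals, for contrast
```

Averaging per sentence gives every sentence equal weight, so a few highly ambiguous sentences do not dominate the figure.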
Performance for single extensions
| | Orig | UMLS | Δ | xMG | Δ | Brill | Δ | GT | Δ |
| All, first linkage | 74.2 | 75.4 | 4.7 | 76.0 | 7.0 | 75.4 | 4.7 | 76.8 | 10.1 |
| All, best linkage | 82.7 | 83.5 | 4.6 | 84.5 | 10.4 | 83.7 | 5.8 | 85.3 | 15.0 |
| NT, first linkage | 78.0 | 78.1 | 0.5 | 78.9 | 4.1 | 78.0 | 0.0 | 79.4 | 6.4 |
| NT, best linkage | 87.4 | 86.9 | -4.0 | 88.0 | 4.8 | 86.7 | -5.6 | 88.3 | 7.1 |
First linkage denotes the linkage ordered first by the parser heuristics, and best linkage denotes the best performance achieved by any linkage returned by the parser. Results marked NT are for the subset of sentences where no timeouts occurred for any of the modifications. Δ columns give the relative decrease in error with respect to the original LGP, and p values are for "All, first linkage" performance.
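The Δ values can be reproduced from the performance figures if the scores are read as percentages and the error is taken as 100 minus the score. That reading is an assumption, but it matches the tabulated values. A short sketch, with `relative_error_decrease` as an illustrative name:

```python
def relative_error_decrease(orig_score: float, new_score: float) -> float:
    """Relative decrease in error, in percent, with error assumed to be 100 - score."""
    orig_error = 100.0 - orig_score
    new_error = 100.0 - new_score
    return 100.0 * (orig_error - new_error) / orig_error


# "All, first linkage": original LGP (74.2) vs. the UMLS extension (75.4).
print(round(relative_error_decrease(74.2, 75.4), 1))  # 4.7

# "NT, best linkage": original LGP (87.4) vs. UMLS (86.9); error increases.
print(round(relative_error_decrease(87.4, 86.9), 1))  # -4.0
```

A negative Δ therefore indicates that the modification raised the error relative to the original parser.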
Ambiguity for combinations of the extensions
| Metric | Orig | UMLS & xMG | xMG & POS | UMLS & POS | All 3 |
| Time | 15.4s | 9.5s | 8.7s | 8.3s | 8.4s |
| Lkg. ratio | 1 | 0.67 | 0.59 | 0.62 | 0.66 |
Performance for combinations of the extensions
| | Orig | UMLS & xMG | Δ | xMG & POS | Δ | UMLS & POS | Δ | All 3 | Δ |
| All, first linkage | 74.2 | 75.7 | 5.8 | 76.8 | 10.1 | 76.0 | 7.0 | 76.1 | 7.4 |
| All, best linkage | 82.7 | 83.7 | 5.8 | 85.3 | 15.0 | 84.2 | 8.7 | 84.2 | 8.7 |
| NT, first linkage | 78.0 | 78.4 | 1.8 | 79.3 | 5.9 | 78.6 | 2.7 | 78.7 | 3.2 |
| NT, best linkage | 87.4 | 87.0 | -3.2 | 88.2 | 6.3 | 87.2 | -1.6 | 87.1 | -2.4 |