
Effects of information and machine learning algorithms on word sense disambiguation with small datasets.

Gondy Leroy, Thomas C Rindflesch

Abstract

Current approaches to word sense disambiguation use (and often combine) various machine learning techniques. Most rely on characteristics of the ambiguous word and its surrounding words, and are trained on thousands of examples. Developing such large training sets is burdensome; in response to this challenge, we investigate the use of symbolic knowledge for small datasets. A naïve Bayes classifier was trained for each of 15 words, with 100 examples per word. The knowledge base consists of the Unified Medical Language System (UMLS) semantic types assigned to concepts found in the sentence, together with the relationships between these semantic types. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in nine experimental conditions. Performance was measured as accuracy under 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. In that condition, accuracy was on average 10% higher than the baseline; however, it ranged from an 8% deterioration to a 29% improvement. To investigate this large variance, we performed several follow-up evaluations, testing additional algorithms (decision tree and neural network) and additional gold standards (one per expert), but the results did not differ significantly. However, we noted a trend: disambiguation was best for the words that were least troublesome to the human evaluators. We conclude that neither the algorithm nor individual human behavior causes these large differences. Rather, the structure of the UMLS Metathesaurus (used to represent the senses of ambiguous words) contributes to inaccuracies in the gold standard, which in turn leads to varied performance of word sense disambiguation techniques.
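The evaluation setup summarized above (a naïve Bayes classifier over semantic-type features, a most-frequent-sense baseline, and accuracy under 10-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each example is assumed to be a set of semantic-type labels plus a sense label, the classifier is one common naïve Bayes variant (multinomial with add-one smoothing), and all function names, labels, and toy data are hypothetical.

```python
"""Sketch: naive Bayes word sense disambiguation vs. a most-frequent-sense
baseline, scored by k-fold cross-validation accuracy.

Assumption: each example is (set_of_semantic_type_labels, sense_label).
All names and labels here are hypothetical placeholders."""
import math
import random
from collections import Counter, defaultdict


def train_nb(examples):
    """Count sense priors and per-sense feature frequencies."""
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in examples:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
            vocab.add(f)
    return sense_counts, feat_counts, vocab


def predict_nb(model, feats, alpha=1.0):
    """Return the sense with the highest log posterior (add-alpha smoothing)."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        # Smoothed denominator: feature tokens seen with this sense + alpha * |V|
        denom = sum(feat_counts[sense].values()) + alpha * len(vocab)
        score = math.log(n / total)
        for f in feats:
            if f in vocab:  # ignore features never seen in training
                score += math.log((feat_counts[sense][f] + alpha) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best


def most_frequent_sense(examples):
    """Baseline: always predict the sense that is most common in training."""
    return Counter(sense for _, sense in examples).most_common(1)[0][0]


def cross_validate(examples, k=10, seed=0):
    """Return (naive Bayes accuracy, baseline accuracy) over k folds."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    nb_correct = mfs_correct = 0
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_nb(train)
        baseline = most_frequent_sense(train)
        for feats, sense in test:
            nb_correct += predict_nb(model, feats) == sense
            mfs_correct += baseline == sense
    return nb_correct / len(data), mfs_correct / len(data)


if __name__ == "__main__":
    # Hypothetical, perfectly separable toy data for one ambiguous word
    toy = [({"typeA", "typeB"}, "sense1")] * 60 + [({"typeC", "typeD"}, "sense2")] * 40
    nb_acc, mfs_acc = cross_validate(toy, k=10)
    print(f"naive Bayes {nb_acc:.2f}, baseline {mfs_acc:.2f}")  # naive Bayes 1.00, baseline 0.60
```

On this toy data the baseline scores 0.60 only because 60 of the 100 hypothetical examples carry the majority sense; the gap between the two numbers is what the abstract reports per word, and it can shrink or turn negative when the features do not separate the senses.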


Year:  2005        PMID: 15897005     DOI: 10.1016/j.ijmedinf.2005.03.013

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


