Bridget T McInnes1, Ted Pedersen. 1. Minnesota Supercomputing Institute, University of Minnesota, 117 Pleasant St SE, Minneapolis, MN 55455. Electronic address: btmcinnes@gmail.com.
Abstract
INTRODUCTION: In this article, we evaluate a knowledge-based word sense disambiguation method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to develop a method that can disambiguate terms in biomedical text by exploiting similarity and relatedness information extracted from biomedical resources and to evaluate the efficacy of these measure on WSD. METHOD: We evaluate our method on a biomedical dataset (MSH-WSD) that contains 203 ambiguous terms and acronyms. RESULTS: We show that information content-based measures derived from either a corpus or taxonomy obtain a higher disambiguation accuracy than path-based measures or relatedness measures on the MSH-WSD dataset. AVAILABILITY: The WSD system is open source and freely available from http://search.cpan.org/dist/UMLS-SenseRelate/. The MSH-WSD dataset is available from the National Library of Medicine http://wsd.nlm.nih.gov.
INTRODUCTION: In this article, we evaluate a knowledge-based word sense disambiguation method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to develop a method that can disambiguate terms in biomedical text by exploiting similarity and relatedness information extracted from biomedical resources and to evaluate the efficacy of these measure on WSD. METHOD: We evaluate our method on a biomedical dataset (MSH-WSD) that contains 203 ambiguous terms and acronyms. RESULTS: We show that information content-based measures derived from either a corpus or taxonomy obtain a higher disambiguation accuracy than path-based measures or relatedness measures on the MSH-WSD dataset. AVAILABILITY: The WSD system is open source and freely available from http://search.cpan.org/dist/UMLS-SenseRelate/. The MSH-WSD dataset is available from the National Library of Medicine http://wsd.nlm.nih.gov.
Authors: Susanne M Humphrey; Willie J Rogers; Halil Kilicoglu; Dina Demner-Fushman; Thomas C Rindflesch Journal: J Am Soc Inf Sci Technol Date: 2006-01-01
Authors: Niccolo Tesi; Sven van der Lee; Marc Hulsman; Henne Holstege; Marcel J T Reinders Journal: Nucleic Acids Res Date: 2021-07-02 Impact factor: 16.971
Authors: Timothy C Guetterman; Tammy Chang; Melissa DeJonckheere; Tanmay Basu; Elizabeth Scruggs; V G Vinod Vydiswaran Journal: J Med Internet Res Date: 2018-06-29 Impact factor: 5.428