Vijay N Garla (1), Cynthia Brandt. (1) Yale Center for Medical Informatics, Yale University, New Haven, Connecticut, USA.
Abstract
BACKGROUND: Word sense disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text-processing tasks. In this study we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS), and evaluated the contribution of WSD to clinical text classification.
METHODS: We evaluated our system on biomedical WSD datasets and determined the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus.
RESULTS: Our system compared favorably with other knowledge-based methods. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts.
CONCLUSIONS: We developed a WSD system that achieves high disambiguation accuracy on standard biomedical WSD datasets and showed that our WSD system improves clinical document classification.
DATA SHARING: We integrated our WSD system with MetaMap and the clinical Text Analysis and Knowledge Extraction System, two popular biomedical natural language processing systems. All code required to reproduce our results and all tools developed as part of this study are released as open source, available at http://code.google.com/p/ytex.
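The general idea of knowledge-based WSD with semantic similarity can be illustrated with a minimal sketch. It assumes a toy is-a taxonomy and a simple path-based similarity score; the concept names, the taxonomy, and the functions `path_length`, `similarity`, and `disambiguate` are hypothetical and are not the UMLS-derived measures or the data used in the study. Each candidate sense of an ambiguous term is scored by its summed similarity to the concepts found in the surrounding context, and the highest-scoring sense is kept.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a taxonomy (child -> parent); not real UMLS content.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "disorder": "entity",
    "natural_phenomenon": "entity",
    "respiratory_infection": "disorder",
    "sign_or_symptom": "disorder",
    "common_cold": "respiratory_infection",
    "influenza": "respiratory_infection",
    "cough": "sign_or_symptom",
    "cold_temperature": "natural_phenomenon",
    "humidity": "natural_phenomenon",
}


def ancestors(concept: str) -> List[str]:
    """Return the concept followed by its ancestors up to the root."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_length(a: str, b: str) -> int:
    """Number of is-a edges on the path from a to b via their lowest common ancestor."""
    chain_a = ancestors(a)
    for steps_b, node in enumerate(ancestors(b)):
        if node in chain_a:
            return steps_b + chain_a.index(node)
    raise ValueError("concepts do not share a root")


def similarity(a: str, b: str) -> float:
    """Assumed path-based similarity: closer concepts score higher."""
    return 1.0 / (1.0 + path_length(a, b))


def disambiguate(candidates: List[str], context: List[str]) -> str:
    """Pick the candidate sense with the highest summed similarity to the context."""
    return max(candidates, key=lambda c: sum(similarity(c, ctx) for ctx in context))


if __name__ == "__main__":
    senses_of_cold = ["common_cold", "cold_temperature"]
    print(disambiguate(senses_of_cold, ["cough", "influenza"]))
    print(disambiguate(senses_of_cold, ["humidity"]))
```

In this sketch the choice of similarity function is the only knowledge-dependent part, so a different measure could be swapped in without changing `disambiguate`.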
Keywords:
Natural Language Processing; Semantic similarity; Word Sense Disambiguation