Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method.

Min Jiang, Josh C Denny, Buzhou Tang, Hongxin Cao, Hua Xu.

Abstract

Semantic lexicons that link words and phrases to specific semantic types, such as diseases, are valuable assets for clinical natural language processing (NLP) systems. Although terms with predefined semantic types can be generated easily from existing knowledge bases such as the Unified Medical Language System (UMLS), such terms are often limited and do not cover narrative clinical text well. In this study, we developed a method for building semantic lexicons from a clinical corpus. It extracts candidate semantic terms using a conditional random field (CRF) classifier and then selects terms using the C-Value algorithm. We applied the method to a corpus containing 10 years of discharge summaries from Vanderbilt University Hospital (VUH) and extracted 44,957 new terms for three semantic groups: Problem, Treatment, and Test. A manual analysis of 200 randomly selected terms not found in the UMLS showed that 59% of them were meaningful new clinical concepts and 25% were lexical variants of existing concepts in the UMLS. Furthermore, we compared the effectiveness of corpus-derived and UMLS-derived semantic lexicons in the concept extraction task of the 2010 i2b2 clinical NLP challenge. The classifier with corpus-derived semantic lexicons as features achieved better performance (F-score 82.52%) than the classifier with UMLS-derived semantic lexicons as features (F-score 82.04%). We conclude that such corpus-based methods are effective for generating semantic lexicons, which may improve named entity recognition tasks and may aid in augmenting synonymy within existing terminologies.
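
The abstract names the C-Value algorithm as the term-selection step but does not give its formula or the settings used. As an illustration only, the sketch below scores candidate terms with the commonly cited C-value definition: a term's frequency, weighted by the log of its length in words, minus the average frequency of the longer candidates that contain it. The function names (`contains`, `c_value`) and the toy frequencies are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def contains(long_term, short_term):
    """Return True if short_term occurs as a contiguous token run in long_term."""
    n = len(short_term)
    return any(long_term[i:i + n] == short_term
               for i in range(len(long_term) - n + 1))


def c_value(term_freqs):
    """Score candidates with the standard C-value formula.

    term_freqs maps a tuple of tokens to its corpus frequency.
    Assumption: the plain log2(length) weight is used, so single-word
    candidates score 0; the study may have used a different variant.
    """
    # For each candidate, collect the longer candidates it is nested in.
    nested_in = defaultdict(list)
    for short in term_freqs:
        for long_ in term_freqs:
            if len(long_) > len(short) and contains(long_, short):
                nested_in[short].append(long_)

    scores = {}
    for term, freq in term_freqs.items():
        weight = math.log2(len(term))
        longer = nested_in.get(term, [])
        if longer:
            # Discount occurrences that are explained by longer terms.
            penalty = sum(term_freqs[t] for t in longer) / len(longer)
            scores[term] = weight * (freq - penalty)
        else:
            scores[term] = weight * freq
    return scores


# Toy frequencies, invented for illustration.
toy_freqs = {
    ("chest", "pain"): 10,
    ("acute", "chest", "pain"): 4,
    ("severe", "chest", "pain"): 2,
}
toy_scores = c_value(toy_freqs)
ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

In this toy example "chest pain" keeps the top rank even after being discounted for its nested occurrences, which is the behavior that lets a selection step favor terms that occur on their own over fragments of longer phrases.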

Year:  2012        PMID: 23304311      PMCID: PMC3540581     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


  3 in total

1.  Mining clinical phrases from nursing notes to discover risk factors of patient deterioration.

Authors:  Zfania Tom Korach; Jie Yang; Sarah Collins Rossetti; Kenrick D Cato; Min-Jeoung Kang; Christopher Knaplund; Kumiko O Schnock; Jose P Garcia; Haomiao Jia; Jessica M Schwartz; Li Zhou
Journal:  Int J Med Inform       Date:  2019-12-14       Impact factor: 4.046

2.  Biobanks and electronic medical records: enabling cost-effective research.

Authors:  Erica Bowton; Julie R Field; Sunny Wang; Jonathan S Schildcrout; Sara L Van Driest; Jessica T Delaney; James Cowan; Peter Weeke; Jonathan D Mosley; Quinn S Wells; Jason H Karnes; Christian Shaffer; Josh F Peterson; Joshua C Denny; Dan M Roden; Jill M Pulley
Journal:  Sci Transl Med       Date:  2014-04-30       Impact factor: 17.956

3.  Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system.

Authors:  Kristina Doing-Harris; Yarden Livnat; Stephane Meystre
Journal:  J Biomed Semantics       Date:  2015-04-02
