
Enhancing clinical concept extraction with distributional semantics.

Siddhartha Jonnalagadda, Trevor Cohen, Stephen Wu, Graciela Gonzalez.

Abstract

Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text. The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type "clinical trials" to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to the traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as a feature words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction. 
We therefore first experimented with different sliding-window models and selected the parameters that performed best in a preliminary sequence-labeling task. Evaluation against the i2b2/VA concept extraction corpus showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared with a supervised-only baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3%, and the micro-averaged F-score for inexact match increased from 89.7% to 91.3%. These improvements are statistically significant under bootstrap resampling and are also substantial relative to the performance of other systems. Thus, distributional semantic features significantly improve concept extraction from clinical narratives by exploiting word distribution information obtained from unannotated data.
Copyright © 2011 Elsevier Inc. All rights reserved.
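
The abstract describes representing each word by its distributional context within a sliding window and measuring relatedness with cosine similarity. The following is a minimal toy sketch of that idea, not the authors' implementation: the corpus, window size, and neighbor count are illustrative assumptions, and a real system would use Medline-scale text and feed the nearest neighbors to a CRF as features.

```python
# Toy sketch of a sliding-window distributional model: each word gets a
# vector of co-occurrence counts within a fixed window, and relatedness
# between words is measured with cosine similarity. Corpus and window
# size are illustrative assumptions, not from the paper.
from collections import defaultdict
import math

corpus = [
    "patient reports chest pain and shortness of breath".split(),
    "aspirin prescribed for chest pain relief".split(),
    "ibuprofen prescribed for joint pain relief".split(),
]

WINDOW = 2  # words counted on each side; the paper tuned this empirically

# vectors[w][c] = number of times c occurs within WINDOW words of w
vectors = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][sent[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest(word, k=3):
    """Most distributionally similar words to `word`; in the paper's
    setting such neighbors are added as extra CRF features."""
    sims = [(other, cosine(vectors[word], vectors[other]))
            for other in vectors if other != word]
    return sorted(sims, key=lambda t: -t[1])[:k]

print(nearest("aspirin"))
```

In this toy corpus, "aspirin" and "ibuprofen" share identical contexts ("prescribed", "for"), so they come out maximally similar even though they never co-occur, which is the property the paper exploits to generalize beyond the annotated training data.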


Year:  2011        PMID: 22085698      PMCID: PMC3272090          DOI: 10.1016/j.jbi.2011.10.007

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


References: 24 in total

1.  Automatic identification of pneumonia related concepts on chest x-ray reports.

Authors:  M Fiszman; W W Chapman; S R Evans; P J Haug
Journal:  Proc AMIA Symp       Date:  1999

2.  Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program.

Authors:  A R Aronson
Journal:  Proc AMIA Symp       Date:  2001

3.  A comparison of the Charlson comorbidities derived from medical language processing and administrative data.

Authors:  Jen-Hsiang Chuang; Carol Friedman; George Hripcsak
Journal:  Proc AMIA Symp       Date:  2002

4.  Extracting structured information from free text pathology reports.

Authors:  Gunther Schadow; Clement J McDonald
Journal:  AMIA Annu Symp Proc       Date:  2003

5.  Automation of a problem list using natural language processing.

Authors:  Stephane Meystre; Peter J Haug
Journal:  BMC Med Inform Decis Mak       Date:  2005-08-31       Impact factor: 2.796

6.  Natural language processing to extract medical problems from electronic clinical documents: performance evaluation.

Authors:  Stéphane Meystre; Peter J Haug
Journal:  J Biomed Inform       Date:  2005-12-05       Impact factor: 6.317

7.  Representing word meaning and order information in a composite holographic lexicon.

Authors:  Michael N Jones; Douglas J K Mewhort
Journal:  Psychol Rev       Date:  2007-01       Impact factor: 8.934

8.  Holographic reduced representations.

Authors:  T A Plate
Journal:  IEEE Trans Neural Netw       Date:  1995

9.  BANNER: an executable survey of advances in biomedical named entity recognition.

Authors:  Robert Leaman; Graciela Gonzalez
Journal:  Pac Symp Biocomput       Date:  2008

10.  Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system.

Authors:  Qing T Zeng; Sergey Goryachev; Scott Weiss; Margarita Sordo; Shawn N Murphy; Ross Lazarus
Journal:  BMC Med Inform Decis Mak       Date:  2006-07-26       Impact factor: 2.796

Cited by: 21 in total

1.  Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules.

Authors:  Siddhartha Reddy Jonnalagadda; Dingcheng Li; Sunghwan Sohn; Stephen Tze-Inn Wu; Kavishwar Wagholikar; Manabu Torii; Hongfang Liu
Journal:  J Am Med Inform Assoc       Date:  2012-06-16       Impact factor: 4.497

2.  Combine Factual Medical Knowledge and Distributed Word Representation to Improve Clinical Named Entity Recognition.

Authors:  Yonghui Wu; Xi Yang; Jiang Bian; Yi Guo; Hua Xu; William Hogan
Journal:  AMIA Annu Symp Proc       Date:  2018-12-05

3.  Feature extraction for phenotyping from semantic and knowledge resources.

Authors:  Wenxin Ning; Stephanie Chan; Andrew Beam; Ming Yu; Alon Geva; Katherine Liao; Mary Mullen; Kenneth D Mandl; Isaac Kohane; Tianxi Cai; Sheng Yu
Journal:  J Biomed Inform       Date:  2019-02-07       Impact factor: 6.317

4.  Recurrent neural networks for classifying relations in clinical notes.

Authors:  Yuan Luo
Journal:  J Biomed Inform       Date:  2017-07-08       Impact factor: 6.317

5.  Classifying relations in clinical narratives using segment graph convolutional and recurrent neural networks (Seg-GCRNs).

Authors:  Yifu Li; Ran Jin; Yuan Luo
Journal:  J Am Med Inform Assoc       Date:  2019-03-01       Impact factor: 4.497

6.  A new iterative method to reduce workload in systematic review process.

Authors:  Siddhartha Jonnalagadda; Diana Petitti
Journal:  Int J Comput Biol Drug Des       Date:  2013-02-21

7.  Speculation detection for Chinese clinical notes: Impacts of word segmentation and embedding models.

Authors:  Shaodian Zhang; Tian Kang; Xingting Zhang; Dong Wen; Noémie Elhadad; Jianbo Lei
Journal:  J Biomed Inform       Date:  2016-02-26       Impact factor: 6.317

8.  A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text.

Authors:  Yonghui Wu; Jun Xu; Min Jiang; Yaoyun Zhang; Hua Xu
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05

9.  Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes.

Authors:  Yuan Luo; Yu Cheng; Özlem Uzuner; Peter Szolovits; Justin Starren
Journal:  J Am Med Inform Assoc       Date:  2018-01-01       Impact factor: 4.497

10.  Leveraging output term co-occurrence frequencies and latent associations in predicting medical subject headings.

Authors:  Ramakanth Kavuluru; Yuan Lu
Journal:  Data Knowl Eng       Date:  2014-09-18       Impact factor: 1.992

