
Enhancing clinical concept extraction with distributional semantics.

Siddhartha Jonnalagadda, Trevor Cohen, Stephen Wu, Graciela Gonzalez.

Abstract

Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text. The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type "clinical trials" to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to the traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as a feature words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction. 
We therefore first experimented with different sliding-window models and selected the parameters that performed best in a preliminary sequence-labeling task. Evaluation against the i2b2/VA concept extraction corpus showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared with a supervised-only baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3%, and the micro-averaged F-score for inexact match increased from 89.7% to 91.3%. These improvements are statistically significant under bootstrap resampling and are also substantial relative to the performance of other systems. Thus, distributional semantic features significantly improve concept extraction from clinical narratives by exploiting word distribution information obtained from unannotated data.
Copyright © 2011 Elsevier Inc. All rights reserved.
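
The abstract describes representing each word by its distributional context within a sliding window and measuring relatedness with cosine similarity. The following is a minimal toy sketch of that idea, not the authors' implementation: the corpus, window size, and neighbor count are illustrative assumptions, and a real system would use Medline-scale text and feed the nearest neighbors to a CRF as features.

```python
# Toy sketch of a sliding-window distributional model: each word gets a
# vector of co-occurrence counts within a fixed window, and relatedness
# between words is measured with cosine similarity. Corpus and window
# size are illustrative assumptions, not from the paper.
from collections import defaultdict
import math

corpus = [
    "patient reports chest pain and shortness of breath".split(),
    "aspirin prescribed for chest pain relief".split(),
    "ibuprofen prescribed for joint pain relief".split(),
]

WINDOW = 2  # words counted on each side; the paper tuned this empirically

# vectors[w][c] = number of times c occurs within WINDOW words of w
vectors = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][sent[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest(word, k=3):
    """Most distributionally similar words to `word`; in the paper's
    setting such neighbors are added as extra CRF features."""
    sims = [(other, cosine(vectors[word], vectors[other]))
            for other in vectors if other != word]
    return sorted(sims, key=lambda t: -t[1])[:k]

print(nearest("aspirin"))
```

In this toy corpus, "aspirin" and "ibuprofen" share identical contexts ("prescribed", "for"), so they come out maximally similar even though they never co-occur, which is the property the paper exploits to generalize beyond the annotated training data.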


Year:  2011        PMID: 22085698      PMCID: PMC3272090          DOI: 10.1016/j.jbi.2011.10.007

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


References: 24 in total

1.  Automatic identification of pneumonia related concepts on chest x-ray reports.

Authors:  M Fiszman; W W Chapman; S R Evans; P J Haug
Journal:  Proc AMIA Symp       Date:  1999

2.  Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program.

Authors:  A R Aronson
Journal:  Proc AMIA Symp       Date:  2001

3.  A comparison of the Charlson comorbidities derived from medical language processing and administrative data.

Authors:  Jen-Hsiang Chuang; Carol Friedman; George Hripcsak
Journal:  Proc AMIA Symp       Date:  2002

4.  Extracting structured information from free text pathology reports.

Authors:  Gunther Schadow; Clement J McDonald
Journal:  AMIA Annu Symp Proc       Date:  2003

5.  Automation of a problem list using natural language processing.

Authors:  Stephane Meystre; Peter J Haug
Journal:  BMC Med Inform Decis Mak       Date:  2005-08-31       Impact factor: 2.796

6.  Natural language processing to extract medical problems from electronic clinical documents: performance evaluation.

Authors:  Stéphane Meystre; Peter J Haug
Journal:  J Biomed Inform       Date:  2005-12-05       Impact factor: 6.317

7.  Representing word meaning and order information in a composite holographic lexicon.

Authors:  Michael N Jones; Douglas J K Mewhort
Journal:  Psychol Rev       Date:  2007-01       Impact factor: 8.934

8.  Holographic reduced representations.

Authors:  T A Plate
Journal:  IEEE Trans Neural Netw       Date:  1995

9.  BANNER: an executable survey of advances in biomedical named entity recognition.

Authors:  Robert Leaman; Graciela Gonzalez
Journal:  Pac Symp Biocomput       Date:  2008

10.  Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system.

Authors:  Qing T Zeng; Sergey Goryachev; Scott Weiss; Margarita Sordo; Shawn N Murphy; Ross Lazarus
Journal:  BMC Med Inform Decis Mak       Date:  2006-07-26       Impact factor: 2.796

Cited by: 21 in total

1.  Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules.

Authors:  Siddhartha Reddy Jonnalagadda; Dingcheng Li; Sunghwan Sohn; Stephen Tze-Inn Wu; Kavishwar Wagholikar; Manabu Torii; Hongfang Liu
Journal:  J Am Med Inform Assoc       Date:  2012-06-16       Impact factor: 4.497

2.  Combine Factual Medical Knowledge and Distributed Word Representation to Improve Clinical Named Entity Recognition.

Authors:  Yonghui Wu; Xi Yang; Jiang Bian; Yi Guo; Hua Xu; William Hogan
Journal:  AMIA Annu Symp Proc       Date:  2018-12-05

3.  Feature extraction for phenotyping from semantic and knowledge resources.

Authors:  Wenxin Ning; Stephanie Chan; Andrew Beam; Ming Yu; Alon Geva; Katherine Liao; Mary Mullen; Kenneth D Mandl; Isaac Kohane; Tianxi Cai; Sheng Yu
Journal:  J Biomed Inform       Date:  2019-02-07       Impact factor: 6.317

4.  Recurrent neural networks for classifying relations in clinical notes.

Authors:  Yuan Luo
Journal:  J Biomed Inform       Date:  2017-07-08       Impact factor: 6.317

5.  Classifying relations in clinical narratives using segment graph convolutional and recurrent neural networks (Seg-GCRNs).

Authors:  Yifu Li; Ran Jin; Yuan Luo
Journal:  J Am Med Inform Assoc       Date:  2019-03-01       Impact factor: 4.497

6.  A new iterative method to reduce workload in systematic review process.

Authors:  Siddhartha Jonnalagadda; Diana Petitti
Journal:  Int J Comput Biol Drug Des       Date:  2013-02-21

7.  Speculation detection for Chinese clinical notes: Impacts of word segmentation and embedding models.

Authors:  Shaodian Zhang; Tian Kang; Xingting Zhang; Dong Wen; Noémie Elhadad; Jianbo Lei
Journal:  J Biomed Inform       Date:  2016-02-26       Impact factor: 6.317

8.  A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text.

Authors:  Yonghui Wu; Jun Xu; Min Jiang; Yaoyun Zhang; Hua Xu
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05

9.  Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes.

Authors:  Yuan Luo; Yu Cheng; Özlem Uzuner; Peter Szolovits; Justin Starren
Journal:  J Am Med Inform Assoc       Date:  2018-01-01       Impact factor: 4.497

10.  Leveraging output term co-occurrence frequencies and latent associations in predicting medical subject headings.

Authors:  Ramakanth Kavuluru; Yuan Lu
Journal:  Data Knowl Eng       Date:  2014-09-18       Impact factor: 1.992

