Corpus domain effects on distributional semantic modeling of medical terms.

Serguei V S Pakhomov (1,2), Greg Finley (2), Reed McEwan (2), Yan Wang (2), Genevieve B Melton (2).

Abstract

MOTIVATION: Automatically quantifying semantic similarity and relatedness between clinical terms is an important aspect of text mining from electronic health records, which are increasingly recognized as valuable sources of phenotypic information for clinical genomics and bioinformatics research. A key obstacle to the development of semantic relatedness measures is the limited availability of large quantities of clinical text to researchers and developers outside of major medical centers. Text from general English and the biomedical literature is freely available; however, its validity as a substitute for clinical text in representing the semantics of clinical terms remains to be demonstrated.
RESULTS: We constructed neural network representations of clinical terms found in a publicly available benchmark dataset manually labeled for semantic similarity and relatedness. Similarity and relatedness measures computed from text corpora in three domains (clinical notes, PubMed Central articles and Wikipedia) were compared using the benchmark as the reference. We found that measures computed from the full text of biomedical articles in the PubMed Central repository (rho = 0.62 for similarity and 0.58 for relatedness) are on par with measures computed from clinical reports (rho = 0.60 for similarity and 0.57 for relatedness). We also evaluated neural network-based relatedness measures for query expansion in a clinical document retrieval task and for word sense disambiguation of biomedical terms. We found that, with some limitations, biomedical articles may be used in lieu of clinical reports to represent the semantics of clinical terms, and that distributional semantic methods are useful for clinical and biomedical natural language processing applications.
AVAILABILITY AND IMPLEMENTATION: The software and reference standards used in this study to evaluate semantic similarity and relatedness measures are publicly available as detailed in the article.
CONTACT: pakh0002@umn.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
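The RESULTS paragraph describes scoring pairs of clinical terms with vector representations and comparing those scores to a manually rated benchmark via rho. The Python sketch below illustrates that kind of evaluation under stated assumptions; it is not the authors' software. The term vectors and ratings are made-up toy values (in the study, representations were learned from each corpus; here they are simply given), cosine similarity is assumed as the similarity measure, and rho is assumed to be Spearman's rank correlation, as is usual for such benchmarks. The helper `expand_query` is hypothetical and only shows how nearest neighbours under the same measure could feed query expansion.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    if n < 2:
        return float("nan")  # undefined for fewer than two pairs
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        return float("nan")  # undefined when either side has no variation
    return cov / (sx * sy)


def evaluate(embeddings: Dict[str, Vector],
             benchmark: List[Tuple[str, str, float]]) -> float:
    """Correlate model cosine scores with human ratings; out-of-vocabulary pairs are skipped."""
    model_scores: List[float] = []
    human_scores: List[float] = []
    for t1, t2, rating in benchmark:
        if t1 in embeddings and t2 in embeddings:
            model_scores.append(cosine(embeddings[t1], embeddings[t2]))
            human_scores.append(rating)
    return spearman_rho(model_scores, human_scores)


def expand_query(term: str, embeddings: Dict[str, Vector], k: int = 2) -> List[str]:
    """Hypothetical query-expansion helper: the k nearest terms by cosine similarity."""
    query = embeddings[term]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != term]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy, made-up vectors and ratings for illustration only (not data from the study).
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "myocardial_infarction": [0.9, 0.1, 0.0],
    "heart_attack": [0.85, 0.15, 0.05],
    "chest_pain": [0.6, 0.5, 0.1],
    "fracture": [0.0, 0.2, 0.95],
}
TOY_BENCHMARK: List[Tuple[str, str, float]] = [
    ("myocardial_infarction", "heart_attack", 4.0),
    ("myocardial_infarction", "chest_pain", 3.0),
    ("heart_attack", "fracture", 1.0),
    ("chest_pain", "fracture", 1.5),
]

if __name__ == "__main__":
    print("rho =", evaluate(TOY_EMBEDDINGS, TOY_BENCHMARK))
    print("expansion:", expand_query("myocardial_infarction", TOY_EMBEDDINGS))
```

Skipping out-of-vocabulary pairs, as `evaluate` does here, is only one possible choice: corpora with different vocabulary coverage would then be scored on different subsets of the benchmark, so the number of pairs actually scored is worth reporting next to rho.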

Year: 2016 | PMID: 27531100 | PMCID: PMC5181540 | DOI: 10.1093/bioinformatics/btw529

Source DB: PubMed | Journal: Bioinformatics | ISSN: 1367-4803 | Impact factor: 6.937


  33 in total

1.  A comparison of word embeddings for the biomedical natural language processing.

Authors:  Yanshan Wang; Sijia Liu; Naveed Afzal; Majid Rastegar-Mojarad; Liwei Wang; Feichen Shen; Paul Kingsbury; Hongfang Liu
Journal:  J Biomed Inform       Date:  2018-09-12       Impact factor: 6.317

2.  Enhancing clinical concept extraction with contextual embeddings.

Authors:  Yuqi Si; Jingqi Wang; Hua Xu; Kirk Roberts
Journal:  J Am Med Inform Assoc       Date:  2019-11-01       Impact factor: 4.497

3.  deepBioWSD: effective deep neural word sense disambiguation of biomedical text data.

Authors:  Ahmad Pesaranghader; Stan Matwin; Marina Sokolova; Ali Pesaranghader
Journal:  J Am Med Inform Assoc       Date:  2019-05-01       Impact factor: 4.497

4.  Using Convolutional Neural Networks to Support Insertion of New Concepts into SNOMED CT.

Authors:  Hao Liu; James Geller; Michael Halper; Yehoshua Perl
Journal:  AMIA Annu Symp Proc       Date:  2018-12-05

5.  Feature extraction for phenotyping from semantic and knowledge resources.

Authors:  Wenxin Ning; Stephanie Chan; Andrew Beam; Ming Yu; Alon Geva; Katherine Liao; Mary Mullen; Kenneth D Mandl; Isaac Kohane; Tianxi Cai; Sheng Yu
Journal:  J Biomed Inform       Date:  2019-02-07       Impact factor: 6.317

6.  Embedding of semantic predications.

Authors:  Trevor Cohen; Dominic Widdows
Journal:  J Biomed Inform       Date:  2017-03-08       Impact factor: 6.317

7.  Extracting similar terms from multiple EMR-based semantic embeddings to support chart reviews.

Authors:  Cheng Ye; Daniel Fabbri
Journal:  J Biomed Inform       Date:  2018-05-22       Impact factor: 6.317

8.  Psychiatric symptom recognition without labeled data using distributional representations of phrases and on-line knowledge.

Authors:  Yaoyun Zhang; Olivia Zhang; Yonghui Wu; Hee-Jin Lee; Jun Xu; Hua Xu; Kirk Roberts
Journal:  J Biomed Inform       Date:  2017-06-15       Impact factor: 6.317

9.  Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts.

Authors:  Steven Jiang; Weiyi Wu; Naofumi Tomita; Craig Ganoe; Saeed Hassanpour
Journal:  J Biomed Inform       Date:  2020-10-01       Impact factor: 6.317

10.  Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings.

Authors:  Akm Sabbir; Antonio Jimeno-Yepes; Ramakanth Kavuluru
Journal:  Proc IEEE Int Symp Bioinformatics Bioeng       Date:  2018-01-11