
Using large clinical corpora for query expansion in text-based cohort identification.

Dongqing Zhu, Stephen Wu, Ben Carterette, Hongfang Liu.

Abstract

In light of the heightened problems of polysemy, synonymy, and hyponymy in clinical text, we hypothesize that patient cohort identification can be improved by using a large, in-domain clinical corpus for query expansion. We evaluate the utility of four auxiliary collections for the Text REtrieval Conference task of IR-based cohort retrieval, considering the effects of collection size, the inherent difficulty of a query, and the interaction between the collections. Each collection was applied to aid in cohort retrieval from the Pittsburgh NLP Repository by using a mixture of relevance models. Measured by mean average precision, performance using any auxiliary resource (MAP=0.386 and above) is shown to improve over the baseline query likelihood model (MAP=0.373). Considering subsets of the Mayo Clinic collection, we found that after including 2.5 billion term instances, retrieval is not improved by adding more instances. However, adding the Mayo Clinic collection did improve performance significantly over any existing setup, with a system using all four auxiliary collections obtaining the best results (MAP=0.4223). Because optimal results in the mixture of relevance models would require selective sampling of the collections, the common-sense approach of "use all available data" is inappropriate. However, we found that it was still beneficial to add the Mayo corpus to any mixture of relevance models. On the task of IR-based cohort identification, query expansion with the Mayo Clinic corpus resulted in consistent and significant improvements. As such, any IR query expansion with access to a large clinical corpus could benefit from the additional resource. Additionally, we have shown that more data is not necessarily better, implying that there is value in collection curation.
Copyright © 2014 Elsevier Inc. All rights reserved.
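
The abstract reports its results as mean average precision (MAP). As a reading aid, the following is a minimal sketch of how MAP is conventionally computed over ranked retrieval results. It is not the authors' evaluation code; the function names, the visit identifiers, and the toy data are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    `ranked` is the system's ranked list of document (or visit) ids and
    `relevant` is the set of ids judged relevant. Both are hypothetical
    inputs, not data from the paper.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


if __name__ == "__main__":
    # Toy example with made-up ids; not results from the study.
    runs = {"q1": ["v3", "v1", "v7", "v2"], "q2": ["v5", "v9"]}
    qrels = {"q1": {"v1", "v2"}, "q2": {"v5"}}
    print(mean_average_precision(runs, qrels))
```

In this toy case the first query has relevant items at ranks 2 and 4, giving an average precision of (1/2 + 2/4) / 2 = 0.5, and the second query has its single relevant item at rank 1, giving 1.0, so MAP is 0.75.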

Keywords:  Clinical text; Cohort identification; Electronic medical records; Information retrieval; Query expansion

Year:  2014        PMID: 24680983      PMCID: PMC4058413          DOI: 10.1016/j.jbi.2014.03.010

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  5 in total

1.  Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2.

Authors:  W Chen; R Kowatch; S Lin; M Splaingard; Y Huang
Journal:  Appl Clin Inform       Date:  2015-05-27       Impact factor: 2.342

2.  Extracting similar terms from multiple EMR-based semantic embeddings to support chart reviews.

Authors:  Cheng Ye; Daniel Fabbri
Journal:  J Biomed Inform       Date:  2018-05-22       Impact factor: 6.317

3.  A Part-Of-Speech term weighting scheme for biomedical information retrieval.

Authors:  Yanshan Wang; Stephen Wu; Dingcheng Li; Saeed Mehrabi; Hongfang Liu
Journal:  J Biomed Inform       Date:  2016-09-01       Impact factor: 6.317

4.  Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task.

Authors:  Steven R Chamberlin; Steven D Bedrick; Aaron M Cohen; Yanshan Wang; Andrew Wen; Sijia Liu; Hongfang Liu; William R Hersh
Journal:  JAMIA Open       Date:  2020-07-26

5.  Aligned-Layer Text Search in Clinical Notes.

Authors:  Stephen Wu; Andrew Wen; Yanshan Wang; Sijia Liu; Hongfang Liu
Journal:  Stud Health Technol Inform       Date:  2017
