A Part-Of-Speech term weighting scheme for biomedical information retrieval.

Yanshan Wang, Stephen Wu, Dingcheng Li, Saeed Mehrabi, Hongfang Liu.

Abstract

In the era of digitalization, information retrieval (IR), which retrieves documents from large collections and ranks them according to users' search queries, has been widely applied in the biomedical domain. Building patient cohorts from electronic health records (EHRs) and searching the literature for topics of interest are two such use cases. Meanwhile, natural language processing (NLP) techniques, such as tokenization and Part-Of-Speech (POS) tagging, have been developed for processing clinical documents and biomedical literature. We hypothesize that NLP can be incorporated into IR to strengthen conventional IR models. In this study, we propose two NLP-empowered IR models, POS-BoW and POS-MRF, which incorporate automatic POS-based term weighting schemes into the bag-of-word (BoW) and Markov Random Field (MRF) IR models, respectively. In the proposed models, the POS-based term weights are calculated iteratively with a cyclic coordinate method, in which a golden section line search is applied along each coordinate to optimize an objective function defined by mean average precision (MAP). In the empirical experiments, we used the data sets from the Medical Records track of the Text REtrieval Conference (TREC) 2011 and 2012 and the Genomics track of TREC 2004. The evaluation on the TREC 2011 and 2012 Medical Records tracks shows that, for the POS-BoW models, the mean improvement rates in the IR evaluation metrics MAP, bpref, and P@10 are 10.88%, 4.54%, and 3.82%, respectively, compared with the BoW models; for the POS-MRF models, these rates are 13.59%, 8.20%, and 8.78%, compared with the MRF models. Additionally, we experimentally verify that the proposed weighting approach is superior to simple heuristic and frequency-based weighting approaches, and we validate our selection of POS categories.
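The abstract names the optimizer but not its details. The sketch below illustrates the general technique it describes, cyclic coordinate ascent with a golden-section line search along each coordinate, under stated assumptions: the POS category labels (`NN`, `JJ`, `VB`), the weight bounds [0, 1], the sweep limit, and all function names are hypothetical, and a smooth made-up function stands in for MAP. In a real setting the objective would run retrieval with the candidate weights and score the ranked lists. Because MAP changes only when a ranking changes, it is step-like rather than unimodal in each weight, so golden-section search is a heuristic there rather than a guaranteed line optimum.

```python
import math
from typing import Callable, Dict, Tuple

# 1/phi, the golden-ratio shrink factor (about 0.618).
INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0


def golden_section_max(f: Callable[[float], float], lo: float, hi: float,
                       tol: float = 1e-5) -> float:
    """Approximate the maximizer of f on [lo, hi], assuming f is unimodal there."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc >= fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def cyclic_coordinate_ascent(objective: Callable[[Dict[str, float]], float],
                             init: Dict[str, float], lo: float = 0.0,
                             hi: float = 1.0, sweeps: int = 20,
                             tol: float = 1e-9) -> Tuple[Dict[str, float], float]:
    """Optimize one weight at a time, cycling through all coordinates."""
    w = dict(init)
    best = objective(w)
    for _ in range(sweeps):
        start = best
        for key in list(w):
            # Objective as a function of this single weight, others held fixed.
            def along(x: float, key: str = key) -> float:
                return objective({**w, key: x})

            x_star = golden_section_max(along, lo, hi)
            val = along(x_star)
            if val > best:  # accept only improvements
                w[key], best = x_star, val
        if best - start <= tol:  # a full sweep gained nothing: stop
            break
    return w, best


# Hypothetical stand-in for MAP(weights); not the paper's objective.
def toy_objective(w: Dict[str, float]) -> float:
    return 1.0 - (w["NN"] - 0.8) ** 2 - (w["JJ"] - 0.5) ** 2 - (w["VB"] - 0.2) ** 2


if __name__ == "__main__":
    weights, score = cyclic_coordinate_ascent(
        toy_objective, {"NN": 0.5, "JJ": 0.5, "VB": 0.5})
    print({k: round(v, 3) for k, v in weights.items()}, round(score, 6))
```

Only improving moves are accepted, so the objective never decreases from one coordinate update to the next; the loop stops once a full sweep yields no gain.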
Using the optimal weights calculated in these experiments, we tested the proposed models on the TREC 2004 Genomics track and obtained average improvement rates of 8.63% and 10.04% for POS-BoW and POS-MRF, respectively. These significant improvements verify the effectiveness of leveraging POS tagging for biomedical IR tasks.
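The evaluation metrics and improvement rates reported above can be made concrete with a short sketch. It assumes the standard definitions: average precision for one query sums the precision at each rank where a relevant document appears and divides by the total number of relevant documents (so unretrieved relevant documents contribute zero); MAP averages this over queries; P@10 counts relevant documents in the top 10 and divides by 10; and the improvement rate is assumed to be the relative change over the baseline, in percent. bpref is omitted, the function names and toy document IDs are hypothetical, and the abstract does not specify how the "mean" improvement rates were averaged across runs.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k positions holding relevant documents (divides by k)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision at each relevant hit, over the total number of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP over all judged queries; a query missing from the run scores 0."""
    return sum(average_precision(runs.get(q, []), rel)
               for q, rel in qrels.items()) / len(qrels)


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of a metric relative to the baseline model."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical toy rankings and relevance judgments.
    runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
    print(mean_average_precision(runs, qrels))      # (5/6 + 1/2) / 2
    print(precision_at_k(runs["q1"], qrels["q1"]))  # 2 relevant in top 10
    print(relative_improvement(0.20, 0.25))         # roughly a 25% gain
```

Rankings are assumed to contain each document at most once; duplicates would be counted twice by these functions.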

Keywords: Bag-of-word; Biomedical information retrieval; Markov random field; Natural language processing; Part-Of-Speech

Year: 2016; PMID: 27593166; PMCID: PMC5493484; DOI: 10.1016/j.jbi.2016.08.026

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317
