Improving the utility of MeSH® terms using the TopicalMeSH representation.

Zhiguo Yu¹, Elmer Bernstam², Trevor Cohen¹, Byron C Wallace³, Todd R Johnson⁴.

Abstract

OBJECTIVE: To evaluate whether vector representations encoding latent topic proportions that capture similarities to MeSH terms can improve performance on biomedical document retrieval and classification tasks, compared with using MeSH terms alone.
MATERIALS AND METHODS: We developed the TopicalMeSH representation, which exploits the 'correspondence' between topics generated using latent Dirichlet allocation (LDA) and MeSH terms to create new document representations that combine MeSH terms and latent topic vectors. We used 15 systematic drug review corpora to evaluate TopicalMeSH on information retrieval and classification tasks, comparing it with standard encodings that rely on (1) the original MeSH terms, (2) the text, or (3) their combination. For the document retrieval task, we compared the precision and recall achieved by ranking citations using the MeSH and TopicalMeSH representations. For the classification task, we considered three supervised machine learning approaches: Support Vector Machines (SVMs), logistic regression, and decision trees. We used each of them to classify documents as relevant or irrelevant using, independently, MeSH; TopicalMeSH; Words (i.e., n-grams extracted from citation titles and abstracts, encoded via a bag-of-words representation); a combination of MeSH and Words; and a combination of TopicalMeSH and Words. We also used SVMs to compare the classification performance of tf-idf weighted MeSH terms, LDA Topics, a combination of Topics and MeSH, and TopicalMeSH with that of supervised LDA.
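The abstract does not specify how the correspondence between LDA topics and MeSH terms is computed. As a minimal sketch, assuming that each MeSH term is profiled by the mean topic proportions of the training documents indexed with it, and that a document is scored against each term by cosine similarity between its own topic proportions and that profile, a representation with one topic-derived score per MeSH term could look like the code below. The helper names (`mesh_topic_profiles`, `topical_mesh_scores`) and the similarity measure are hypothetical; this is not the authors' implementation, and the paper's actual correspondence measure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mesh_topic_profiles(doc_topics, doc_mesh):
    """Hypothetical profile: mean LDA topic proportions of the documents indexed with each MeSH term.

    doc_topics: one list of K topic proportions per document (LDA's per-document theta).
    doc_mesh:   one list of MeSH terms per document, aligned with doc_topics.
    """
    sums, counts = {}, {}
    for theta, terms in zip(doc_topics, doc_mesh):
        for term in set(terms):
            acc = sums.setdefault(term, [0.0] * len(theta))
            for k, weight in enumerate(theta):
                acc[k] += weight
            counts[term] = counts.get(term, 0) + 1
    return {term: [w / counts[term] for w in acc] for term, acc in sums.items()}


def topical_mesh_scores(theta, profiles):
    """Score one document against every MeSH term by topic-space similarity (assumed measure)."""
    return {term: cosine(theta, profile) for term, profile in sorted(profiles.items())}


# Toy example: two topics, three training documents (made-up numbers).
train_topics = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
train_mesh = [["Hypertension"], ["Diabetes Mellitus"], ["Hypertension", "Diabetes Mellitus"]]
profiles = mesh_topic_profiles(train_topics, train_mesh)
scores = topical_mesh_scores([0.7, 0.3], profiles)
print(scores)
```

In this toy case the new document, whose topic mix leans toward the first topic, scores higher on "Hypertension" than on "Diabetes Mellitus". How such scores are then merged with the original MeSH terms (or with Words) before training an SVM, logistic regression, or decision tree is deliberately left out, since the abstract does not describe it.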
RESULTS: For the document retrieval task, the TopicalMeSH representation achieved higher precision than MeSH in 11 of 15 corpora at the same recall. For the classification task, TopicalMeSH features yielded a higher F1 score in 14 of 15 corpora with SVMs, 12 of 15 corpora with logistic regression, and 12 of 15 corpora with decision trees. Using SVMs, TopicalMeSH also achieved better document classification performance on 12 of 15 corpora than Topics, tf-idf weighted MeSH terms, and a combination of Topics and MeSH. Supervised LDA performed worst on most corpora.
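The results are reported as precision and recall for ranked retrieval and as F1 for classification. The sketch below, which is not the authors' evaluation code, computes these measures for binary relevance labels. `precision_at_recall` is a hypothetical helper showing one way to compare two rankings "at the same recall": it reads off precision at the shortest prefix of each ranking that reaches a target recall. The paper's exact retrieval protocol may differ.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = relevant, 0 = irrelevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def precision_at_recall(ranked_relevance, target_recall):
    """Precision at the shortest prefix of a ranking whose recall reaches target_recall.

    ranked_relevance: 0/1 relevance labels in ranked order, best-scored citation first.
    Returns (precision, cutoff_rank).
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("ranking contains no relevant citations")
    found = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        found += relevant
        if found / total_relevant >= target_recall:
            return found / rank, rank
    # Not reached for valid input: the full ranking always has recall 1.0.
    raise AssertionError("target recall not reached")


# Toy rankings of the same six citations (made-up relevance labels).
mesh_ranking = [1, 0, 1, 0, 0, 1]
topical_ranking = [1, 1, 0, 1, 0, 0]
print(precision_at_recall(mesh_ranking, 1.0))     # (0.5, 6)
print(precision_at_recall(topical_ranking, 1.0))  # (0.75, 4)
```

Both toy rankings reach full recall, but the second does so after four citations instead of six, so it has higher precision at the same recall, which is the kind of comparison the retrieval result describes.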
CONCLUSION: The proposed TopicalMeSH representation, which combines MeSH terms with latent topics, consistently improved performance on document retrieval and classification tasks compared with standard representations that use MeSH terms alone, as well as with several other standard approaches.
Copyright © 2016 Elsevier Inc. All rights reserved.

Keywords:  Document classification; Document retrieval; MeSH; PubMed; Topic models

Year:  2016        PMID: 27001195      PMCID: PMC4893983          DOI: 10.1016/j.jbi.2016.03.013

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


