
Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database.

Neil R Smalheiser, Aaron M Cohen.

Abstract

Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and employ machine learning algorithms. At present, each research group tackles each problem from scratch and in isolation from other projects, which causes redundancy and a great waste of effort. Here, we propose and describe the design of a generic platform for biomedical text mining, which can serve both as a shared resource for machine learning projects and as a public repository for their outputs. We will initially focus on a specific goal, namely, classifying articles according to Publication Type, and emphasize how feature sets can be made more powerful and robust through the use of multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to include a wide variety of other machine learning-based goals and projects, and how it can also be used as a public platform for disseminating the results of NLP tools to end-users.
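
The common approach described in the abstract (positive and negative training examples, features drawn from article metadata, a machine learning model) can be sketched as follows. This is a minimal illustration and not the authors' platform: the toy articles, the three similarity measures (title-word overlap, MeSH-term overlap, journal identity), the function names, and the small logistic regression are all assumptions chosen for the example. The target Publication Type here is taken to be randomized controlled trial.

```python
import math

def jaccard(a, b):
    """Jaccard overlap of two token collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_features(article, positives):
    """Mean similarity of one article to the positive examples under three
    heterogeneous measures: title words, MeSH terms, and journal identity."""
    # Skip the article itself so training positives are not compared to themselves.
    others = [p for p in positives if p is not article]
    n = len(others)
    title = sum(jaccard(article["title"].lower().split(),
                        p["title"].lower().split()) for p in others) / n
    mesh = sum(jaccard(article["mesh"], p["mesh"]) for p in others) / n
    journal = sum(article["journal"] == p["journal"] for p in others) / n
    return [title, mesh, journal]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Hypothetical toy records; real input would come from PubMed metadata.
positives = [
    {"title": "A randomized controlled trial of drug A for hypertension",
     "mesh": ["Humans", "Hypertension", "Double-Blind Method"],
     "journal": "J Hypertens"},
    {"title": "Randomized controlled trial of exercise in adults with diabetes",
     "mesh": ["Humans", "Exercise", "Double-Blind Method"],
     "journal": "Diabetes Care"},
    {"title": "Placebo controlled randomized trial of vitamin D",
     "mesh": ["Humans", "Vitamin D", "Placebos", "Double-Blind Method"],
     "journal": "J Hypertens"},
]
negatives = [
    {"title": "A review of mouse models of hypertension",
     "mesh": ["Animals", "Mice", "Hypertension"], "journal": "Lab Anim"},
    {"title": "Case report of a rare cardiac tumor",
     "mesh": ["Humans", "Heart Neoplasms"], "journal": "Case Rep Cardiol"},
]

# Build the feature matrix and labels, then fit the model.
X = [similarity_features(a, positives) for a in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
w, b = train_logistic(X, y)

def score(article):
    """Estimated probability that an unseen article is of the target type."""
    x = similarity_features(article, positives)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

new_rct = {"title": "Randomized trial of drug B for asthma in adults",
           "mesh": ["Humans", "Asthma", "Double-Blind Method"],
           "journal": "Chest"}
new_other = {"title": "Case report of a rare lung tumor",
             "mesh": ["Humans", "Lung Neoplasms"],
             "journal": "Case Rep Pulmonol"}
score_rct, score_other = score(new_rct), score(new_other)
print(f"RCT-like article: {score_rct:.3f}, other article: {score_other:.3f}")
```

Each feature is a separate similarity measure computed from a different metadata field, so the model can still rank an article sensibly when one field is uninformative (for example, an unseen journal). In practice a library classifier and far larger training sets would replace the hand-written regression used here.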


Keywords:  Text mining; community platforms; data sharing; machine learning; open science; semantic similarity; vector representation

Year:  2018        PMID: 30766970      PMCID: PMC6372120          DOI: 10.2478/dim-2018-0004

Source DB:  PubMed          Journal:  Data Inf Manag        ISSN: 2543-9251


