
Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database.

Abstract

Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and employ machine learning algorithms. At present, each research group tackles each problem from scratch and in isolation from other projects, which causes redundancy and a great waste of effort. Here, we propose and describe the design of a generic platform for biomedical text mining, which can serve as a shared resource for machine learning projects and as a public repository for their outputs. We will initially focus on a specific goal, namely, classifying articles according to Publication Type, and emphasize how feature sets can be made more powerful and robust through the use of multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to include a wide variety of other machine learning-based goals and projects, and how it can also serve as a public platform for disseminating the results of NLP tools to end users.
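The common approach described in the abstract (positive and negative training examples, features derived from article metadata, a learned classifier) can be sketched as follows. This is a minimal, hypothetical illustration, not the platform's actual design: the record fields (`title`, `mesh`), the helper names (`art`, `features`, `train_logistic`, `predict`), the choice of Jaccard similarity as the similarity measure, plain logistic regression as the model, and the toy articles are all assumptions made for the example. Each heterogeneous similarity measure (title words, MeSH terms) contributes its own features: an article's mean similarity to a positive and to a negative reference set.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical article record: tokenized title words and MeSH-style terms.
Article = Dict[str, List[str]]
FIELDS = ("title", "mesh")  # assumption: one similarity measure per metadata field


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity of two token collections (0.0 if both are empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def features(a: Article, pos_refs: List[Article], neg_refs: List[Article]) -> List[float]:
    """For each field, mean similarity to the positive and to the negative reference set."""
    feats = []
    for field in FIELDS:
        for refs in (pos_refs, neg_refs):
            feats.append(sum(jaccard(a[field], r[field]) for r in refs) / len(refs))
    return feats


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.5,
                   epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the unregularized logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Estimated probability that an article belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def art(title: str, mesh: List[str]) -> Article:
    """Hypothetical helper: build a record from a raw title and a term list."""
    return {"title": title.lower().split(), "mesh": mesh}


# Invented toy data: the positive class stands for one Publication Type
# (e.g. randomized controlled trials); the negative class for everything else.
pos_refs = [art("Randomized controlled trial of aspirin", ["Humans", "Random Allocation", "Aspirin"]),
            art("A randomized placebo controlled trial", ["Humans", "Random Allocation", "Placebos"])]
neg_refs = [art("A review of gene expression", ["Animals", "Gene Expression", "Mice"]),
            art("Mouse model of gene regulation", ["Animals", "Mice", "Gene Expression Regulation"])]
train = [(art("Randomized trial of statins", ["Humans", "Random Allocation", "Statins"]), 1),
         (art("Controlled trial in children", ["Humans", "Random Allocation", "Child"]), 1),
         (art("Review of mouse genetics", ["Animals", "Mice", "Genetics"]), 0),
         (art("Gene expression in mouse liver", ["Animals", "Gene Expression", "Liver"]), 0)]

X = [features(a, pos_refs, neg_refs) for a, _ in train]
y = [label for _, label in train]
w, b = train_logistic(X, y)

p_rct = predict(w, b, features(art("Randomized controlled trial of insulin",
                                   ["Humans", "Random Allocation", "Insulin"]), pos_refs, neg_refs))
p_other = predict(w, b, features(art("Review of gene expression in mice",
                                     ["Animals", "Mice", "Gene Expression"]), pos_refs, neg_refs))
print(f"P(positive | trial-like article) = {p_rct:.3f}")
print(f"P(positive | review-like article) = {p_other:.3f}")
```

Because each measure yields separate features, the model can weight them independently, so a weak or noisy signal from one field (here, title words) can be offset by another (here, MeSH terms). In a real setting the reference sets, the similarity measures, and the classifier would all be swapped for whatever a given project actually uses.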

Keywords: Text mining; community platforms; data sharing; machine learning; open science; semantic similarity; vector representation

Year: 2018 PMID: 30766970 PMCID: PMC6372120 DOI: 10.2478/dim-2018-0004

Source DB: PubMed Journal: Data Inf Manag ISSN: 2543-9251

1 in total

1. Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.

Authors: Neil R Smalheiser; Aaron M Cohen; Gary Bonifield
Journal: J Biomed Inform Date: 2019-01-14 Impact factor: 6.317