Optimal training sets for Bayesian prediction of MeSH assignment.

Sunghwan Sohn, Won Kim, Donald C Comeau, W John Wilbur.

Abstract

OBJECTIVES: The aim of this study was to improve naïve Bayes prediction of Medical Subject Headings (MeSH) assignment to documents using optimal training sets found by an active-learning-inspired method.
DESIGN: The authors selected 20 MeSH terms whose occurrences cover a range of frequencies. For each MeSH term, they found an optimal training set, a subset of the whole training set. An optimal training set consists of all documents that include a given MeSH term (the C1 class) together with those documents that do not include the term (the C(-1) class) and are closest to the C1 class. These small sets were used to predict MeSH assignments in the MEDLINE database.
MEASUREMENTS: Average precision was used to compare MeSH assignment using the naïve Bayes learner trained on the whole training set, optimal sets, and random sets. The authors compared 95% lower confidence limits of average precisions of naïve Bayes with upper bounds for average precisions of a K-nearest neighbor (KNN) classifier.
RESULTS: For all 20 MeSH assignments, the optimal training sets produced nearly 200% improvement over use of the whole training sets. In 17 of those MeSH assignments, naïve Bayes using optimal training sets was statistically better than a KNN. In 15 of those, optimal training sets performed better than optimized feature selection. Overall naïve Bayes averaged 14% better than a KNN for all 20 MeSH assignments. Using these optimal sets with another classifier, C-modified least squares (CMLS), produced an additional 6% improvement over naïve Bayes.
CONCLUSION: Using a smaller optimal training set greatly improved learning with naïve Bayes. The performance is superior to a KNN. The small training set can be used with other sophisticated learning methods, such as CMLS, where using the whole training set would not be feasible.
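
The training-set construction and the evaluation measure described above can be illustrated with a minimal Python sketch. The abstract does not say how closeness to the C1 class is measured or how many C(-1) documents are kept, so this sketch makes two assumptions: closeness is taken to be the score of a Bernoulli naïve Bayes model trained once on the whole training set, and the number of retained negatives (`n_neg`) is chosen by the caller. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Bernoulli naive Bayes term weights with add-one smoothing.

    docs: list of token lists; labels: 1 for the C1 class, anything else for C(-1).
    Returns a dict mapping each term to its log-odds weight.
    """
    pos = [set(d) for d, y in zip(docs, labels) if y == 1]
    neg = [set(d) for d, y in zip(docs, labels) if y != 1]
    pos_df = Counter(t for d in pos for t in d)
    neg_df = Counter(t for d in neg for t in d)
    weights = {}
    for t in set(pos_df) | set(neg_df):
        p = (pos_df[t] + 1) / (len(pos) + 2)  # P(term present | C1)
        q = (neg_df[t] + 1) / (len(neg) + 2)  # P(term present | C(-1))
        weights[t] = math.log(p * (1 - q) / (q * (1 - p)))
    return weights


def score(weights, doc):
    """Sum the weights of the terms present; constant terms are dropped
    because they do not change the ranking."""
    return sum(weights.get(t, 0.0) for t in set(doc))


def select_training_set(docs, labels, n_neg):
    """Indices of all C1 documents plus the n_neg highest-scoring C(-1) documents.

    Assumption: 'closest to C1' means the highest naive Bayes score.
    """
    weights = train_nb(docs, labels)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y != 1]
    neg_idx.sort(key=lambda i: score(weights, docs[i]), reverse=True)
    return pos_idx + neg_idx[:n_neg]


def average_precision(ranked_labels):
    """Mean of precision@k taken at each rank k that holds a C1 document."""
    hits, total = 0, 0.0
    for rank, y in enumerate(ranked_labels, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data, invented for illustration only.
docs = [
    ["heart", "attack"],
    ["heart", "failure"],
    ["heart", "surgery"],
    ["plant", "growth"],
    ["soil", "plant"],
]
labels = [1, 1, -1, -1, -1]
subset = select_training_set(docs, labels, n_neg=1)
```

On the toy data, the negative document that shares the term "heart" with the C1 documents is the one retained, which is the intended effect: the reduced set keeps the negatives that are hardest to separate from the positives. A learner would then be retrained on `subset` and its ranking of held-out documents scored with `average_precision`.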

Year:  2008        PMID: 18436913      PMCID: PMC2442263          DOI: 10.1197/jamia.M2431

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


