
Boosting naïve Bayesian learning on a large subset of MEDLINE.

W J Wilbur.

Abstract

We are concerned with rating new documents that appear in a large database (MEDLINE) and are candidates for inclusion in a small specialty database (REBASE). The goal is to rank the new documents as nearly as possible in order of decreasing likelihood of being added to the smaller database, so as to improve the coverage of the smaller database without increasing the effort of those who manage it. To perform this ranking task we have considered several machine learning approaches based on the naïve Bayesian algorithm. We find that adaptive boosting outperforms naïve Bayes, but that a new form of boosting, which we term staged Bayesian retrieval, outperforms adaptive boosting. Staged Bayesian retrieval involves two stages of Bayesian retrieval, and we further find that if the second stage is replaced by a support vector machine, we again obtain a significant improvement over the strictly Bayesian approach.
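The abstract leaves the scoring details to the full paper. As a minimal sketch only, assuming binary (presence/absence) term features and a Bernoulli naïve Bayes model with additive smoothing, the baseline step of ranking new citations by their log-odds of belonging to the specialty database might look like this. The function names, the `alpha` parameter, and the toy data are hypothetical and not taken from the paper; the sketch covers plain naïve Bayes ranking only, not adaptive boosting, staged Bayesian retrieval, or the support vector machine variant.

```python
import math
from collections import Counter

def train_term_weights(positive_docs, negative_docs, alpha=0.5):
    """Per-term log-odds weights for a Bernoulli naive Bayes ranker (sketch).

    Each document is an iterable of terms; only presence/absence matters.
    alpha is a hypothetical additive-smoothing constant.
    """
    n_pos, n_neg = len(positive_docs), len(negative_docs)
    # Document frequency of each term within each class.
    pos_df = Counter(t for d in positive_docs for t in set(d))
    neg_df = Counter(t for d in negative_docs for t in set(d))
    weights = {}
    for term in set(pos_df) | set(neg_df):
        p = (pos_df[term] + alpha) / (n_pos + 2 * alpha)  # P(term | in specialty DB)
        q = (neg_df[term] + alpha) / (n_neg + 2 * alpha)  # P(term | not in specialty DB)
        # Presence weight; absent-term contributions fold into a constant
        # that is the same for every document, so the ranking is unaffected.
        weights[term] = math.log(p * (1 - q) / (q * (1 - p)))
    return weights

def score(doc, weights):
    """Log-odds score of a document, up to a document-independent constant."""
    return sum(weights.get(t, 0.0) for t in set(doc))

def rank(candidates, weights):
    """Indices of candidate documents in order of decreasing score."""
    return sorted(range(len(candidates)),
                  key=lambda i: score(candidates[i], weights),
                  reverse=True)

# Toy data, invented for illustration only.
in_specialty_db = [{"restriction", "enzyme", "methylase"},
                   {"restriction", "endonuclease"}]
not_in_specialty_db = [{"insulin", "receptor"}, {"enzyme", "kinetics"}]
new_citations = [{"insulin", "kinetics"}, {"restriction", "enzyme"}, {"protein"}]

weights = train_term_weights(in_specialty_db, not_in_specialty_db)
print(rank(new_citations, weights))  # [1, 2, 0]
```

Terms never seen in training contribute zero, so a citation containing only unfamiliar terms lands between clearly relevant and clearly irrelevant ones. A boosting wrapper would repeatedly reweight the training documents and refit weights of this kind; that step is not shown because the abstract does not specify it.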


Year:  2000        PMID: 11080018      PMCID: PMC2244081     

Source DB:  PubMed          Journal:  Proc AMIA Symp        ISSN: 1531-605X

