Boosting naïve Bayesian learning on a large subset of MEDLINE.

Abstract

We are concerned with the rating of new documents that appear in a large database (MEDLINE) and are candidates for inclusion in a small specialty database (REBASE). The requirement is to rank the new documents as nearly as possible in order of decreasing potential to be added to the smaller database, so as to improve the coverage of the smaller database without increasing the effort of those who manage it. To perform this ranking task we have considered several machine learning approaches based on the naïve Bayesian algorithm. We find that adaptive boosting outperforms naïve Bayes, but that a new form of boosting, which we term staged Bayesian retrieval, outperforms adaptive boosting. Staged Bayesian retrieval involves two stages of Bayesian retrieval, and we further find that if the second stage is replaced by a support vector machine, we again obtain a significant improvement over the strictly Bayesian approach.
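
The ranking task described in the abstract can be illustrated with a minimal sketch of plain naïve Bayes scoring. This is not the authors' implementation, and it does not cover adaptive boosting, staged Bayesian retrieval, or the support vector machine stage, none of which the abstract specifies in detail. The function names, the add-one smoothing, and the toy token lists are all assumptions made for illustration. Each candidate document is scored by the sum of per-term log-odds weights, and candidates are sorted by decreasing score. The class prior is omitted because it adds the same constant to every document and so does not change the order.

```python
import math
from collections import Counter


def train_log_odds(relevant_docs, other_docs, alpha=1.0):
    """Estimate per-term log-odds weights from two lists of token lists.

    alpha is an assumed additive (Laplace) smoothing constant.
    """
    rel_counts = Counter(t for doc in relevant_docs for t in doc)
    oth_counts = Counter(t for doc in other_docs for t in doc)
    vocab = set(rel_counts) | set(oth_counts)
    rel_total = sum(rel_counts.values()) + alpha * len(vocab)
    oth_total = sum(oth_counts.values()) + alpha * len(vocab)
    return {
        t: math.log((rel_counts[t] + alpha) / rel_total)
        - math.log((oth_counts[t] + alpha) / oth_total)
        for t in vocab
    }


def score(doc, weights):
    """Sum the weights of the document's terms; unseen terms contribute 0."""
    return sum(weights.get(t, 0.0) for t in doc)


def rank(candidates, weights):
    """Return candidate ids ordered by decreasing naive Bayes score."""
    return sorted(candidates, key=lambda doc_id: score(candidates[doc_id], weights),
                  reverse=True)


# Hypothetical toy data: documents already in the specialty database
# versus documents from the rest of the large database.
relevant = [["restriction", "enzyme", "cleavage"],
            ["methyltransferase", "restriction", "site"]]
other = [["patient", "trial", "outcome"],
         ["enzyme", "patient", "therapy"]]

weights = train_log_odds(relevant, other)
candidates = {
    "a": ["restriction", "enzyme", "site"],
    "b": ["patient", "therapy", "outcome"],
    "c": ["enzyme", "trial"],
}
ranking = rank(candidates, weights)
print(ranking)  # ['a', 'c', 'b']
```

In this toy setting a curator would review candidate "a" first. A boosted or staged variant would replace or extend the scoring step while leaving the final sort unchanged.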

Year: 2000 PMID: 11080018 PMCID: PMC2244081

Source DB: PubMed Journal: Proc AMIA Symp ISSN: 1531-605X

Cited by (10 in total)

1. Automatic MeSH term assignment and quality assessment.

2. DNA splice site detection: a comparison of specific and general methods.

3. Text categorization models for retrieval of high quality articles in internal medicine.

4. Finding related sentence pairs in MEDLINE.

5. The value of parsing as feature generation for gene mention recognition.

6. Optimal training sets for Bayesian prediction of MeSH assignment.

7. Ranking the whole MEDLINE database according to a large training set using text indexing.

8. GENETAG: a tagged corpus for gene/protein named entity recognition.

9. Enhancing navigation in biomedical databases by community voting and database-driven text classification.

10. Exploiting and integrating rich features for biological literature classification.