Improving imbalanced scientific text classification using sampling strategies and dictionaries.

L Borrajo, R Romero, E L Iglesias, C M Redondo Marey.

Abstract

Many real applications suffer from the imbalanced class distribution problem, in which one class is represented by very few cases compared with the other classes. Systems for the retrieval and classification of scientific documentation are among those affected. Sampling strategies such as Oversampling and Subsampling are popular for tackling class imbalance. In this work, we study their effects on three types of classifiers (kNN, SVM and Naive Bayes) when applied to searches on the PubMed scientific database. Another purpose of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad-hoc subset of the UniProt database named Protein) using the aforementioned classifiers and sampling strategies. The best results were obtained with the NLPBA and Protein dictionaries and the SVM classifier using the Subsampling balancing technique. These results were compared with those obtained by other authors using the TREC Genomics 2005 public corpus. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.
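
The two balancing strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors selected or duplicated documents, so this example assumes plain random sampling; the function names, the toy documents, and the "relevant"/"non-relevant" labels are hypothetical and not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(samples, labels):
    """Collect samples into one list per class label."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_subsample(samples, labels, seed=0):
    """Subsampling (assumed random): shrink every class to the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(items) for items in groups.values())
    pairs = []
    for label, items in groups.items():
        # Draw `target` items without replacement from each class.
        pairs.extend((item, label) for item in rng.sample(items, target))
    rng.shuffle(pairs)
    return [s for s, _ in pairs], [y for _, y in pairs]


def random_oversample(samples, labels, seed=0):
    """Oversampling (assumed random): grow every class to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(items) for items in groups.values())
    pairs = []
    for label, items in groups.items():
        # Keep all originals, then add random duplicates until the class is full.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((item, label) for item in items + extra)
    rng.shuffle(pairs)
    return [s for s, _ in pairs], [y for _, y in pairs]


# Toy imbalanced corpus (hypothetical): 2 relevant vs. 8 non-relevant documents.
docs = [f"doc{i}" for i in range(10)]
labels = ["relevant"] * 2 + ["non-relevant"] * 8

sub_docs, sub_labels = random_subsample(docs, labels)
over_docs, over_labels = random_oversample(docs, labels)

print(Counter(sub_labels))
print(Counter(over_labels))
```

Subsampling discards majority-class documents, while oversampling repeats minority-class documents; either balanced set would then be passed to a classifier such as kNN, SVM or Naive Bayes.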

Year:  2011        PMID: 21926439     DOI: 10.2390/biecoll-jib-2011-176

Source DB:  PubMed          Journal:  J Integr Bioinform        ISSN: 1613-4516


  3 in total

1.  A linear-RBF multikernel SVM to classify big text corpora.

Authors:  R Romero; E L Iglesias; L Borrajo
Journal:  Biomed Res Int       Date:  2015-03-23       Impact factor: 3.411

2.  Machine learning for biomedical literature triage.

Authors:  Hayda Almeida; Marie-Jean Meurs; Leila Kosseim; Greg Butler; Adrian Tsang
Journal:  PLoS One       Date:  2014-12-31       Impact factor: 3.240

3.  Study of query expansion techniques and their application in the biomedical information retrieval.

Authors:  A R Rivas; E L Iglesias; L Borrajo
Journal:  ScientificWorldJournal       Date:  2014-03-02