
Substring selection for biomedical document classification.

Bo Han, Zoran Obradovic, Zhang-Zhi Hu, Cathy H Wu, Slobodan Vucetic.

Abstract

MOTIVATION: Attribute selection is a critical step in the development of document classification systems. As a standard practice, words are stemmed and the most informative stems are used as attributes for classification. Owing to the high complexity of biomedical terminology, general-purpose stemming algorithms are often conservative and can also remove informative stems. This can reduce accuracy, especially when the number of labeled documents is small. To address this issue, we propose an algorithm that omits stemming and instead uses the most discriminative substrings as attributes.
RESULTS: The approach was tested on five annotated sets of abstracts from iProLINK that report experimental evidence for five types of protein post-translational modification. In the experiments, Naive Bayes and support vector machine classifiers performed consistently better with the proposed attribute selection [area under the ROC curve (AUC) in the 0.92-0.97 range] than with attributes obtained by the Porter stemmer algorithm (AUC in the 0.86-0.93 range). The proposed approach is particularly useful when labeled datasets are small.
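The abstract describes replacing stemmed words with "the most discriminative substrings" as attributes but does not state the scoring criterion or the substring lengths. The sketch below is a minimal illustration under loudly stated assumptions, not the authors' algorithm: it enumerates character substrings of each lowercased token (lengths 3 to 8, an assumption), keeps those occurring in at least two documents (also an assumption), and ranks them by information gain of a binary "substring present" attribute, a standard feature-selection criterion swapped in here for illustration. All function names (`token_substrings`, `information_gain`, `select_substrings`) are hypothetical.

```python
from collections import defaultdict
from math import log2


def token_substrings(text, min_len=3, max_len=8):
    """Return the set of distinct character substrings of each lowercased token.

    Substrings never cross token boundaries; min_len/max_len are illustrative choices.
    """
    subs = set()
    for token in text.lower().split():
        for i in range(len(token)):
            for j in range(i + min_len, min(len(token), i + max_len) + 1):
                subs.add(token[i:j])
    return subs


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(n_pos_with, n_with, n_pos, n):
    """Information gain of a binary 'substring present' attribute w.r.t. binary labels.

    n_pos_with: positive documents containing the substring
    n_with:     all documents containing the substring
    n_pos:      positive documents in the corpus
    n:          all documents in the corpus
    """
    n_without = n - n_with
    n_pos_without = n_pos - n_pos_with
    h_y = binary_entropy(n_pos / n)
    h_with = binary_entropy(n_pos_with / n_with) if n_with else 0.0
    h_without = binary_entropy(n_pos_without / n_without) if n_without else 0.0
    return h_y - (n_with / n) * h_with - (n_without / n) * h_without


def select_substrings(docs, labels, k=1000, min_len=3, max_len=8, min_df=2):
    """Rank substrings by information gain and return the top k as (substring, score).

    labels: 1 for positive documents, 0 for negative ones (hypothetical convention).
    Ties are broken toward longer (more specific) substrings, then alphabetically.
    """
    doc_subs = [token_substrings(d, min_len, max_len) for d in docs]
    df = defaultdict(int)      # number of documents containing each substring
    pos_df = defaultdict(int)  # number of positive documents containing it
    for subs, y in zip(doc_subs, labels):
        for s in subs:
            df[s] += 1
            if y == 1:
                pos_df[s] += 1
    n, n_pos = len(docs), sum(labels)
    scored = [
        (s, information_gain(pos_df[s], c, n_pos, n))
        for s, c in df.items()
        if c >= min_df  # drop substrings seen in fewer than min_df documents
    ]
    scored.sort(key=lambda t: (-t[1], -len(t[0]), t[0]))
    return scored[:k]
```

The selected substrings can then serve as binary presence attributes for a classifier such as Naive Bayes or an SVM, in place of Porter-stemmed words. Because substrings such as "phosphor" are shared by "phosphorylated" and "phosphorylation", they can capture a common morphological core without relying on a stemmer's suffix rules.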

Year:  2006        PMID: 16837530     DOI: 10.1093/bioinformatics/btl350

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  The first step in the development of Text Mining technology for Cancer Risk Assessment: identifying and organizing scientific evidence in risk assessment literature.

Authors:  Anna Korhonen; Ilona Silins; Lin Sun; Ulla Stenius
Journal:  BMC Bioinformatics       Date:  2009-09-22       Impact factor: 3.169

2.  Enhancing navigation in biomedical databases by community voting and database-driven text classification.

Authors:  Timo Duchrow; Timur Shtatland; Daniel Guettler; Misha Pivovarov; Stefan Kramer; Ralph Weissleder
Journal:  BMC Bioinformatics       Date:  2009-10-03       Impact factor: 3.169

3.  Automating document classification for the Immune Epitope Database.

Authors:  Peng Wang; Alexander A Morgan; Qing Zhang; Alessandro Sette; Bjoern Peters
Journal:  BMC Bioinformatics       Date:  2007-07-26       Impact factor: 3.169

4.  Exploiting and integrating rich features for biological literature classification.

Authors:  Hongning Wang; Minlie Huang; Shilin Ding; Xiaoyan Zhu
Journal:  BMC Bioinformatics       Date:  2008-04-11       Impact factor: 3.169

5.  GAPscreener: an automatic tool for screening human genetic association literature in PubMed using the support vector machine technique.

Authors:  Wei Yu; Melinda Clyne; Siobhan M Dolan; Ajay Yesupriya; Anja Wulf; Tiebin Liu; Muin J Khoury; Marta Gwinn
Journal:  BMC Bioinformatics       Date:  2008-04-22       Impact factor: 3.169

6.  Phylogenetic and biological significance of evolutionary elements from metazoan mitochondrial genomes.

Authors:  Jianbo Yuan; Qingming Zhu; Bin Liu
Journal:  PLoS One       Date:  2014-01-20       Impact factor: 3.240
