Combining NLP and probabilistic categorisation for document and term selection for Swiss-Prot medical annotation.

Pavel B Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, Eric Gaussier.

Abstract

MOTIVATION: Searching for publications relevant to manual database annotation is a tedious task. In this paper, we apply a combination of Natural Language Processing (NLP) and probabilistic classification to re-rank documents returned by PubMed according to their relevance to Swiss-Prot annotation, and to identify significant terms in the documents.
RESULTS: With a Probabilistic Latent Categoriser (PLC) we obtained 69% recall and 59% precision for relevant documents in a representative query. As the PLC technique provides the relative contribution of each term to the final document score, we used the symmetric Kullback-Leibler divergence to determine the most discriminating words for Swiss-Prot medical annotation. This information should help curators understand classification results better. It is also valuable for fine-tuning the linguistic pre-processing of documents, which in turn can improve overall classifier performance.
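
As an illustration of the term-selection idea in RESULTS, the sketch below ranks words by their individual contribution to the symmetric Kullback-Leibler divergence between two term distributions. It is not the authors' PLC implementation, which the abstract does not describe. The toy documents, the Laplace smoothing parameter `alpha`, and the function names `term_distribution` and `symmetric_kl_contributions` are all assumptions made for the example. It relies on the identity KL(p||q) + KL(q||p) = sum over w of (p(w) - q(w)) * log(p(w) / q(w)), so each summand can be read as one word's share of the divergence.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution over `vocab` (hypothetical helper)."""
    counts = Counter(tok for doc in docs for tok in doc if tok in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def symmetric_kl_contributions(p, q):
    """Per-term summand of KL(p||q) + KL(q||p); each value is >= 0."""
    return {w: (p[w] - q[w]) * math.log(p[w] / q[w]) for w in p}


# Toy tokenised documents, invented for illustration only.
relevant = [["mutation", "disease", "patient"],
            ["mutation", "variant", "disease"]]
irrelevant = [["structure", "crystal", "binding"],
              ["binding", "kinetics", "structure"]]

vocab = sorted({tok for doc in relevant + irrelevant for tok in doc})
p = term_distribution(relevant, vocab)
q = term_distribution(irrelevant, vocab)

contrib = symmetric_kl_contributions(p, q)
ranked = sorted(contrib, key=contrib.get, reverse=True)

for w in ranked:
    # The sign of p - q shows which class the term points towards.
    side = "relevant" if p[w] > q[w] else "irrelevant"
    print(f"{w:10s} {contrib[w]:.4f} ({side})")
```

Because every summand is non-negative, the ranking alone says how discriminating a word is but not in which direction; comparing `p[w]` with `q[w]`, as the loop does, recovers that.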

Year:  2003        PMID: 12855443     DOI: 10.1093/bioinformatics/btg1011

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Recent advances in biomedical literature mining. (Review)

Authors:  Sendong Zhao; Chang Su; Zhiyong Lu; Fei Wang
Journal:  Brief Bioinform       Date:  2021-05-20       Impact factor: 11.622

2.  Discovering semantic features in the literature: a foundation for building functional associations.

Authors:  Monica Chagoyen; Pedro Carmona-Saez; Hagit Shatkay; Jose M Carazo; Alberto Pascual-Montano
Journal:  BMC Bioinformatics       Date:  2006-01-26       Impact factor: 3.169

3.  Enhancing navigation in biomedical databases by community voting and database-driven text classification.

Authors:  Timo Duchrow; Timur Shtatland; Daniel Guettler; Misha Pivovarov; Stefan Kramer; Ralph Weissleder
Journal:  BMC Bioinformatics       Date:  2009-10-03       Impact factor: 3.169

4.  Automating document classification for the Immune Epitope Database.

Authors:  Peng Wang; Alexander A Morgan; Qing Zhang; Alessandro Sette; Bjoern Peters
Journal:  BMC Bioinformatics       Date:  2007-07-26       Impact factor: 3.169

5.  Exploiting and integrating rich features for biological literature classification.

Authors:  Hongning Wang; Minlie Huang; Shilin Ding; Xiaoyan Zhu
Journal:  BMC Bioinformatics       Date:  2008-04-11       Impact factor: 3.169

6.  Text Categorization of Heart, Lung, and Blood Studies in the Database of Genotypes and Phenotypes (dbGaP) Utilizing n-grams and Metadata Features.

Authors:  Mindy K Ross; Ko-Wei Lin; Karen Truong; Abhishek Kumar; Mike Conway
Journal:  Biomed Inform Insights       Date:  2013-07-22