Towards classifying species in systems biology papers using text mining.

Qi Wei, Nigel Collier.

Abstract

BACKGROUND: In recent years, high-throughput methods have led to a massive expansion in the free-text literature on molecular biology. Automated text mining has developed as an application technology for formalizing this wealth of published results into structured database entries. However, database curation is still largely done by hand, and although there have been many studies on automated approaches, problems remain in how to classify documents into top-level categories based on the type of organism being investigated. Here we present a comparative analysis of state-of-the-art supervised models used to classify both abstracts and full-text articles for three model organisms.
RESULTS: Ablation experiments were conducted on a large gold-standard corpus of 10,000 abstracts and full papers containing data on three model organisms (fly, mouse and yeast). Among the eight learner models tested, the best model achieved an F-score of 97.1% for fly, 88.6% for mouse and 85.5% for yeast using a variety of features that included gene names, organism frequency, MeSH headings and term-species associations. Term-species associations were particularly effective in improving classification performance. The benefit of using full-text articles over abstracts was observed consistently across all three organisms.
CONCLUSIONS: By comparing various learner algorithms and features, we present an optimized system that automatically detects the major focus organism in full-text articles for fly, mouse and yeast. We believe the method will be extensible to other organism types.
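
The abstract names organism frequency as one feature and reports results as F-scores. As a rough illustration only, the Python sketch below counts organism mentions in a text and computes an F-score from true-positive, false-positive and false-negative counts. The keyword lists and the function names (ORGANISM_TERMS, organism_frequencies, most_frequent_organism, f_score) are assumptions made for this example. The abstract does not describe the authors' lexicons, feature extraction or learners, so this is not their implementation.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen for illustration only.
# The abstract does not give the lexicons used in the paper.
ORGANISM_TERMS = {
    "fly": ["drosophila", "fruit fly", "d. melanogaster"],
    "mouse": ["mouse", "mice", "mus musculus"],
    "yeast": ["yeast", "saccharomyces", "s. cerevisiae"],
}


def organism_frequencies(text):
    """Count case-insensitive whole-term mentions for each organism."""
    lowered = text.lower()
    counts = Counter()
    for organism, terms in ORGANISM_TERMS.items():
        for term in terms:
            # Lookarounds stop "mice" from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            counts[organism] += len(re.findall(pattern, lowered))
    return counts


def most_frequent_organism(text):
    """Return the organism with the most mentions, or None if none occur."""
    counts = organism_frequencies(text)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def f_score(tp, fp, fn):
    """Balanced F-score (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = ("Drosophila embryos were compared with mouse tissue; "
              "Drosophila wing discs showed the strongest signal.")
    print(organism_frequencies(sample))
    print(most_frequent_organism(sample))
    print(f_score(tp=8, fp=2, fn=2))
```

A raw count like this would be only one input among several. The abstract reports that combining it with gene names, MeSH headings and term-species associations in a supervised learner gave the best results.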

Year:  2011        PMID: 21294879      PMCID: PMC3045319          DOI: 10.1186/1756-0500-4-32

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500


