Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup.

Alexander S Yeh, Lynette Hirschman, Alexander A Morgan.

Abstract

MOTIVATION: The biological literature is a major repository of knowledge. Many biological databases draw much of their content from a careful curation of this literature. However, as the volume of literature increases, the burden of curation increases. Text mining may provide useful tools to assist in the curation process. To date, the lack of standards has made it impossible to determine whether text mining techniques are sufficiently mature to be useful.
RESULTS: We report on a Challenge Evaluation task that we created for the Knowledge Discovery and Data Mining (KDD) Challenge Cup. We provided a training corpus of 862 journal articles curated in FlyBase, along with the associated lists of genes and gene products, as well as the relevant data fields from FlyBase. For the test, we provided a corpus of 213 new ('blind') articles; the 18 participating groups provided systems that flagged articles for curation, based on whether the article contained experimental evidence for gene expression products. We report on the evaluation results and describe the techniques used by the top-performing groups.
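
NOTE: The abstract does not state how the flagged articles were scored. As an illustration only, the following minimal sketch assumes that each system's yes/no curation flags are compared against gold-standard curator decisions using precision, recall, and the balanced F-measure. The function name, the article identifiers, and the data are hypothetical and are not taken from the Challenge Cup materials.

```python
from typing import Dict


def score_flags(gold: Dict[str, bool], predicted: Dict[str, bool]) -> Dict[str, float]:
    """Score yes/no curation flags against gold-standard decisions.

    Assumption: an article missing from `predicted` counts as "not flagged".
    """
    # True positives: articles that need curation and were flagged.
    tp = sum(1 for art, needs in gold.items() if needs and predicted.get(art, False))
    # False positives: articles flagged although they do not need curation.
    fp = sum(1 for art, needs in gold.items() if not needs and predicted.get(art, False))
    # False negatives: articles that need curation but were not flagged.
    fn = sum(1 for art, needs in gold.items() if needs and not predicted.get(art, False))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical toy data: four articles, two of which need curation.
gold = {"a1": True, "a2": False, "a3": True, "a4": False}
predicted = {"a1": True, "a2": True, "a3": False, "a4": False}
scores = score_flags(gold, predicted)
```

In this toy case one article is flagged correctly, one is flagged wrongly, and one is missed, so precision, recall, and F-measure all come out at 0.5.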

Year:  2003        PMID: 12855478     DOI: 10.1093/bioinformatics/btg1046

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  42 in total

1.  A statistical approach to scanning the biomedical literature for pharmacogenetics knowledge.

Authors:  Daniel L Rubin; Caroline F Thorn; Teri E Klein; Russ B Altman
Journal:  J Am Med Inform Assoc       Date:  2004-11-23       Impact factor: 4.497

2.  Biomedical language processing: what's beyond PubMed? (Review)

Authors:  Lawrence Hunter; K Bretonnel Cohen
Journal:  Mol Cell       Date:  2006-03-03       Impact factor: 17.970

3.  Enhancing text categorization with semantic-enriched representation and training data augmentation.

Authors:  Xinghua Lu; Bin Zheng; Atulya Velivelli; Chengxiang Zhai
Journal:  J Am Med Inform Assoc       Date:  2006-06-23       Impact factor: 4.497

4.  Frontiers of biomedical text mining: current progress. (Review)

Authors:  Pierre Zweigenbaum; Dina Demner-Fushman; Hong Yu; Kevin B Cohen
Journal:  Brief Bioinform       Date:  2007-10-30       Impact factor: 11.622

5.  Literature mining on pharmacokinetics numerical data: a feasibility study.

Authors:  Zhiping Wang; Seongho Kim; Sara K Quinney; Yingying Guo; Stephen D Hall; Luis M Rocha; Lang Li
Journal:  J Biomed Inform       Date:  2009-04-02       Impact factor: 6.317

6.  Recent progress in automatically extracting information from the pharmacogenomic literature. (Review)

Authors:  Yael Garten; Adrien Coulet; Russ B Altman
Journal:  Pharmacogenomics       Date:  2010-10       Impact factor: 2.533

7.  Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

Authors:  Shashank Agarwal; Hong Yu
Journal:  Bioinformatics       Date:  2009-09-25       Impact factor: 6.937

8.  Towards classifying species in systems biology papers using text mining.

Authors:  Qi Wei; Nigel Collier
Journal:  BMC Res Notes       Date:  2011-02-04

9.  Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension.

Authors:  Hong Yu; Shashank Agarwal; Mark Johnston; Aaron Cohen
Journal:  J Biomed Discov Collab       Date:  2009-01-06

10.  Enhancing navigation in biomedical databases by community voting and database-driven text classification.

Authors:  Timo Duchrow; Timur Shtatland; Daniel Guettler; Misha Pivovarov; Stefan Kramer; Ralph Weissleder
Journal:  BMC Bioinformatics       Date:  2009-10-03       Impact factor: 3.169
