
Text mining and protein annotations: the construction and use of protein description sentences.

Martin Krallinger, Rainer Malik, Alfonso Valencia.

Abstract

Existing biological knowledge stored as structured database records has been extracted manually by database curators analyzing the scientific literature. Most of this information was derived from sentences which describe biologically relevant aspects of genes and gene products. We introduce the Protein description sentence (Prodisen) corpus, a useful resource for the automatic identification and construction of text-based protein and gene description records using information extraction and text classification techniques. Basic guidelines and criteria relevant for the construction of a text corpus of functional descriptions of genes and proteins are proposed. The steps used for the corpus construction and its features are presented. Moreover, some of the potential applications of the Prodisen corpus for biomedical text mining purposes are explored and the obtained results are presented.

Year:  2006        PMID: 17503385

Source DB:  PubMed          Journal:  Genome Inform        ISSN: 0919-9454


