
Mapping annotations with textual evidence using an scLDA model.

Bo Jin, Vicky Chen, Lujia Chen, Xinghua Lu.

Abstract

Most of the knowledge regarding genes and proteins is stored in biomedical literature as free text. Extracting information from complex biomedical texts demands techniques capable of inferring biological concepts from local text regions and mapping them to controlled vocabularies. To this end, we present a sentence-based correspondence latent Dirichlet allocation (scLDA) model which, when trained with a corpus of PubMed documents with known Gene Ontology (GO) annotations, performs the following tasks: 1) learning major biological concepts from the corpus, 2) inferring the biological concepts existing within text regions (sentences), and 3) identifying the text regions in a document that provide evidence for the observed annotations. When applied to new gene-related documents, a trained scLDA model is capable of predicting GO annotations and identifying text regions as textual evidence supporting the predicted annotations. This study uses GO annotation data as a testbed; the approach can be generalized to other annotated data, such as MeSH and MEDLINE documents.

Year:  2011        PMID: 22195141      PMCID: PMC3243146     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


