
Combining evidence, specificity, and proximity towards the normalization of Gene Ontology terms in text.

S Gaudan, A Jimeno Yepes, V Lee, D Rebholz-Schuhmann.

Abstract

Manual annotation of proteins with Gene Ontology (GO) concepts provides structured information that is a high-quality, reliable data source for the research community. However, only a limited set of proteins is annotated, because fully annotating each individual gene product from the literature requires substantial human resources. We introduce a novel method for the automatic identification of GO terms in natural language text. The method takes several features into consideration: (1) the evidence for a GO term given by the words occurring in the text, (2) the proximity between the words, and (3) the specificity of the GO terms based on their information content. The method was evaluated on the BioCreAtIvE corpus and compared to current state-of-the-art methods. Precision reached 0.34 at a recall of 0.34 for the identified terms at rank 1. In our analysis, we observe that the identification of GO terms in the "cellular component" subbranch of GO is more accurate than for terms from the other two subbranches. This observation is explained by the average number of words forming the terminology in the different subbranches.
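
The three features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the word weighting, the proximity measure, the use of summed word information as a stand-in for term specificity, and the multiplicative combination are all assumptions made for illustration. The function names and the term identifiers (T1, T2, T3) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_information(terms):
    """Assumed word weighting: -log of the (smoothed) share of term names
    that contain the word, so rarer words carry more information."""
    doc_freq = Counter()
    for name in terms.values():
        doc_freq.update(set(tokenize(name)))
    n = len(terms)
    # n + 1 in the denominator keeps every weight strictly positive.
    return {w: -math.log(c / (n + 1)) for w, c in doc_freq.items()}


def score_term(name, text_tokens, word_ic):
    """Combine evidence, proximity and specificity for one candidate term."""
    term_words = set(tokenize(name))
    total_ic = sum(word_ic[w] for w in term_words)
    positions = [i for i, tok in enumerate(text_tokens) if tok in term_words]
    matched = {text_tokens[i] for i in positions}
    if not matched or total_ic == 0:
        return 0.0
    # (1) Evidence: information-weighted share of the term's words seen in the text.
    evidence = sum(word_ic[w] for w in matched) / total_ic
    # (2) Proximity (crude): distinct matched words divided by the token span
    # between the first and last match; 1.0 when the words are contiguous.
    span = max(positions) - min(positions) + 1
    proximity = len(matched) / span
    # (3) Specificity: summed word information, a proxy assumed here in place
    # of an information content derived from the ontology or annotations.
    specificity = total_ic
    return evidence * proximity * specificity


def rank_terms(terms, text):
    """Return (term_id, score) pairs with a non-zero score, best first."""
    word_ic = word_information(terms)
    tokens = tokenize(text)
    scored = [(tid, score_term(name, tokens, word_ic)) for tid, name in terms.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical identifiers paired with term names, for illustration only.
    example_terms = {
        "T1": "mitochondrial inner membrane",
        "T2": "mitochondrial outer membrane",
        "T3": "nucleus",
    }
    sentence = "The protein localizes to the mitochondrial inner membrane."
    for term_id, score in rank_terms(example_terms, sentence):
        print(term_id, round(score, 3))
```

In this sketch a term whose words all occur contiguously in the text ranks above a term that is only partially matched, and a term with no matching words is dropped. Multi-word terms offer more chances for partial or scattered matches, which is one way to read the abstract's remark that accuracy depends on the average number of words per term.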


Year:  2008        PMID: 18437221      PMCID: PMC3171395          DOI: 10.1155/2008/342746

Source DB:  PubMed          Journal:  EURASIP J Bioinform Syst Biol        ISSN: 1687-4145


