
Protein name tagging guidelines: lessons learned.

Inderjeet Mani, Zhangzhi Hu, Seok Bae Jang, Ken Samuel, Matthew Krause, Jon Phillips, Cathy H Wu.

Abstract

Interest in information extraction from the biomedical literature is motivated by the need to speed up the creation of structured databases representing the latest scientific knowledge about specific objects, such as proteins and genes. This paper addresses the lack of a standard definition of the protein name tagging problem. We describe the lessons learned in developing a set of guidelines and present the first set of inter-coder results, viewed as an upper bound on system performance. Problems coders face include: (a) the ambiguity of names that can refer to either genes or proteins; (b) the difficulty of getting the exact extents of long protein names; and (c) the complexity of the guidelines. These problems have been addressed in two ways: (a) defining the tagging targets as protein named entities used in the literature to describe proteins or protein-associated or -related objects, such as domains, pathways, expression or genes, and (b) using two types of tags, protein tags and long-form tags, with the latter being used to optionally extend the boundaries of the protein tag when the name boundary is difficult to determine. Inter-coder consistency on protein tags, measured across three annotators on 300 MEDLINE abstracts, is 0.868 F-measure. The guidelines and annotated datasets, along with automatic tools, are available for research use.
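As an illustration of how an inter-coder F-measure of this kind can be computed, the sketch below averages the balanced F-measure over every pair of annotators, counting two tags as agreeing only when their document and character offsets match exactly. The abstract does not state the matching criterion or the averaging scheme the authors used, so both are assumptions here, as are the function names (`f_measure`, `mean_pairwise_f`) and the toy spans.

```python
from itertools import combinations


def f_measure(spans_a, spans_b):
    """Balanced F-measure between two annotators' tag sets.

    Each tag is a (doc_id, start, end) tuple. Exact-match scoring is an
    assumption; the abstract does not specify the matching criterion.
    """
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # both annotators tagged nothing: treat as full agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)  # annotator A treated as the reference
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f(annotations):
    """Average the F-measure over all unordered pairs of annotators."""
    pairs = list(combinations(sorted(annotations), 2))
    return sum(f_measure(annotations[x], annotations[y]) for x, y in pairs) / len(pairs)


# Hypothetical toy data: three coders, with coder2 disagreeing on one boundary.
annotations = {
    "coder1": {("doc1", 0, 4), ("doc1", 10, 18), ("doc2", 5, 9)},
    "coder2": {("doc1", 0, 4), ("doc1", 10, 20), ("doc2", 5, 9)},
    "coder3": {("doc1", 0, 4), ("doc1", 10, 18), ("doc2", 5, 9)},
}
score = mean_pairwise_f(annotations)
print(f"mean pairwise F-measure: {score:.3f}")
```

Because the balanced F-measure is symmetric in precision and recall, it does not matter which annotator in a pair is treated as the reference. A single boundary disagreement on a long name lowers the score under exact matching, which is the kind of difficulty the long-form tag is meant to ease.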

Year:  2005        PMID: 18629297      PMCID: PMC2448601          DOI: 10.1002/cfg.452

Source DB:  PubMed          Journal:  Comp Funct Genomics        ISSN: 1531-6912


  5 in total

1.  Disambiguating proteins, genes, and RNA in text: a machine learning approach.

Authors:  V Hatzivassiloglou; P A Duboué; A Rzhetsky
Journal:  Bioinformatics       Date:  2001       Impact factor: 6.937

2.  The Protein Information Resource.

Authors:  Cathy H Wu; Lai-Su L Yeh; Hongzhan Huang; Leslie Arminski; Jorge Castro-Alvear; Yongxing Chen; Zhangzhi Hu; Panagiotis Kourtesis; Robert S Ledley; Baris E Suzek; C R Vinayaka; Jian Zhang; Winona C Barker
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

3.  GENIA corpus--semantically annotated corpus for bio-textmining.

Authors:  J-D Kim; T Ohta; Y Tateisi; J Tsujii
Journal:  Bioinformatics       Date:  2003       Impact factor: 6.937

4.  Accomplishments and challenges in literature data mining for biology. (Review)

Authors:  Lynette Hirschman; Jong C Park; Junichi Tsujii; Limsoon Wong; Cathy H Wu
Journal:  Bioinformatics       Date:  2002-12       Impact factor: 6.937

5.  iProLINK: an integrated protein resource for literature mining.

Authors:  Zhang-Zhi Hu; Inderjeet Mani; Vincent Hermoso; Hongfang Liu; Cathy H Wu
Journal:  Comput Biol Chem       Date:  2004-12       Impact factor: 2.877

  6 in total

1.  Word add-in for ontology recognition: semantic enrichment of scientific literature.

Authors:  J Lynn Fink; Pablo Fernicola; Rahul Chandran; Savas Parastatidis; Alex Wade; Oscar Naim; Gregory B Quinn; Philip E Bourne
Journal:  BMC Bioinformatics       Date:  2010-02-24       Impact factor: 3.169

2.  Concept annotation in the CRAFT corpus.

Authors:  Michael Bada; Miriam Eckert; Donald Evans; Kristin Garcia; Krista Shipley; Dmitry Sitnikov; William A Baumgartner; K Bretonnel Cohen; Karin Verspoor; Judith A Blake; Lawrence E Hunter
Journal:  BMC Bioinformatics       Date:  2012-07-09       Impact factor: 3.169

3.  Towards semi-automated curation: using text mining to recreate the HIV-1, human protein interaction database.

Authors:  Daniel G Jamieson; Martin Gerner; Farzaneh Sarafraz; Goran Nenadic; David L Robertson
Journal:  Database (Oxford)       Date:  2012-04-23       Impact factor: 3.451

4.  BioCreative III interactive task: an overview.

Authors:  Cecilia N Arighi; Phoebe M Roberts; Shashank Agarwal; Sanmitra Bhattacharya; Gianni Cesareni; Andrew Chatr-Aryamontri; Simon Clematide; Pascale Gaudet; Michelle Gwinn Giglio; Ian Harrow; Eva Huala; Martin Krallinger; Ulf Leser; Donghui Li; Feifan Liu; Zhiyong Lu; Lois J Maltais; Naoaki Okazaki; Livia Perfetto; Fabio Rinaldi; Rune Sætre; David Salgado; Padmini Srinivasan; Philippe E Thomas; Luca Toldo; Lynette Hirschman; Cathy H Wu
Journal:  BMC Bioinformatics       Date:  2011-10-03       Impact factor: 3.169

5.  Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources.

Authors:  Dietrich Rebholz-Schuhmann; Senay Kafkas; Jee-Hyub Kim; Chen Li; Antonio Jimeno Yepes; Robert Hoehndorf; Rolf Backofen; Ian Lewin
Journal:  J Biomed Semantics       Date:  2013-10-11

6.  Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature.

Authors:  Mercedes Arguello Casteleiro; George Demetriou; Warren Read; Maria Jesus Fernandez Prieto; Nava Maroto; Diego Maseda Fernandez; Goran Nenadic; Julie Klein; John Keane; Robert Stevens
Journal:  J Biomed Semantics       Date:  2018-04-12
