Protein name tagging guidelines: lessons learned.

Inderjeet Mani, Zhangzhi Hu, Seok Bae Jang, Ken Samuel, Matthew Krause, Jon Phillips, Cathy H Wu.

Abstract

Interest in information extraction from the biomedical literature is motivated by the need to speed up the creation of structured databases representing the latest scientific knowledge about specific objects, such as proteins and genes. This paper addresses the lack of a standard definition of the protein name tagging problem. We describe the lessons learned in developing a set of guidelines and present the first set of inter-coder results, viewed as an upper bound on system performance. Problems coders face include: (a) the ambiguity of names that can refer to either genes or proteins; (b) the difficulty of getting the exact extents of long protein names; and (c) the complexity of the guidelines. These problems have been addressed in two ways: (a) defining the tagging targets as protein named entities used in the literature to describe proteins or protein-associated or -related objects, such as domains, pathways, expression or genes, and (b) using two types of tags, protein tags and long-form tags, with the latter being used to optionally extend the boundaries of the protein tag when the name boundary is difficult to determine. Inter-coder consistency across three annotators for protein tags on 300 MEDLINE abstracts is an F-measure of 0.868. The guidelines and annotated datasets, along with automatic tools, are available for research use.
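The inter-coder figure above is an F-measure computed between annotators. The abstract does not state the matching criterion or how the three annotators' scores were combined, so the following is only a minimal sketch under two assumptions: tags match when their character spans are identical, and the overall score is the mean of the pairwise F-measures. The function names, coder labels, and toy spans are hypothetical and are not taken from the paper or its datasets.

```python
from itertools import combinations


def f_measure(spans_a, spans_b):
    """Balanced F-measure between two sets of (start, end) spans.

    Assumption: a tag counts as matched only on an exact span match.
    """
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # both coders tagged nothing: treat as full agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f(annotations):
    """Average the F-measure over every pair of coders (an assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    return sum(f_measure(annotations[x], annotations[y]) for x, y in pairs) / len(pairs)


# Hypothetical toy data: character offsets of protein tags in one abstract.
toy = {
    "coder1": {(0, 4), (10, 18), (25, 31)},
    "coder2": {(0, 4), (10, 18), (40, 45)},
    "coder3": {(0, 4), (12, 18), (25, 31)},
}

print(round(mean_pairwise_f(toy), 3))
```

With exact matching, a one-character disagreement about where a long name starts (as between coder1 and coder3 above) costs a full match. This is the kind of boundary difficulty the long-form tag is meant to ease.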

Year: 2005 PMID: 18629297 PMCID: PMC2448601 DOI: 10.1002/cfg.452

Source DB: PubMed Journal: Comp Funct Genomics ISSN: 1531-6912
