Quantitative assessment of dictionary-based protein named entity tagging.

Hongfang Liu, Zhang-Zhi Hu, Manabu Torii, Cathy Wu, Carol Friedman.

Abstract

OBJECTIVE: Natural language processing (NLP) approaches have been explored to manage and mine information recorded in biological literature. A critical step for biological literature mining is biological named entity tagging (BNET) that identifies names mentioned in text and normalizes them with entries in biological databases. The aim of this study was to provide quantitative assessment of the complexity of BNET on protein entities through BioThesaurus, a thesaurus of gene/protein names for UniProt knowledgebase (UniProtKB) entries that was acquired using online resources.
METHODS: We evaluated the complexity from several perspectives: ambiguity (i.e., the number of genes/proteins represented by one name), synonymy (i.e., the number of names associated with the same gene/protein), and coverage (i.e., the percentage of gene/protein names in text included in the thesaurus). We also normalized the names in BioThesaurus and obtained each measure twice, once before normalization and once after.
RESULTS: The current version of BioThesaurus has over 2.6 million names (2.1 million after normalization) covering more than 1.8 million UniProtKB entries. The average synonymy is 3.53 (2.86 after normalization), the average ambiguity is 2.31 before normalization and 2.32 after, and the coverage is 94.0% on the BioCreAtIvE data set, which comprises MEDLINE abstracts containing genes/proteins.
CONCLUSION: The study indicated that names for genes/proteins are highly ambiguous and there are usually multiple names for the same gene or protein. It also demonstrated that most gene/protein names appearing in text can be found in BioThesaurus.
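
The three measures defined under METHODS can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the name/entry pairs and mentions are invented toy data (not BioThesaurus or UniProtKB content), the function names are hypothetical, and the normalization rule (lowercasing and dropping non-alphanumeric characters) is a stand-in, since the abstract does not state which rules the authors applied.

```python
import re
from collections import defaultdict

# Toy (name, entry_id) pairs, invented for illustration only.
PAIRS = [
    ("ABC1", "E1"),
    ("ABC-1", "E1"),
    ("abc1", "E2"),
    ("XYZ2", "E1"),
    ("QRS3", "E2"),
]

# Toy name mentions "found in text", also invented.
MENTIONS = ["ABC1", "Abc-1", "p99"]


def identity(name):
    """No normalization: names are compared exactly as written."""
    return name


def normalize(name):
    """Assumed rule: lowercase, then drop non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_indexes(pairs, norm=identity):
    """Return (name -> set of entries, entry -> set of names)."""
    name_to_entries = defaultdict(set)
    entry_to_names = defaultdict(set)
    for name, entry in pairs:
        key = norm(name)
        name_to_entries[key].add(entry)
        entry_to_names[entry].add(key)
    return name_to_entries, entry_to_names


def average_ambiguity(name_to_entries):
    """Mean number of entries represented by one name."""
    return sum(len(e) for e in name_to_entries.values()) / len(name_to_entries)


def average_synonymy(entry_to_names):
    """Mean number of distinct names associated with one entry."""
    return sum(len(n) for n in entry_to_names.values()) / len(entry_to_names)


def coverage(mentions, name_to_entries, norm=identity):
    """Fraction of text mentions that appear in the thesaurus."""
    found = sum(1 for m in mentions if norm(m) in name_to_entries)
    return found / len(mentions)


for label, norm in (("before", identity), ("after", normalize)):
    n2e, e2n = build_indexes(PAIRS, norm)
    print(label, "normalization:",
          "ambiguity", round(average_ambiguity(n2e), 2),
          "synonymy", round(average_synonymy(e2n), 2),
          "coverage", round(coverage(MENTIONS, n2e, norm), 2))
```

On this toy data, normalization merges ABC1, ABC-1, and abc1 into a single key, so average synonymy falls from 2.5 to 2.0, average ambiguity rises from 1.0 to about 1.33, and coverage rises from one third to two thirds. The direction of the synonymy change matches the reported figures; the magnitudes are artifacts of the toy data and say nothing about BioThesaurus itself.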

Year:  2006        PMID: 16799122      PMCID: PMC1561801          DOI: 10.1197/jamia.M2085

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


