
High-recall protein entity recognition using a dictionary.

Zhenzhen Kou, William W Cohen, Robert F Murphy.

Abstract

SUMMARY: Protein name extraction is an important step in mining biological literature. We describe two new methods for this task: semiCRFs and dictionary HMMs. SemiCRFs are a recently proposed extension to conditional random fields (CRFs) that enables more effective use of dictionary information as features. Dictionary HMMs are a technique in which a dictionary is converted to a large HMM that recognizes phrases from the dictionary, as well as variations of these phrases. Standard training methods for HMMs can be used to learn which variants should be recognized. We compared the performance of our new approaches with that of Maximum Entropy (MaxEnt) and normal CRFs on three datasets. All four methods improved on the best published results for two of the datasets. CRFs and semiCRFs achieved the highest overall performance according to the widely used F-measure, while the dictionary HMMs performed best at finding entities that actually appear in the dictionary, which is the measure of most interest in our intended application.

AVAILABILITY: Dictionary HMMs were implemented in Java. Algorithms are available through the information extraction package MINORTHIRD at http://minorthird.sourceforge.net
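The two evaluation views mentioned in the abstract, overall F-measure and success at finding entities that appear in the dictionary, can be illustrated with a minimal sketch. The abstract does not state the exact matching criteria, so this example assumes exact-span matching and a case-insensitive dictionary lookup. The names `prf`, `dictionary_recall`, and the sample spans are hypothetical and are not part of MINORTHIRD.

```python
from typing import Set, Tuple

# A span is (start token index, end token index, surface text).
Span = Tuple[int, int, str]


def prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure under exact-span matching (an assumption)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def dictionary_recall(gold: Set[Span], predicted: Set[Span], dictionary: Set[str]) -> float:
    """Recall restricted to gold entities whose text is in the dictionary.

    Assumes the dictionary holds lowercased names; the lookup is case-insensitive.
    """
    in_dictionary = {span for span in gold if span[2].lower() in dictionary}
    if not in_dictionary:
        return 0.0
    return len(in_dictionary & predicted) / len(in_dictionary)


# Illustrative data only, not taken from the paper's datasets.
gold = {(0, 2, "protein kinase C"), (5, 6, "p53"), (9, 11, "novel factor X")}
predicted = {(0, 2, "protein kinase C"), (5, 6, "p53"), (13, 14, "cell")}
dictionary = {"protein kinase c", "p53"}

precision, recall, f_measure = prf(gold, predicted)
dict_recall = dictionary_recall(gold, predicted, dictionary)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f} dictionary recall={dict_recall:.3f}")
```

In this toy case the tagger misses one entity that is absent from the dictionary and adds one spurious span, so the overall F-measure is below 1 while recall on dictionary entities is perfect. This is the kind of gap between the two measures that the abstract describes.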

Year:  2005        PMID: 15961466      PMCID: PMC2857312          DOI: 10.1093/bioinformatics/bti1006

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  7 in total

1.  A stacked graphical model for associating sub-images with sub-captions.

Authors:  Zhenzhen Kou; William W Cohen; Robert F Murphy
Journal:  Pac Symp Biocomput       Date:  2007

2.  BioTagger-GM: a gene/protein name recognition system.

Authors:  Manabu Torii; Zhangzhi Hu; Cathy H Wu; Hongfang Liu
Journal:  J Am Med Inform Assoc       Date:  2008-12-11       Impact factor: 4.497

3.  Structured Literature Image Finder: Parsing Text and Figures in Biomedical Literature.

Authors:  Amr Ahmed; Andrew Arnold; Luis Pedro Coelho; Joshua Kangas; Abdul-Saboor Sheikh; Eric Xing; William Cohen; Robert F Murphy
Journal:  Web Semant       Date:  2010-07-01       Impact factor: 1.897

4.  Structured Correspondence Topic Models for Mining Captioned Figures in Biological Literature.

Authors:  Amr Ahmed; Eric P Xing; William W Cohen; Robert F Murphy
Journal:  KDD       Date:  2009

5.  A graph-search framework for associating gene identifiers with documents.

Authors:  William W Cohen; Einat Minkov
Journal:  BMC Bioinformatics       Date:  2006-10-10       Impact factor: 3.169

6.  Developing a hybrid dictionary-based bio-entity recognition technique.

Authors:  Min Song; Hwanjo Yu; Wook-Shin Han
Journal:  BMC Med Inform Decis Mak       Date:  2015-05-20       Impact factor: 2.796

7.  A Smart Mobile App to Simplify Medical Documents and Improve Health Literacy: System Design and Feasibility Validation.

Authors:  Rasha Hendawi; Shadi Alian; Juan Li
Journal:  JMIR Form Res       Date:  2022-04-01
